Yangyangii/DeepConvolutionalTTS-pytorch

DCTTS (Deep Convolutional TTS) - pytorch implementation

Prerequisite

  • python 3.6
  • pytorch 1.0
  • librosa, scipy, tqdm, tensorboardX

Dataset

Usage

  1. Download the above dataset and update the paths in config.py, then run the command below. The first argument enables signal preprocessing; the second builds the metadata (train/test split).

    python prepro.py 1 1
    
  2. DCTTS consists of two models. First, train the Text2Mel model. Around 20k steps (roughly an hour) is enough for intelligible results, but you should keep training with the decaying guided attention loss to improve quality.

    python train.py 1 <gpu_id>
    
  3. Second, train the SSRN. SSRN produces high-resolution spectrogram outputs, so training it is slower than training Text2Mel.

    python train.py 2 <gpu_id>
    
  4. After training both models, you can synthesize speech from text.

    python synthesize.py <gpu_id>
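The guided attention loss mentioned in step 2 penalizes attention weights that stray far from the diagonal of the (text, mel-frame) alignment. A minimal numpy sketch of the idea from the DCTTS paper; the function names and the default sharpness `g=0.2` are illustrative, not taken from this repo:

```python
import numpy as np

def guided_attention_matrix(N, T, g=0.2):
    """Penalty matrix W[n, t] = 1 - exp(-((n/N - t/T)^2) / (2 g^2)).

    Near-diagonal (monotonic) attention gets ~0 penalty; attention far
    from the diagonal is penalized with values close to 1.
    """
    n = np.arange(N).reshape(-1, 1) / N
    t = np.arange(T).reshape(1, -1) / T
    return 1.0 - np.exp(-((n - t) ** 2) / (2 * g ** 2))

def guided_attention_loss(A, g=0.2):
    """Mean of W * A for an attention map A of shape (N_text, T_mel)."""
    W = guided_attention_matrix(*A.shape, g=g)
    return float(np.mean(W * A))
```

A perfectly diagonal alignment incurs zero loss, while a dispersed one is penalized; decaying `g` (or the loss weight) over training relaxes the constraint as the alignment stabilizes.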
    

Attention

  • In speech synthesis, the attention module is critical. If the model trains normally, you will see monotonic attention like the following figures.
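One common way to encourage monotonic attention at inference time (related to the "previous attention for inference" to-do in the Notes) is to restrict each decoder step's attended position to a small forward window after the previously attended one. A hedged numpy sketch; the window size and function name are assumptions, not this repo's implementation:

```python
import numpy as np

def forced_monotonic_step(att_t, prev_pos, window=3):
    """Pick the attended text position for one decoder step.

    att_t: attention weights over text positions at this step, shape (N,).
    Only positions in [prev_pos, prev_pos + window] are considered, which
    forces the alignment to move forward monotonically.
    """
    N = att_t.shape[0]
    lo, hi = prev_pos, min(prev_pos + window + 1, N)
    return lo + int(np.argmax(att_t[lo:hi]))
```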

Notes

  • To do: use the previous attention position during inference.
  • To do: alleviate overfitting.
  • The paper does not mention normalization, so weight normalization is used here, as in DeepVoice3.
  • Some hyperparameters differ from the paper.
  • To get the best performance, train on all of the data. For various experiments, the training and validation sets were kept separate here.
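Weight normalization (as used in DeepVoice3) reparameterizes each weight as w = g · v / ||v||, decoupling the magnitude g from the direction v. A minimal numpy illustration of the reparameterization itself; in PyTorch you would instead wrap a layer with torch.nn.utils.weight_norm:

```python
import numpy as np

def weight_norm(v, g):
    """Reparameterize a weight vector as w = g * v / ||v||.

    v: direction parameter (any nonzero vector); g: scalar magnitude.
    The returned w always has Euclidean norm |g|, regardless of ||v||.
    """
    return g * v / np.linalg.norm(v)
```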

Other Codes
