Yangyangii/DeepConvolutionalTTS-pytorch

DCTTS (Deep Convolutional TTS) - pytorch implementation

Prerequisite

  • python 3.6
  • pytorch 1.0
  • librosa, scipy, tqdm, tensorboardX

Dataset

Usage

  1. Download the above dataset and update the paths in config.py, then run the command below. The first argument enables signal preprocessing; the second builds the metadata (train/test split).

    python prepro.py 1 1
    
  2. DCTTS consists of two models. First, train the Text2Mel model. Around 20k steps (roughly an hour) is enough for intelligible results, but you should keep training with the decaying guided attention loss to improve quality.

    python train.py 1 <gpu_id>
    
  3. Second, train the SSRN. SSRN produces high-resolution spectrogram outputs, so training it is slower than training Text2Mel.

    python train.py 2 <gpu_id>
    
  4. After training both models, you can synthesize speech from text.

    python synthesize.py <gpu_id>
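The guided attention loss mentioned in step 2 penalizes attention weights that stray far from the diagonal of the (text, mel-frame) alignment. A minimal numpy sketch of the idea from the DCTTS paper; the function names and the default sharpness `g=0.2` are illustrative, not taken from this repo:

```python
import numpy as np

def guided_attention_matrix(N, T, g=0.2):
    """Penalty matrix W[n, t] = 1 - exp(-((n/N - t/T)^2) / (2 g^2)).

    Near-diagonal (monotonic) attention gets ~0 penalty; attention far
    from the diagonal is penalized with values close to 1.
    """
    n = np.arange(N).reshape(-1, 1) / N
    t = np.arange(T).reshape(1, -1) / T
    return 1.0 - np.exp(-((n - t) ** 2) / (2 * g ** 2))

def guided_attention_loss(A, g=0.2):
    """Mean of W * A for an attention map A of shape (N_text, T_mel)."""
    W = guided_attention_matrix(*A.shape, g=g)
    return float(np.mean(W * A))
```

A perfectly diagonal alignment incurs zero loss, while a dispersed one is penalized; decaying `g` (or the loss weight) over training relaxes the constraint as the alignment stabilizes.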
    

Attention

  • In speech synthesis, the attention module is critical. If the model trains normally, you will see monotonic attention like the following figures.
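One common way to encourage monotonic attention at inference time (related to the "previous attention for inference" to-do in the Notes) is to restrict each decoder step's attended position to a small forward window after the previously attended one. A hedged numpy sketch; the window size and function name are assumptions, not this repo's implementation:

```python
import numpy as np

def forced_monotonic_step(att_t, prev_pos, window=3):
    """Pick the attended text position for one decoder step.

    att_t: attention weights over text positions at this step, shape (N,).
    Only positions in [prev_pos, prev_pos + window] are considered, which
    forces the alignment to move forward monotonically.
    """
    N = att_t.shape[0]
    lo, hi = prev_pos, min(prev_pos + window + 1, N)
    return lo + int(np.argmax(att_t[lo:hi]))
```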

Notes

  • To do: use the previous attention position during inference.
  • To do: alleviate overfitting.
  • The paper does not mention normalization, so weight normalization is used here, as in DeepVoice3.
  • Some hyperparameters differ from the paper.
  • To get the best performance, train on all of the data. For various experiments, the training and validation sets were kept separate here.
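Weight normalization (as used in DeepVoice3) reparameterizes each weight as w = g · v / ||v||, decoupling the magnitude g from the direction v. A minimal numpy illustration of the reparameterization itself; in PyTorch you would instead wrap a layer with torch.nn.utils.weight_norm:

```python
import numpy as np

def weight_norm(v, g):
    """Reparameterize a weight vector as w = g * v / ||v||.

    v: direction parameter (any nonzero vector); g: scalar magnitude.
    The returned w always has Euclidean norm |g|, regardless of ||v||.
    """
    return g * v / np.linalg.norm(v)
```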

Other Codes
