Diffusion-Based Any-to-Any Voice Conversion

Introduction

This repository is derived from the official implementation of the paper "Diffusion-Based Voice Conversion with Fast Maximum Likelihood Sampling Scheme" (Link). It builds on that work and adds features and modifications specific to this project.
See also the official demo page.

Pre-trained models

See inference.ipynb for detailed instructions.
The pre-trained speaker encoder we use is available at https://drive.google.com/file/d/1Y8IO2_OqeT85P1kks9I9eeAq--S65YFb/view?usp=sharing. Place it in checkpts/spk_encoder/
The pre-trained universal HiFi-GAN vocoder we use is available at https://drive.google.com/file/d/10khlrM645pTbQ4rc2aNEYPba8RFDBkW-/view?usp=sharing. It is taken from the official HiFi-GAN repository. Place it in checkpts/vocoder/
Download the voice conversion model trained on LibriTTS from https://drive.google.com/file/d/18Xbme0CTVo58p2vOHoTQm8PBGW7oEjAy/view?usp=sharing
Additionally, we provide a voice conversion model trained on VCTK: https://drive.google.com/file/d/12s9RPmwp9suleMkBCVetD8pub7wsDAy4/view?usp=sharing. Place the voice conversion models in checkpts/vc/
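
Before running inference, it can help to confirm that the downloads ended up in the right place. The following is a minimal sketch, not part of this repository: the helper name missing_checkpoints is hypothetical, and because the instructions above do not fix the checkpoint file names, it only checks that each folder exists and contains at least one file.

```python
from pathlib import Path

# Subfolders of checkpts/ named in the instructions above. The file names
# inside them are not specified there, so only non-emptiness is checked.
CHECKPOINT_DIRS = ("spk_encoder", "vocoder", "vc")


def missing_checkpoints(root="checkpts"):
    """Return the checkpoint subfolders that are absent or contain no files."""
    root = Path(root)
    missing = []
    for name in CHECKPOINT_DIRS:
        folder = root / name
        # A folder counts as populated if any regular file exists below it.
        has_file = folder.is_dir() and any(p.is_file() for p in folder.rglob("*"))
        if not has_file:
            missing.append(name)
    return missing
```

Calling missing_checkpoints() from the repository root returns an empty list once all three folders are populated.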

Build docker environment

To build the image, run:

docker build -t diffvc .

To run a container for development, run:

bash run-container.sh

Training your own model

To train a model on your own data, first create a data directory with three folders: "wavs", "mels" and "embeds". Put raw audio files sampled at 22.05 kHz in the "wavs" directory. The functions for computing mel-spectrograms and for extracting 256-dimensional speaker embeddings with the pre-trained speaker verification network located at checkpts/spk_encoder/ are in the inference.ipynb notebook (get_mel and get_embed, respectively). Save their outputs to the "mels" and "embeds" folders, respectively. Each folder in your data directory must contain one subfolder per speaker, holding only that speaker's data.
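
The layout described above can be checked before training. The following is a rough sketch under stated assumptions, not code from this repository: the function name check_data_dir is hypothetical, and the file naming (<name>_mel.npy and <name>_embed.npy next to <name>.wav) is an assumption that should be adjusted to whatever the data loader actually expects. It uses the standard-library wave module, which only reads PCM WAV files.

```python
import wave
from pathlib import Path

EXPECTED_SR = 22050  # 22.05 kHz, as required for the "wavs" folder


def check_data_dir(data_dir):
    """Collect problems with a wavs/mels/embeds layout organized by speaker."""
    data_dir = Path(data_dir)
    problems = []
    # Expect wavs/<speaker>/<name>.wav, one subfolder per speaker.
    for wav_path in sorted((data_dir / "wavs").glob("*/*.wav")):
        speaker, stem = wav_path.parent.name, wav_path.stem
        with wave.open(str(wav_path), "rb") as wav:
            rate = wav.getframerate()
        if rate != EXPECTED_SR:
            problems.append(f"{wav_path}: sample rate {rate}, expected {EXPECTED_SR}")
        # ASSUMPTION: features are stored as <name>_mel.npy and <name>_embed.npy
        # in the matching speaker subfolder of "mels" and "embeds".
        for folder, suffix in (("mels", "_mel.npy"), ("embeds", "_embed.npy")):
            expected = data_dir / folder / speaker / (stem + suffix)
            if not expected.is_file():
                problems.append(f"missing {expected}")
    return problems
```

An empty return value means every WAV file has the expected sample rate and a matching mel-spectrogram and embedding under the assumed names.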
If you want to train the encoder, you first have to prepare another folder, "mels_mode", in the data directory. It holds mel-spectrograms of the "average voice", which are the target mels for the encoder. To obtain them, run Montreal Forced Aligner on the input audio, collect the resulting .TextGrid files and put them in a "textgrids" folder in the data directory. Once you have the "mels" and "textgrids" folders, run get_avg_mels.ipynb. Then create a "logs_enc" directory and run train_enc.py:

python3 -m scenario.train_enc

Alternatively, you may download the encoder trained on LibriTTS from https://drive.google.com/file/d/1JdoC5hh7k6Nz_oTcumH0nXNEib-GDbSq/view?usp=sharing and put it in the "logs_enc" directory.
Once the encoder enc.pt is in the "logs_enc" directory, create a "logs_dec" directory and run train_dec.py to train the diffusion-based decoder:

python3 -m scenario.train_dec

Please check params.py for the most important hyperparameters.
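
The two training stages have to run in order: the decoder needs logs_enc/enc.pt. The following is a small sketch of that ordering, assuming it is run from the repository root. The helper next_training_command is hypothetical and only selects a command and creates the log directory it needs; it does not start training itself.

```python
from pathlib import Path


def next_training_command(root="."):
    """Pick the training stage to run next, based on whether enc.pt exists."""
    root = Path(root)
    encoder_ckpt = root / "logs_enc" / "enc.pt"
    if not encoder_ckpt.is_file():
        # No encoder yet: make sure logs_enc exists and train the encoder first.
        encoder_ckpt.parent.mkdir(parents=True, exist_ok=True)
        return ["python3", "-m", "scenario.train_enc"]
    # Encoder checkpoint is present: prepare logs_dec and train the decoder.
    (root / "logs_dec").mkdir(parents=True, exist_ok=True)
    return ["python3", "-m", "scenario.train_dec"]
```

The returned list can be passed to subprocess.run(..., check=True) to launch the selected stage.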

Demo

To launch the Gradio demo app, run:

python3 app_gradio.py

Serve model (in development)

Convert the models from .pt to .onnx:

python3 -m export_onnx.export_hifigan

python3 -m export_onnx.export_spk_enc

Deploy pipeline using Triton Inference Server:
