I am trying out OpenVoice (v1). It runs without errors, but the cloned voice is far from the reference speaker. Sometimes I give it a male reference speaker mp3 and get back a female voice.
I ran the code from "demo_part1.ipynb" and changed only the reference speaker's mp3.
I suspect the torch/embedding version is not compatible. I am using:
(Speech2Rag) OpenVoice> pip show torch
Name: torch
Version: 2.1.2+cu121
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: C:\Users\Sean2092\miniconda3\Lib\site-packages
Requires: filelock, fsspec, jinja2, networkx, sympy, typing-extensions
Required-by: pytorch-lightning, torchaudio, torchmetrics, torchvision
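To make a version mismatch easier to check, here is a minimal sketch that prints the installed versions of torch and the packages listed under "Required-by" above in one go. It uses only the standard library. The helper name `installed_versions` is my own and not part of OpenVoice, and which versions OpenVoice actually expects still has to be compared against the project's own requirements.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in the active environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Package names taken from the pip output above.
    names = ["torch", "torchaudio", "torchvision", "pytorch-lightning"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```

Running this inside the same environment as the notebook (here, Speech2Rag) matters, since the pip output above points at the base miniconda site-packages.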
Could someone who has gotten this working help out? I am sure I have something wrong, either libraries or settings, but I cannot figure out what it might be. Please help.
Thanks a lot,
Sean
I got the following warnings. Could any of them cause the clone similarity to degrade drastically?
UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error.
UserWarning: Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED
Does source_se need to come from audio of the same person as the source audio used for inference in order to get a closer or better clone?
I tried using the same (base speaker) person's voice/mp3 both for extracting source_se (the tone color embedding) and as the source audio for inference, with a third male voice/mp3 as the reference speaker. The resulting cloned audio, which is sometimes female with a bit of noise, is still far from the male reference audio. Very bizarre!
So my conclusion from the experiment is that source_se and the source audio for inference don't have to come from the same person, or at least that it makes no difference to clone similarity.
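To put a number on "far from the reference" instead of judging by ear, one option is to compare tone color embeddings with cosine similarity. The sketch below is only an illustration under assumptions: it expects each embedding as a flat list of floats (for a torch tensor, something like `tensor.flatten().tolist()` would produce that), it does not show how the embeddings are extracted, and `cosine_similarity` is a helper I wrote here, not an OpenVoice function.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors, in the range [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy vectors standing in for real embeddings (made-up values).
    reference = [0.2, -0.1, 0.7]
    cloned = [0.1, -0.2, 0.6]
    print(f"similarity: {cosine_similarity(reference, cloned):.3f}")
```

If the embedding of the cloned output scores no closer to the reference speaker than to the base speaker, that would suggest the conversion step is not picking up the target tone color, rather than this being a listening impression.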