Okay, so I've been testing out the demo colab notebook and tried synthesizing a few characters, but it seems to have a hard time preserving the speaker identity. The resulting audio doesn't sound like my reference audio at all.
The pre-trained model was trained on the VCTK dataset, which is not large enough, so it may not work well on in-the-wild data. I am working on improving the model's generalization by modifying the network structure. You can fine-tune or train the model yourself for better results.
@adelacvg, do you have any thoughts on using Encodec features rather than mel spectrograms, and then using Vocos to convert them into waveforms? Maybe that would lead to better generalization.
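For anyone who wants to hear what the Encodec-plus-Vocos path sounds like before changing the model, here is a rough sketch that round-trips a reference clip through Encodec features and the Vocos vocoder. It is not part of this repository. It assumes the `vocos` and `torchaudio` packages are installed and uses the pretrained `charactr/vocos-encodec-24khz` checkpoint from the Vocos README. The helper names `bandwidth_id_for` and `encodec_vocos_roundtrip` are made up for illustration.

```python
# Sketch only: round-trips audio through Encodec features and the Vocos vocoder.
# Assumes `pip install vocos torchaudio`; not part of this repository.

# Bandwidths (kbps) supported by the pretrained "charactr/vocos-encodec-24khz" model.
BANDWIDTHS_KBPS = [1.5, 3.0, 6.0, 12.0]


def bandwidth_id_for(kbps: float) -> int:
    """Return the index Vocos expects as `bandwidth_id` for a bandwidth in kbps."""
    return BANDWIDTHS_KBPS.index(kbps)


def encodec_vocos_roundtrip(path: str, kbps: float = 6.0):
    """Load an audio file, extract Encodec features, and decode them with Vocos."""
    # Heavy imports are deferred so the helper above works without them installed.
    import torch
    import torchaudio
    from vocos import Vocos

    vocos = Vocos.from_pretrained("charactr/vocos-encodec-24khz")

    y, sr = torchaudio.load(path)
    if y.size(0) > 1:  # mix down to mono
        y = y.mean(dim=0, keepdim=True)
    # The pretrained model expects 24 kHz input.
    y = torchaudio.functional.resample(y, orig_freq=sr, new_freq=24000)

    bandwidth_id = torch.tensor([bandwidth_id_for(kbps)])
    with torch.no_grad():
        # Calling the model runs feature extraction followed by decoding.
        return vocos(y, bandwidth_id=bandwidth_id)  # waveform at 24 kHz
```

If the reconstructed clip still sounds like the original speaker, the features carry enough speaker information to be worth trying as a replacement for mel spectrograms. Whether that actually improves generalization here is untested.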