Issues with preserving the speaker identity #16

justinjohn0306 · 2023-08-03T07:22:57Z

Okay, so I've been testing out the demo Colab notebook and tried synthesizing a few characters, but it seems to have a hard time preserving the speaker identity. The resulting audio doesn't sound like my reference audio at all.

adelacvg · 2023-08-03T08:30:28Z

The pre-trained model is trained on the VCTK dataset, which is not large enough, so the model may not work well on in-the-wild data. I am working on improving the model's generalization by modifying the network structure. For better results, you can fine-tune the model or train it yourself.

justinjohn0306 · 2023-08-03T10:25:34Z

alright, gotcha :)

rishikksh20 · 2024-04-30T08:08:56Z

@adelacvg, do you have any thoughts on using Encodec's features rather than mel-spectrograms, and then using Vocos to convert them into waveforms? Maybe that would lead to better generalization.

rishikksh20 mentioned this issue Apr 30, 2024

branch in V4 version train it's working ? #33

Open
