Why my 16000 speech instance get poor performance? #7

Open
JohnHerry opened this issue Nov 30, 2022 · 0 comments

Comments

JohnHerry commented Nov 30, 2022

Thanks for the great work.
We are looking for a vocoder for TTS and tried SawSingSub. Our audio sample rate is 16000 Hz. To match our acoustic model, I changed the parameters in preprocess.py to hop_length=200, win_length=800, and n_mel_channels=80. I also changed sawsingsub.yaml accordingly, setting block_size=200 and leaving the other settings unchanged.
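
To make the settings above easier to cross-check, here is a minimal sketch of a consistency check between the preprocessing parameters and the model config. The values come from this issue. The key name `sampling_rate` and the helper `check_consistency` are assumptions for illustration and may not match the names actually used in preprocess.py or sawsingsub.yaml.

```python
# Values reported in this issue. The "sampling_rate" key name is an
# assumption; the repository may call it something else.
preprocess_params = {
    "sampling_rate": 16000,
    "hop_length": 200,
    "win_length": 800,
    "n_mel_channels": 80,
}
model_config = {
    "sampling_rate": 16000,  # assumed to be set in the YAML as well
    "block_size": 200,
}


def check_consistency(pre, cfg):
    """Return a list of mismatches between preprocessing and model settings."""
    problems = []
    # One mel frame should correspond to exactly one synthesis block.
    if pre["hop_length"] != cfg["block_size"]:
        problems.append("hop_length and block_size differ")
    # Both stages should agree on the audio sample rate.
    if pre["sampling_rate"] != cfg["sampling_rate"]:
        problems.append("sampling_rate differs between preprocessing and model")
    # The analysis window should span at least one hop.
    if pre["win_length"] < pre["hop_length"]:
        problems.append("win_length is shorter than hop_length")
    return problems


# Frame duration and frame rate implied by the settings above.
frame_ms = 1000 * preprocess_params["hop_length"] / preprocess_params["sampling_rate"]
frames_per_second = preprocess_params["sampling_rate"] / preprocess_params["hop_length"]

problems = check_consistency(preprocess_params, model_config)
print(problems, frame_ms, frames_per_second)
```

With these values each frame covers 12.5 ms of audio (80 frames per second). The check only covers the parameters named in this issue; any other sample-rate-dependent setting in the config would need to be verified separately.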

I have trained the model for nearly 1 million steps, but the quality of the generated waveform is far worse than that of HiFi-GAN at the same number of steps. Are there any parameter adjustments I have missed?

The blurred spectrogram of the SawSingSub-generated speech:
[image: ad39608a8d826cd3684795591278932]
