Training with disfluencies in speech #1701

duhtapioca · 2024-07-24T10:55:24Z

Hi

We're looking to fine-tune a streaming Zipformer model on a custom dataset of around 100 hours that we are about to have manually annotated. The speech in this dataset may contain disfluencies. Is it better to include the disfluencies in the annotations, or should we leave them out of the transcripts?

From the CSJ experiments in #892, we infer that the model trained and tested on fluent transcripts performs slightly better. Is this inference correct? Should we expect similar results with Zipformer, or is training with disfluent transcriptions worth a shot? If so, what would be the ideal format for annotating disfluent speech for Zipformer?

Any advice on this would be of great help.

Thanks!
