[Idea] Basic timestamp validation #82
Comments
In the meantime, could you describe what you actually do to get better results right now?
Sure. I guess just releasing the code would be easier, but it's such an abomination that I won't pester the world with it :D Here's the process briefly: After all this, I still need to do some manual editing. Usually it's something very light though, like some word is missing or misspelt etc. There's certainly room for improvement. For example, I'm not using any initial prompt, but I think that could get rid of some spelling mistakes. I also haven't done anything with the confidence scores (or anything else having to do with JSON) yet.
Thanks a lot @misutoneko for opening this issue.
The recent heuristics updates seem to have made a difference -- it seems much better now, thanks 👍 About the example I gave in my first post, I actually found an exception to this... EDIT: I've noticed that sometimes the duration can be over 30 seconds for a single word. So it's sometimes really obvious.
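A validation pass along these lines is easy to sketch. This is not the scripts discussed in this thread, just a minimal illustration; it assumes whisper-timestamped's JSON layout (a top-level `segments` list whose entries carry a `words` list with `text`/`start`/`end` keys), and the 2-second threshold is a hypothetical tuning choice:

```python
# Hypothetical threshold: flag any word whose timestamp span exceeds this.
MAX_WORD_DURATION = 2.0  # seconds

def flag_suspicious_words(result):
    """Yield (word, duration) pairs whose span looks implausibly long.

    Assumes the whisper-timestamped result layout: a "segments" list,
    each segment holding a "words" list with "text"/"start"/"end".
    """
    for segment in result.get("segments", []):
        for word in segment.get("words", []):
            duration = word["end"] - word["start"]
            if duration > MAX_WORD_DURATION:
                yield word["text"], duration

# Toy result illustrating the ">30 s single word" case from the thread.
result = {"segments": [{"words": [
    {"text": "hello", "start": 0.0, "end": 0.4},
    {"text": "world", "start": 0.5, "end": 31.0},  # obviously wrong
]}]}
print(list(flag_suspicious_words(result)))  # [('world', 30.5)]
```

Anything this flags could then be handed to whatever repair strategy you prefer (re-transcription, splitting, or just a warning).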
Thanks, nice job. The fact that it needs to be a gated model is a bit lamentable, though, as it will most likely hinder adoption.
@LaurinmyReha Great approach, but I missed how this is different from the original Whisper. Meaning, what was done to improve on it? Fine-tuned on which dataset? Or was something else done? Can you enlighten me?
I'm using whisper-timestamped with a somewhat extensive hodgepodge of preprocessing and postprocessing scripts.
I got to thinking that some of the anomalies these scripts handle could perhaps be alleviated in whisper-timestamped itself.
Ideally there would be no need for pre/postprocessing at all, but I'm not sure if that's realistic.
(Well, with better models, maybe...)
So, here's one example:
In .words.srt (or .words.json) there are sometimes instances where an utterance of a single word takes almost two seconds(!).
That is, IMO, quite obviously wrong, so the postprocessing stage splits the file in half and reprocesses both parts. Yeah, a bit crude an approach perhaps, but it works well enough for me.
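The split-in-half repair step can be sketched as two ffmpeg cuts at the midpoint, each half then being transcribed again. This is only an illustration of the idea, not the author's actual script; the function name, output filenames, and the choice to build (rather than run) the commands are all assumptions made here for clarity:

```python
def split_commands(path, total_duration, out_a="part_a.wav", out_b="part_b.wav"):
    """Build the two ffmpeg invocations that crudely cut an audio file
    at its midpoint, so each half can be reprocessed separately.
    Output filenames are illustrative placeholders.
    """
    mid = total_duration / 2.0
    # -t limits the first cut's duration; -ss seeks to the midpoint for the second.
    first = ["ffmpeg", "-y", "-i", path, "-t", str(mid), out_a]
    second = ["ffmpeg", "-y", "-i", path, "-ss", str(mid), out_b]
    return first, second

first, second = split_commands("input.wav", 60.0)
print(first)   # ['ffmpeg', '-y', '-i', 'input.wav', '-t', '30.0', 'part_a.wav']
print(second)  # ['ffmpeg', '-y', '-i', 'input.wav', '-ss', '30.0', 'part_b.wav']
```

In practice you would pass each command to `subprocess.run(cmd, check=True)` and feed both halves back through transcription; splitting at the nearest silence rather than the exact midpoint would be a less crude variant.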
So that's just one example, perhaps the most obvious one; I have more of these corner cases if you're interested :D
(Should I make a separate issue for each one?)
You could of course do some postprocessing in whisper-timestamped too, similar to what I now do with scripts. But maybe there are better ways to deal with these. Of course, there's always the alternative of just waiting for better models that take care of petty issues like this :D