Using Whisper for Chinese ASR in iOS may occasionally output illegal UTF-8 strings. #1197

Open
hasayakey opened this issue Jun 14, 2024 · 2 comments

@hasayakey

Describe the bug

I followed the document at https://github.com/microsoft/Olive/tree/main/examples/whisper and generated the Whisper model with the following command: python prepare_whisper_configs.py --model_name openai/whisper-tiny --no_audio_decoder --multilingual --enable_timestamps | olive run --config whisper_cpu_int8.json 2> /dev/null

Because running with the CPUExecutionProvider on an iPhone causes the phone to overheat severely, I adopted the following strategy: I run an ORTSession every 2 seconds to get the transcribed text and, based on the timestamps in the returned text, decide whether to discard the audio samples that have already been transcribed correctly. Most of the time the text is output normally, but occasionally the output is an illegal UTF-8 string, which causes onnxruntime-objc to crash.
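The buffering strategy described above could be sketched roughly as follows. This is only an illustration in Python, not the reporter's iOS code: it assumes the returned text carries Whisper-style timestamp tokens such as <|1.50|> in (start, end) pairs, and the helper names last_closed_segment_end and trim_buffer are hypothetical. The 16 kHz sample rate is the rate Whisper models expect.

```python
import re

SAMPLE_RATE = 16_000  # Whisper models expect 16 kHz mono audio

# Assumed timestamp token format, e.g. "<|0.00|>text<|1.84|>"
TIMESTAMP_RE = re.compile(r"<\|(\d+\.\d+)\|>")


def last_closed_segment_end(text: str) -> float:
    """Return the end time in seconds of the last segment that has both a
    start and an end timestamp, or 0.0 if no segment is complete."""
    times = [float(t) for t in TIMESTAMP_RE.findall(text)]
    # Timestamps are assumed to come in (start, end) pairs;
    # a dangling start without an end is ignored.
    complete = len(times) - (len(times) % 2)
    return times[complete - 1] if complete else 0.0


def trim_buffer(samples: list, text: str) -> list:
    """Drop the samples covered by fully transcribed segments."""
    end = last_closed_segment_end(text)
    return samples[int(end * SAMPLE_RATE):]


# Example: the second segment is still open, so only 1.5 s is discarded.
text = "<|0.00|>你好<|1.50|><|1.50|>世界"
buffer = list(range(SAMPLE_RATE * 2))  # 2 s of placeholder samples
remaining = trim_buffer(buffer, text)
```

Keeping the samples of the still-open segment means the next run sees that audio again with more context, at the cost of re-transcribing it.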

Crash stack: microsoft/onnxruntime#21026
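One plausible cause, stated here as an assumption and not confirmed anywhere in this thread: Whisper uses a byte-level BPE tokenizer, and a Chinese character occupies three bytes in UTF-8, so a character can span more than one token. If the decoded output ends partway through a character, the bytes are not valid UTF-8. The Python sketch below only illustrates that failure mode and a tolerant decode; the helper names are hypothetical and it does not use any onnxruntime API.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes form a well-formed UTF-8 sequence."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def decode_lenient(raw: bytes) -> str:
    """Decode, replacing malformed bytes with U+FFFD instead of raising."""
    return raw.decode("utf-8", errors="replace")


full = "你好".encode("utf-8")  # 6 bytes, 3 per character
truncated = full[:-1]          # cut in the middle of the second character

print(is_valid_utf8(full))       # well-formed input
print(is_valid_utf8(truncated))  # incomplete trailing character
print(decode_lenient(truncated))
```

A strict decoder rejects the truncated bytes, while the lenient one keeps the intact prefix and marks the damaged tail, which is usually preferable to crashing in a live transcription loop.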

To Reproduce
Steps to reproduce the behavior.

Expected behavior
A clear and concise description of what you expected to happen.

Olive config
Add Olive configurations here.

Olive logs
Add logs here.

Other information

  • OS: iOS
  • Olive version: 0.7.0
  • ONNXRuntime package and version: onnxruntime-objc: 1.18.0

Additional context
Add any other context about the problem here.

@jambayk
Contributor

jambayk commented Jun 27, 2024

Hi,

Thanks for creating the issue. It looks like you have already opened a related issue in the onnxruntime repository, which is a good place to ask since the model is generated using onnxruntime contrib operators. If the issue cannot be resolved on the onnxruntime side, the devs at https://github.com/microsoft/onnxruntime-extensions might have more insight, since they created the post-processing parts of the model.

@RageAgainstTheAssembly

Hello,
I have encountered a similar issue while trying to use Olive Whisper to transcribe the Tajik language. The model produced by Olive performs far worse than a basic ONNX model and suffers from severe hallucinations. The Olive model also occasionally produces illegal UTF-8 strings, as you have mentioned. I have been unable to find an explanation or a fix for this.
