Whisper-large-v3 transcript is trimmed #1972

Open
2 of 4 tasks
yv0vaa opened this issue Jul 25, 2024 · 4 comments
Labels
bug Something isn't working

Comments

yv0vaa commented Jul 25, 2024

System Info

optimum 1.21.2
Ubuntu 22.04.4 LTS
CUDA 12.3
cuda-toolkit 11.7
onnxruntime 1.18.1

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

import os
from transformers import WhisperForConditionalGeneration, WhisperProcessor, PretrainedConfig
import torch
import torchaudio
from optimum.onnxruntime import ORTModelForSpeechSeq2Seq

model_name = 'openai/whisper-large-v3'
model_path = 'whisper-large-v3'

processor = WhisperProcessor.from_pretrained(model_name)
model = WhisperForConditionalGeneration.from_pretrained(model_name)
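# (note: this PyTorch model is never used; it is overwritten by the ORT model below)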
device = "cuda:0" if torch.cuda.is_available() else "cpu"


model_config = PretrainedConfig.from_pretrained(model_name)
sessions = ORTModelForSpeechSeq2Seq.load_model(
    os.path.join(model_path, 'encoder_model.onnx'),
    os.path.join(model_path, 'decoder_model.onnx'),
)
model = ORTModelForSpeechSeq2Seq(
    sessions[0], 
    sessions[1], 
    model_config, 
    model_path, 
    use_cache=False,
).to(device)

audio, sr = torchaudio.load("example.ogg")
audio = torchaudio.functional.resample(audio[0], sr, 16000)
input_features = processor(audio.cpu(), return_tensors="pt", sampling_rate=16000, max_new_tokens=1000).input_features.to(device)
predicted_ids = model.generate(input_features)[0]
transcription = processor.decode(predicted_ids)
print(transcription)

Expected behavior

For some reason the final transcript is incomplete and is cut off in the middle of the speech.
I've tried changing the max_tokens and max_new_tokens parameters, but nothing changed.
I also couldn't figure out how to pass the compute type and batch size as parameters.
PretrainedConfig and GenerationConfig don't have such parameters. Could anyone help me?
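
For context, a minimal sketch of how compute placement and batching are usually expressed with optimum (the provider/session_options arguments and list-based batching are standard optimum/transformers usage; the model directory and the waveform names here are illustrative, not from this thread):

import onnxruntime
from optimum.onnxruntime import ORTModelForSpeechSeq2Seq

session_options = onnxruntime.SessionOptions()  # threads, graph optimizations, etc.
model = ORTModelForSpeechSeq2Seq.from_pretrained(
    "whisper-large-v3",                    # directory with the exported ONNX files
    provider="CUDAExecutionProvider",      # compute placement; "CPUExecutionProvider" for CPU
    session_options=session_options,
)

# Batching: pass several 16 kHz waveforms at once; the processor pads them
# into a single (batch, n_mels, frames) input_features tensor.
batch = processor([waveform_a, waveform_b], return_tensors="pt", sampling_rate=16000)
predicted_ids = model.generate(batch.input_features)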

yv0vaa added the bug label on Jul 25, 2024
IlyasMoutawwakil (Member) commented

Hey @yv0vaa, would you have time to try out the branch in #1971 and see if it fixes your issue?

yv0vaa (Author) commented Jul 30, 2024

Good afternoon @IlyasMoutawwakil, thanks, but unfortunately it didn't help.

IlyasMoutawwakil (Member) commented Jul 30, 2024

Oh, I just noticed that you're passing max_new_tokens to the processor rather than to generate.
Is the behavior different from that of transformers?
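
To make that concrete, a minimal sketch of the corrected call (in the standard transformers/optimum API, generation-length options belong to generate(); the value 400 is illustrative, chosen to stay within Whisper's 448-position decoder context):

# the processor only extracts features; no generation arguments here
input_features = processor(audio.cpu(), return_tensors="pt", sampling_rate=16000).input_features.to(device)
# generation-length controls are passed to generate() instead
predicted_ids = model.generate(input_features, max_new_tokens=400)[0]
transcription = processor.decode(predicted_ids)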

yv0vaa (Author) commented Jul 31, 2024

Maybe I'm doing something wrong, but nothing changes. Varying max_new_tokens in both processor.__call__ and model.generate does not affect the model's behavior.
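
One possible explanation, offered as an assumption rather than something established in this thread: Whisper's feature extractor pads or truncates audio to 30-second windows, so a single generate call transcribes at most the first 30 seconds regardless of max_new_tokens, which would match a transcript cut off mid-speech. Long-form audio is commonly handled by chunked decoding, e.g. via the transformers ASR pipeline; a minimal sketch reusing the objects defined in the reproduction above:

from transformers import pipeline

# chunked long-form decoding; the ORT model is used as a drop-in model
asr = pipeline(
    "automatic-speech-recognition",
    model=model,                         # the ORTModelForSpeechSeq2Seq instance
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    chunk_length_s=30,                   # split long recordings into 30 s windows
)
print(asr("example.ogg")["text"])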
