WhisperSTTService creates TranscriptionFrame which is not compatible with LLM #458

Open
cansik opened this issue Sep 13, 2024 · 1 comment

Comments


cansik commented Sep 13, 2024

I think I am facing the same issue as described in #197.

It seems that WhisperSTTService only emits TranscriptionFrames and not the LLMMessagesFrame that the LLM (or LLMUserResponseAggregator) expects. Is my assumption right?

As a workaround, I created a custom FrameProcessor that converts each TranscriptionFrame into an LLMMessagesFrame. I have no idea whether this is the right approach. Could you please give me feedback on how Whisper and LocalTransport are actually supposed to be used with LLMs?

class ConvertSTTToLLM(FrameProcessor):
    """Wraps each TranscriptionFrame's text into an LLMMessagesFrame."""

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)

        if isinstance(frame, TranscriptionFrame):
            # Wrap the transcribed text as a single user message.
            llm_frame = LLMMessagesFrame([
                {
                    "role": "user",
                    "content": frame.text,
                }
            ])
            await self.push_frame(llm_frame, direction)

        # Forward the original frame as well, so downstream processors
        # still receive the TranscriptionFrame.
        await self.push_frame(frame, direction)
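Outside of pipecat, the mapping itself can be illustrated with a minimal sketch. Note that the dataclasses below are simplified stand-ins for pipecat's frame types, not its real classes:

```python
from dataclasses import dataclass
from typing import Dict, List

# Simplified stand-ins for pipecat's frame types (illustration only).
@dataclass
class TranscriptionFrame:
    text: str

@dataclass
class LLMMessagesFrame:
    messages: List[Dict[str, str]]

def to_llm_messages(frame: TranscriptionFrame) -> LLMMessagesFrame:
    # Same mapping as ConvertSTTToLLM above: one user message per transcription.
    return LLMMessagesFrame([{"role": "user", "content": frame.text}])
```

This shows the conversion is a pure transformation of the transcription text into a single-element message list.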

And here is the pipeline:

stt = WhisperSTTService(aggregate_sentences=True)
llm = OLLamaLLMService(model="llama3.1")

stt_to_llm = ConvertSTTToLLM()

pipeline = Pipeline([
    tk_transport.input(),
    stt,
    stt_to_llm,
    tma_in,
    llm,
    tma_out,
    # and so on
])

cansik commented Sep 13, 2024

I think I figured out what was happening: WhisperSTTService does not send UserStartedSpeakingFrame and UserStoppedSpeakingFrame. Only the VAD emits these frames, and as long as it is not enabled, LLMUserResponseAggregator won't combine the transcriptions into an LLM message.
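That gating behavior can be illustrated with a simplified, self-contained sketch. The classes below are plain-Python stand-ins, not pipecat's actual implementation: the aggregator buffers transcription text only between a started/stopped pair and emits one combined message on stop.

```python
from dataclasses import dataclass

# Simplified stand-ins for pipecat's frame types (illustration only).
@dataclass
class UserStartedSpeakingFrame:
    pass

@dataclass
class UserStoppedSpeakingFrame:
    pass

@dataclass
class TranscriptionFrame:
    text: str

class SimpleUserResponseAggregator:
    """Mimics how an aggregator gated on speaking frames behaves:
    transcriptions are buffered only between started/stopped frames."""

    def __init__(self):
        self._speaking = False
        self._parts = []

    def process(self, frame):
        """Returns an aggregated user message on UserStoppedSpeakingFrame,
        otherwise None."""
        if isinstance(frame, UserStartedSpeakingFrame):
            self._speaking = True
            self._parts = []
        elif isinstance(frame, TranscriptionFrame) and self._speaking:
            self._parts.append(frame.text)
        elif isinstance(frame, UserStoppedSpeakingFrame) and self._speaking:
            self._speaking = False
            if self._parts:
                return {"role": "user", "content": " ".join(self._parts)}
        return None
```

Without the started/stopped pair (i.e. with VAD disabled), transcriptions are silently ignored by the aggregator, which matches the behavior described above.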
