WhisperSTTService creates TranscriptionFrame which is not compatible with LLM #458

Open
cansik opened this issue Sep 13, 2024 · 1 comment

Comments


cansik commented Sep 13, 2024

I think I am facing the same issue as described in #197.

It seems that WhisperSTTService only emits TranscriptionFrames and not the LLMMessagesFrame that the LLM (or LLMUserResponseAggregator) expects. Is my assumption right?

As a workaround, I created a custom FrameProcessor that converts each TranscriptionFrame into an LLMMessagesFrame. I have no idea whether this is the right approach. Could you please give me feedback on how Whisper and LocalTransport are actually supposed to be used with LLMs?

class ConvertSTTToLLM(FrameProcessor):
    """Wraps each TranscriptionFrame's text into an LLMMessagesFrame."""

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)

        if isinstance(frame, TranscriptionFrame):
            # Wrap the transcribed text as a single user message.
            llm_frame = LLMMessagesFrame([
                {
                    "role": "user",
                    "content": frame.text,
                }
            ])
            await self.push_frame(llm_frame, direction)

        # Forward the original frame as well, so downstream processors
        # still receive the TranscriptionFrame.
        await self.push_frame(frame, direction)
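Outside of pipecat, the mapping itself can be illustrated with a minimal sketch. Note that the dataclasses below are simplified stand-ins for pipecat's frame types, not its real classes:

```python
from dataclasses import dataclass
from typing import Dict, List

# Simplified stand-ins for pipecat's frame types (illustration only).
@dataclass
class TranscriptionFrame:
    text: str

@dataclass
class LLMMessagesFrame:
    messages: List[Dict[str, str]]

def to_llm_messages(frame: TranscriptionFrame) -> LLMMessagesFrame:
    # Same mapping as ConvertSTTToLLM above: one user message per transcription.
    return LLMMessagesFrame([{"role": "user", "content": frame.text}])
```

This shows the conversion is a pure transformation of the transcription text into a single-element message list.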

And here is the pipeline:

stt = WhisperSTTService(aggregate_sentences=True)
llm = OLLamaLLMService(model="llama3.1")

stt_to_llm = ConvertSTTToLLM()

pipeline = Pipeline([
    tk_transport.input(),
    stt,
    stt_to_llm,
    tma_in,
    llm,
    tma_out,
    # and so on
])

cansik commented Sep 13, 2024

I think I figured out what was happening: WhisperSTTService does not send UserStartedSpeakingFrame and UserStoppedSpeakingFrame. Only the VAD emits these frames, and as long as it is not enabled, LLMUserResponseAggregator won't combine the transcriptions into an LLM message.
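That gating behavior can be illustrated with a simplified, self-contained sketch. The classes below are plain-Python stand-ins, not pipecat's actual implementation: the aggregator buffers transcription text only between a started/stopped pair and emits one combined message on stop.

```python
from dataclasses import dataclass

# Simplified stand-ins for pipecat's frame types (illustration only).
@dataclass
class UserStartedSpeakingFrame:
    pass

@dataclass
class UserStoppedSpeakingFrame:
    pass

@dataclass
class TranscriptionFrame:
    text: str

class SimpleUserResponseAggregator:
    """Mimics how an aggregator gated on speaking frames behaves:
    transcriptions are buffered only between started/stopped frames."""

    def __init__(self):
        self._speaking = False
        self._parts = []

    def process(self, frame):
        """Returns an aggregated user message on UserStoppedSpeakingFrame,
        otherwise None."""
        if isinstance(frame, UserStartedSpeakingFrame):
            self._speaking = True
            self._parts = []
        elif isinstance(frame, TranscriptionFrame) and self._speaking:
            self._parts.append(frame.text)
        elif isinstance(frame, UserStoppedSpeakingFrame) and self._speaking:
            self._speaking = False
            if self._parts:
                return {"role": "user", "content": " ".join(self._parts)}
        return None
```

Without the started/stopped pair (i.e. with VAD disabled), transcriptions are silently ignored by the aggregator, which matches the behavior described above.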
