Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LLMResponseAggregator behaviour when Start, End, Partial and Transcription do not follow an ideal flow #441

Open
EdgarMMProsper opened this issue Sep 4, 2024 · 0 comments

Comments

@EdgarMMProsper
Copy link

EdgarMMProsper commented Sep 4, 2024

At

    class LLMResponseAggregator(FrameProcessor):

    async def process_frame(self, frame: Frame, direction: FrameDirection):

I have detected two problems. The reason for them might be that the same logic is used both for LLMAssistantContextAggregator, where frames are received pretty well ordered as for LLMUserResponseAggregator, where the interaction between the VAD and the STT makes things messier.

Problem one:

If we receive S T1 I E S T2. The state after each frame is:

S:

self._aggregation = ""
self._aggregating = True
self._seen_start_frame = True
self._seen_end_frame = False
self._seen_interim_results = False

T1: send_aggregation = False so the state is:

self._aggregation = T1
self._aggregating = True
self._seen_start_frame = True
self._seen_end_frame = False
self._seen_interim_results = False

I: (interim of T2)

self._aggregation = T1
self._aggregating = True
self._seen_start_frame = True
self._seen_end_frame = False
self._seen_interim_results = True

E: since _seen_interim_results=True, self._aggregating=True and send_aggregation=False, so

self._aggregation = T1
self._aggregating = True
self._seen_start_frame = False
self._seen_end_frame = True
self._seen_interim_results = True

S:

self._aggregation = "" # Lost T1!!!
self._aggregating = True
self._seen_start_frame = True
self._seen_end_frame = False
self._seen_interim_results = True

T2:

self._aggregation = T1
self._aggregating = True
self._seen_start_frame = True
self._seen_end_frame = False
self._seen_interim_results = False

Proposed solution:

if isinstance(frame, self._start_frame):
            self._aggregation = ""

We do not thing that self._aggregation = "" is sensible here. Every time there is a self.pushAggregation there is a reset. If we really want the self.aggregation **to be empty we can do a self.pushaggregation before.

Second problem:

This one is by design, but for my tests impacts the interaction between the user and bot.

It is about the not supported case:

# The following case would not be supported:
#
#    S I E T1 I T2 -> X
#
# and T2 would be dropped.

I think that in:

elif isinstance(frame, self._accumulator_frame):
            if self._aggregating:
                self._aggregation += f" {frame.text}"
                # We have recevied a complete sentence, so if we have seen the
                # end frame and we were still aggregating, it means we should
                # send the aggregation.
                send_aggregation = self._seen_end_frame

            # We just got our final result, so let's reset interim results.
            self._seen_interim_results = False

the if self.aggregating is not needed, so the case above would be supported.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant