LLMResponseAggregator behaviour when Start, End, Partial and Transcription do not follow an ideal flow #441

EdgarMMProsper · 2024-09-04T07:13:12Z

At

    class LLMResponseAggregator(FrameProcessor):

    async def process_frame(self, frame: Frame, direction: FrameDirection):

I have detected two problems. The reason for them might be that the same logic is used both for LLMAssistantContextAggregator, where frames are received pretty well ordered as for LLMUserResponseAggregator, where the interaction between the VAD and the STT makes things messier.

Problem one:

If we receive S T1 I E S T2. The state after each frame is:

S:

self._aggregation = ""
self._aggregating = True
self._seen_start_frame = True
self._seen_end_frame = False
self._seen_interim_results = False

T1: send_aggregation = False so the state is:

self._aggregation = T1
self._aggregating = True
self._seen_start_frame = True
self._seen_end_frame = False
self._seen_interim_results = False

I: (interim of T2)

self._aggregation = T1
self._aggregating = True
self._seen_start_frame = True
self._seen_end_frame = False
self._seen_interim_results = True

E: since _seen_interim_results=True, self._aggregating=True and send_aggregation=False, so

self._aggregation = T1
self._aggregating = True
self._seen_start_frame = False
self._seen_end_frame = True
self._seen_interim_results = True

S:

self._aggregation = "" # Lost T1!!!
self._aggregating = True
self._seen_start_frame = True
self._seen_end_frame = False
self._seen_interim_results = True

T2:

self._aggregation = T1
self._aggregating = True
self._seen_start_frame = True
self._seen_end_frame = False
self._seen_interim_results = False

Proposed solution:

if isinstance(frame, self._start_frame):
            self._aggregation = ""

We do not thing that self._aggregation = "" is sensible here. Every time there is a self.pushAggregation there is a reset. If we really want the self.aggregation **to be empty we can do a self.pushaggregation before.

Second problem:

This one is by design, but for my tests impacts the interaction between the user and bot.

It is about the not supported case:

# The following case would not be supported:
#
#    S I E T1 I T2 -> X
#
# and T2 would be dropped.

I think that in:

elif isinstance(frame, self._accumulator_frame):
            if self._aggregating:
                self._aggregation += f" {frame.text}"
                # We have recevied a complete sentence, so if we have seen the
                # end frame and we were still aggregating, it means we should
                # send the aggregation.
                send_aggregation = self._seen_end_frame

            # We just got our final result, so let's reset interim results.
            self._seen_interim_results = False

the if self.aggregating is not needed, so the case above would be supported.

EdgarMMProsper mentioned this issue Sep 10, 2024

LLM user frame processor with tests #450

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LLMResponseAggregator behaviour when Start, End, Partial and Transcription do not follow an ideal flow #441

LLMResponseAggregator behaviour when Start, End, Partial and Transcription do not follow an ideal flow #441

EdgarMMProsper commented Sep 4, 2024 •

edited

Loading

LLMResponseAggregator behaviour when Start, End, Partial and Transcription do not follow an ideal flow #441

LLMResponseAggregator behaviour when Start, End, Partial and Transcription do not follow an ideal flow #441

Comments

EdgarMMProsper commented Sep 4, 2024 • edited Loading

Problem one:

Second problem:

EdgarMMProsper commented Sep 4, 2024 •

edited

Loading