Size of message problems with hopskotch? #87

Open · habig opened this issue Dec 15, 2023 · 6 comments
Labels: bug (Something isn't working), help wanted (Extra attention is needed)

habig (Contributor) commented Dec 15, 2023

When trying to push through the timing tier messages, we got some Kafka warnings suggesting that we were trying to jam through too much data.

We need to replicate that with a well-specified test case, collect the exact error message, and take it to SCIMMA to see what they think is happening.

justinvasel (Contributor) commented:

It would be good to know which error was thrown. Does anyone have a log file?

I suspect this is unrelated, but just in case: SCiMMA included a bug fix in hop-client v0.9.0 (snews_pt uses v0.8.0) for the case where hopskotch expects JSON data but receives binary data instead and chokes.
scimma/hop-client#198

justinvasel added the bug and help wanted labels on Feb 2, 2024
justinvasel (Contributor) commented May 10, 2024

I have an idea for a solution.

According to Kafka documentation, there is a maximum message size of 1,048,588 bytes (~1 MB), but it is configurable.

For the example of 100,000 integers that @KaraMelih mentioned in PR #98, the size of those integers is 800,056 bytes, which is less than the threshold above; but we're sending multiple SNEWS messages in one go, so it quickly adds up.
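
For concreteness, here is a small sketch of where a figure like 800,056 bytes can come from. It assumes the number was measured with sys.getsizeof on a plain Python list (56-byte list header plus one 8-byte pointer per element on 64-bit CPython), which is not stated in PR #98; the bytes actually serialized for Kafka will differ:

import json
import sys

data = list(range(100_000))              # 100,000 integers, as in the PR #98 example
print(sys.getsizeof(data))               # ~800,056 bytes: 56-byte header + 100,000 * 8-byte pointers
print(len(json.dumps(data).encode()))    # the size once serialized to JSON differs (and depends on the values)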

There are multiple layers between SNEWSMessageBuilder and Kafka; the call chain looks like this:

snews_pt.messages.SNEWSMessageBuilder.send_messages
└── snews_pt.messages.Publisher.send
    └── hop.io.Stream.open
        └── hop.io.Stream.Producer
            └── adc.producer.ProducerConfig

That last class, adc.producer.ProducerConfig, takes message_max_bytes as a keyword argument, and the methods in the chain starting at hop.io.Stream.open pass **kwargs the rest of the way down.

So... We should be able to increase the maximum message size by passing the kwarg into here:

def __enter__(self):
    self.stream = Stream(until_eos=True, auth=self.auth).open(self.obs_broker, 'w')
    return self

This would be the appropriate change (for 5x max message size):

self.stream = Stream(until_eos=True, auth=self.auth).open(
    self.obs_broker, 
    'w', 
    message_max_bytes=5242940  # <--- New
)

I have not tested this.
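
An untested sketch of how this could be verified end-to-end, assuming the kwarg really does propagate from hop.io.Stream.open down to adc.producer.ProducerConfig; the topic URL and payload below are placeholders, not the production SNEWS topic:

from hop import Stream

# Placeholder: substitute a test topic we are allowed to write to.
TEST_TOPIC = "kafka://kafka.scimma.org/snews.experiments-test"

# A payload well over the ~1 MB default limit once serialized.
big_payload = {"timing_series": list(range(200_000))}

stream = Stream(until_eos=True)  # credentials picked up from the usual hop auth config
with stream.open(TEST_TOPIC, "w", message_max_bytes=5242940) as s:
    s.write(big_payload)
# Expectation: this write fails with a message-size error without the kwarg,
# and succeeds with it, if the kwarg is indeed passed through to the producer.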

habig (Contributor, Author) commented May 13, 2024

The SCiMMA people report that the default max message size is 1 MB. They're leery of increasing it, wanting to head off future scaling problems, and are working on a large-file offload service instead. But now that we know this, we can plan for it, with efficiencies like this PR helping. We could also imagine daisy-chaining things together, like SMS messages that run over 140 characters.

justinvasel (Contributor) commented:

@habig: and if that default is something we can override, like in the example above, then problem solved! No need for them to change the default on their end.

habig (Contributor, Author) commented Aug 14, 2024

We were just discussing this now.

If raising the infrastructure limit is the answer (we need to test this), can we raise it enough to solve the problem? If not, we'd need to "packetize" things anyway. And if we have to write that code, then living within the 1 MB-per-packet limit is probably the way to go.

Marta says about 1 s of data in the last test fit within the 1 MB limit, so we'd need an order of magnitude more headroom to send ~10 s of light curve in one shot.

KaraMelih (Collaborator) commented:

We should, in any case, warn the user about the large message size and what we do to handle it. Something like: "Input data is larger than 1 MB; we will chunk it into pieces and send several messages." That way, if something goes wrong, it is more obvious where it might have gone wrong.

As an alternative to chunking, @justinvasel looked into compression options. We should probably check whether Kafka offers any of these internally. Then we also need to validate that whatever is chunked/compressed gets unchunked/decompressed properly on the other end.
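
A rough sketch of what that chunk-and-warn step could look like; the function names, the chunk metadata fields, and the 1,048,588-byte limit are assumptions for illustration, not existing snews_pt code, and the receiving side would need the matching reassembly step:

import base64
import json
import logging
import math

log = logging.getLogger("snews_pt")

MAX_MESSAGE_BYTES = 1_048_588  # assumed hopskotch/Kafka default (~1 MB)

def chunk_payload(message: dict, max_bytes: int = MAX_MESSAGE_BYTES) -> list:
    """Split a too-large message into pieces that each fit under max_bytes, warning the user."""
    raw = json.dumps(message).encode("utf-8")
    if len(raw) <= max_bytes:
        return [message]
    # Leave headroom for base64 inflation (4/3) and per-chunk metadata.
    per_chunk = (max_bytes - 1024) * 3 // 4
    n_chunks = math.ceil(len(raw) / per_chunk)
    log.warning("Input data is larger than 1 MB (%d bytes); "
                "it will be chunked into %d messages and sent separately.",
                len(raw), n_chunks)
    return [{"chunk_index": i,
             "total_chunks": n_chunks,
             "chunk_data": base64.b64encode(raw[i * per_chunk:(i + 1) * per_chunk]).decode("ascii")}
            for i in range(n_chunks)]

def reassemble(chunks: list) -> dict:
    """Inverse of chunk_payload, run on the receiving side."""
    ordered = sorted(chunks, key=lambda c: c["chunk_index"])
    raw = b"".join(base64.b64decode(c["chunk_data"]) for c in ordered)
    return json.loads(raw)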
