
[QUESTION] Need some more context on cached_response_duration_seconds #443

shivagupta-byte opened this issue Jun 21, 2024 · 7 comments


shivagupta-byte commented Jun 21, 2024

Which feature does your question relate to?
Can I get some more context on what the cached_response_duration_seconds metric is calculating?
From what I understand, it measures the time from when the request was received to when the response was sent, in case of a cache hit.
If that is correct, the cached_response_duration_seconds I'm getting is way higher than I would expect.
[screenshot: cached_response_duration_seconds graph]

Average time is sometimes going over 3s.

@mga-chka (Collaborator)

cached_response_duration_seconds is only the time it took to fetch the query response from the cache.
To get the end-to-end processing time, you'd need to add up the time it took:

  • to start processing the query
  • to fetch the response from the cache (or from ClickHouse, then put it in the cache if it wasn't already there)
  • to send the answer back to the client

You should assess the performance of your Redis to see if it's the bottleneck.
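
For intuition, here's a minimal Go sketch of a metric scoped this way (illustrative only, not chproxy's actual code; fetchFromCache and getFromCache are made-up names). Note that only the cache read sits inside the timed section:

```go
package cache

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// cachedResponseDuration times only the cache fetch, not the full
// request/response round trip.
var cachedResponseDuration = prometheus.NewSummary(prometheus.SummaryOpts{
	Name: "cached_response_duration_seconds",
	Help: "Time spent fetching a response from the cache.",
})

// fetchFromCache wraps a cache lookup with the timer; getFromCache is a
// hypothetical stand-in for the real lookup.
func fetchFromCache(key string) ([]byte, error) {
	start := time.Now()
	data, err := getFromCache(key)
	// Request parsing and writing the response back to the client are
	// not included in this observation.
	cachedResponseDuration.Observe(time.Since(start).Seconds())
	return data, err
}

func getFromCache(key string) ([]byte, error) { return nil, nil } // stub
```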

@shivagupta-byte (Author)

[screenshot: Redis latency metrics]
All the calls in Redis are taking less than 50 µs. Redis does not seem to be the bottleneck.

@mga-chka (Collaborator)

Are you storing large results in Redis (more than 1 MB)?
In some edge cases, large results are first written to chproxy's disk before being sent to the client, which might explain why the Redis calls are fast but cached_response_duration_seconds is big. In these edge cases, the measured time covers fetching the result from Redis in many small parts and writing them to disk.
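
A rough sketch of that edge case, assuming go-redis and a made-up spoolToDisk helper (this is not chproxy's actual code):

```go
package cache

import (
	"context"
	"os"

	"github.com/redis/go-redis/v9"
)

const chunkSize = 2 << 20 // 2 MB, the hardcoded bulk size discussed here

// spoolToDisk pulls a large cached result from Redis in fixed-size
// ranges and writes it to a local file before it is served.
func spoolToDisk(ctx context.Context, rdb *redis.Client, key, path string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	for offset := int64(0); ; offset += chunkSize {
		// Each GETRANGE is cheap on its own, but a 100 MB result still
		// needs ~50 round trips plus 100 MB of disk writes, all of which
		// count toward cached_response_duration_seconds.
		chunk, err := rdb.GetRange(ctx, key, offset, offset+chunkSize-1).Bytes()
		if err != nil {
			return err
		}
		if len(chunk) == 0 {
			break
		}
		if _, err := f.Write(chunk); err != nil {
			return err
		}
	}
	return nil
}
```

This would be consistent with what you're seeing: each individual Redis call stays under 50 µs, while the total of many round trips plus the disk writes shows up in the metric.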

@shivagupta-byte (Author)

Yes, some results are big, more than 1 MB.
I checked the code and saw that we fetch in chunks of 2 MB, and put the data in chunks of 2 MB as well. Any particular reason for this design choice?
Can we increase it to, let's say, 10 MB? It might solve the issue.

@mga-chka (Collaborator)

Here is the rationale for this choice:

If the remaining TTL in Redis is too small, we decided to first put the data into a file before streaming it to the client. This prevents the case where the client is too slow (because it's doing some business logic while consuming the stream, or because its network connection is bad) and the object in Redis expires before the end of the stream, which would lead to an error from chproxy. With this design, the client is guaranteed to get an answer from chproxy: if the transfer into the file fails, the retry mechanism is triggered and the SQL query is run live against ClickHouse, then put into the cache and returned to the client.
To transfer the data from Redis to the file (or from Redis/the file to the client), we decided to do it in a streaming way (in chunks of 2 MB max) to avoid out-of-memory errors when the cached results are too big to fit in the Go runtime's memory. The bigger the chunk size, the bigger the risk of OOM.
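
Sketched out, the decision looks roughly like this (names are illustrative and reuse the hypothetical spoolToDisk from the earlier sketch; this is not chproxy's actual API):

```go
package cache

import (
	"context"
	"time"

	"github.com/redis/go-redis/v9"
)

// serveCached: if the key could expire before a slow client finishes
// reading, copy it to a local file first; otherwise stream from Redis.
func serveCached(ctx context.Context, rdb *redis.Client, key string, minTTL time.Duration) error {
	ttl, err := rdb.TTL(ctx, key).Result()
	if err != nil {
		return err
	}
	if ttl >= 0 && ttl < minTTL {
		// The object is close to expiry: spool it to disk now, while it
		// still exists, then serve the file at the client's pace.
		return spoolToDisk(ctx, rdb, key, "/tmp/chproxy-spool")
	}
	// Plenty of TTL left: stream straight from Redis in bounded chunks,
	// so memory stays constant regardless of the result size.
	return streamToClient(ctx, rdb, key)
}

func streamToClient(ctx context.Context, rdb *redis.Client, key string) error { return nil } // stub
```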

If you want, you can open a PR that makes the hardcoded 2 MB chunk size configurable via a configuration variable (keeping 2 MB as the default).
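
A hedged sketch of what that could look like, with made-up field and key names (the real PR should follow chproxy's existing config conventions):

```go
package config

import "gopkg.in/yaml.v2"

// CacheConfig is a hypothetical sketch of the new knob.
type CacheConfig struct {
	// RedisBulkSize would replace the hardcoded 2 MB chunk size used
	// when transferring cached results.
	RedisBulkSize int64 `yaml:"redis_bulk_size,omitempty"`
}

// ParseCacheConfig applies the 2 MB default, then lets YAML override it.
func ParseCacheConfig(data []byte) (*CacheConfig, error) {
	c := &CacheConfig{RedisBulkSize: 2 << 20} // default: 2 MB
	if err := yaml.Unmarshal(data, c); err != nil {
		return nil, err
	}
	return c, nil
}
```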

@shivagupta-byte (Author)

Sure, let me raise that. It might fix the slowness we are facing.

@mga-chka (Collaborator)

OK. If it helps, here is an example of a previous PR that aimed to add a new configuration variable.
