Return correct dataschema for empty results #13831

vvivekiyer · 2024-08-17T01:02:21Z

When all segments are pruned on the server, we construct an empty response block. Because this response block is constructed without access to segment data, we default to STRING as the data type for each column (here is the code link). This problem does not exist for pure aggregation queries.

This PR fixes that by just processing one segment.
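As a rough illustration of the idea (not the actual diff), the selection step could be sketched as below. `SegmentSelection` and `segmentsToProcess` are hypothetical names, and segments are represented as plain strings for simplicity. Because a pruned segment cannot match the filter, processing it should still yield zero rows, but with column types read from real data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Pinot.
final class SegmentSelection {
  private SegmentSelection() {}

  // Returns the segments to execute. If pruning removed every segment, keep one
  // so the executor can read real column types instead of defaulting to STRING.
  // Aggregation-only queries are skipped because they do not have this problem.
  static List<String> segmentsToProcess(List<String> allSegments, List<String> afterPruning,
      boolean aggregationOnly) {
    if (!afterPruning.isEmpty() || aggregationOnly || allSegments.isEmpty()) {
      return new ArrayList<>(afterPruning);
    }
    return new ArrayList<>(allSegments.subList(0, 1));
  }
}
```

Note that this cannot help when the server has no segments at all, since there is nothing left to read types from.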

Added tests to verify the code changes.

Jackie-Jiang · 2024-08-17T06:56:08Z

#13057 (all segments pruned on broker) is similar to the problem solved by this PR. We should be able to construct the empty response based on the schema. Do you want to also help fix that?

vvivekiyer · 2024-08-17T19:23:04Z

@Jackie-Jiang Sure, I can address it.

We have two options to solve it:

1. If all segments are pruned on the broker, send the query to one server with one segment (with an optimization to skip this for aggregation-only queries). The code change already made in this PR will then return the correct schema. This automatically handles deriving schema types for group-by, select, and transform result types. It adds some overhead to these empty queries, but that should be negligible.
2. Construct an empty result table and derive the data types for the query from the schema. For transform types, we would need to look into how to get the type.

I'm thinking of going with option (1). Thoughts?
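For comparison, a minimal sketch of what option (2) might look like for plain column selections. Everything here is an assumption for illustration: `ColumnType`, `EmptyResultTable`, and `EmptyResultBuilder` are hypothetical stand-ins rather than Pinot classes, and the table schema is modeled as a simple map from column name to type.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration; these are NOT Pinot's actual classes.
enum ColumnType { INT, LONG, DOUBLE, STRING, UNKNOWN }

final class EmptyResultTable {
  final List<String> columnNames;
  final List<ColumnType> columnTypes;
  final List<Object[]> rows = Collections.emptyList();

  EmptyResultTable(List<String> columnNames, List<ColumnType> columnTypes) {
    this.columnNames = columnNames;
    this.columnTypes = columnTypes;
  }
}

final class EmptyResultBuilder {
  private EmptyResultBuilder() {}

  // Builds a zero-row result using only the table schema, without touching any segment.
  static EmptyResultTable fromSchema(Map<String, ColumnType> tableSchema, List<String> selectedColumns) {
    List<ColumnType> types = new ArrayList<>();
    for (String column : selectedColumns) {
      // Anything that is not a plain schema column (e.g. a transform) gets a placeholder type.
      types.add(tableSchema.getOrDefault(column, ColumnType.UNKNOWN));
    }
    return new EmptyResultTable(new ArrayList<>(selectedColumns), types);
  }
}
```

The open question in this sketch is exactly the one raised above: transform expressions fall through to the placeholder because the schema alone does not say what type they produce.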

Additionally, a follow-up could be to add a broker-side LIMIT_0_PRUNER that short-circuits these queries faster instead of routing them to all servers.

Jackie-Jiang · 2024-08-18T19:35:37Z

1 is not always possible in the following cases:

- There is no segment at all
- There are only empty segments

I prefer 2 because, to be SQL compatible, we should be able to derive the data type of a transform from its input types. If we cannot get that right now, we can leave a TODO and use a placeholder type.
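A minimal sketch of that kind of derivation, under loud assumptions: `DataType` and `TransformTypeResolver` are hypothetical names, the function names are examples only, and the widening rule (INT < LONG < DOUBLE) is a generic SQL-style promotion rather than a statement of Pinot's actual transform result types. Unknown functions fall back to a placeholder, matching the TODO approach.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical type enum; ordering INT < LONG < DOUBLE is used for numeric widening.
enum DataType { INT, LONG, DOUBLE, STRING, UNKNOWN }

// Hypothetical resolver for illustration; not Pinot's API.
final class TransformTypeResolver {
  private TransformTypeResolver() {}

  static DataType resolve(String functionName, List<DataType> argumentTypes) {
    switch (functionName.toLowerCase(Locale.ROOT)) {
      case "add":
      case "sub":
      case "mult":
        // Assumed rule: arithmetic returns the widest numeric input type.
        return widestNumeric(argumentTypes);
      case "upper":
      case "lower":
        return DataType.STRING;
      default:
        // TODO: derive the result type for this function; UNKNOWN is a placeholder.
        return DataType.UNKNOWN;
    }
  }

  private static DataType widestNumeric(List<DataType> types) {
    DataType widest = DataType.INT;
    for (DataType type : types) {
      if (type == DataType.STRING || type == DataType.UNKNOWN) {
        return DataType.UNKNOWN; // non-numeric input: no numeric result type can be derived
      }
      if (type.ordinal() > widest.ordinal()) {
        widest = type;
      }
    }
    return widest;
  }
}
```

With these assumed rules, `resolve("add", [INT, LONG])` yields LONG, and a function the resolver does not know yields the placeholder.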

jadami10 · 2024-08-18T20:39:20Z

+1 to Jackie's idea of leaving the column types as TODOs.

We've noticed internally that the bigger issue is that the result table is missing entirely. We don't even use the column types from Pinot, since our code usually either (1) assumes the types ahead of time or (2) infers them from the JSON result. But the column names and rows are typically assumed to be there.

vvivekiyer · 2024-09-06T23:16:56Z

I'm working on the broker-side changes as well. I'll create a separate PR for them.

Return correct dataschema for empty results

514c218

vvivekiyer force-pushed the fix_empty_results_schema branch from 377ea73 to 514c218 Compare August 17, 2024 01:05

Jackie-Jiang added the bugfix label Aug 17, 2024

Udit107710 mentioned this pull request Aug 18, 2024

Queries completely pruned by the broker should still return a result table #13057

Open