Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return correct dataschema for empty results #13831

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

vvivekiyer
Copy link
Contributor

When all segments are pruned on the server, we construct a empty responseBlock . As this response block is constructed without access to segment data, we default to STRING as the datatype for the column - here is the code link . This problem does not exist for pure Aggregation queries.

This PR fixes that by just processing one segment.

Added tests to verify the code changes.

@Jackie-Jiang
Copy link
Contributor

#13057 (all segments pruned on broker) is similar to the problem solved by this PR. We should be able to construct the empty response based on the schema. Do you want to also help fix that?

@vvivekiyer
Copy link
Contributor Author

@Jackie-Jiang Sure, I can address it.

We have two options to solve it:

  1. If all segments are pruned on broker, send the query to 1 server with 1 segment ( with optimization to not send for aggregation only queries) . The code change already made in this PR will help return the correct schema. This will automatically handle deriving schema types for group-by, select, and transform result types. This will add some overhead to these empty queries but should be negligle.

  2. Construct an empty result-table and derive the data-types for the query using schema. For transform types, would need to look into how to get the type.

I'm thinking of going with option (1). Thoughts?

Additionally a followup could be to add a broker-side LIMIT_0_PRUNER that will help short-circuit faster instead of routing queries to all servers.

@Jackie-Jiang
Copy link
Contributor

1 is not always possible in the following cases:

  • There is no segment at all
  • There are only empty segments

I prefer 2 because we should be able to derive the data type for transform based on input types in order to be SQL compatible. If we cannot get that right now, we can leave a TODO and put a type as placeholder.

@jadami10
Copy link
Contributor

+1 to jackie's idea of leaving the column types as TODOs.

We've noticed internally that the bigger issue is we're strictly missing the result table. We don't even use the column types from Pinot since the code usually 1) assumes the types ahead of time or 2) infers it from the json result. But the column names and rows are typically assumed to be there.

@vvivekiyer
Copy link
Contributor Author

I'm working on the broker side changes as well. I'll create that as a separate PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants