[Multi-stage] Support lookup join #13966

Jackie-Jiang · 2024-09-09T22:24:03Z

Add lookup join strategy as a hint (e.g. /*+ joinOptions(join_strategy='lookup') */)
Right table should be leaf stage projection only
TODO: Improve planner rules to not push down expression/filter into the dimension table side

codecov-commenter · 2024-09-09T23:37:09Z

Codecov Report

Attention: Patch coverage is 39.85507% with 166 lines in your changes missing coverage. Please review.

Project coverage is 57.84%. Comparing base (59551e4) to head (b9b7561).
Report is 1029 commits behind head on master.

Files with missing lines	Patch %	Lines
...not/query/runtime/operator/LookupJoinOperator.java	0.00%	102 Missing ⚠️
.../query/planner/logical/RelToPlanNodeConverter.java	52.63%	14 Missing and 4 partials ⚠️
.../org/apache/pinot/query/routing/WorkerManager.java	57.14%	12 Missing and 3 partials ⚠️
.../runtime/plan/server/ServerPlanRequestVisitor.java	65.00%	3 Missing and 4 partials ⚠️
...ite/rel/rules/PinotJoinExchangeNodeInsertRule.java	80.00%	2 Missing and 1 partial ⚠️
...inot/query/planner/serde/PlanNodeDeserializer.java	50.00%	2 Missing and 1 partial ⚠️
.../pinot/query/planner/serde/PlanNodeSerializer.java	50.00%	2 Missing and 1 partial ⚠️
...e/pinot/query/runtime/InStageStatsTreeBuilder.java	25.00%	2 Missing and 1 partial ⚠️
...e/operator/LeafStageTransferableBlockOperator.java	0.00%	3 Missing ⚠️
...not/query/runtime/operator/MultiStageOperator.java	25.00%	3 Missing ⚠️
... and 3 more

Additional details and impacted files

@@             Coverage Diff              @@
##             master   #13966      +/-   ##
============================================
- Coverage     61.75%   57.84%   -3.91%     
- Complexity      207      219      +12     
============================================
  Files          2436     2613     +177     
  Lines        133233   143366   +10133     
  Branches      20636    22010    +1374     
============================================
+ Hits          82274    82930     +656     
- Misses        44911    53943    +9032     
- Partials       6048     6493     +445

Flag	Coverage Δ
custom-integration1	`<0.01% <0.00%> (-0.01%)`	⬇️
integration	`<0.01% <0.00%> (-0.01%)`	⬇️
integration1	`<0.01% <0.00%> (-0.01%)`	⬇️
integration2	`0.00% <0.00%> (ø)`
java-11	`57.82% <39.85%> (-3.89%)`	⬇️
java-21	`57.73% <39.85%> (-3.90%)`	⬇️
skip-bytebuffers-false	`57.84% <39.85%> (-3.91%)`	⬇️
skip-bytebuffers-true	`57.70% <39.85%> (+29.98%)`	⬆️
temurin	`57.84% <39.85%> (-3.91%)`	⬇️
unittests	`57.84% <39.85%> (-3.91%)`	⬇️
unittests1	`40.69% <39.85%> (-6.20%)`	⬇️
unittests2	`27.92% <0.00%> (+0.18%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

gortiz · 2024-09-10T09:31:53Z

...lanner/src/main/java/org/apache/pinot/calcite/rel/rules/PinotJoinExchangeNodeInsertRule.java

+    if (PinotHintOptions.JoinHintOptions.LOOKUP_JOIN_STRATEGY.equals(joinStrategy)) {
+      // Lookup join
+      Preconditions.checkArgument(!joinInfo.leftKeys.isEmpty(), "Lookup join requires join keys");
+      newLeftInput = PinotLogicalExchange.create(leftInput, RelDistributions.hash(joinInfo.leftKeys));
+      // Right table should be a dimension table, and the right input should be an identifier only ProjectNode over
+      // TableScanNode.
+      Preconditions.checkState(rightInput instanceof Project, "Right input for lookup join must be a Project, got: %s",
+          rightInput.getClass().getSimpleName());
+      Project project = (Project) rightInput;
+      for (RexNode node : project.getProjects()) {
+        Preconditions.checkState(node instanceof RexInputRef,
+            "Right input for lookup join must be an identifier (RexInputRef) only Project, got: %s in project",
+            node.getClass().getSimpleName());
+      }
+      RelNode projectInput = PinotRuleUtils.unboxRel(project.getInput());
+      Preconditions.checkState(projectInput instanceof TableScan,
+          "Right input for lookup join must be a Project over TableScan, got Project over: %s",
+          projectInput.getClass().getSimpleName());
+      newRightInput = rightInput;


Maybe not needed for this PR, but in the future I think we should need to start moving logic to new rules. For example, we can have a rule that only applies when the hint is enabled and the right hand side is a project and (...all conditions). If the rule applies, we change the node to a DimJoin + exchanges.

gortiz · 2024-09-10T09:34:53Z

...lanner/src/main/java/org/apache/pinot/calcite/rel/rules/PinotJoinToDynamicBroadcastRule.java

-        ? ((HepRelVertex) join.getLeft()).getCurrentRel() : join.getLeft());
-    PinotLogicalExchange right = (PinotLogicalExchange) (join.getRight() instanceof HepRelVertex
-        ? ((HepRelVertex) join.getRight()).getCurrentRel() : join.getRight());
+    PinotLogicalExchange left = (PinotLogicalExchange) PinotRuleUtils.unboxRel(join.getLeft());


Note for future PRs: I think we can change PinotRuleUtils.unboxRel to return T extends PinotLogicalExcange. So we don't need to add the cast every time

gortiz · 2024-09-10T10:12:03Z

pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/plan/PhysicalPlanVisitor.java

+    // Validation
+    JoinRelType joinType = node.getJoinType();
+    int numLeftColumns = leftSchema.size();
+    int numResultColumns = node.getDataSchema().size();
+    if (joinType.projectsRight()) {
+      int numRightColumns = right.getDataSchema().size();
+      Preconditions.checkState(numLeftColumns + numRightColumns == numResultColumns,
+          "Invalid number of columns for join type: %s, left: %s, right: %s, result: %s", joinType, numLeftColumns,
+          numRightColumns, numResultColumns);
+    } else {
+      Preconditions.checkState(numLeftColumns == numResultColumns,
+          "Invalid number of columns for join type: %s, left: %s, result: %s", joinType, numLeftColumns,
+          numResultColumns);
+    }
+
+    PlanNode.NodeHint nodeHint = node.getNodeHint();
+    String joinStrategy = null;
+    Map<String, String> joinHints = nodeHint.getHintOptions().get(PinotHintOptions.JOIN_HINT_OPTIONS);
+    if (joinHints != null) {
+      joinStrategy = joinHints.get(PinotHintOptions.JoinHintOptions.JOIN_STRATEGY);
+    }
+    if (PinotHintOptions.JoinHintOptions.LOOKUP_JOIN_STRATEGY.equals(joinStrategy)) {
+      return new LookupJoinOperator(context, leftOperator, rightOperator, node);
+    } else {
+      return new HashJoinOperator(context, leftOperator, leftSchema, rightOperator, node);
+    }


Not blocker, but I would prefer to have another Calcite operator for lookup joins

gortiz · 2024-09-10T10:13:53Z

...ntime/src/main/java/org/apache/pinot/query/runtime/plan/server/ServerPlanRequestVisitor.java

+    if (right instanceof MailboxReceiveNode
+        && ((MailboxReceiveNode) right).getExchangeType() == PinotRelExchangeType.PIPELINE_BREAKER) {


Same here. Don't you think we are repeating checks all the way down to the physical plan? Instead we should decide the type of join we will use at broker level and just blindly transform the calcite PinotPipelineBreakerJoin, PinotHashJoin or PinotLookupJoin into the physical operator.

Again, not blocker

Are you suggesting using our customized Rel over the calcite LogicalJoin so that we can differentiate different join types? Then we can add different JoinNode accordingly for the ser/de?

Short answer: Yes.

Long answer:

Calcite itself expects that. The root idea in Calcite is that rules should optimize logical rules (ie pushing filters into joins), apply distribution, etc and then some final rules will transform logical rules (ie LogicalJoin) into specific joins. Calcite for example includes Enumerable operators that implement most logical operators. For example, EnumerableMergeJoin implements a nested loop join while EnumerableHashJoin implements a hash join. EnumerableJoinRule can be used to decide which one should be used. We wouldn't use EnumerableJoinRule, instead we should have our own rule that decides whether to use hash join, semi join, lookup join... but even more advanced joins like ones that merge a join+limit or join+aggregate.

For example imagine a tree like:

Aggregate (count by A.col1) Join select A select B

Right now Join emits (and allocate) a lot of rows just to be aggregated by its parent. It would be more efficient to count at the same time we build the blocks. Obviously we are not going to apply this kind of optimizations in the short term, but in the medium/large will be very effective and at the same time would be very error prone to repeat the logic that creates these optimizations in both Calcite (to prioritize the plans that can be optimized) and then in ServerPlanRequestVisitor.

Instead the Calcite's way should be to generate the AggregateJoinRel node and then we should be able to blindly generate the executable Pinot Operator whenever a AggregateJoinRel is received, without having to check conditions again (because we assume a friendly Broker that doesn't generate incorrect plans).

About serialization:

Due to our own decisions we decided to add an extra layer of JoinNodes (which I don't think they are necessary) and a layer of GRPC (which makes more sense, but we could just remove the JoinNode layer and transform Calcite operators directly into GRPC). We could also use the JSON representation of Calcite, but AFAIR we decided to use GRPC to do not depend on Calcite breaking backward compatibility.

Added JoinStrategy enum into the JoinNode. It is quite hard to change the Calcite LogicalJoin directly without breaking backward compatibility, so left that as a follow up

gortiz · 2024-09-10T10:15:46Z

.../src/main/resources/examples/batch/colocated/userGroups/userGroups_offline_table_config.json

@@ -1,6 +1,7 @@
 {
  "tableName": "userGroups",
  "tableType": "OFFLINE",
+  "isDimTable": true,


Given we also use colocated quickstart to test colocated joins, I think we should have another table like userGroups that is not a dim table.

This is not an actual problem right now that both colocated and dim joins have to be enabled explicitly with hints, but will be an issue in the future when both are going to be applied by default.

* A problem in the sense that we will need to specify hints to use one or the other mode

gortiz

As said a couple of times during the review, I would prefer to move the fact that we have a LookupJoin into Calcite planning phase instead of having to decide whether we use one or the other at both calcite level (generating different exchanges) and then in the server (when the physical plan is generated).

Anyway that is something we can discuss about in the future. For now the current solution is good enough for me.

I think it would be also cool to have a pinot property and a query option we can set to enable this feature by default. I'm already doing that in #13943

Jackie-Jiang added feature documentation multi-stage Related to the multi-stage query engine labels Sep 9, 2024

Jackie-Jiang requested review from xiangfu0 and gortiz September 9, 2024 22:24

Jackie-Jiang force-pushed the v2_lookup_join branch 3 times, most recently from 3d1beea to 31df207 Compare September 9, 2024 23:00

Jackie-Jiang mentioned this pull request Sep 9, 2024

Improve dimension table handing #13967

Merged

Jackie-Jiang force-pushed the v2_lookup_join branch 2 times, most recently from e272c60 to 04493c3 Compare September 10, 2024 03:48

gortiz reviewed Sep 10, 2024

View reviewed changes

gortiz approved these changes Sep 10, 2024

View reviewed changes

Jackie-Jiang force-pushed the v2_lookup_join branch from 04493c3 to dfc80c2 Compare September 12, 2024 06:00

[Multi-stage] Support lookup join

b9b7561

Jackie-Jiang force-pushed the v2_lookup_join branch from dfc80c2 to b9b7561 Compare September 13, 2024 00:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Multi-stage] Support lookup join #13966

[Multi-stage] Support lookup join #13966

Jackie-Jiang commented Sep 9, 2024 •

edited

Loading

codecov-commenter commented Sep 9, 2024 •

edited

Loading

gortiz Sep 10, 2024

gortiz Sep 10, 2024 •

edited

Loading

gortiz Sep 10, 2024

gortiz Sep 10, 2024

gortiz Sep 10, 2024

Jackie-Jiang Sep 10, 2024

gortiz Sep 11, 2024

Jackie-Jiang Sep 12, 2024

gortiz Sep 10, 2024 •

edited

Loading

gortiz left a comment

		if (right instanceof MailboxReceiveNode
		&& ((MailboxReceiveNode) right).getExchangeType() == PinotRelExchangeType.PIPELINE_BREAKER) {

[Multi-stage] Support lookup join #13966

Are you sure you want to change the base?

[Multi-stage] Support lookup join #13966

Conversation

Jackie-Jiang commented Sep 9, 2024 • edited Loading

codecov-commenter commented Sep 9, 2024 • edited Loading

Codecov Report

gortiz Sep 10, 2024

Choose a reason for hiding this comment

gortiz Sep 10, 2024 • edited Loading

Choose a reason for hiding this comment

gortiz Sep 10, 2024

Choose a reason for hiding this comment

gortiz Sep 10, 2024

Choose a reason for hiding this comment

gortiz Sep 10, 2024

Choose a reason for hiding this comment

Jackie-Jiang Sep 10, 2024

Choose a reason for hiding this comment

gortiz Sep 11, 2024

Choose a reason for hiding this comment

Jackie-Jiang Sep 12, 2024

Choose a reason for hiding this comment

gortiz Sep 10, 2024 • edited Loading

Choose a reason for hiding this comment

gortiz left a comment

Choose a reason for hiding this comment

Jackie-Jiang commented Sep 9, 2024 •

edited

Loading

codecov-commenter commented Sep 9, 2024 •

edited

Loading

gortiz Sep 10, 2024 •

edited

Loading

gortiz Sep 10, 2024 •

edited

Loading