feat: Implement single inequality joins for join_where #18727

Merged: 4 commits, Sep 13, 2024

adamreeve
Contributor

This implements the "piecewise merge join" algorithm (described in this DuckDB article) to handle join_where with a single inequality, without using a cross join and filter.

I've reused the IEJoin join type internally because the two join types are handled in much the same way. Strictly speaking this doesn't use the IEJoin algorithm, but it is another kind of inequality join, so the reuse seems reasonable. It could be pulled out into its own join type if needed.
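
For readers unfamiliar with the idea, here is a minimal pure-Python sketch of a merge-style join on a single `<` predicate. It is only an illustration of the general technique, not the code in this PR: the function names `merge_join_lt` and `cross_join_filter_lt` are hypothetical, it works on plain lists rather than Polars columns, and it ignores nulls, NaNs, chunking and parallelism.

```python
from typing import List, Sequence, Tuple


def merge_join_lt(
    left: Sequence[float], right: Sequence[float]
) -> List[Tuple[int, int]]:
    """Return (left_idx, right_idx) pairs where left[i] < right[j].

    Illustrative sketch only: both sides are sorted by key, then a single
    forward-moving cursor over the right side finds, for each left row,
    the first right row that satisfies the inequality.
    """
    # Row indices of each side, ordered by ascending key value.
    l_order = sorted(range(len(left)), key=lambda i: left[i])
    r_order = sorted(range(len(right)), key=lambda j: right[j])

    pairs: List[Tuple[int, int]] = []
    r_pos = 0  # first position in r_order whose key is > the current left key
    for li in l_order:
        # Left keys are visited in ascending order, so r_pos never moves back.
        while r_pos < len(r_order) and right[r_order[r_pos]] <= left[li]:
            r_pos += 1
        # Every right row from r_pos onward satisfies left[li] < right[rj].
        pairs.extend((li, rj) for rj in r_order[r_pos:])
    return pairs


def cross_join_filter_lt(
    left: Sequence[float], right: Sequence[float]
) -> List[Tuple[int, int]]:
    """Baseline: compare every left row with every right row."""
    return [
        (i, j)
        for i, x in enumerate(left)
        for j, y in enumerate(right)
        if x < y
    ]


if __name__ == "__main__":
    left_keys = [5, 1, 3]
    right_keys = [2, 4, 4, 0]
    # Both approaches should produce the same set of matching row pairs.
    assert sorted(merge_join_lt(left_keys, right_keys)) == sorted(
        cross_join_filter_lt(left_keys, right_keys)
    )
```

The baseline always performs one comparison per pair of rows, whereas the sorted version only does work proportional to the sort plus the number of matching pairs, which is the motivation for avoiding the cross join and filter.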

github-actions bot added the enhancement, python, and rust labels on Sep 13, 2024

codspeed-hq bot commented Sep 13, 2024

CodSpeed Performance Report

Merging #18727 will degrade performance by 25.3%

Comparing adamreeve:iejoin_single (f45d106) with main (54218e7)

Summary

❌ 1 regressions
✅ 38 untouched benchmarks

🆕 1 new benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Benchmark | main | adamreeve:iejoin_single | Change |
|---|---|---|---|
| test_groupby_h2oai_q3 | 2.3 ms | 3.1 ms | -25.3% |
| 🆕 test_single_inequality | N/A | 80.8 ms | N/A |


codecov bot commented Sep 13, 2024

Codecov Report

Attention: Patch coverage is 86.95652% with 24 lines in your changes missing coverage. Please review.

Project coverage is 79.86%. Comparing base (dddf0b7) to head (f45d106).
Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/polars-ops/src/frame/join/iejoin/mod.rs | 85.36% | 24 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #18727      +/-   ##
==========================================
- Coverage   79.88%   79.86%   -0.02%     
==========================================
  Files        1513     1513              
  Lines      203466   203631     +165     
  Branches     2892     2892              
==========================================
+ Hits       162546   162640      +94     
- Misses      40372    40443      +71     
  Partials      548      548              


Member

@ritchie46 ritchie46 left a comment

Nice one @adamreeve. Great that we can reuse the parallelism logic. Makes a lot of sense.

All of this work can also largely be reused in the streaming engine.

@ritchie46 ritchie46 merged commit 759dd3b into pola-rs:main Sep 13, 2024
27 checks passed
@adamreeve adamreeve deleted the iejoin_single branch September 13, 2024 07:04
nameexhaustion pushed a commit to nameexhaustion/polars that referenced this pull request Sep 16, 2024