Add how="leftanti" support for cudf-backed merge #1073

Open
wants to merge 4 commits into base: main


charlesbluca
Member

Looks like we should be unblocked to support left anti joins when dataframe.backend="cudf", similar to the case in legacy Dask dataframe:

https://github.com/dask/dask/blob/df4de6ea53054790b09006c8ea68ef8725d39025/dask/dataframe/multi.py#L565

Note that, like the legacy code, we'll fail somewhere down in the computation stack if we try this on CPU. I'm not sure if it makes sense to check the backend when how="leftanti" and eagerly raise a NotImplementedError if dataframe.backend != "cudf".
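For illustration, a minimal sketch of what such an eager check could look like. The helper name `check_leftanti_backend` is hypothetical and not part of dask-expr; it takes the backend as a plain argument, on the assumption that the caller would read it from something like `dask.config.get("dataframe.backend")`.

```python
def check_leftanti_backend(how: str, backend: str) -> None:
    """Fail early when a left anti join is requested on a non-cudf backend.

    Hypothetical helper: `backend` is assumed to be the value of the
    "dataframe.backend" config option, passed in by the caller.
    """
    if how == "leftanti" and backend != "cudf":
        raise NotImplementedError(
            f"how='leftanti' is only supported when dataframe.backend='cudf' "
            f"(got dataframe.backend={backend!r})"
        )


# Supported combinations pass silently.
check_leftanti_backend("leftanti", "cudf")
check_leftanti_backend("inner", "pandas")

# The unsupported combination raises up front instead of failing
# somewhere deeper in the computation.
try:
    check_leftanti_backend("leftanti", "pandas")
except NotImplementedError as err:
    message = str(err)
```

The trade-off is that a check like this would need to be removed or relaxed once CPU support lands.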

cc @rjzamora

Member

rjzamora left a comment

Thanks @charlesbluca - Seems reasonable to add "leftanti" support given that the necessary logic is pretty simple, and the legacy dask.dataframe API supports it.

dask_expr/_collection.py (outdated, resolved)
df2 = df2.rename(columns={"aa": "dd"})
assert_eq(
df1.merge(df2, how="leftanti", left_on="aa", right_on="dd"),
pdf1[~pdf1.aa.isin(pdf2.aa)],
Member

Seems like we could just do this isin-based filtering in merge_chunk for pandas data to support how="leftanti" on CPU as well.
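As a rough sketch of that idea for pandas data, assuming a single-column join key: the function names `leftanti_chunk` and `leftsemi_chunk` are hypothetical, and this is not the actual `merge_chunk` implementation, only the `isin` filter from the test written as standalone helpers.

```python
import pandas as pd


def leftanti_chunk(left: pd.DataFrame, right: pd.DataFrame,
                   left_on: str, right_on: str) -> pd.DataFrame:
    """Keep rows of `left` whose key has no match in `right` (single key column)."""
    return left[~left[left_on].isin(right[right_on])]


def leftsemi_chunk(left: pd.DataFrame, right: pd.DataFrame,
                   left_on: str, right_on: str) -> pd.DataFrame:
    """Keep rows of `left` whose key has at least one match in `right`."""
    return left[left[left_on].isin(right[right_on])]


# Small frames mirroring the column names used in the test above.
pdf1 = pd.DataFrame({"aa": [1, 2, 3, 4], "bb": [10, 20, 30, 40]})
pdf2 = pd.DataFrame({"dd": [2, 4, 6], "cc": [1, 2, 3]})

anti = leftanti_chunk(pdf1, pdf2, left_on="aa", right_on="dd")
semi = leftsemi_chunk(pdf1, pdf2, left_on="aa", right_on="dd")

print(anti["aa"].tolist())  # [1, 3]
print(semi["aa"].tolist())  # [2, 4]
```

Both helpers return only columns from the left frame, which is what distinguishes anti/semi joins from a regular left join. Multi-column keys would need a different approach than a single `isin` call.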

Member Author

Yeah good point, can look into this a bit more

Member Author

Pushed some commits to dask/dask#11150 that, in conjunction with this PR, should unblock left anti/semi joins on CPU
