Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SEDONA-655] DBSCAN #1589

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open

[SEDONA-655] DBSCAN #1589

wants to merge 5 commits into from

Conversation

james-willis
Copy link
Contributor

Did you read the Contributor Guide?

Is this PR related to a JIRA ticket?

What changes were proposed in this PR?

This PR add a DBSCAN function in the scala and python APIs of the spark implementation of sedona.

How was this patch tested?

unit tests

Did this PR include necessary documentation updates?

TODO

- Yes, I am adding a new API. I am using the current SNAPSHOT version number in vX.Y.Z format.

@jiayuasu
Copy link
Member

@james-willis The memory consumption of new tests seem to be very high?

@james-willis
Copy link
Contributor Author

@james-willis The memory consumption of new tests seem to be very high?

This is something I had worked with Kristin on in the past. We had thought it was related to cached dataframes holding references to broadcast relations. But this committed version of the code does not contain any caching, just checkpoints.

There is some persisting inside of the connected component implementation, but those should be getting cleaned up. I'll check if something here is leaking persisted dataframes

@james-willis
Copy link
Contributor Author

Theres some dataframes inside of the connected components algo that dont get unpersisted. I will look into raising a PR in that repo tomorrow to fix that.

@jiayuasu jiayuasu linked an issue Sep 20, 2024 that may be closed by this pull request
@james-willis
Copy link
Contributor Author

I submitted a fix to graphframes for this. I can clear the spark catalog if the disable broadcast fix doesnt resolve

graphframes/graphframes#459

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ST_ClusterDBSCAN Feature Request
2 participants