-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: funnel actors queries on udf #25013
base: master
Are you sure you want to change the base?
Conversation
2fce29a
to
13f8a8a
Compare
…to aspicer/funnel_actors
…to aspicer/funnel_actors
…to aspicer/funnel_actors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything is perfect! I didn't manage to test associated session recordings locally. It's always empty, but it's also empty without UDFs. One potential low-hanging fruit is using the orjson module instead of json since we deal with massive data arrays. When Ansible scripts for UDFs are in place, modules should be easier to use.
select: list[ast.Expr] = [ | ||
ast.Alias(alias="actor_id", expr=ast.Field(chain=["aggregation_target"])), | ||
*self._get_funnel_person_step_events(), | ||
# *self._get_timestamp_outer_select(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Redundant line
@skoob13 Just did a quick benchmark to see what's what. In the benchmark, I parsed a json blob, iterated through a 100 item array and multipled each item by 2, to emulate doing some logic in the UDF, and then printed out the result, 1,000,000 times. Python JSON: 14.8s A little surprised about the node performance, thought it would be a bit faster! Using orjson definitely makes sense, but can you think of an easy way to administer it? Would we have to set up ansible to maintain a virtualenv at each clickhouse node? Or maybe their distro provides a package they can install natively.
Python
JS
Rust
|
…to aspicer/funnel_actors
Move funnel actors queries over to UDF if you have the feature flag on.
Passes the UUID of each event to the UDF. The UDF returns, for each step, a list of event UUIDs that correspond to the funnel matches for each user.
Adds in timing fields used by correlation and user paths.
Also a tiny bit of cleanup of the superfluous leftovers in SQL that were hanging on from the old queries.
Testing
Tested with tests. Added UDF actors query unit tests. Also tested quite a bit on dev.