-
Notifications
You must be signed in to change notification settings - Fork 878
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement more vectorized aggregate functions #7200
base: main
Are you sure you want to change the base?
Conversation
It's a little confusing because they only live during the creation of decompression plan. Put them into a separate struct instead.
also unroll the loop
This reverts commit 7852d55f3061b82a3ce1cf8d7575f64c2d14aa0b.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Submitting the comments I have so far. Haven't finished a full review, but I haven't found anything major yet. But I figured I shouldn't sit on this for too long, so let's start with some feedback.
|
||
if (valid1 == NULL && valid2 == NULL) | ||
{ | ||
FUNCTION_NAME(vector_no_validity)(agg_state, vector, agg_extra_mctx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this (no_validity) mean there are no valid values or there is no validity array (all valid?)?
Can you add comments to clarify each case here and below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed the function to vector_all_valid
and added some comments.
} | ||
|
||
/* | ||
* This code follows float8_accum(), see the comments there. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this means float8_accum() in PG? Perhaps clarify that this is a function in the PG source code and not implemented here. It is non-obvious to people not deeply into this code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the comment.
*/ | ||
#ifdef NEED_SXX | ||
Assert(*N > 0.0); | ||
const double tmp = newval * (*N + 1.0) - (*Sx + newval); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure it matters, but would it make sense to save N in a variable and then do N + 1.0 calc only once instead of three times?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
#undef AGG_NAME | ||
#undef NEED_SXX |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why these two are udnef:ed here. I don't see them being defined anywhere above or in included files. Is it just a precaution?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are defined in float48_accum_templates.c
before including these files. Generally, the idea is that the template file undefines the macros that it is parameterized by. The float48_accum_types.c
is parameterized by these two macros to generate the two families of transition functions that either require or not the Sxx state. I'm following this approach for template files in vectorized filters as well. Not sure where to best describe this.
#undef PG_TYPE | ||
#undef CTYPE | ||
#undef DATUM_TO_CTYPE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's some inconsistency w.r.t. where these macros are defined and undefined.
You first define
the macros listed here outside the file, including AGG_NAME
, but then you don't undef
all of the macros here. For example, AGG_NAME
is still defined.
It would be good to have some kind of consistent approach/rule for this. For example, let's say all the defines and undefs would be handled outside. Otherwise it is a bit difficult to understand the intention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above, there's only a single instantiation of this template for a particular type, so the type is un-defined at the end, but multiple instantiations for the particular aggregate (AVG), so that one is un-defined above.
/* | ||
* Vector registers can be up to 512 bits wide. | ||
*/ | ||
#define UNROLL_SIZE ((int) (512 / 8 / sizeof(CTYPE))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a global, hardware-specific limit. It is used in multiple places so I suggest moving it into a header where you can define it only once, as a macro on CTYPE.
Also wondering if there's a compiler header file that defines the vector register size available that can be used instead of hard-coding 512 here?
#define UNROLL_SIZE(CTYPE) ((int) 512 / 8 / sizeof(CTYPE)))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just a guesstimate that is specific to a particular unrolled loop, no deep meaning. So I'd keep it local. Maybe we can make it more architecture-specific, for now I just wrote a "least-effort" version that is vectorized at least somehow.
const uint64 *restrict validity = (uint64 *) vector->buffers[0]; | ||
/* First, process the full words. */ | ||
for (int i = 0; i < n / 64; i++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens here on 32-bit platforms? I guess it is just slower?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They still have the uint64 support, so it should work just as well. Probably slower, I didn't test.
@@ -0,0 +1,100 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noticed that sum float files (and minmax files) are named <agg>-<type>-<suffix>
while many other files are named <type>-<agg>-<suffix>
. Is this intentional for some reason or does it make sense to try to be consistent w.r.t. naming?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The naming follows the respective Postgres aggregate transition functions.
Vectorize common aggregate functions like
min
,max
,sum
,avg
,stddev
,variance
for arithmetic types, for no grouping and grouping on segmentby columns.Tsbench shows up to 11x improvement and 2x on average on affected queries: https://grafana.ops.savannah-dev.timescale.com/d/fasYic_4z/compare-akuzm?orgId=1&var-branch=All&var-run1=3730&var-run2=3728&var-threshold=0&var-use_historical_thresholds=true&var-threshold_expression=2.5%20%2A%20percentile_cont%280.90%29&var-exact_suite_version=false&from=now-2d&to=now
Depends on: