Implement more vectorized aggregate functions #7200

akuzm · 2024-08-14T20:02:58Z

Vectorize common aggregate functions like min, max, sum, avg, stddev and variance for arithmetic types, both without grouping and with grouping on segmentby columns.

Tsbench shows up to 11x improvement and 2x on average on affected queries: https://grafana.ops.savannah-dev.timescale.com/d/fasYic_4z/compare-akuzm?orgId=1&var-branch=All&var-run1=3730&var-run2=3728&var-threshold=0&var-use_historical_thresholds=true&var-threshold_expression=2.5%20%2A%20percentile_cont%280.90%29&var-exact_suite_version=false&from=now-2d&to=now

Depends on:


erimatnor

Submitting the comments I have so far. I haven't finished a full review, but I haven't found anything major yet. I figured I shouldn't sit on this for too long, so let's start with some feedback.

erimatnor · 2024-09-16T09:22:32Z

tsl/src/nodes/vector_agg/function/agg_vector_validity_helper.c

+
+	if (valid1 == NULL && valid2 == NULL)
+	{
+		FUNCTION_NAME(vector_no_validity)(agg_state, vector, agg_extra_mctx);


Does this (no_validity) mean there are no valid values, or that there is no validity array (i.e. all values are valid)?

Can you add comments to clarify each case here and below?

I renamed the function to vector_all_valid and added some comments.
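A minimal sketch of the cases distinguished here, assuming Arrow-style validity bitmaps in which bit i of word i / 64 set means row i is valid, and a NULL bitmap means every row is valid. The helpers (`sum_all_valid`, `sum_one_validity`, `sum_two_validity`, `sum_dispatch`) are hypothetical and only illustrate the dispatch, not the PR's actual functions.

```c
#include <stddef.h>
#include <stdint.h>

/* No bitmap at all: every row is valid, so no per-row checks are needed. */
static double
sum_all_valid(const double *values, int n)
{
	double sum = 0;
	for (int i = 0; i < n; i++)
		sum += values[i];
	return sum;
}

/* Exactly one bitmap: sum the rows whose bit is set. */
static double
sum_one_validity(const double *values, int n, const uint64_t *valid)
{
	double sum = 0;
	for (int i = 0; i < n; i++)
	{
		if (valid[i / 64] & (UINT64_C(1) << (i % 64)))
			sum += values[i];
	}
	return sum;
}

/* Two bitmaps: a row counts only if it is valid in both. */
static double
sum_two_validity(const double *values, int n, const uint64_t *valid1,
				 const uint64_t *valid2)
{
	double sum = 0;
	for (int i = 0; i < n; i++)
	{
		const uint64_t mask = UINT64_C(1) << (i % 64);
		if (valid1[i / 64] & valid2[i / 64] & mask)
			sum += values[i];
	}
	return sum;
}

/* valid1/valid2 could be e.g. the column NULL bitmap and the filter result. */
static double
sum_dispatch(const double *values, int n, const uint64_t *valid1,
			 const uint64_t *valid2)
{
	if (valid1 == NULL && valid2 == NULL)
		return sum_all_valid(values, n);
	if (valid1 == NULL || valid2 == NULL)
		return sum_one_validity(values, n, valid1 != NULL ? valid1 : valid2);
	return sum_two_validity(values, n, valid1, valid2);
}
```

The NULL-means-all-valid convention is what makes the fast path possible: the common case of a batch without NULLs and without a filter skips the bitmap entirely.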

erimatnor · 2024-09-16T14:06:56Z

tsl/src/nodes/vector_agg/function/float48_accum_single.c

+	}
+
+	/*
+	 * This code follows float8_accum(), see the comments there.


I guess this means float8_accum() in PG? Perhaps clarify that this is a function in the PG source code and not implemented here. It is not obvious to people who aren't deeply familiar with this code.

Added the comment.

erimatnor · 2024-09-16T14:24:38Z

tsl/src/nodes/vector_agg/function/float48_accum_single.c

+	 */
+#ifdef NEED_SXX
+	Assert(*N > 0.0);
+	const double tmp = newval * (*N + 1.0) - (*Sx + newval);


Not sure it matters, but would it make sense to save N in a variable and then do the N + 1.0 calculation only once instead of three times?
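For reference, here is the single-value update with N + 1.0 hoisted into a local. This is a minimal sketch assuming a plain struct for the state (`FloatAccumState` and `float_accum_one` are hypothetical names); it follows the Youngs-Cramer update used by PostgreSQL's `float8_accum()` but omits its Inf/NaN special cases.

```c
/* Transition state for variance/stddev, modeled on float8_accum()'s array. */
typedef struct
{
	double N;   /* number of values seen */
	double Sx;  /* sum of values */
	double Sxx; /* sum of squared deviations from the mean */
} FloatAccumState;

static void
float_accum_one(FloatAccumState *state, double newval)
{
	const double oldN = state->N;
	const double newN = oldN + 1.0; /* computed once, reused below */
	const double newSx = state->Sx + newval;

	if (oldN > 0.0)
	{
		/* Same as newval * (N + 1.0) - (Sx + newval) in the snippet above. */
		const double tmp = newval * newN - newSx;
		state->Sxx += tmp * tmp / (newN * oldN);
	}
	/* At the first input Sxx stays 0 (Inf/NaN handling omitted). */

	state->N = newN;
	state->Sx = newSx;
}
```

Feeding 1, 2, 3, 4 gives N = 4, Sx = 10 and Sxx = 5, the sum of squared deviations from the mean 2.5. The gain from hoisting is probably mostly readability; whether the compiler already eliminates the repeated expression depends on whether it can prove the state pointers do not alias.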

erimatnor · 2024-09-17T07:26:42Z

tsl/src/nodes/vector_agg/function/float48_accum_types.c

+#undef AGG_NAME
+#undef NEED_SXX


Not sure why these two are undef'ed here. I don't see them being defined anywhere above or in included files. Is it just a precaution?

They are defined in float48_accum_templates.c before including these files. Generally, the idea is that the template file undefines the macros that it is parameterized by. float48_accum_types.c is parameterized by these two macros to generate the two families of transition functions that either do or do not require the Sxx state. I'm following the same approach for template files in vectorized filters as well. Not sure where best to describe this.
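The two-family idea can be illustrated in a single file. This sketch is a stand-in under stated assumptions: instead of re-including a template file with `AGG_NAME` and `NEED_SXX` defined, it uses a function-like macro so the example is self-contained, and the generated names (`accum_no_sxx`, `accum_with_sxx`) are hypothetical. Because `#ifdef` cannot appear inside a macro body, `NEED_SXX` becomes a compile-time constant condition that the compiler can fold away.

```c
#include <stdbool.h>

typedef struct
{
	double N;
	double Sx;
	double Sxx; /* only maintained by the NEED_SXX family */
} AccumState;

/* Generate one transition function; NEED_SXX is a compile-time constant. */
#define DEFINE_ACCUM(AGG_NAME, NEED_SXX) \
	static void AGG_NAME(AccumState *state, const double *values, int n) \
	{ \
		for (int i = 0; i < n; i++) \
		{ \
			const double oldN = state->N; \
			const double newN = oldN + 1.0; \
			const double newSx = state->Sx + values[i]; \
			if ((NEED_SXX) && oldN > 0.0) \
			{ \
				const double tmp = values[i] * newN - newSx; \
				state->Sxx += tmp * tmp / (newN * oldN); \
			} \
			state->N = newN; \
			state->Sx = newSx; \
		} \
	}

/* avg only needs N and Sx ... */
DEFINE_ACCUM(accum_no_sxx, false)
/* ... while variance and stddev also need Sxx. */
DEFINE_ACCUM(accum_with_sxx, true)

/* The "template" parameter is undefined once all instances are generated. */
#undef DEFINE_ACCUM
```

The include-based version in the PR keeps the body as ordinary code with `#ifdef NEED_SXX`, which is easier to read and debug than a long macro; the sketch only shows the shape of the instantiation.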

erimatnor · 2024-09-17T07:32:42Z

tsl/src/nodes/vector_agg/function/int24_avg_accum_single.c

+#undef PG_TYPE
+#undef CTYPE
+#undef DATUM_TO_CTYPE


There's some inconsistency w.r.t. where these macros are defined and undefined.

You first define the macros listed here outside the file, including AGG_NAME, but then you don't undef all of the macros here. For example, AGG_NAME is still defined.

It would be good to have some kind of consistent approach/rule for this. For example, let's say all the defines and undefs would be handled outside. Otherwise it is a bit difficult to understand the intention.

Same comment as above: there's only a single instantiation of this template for a particular type, so the type macros are undefined at the end, but there are multiple instantiations for a particular aggregate (AVG), so that one is undefined above.
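The nesting described here (one aggregate, several per-type instantiations) can be sketched in one file. As an assumption for self-containment, it uses a function-like macro instead of re-including int24_avg_accum_single.c, so it shows the instantiation structure rather than the exact define/undef mechanics; the names (`Int24AvgState`, `DEFINE_INT24_AVG`, `int2_avg_accum_vec`, `int4_avg_accum_vec`) are hypothetical. The state mirrors PostgreSQL's `int2_avg_accum()`/`int4_avg_accum()`, which keep a count and a 64-bit sum.

```c
#include <stdint.h>

/* Transition state shared by avg(int2) and avg(int4): count and int64 sum. */
typedef struct
{
	int64_t count;
	int64_t sum;
} Int24AvgState;

/* One instantiation per input type; CTYPE is the only per-type parameter. */
#define DEFINE_INT24_AVG(FUNC_NAME, CTYPE) \
	static void FUNC_NAME(Int24AvgState *state, const CTYPE *values, int n) \
	{ \
		int64_t batch_sum = 0; \
		for (int i = 0; i < n; i++) \
			batch_sum += values[i]; /* widened to int64 before adding */ \
		state->count += n; \
		state->sum += batch_sum; \
	}

DEFINE_INT24_AVG(int2_avg_accum_vec, int16_t)
DEFINE_INT24_AVG(int4_avg_accum_vec, int32_t)

#undef DEFINE_INT24_AVG
```

Widening each value to int64 before summing is what lets a batch of int4 values near the type's maximum accumulate without overflow.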

erimatnor · 2024-09-17T07:37:24Z

tsl/src/nodes/vector_agg/function/float48_accum_single.c

+	/*
+	 * Vector registers can be up to 512 bits wide.
+	 */
+#define UNROLL_SIZE ((int) (512 / 8 / sizeof(CTYPE)))


This seems like a global, hardware-specific limit. It is used in multiple places so I suggest moving it into a header where you can define it only once, as a macro on CTYPE.

Also wondering if there's a compiler header file that defines the vector register size available that can be used instead of hard-coding 512 here?

#define UNROLL_SIZE(CTYPE) ((int) (512 / 8 / sizeof(CTYPE)))

This is just a guesstimate that is specific to a particular unrolled loop, with no deep meaning, so I'd keep it local. Maybe we can make it more architecture-specific; for now I just wrote a "least-effort" version that gets vectorized at least somehow.
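To show what the local constant is used for, here is a minimal sketch of a sum loop unrolled into `UNROLL_SIZE` independent accumulators, which gives the compiler a chance to map them onto vector lanes. It assumes the same 512-bit guess as the snippet above; `sum_unrolled` is a hypothetical name.

```c
#define CTYPE double
/* Vector registers can be up to 512 bits wide (a rough guess, as above). */
#define UNROLL_SIZE ((int) (512 / 8 / sizeof(CTYPE)))

static CTYPE
sum_unrolled(const CTYPE *restrict values, int n)
{
	/* Independent partial sums, one per potential vector lane. */
	CTYPE partial[UNROLL_SIZE] = {0};

	int row = 0;
	for (; row < n / UNROLL_SIZE * UNROLL_SIZE; row += UNROLL_SIZE)
	{
		for (int inner = 0; inner < UNROLL_SIZE; inner++)
			partial[inner] += values[row + inner];
	}

	/* Tail that does not fill a whole unrolled chunk. */
	for (; row < n; row++)
		partial[0] += values[row];

	CTYPE sum = 0;
	for (int inner = 0; inner < UNROLL_SIZE; inner++)
		sum += partial[inner];
	return sum;
}

#undef UNROLL_SIZE
#undef CTYPE
```

For floating-point types this changes the summation order, so the result can differ from a sequential loop in the last bits. If a more architecture-specific width were wanted, GCC and Clang predefine macros such as `__AVX2__` and `__AVX512F__` when those instruction sets are enabled, which could select a value at compile time.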

erimatnor · 2024-09-17T08:07:08Z

tsl/src/nodes/vector_agg/function/functions.c

+	const uint64 *restrict validity = (uint64 *) vector->buffers[0];
+	/* First, process the full words. */
+	for (int i = 0; i < n / 64; i++)


What happens here on 32-bit platforms? I guess it is just slower?

They still have uint64 support, so it should work just as well. Probably slower, but I didn't test.
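The loop structure here (whole 64-bit validity words first, then the remaining bits) can be sketched as follows, assuming an Arrow-style bitmap where bit i of word i / 64 marks row i as valid. `count_valid` is a hypothetical helper; on a 32-bit target the `uint64_t` operations are still available but are typically compiled into pairs of 32-bit instructions.

```c
#include <stdint.h>

/* Count valid rows: whole 64-bit words first, then the partial last word. */
static int
count_valid(const uint64_t *restrict validity, int n)
{
	int count = 0;

	/* First, process the full words. */
	for (int i = 0; i < n / 64; i++)
	{
		uint64_t word = validity[i];
		/* Clear the lowest set bit until none remain (portable popcount). */
		while (word != 0)
		{
			word &= word - 1;
			count++;
		}
	}

	/* Then the remaining n % 64 rows, checked bit by bit. */
	for (int row = n / 64 * 64; row < n; row++)
	{
		if (validity[row / 64] & (UINT64_C(1) << (row % 64)))
			count++;
	}

	return count;
}
```

With GCC or Clang, `__builtin_popcountll` could replace the inner bit-clearing loop for the full words.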

erimatnor · 2024-09-17T08:11:35Z

tsl/src/nodes/vector_agg/function/sum_float_single.c

@@ -0,0 +1,100 @@
+/*


Noticed that sum float files (and minmax files) are named <agg>-<type>-<suffix> while many other files are named <type>-<agg>-<suffix>. Is this intentional for some reason or does it make sense to try to be consistent w.r.t. naming?

The naming follows the respective Postgres aggregate transition functions.

akuzm added 30 commits March 27, 2024 19:21

Merge remote-tracking branch 'origin/main' into HEAD

8cb97e0

something works

b27d2b1

full switch

13ba173

fix the build

be203fd

remove the old planning approach

ee8b1f4

remove more of old planning

e146937

typos

753bf0d

use enum indexes for settings

4a4f20b

cleanup

beba737

benchmark separate vectorized agg (2024-03-28 no. 1)

2dbda15

split out common code

175cbf2

show costs in explain

21faf6e

wrong prefix

fa2fb4d

Merge remote-tracking branch 'akuzm/vector-separate' into HEAD

0ed166f

Remove temporary data from DecompressChunkPath

30a6069

It's a little confusing because they only live during the creation of decompression plan. Put them into a separate struct instead.

rename

e25267d

typo

5e6221d

benchmark separate vectorized agg (2024-03-29 no. 2)

5c4af48

produce partials for each batch

4130683

also unroll the loop

benchmark separate vectorized agg (2024-03-29 no. 3)

e7f01ab

more generic interface

cff844d

fix outer_var resolution

209838e

Revert "disable filters"

dfb92af

This reverts commit 7852d55f3061b82a3ce1cf8d7575f64c2d14aa0b.

support filters?

6ef84c1

fix outer_var resolution

4db7cea

fix ref

287f3b4

Merge remote-tracking branch 'akuzm/vector-separate' into HEAD

a50069f

fix for filtered out batches

9bdae30

benchmark vectorized agg with filter (2024-03-29 no. 4)

398f317

fix build on windows

eaca282

akuzm added 25 commits September 5, 2024 15:30

Merge remote-tracking branch 'origin/main' into HEAD

5993c05

Merge remote-tracking branch 'origin/main' into HEAD

1014ff2

validity helper

d9fdebe

roll back together

fc78f1e

benchmark validity helpers (2024-09-05 no. 6)

0d9a53f

remove const implementation

4aced13

memory context for extra agg data

e8f57e6

Merge remote-tracking branch 'origin/main' into HEAD

6ee98e5

fix

8d69aa6

fix

308c210

fix

8e275b5

int24_avg_accum

5ebe303

renames

e24859c

Merge remote-tracking branch 'origin/main' into HEAD

89b00e7

minmax for date

d09d663

int128avgstate

fafeb6a

fix 14

f8d11d9

Merge remote-tracking branch 'origin/main' into HEAD

79828b3

benchmark aggregate functions (2024-09-10 no. 1)

153d4cc

some improvements to float sum

71ec5ae

benchmark aggregate functions (2024-09-11 no. 2)

71a99ca

separate translation units

087193a

cleanup

5f2a5d4

Merge remote-tracking branch 'origin/main' into HEAD

2591e61

changelog

b84dd50

akuzm marked this pull request as ready for review September 12, 2024 13:25

akuzm added 2 commits September 12, 2024 15:35

changelog

8b3cab1

fix

15cc10d

erimatnor reviewed Sep 18, 2024


review comments

00716dc


erimatnor left eight review comments (three on Sep 16, 2024 and five on Sep 17, 2024); akuzm replied to them on Sep 18, 2024.
