perf(fixed-deque): avoid costly modulo op in hot code path #221

jerome-benoit · 2024-01-17T10:22:26Z

No description provided.

Signed-off-by: Jérôme Benoit <[email protected]>

Yomguithereal · 2024-01-17T12:49:15Z

@jerome-benoit did you benchmark this? You are indeed getting rid of the modulo operation but you are adding and if condition, so branching, which could also cost more than the modulo operation.

jerome-benoit · 2024-01-17T13:38:59Z

The fastpath in the modulo algorithm is doing more than one basic ops, use a loop on a condition (branch multiple times).
I've benchmarked it some times ago in https://github.com/poolifier/poolifier/blob/master/benchmarks/worker-selection/round-robin.mjs.

The results are still on node 18.x:

~/SAPDevelop/poolifier-git/benchmarks/worker-selection (master ✔) ᐅ node round-robin.mjs 
Ternary off by one x 126,453,435 ops/sec ±0.11% (98 runs sampled)
Ternary with negation x 125,223,674 ops/sec ±0.16% (98 runs sampled)
Ternary with pre-choosing x 126,727,332 ops/sec ±0.12% (96 runs sampled)
Increment+Modulo x 126,534,434 ops/sec ±0.11% (100 runs sampled)
Fastest is Ternary with pre-choosing

Yomguithereal · 2024-01-17T14:58:28Z

@jerome-benoit I don't understand. You benchmark results are showing that there is no significant difference between any of the tested methods at all.

jerome-benoit · 2024-01-17T15:22:14Z

It's just a micro-optimisation that is consistently behaving because can't have slowpath like the euclidean division algo used in modulo. No more worst case and the usual case is still a little faster.

jerome-benoit · 2024-01-17T15:31:49Z

Results with an integer value triggering the worst case euclidian division code path:

~/SAPDevelop/poolifier-git/benchmarks/worker-selection (master ✔) ᐅ node round-robin.mjs
Ternary off by one x 126,177,714 ops/sec ±0.21% (95 runs sampled)
Ternary with negation x 125,207,891 ops/sec ±0.19% (98 runs sampled)
Ternary with pre-choosing x 124,669,268 ops/sec ±0.81% (97 runs sampled)
Increment+Modulo x 124,325,521 ops/sec ±0.44% (95 runs sampled)
Fastest is Ternary off by one

Results are consistent across run between values triggering the worst case or not on a non noisy env, unlike the one I'm currently using. That's was the base that made us used a ternary condition in the RR poolifier implementation some times ago. Latest node version seems to have a better euclidian division algo than in 16x and previous. But still, if a code path can avoid to trigger the euclidian division division algo, even done in C and optimized, I guess it's a good idea to replace it ;)

perf(fixed-deque): avoid costly modulo op in hot code path

989ba52

Signed-off-by: Jérôme Benoit <[email protected]>

Yomguithereal merged commit 608ec41 into Yomguithereal:master Feb 2, 2024
7 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(fixed-deque): avoid costly modulo op in hot code path #221

perf(fixed-deque): avoid costly modulo op in hot code path #221

jerome-benoit commented Jan 17, 2024 •

edited

Loading

Yomguithereal commented Jan 17, 2024

jerome-benoit commented Jan 17, 2024 •

edited

Loading

Yomguithereal commented Jan 17, 2024

jerome-benoit commented Jan 17, 2024

jerome-benoit commented Jan 17, 2024 •

edited

Loading

perf(fixed-deque): avoid costly modulo op in hot code path #221

perf(fixed-deque): avoid costly modulo op in hot code path #221

Conversation

jerome-benoit commented Jan 17, 2024 • edited Loading

Yomguithereal commented Jan 17, 2024

jerome-benoit commented Jan 17, 2024 • edited Loading

Yomguithereal commented Jan 17, 2024

jerome-benoit commented Jan 17, 2024

jerome-benoit commented Jan 17, 2024 • edited Loading

jerome-benoit commented Jan 17, 2024 •

edited

Loading

jerome-benoit commented Jan 17, 2024 •

edited

Loading

jerome-benoit commented Jan 17, 2024 •

edited

Loading