
[perf] WASM hashing function #3241

Open · wants to merge 10 commits into main
Conversation

@romgrk (Contributor) commented Aug 21, 2024

I work at MUI and I've been doing some performance optimization. I've noticed that murmur2 is at the top of the stack traces for the test case benchmark I'm using, so I wanted to open a PR here to discuss options to improve the performance.

[Profiler screenshot: murmur2 at the top of the stack traces]

I've extracted some actual style strings that are hashed at runtime by emotion (from the test case above) and created a benchmark to compare emotion's JS murmur2 implementation to alternatives. I used rust+wasm to implement those alternatives, because the use-case is a good match for the performance benefits of wasm. Here is a summary of different versions compared to the current one

Passing strings from JS to WASM implies encoding them into an array buffer, so performance scales differently with input string length. The original JS version performs about 25% faster than the rest for small strings (up to a length of around 64 characters), but after that the WASM versions take the lead. The table below shows the results for the strings extracted at runtime as described above, which is imho close enough to what happens in production for real-world users.

| version | ops/s | result |
| --- | --- | --- |
| murmur2_original | 1 259 ops/s, ±0.67% | 74.38% slower |
| murmur2_rust() | 3 635 ops/s, ±0.60% | 26.04% slower |
| xxh_rust() | 4 915 ops/s, ±0.44% | fastest |
| xxh3_rust() | 4 095 ops/s, ±0.55% | 16.68% slower |
| city_hash_rust() | 4 109 ops/s, ±0.68% | 16.4% slower |
| adaptive() | 4 102 ops/s, ±0.77% | 16.54% slower |
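To illustrate the encoding step mentioned above, here is a hedged sketch of the JS-to-WASM string marshalling (not code from this PR; the wasm module and a `hash(ptr, len)` export it would feed are assumptions):

```javascript
// Sketch of JS -> WASM string marshalling. Only the copy into linear memory
// is shown; the wasm module and its hash(ptr, len) export are assumed.
const encoder = new TextEncoder()
const memory = new WebAssembly.Memory({ initial: 1 }) // one 64 KiB page

function writeString(str) {
  // encodeInto writes UTF-8 bytes straight into wasm linear memory, avoiding
  // an intermediate Uint8Array allocation. This copy is the per-call overhead
  // that keeps small strings faster to hash in plain JS.
  const { written } = encoder.encodeInto(str, new Uint8Array(memory.buffer))
  return written // byte length that would be passed to the wasm hash export
}
```

For small inputs this copy dominates the total cost, which matches the ~64-character crossover observed above.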

The fastest option is xxHash, which is newer and faster than murmur2, with a similar collision rate. `adaptive` is a combination of using `murmur2_original` for small strings and `xxh_rust` for large strings, but the additional branching instruction (`if (string.length > 64) ...`) makes it slower.
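The adaptive strategy can be sketched as follows (hedged; `murmur2Js` and `xxhWasm` are stand-ins for the real implementations, and 64 is the crossover length from the measurements above):

```javascript
// Dispatch on string length: plain JS for short strings, WASM for long ones.
// This extra branch per call is what made the adaptive variant slightly
// slower overall in the table above.
const CROSSOVER = 64

function makeAdaptiveHash(murmur2Js, xxhWasm) {
  return (str) => (str.length <= CROSSOVER ? murmur2Js(str) : xxhWasm(str))
}
```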

Using a WASM module also doesn't need to add any build/bundle time complexity for users, as we can ship the module inline, something like this. The compiled wasm module is around 500 bytes, so a realistic cost for the wrapper and base64 bytes would be around 800 bytes.

const bytes = Uint8Array.from(atob('/* base64 encoded bytes */'), (c) => c.charCodeAt(0))
const wasm = new WebAssembly.Module(bytes)

I have also benchmarked the MUI test case again after replacing the hashing function. The results aren't as impressive as I expected (I was hoping for a 70% reduction of hashing time based on the table above), but this change improves MUI's rendering performance by about 2-4%.

[Before/after benchmark screenshots, 2024-08-21]

The code in this PR is a draft but I'd be happy to have an early review/opinion before doing more work.


changeset-bot bot commented Aug 21, 2024

⚠️ No Changeset found

Latest commit: 84220c3

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.



codesandbox-ci bot commented Aug 21, 2024

This pull request is automatically built and testable in CodeSandbox.


@Andarist (Member) commented:

> Using a WASM module also doesn't need to add any build/bundle time complexity for users, as we can ship the module inline, something like this. The compiled wasm module is around 500 bytes, so a realistic cost for the wrapper and base64 bytes would be around 800 bytes.

800 bytes is still a sizeable % of the overall bundlesize IIRC. I also don't think we can assume that web assembly is available on the host platform so we'd have to include both versions to avoid breaking changes. cc @emmatown

@romgrk (Contributor, Author) commented Aug 21, 2024

For more precise numbers: XXH is 519 bytes of binary and murmur2 is 386 bytes of binary, so totals of 800 and 650 bytes after wrapping & base64 are good figures.

@romgrk (Contributor, Author) commented Aug 21, 2024

I also have improvements for the JS implementation, but I've split them out into #3242.

@Andarist (Member) commented Aug 21, 2024

How much faster is this WASM version than the improved JS function you prepared? Also, what would the numbers be for a JS version of xxHash?

@romgrk (Contributor, Author) commented Aug 21, 2024

The performance numbers are all in the top comment of that PR, but roughly: the WASM version is 2x faster than the improved JavaScript implementation, which is itself roughly 2x faster than the current one (though TextEncoder is the most crucial part of that improvement).
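To illustrate the TextEncoder point (a hedged sketch; the multiplier 31 is illustrative only, not emotion's actual murmur2 constants): the win comes from one bulk UTF-8 encode up front instead of per-character `charCodeAt` calls inside the hot loop.

```javascript
// Hash over a byte array produced by one bulk encode. The multiplier 31 is
// a placeholder; the real implementations use murmur2/xxh constants.
const encoder = new TextEncoder()

function hashBytes(str) {
  const bytes = encoder.encode(str) // single bulk UTF-8 encode
  let h = 0
  for (let i = 0; i < bytes.length; i++) {
    h = (Math.imul(31, h) + bytes[i]) | 0 // stays on the 32-bit integer path
  }
  return h >>> 0
}
```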

I don't have numbers for a JS xxHash; the only existing port I found, https://github.com/pierrec/js-xxhash, is written in an OOP style, so that won't fly for a performance improvement. I took a look at the source code, but it's much denser than murmur2, so I haven't tried porting it yet.

On the same topic, would you be open to switching the algorithm if it increases the collision probability (reasonably)? I've been wondering if I could find a simpler algo that maintains a good enough collision rate. I'm not familiar with your codebase; how do you handle collisions?

@Andarist (Member) commented:

Before considering the WASM version, I'd love to see what could be squeezed out of it using a performant JS port of xxHash.

> On the same topic, would you be open to switching the algorithm if it increases the collision probability (reasonably)?

Potentially yes.

> I'm not familiar with your codebase, how do you handle collisions?

We pray they won't happen.

@romgrk (Contributor, Author) commented Aug 21, 2024

I've found a performant port of xxh but it can't beat murmur2 in javascript:

Running "Benchmark hash, string length = 1245" suite...

  murmur2_original():
    1 258 ops/s, ±0.53%   | slowest, 74.04% slower
  murmur2_improved():
    2 770 ops/s, ±0.68%   | 42.84% slower
  xxh_javascript():
    2 634 ops/s, ±0.64%   | 45.65% slower
  murmur2_rust():
    3 640 ops/s, ±0.75%   | 24.89% slower
  xxh_rust():
    4 846 ops/s, ±0.85%   | fastest

Having looked at the implementation for a few hashing functions, I think it's going to be hard to find anything faster than murmur2 in JS. The reason why xxh can be faster than murmur2 in WASM is that its algorithm is easy to optimize & auto-vectorize, so it can use better instructions than murmur2 can, in particular SIMD instructions that can only be accessed via WASM in the browser.

@romgrk (Contributor, Author) commented Aug 23, 2024

Following up on @emmatown's comment in the other PR: the algorithm needs to be stable across server rendering and hydration, which means using either murmur2 or xxh in both the JS and WASM versions. You have all the numbers here; please let me know if/how you want to proceed.

Both algos are neck and neck in JS, but xxh is substantially faster in WASM, so I would recommend switching to that one.


codecov bot commented Aug 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.83%. Comparing base (f4640f6) to head (230f8eb).
Report is 67 commits behind head on main.


see 44 files with indirect coverage changes

@oliviertassinari commented:
Goober seems to use a really simple hash algorithm: https://github.com/cristianbote/goober/blob/master/src/core/to-hash.js, and seems to be really fast on the server: https://twind.dev/handbook/introduction.html#opportunities

[Benchmark screenshot from the twind handbook]

@cristianbote has been iterating on it for some time. But maybe it has a too high collision probability.

@romgrk (Contributor, Author) commented Aug 25, 2024

I have been spending way too much time researching hashing functions so here is some data.

Performance

First, here are the performance results across engines. The `_unmodified` variants are copy/pasted (from emotion and goober), and the other ones were ported/modified by me; those all go through TextEncoder, which is imo best for performance most of the time (but not always, as some SpiderMonkey entries indicate). This uses the MUI dashboard dataset.

[Benchmark results screenshot]

So what I said above:

> I've found a performant port of xxh but it can't beat murmur2 in javascript:

is false: xxh can beat murmur2 in javascript, depending on the dataset. Anyway, this still supports my previous recommendation to switch to xxh. The SpiderMonkey results are puzzling to me, but tbh the engine seems to have weird performance cliffs in many situations; I've given up on trying to understand why.

Collision rate

I have plotted the distribution of hashes for the "Numbers" dataset as described in this stackoverflow answer, for the different functions:

[Hash distribution plots for DJB2a (djb2), goober, FNV-1a (fnv1a), murmur2 (murmur2_improved), and xxh]

Nothing to add here, the plots speak for themselves.
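For reference, the bucketing behind plots like these can be sketched as below (assumptions: the "Numbers" dataset is the decimal strings "0" through "9999" from the linked answer, and the goober-style loop uses a zero seed for illustration):

```javascript
// goober-style hash loop (initial seed simplified to 0 for illustration)
function gooberHash(str) {
  let out = 0, i = 0
  while (i < str.length) out = (101 * out + str.charCodeAt(i++)) >>> 0
  return out
}

// Histogram of hashes over a fixed number of buckets; a well-distributed
// hash fills the buckets roughly evenly.
function bucketCounts(hashes, buckets = 16) {
  const counts = new Array(buckets).fill(0)
  for (const h of hashes) counts[h % buckets]++
  return counts
}

const hashes = Array.from({ length: 10000 }, (_, i) => gooberHash(String(i)))
const counts = bucketCounts(hashes)
```

Note that even a poorly distributed hash can be collision-free on a small dataset like this; the plots above are about how evenly the hash values spread, which is what matters as the number of class names grows.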

So, in summary: goober's hash is not a good fit for us.

asm.js annotations

@cristianbote Quick win for goober though: you can add asm.js-style annotations to speed it up; JSC seems to need them:

[Benchmark screenshot]

// original
while (i < str.length) out = (101 * out + str.charCodeAt(i++)) >>> 0;

// asm.js
while (i < str.length) out = (Math.imul(101, out) + str.charCodeAt(i++) | 0) >>> 0;
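Some illustrative background on why the annotation matters (my sketch, not from the PR): `Math.imul` is a true 32-bit integer multiply, so the engine can stay on its integer fast path. For goober's `101 * out` the product always fits exactly in a float, so the change is purely about engine fast paths; with larger factors, plain `*` would actually lose the low bits that hashes depend on:

```javascript
// With factors this large, the float product exceeds 2^53 and gets rounded,
// so truncating it to 32 bits no longer yields the exact low bits.
const a = 0xdeadbeef

const viaFloat = (a * a) >>> 0        // 64-bit float product, rounded, truncated
const viaImul = Math.imul(a, a) >>> 0 // exact low 32 bits of the product

console.log(viaFloat === viaImul) // false: float rounding discarded low bits
```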

Links

All benchmarking code here: https://github.com/romgrk/js-benchmark

# benchmarks
node ./index.js ./benchmarks/hashing.js
# plots
node ./hash-testing.js

@romgrk (Contributor, Author) commented Aug 25, 2024

A funny thought I had: I took the fast hex formatting function from mui/material-ui#43289 and replaced the final `n.toString(36)` with `formatHex(n)`, which is also a substantial performance improvement. So @Andarist @emmatown, if you're willing to change your classnames from `css-[base-36]` to `css-[base-16]`, this can also speed up the hashing.

[Benchmark screenshot]
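To illustrate why base 16 is cheaper: a hex formatter can be a simple shift-and-lookup, while `toString(36)` needs generic division by 36. This `formatHex` is a hypothetical stand-in for illustration, not the function from mui/material-ui#43289:

```javascript
const HEX = '0123456789abcdef'

// Format a 32-bit unsigned hash as 8 hex digits using shifts and a table
// lookup; base 36 would require repeated division/modulo by 36 instead.
function formatHex(n) {
  let out = ''
  for (let shift = 28; shift >= 0; shift -= 4) out += HEX[(n >>> shift) & 0xf]
  return out
}

console.log(formatHex(0xdeadbeef)) // 'deadbeef'
```

The trade-off is slightly longer class names: always 8 hex chars, versus at most 7 base-36 chars for a 32-bit hash.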

@cristianbote commented:
wow, nice! Love these insights @romgrk

Took them for a spin locally on my end and I can confirm your results. Don't wanna side-track the conversation here about goober 😅 so I've opened up this PR to continue the discussion there if you'd like. cristianbote/goober#588

@Andarist (Member) commented:

FYI, I'm on PTO this week. I can continue the conversation when I get back
