Big performance regression between 0.31 and 0.32 #914

tomaka · 2024-01-29T11:00:45Z

Calling the Metadata_metadata runtime function of the latest Westend runtime with wasmi 0.31 takes around 750ms.

Calling the same runtime function with wasmi 0.32 takes around 4.5 seconds in CompilationMode::Eager. With CompilationMode::Lazy, it's around 5 to 7 seconds.

The text was updated successfully, but these errors were encountered:

tomaka · 2024-01-29T11:16:38Z

Take this with a grain of salt, but I wouldn't be surprised if this was the "runtime inefficiencies" in the "rare case" that you mention in #860

The way Metadata_metadata is implemented in Rust is that there's a trait implementation on tons of structs of the runtime code, and Metadata_metadata calls a function on this trait implementation of every single struct.
For this reason, I suspect that Metadata_metadata calls tons of small functions that are spread out everywhere within the Wasm.

Robbepop · 2024-01-29T11:21:41Z

@tomaka Thanks for reporting!

Calling many very small functions is the worst case situation for the Wasmi (register) implementation. Wasmi (stack) actually performs better for many small function calls. Aggressive inlining could maybe help if this indeed is the problem so it is worth a try if possible. The linked relaxed br+cmp encoding shouldn't play a big role here since it isn't used for calls and only plays a role for ultra large Wasm functions that have jumps within the same function over very long distances (>32k Wasmi instruction words). It would be great if you could get the Wasm input into a minimize example that has similar performance behaviors but I suppose this is going to be very tricky to extract.

I agree that a 6x performance regression (750ms vs 4.5s) is not reasonable.

Robbepop · 2024-01-29T11:41:51Z

@tomaka The exact Wasm blob that was used and ideally a trace of executed instructions might be helpful if you have them.

tomaka · 2024-01-30T10:27:44Z

runtime.hex.txt

Here's the runtime in hexadecimal.

Robbepop · 2024-01-31T12:20:47Z

@tomaka

I just noticed that the given westend.wasm binary was apparently not properly optimized using wasm-opt (version 116). I was under the assumption that Polkadot runtimes are always optimized with this tool since it really improves performance and especially binary size quite a bit.

7.5M Jan 31 13:14 westend.wasm
5.9M Jan 31 13:18 westend.opt.o3.wasm  ;; compiled with -O3
5.5M Jan 31 13:23 westend.opt.oz.wasm  ;; compiled with -Oz

It could be worth a try to see if this westend.opt.oX.wasm does not have this performance issue with Wasmi. The reason for this is that Wasmi relies on proper optimizations to run at full speed since (unlike for example Wasmtime) it cannot do its own proper optimizations due to its linear time compilation promise.

Besides that I found a new opportunity to improve performance: #920
However, I have absolutely no clue how much this will really affect the performance of westend.wasm or execution in general and I am pretty sure that this is not the main offender with respect to performance here.

Robbepop · 2024-02-02T13:13:28Z

While inspecting the westend.wat file I saw this:

(func $memcpy (;8628;) (type 5) (param i32 i32 i32) (result i32)
  local.get 0
  local.get 1
  local.get 2
  call $_ZN17compiler_builtins3mem6memcpy17h7c9cbc967e51fdc6E
)
(func $memset (;8629;) (type 5) (param i32 i32 i32) (result i32)
  local.get 0
  local.get 1
  local.get 2
  call $_ZN17compiler_builtins3mem6memset17h7a0db38d43a2ca78E
)

Which indicates to me that the Westend runtime is compiled without the Wasm bulk-memory proposal although it has been evaluated in the past that this may yield serious performance improvements: paritytech/polkadot-sdk#36

Especially for Wasmi being an interpreter the speed-ups gained could be even way more massive since the built-in memcpy and memset instructions in bulk-memory are not themselves interpreted but run natively via Wasmi. So we should test if enabling bulk-memory Wasm proposal fixes the performance issues reported in this issue.

cc @tomaka

tomaka · 2024-02-06T13:18:44Z

After a relatively long investigation and a lot of DMs, this turned out to be caused by debug-assertions = true in the Cargo.toml.
Disabling debug assertions is critical in order to get decent performances.

tomaka mentioned this issue Jan 29, 2024

Revert wasmi to v0.31 again smol-dot/smoldot#1624

Merged

Robbepop added the bug Something isn't working label Jan 29, 2024

tomaka closed this as not planned Won't fix, can't repro, duplicate, stale Feb 6, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Big performance regression between 0.31 and 0.32 #914

Big performance regression between 0.31 and 0.32 #914

tomaka commented Jan 29, 2024

tomaka commented Jan 29, 2024

Robbepop commented Jan 29, 2024 •

edited

Loading

Robbepop commented Jan 29, 2024

tomaka commented Jan 30, 2024 •

edited

Loading

Robbepop commented Jan 31, 2024 •

edited

Loading

Robbepop commented Feb 2, 2024

tomaka commented Feb 6, 2024

Big performance regression between 0.31 and 0.32 #914

Big performance regression between 0.31 and 0.32 #914

Comments

tomaka commented Jan 29, 2024

tomaka commented Jan 29, 2024

Robbepop commented Jan 29, 2024 • edited Loading

Robbepop commented Jan 29, 2024

tomaka commented Jan 30, 2024 • edited Loading

Robbepop commented Jan 31, 2024 • edited Loading

Robbepop commented Feb 2, 2024

tomaka commented Feb 6, 2024

Robbepop commented Jan 29, 2024 •

edited

Loading

tomaka commented Jan 30, 2024 •

edited

Loading

Robbepop commented Jan 31, 2024 •

edited

Loading