Improve performance of memoized validator functions. #122

leblowl · 2024-08-06T19:20:49Z

The memoize cache key was based on a hash of the entire graph. This hash was being generated each time a link was validated. On my system, validating each link took around 100ms with a graph of around 2000 links. With this change it appears that validating each link takes less than 1ms.

Of course, we don't want cache key collisions. Do you think using link.hash and graph.root should be unique enough for the expected use cases?
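Below is a minimal sketch of the two key strategies. It assumes simplified, hypothetical Link and Graph shapes and a SHA-256 hashOf helper standing in for crdx's hash(seed, value), so it is an illustration rather than the crdx code. The original key serializes and hashes the whole graph on every call. The proposed key only concatenates hashes that are already stored on the link and the graph.

```typescript
import { createHash } from 'node:crypto'

// Hypothetical, simplified shapes; crdx's real Link and Graph types carry more fields.
type Link = { hash: string; body: { prev: string[]; payload: unknown } }
type Graph = { root: string; head: string[]; links: Record<string, Link> }

// Stand-in for crdx's hash(seed, value): SHA-256 over a seed plus a JSON serialization.
const hashOf = (seed: string, value: unknown): string =>
  createHash('sha256').update(seed).update(JSON.stringify(value)).digest('hex')

// Original key: serializes and hashes the whole graph on every call,
// so each call costs time proportional to the size of the graph.
const expensiveKey = (link: Link, graph: Graph): string =>
  `${hashOf('memoize', link)}:${hashOf('memoize', graph)}`

// Proposed key: joins two hashes that are already stored; no hashing happens here.
const cheapKey = (link: Link, graph: Graph): string => `${link.hash}:${graph.root}`

// Demo: appending a link changes the graph, so the original key changes for every
// existing link (a cache miss), while the proposed key stays the same.
const root: Link = { hash: 'r', body: { prev: [], payload: 'root' } }
const g1: Graph = { root: 'r', head: ['r'], links: { r: root } }
const a: Link = { hash: 'a', body: { prev: ['r'], payload: 'add member' } }
const g2: Graph = { root: 'r', head: ['a'], links: { r: root, a } }

console.log(expensiveKey(root, g1) === expensiveKey(root, g2)) // false
console.log(cheapKey(root, g1) === cheapKey(root, g2)) // true
```

The cheap key is only safe if link.hash always reflects the link's current contents and graph.root identifies the graph well enough for the cache's lifetime.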


HerbCaudill · 2024-09-12T14:08:22Z

Hi, Lucas - thanks for all these PRs! Sorry for not responding earlier - I was afk for the past month or so. I'll go through these now.

I'd love to see your benchmarks and maybe add them to the repo - would that code be easy for you to pull up?

leblowl · 2024-09-12T14:44:11Z

You're welcome. I've been testing end-to-end by adding console.log statements in the code and then running a scenario. Would that test code setup be something you are interested in? I could probably come up with some unit tests as well.

HerbCaudill · 2024-09-12T14:53:52Z

vitest has built-in benchmarking support using tinybench - could be cool to simulate some of these scenarios within the test suite as a way of documenting these performance improvements, and also providing a baseline for further optimization work.

leblowl · 2024-09-12T15:03:55Z

Sure, I can see what I can do.

leblowl · 2024-09-12T18:04:44Z

I just added a benchmark. How does that look to you?

Before PR:

```
 ✓ packages/auth/src/test/auth.benchmark.ts (1) 47002ms
   ✓ auth (1) 47000ms
     name                      hz       min       max      mean       p75       p99      p995      p999     rme  samples
   · a new member joining  0.3610  2,711.03  2,829.56  2,770.39  2,814.50  2,829.56  2,829.56  2,829.56  ±1.05%       10
```

After PR:

```
 ✓ packages/auth/src/test/auth.benchmark.ts (1) 4631ms
   ✓ auth (1) 4629ms
     name                      hz     min     max    mean     p75     p99    p995    p999      rme  samples
   · a new member joining  3.6959  236.43  367.57  270.57  282.42  367.57  367.57  367.57  ±10.20%       10
```

leblowl · 2024-09-13T15:40:12Z

It looks like some tests are failing due to the mutable nature of the graph. For example, in this test:

auth/packages/crdx/src/validator/test/validate.test.ts, lines 49 to 50 in 6ab7e9c:

```ts
rootLink.body.prev = graph.head
expect(validate(graph)).not.toBeValid()
```

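The test modifies the root link in place, but the link's hash field doesn't change, so a cache keyed on link.hash and graph.root returns the result that was computed before the modification. Given this, the memoization changes in this PR won't work.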
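To make the failure concrete, here is a small sketch that reproduces it outside crdx. The types and the rootHasNoPrev rule are hypothetical stand-ins in the spirit of the test, not crdx's actual validators; the memoization uses the link.hash and graph.root key proposed in this PR.

```typescript
// Hypothetical, simplified shapes; crdx's real types carry more fields.
type Link = { hash: string; body: { prev: string[] } }
type Graph = { root: string; head: string[]; links: Record<string, Link> }
type Result = { isValid: boolean }

// Hypothetical rule in the spirit of the test: the root link must have no predecessors.
const rootHasNoPrev = (link: Link, graph: Graph): Result => ({
  isValid: link.hash !== graph.root || link.body.prev.length === 0,
})

// Memoized with the key proposed in this PR: stored hashes only, not the link's contents.
const cache = new Map<string, Result>()
const memoizedRootHasNoPrev = (link: Link, graph: Graph): Result => {
  const key = `${link.hash}:${graph.root}`
  let result = cache.get(key)
  if (result === undefined) {
    result = rootHasNoPrev(link, graph)
    cache.set(key, result)
  }
  return result
}

const rootLink: Link = { hash: 'root', body: { prev: [] } }
const graph: Graph = { root: 'root', head: ['root'], links: { root: rootLink } }

const before = memoizedRootHasNoPrev(rootLink, graph) // computed and cached: valid
rootLink.body.prev = graph.head // in-place edit as in the test; rootLink.hash is unchanged
const after = memoizedRootHasNoPrev(rootLink, graph) // same key, cache hit: still valid (stale)
const fresh = rootHasNoPrev(rootLink, graph) // recomputed: invalid

console.log(before.isValid, after.isValid, fresh.isValid) // true true false
```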

One option is to remove memoization on validation. It looks like validation is only used when creating the Team and then when receiving a message:

auth/packages/crdx/src/sync/receiveMessage.ts, line 73 in 6ab7e9c:

```ts
const validation = validate(mergedGraph)
```

And upon receiving a message, if the graph changes at all, the original memoization approach isn't going to help because hash('memoize', graph) will change:

auth/packages/crdx/src/validator/validators.ts, line 89 in 6ab7e9c:

```ts
return `${hash('memoize', link)}:${hash('memoize', graph)}`
```

Another option is to keep the original memoization, but compute the hash of the graph only once before we call the individual validators:

auth/packages/crdx/src/validator/validate.ts, line 60 in 6ab7e9c:

```ts
const result = validator(currentLink, graph)
```

That way, we would not be computing the hash of the graph for every link. We would still need to compute the hash of each link, though. And if the graph ever changes because of a new message, from what I can tell, the memoization will not help and will only increase memory usage.
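A rough sketch of that second option, under the same kind of assumptions: simplified hypothetical types, a SHA-256 hashOf helper instead of crdx's hash, and a hypothetical prevLinksExist validator. The graph is hashed once per validate() call and the resulting hash is reused in every per-link key. The counter shows the trade-off described above: an unchanged graph is served entirely from the cache, but adding one link changes the graph hash and forces every link to be revalidated.

```typescript
import { createHash } from 'node:crypto'

// Hypothetical, simplified shapes; crdx's real types carry more fields.
type Link = { hash: string; body: { prev: string[]; payload: unknown } }
type Graph = { root: string; head: string[]; links: Record<string, Link> }
type Result = { isValid: boolean }
type Validator = (link: Link, graph: Graph) => Result

// Stand-in for crdx's hash: SHA-256 over a JSON serialization.
const hashOf = (value: unknown): string =>
  createHash('sha256').update(JSON.stringify(value)).digest('hex')

// Hypothetical validator: every predecessor hash must refer to a link in the graph.
const prevLinksExist: Validator = (link, graph) => ({
  isValid: link.body.prev.every(h => h in graph.links),
})

const cache = new Map<string, Result>()
let validatorCalls = 0 // counts cache misses, for illustration only

const validate = (graph: Graph, validators: Record<string, Validator>): Result => {
  const graphHash = hashOf(graph) // hashed once per validate() call, not once per link
  for (const link of Object.values(graph.links)) {
    const linkHash = hashOf(link) // each link still has to be hashed
    for (const [name, validator] of Object.entries(validators)) {
      const key = `${name}:${linkHash}:${graphHash}`
      let result = cache.get(key)
      if (result === undefined) {
        validatorCalls++
        result = validator(link, graph)
        cache.set(key, result)
      }
      if (!result.isValid) return result
    }
  }
  return { isValid: true }
}

// Demo
const links: Record<string, Link> = {
  r: { hash: 'r', body: { prev: [], payload: 'root' } },
  a: { hash: 'a', body: { prev: ['r'], payload: 'add member' } },
}
const graph: Graph = { root: 'r', head: ['a'], links }

validate(graph, { prevLinksExist }) // 2 misses
validate(graph, { prevLinksExist }) // unchanged graph: 2 hits
links.b = { hash: 'b', body: { prev: ['a'], payload: 'add device' } }
graph.head = ['b']
validate(graph, { prevLinksExist }) // graph hash changed: 3 misses
console.log(validatorCalls) // 5
```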

If you have any thoughts on the direction you'd like to go, I'm happy to code something up.

HerbCaudill · 2024-09-13T18:32:52Z

Yeah - I've done some experiments and I think we should memoize the validate function rather than each of the individual validators. This gives us performance comparable to what you were getting and gets the tests passing again. I'll push some changes for you to review.
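One way memoizing validate as a whole could look, as a hedged sketch only: the actual changes were pushed to #129 and aren't reproduced here. The types, hashOf, and validateUncached are hypothetical stand-ins. Because the cache key is derived from the graph's current contents, the in-place edit from validate.test.ts produces a new key, and the graph is hashed once per validate() call rather than once per link.

```typescript
import { createHash } from 'node:crypto'

// Hypothetical, simplified shapes; crdx's real types carry more fields.
type Link = { hash: string; body: { prev: string[] } }
type Graph = { root: string; head: string[]; links: Record<string, Link> }
type Result = { isValid: boolean }

// Stand-in for crdx's hash: SHA-256 over a JSON serialization.
const hashOf = (value: unknown): string =>
  createHash('sha256').update(JSON.stringify(value)).digest('hex')

let fullValidations = 0 // counts cache misses, for illustration only

// Hypothetical stand-in for the un-memoized validate(): runs every per-link check.
const validateUncached = (graph: Graph): Result => {
  fullValidations++
  const root = graph.links[graph.root]
  const rootOk = root !== undefined && root.body.prev.length === 0
  const prevOk = Object.values(graph.links).every(link =>
    link.body.prev.every(h => h in graph.links),
  )
  return { isValid: rootOk && prevOk }
}

// Memoize the whole validate() call, keyed on the graph's current contents.
const cache = new Map<string, Result>()
const validate = (graph: Graph): Result => {
  const key = hashOf(graph) // the graph is hashed once per call, not once per link
  let result = cache.get(key)
  if (result === undefined) {
    result = validateUncached(graph)
    cache.set(key, result)
  }
  return result
}

// Demo
const rootLink: Link = { hash: 'root', body: { prev: [] } }
const graph: Graph = { root: 'root', head: ['root'], links: { root: rootLink } }

console.log(validate(graph).isValid) // true
console.log(validate(graph).isValid) // true (cache hit)
rootLink.body.prev = graph.head // the in-place edit from the test
console.log(validate(graph).isValid) // false: new contents, new key
console.log(fullValidations) // 2
```

The cost is one full-graph hash per validate() call, which is paid once per received message rather than once per link.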

HerbCaudill · 2024-09-13T18:33:36Z

Can you set this PR to allow changes by maintainers?

leblowl · 2024-09-13T19:52:19Z

Sounds good. Unfortunately, since this PR is using the TryQuiet/auth repo, it doesn't look like I have the option to allow changes by maintainers. I opened another PR from my personal repo that should allow edits: #129

Commit 6f47f94: Add benchmark for member joining

leblowl mentioned this pull request on Sep 13, 2024 in #129, Improve performance of memoized validator functions. (Merged)

leblowl closed this on Sep 13, 2024
