Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of memoized validator functions. #122

Closed
wants to merge 2 commits into from

Conversation

leblowl
Copy link
Contributor

@leblowl leblowl commented Aug 6, 2024

The memoize cache key was based on a hash of the entire graph. This hash was being generated each time a link was validated. On my system, validating each link took around 100ms with a graph of around 2000 links. With this change it appears that validating each link takes less than 1ms.

Of course, we don't want cache key collisions. Do you think using link.hash and graph.root should be unique enough for the expected use cases?

The cache key was based on a hash of the entire graph. This hash was being
generated each time a link was validated. On my system, validating each link
took around 100ms with a graph of around 2000 links. With this change it appears
that validating each link takes less than 1ms.
@HerbCaudill
Copy link
Member

Hi, Lucas - thanks for all these PRs! Sorry for not responding earlier - I was afk for the past month or so. I'll go through these now.

I'd love to see your benchmarks and maybe add them to the repo - would that code be easy for you to pull up?

@leblowl
Copy link
Contributor Author

leblowl commented Sep 12, 2024

You're welcome. I've been testing end-to-end by adding console.log statements in the code and then running a scenario. Would that test code setup be something you are interested in? I could probably come up with some unit tests as well.

@HerbCaudill
Copy link
Member

vitest has built-in benchmarking support using tinybench - could be cool to simulate some of these scenarios within the test suite as a way of documenting these performance improvements, and also providing a baseline for further optimization work.

@leblowl
Copy link
Contributor Author

leblowl commented Sep 12, 2024

Sure, I can see what I can do

@leblowl
Copy link
Contributor Author

leblowl commented Sep 12, 2024

I just added a benchmark. How does that look to you?

Before PR:

 ✓ packages/auth/src/test/auth.benchmark.ts (1) 47002ms
   ✓ auth (1) 47000ms
     name                      hz       min       max      mean       p75       p99      p995      p999     rme  samples
   · a new member joining  0.3610  2,711.03  2,829.56  2,770.39  2,814.50  2,829.56  2,829.56  2,829.56  ±1.05%       10

After PR:

 ✓ packages/auth/src/test/auth.benchmark.ts (1) 4631ms
   ✓ auth (1) 4629ms
     name                      hz     min     max    mean     p75     p99    p995    p999      rme  samples
   · a new member joining  3.6959  236.43  367.57  270.57  282.42  367.57  367.57  367.57  ±10.20%       10

@leblowl
Copy link
Contributor Author

leblowl commented Sep 13, 2024

It looks like some tests are failing due to the mutable nature of the graph. For example, in this test:

rootLink.body.prev = graph.head
expect(validate(graph)).not.toBeValid()

We modify the root entry, but the entry's hash field doesn't change. So given this, the memoization changes in this PR won't work.

One option is to remove memoization on validation. It looks like validation is only used when creating the Team and then when receiving a message:

const validation = validate(mergedGraph)

And upon receiving a message, if the graph changes at all, the original memoization approach isn't going to help because hash('memoize', graph) will change:

return `${hash('memoize', link)}:${hash('memoize', graph)}`

Another option is to keep the original memoization, but compute the hash of the graph only once before we call the individual validators:

const result = validator(currentLink, graph)

So that we are not computing the hash of the graph when validating each link. In this scenario we would still need to compute the hash of each link. But if the graph ever changes because of a new message, from what I can tell, the memoization will not help and will increase memory usage.

If you have any thoughts on the direction you'd like to go, I'm happy to code something up.

@HerbCaudill
Copy link
Member

Yeah - I've done some experiments and I think we should memoize the validate function rather than each of the individual validators. This gives us performance comparable to what you were getting and gets the tests passing again. I'll push some changes for you to review.

@HerbCaudill
Copy link
Member

can you set this PR to allow changes by maintainers?

@leblowl
Copy link
Contributor Author

leblowl commented Sep 13, 2024

Sounds good. Unfortunately, since this PR is using the TryQuiet/auth repo, it doesn't look like I have the option to allow changes by maintainers. I opened another PR from my personal repo that should allow edits: #129

@leblowl leblowl closed this Sep 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants