Optimize and omit duplicate pattern matches #66

Open: 1 commit to merge into base branch master
@ViRb3 (Contributor) commented Aug 25, 2024

Continuation of #64, this time with an included test to showcase the problem and why the changes are necessary.

  • Optimize pattern sub-matches

With the current implementation, which combines a needle search with a "truncate right" step to handle sub-matches, we end up re-scanning the same regions multiple times. In some cases the cost is negligible; in others it is severe. There is probably a better way to handle this, but to fix the most basic cases we now cache each region (start and end address) and skip regex matching if exactly the same region was processed before.

The included test, without my changes, returns:

[[0 4] [1 4] [4 7] [7 9] [2 4] [4 7] [7 9] [4 7] [7 9] [5 7] [7 9] [7 9] [4 7] [5 7] [7 9] [7 9] [7 9] [1 4] [2 4] [4 7] [7 9] [4 7] [7 9] [5 7] [7 9] [7 9] [4 7] [5 7] [7 9] [7 9] [7 9] [2 4] [4 7] [7 9] [5 7] [7 9] [7 9] [4 7] [5 7] [7 9] [7 9] [7 9] [4 7] [5 7] [7 9] [7 9] [7 9] [5 7] [7 9] [7 9] [7 9]]

Meanwhile, the expected result, returned after my changes, is:

[[0 4] [1 4] [2 4] [4 7] [5 7] [7 9]]
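To illustrate the caching idea, here is a minimal sketch in Go, not the code from this PR. The names `region`, `scanner`, and `visit` are hypothetical, and the matcher is passed in as a callback because the library's real matching function is not shown here. It only demonstrates skipping a region whose exact start and end were already processed.

```go
package main

import "fmt"

// region identifies a scanned span by its start and end offsets.
type region struct{ start, end int }

// scanner remembers which regions were already processed.
// All names here are illustrative assumptions, not the library's API.
type scanner struct {
	seen  map[region]struct{}
	scans int // how many times the expensive matcher actually ran
}

func newScanner() *scanner {
	return &scanner{seen: make(map[region]struct{})}
}

// visit runs match on the region only if that exact (start, end) pair is new.
// It reports whether the matcher was invoked.
func (s *scanner) visit(start, end int, match func(start, end int)) bool {
	key := region{start, end}
	if _, done := s.seen[key]; done {
		return false // same region seen before: skip the regex work
	}
	s.seen[key] = struct{}{}
	s.scans++
	match(start, end)
	return true
}

func main() {
	s := newScanner()
	noop := func(start, end int) {}
	// The same regions arrive repeatedly, as in the truncate-right re-scan.
	for _, r := range []region{{0, 4}, {1, 4}, {4, 7}, {1, 4}, {4, 7}, {7, 9}, {4, 7}} {
		s.visit(r.start, r.end, noop)
	}
	fmt.Println(s.scans) // 4: only the distinct regions were scanned
}
```

Keying the cache on the full (start, end) pair means two regions that share a start but end at different offsets are still scanned separately, which matches the "exact same region" condition described above.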

  • Return end index

Changes the matching function's signature to also return the end index of each match. The unit tests rely on this, and it is useful to users in general, since there is otherwise no way to get the end index of a match for a variable-length pattern.

  • Deduplicate and sort results

This is a workaround for the first issue above.
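A deduplicate-and-sort pass could look roughly like the following. This is a sketch under assumptions: `dedupeSorted` is a hypothetical helper, and the results are assumed to be [start, end] pairs ordered by start and then by end, as in the expected output quoted earlier.

```go
package main

import (
	"fmt"
	"sort"
)

// dedupeSorted returns the unique [start, end] pairs ordered by start, then end.
// The name and the [2]int representation are illustrative assumptions.
func dedupeSorted(matches [][2]int) [][2]int {
	seen := make(map[[2]int]struct{}, len(matches))
	out := make([][2]int, 0, len(matches))
	for _, m := range matches {
		if _, dup := seen[m]; dup {
			continue // drop repeated pairs
		}
		seen[m] = struct{}{}
		out = append(out, m)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i][0] != out[j][0] {
			return out[i][0] < out[j][0]
		}
		return out[i][1] < out[j][1]
	})
	return out
}

func main() {
	// First eleven entries of the unoptimized test output quoted earlier.
	raw := [][2]int{{0, 4}, {1, 4}, {4, 7}, {7, 9}, {2, 4}, {4, 7}, {7, 9}, {4, 7}, {7, 9}, {5, 7}, {7, 9}}
	fmt.Println(dedupeSorted(raw)) // [[0 4] [1 4] [2 4] [4 7] [5 7] [7 9]]
}
```

Note that this only cleans up the returned results. It does not avoid the redundant scanning itself, which is what the region cache addresses.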
