Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix leftMostLongestMatch returning all overlapping matches #14

Closed
wants to merge 1 commit into from

Conversation

semvis123
Copy link
Contributor

@semvis123 semvis123 commented Jul 27, 2023

#13 Caused overlapping patterns to return multiple matches.
This is fixed by only using the position new calculation for matches that are being cancelled due to MatchOnlyWholeWords.

Edit: Just realized that this only accounts for the cases where the begin position is different. (so, this is not a complete fix)
Example case where it will still fail:

func TestOverlappingPatterns4(t *testing.T) {
	trieBuilder := NewAhoCorasickBuilder(Opts{
		MatchOnlyWholeWords: true,
		MatchKind:           LeftMostLongestMatch,
		DFA:                 false,
	})

	patterns := []string{"testing", "testing 123"}

	trie := trieBuilder.Build(patterns)
	result := trie.FindAll("testing 12345")
	if len(result) != 1 {
		t.Logf("%v", result)
		t.Error("Did not find match in string")
		t.FailNow()
	}
}

@semvis123 semvis123 marked this pull request as draft July 28, 2023 12:31
@semvis123
Copy link
Contributor Author

After some thoughts I think that this isn't the way to fix the issue. I suspect that the issue is that the leftmostFindAtNoStateImp overrides the lastmatch with an invalid match (one that is not a whole word for example). This makes it impossible for the iterator to get the match before the invalid match.

Any ideas on how to fix this properly @petar-dambovaliev ?

@peter7891
Copy link

After some thoughts I think that this isn't the way to fix the issue. I suspect that the issue is that the leftmostFindAtNoStateImp overrides the lastmatch with an invalid match (one that is not a whole word for example). This makes it impossible for the iterator to get the match before the invalid match.

Any ideas on how to fix this properly @petar-dambovaliev ?

I will take a look one of these weekends.
I need to remember what i did there because i haven't touched this codebase since i wrote it 2 years ago.

@petar-dambovaliev
Copy link
Owner

petar-dambovaliev commented Apr 11, 2024

@semvis123 it seems, the MatchOnlyWholeWords option is incompatible with the way LeftMostLongestMatch works. I'll probably need to disable them being used together or make it clear how it works.
MatchOnlyWholeWords is not part of the matching algorithm, it is built on top and things are filtered after.
I am tempted to remove MatchOnlyWholeWords

@petar-dambovaliev
Copy link
Owner

I've added some docs explaining this. I'll need to think about a way forward.
Thanks for bringing this up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants