FTS5 wildcard query does not work as Intended #2373

Open
kj-9 opened this issue Jul 23, 2024 · 0 comments

kj-9 commented Jul 23, 2024

I attempted to use the wildcard query method described in "The table page and table view API". Because FTS4 and FTS5 use different wildcard query syntax, the current escape_fts function appears unable to perform wildcard queries against FTS5 tables.

Issue

As FARA_All_ShortForms_fts seemed to be using FTS5, I tried searching for laur* (which should match records containing "laura").

The following URL, which searches using where FARA_All_ShortForms_fts match escape_fts(:search), does not yield any results:
fara.datasettes.com/fara/FARA_All_ShortForms_fts?_search=laur%2A&_sort=rowid

However, slightly modifying the query to avoid escape_fts returns matches. Here is the link.

I edited the query as follows:

-where FARA_All_ShortForms_fts match escape_fts(:search)
+where FARA_All_ShortForms_fts match "laur*"

Cause

escape_fts("laur*") embeds double quotes at the beginning and end. As a result, when using FTS5, it seems that the asterisk is removed, preventing it from becoming a wildcard query.

In FTS5, asterisks within double quotes are passed to the tokenizer and may not be recognized as wildcard expressions:

...
... MATCH '"one two thr*"'      -- May not work as expected!

The final query in the block above may not work as expected. Because the "*" character is inside the double-quotes, it will be passed to the tokenizer, which will likely discard it (or perhaps, depending on the specific tokenizer in use, include it as part of the final token) instead of recognizing it as a special FTS character.

FTS5 documentation

On the other hand, the FTS4 documentation does not mention such specifications for wildcard queries.

In fact, even searching for "laur" with the wildcard removed does not yield any results.
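
The behavior described above can be reproduced locally with a small sketch. It assumes a Python build whose bundled SQLite has FTS5 enabled, and it uses a hypothetical `docs` table with the default tokenizer rather than the real FARA data, so treat it as an illustration and not as a test of the hosted instance.

```python
import sqlite3

# Hypothetical in-memory FTS5 table; not the real FARA_All_ShortForms_fts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(name)")
conn.executemany(
    "INSERT INTO docs (name) VALUES (?)",
    [("laura smith",), ("lauren jones",), ("bob brown",)],
)


def count_matches(fts_query):
    """Return the number of rows matching the given FTS5 query string."""
    row = conn.execute(
        "SELECT count(*) FROM docs WHERE docs MATCH ?", (fts_query,)
    ).fetchone()
    return row[0]


# Asterisk inside the quotes: handed to the tokenizer, not treated as a wildcard.
quoted_star = count_matches('"laur*"')
# The same phrase without the asterisk, for comparison.
plain_term = count_matches('"laur"')
# Asterisk outside the quotes (or after a bare word): a prefix query.
quoted_prefix = count_matches('"laur"*')
bare_prefix = count_matches("laur*")

print(quoted_star, plain_term, quoted_prefix, bare_prefix)
```

With the default tokenizer, the first two queries should behave identically (both look for the exact token "laur"), while the last two should match both "laura" and "lauren".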

Expected Behavior

It would be beneficial if wildcard queries could function consistently when accessing this endpoint, regardless of whether FTS4 or FTS5 is in use:
fara.datasettes.com/fara/FARA_All_ShortForms_fts?_search=laur%2A&_sort=rowid
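
One possible direction, sketched below, is an escaping function that quotes each term but leaves a trailing asterisk outside the quotes so FTS5 still sees it as a prefix marker. The name `escape_fts5_prefix` is hypothetical and this is not Datasette's escape_fts implementation; it only illustrates the idea and ignores details such as phrases the user has already quoted.

```python
def escape_fts5_prefix(query):
    """Quote each whitespace-separated term for FTS5, keeping a trailing
    asterisk outside the quotes so it remains a prefix wildcard.

    Hypothetical helper for illustration only.
    """
    escaped_terms = []
    for term in query.split():
        # Treat a trailing "*" as a prefix marker, unless the term is just "*".
        is_prefix = term.endswith("*") and len(term) > 1
        if is_prefix:
            term = term[:-1]
        # FTS5 strings escape embedded double quotes by doubling them.
        quoted = '"' + term.replace('"', '""') + '"'
        escaped_terms.append(quoted + ("*" if is_prefix else ""))
    return " ".join(escaped_terms)


print(escape_fts5_prefix("laur*"))
print(escape_fts5_prefix("laura smith"))
```

Whether a trailing asterisk should always be treated as a wildcard is a design question, since some users may want to search for a literal "*"; the sketch simply assumes the wildcard reading.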
