Implement "Match Case" and "Whole Words" search #110

Open
xybei opened this issue Aug 28, 2024 · 2 comments

Comments

@xybei
Contributor

xybei commented Aug 28, 2024

The search function cannot match words that wrap across lines (as pdf.js does in the figure below).
It also does not support "Match Case" or "Whole Words".
I hope it can be improved, thanks!

test.pdf

image

@jamie-lemon
Collaborator

Agreed, there is no case sensitivity for the search at present. However, I do believe MuPDF.js detects line-wrapped words. If I use your file with:
let results = page.search("Hello world")

It delivers 3 results as follows:

[
    [
        [
            72,
            75.22499084472656,
            149.67677307128906,
            75.22499084472656,
            72,
            91.24498748779297,
            149.67677307128906,
            91.24498748779297
        ]
    ],
    [
        [
            287.1199951171875,
            75.22499084472656,
            360.5367431640625,
            75.22499084472656,
            287.1199951171875,
            91.24498748779297,
            360.5367431640625,
            91.24498748779297
        ]
    ],
    [
        [
            505.17999267578125,
            75.22499084472656,
            540.9913940429688,
            75.22499084472656,
            505.17999267578125,
            91.24498748779297,
            540.9913940429688,
            91.24498748779297,
            72,
            96.28498840332031,
            109.5767593383789,
            96.28498840332031,
            72,
            112.30498504638672,
            109.5767593383789,
            112.30498504638672
        ]
    ]
]

These are QuadPoints, which represent the areas where the text was found (see: https://mupdfjs.readthedocs.io/en/latest/how-to-guide/node/document/index.html#searching-a-document).
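
To make the output above easier to read, here is a minimal sketch that turns one search hit into axis-aligned rectangles. It assumes each group of 8 numbers is one quad listed as upper-left, upper-right, lower-left, lower-right (which is how the values above appear to be laid out), and that a hit can carry several quads back to back, as the third result does for the line-wrapped match. `hitToRects` is a hypothetical helper written for illustration, not part of MuPDF.js.

```javascript
// Hypothetical helper: convert one search hit into bounding rectangles.
// Assumption: coordinates come in runs of 8 numbers per quad
// (ul.x, ul.y, ur.x, ur.y, ll.x, ll.y, lr.x, lr.y).
function hitToRects(hit) {
    const rects = [];
    for (const coords of hit) {
        // Walk the flat array 8 numbers at a time; ignore a trailing partial run.
        for (let i = 0; i + 8 <= coords.length; i += 8) {
            const q = coords.slice(i, i + 8);
            const xs = [q[0], q[2], q[4], q[6]];
            const ys = [q[1], q[3], q[5], q[7]];
            rects.push({
                x0: Math.min(...xs),
                y0: Math.min(...ys),
                x1: Math.max(...xs),
                y1: Math.max(...ys),
            });
        }
    }
    return rects;
}

// The third result from the output above: one match split over two lines.
const thirdHit = [[
    505.17999267578125, 75.22499084472656, 540.9913940429688, 75.22499084472656,
    505.17999267578125, 91.24498748779297, 540.9913940429688, 91.24498748779297,
    72, 96.28498840332031, 109.5767593383789, 96.28498840332031,
    72, 112.30498504638672, 109.5767593383789, 112.30498504638672,
]];

const rects = hitToRects(thirdHit);
console.log(rects.length); // 2: one rectangle per line of the wrapped match
```

The first rectangle sits at the end of the first text line and the second at the start of the next line, which is what a line-wrapped match should look like.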

I also tested at https://casper.mupdf.com/wasm/demo/: I uploaded your test.pdf file and searched for "hello world", and the UI highlighted these areas on the document:
Screenshot 2024-08-28 at 13 09 34

So I think the search method does find all three instances that you expect, with the correct bounding box data in the QuadPoints!

@xybei
Contributor Author

xybei commented Aug 29, 2024

Sorry, I made a mistake. MuPDF.js does support searching for line-wrapped words.
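
For reference, the remaining request ("Match Case" and "Whole Words") could behave roughly like the sketch below. This is only an illustration on a plain string that is assumed to have already been extracted from a page; it does not use or describe any MuPDF.js API, and `findMatches` and its option names are hypothetical. Whitespace in the search term is allowed to match a line break so that line-wrapped words still count.

```javascript
// Hypothetical sketch: search a plain-text string with optional
// "Match Case" and "Whole Words" behaviour. Not a MuPDF.js API.
function findMatches(text, needle, { matchCase = false, wholeWords = false } = {}) {
    // Escape regex metacharacters so the needle is matched literally.
    const escaped = needle.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    // Let any run of whitespace (including a newline) stand in for a space.
    const flexible = escaped.replace(/\s+/g, "\\s+");
    // "Whole Words": the match must not touch a letter, digit, or underscore.
    const pattern = wholeWords
        ? `(?<![\\p{L}\\p{N}_])${flexible}(?![\\p{L}\\p{N}_])`
        : flexible;
    const regex = new RegExp(pattern, matchCase ? "gu" : "giu");
    return Array.from(text.matchAll(regex), (m) => ({ index: m.index, text: m[0] }));
}

const sample = "Hello world. hello\nworld! Helloworlds HELLO WORLDS";

console.log(findMatches(sample, "Hello world").length);                       // 3
console.log(findMatches(sample, "Hello world", { matchCase: true }).length);  // 1
console.log(findMatches(sample, "Hello world", { wholeWords: true }).length); // 2
```

Mapping the resulting character offsets back to QuadPoints on the page is a separate step that this sketch does not attempt.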

@xybei xybei changed the title Improved search function Improve search function Aug 30, 2024
@xybei xybei changed the title Improve search function Implement "Match Case" and "Whole Words" search Aug 30, 2024