Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set dynamic memory limit in pdfalto_server #1038

Merged
merged 2 commits into from
Aug 8, 2023
Merged

Set dynamic memory limit in pdfalto_server #1038

merged 2 commits into from
Aug 8, 2023

Conversation

kermitt2
Copy link
Owner

@kermitt2 kermitt2 commented Aug 6, 2023

This is a fix for #1036

The memory limit for the pdfalto subprocess was only set for batch mode, not for server mode.
This PR pass the memory limit defined in the Grobid configuration file to the pdfalto_server script, which set the ulimit before creating the pdfalto subprocess.

Tested successfully with the java server.

todo: to be tested with Docker.

@coveralls
Copy link

coveralls commented Aug 6, 2023

Coverage Status

coverage: 39.98% (+0.003%) from 39.977% when pulling b6675db on fix-1036 into bbf1e85 on master.

@kermitt2
Copy link
Owner Author

kermitt2 commented Aug 6, 2023

Tested successfully with docker too. The memory limit is used by the docker service and a pdfalto subprocess for a large PDF is killed as expected.

@kermitt2
Copy link
Owner Author

kermitt2 commented Aug 6, 2023

@lfoppiano I made the change to the pdfalto_server script for macos too, but I can't test it. Could you check maybe?

@lfoppiano
Copy link
Collaborator

lfoppiano commented Aug 8, 2023

I tested on M1, it seems to work by setting 6M as maximum memory for pdfalto.

Aug 08 09:00:59 falcon docker[26816]: ERROR [2023-08-08 00:00:59,218] org.grobid.core.process.ProcessPdfToXml: pdfalto process finished with error code: 139. [/opt/grobid/grobid-home/pdfalto/lin-64/pdfalto_server, -fullFontName, -noLineNumbers, -noImage, -annotation, -filesLimit, 2000, /opt/grobid/grobid-home/tmp/origin12466301919498765093.pdf, /opt/grobid/grobid-home/tmp/ahxv9U7ABP.lxml, --timeout, 60, --ulimit, 6144]
Aug 08 09:00:59 falcon docker[26816]: ERROR [2023-08-08 00:00:59,218] org.grobid.core.process.ProcessPdfToXml: pdfalto return message:
Aug 08 09:00:59 falcon docker[26816]: ERROR [2023-08-08 00:00:59,219] org.grobid.service.process.GrobidRestProcessFiles: An unexpected exception occurs.
Aug 08 09:00:59 falcon docker[26816]: ! org.grobid.core.exceptions.GrobidException: [BAD_INPUT_DATA] PDF to XML conversion failed with error code: 139
[...]

Does this correspond to the expected behaviour?

@kermitt2
Copy link
Owner Author

kermitt2 commented Aug 8, 2023

Does this correspond to the expected behaviour?

Thank you ! yes this is the expected failing exception when subprocess is OOM and killed

@kermitt2 kermitt2 merged commit 4ecce3b into master Aug 8, 2023
1 check passed
@lfoppiano
Copy link
Collaborator

I just realised that I did not really test on the M1 macos but on Linux 😅
On the M1 the script was not modified so no PDF could be processed. I fixed it in 2c720dd.

However, the ulimit is commented because it fails to set the limit:

WARN  [2023-08-15 01:47:02,132] org.grobid.core.process.ProcessPdfToXml: pdfalto stderr: /Users/lfoppiano/development/projects/grobid/grobid-home/pdfalto/mac_arm-64/pdfalto_server: line 38: ulimit: virtual memory: cannot modify limit: Invalid argument

@kermitt2
Copy link
Owner Author

Ha yes I modified only the old mac ones grobid-home/pdfalto/mac-64/pdfalto_server.

Maybe changing the ulimit on latest macos requires sudo?

@lfoppiano
Copy link
Collaborator

I need to investigate more, however, for mac on intel works without issue. Just tested.

@lfoppiano lfoppiano added this to the 0.8.0 milestone Jun 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants