Large projects might cause "Argument list too long" error on pre-render scripts #10828

cscheid · 2024-09-17T23:00:42Z

Discussed in #10823

Originally posted by Analect September 17, 2024

Description

I'm hitting this problem with a private gitlab-hosted repo containing circa 1500 documents that get rendered with quarto. I'm not able to share this set-up; however, the approach I'm taking is similar to what is shown publicly here, where I scrape some document meta-data and push it to files in a data folder, which are then published as resources in the rendered docs. I'm experimenting with alternative ways to work with the document meta-data and wanted to leverage the pre-render capability within quarto.

project:
  type: website
  resources:
    - "data/**/*"
    - "package/**/*"
    - "coi-serviceworker.min.js"
  pre-render:
    - scripts/metadata-scrape.py
    - scripts/load_data_kuzu.py
  render:
    - "*.qmd"
    ...

The gitlab-runner doing the rendering is based on this docker image, a Debian 12 OS with quarto 1.5.57 on board. If I comment out the pre-render scripts, things run fine. However, when I enable scripts/metadata-scrape.py on the gitlab repo (similar in pattern to this scripts/metadata-scrape.py, only longer to handle custom meta-data), I get this Argument list too long error. Can you shed any light on why this might be happening when quarto handles pre-render scripts?

Per this, I tried to raise the command-line buffer limit with ulimit -s 65536, both on the VM running the dockerized gitlab-runner and in .gitlab-ci.yml so that it is applied within the runner itself (see image above), but neither has helped.

cscheid · 2024-09-17T23:25:39Z

The only way I can see this happening is that we're passing the list of input files as the QUARTO_PROJECT_INPUT_FILES env variable, and that is triggering the error (even though the error talks about the argument list being too long instead of an env variable being too long).
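
A minimal sketch of this failure mode, assuming a Linux runner: when a process is launched, the kernel reports an oversized environment with the same E2BIG error ("Argument list too long") that it uses for oversized arguments. On Linux, a single argument or environment string is also capped at 32 pages (128 KiB with 4 KiB pages), a limit that does not grow with the stack size set by ulimit -s. The variable name BIG_VALUE and the helper try_launch are hypothetical and only for illustration; this is not Quarto's code.

```python
import errno
import os
import subprocess
import sys


def try_launch(env_value_bytes):
    """Start a trivial child process with one environment variable of the
    given size. Return None on success, or the errno if the launch fails."""
    # BIG_VALUE is a hypothetical stand-in for a very long list of file paths.
    env = dict(os.environ, BIG_VALUE="x" * env_value_bytes)
    try:
        subprocess.run([sys.executable, "-c", "pass"], env=env, check=True)
    except OSError as exc:
        return exc.errno
    return None


if __name__ == "__main__":
    for size in (1024, 4 * 1024 * 1024):
        result = try_launch(size)
        if result is None:
            print(f"{size} bytes: launched fine")
        else:
            name = errno.errorcode.get(result, str(result))
            print(f"{size} bytes: failed with {name}: {os.strerror(result)}")
```

If the second launch fails with E2BIG on the runner, that is consistent with the environment variable, rather than the command line, being what trips the limit.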

I'm not sure how to fix this in a backwards-compatible way. What we need to do for large projects is to pass the path of a temporary file that contains the list of input files; but if we do that, we'll break the many existing pre-render scripts that work just fine.
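
One possible shape for such a scheme, sketched from the pre-render script's side: prefer a file path when one is provided and otherwise fall back to the existing variable, so that current scripts keep working. This assumes QUARTO_PROJECT_INPUT_FILES holds a newline-separated list of paths. The variable HYPOTHETICAL_INPUT_FILES_PATH is an invented name used only for illustration, not an existing Quarto feature.

```python
import os
from pathlib import Path


def read_input_files(environ=os.environ):
    """Return the list of project input files visible to a pre-render script."""
    # HYPOTHETICAL_INPUT_FILES_PATH is an invented variable name: it stands
    # for "path of a temporary file that contains the list of input files".
    list_path = environ.get("HYPOTHETICAL_INPUT_FILES_PATH")
    if list_path:
        text = Path(list_path).read_text(encoding="utf-8")
    else:
        # Existing behaviour: the list itself is in the environment variable
        # (assumed here to be newline-separated).
        text = environ.get("QUARTO_PROJECT_INPUT_FILES", "")
    return [line for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    files = read_input_files()
    print(f"{len(files)} input files")
```

A script written this way would behave the same as today when only the existing variable is set, which is the backwards-compatibility concern raised above.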

Analect · 2024-09-18T12:35:04Z

Found these on GitLab. Perhaps they're relevant.

> I'm not sure how to fix this in a backwards-compatible way.

One suggested fix in the second link above is to use a file-type variable. Maybe providing both a variable-type variable and a file-type variable would allow users facing the Argument list too long problem to fall back to the file-type variable, somehow.

Use a .gitlab-ci.yml variable as a file type variable

Analect · 2024-09-18T13:12:24Z

Also, for the avoidance of doubt: if I disable the pre-render scripts section of _quarto.yml, the render proceeds for now, as shown below (file names and folders are fictitious). I do feel like I'm at the upper end of the file count that can be handled, though, since I sometimes bump up against this problem, which I know you are addressing separately.

$ quarto render --output-dir public
WARN: The file /xxx/xxx/xxx.qmd contains a theme property which is being ignored. Website projects do not support per document themes since all pages within a website share the website's theme.
WARN: The file /xxx/xxx/xxx.qmd contains a theme property which is being ignored. Website projects do not support per document themes since all pages within a website share the website's theme.
....
[   1/1516] docs/folder1/folder2/folder3/folder4/01_file.ipynb
[   2/1516] docs/folder1/folder2/folder3/folder4/02_file.ipynb
[   3/1516] docs/folder1/folder2/folder3/folder4/03_file.ipynb
[   4/1516] docs/folder1/folder2/folder3/folder4/04_file.ipynb
...

I'm not sure what form QUARTO_PROJECT_INPUT_FILES takes or how it is influenced by whether pre-render scripts are enabled. Just for further colour, one of my scripts generates an array of file paths to process meta-data for documents, which looks something like what's depicted below. I'm not sure whether QUARTO_PROJECT_INPUT_FILES contains a richer set of data beyond file paths.

[docs/folder1/folder2/folder3/folder4/01_file.ipynb, docs/folder1/folder2/folder3/folder4/02_file.ipynb, docs/folder1/folder2/folder3/folder4/03_file.ipynb, docs/folder1/folder2/folder3/folder4/04_file.ipynb ..., docs/folder1/folder2/folder3/folder4/1516_file.ipynb]

In my case, sys.getsizeof reports 12728 bytes for that array (see below), which has a length of 1564.

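import sys
print(sys.getsizeof(file_paths))  # bytes used by the list object itself, not the strings it references
print(len(file_paths))  # number of file paths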
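
As a point of comparison, sys.getsizeof on a list counts only the list object and its array of references, not the strings it points to, so it understates how large the same paths would be once joined into a single environment variable. A minimal sketch, assuming the paths are joined with newlines and encoded as UTF-8; the helper name joined_size_bytes and the generated sample paths are illustrative only:

```python
import sys


def joined_size_bytes(paths, separator="\n"):
    """Size in bytes of the paths joined into a single UTF-8 string."""
    return len(separator.join(paths).encode("utf-8"))


if __name__ == "__main__":
    # Fictitious paths in the style of the listing above.
    file_paths = [
        f"docs/folder1/folder2/folder3/folder4/{i:02d}_file.ipynb"
        for i in range(1, 1565)
    ]
    print(sys.getsizeof(file_paths))      # list object only
    print(len(file_paths))                # number of entries
    print(joined_size_bytes(file_paths))  # bytes once joined into one string
```

The last figure is the one to compare against any per-string or total limit on the environment; with absolute paths it will be larger still.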

cscheid added the bug and project-scripts labels Sep 17, 2024

cscheid added this to the v1.6 milestone Sep 17, 2024

cscheid self-assigned this Sep 17, 2024
