Add range-based partial fetch for h5wasm provider #1264

Open
bmaranville opened this issue Nov 7, 2022 · 4 comments
Labels
enhancement New feature or request

Comments

@bmaranville
Contributor

Is your feature request related to a problem?

Reading very large files with the h5wasm provider is not possible, for several reasons:

  1. The maximum size of an ArrayBuffer, and of a "file" in Emscripten, is often limited to about 2 GB.
  2. The memory available in the browser limits how large an in-memory file representation can be.
  3. Downloading entire huge files places unreasonable demands on the network and infrastructure.

Requested solution or feature

For web servers that host HDF5/NeXus files and support HTTP range requests, on-demand loading using Emscripten's lazyFile functionality could enable access to very large NeXus files that would be infeasible to read as a whole.
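
A minimal sketch of the chunk arithmetic behind this kind of on-demand loading, assuming fixed-size chunks aligned to multiples of `chunkSize`. The names `chunkRangesFor` and `rangeHeader` are hypothetical helpers for illustration, not part of h5wasm or Emscripten; only the `bytes=start-end` header format (inclusive offsets) comes from HTTP itself.

```typescript
// Inclusive byte range, matching the semantics of the HTTP Range header.
interface ByteRange {
  start: number;
  end: number;
}

// Hypothetical helper: list the aligned chunks that cover a read of
// `length` bytes starting at `offset` in a file of `fileSize` bytes.
function chunkRangesFor(
  offset: number,
  length: number,
  chunkSize: number,
  fileSize: number
): ByteRange[] {
  if (chunkSize <= 0 || length <= 0 || offset < 0 || offset >= fileSize) {
    return [];
  }
  // Clamp the read to the end of the file.
  const lastByte = Math.min(offset + length, fileSize) - 1;
  const firstChunk = Math.floor(offset / chunkSize);
  const lastChunk = Math.floor(lastByte / chunkSize);
  const ranges: ByteRange[] = [];
  for (let i = firstChunk; i <= lastChunk; i++) {
    ranges.push({
      start: i * chunkSize,
      // The final chunk of the file may be shorter than chunkSize.
      end: Math.min((i + 1) * chunkSize, fileSize) - 1,
    });
  }
  return ranges;
}

// Format a range as the value of an HTTP `Range` request header.
function rangeHeader(range: ByteRange): string {
  return `bytes=${range.start}-${range.end}`;
}
```

Fetching whole aligned chunks rather than the exact bytes requested is a design choice that makes the results easy to cache, at the cost of some over-fetching on small reads.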

Alternatives you've considered

HSDS and grove providers already allow this type of random access to parts of a NeXus file.

Additional context

Because Emscripten's lazy files rely on synchronous file access, this might require refactoring the h5wasm provider to operate from a worker.
Note that it could potentially be refactored to a service worker that uses the same API as a grove server, if that simplifies things.
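
To illustrate what moving the provider into a worker could look like, here is a rough sketch of a worker-side request handler. Everything in it is hypothetical: the message shapes, the `SyncReader` interface and `handleRequest` are invented for this example and do not reflect the actual h5wasm, H5Web or grove APIs. The point is only that the worker can perform blocking reads and reply with plain messages.

```typescript
// Hypothetical message shapes exchanged between the main thread and the worker.
type WorkerRequest =
  | { id: number; kind: "getAttrs"; path: string }
  | { id: number; kind: "getValue"; path: string };

type WorkerResponse =
  | { id: number; ok: true; result: unknown }
  | { id: number; ok: false; error: string };

// Hypothetical stand-in for whatever synchronous file access the worker has.
interface SyncReader {
  getAttrs(path: string): Record<string, unknown>;
  getValue(path: string): unknown;
}

// Run one request synchronously and turn the outcome into a reply message.
function handleRequest(reader: SyncReader, req: WorkerRequest): WorkerResponse {
  try {
    const result =
      req.kind === "getAttrs"
        ? reader.getAttrs(req.path)
        : reader.getValue(req.path);
    return { id: req.id, ok: true, result };
  } catch (err) {
    // Errors are serialized so that they can cross the worker boundary.
    const message = err instanceof Error ? err.message : String(err);
    return { id: req.id, ok: false, error: message };
  }
}

// Inside a real worker this would be wired up roughly as:
//   self.onmessage = (ev) => self.postMessage(handleRequest(reader, ev.data));
```

The `id` field lets the main thread match replies to pending promises, which keeps the provider's public API asynchronous even though the reads inside the worker block.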

@bmaranville
Contributor Author

Note that for local files, the emscripten WORKERFS interface could be used to get random access to huge local files from a worker without copying the whole file into memory, which is another benefit of moving the provider to a worker.
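
A minimal sketch of the WORKERFS approach, assuming it runs inside a worker and that the Emscripten module was built with WORKERFS support. `FsLike` is a structural stand-in for the small part of Emscripten's `FS` object used here, and `mountLocalFile` and the `/work` mount point are hypothetical names chosen for illustration.

```typescript
// Structural stand-ins (assumptions): the subset of Emscripten's FS used
// below, and the only property needed from a browser File object.
interface NamedBlob {
  name: string;
}

interface FsLike {
  mkdir(path: string): unknown;
  mount(type: unknown, opts: { files: NamedBlob[] }, mountpoint: string): unknown;
  filesystems: { WORKERFS: unknown };
}

// Mount a user-selected file read-only through WORKERFS and return the
// virtual path at which it becomes visible. The file's bytes are read on
// demand rather than copied into memory up front.
function mountLocalFile(
  FS: FsLike,
  file: NamedBlob,
  mountpoint: string = "/work"
): string {
  FS.mkdir(mountpoint);
  FS.mount(FS.filesystems.WORKERFS, { files: [file] }, mountpoint);
  return `${mountpoint}/${file.name}`;
}
```

The returned path could then be opened in read mode by the HDF5 library running in the same worker.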

@axelboc
Contributor

axelboc commented Apr 21, 2023

> Note that it could potentially be refactored to a service worker that uses the same API as a grove server, if that simplifies things.

This would be brilliant! However, it seems that synchronous XHR requests inside Service Workers are currently not supported in Chrome and Safari, only in Firefox.

@imathews

Does the recent work on the H5wasmLocalFileProvider #1604 perhaps provide a pathway for something similar to be implemented for remote URLs using HTTP Range request headers?

This would be of huge benefit to our use case, where multi-gigabyte files are stored remotely and loading the entire file is prohibitive in terms of both memory and network usage.

@axelboc
Contributor

axelboc commented Jun 28, 2024

It's definitely going to help. @bmaranville also developed a lazyFileLRU demo to show feasibility. However, the amount of code required and its complexity has me worried a bit; it's not going to be trivial to make a production service out of this. I need to look into it more to better understand what's going on.
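
For readers unfamiliar with the idea, the core of an LRU chunk cache can be sketched as below. This is not the code of the lazyFileLRU demo mentioned above; `ChunkLRU` and its `loadChunk` callback are hypothetical, and the sketch leaves out everything that makes a production version hard (the synchronous range requests, error handling, integration with the Emscripten file system).

```typescript
// Least-recently-used cache of file chunks, keyed by chunk index.
// `loadChunk` is a hypothetical synchronous loader, e.g. one range request.
class ChunkLRU {
  private chunks = new Map<number, Uint8Array>();
  private capacity: number;
  private loadChunk: (index: number) => Uint8Array;

  constructor(capacity: number, loadChunk: (index: number) => Uint8Array) {
    if (capacity < 1) {
      throw new RangeError("capacity must be at least 1");
    }
    this.capacity = capacity;
    this.loadChunk = loadChunk;
  }

  has(index: number): boolean {
    return this.chunks.has(index);
  }

  get(index: number): Uint8Array {
    const hit = this.chunks.get(index);
    if (hit !== undefined) {
      // Map preserves insertion order, so re-inserting marks the chunk
      // as the most recently used one.
      this.chunks.delete(index);
      this.chunks.set(index, hit);
      return hit;
    }
    const data = this.loadChunk(index);
    this.chunks.set(index, data);
    if (this.chunks.size > this.capacity) {
      // The first key in iteration order is the least recently used.
      const oldest = this.chunks.keys().next().value;
      if (oldest !== undefined) {
        this.chunks.delete(oldest);
      }
    }
    return data;
  }
}
```

Bounding the number of cached chunks keeps memory use fixed regardless of file size, which addresses the browser memory limit raised at the top of this issue.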
