Partial Zarr Directory Updates in Dandi and LINC #1474

Open
aaronkanzer opened this issue Jul 18, 2024 · 3 comments

@aaronkanzer
Member

aaronkanzer commented Jul 18, 2024

Cc @dstansby @kabilar @satra @yarikoptic @waxlamp @balbasty

In the LINC project, @dstansby encountered a scenario where an update was requested for a portion of a Zarr directory.
Currently, DANDI and LINC treat a Zarr directory as a single object tree, requiring the entire directory to be downloaded even for updates that only modify specific pieces.

Downloading the entire Zarr directory can be inefficient, especially for large datasets where only a small portion needs updating.
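To illustrate how small the "portion that needs updating" can be, here is a minimal sketch that picks out only the metadata files from a listing of keys inside one Zarr directory. The metadata file names come from the Zarr v2 and v3 specifications; the `metadata_keys` helper and the example listing are hypothetical and are not part of the DANDI client.

```python
from pathlib import PurePosixPath

# Metadata file names defined by the Zarr v2 (.zarray, .zgroup, .zattrs,
# .zmetadata) and Zarr v3 (zarr.json) specifications.
ZARR_METADATA_NAMES = {".zarray", ".zgroup", ".zattrs", ".zmetadata", "zarr.json"}


def metadata_keys(keys):
    """Return only the keys whose final path component is a Zarr metadata file."""
    return [k for k in keys if PurePosixPath(k).name in ZARR_METADATA_NAMES]


# Hypothetical listing of keys inside one Zarr directory (illustration only).
example_keys = [
    ".zgroup",
    ".zattrs",
    "0/.zarray",
    "0/0/0/0",
    "0/0/0/1",
    "1/.zarray",
    "1/0/0/0",
]

print(metadata_keys(example_keys))
# ['.zgroup', '.zattrs', '0/.zarray', '1/.zarray']
```

In a real Zarr the chunk keys vastly outnumber the metadata keys, which is why a metadata-only edit should not need the whole tree.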

The purpose of this issue is to capture the need for a mechanism that allows partial updates of Zarr directories within DANDI and LINC.

Relatedly, @satra suggested initially using zarrita to explore elements of sharding, perhaps with the LINC project as a place to test.

@dstansby

I think there are two separate, but related issues here (and solving 2. depends on solving 1. first):

  1. Updating a single file within a dandiset. It would be very useful to document the workflow for doing this. So far the best I have come up with is:
  • Download the dandiset: dandi download --download dandiset.yaml <dandiset-url>
  • Download the file I want to change
  • Manually re-create the directory structure that already exists in the remote dandiset (this is very tedious and feels like it should be automatic!)
  • Put the file in the right place in the directory structure.
  • Make changes
  • Upload the file
  2. The same as above, but for editing the metadata of a Zarr directory. As @aaronkanzer says, it is treated as a single object tree, so there is no obvious way to download only the metadata file and then re-upload it.
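The "manually re-create the directory structure" step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `place_asset` is a hypothetical helper, not a `dandi` command or API, and it assumes you already know the asset's path inside the dandiset (for example `sub-RAT123/sub-RAT123.nwb`).

```python
import shutil
from pathlib import Path, PurePosixPath


def place_asset(dandiset_root, asset_path, downloaded_file):
    """Copy downloaded_file to dandiset_root/asset_path, creating parent dirs.

    Hypothetical helper: asset_path is the asset's POSIX-style path inside the
    remote dandiset, so the local tree mirrors the remote one.
    """
    rel = PurePosixPath(asset_path)
    # Refuse paths that would escape the dandiset directory.
    if rel.is_absolute() or ".." in rel.parts:
        raise ValueError(f"unsafe asset path: {asset_path}")
    target = Path(dandiset_root).joinpath(*rel.parts)
    # Recreate the remote directory hierarchy locally.
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(downloaded_file, target)
    return target
```

For example, `place_asset("000027", "sub-RAT123/sub-RAT123.nwb", "sub-RAT123.nwb")` would put the downloaded file where the upload step expects to find it.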

@yarikoptic
Member

  1. Non-Zarr case: this is possible, just inconvenient because of the "Manually re-create the directory structure" step. I have created a dedicated issue to boil down and implement the desired convenience.

NB: while trying different URI schemes, I found a "workaround side effect": if the path is given as a glob (which might not be generally applicable or desired), the download also recreates the leading path:

❯ dandi download https://dandiarchive.org/dandisets/000027/versions/0.210831.2033/assets/\?glob\=sub-RAT123/sub-RAT123.nwb
PATH                      SIZE     DONE    DONE% CHECKSUM STATUS          MESSAGE
sub-RAT123/sub-RAT123.nwb 18.8 kB  18.8 kB  100%    ok    done                   
Summary:                  18.8 kB  18.8 kB                1 done                 
                                   100.00%      
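The glob-style URL used in the command above can also be assembled programmatically. The sketch below only reproduces the URL shape shown in that command; `glob_asset_url` is a hypothetical helper, and whether other glob patterns behave the same way is not confirmed in this thread.

```python
from urllib.parse import urlencode


def glob_asset_url(dandiset_id, version, glob):
    """Build an assets URL with a ?glob= query, mirroring the command above."""
    base = f"https://dandiarchive.org/dandisets/{dandiset_id}/versions/{version}/assets/"
    # Keep "/" unescaped so the asset path stays readable in the URL.
    return base + "?" + urlencode({"glob": glob}, safe="/")


print(glob_asset_url("000027", "0.210831.2033", "sub-RAT123/sub-RAT123.nwb"))
# https://dandiarchive.org/dandisets/000027/versions/0.210831.2033/assets/?glob=sub-RAT123/sub-RAT123.nwb
```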
  • 1.a. There is also a datalad way, which is to a degree the most convenient, since it gives you the full dandiset file tree hierarchy locally and thus more flexibility.
❯ datalad clone https://github.com/dandisets/000027
[INFO   ] Remote origin not usable by git-annex; setting annex-ignore                                                                                                               
[INFO   ] https://github.com/dandisets/000027/config download failed: Not Found                                                                                                     
[INFO   ] access to 2 dataset siblings dandi-dandisets-dropbox, dandiapi not auto-enabled, enable with:
| 		datalad siblings -d "/tmp/000027" enable -s SIBLING 
install(ok): /tmp/000027 (dataset)
❯ cd 000027
❯ datalad get sub-RAT123/sub-RAT123.nwb
get(ok): sub-RAT123/sub-RAT123.nwb (file) [from web...]                                                                                                                             
❯ ls -lL sub-RAT123/sub-RAT123.nwb
-r--r--r-- 1 yoh yoh 18792 Jul 18 07:51 sub-RAT123/sub-RAT123.nwb
# now edit / dandi upload
  2. Zarr case: in general this is possible, but very inconvenient, as it would require downloading the full Zarr first.

For an "ultimate" solution, we need to add some basic zarr navigator, related

to make it easier for a user to get desired "full" URL to specific zarr component.

As for updating only the metadata: it would be quite tricky, AFAIK, to implement correctly, but editing metadata is indeed a valid use case. ATM it is "possible" only via a full Zarr download, and I believe we would avoid re-uploading any file that was not modified (@jwodder might correct me if I am wrong).
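The idea of skipping unmodified files can be sketched as a checksum comparison. This is not the client's actual implementation: it assumes a per-file MD5 digest is available for each remote entry, and `remote_checksums` is a hypothetical mapping from the key inside the Zarr to that digest.

```python
import hashlib
import os
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_needing_upload(local_zarr, remote_checksums):
    """List keys under local_zarr that are new or differ from the remote copy.

    remote_checksums is a hypothetical {key: md5 hex digest} mapping
    (an assumption for this sketch, not a documented DANDI structure).
    """
    root = Path(local_zarr)
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            key = path.relative_to(root).as_posix()
            if remote_checksums.get(key) != md5_of(path):
                changed.append(key)
    return sorted(changed)
```

After editing only `.zattrs` in a local copy, such a comparison would return just `[".zattrs"]`, so only that file would need to be uploaded again.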
As for partial download and upload of zarr -- I think we would also need support for that in the client:

@kabilar
Member

kabilar commented Jul 31, 2024

Thanks team. Moving this issue to the DANDI Client repo, as it doesn't seem like we would need changes to the web app or REST API.

@kabilar kabilar transferred this issue from dandi/dandi-archive Jul 31, 2024
@yarikoptic yarikoptic added the zarr label Aug 9, 2024