Add text instructions on download/access #179

Open
yarikoptic opened this issue Aug 7, 2024 · 10 comments
Labels
ui:web Relating to the web interface

Comments

@yarikoptic
Member

Prompted by @jwodder in dandi/dandi-archive#1993 (comment): it would be nice UX, similar to how https://datasets.datalad.org/ informs users about datalad install instructions, if we provided a wget invocation to download an entire Zarr, or a specific Dandiset or a folder within one. We also already have

which similarly suggests integration with external services to instruct users on how to interact with particular files or zarrs.

@jwodder
Member

jwodder commented Aug 7, 2024

@yarikoptic Problem: wget's "recursive" mode is limited to a maximum depth of 5 directories by default. Possible ways to address this are:

  • Include --level=inf in the displayed wget commands to disable the maximum depth

    • This seems like it could have the potential to go horribly wrong, especially if a user somehow omits the --no-parent option, which I think would result in wget trying to download everything listed on dandidav.
  • Determine the maximum depth of the hierarchy the message is displayed for and use that number as the --level value

    • I do not believe there is an efficient way to do this.
  • Don't include a --level option in the displayed wget commands

    • This would result in some (many?) listed wget commands not fetching everything.
  • Pick a relatively large fixed depth (10? 20?) and use that as the --level value in all displayed wget commands

    • There would doubtless be some hierarchies deeper than the chosen limit that therefore wouldn't be completely downloaded by the displayed commands.
    • Users are likely to wonder where the level value is coming from.
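
The --no-parent concern in the first option can be illustrated with a small guard. This is only a sketch: `safe_recursive_args` is a hypothetical helper (not part of wget or dandidav) that accepts a wget argument list only when unlimited recursion is paired with --no-parent, and it recognises only the spellings listed in its `case` patterns.

```shell
# Hypothetical helper: succeed (exit 0) unless the argument list asks for
# unlimited recursion depth without also passing --no-parent.
safe_recursive_args() {
    has_inf=0
    has_np=0
    for arg in "$@"; do
        case $arg in
            --level=inf|--level=0) has_inf=1 ;;   # both mean "no depth limit"
            --no-parent|-np)       has_np=1 ;;
        esac
    done
    # Fine if depth is limited, or if --no-parent is present.
    [ "$has_inf" -eq 0 ] || [ "$has_np" -eq 1 ]
}

# Example: only run wget when the guard passes.
# safe_recursive_args "$@" && wget "$@"
```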

@jwodder
Member

jwodder commented Aug 7, 2024

@yarikoptic Further problems:

  • Because the actual files are on different domains, wget downloads them under different directory hierarchies, and there doesn't seem to be an option to place them "together".

  • When downloading a Dandiset version or folder therein, asset metadata also gets downloaded (because there's a link to such metadata in the web view), and the --reject "index.html*" option needed to not save directory listings also results in the metadata being deleted after it's downloaded, leaving behind a tree of empty directories. There may be a way to prevent this with the --exclude-directories option, but I can't get it to work for this.

At the moment, my best wget command is:

wget \
    --recursive \
    --span-hosts \
    --domains=webdav.dandiarchive.org,api.dandiarchive.org \
    --no-parent \
    --content-disposition \
    --reject "index.html*" \
    https://webdav.dandiarchive.org/dandisets/000027/releases/0.210831.2033/

which downloads:

./
├── api.dandiarchive.org/
│   └── api/
│       └── dandisets/
│           └── 000027/
│               └── versions/
│                   └── 0.210831.2033/
│                       └── assets/
│                           └── 1c095f5f-d1e2-45db-b807-fdcfea08c6de/
├── dandiarchive.s3.amazonaws.com/
│   └── blobs/
│       └── 2db/
│           └── af0/
│               └── sub-RAT123.nwb
└── webdav.dandiarchive.org/
    └── dandisets/
        └── 000027/
            └── releases/
                └── 0.210831.2033/
                    ├── dandiset.yaml
                    └── sub-RAT123/
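
The empty directories left behind could be pruned after the download finishes. A minimal sketch, assuming a find that supports the `-empty` and `-delete` extensions (GNU and BSD find do; they are not in POSIX). `prune_empty_dirs` is a name made up for this example. Note that it would also remove webdav.dandiarchive.org/.../sub-RAT123/ in the tree above, since the file itself was saved under the S3 host directory.

```shell
# Remove every empty directory below the given root. -delete implies
# depth-first traversal, so a chain of nested empty directories is
# removed from the bottom up. The root itself is kept (-mindepth 1).
prune_empty_dirs() {
    find "$1" -mindepth 1 -type d -empty -delete
}

# Usage, from the directory where wget was run:
# prune_empty_dirs .
```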

@yarikoptic
Member Author

> @yarikoptic Problem: wget's "recursive" mode is limited to a maximum depth of 5 directories by default.

I had no idea! I think we are doomed to add/use --level=inf since we never really cared about recording/reflecting the depth of the zarr anywhere*. Indeed, --no-parent would be mandatory and thus should sit "near" it on the command line. We could also add --quota with e.g. 101% of the zarr size, but I'm not sure it is a good idea or that it adds any real protection.

* In hindsight, I might have suggested including it in the checksum, but that would likely be "too much". Do you think it would be useful to discuss this aspect?

Actually -- we are in control of manifest generation, we can extract/include that info in the manifest!
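
A rough sketch of how both numbers could be derived while generating a manifest. The input format here (one "SIZE_IN_BYTES<TAB>RELATIVE/PATH" line per entry) and the name `wget_limits` are assumptions for illustration only, not the real manifest layout. It also assumes that the number of path components of the deepest entry is the --level value wget would need.

```shell
# Read hypothetical "SIZE<TAB>PATH" lines on stdin and print a --level
# value (deepest path, counted in components) and a --quota value
# (101% of the summed sizes, in bytes, rounded up).
wget_limits() {
    awk -F'\t' '
        {
            total += $1                     # running sum of entry sizes
            n = split($2, parts, "/")       # number of path components
            if (n > depth) depth = n
        }
        END {
            quota = int((total * 101 + 99) / 100)
            printf "--level=%d --quota=%.0f\n", depth, quota
        }'
}

printf '100\t.zattrs\n900\t0/0/0\n1000\ta/b/c/d/e/f\n' | wget_limits
# prints: --level=6 --quota=2020
```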

@jwodder
Member

jwodder commented Aug 7, 2024

@yarikoptic

> we are in control of manifest generation, we can extract/include that info in the manifest!

I got the impression you wanted this for Dandisets and folders within them as well, not just Zarrs.

@yarikoptic
Member Author

> @yarikoptic
>
> > we are in control of manifest generation, we can extract/include that info in the manifest!
>
> I got the impression you wanted this for Dandisets and folders within them as well, not just Zarrs.

Right, I did indeed... For those we are doomed to just hope that --no-parent works out and wget does not crawl away from the original hierarchy.

@jwodder
Member

jwodder commented Aug 8, 2024

@yarikoptic I did manage to figure out an rclone command to download a folder nicely:

rclone copy \
    --webdav-url https://webdav.dandiarchive.org \
    :webdav:dandisets/000027/releases/0.210831.2033/ \
    0.210831.2033/

Should we use this instead of wget? Are there any other download commands we should list or consider listing in addition or instead?

@jwodder
Member

jwodder commented Aug 12, 2024

@yarikoptic Ping.

@yarikoptic
Member Author

Depending on how we present it -- maybe we want both? E.g. if it could be multiple tabs (wget, rclone, dandi cli, and maybe even python etc.) -- then people could choose what they have/like. I didn't look into whether there is a simple HTML/CSS/JS way to make that happen, though. WDYT?

@jwodder
Member

jwodder commented Aug 22, 2024

@yarikoptic Worrying about how the data is presented is getting ahead of ourselves and ultimately not that important. I'm currently interested in what data should be presented.

@yarikoptic
Copy link
Member Author

Then let's present both -- the ugly wget and the neater WebDAV-aware rclone.
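
As a sketch of what "both" might look like for a given path: the function below only prints the two commands and runs nothing. `show_download_cmds` is a hypothetical name, and adding --level=inf to the wget command from earlier in the thread is an assumption based on the discussion above, not an agreed final form.

```shell
# Print (do not execute) a wget and an rclone command for a path under
# https://webdav.dandiarchive.org/.
show_download_cmds() {
    path=${1#/}                 # drop a leading slash, if any
    path=${path%/}/             # make sure the path ends with a slash
    dest=$(basename "$path")    # local directory name for rclone
    cat <<EOF
wget \\
    --recursive \\
    --level=inf \\
    --no-parent \\
    --span-hosts \\
    --domains=webdav.dandiarchive.org,api.dandiarchive.org \\
    --content-disposition \\
    --reject "index.html*" \\
    https://webdav.dandiarchive.org/$path

rclone copy \\
    --webdav-url https://webdav.dandiarchive.org \\
    :webdav:$path \\
    $dest/
EOF
}

show_download_cmds dandisets/000027/releases/0.210831.2033/
```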
