Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A discussion on needed /paths properties and metadata #1851

Draft
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

yarikoptic
Copy link
Member

Copy link
Member Author

@yarikoptic yarikoptic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jwodder please review / comment on / accept where my suggestion about state corresponds to what we already have or I suggested in #1837 (comment)

doc/design/paths-endpoint-webdav.md Outdated Show resolved Hide resolved
doc/design/paths-endpoint-webdav.md Outdated Show resolved Hide resolved
doc/design/paths-endpoint-webdav.md Outdated Show resolved Hide resolved
doc/design/paths-endpoint-webdav.md Outdated Show resolved Hide resolved
doc/design/paths-endpoint-webdav.md Outdated Show resolved Hide resolved
doc/design/paths-endpoint-webdav.md Outdated Show resolved Hide resolved
doc/design/paths-endpoint-webdav.md Outdated Show resolved Hide resolved
* Asset metadata used by dandidav:
* `encodingFormat` (blobs only)
* `contentUrl` (API download URL for blobs, S3 URL for Zarrs)
* `digest["dandi:dandi-etag"]` (blobs only)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what this one for?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Etag (reported in PROPFIND responses)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in principle could be any other checksum I guess , right?

note: I thought that besides ETag functionality a commonly known checksum could then be used by receiving end to validate the download if that would be the target purpose for such a listing.

for versioned zarrs we would need "dandi:dandi-zarr-checksum" to use in conjunction with the zarr_id.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the dandi:dandi-etag is the same as the ETag header reported by S3 when downloading a blob asset. In contrast, there's no single file that has a dandi:dandi-zarr-checksum as an ETag.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant that we can use dandi:dandi-zarr-checksum as a value for ETag on zarr collections (folders).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if it makes sense to apply etags to collections, as GETing a collection won't give you a response with that etag. (Cf. dandi/dandidav#26)


* Asset properties used by dandidav:
* `asset_id` (PRESENT)
* `blob_id`
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what for do we need blob_id if we have contentUrl?

Copy link
Member

@jwodder jwodder Feb 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The contentUrl returned in .../assets/paths/ responses is a plain S3 URL, and serving it to the user in a web browser would result in the file being downloaded and named with the blob ID. In order for the downloaded file to be named with the asset's filename instead, we need a signed S3 URL, which is obtained via the API download URL in the contentUrl field of the metadata.

EDIT: Sorry, I misinterpreted the question. For blob ID, the code just checks whether blob_id or zarr_id is non-None in order to determine whether the asset is a blob or a Zarr.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, so we the actually need result_type (Folder, AssetBlob, AssetZarr) as I suggested in #1837 (comment) .

* Asset properties used by dandidav:
* `asset_id` (PRESENT)
* `blob_id`
* `zarr_id`
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess zarr_id is needed for traversal etc? or could contentUrl version be used? (probably we do not want to rely on parsing it).

So it feels that we might ask for asset record to include blob_id (although not sure what for yet exactly) and zarr_id, hence

Suggested change
* `zarr_id`
* `zarr_id` (DESIRED)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dandidav does parse S3 contentUrls, as that's necessary in order to discover the name of the S3 bucket without hardcoding it or requiring it to be passed on the command line.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so in principle we can just parse out bucket, zarr_id and blob_id from contenturl, correct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but I'd rather dandidav not have to make too many assumptions about how our S3 URLs are structured.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree. We would only need to be able to construct zarr "http s3" URLs from manifests so it would need to know that , right?
For blobs -- contentUrl should be good enough.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that asset metadata lists two contentUrls, an unsigned S3 URL and an API download URL of the form https://api.dandiarchive.org/api/assets/{asset_id}/download/.

  • For Zarr assets, dandidav needs the S3 URL in order to know how to query the Zarr's entries from S3.

  • However, for blob assets, dandidav currently uses the API download URL, as it provides a better UX. Specifically, if dandidav were to redirect GET requests for a blob to the blob's unsigned S3 URL, then the browser would save the response to a file named after the last component of the S3 URL, which is (always?) the blob ID. In order to get the downloaded files to be named after the filename portion of the asset's path instead, a signed S3 URL with a Content-Disposition has to be used, and that is acquired by redirecting to the API download URL, which in turn redirects to such a signed S3 URL.

Co-authored-by: John T. Wodder II <[email protected]>
@waxlamp waxlamp added the design-doc Involves creating or discussing a design document label Feb 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design-doc Involves creating or discussing a design document
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants