Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change validation of files (assets) to align with validation on the server #1448

Open
yarikoptic opened this issue Jun 3, 2024 · 0 comments

Comments

@yarikoptic
Copy link
Member

yarikoptic commented Jun 3, 2024

The underlying trigger for the issue is

where assets started to fail validation, preventing publishing of dandisets, only when already on the archive. It boiled down to the fails in validation against jsonschema export of pydantic models which is done to guarantee that meditor validation could be done online (thus in JS), and the difference in treating date-time format by jsonschema and pydantic (datetime).

Looking at the code, in dandi-archive we do

from dandischema.metadata import aggregate_assets_summary, validate
...
            metadata = asset.published_metadata()
            validate(metadata, schema_key='PublishedAsset', json_validation=True)

while in the dandi-cli we do not use dandischema..validate construct and rather

        try:
            asset = self.get_metadata(digest=self._DUMMY_DIGEST)
            BareAsset(**asset.model_dump())
        except ValidationError as e:
            if devel_debug:
                raise
            return _pydantic_errors_to_validation_results(
                e, self.filepath, scope=Scope.FILE
            )

so in effect only validating against Pydantic (and for a BareAsset).

edit 1: moreover we do not even get to above from NWBAsset.get_validation_errors due to

        if schema_version is not None:
            errors.extend(
                super().get_validation_errors(
                    schema_version=schema_version, devel_debug=devel_debug
                )
            )
        else:
            # make sure that we have some basic metadata fields we require
            try:
                origin = ValidationOrigin(
                    name="nwbinspector",
                    version=str(_get_nwb_inspector_version()),
                )

                for error in inspect_nwbfile(
...

so we solely rely on nwbinspector inspection for NWB file but that one does not care about dandischema "directly". So we might want to change that one way or another...

We should harmonize, and I feel that here we should use dandischema.metadata.validate. I think it also aligns better with our desire to separate models from intended "validity" assumptions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant