Validator MUST NOT accept identical files under different extensions #1107

Open
arnodelorme opened this issue Nov 5, 2020 · 4 comments · May be fixed by #2071

Comments

@arnodelorme

This BIDS dataset contains both .edf and .bdf files (which are very small):

https://openneuro.org/datasets/ds002034/versions/1.0.1

sub-01/ses-01/eeg/sub-01_ses-01_task-offline_run-01_eeg.edf
sub-01/ses-01/eeg/sub-01_ses-01_task-offline_run-01_eeg.bdf

I believe it should not have passed the validator since there are 2 types of binary files and the BDF file is obviously corrupted.

@sappelhoff
Member

Thanks for the report @arnodelorme, it seems like you're going through a lot of datasets these days :-)

I agree that the validator should catch these cases. A given EEG file such as sub-01/ses-01/eeg/sub-01_ses-01_task-offline_run-01_eeg.<ext> MUST NOT be present more than once under different extensions <ext>.

@sappelhoff
Member

sappelhoff commented Nov 5, 2020

This BIDS dataset contains both .edf and .bdf files (which are very small): https://openneuro.org/datasets/ds002034/versions/1.0.1

sub-01/ses-01/eeg/sub-01_ses-01_task-offline_run-01_eeg.edf
sub-01/ses-01/eeg/sub-01_ses-01_task-offline_run-01_eeg.bdf

I believe it should not have passed the validator since there are 2 types of binary files and the BDF file is obviously corrupted.

I haven't checked whether the BDF file is corrupted, but if it truly is, that raises another, already known, concern: We are not validating the contents of binary EEG files.

This problem is hard to solve, because we would need to implement data format readers in JavaScript so that the bids-validator can open the files and check their validity. Currently, this is only being done for NIfTI files.
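A full reader is a lot of work, but a much cheaper first check is to look at the first bytes of the file. Here is a minimal sketch in Python (not the validator's JavaScript), based on the published header layouts: an EDF header starts with the version field "0" padded with spaces to 8 bytes, and a BDF header starts with the byte 0xFF followed by "BIOSEMI". The function names are hypothetical, and this only detects a mismatch between extension and header; it does not prove that the rest of the file is intact.

```python
from pathlib import PurePosixPath

# First 8 bytes of the fixed header, per the EDF and BDF format descriptions.
EDF_MAGIC = b"0       "      # version "0" padded with 7 spaces (also used by EDF+)
BDF_MAGIC = b"\xffBIOSEMI"   # byte 255 followed by the ASCII text "BIOSEMI"


def sniff_eeg_format(header: bytes):
    """Return "edf", "bdf", or None based on the leading 8 bytes."""
    if header[:8] == BDF_MAGIC:
        return "bdf"
    if header[:8] == EDF_MAGIC:
        return "edf"
    return None


def extension_matches_content(filename: str, header: bytes) -> bool:
    """True if the file extension agrees with what the header bytes say.

    Hypothetical helper: only .edf and .bdf are handled here.
    """
    ext = PurePosixPath(filename).suffix.lower().lstrip(".")
    return sniff_eeg_format(header) == ext
```

To use it on disk, read the first 8 bytes with `open(path, "rb").read(8)` and pass them in together with the filename. In the browser the bytes would have to come from the file access API instead, which is the integration problem described below.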

Many months ago, I tried to implement a reader/validator for the BrainVision format in JavaScript here: https://github.com/sappelhoff/brainvision-validator/ ... see also #475

However, I ran into problems integrating it with the bids-validator, because the validator runs both in the browser and from the CLI, and the "file access" API in the browser is significantly different from, and more complicated than, accessing files from the CLI (or from programs written in Matlab or Python).

But I will open this post as a separate issue, and we certainly should address it as soon as we have some resources available. (By resources, I mean people who have expertise, energy, and time.)

@sappelhoff sappelhoff changed the title from "Validator passed corrupted BIDS EEG" to "Validator MUST NOT accept identical files under different extensions" Nov 5, 2020
@sappelhoff
Member

In this issue, let's track our progress to prevent users from storing the same data under different extensions.

This should be some rule that:

  • IF a file sub-01/ses-01/eeg/sub-01_ses-01_task-offline_run-01_eeg.<ext> is present
  • AND is from the list LIST_OF_ACCEPTED_DATA_FORMAT_EXTENSIONS
  • then there MUST NOT be any other file with the same name and an ext from that list

This sounds difficult, but it should be possible to implement.
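The rule above could be sketched roughly as follows. This is Python rather than the validator's JavaScript, just to show the logic. The extension set is an illustrative stand-in for LIST_OF_ACCEPTED_DATA_FORMAT_EXTENSIONS, and the function name is hypothetical. One assumption is that the list holds only one "entry point" extension per format, so that companion files sharing the same name (for example, the .eeg and .vmrk files that accompany a BrainVision .vhdr) are not flagged.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Illustrative stand-in for LIST_OF_ACCEPTED_DATA_FORMAT_EXTENSIONS:
# one entry-point extension per EEG data format (an assumption, not the real list).
ACCEPTED_DATA_FORMAT_EXTENSIONS = frozenset({".edf", ".bdf", ".set", ".vhdr"})


def find_duplicate_recordings(paths, accepted=ACCEPTED_DATA_FORMAT_EXTENSIONS):
    """Map each file name (without extension) to its extensions,
    keeping only names that appear under more than one accepted extension."""
    extensions_by_name = defaultdict(set)
    for raw_path in paths:
        path = PurePosixPath(raw_path)
        ext = path.suffix.lower()
        if ext in accepted:
            # Group by the full relative path minus the extension.
            extensions_by_name[str(path.with_suffix(""))].add(ext)
    return {
        name: sorted(exts)
        for name, exts in extensions_by_name.items()
        if len(exts) > 1
    }
```

Called with the two paths from the original report, this returns a single entry for sub-01/ses-01/eeg/sub-01_ses-01_task-offline_run-01_eeg listing both .bdf and .edf, which the validator could then turn into an error.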

@arnodelorme
Author

arnodelorme commented Nov 5, 2020 via email

@rwblair rwblair linked a pull request Aug 7, 2024 that will close this issue