Juranic/visium dirschemas edits #1357

Merged: 6 commits, Aug 29, 2024
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -15,6 +15,8 @@
- Update Segmentation Masks directory schema
- Update Visium with probes directory schema
- Update Visium no probes directory schema
- Update Visium with probes directory schema
- Update Visium no probes directory schema

## v0.0.23
- Add token to validation_utils.get_assaytype_data, replace URL string concatenation with urllib
24 changes: 23 additions & 1 deletion docs/visium-no-probes/current/index.md
@@ -30,7 +30,29 @@ REQUIRED - For this assay, you must also prepare and submit two additional metad
<br>

## Directory schemas
<summary><b>Version 3.0 (use this one)</b></summary>
<summary><b>Version 3.1 (use this one)</b></summary>

| pattern | required? | description |
| --- | --- | --- |
| <code>extras\/.*</code> | ✓ | Folder for general lab-specific files related to the dataset |
| <code>extras\/microscope_hardware\.json</code> | ✓ | **[QA/QC]** A file generated by the micro-meta app that contains a description of the hardware components of the microscope. Email HuBMAP Consortium Help Desk <[email protected]> if help is required in generating this document. |
| <code>extras\/microscope_settings\.json</code> | | **[QA/QC]** A file generated by the micro-meta app that contains a description of the settings that were used to acquire the image data. Email HuBMAP Consortium Help Desk <[email protected]> if help is required in generating this document. |
| <code>raw\/.*</code> | ✓ | All raw data files for the experiment. |
| <code>raw\/[^\/]+\.gpr</code> | ✓ | This is a 10X Genomics layout file that's generated by 10X and individualized for each Visium slide. This is a text file and can be generated using this 10X web form <https://support.10xgenomics.com/spatial-gene-expression/software/pipelines/latest/using/slidefile-download> along with the unique 10X Visium slide ID. |
| <code>raw\/fastq\/.*</code> | ✓ | Raw sequencing files for the experiment |
| <code>raw\/fastq\/RNA\/.*</code> | ✓ | Directory containing fastq files pertaining to RNAseq sequencing. |
| <code>raw\/fastq\/RNA\/[^\/]+_R[^\/]+\.fastq\.gz</code> | ✓ | This is a GZip'd version of the forward and reverse fastq files from RNAseq sequencing (R1 and R2). |
| <code>raw\/images\/.*</code> | ✓ | Directory containing raw image files. This directory should include at least one raw file. |
| <code>raw\/images\/[^\/]+\.(?:xml&#124;scn&#124;vsi&#124;svs&#124;czi&#124;tiff)</code> | ✓ | Raw microscope file for the experiment |
| <code>lab_processed\/.*</code> | ✓ | Experiment files that were processed by the lab generating the data. |
| <code>lab_processed\/alignment\.json</code> | ✓ | JSON file for the manual tissue alignment created using Loupe browser and used as input to Space Ranger. |
| <code>lab_processed\/images\/.*</code> | ✓ | Processed image files |
| <code>lab_processed\/images\/[^\/]+\.ome\.tiff</code> (example: <code>lab_processed/images/HBM892.MDXS.293.ome.tiff</code>) | ✓ | OME-TIFF files (multichannel, multi-layered) produced by the microscopy experiment. If compressed, must use a lossless compression algorithm. For Visium this stitched file should only include the single capture area relevant to the current dataset. For GeoMx there will be one OME TIFF file per slide, with each slide including multiple AOIs. See the following link for the set of fields that are required in the OME TIFF file XML header. <https://docs.google.com/spreadsheets/d/1YnmdTAA0Z9MKN3OjR3Sca8pz-LNQll91wdQoRPSP6Q4/edit#gid=0> |
| <code>lab_processed\/images\/[^\/]*ome-tiff\.channels\.csv</code> | ✓ | This file provides essential documentation pertaining to each channel of the accompanying OME TIFF. The file should contain one row per OME TIFF channel. The required fields are detailed at <https://docs.google.com/spreadsheets/d/1xEJSb0xn5C5fB3k62pj1CyHNybpt4-YtvUs5SUMS44o/edit#gid=0> |
| <code>lab_processed\/transformations\/.*</code> | | This directory contains transformation matrices that capture how each modality is aligned with the other and can be used to visualize overlays of multimodal data. This is needed to overlay images from the exact same tissue section (e.g., MALDI imaging mass spec, autofluorescence microscopy, MxIF, histological stains). In these cases data type may have different pixel sizes and slightly different orientations (i.e., one may be rotated relative to another). |
| <code>lab_processed\/transformations\/[^\/]+\.txt</code> | | Transformation matrices used to overlay images from the exact same tissue section (e.g., MALDI imaging mass spec, autofluorescence microscopy, MxIF, histological stains). |
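
The table above pairs each regular expression with a required flag. As a minimal sketch of how such a table could be applied to an upload, the Python below checks a list of relative paths against a subset of the Version 3.1 patterns. It assumes full-string matching and a flat list of file paths; the names `SCHEMA` and `check_paths` are hypothetical, and this is not the ingest-validation-tools implementation.

```python
import re

# (pattern, required) pairs copied from a subset of the Version 3.1 table.
SCHEMA = [
    (r"extras\/.*", True),
    (r"extras\/microscope_hardware\.json", True),
    (r"raw\/.*", True),
    (r"raw\/[^\/]+\.gpr", True),
    (r"raw\/fastq\/RNA\/[^\/]+_R[^\/]+\.fastq\.gz", True),
    (r"raw\/images\/[^\/]+\.(?:xml|scn|vsi|svs|czi|tiff)", True),
    (r"lab_processed\/alignment\.json", True),
    (r"lab_processed\/transformations\/[^\/]+\.txt", False),
]


def check_paths(paths, schema=SCHEMA):
    """Return (paths matching no pattern, required patterns matching no path)."""
    compiled = [(re.compile(pattern), required) for pattern, required in schema]
    # Assumption: a path is acceptable if the whole string matches any pattern.
    not_allowed = [
        path for path in paths
        if not any(rx.fullmatch(path) for rx, _ in compiled)
    ]
    # A required pattern is satisfied once at least one path matches it.
    missing = [
        rx.pattern for rx, required in compiled
        if required and not any(rx.fullmatch(path) for path in paths)
    ]
    return not_allowed, missing


example_paths = [
    "extras/microscope_hardware.json",
    "raw/slide.gpr",
    "raw/fastq/RNA/sample_R1.fastq.gz",
    "raw/images/scan.czi",
    "lab_processed/alignment.json",
    "notes.txt",  # sits outside every top-level folder in the schema
]
not_allowed, missing = check_paths(example_paths)
```

Because the broad patterns such as `raw\/.*` accept anything under their folder, the narrower patterns mainly matter through their required flag in this sketch.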

<summary><b>Version 3.0</b></summary>

| pattern | required? | description |
| --- | --- | --- |
26 changes: 25 additions & 1 deletion docs/visium-with-probes/current/index.md
@@ -28,7 +28,31 @@ REQUIRED - For this assay, you must also prepare and submit two additional metad
<br>

## Directory schemas
<summary><b>Version 3.0 (use this one)</b></summary>
<summary><b>Version 3.1 (use this one)</b></summary>

| pattern | required? | description |
| --- | --- | --- |
| <code>extras\/.*</code> | ✓ | Folder for general lab-specific files related to the dataset |
| <code>extras\/microscope_hardware\.json</code> | ✓ | **[QA/QC]** A file generated by the micro-meta app that contains a description of the hardware components of the microscope. Email HuBMAP Consortium Help Desk <[email protected]> if help is required in generating this document. |
| <code>extras\/microscope_settings\.json</code> | | **[QA/QC]** A file generated by the micro-meta app that contains a description of the settings that were used to acquire the image data. Email HuBMAP Consortium Help Desk <[email protected]> if help is required in generating this document. |
| <code>raw\/.*</code> | ✓ | All raw data files for the experiment. |
| <code>raw\/[^\/]+\.gpr</code> | ✓ | This is a 10X Genomics layout file that's generated by 10X and individualized for each Visium slide. This is a text file and can be generated using this 10X web form <https://support.10xgenomics.com/spatial-gene-expression/software/pipelines/latest/using/slidefile-download> along with the unique 10X Visium slide ID. |
| <code>raw\/additional_panels_used\.csv</code> | | If multiple commercial probe panels were used, then the primary probe panel should be selected in the "oligo_probe_panel" metadata field. The additional panels must be included in this file. Each panel record should include:manufacturer, model/name, product code. |
| <code>raw\/custom_probe_set\.csv</code> | | This file should contain any custom probes used and must be included if the metadata field "is_custom_probes_used" is "Yes". The file should minimally include:target gene id, probe seq, probe id. The contents of this file are modeled after the 10x Genomics probe set file (see <https://support.10xgenomics.com/spatial-gene-expression-ffpe/probe-sets/probe-set-file-descriptions/probe-set-file-descriptions#probe_set_csv_file>). |
| <code>raw\/fastq\/.*</code> | ✓ | Raw sequencing files for the experiment |
| <code>raw\/fastq\/oligo\/</code> | ✓ | Directory containing fastq files pertaining to oligo sequencing. |
| <code>raw\/fastq\/oligo\/[^\/]+\.fastq\.gz</code> | ✓ | This is a gzip version of the fastq file. This file contains the cell barcode and unique molecular identifier (technical). |
| <code>raw\/images\/.*</code> | ✓ | Directory containing raw image files. This directory should include at least one raw file. |
| <code>raw\/images\/[^\/]+_tissue\.(?:tif&#124;tiff)</code> | | Raw microscope file for the experiment. For 10X Visium CytAssist, this would be the high resolution image produced. |
| <code>raw\/images\/[^\/]+_fiducial\.(?:tif&#124;tiff)</code> | ✓ | This is the low resolution image from the 10X CytAssist instrument that includes the fiducial markings. |
| <code>raw\/images\/[^\/]+\.ndpi</code> | | Raw microscope file for the experiment |
| <code>lab_processed\/.*</code> | ✓ | Experiment files that were processed by the lab generating the data. |
| <code>lab_processed\/alignment\.json</code> | ✓ | JSON file for the manual tissue alignment created using Loupe browser and used as input to Space Ranger. |
| <code>lab_processed\/images\/.*</code> | ✓ | Processed image files |
| <code>lab_processed\/images\/[^\/]+\.ome\.tiff</code> (example: <code>lab_processed/images/HBM892.MDXS.293.ome.tiff</code>) | ✓ | OME-TIFF files (multichannel, multi-layered) produced by the microscopy experiment. If compressed, must use a lossless compression algorithm. For Visium this stitched file should only include the single capture area relevant to the current dataset. For GeoMx there will be one OME TIFF file per slide, with each slide including multiple AOIs. See the following link for the set of fields that are required in the OME TIFF file XML header. <https://docs.google.com/spreadsheets/d/1YnmdTAA0Z9MKN3OjR3Sca8pz-LNQll91wdQoRPSP6Q4/edit#gid=0> |
| <code>lab_processed\/images\/[^\/]*ome-tiff\.channels\.csv</code> | ✓ | This file provides essential documentation pertaining to each channel of the accompanying OME TIFF. The file should contain one row per OME TIFF channel. The required fields are detailed at <https://docs.google.com/spreadsheets/d/1xEJSb0xn5C5fB3k62pj1CyHNybpt4-YtvUs5SUMS44o/edit#gid=0> |
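
The table marks `raw/custom_probe_set.csv` as optional, yet its description makes it mandatory whenever the metadata field "is_custom_probes_used" is "Yes". A pattern-plus-flag schema cannot express that condition on its own, so a separate check would be needed. The following is a minimal sketch of one, assuming the metadata row is available as a dict of strings; the function name and the case-insensitive comparison are assumptions, not behavior confirmed by this pull request.

```python
CUSTOM_PROBE_FILE = "raw/custom_probe_set.csv"


def custom_probe_set_problem(metadata_row, paths):
    """Return an error message if the custom probe file is needed but absent."""
    # Assumption: the field value is compared case-insensitively after trimming.
    value = metadata_row.get("is_custom_probes_used", "")
    uses_custom = value.strip().lower() == "yes"
    if uses_custom and CUSTOM_PROBE_FILE not in paths:
        return (
            '"is_custom_probes_used" is "Yes" but '
            f"{CUSTOM_PROBE_FILE} is missing"
        )
    return None


# Usage: the first call reports a problem, the second does not.
problem = custom_probe_set_problem(
    {"is_custom_probes_used": "Yes"}, ["raw/slide.gpr"]
)
no_problem = custom_probe_set_problem(
    {"is_custom_probes_used": "Yes"}, ["raw/slide.gpr", CUSTOM_PROBE_FILE]
)
```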

<summary><b>Version 3.0</b></summary>

| pattern | required? | description |
| --- | --- | --- |
@@ -0,0 +1,78 @@
files:
  -
    pattern: extras\/.*
    required: True
    description: Folder for general lab-specific files related to the dataset
  -
    pattern: extras\/microscope_hardware\.json
    required: True
    description: A file generated by the micro-meta app that contains a description of the hardware components of the microscope. Email HuBMAP Consortium Help Desk <[email protected]> if help is required in generating this document.
    is_qa_qc: True
  -
    pattern: extras\/microscope_settings\.json
    required: False
    description: A file generated by the micro-meta app that contains a description of the settings that were used to acquire the image data. Email HuBMAP Consortium Help Desk <[email protected]> if help is required in generating this document.
    is_qa_qc: True
  -
    pattern: raw\/.*
    required: True
    description: All raw data files for the experiment.
  -
    pattern: raw\/[^\/]+\.gpr
    required: True
    description: This is a 10X Genomics layout file that's generated by 10X and individualized for each Visium slide. This is a text file and can be generated using this 10X web form <https://support.10xgenomics.com/spatial-gene-expression/software/pipelines/latest/using/slidefile-download> along with the unique 10X Visium slide ID.
    is_qa_qc: False
  -
    pattern: raw\/fastq\/.*
    required: True
    description: Raw sequencing files for the experiment
  -
    pattern: raw\/fastq\/RNA\/.*
    required: True
    description: Directory containing fastq files pertaining to RNAseq sequencing.
  -
    pattern: raw\/fastq\/RNA\/[^\/]+_R[^\/]+\.fastq\.gz
    required: True
    description: This is a GZip'd version of the forward and reverse fastq files from RNAseq sequencing (R1 and R2).
    is_qa_qc: False
  -
    pattern: raw\/images\/.*
    required: True
    description: Directory containing raw image files. This directory should include at least one raw file.
  -
    pattern: raw\/images\/[^\/]+\.(?:xml|scn|vsi|svs|czi|tiff)
    required: True
    description: Raw microscope file for the experiment
    is_qa_qc: False
  -
    pattern: lab_processed\/.*
    required: True
    description: Experiment files that were processed by the lab generating the data.
  -
    pattern: lab_processed\/alignment\.json
    required: True
    description: JSON file for the manual tissue alignment created using Loupe browser and used as input to Space Ranger.
  -
    pattern: lab_processed\/images\/.*
    required: True
    description: Processed image files
  -
    pattern: lab_processed\/images\/[^\/]+\.ome\.tiff
    required: True
    description: OME-TIFF files (multichannel, multi-layered) produced by the microscopy experiment. If compressed, must use a lossless compression algorithm. For Visium this stitched file should only include the single capture area relevant to the current dataset. For GeoMx there will be one OME TIFF file per slide, with each slide including multiple AOIs. See the following link for the set of fields that are required in the OME TIFF file XML header. <https://docs.google.com/spreadsheets/d/1YnmdTAA0Z9MKN3OjR3Sca8pz-LNQll91wdQoRPSP6Q4/edit#gid=0>
    is_qa_qc: False
    example: lab_processed/images/HBM892.MDXS.293.ome.tiff
  -
    pattern: lab_processed\/images\/[^\/]*ome-tiff\.channels\.csv
    required: True
    description: This file provides essential documentation pertaining to each channel of the accompanying OME TIFF. The file should contain one row per OME TIFF channel. The required fields are detailed at <https://docs.google.com/spreadsheets/d/1xEJSb0xn5C5fB3k62pj1CyHNybpt4-YtvUs5SUMS44o/edit#gid=0>
    is_qa_qc: False
  -
    pattern: lab_processed\/transformations\/.*
    required: False
    description: This directory contains transformation matrices that capture how each modality is aligned with the other and can be used to visualize overlays of multimodal data. This is needed to overlay images from the exact same tissue section (e.g., MALDI imaging mass spec, autofluorescence microscopy, MxIF, histological stains). In these cases data type may have different pixel sizes and slightly different orientations (i.e., one may be rotated relative to another).
  -
    pattern: lab_processed\/transformations\/[^\/]+\.txt
    required: False
    description: Transformation matrices used to overlay images from the exact same tissue section (e.g., MALDI imaging mass spec, autofluorescence microscopy, MxIF, histological stains).
    is_qa_qc: False
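
The schema entries above line up field by field with the rows of the Markdown table in `docs/visium-no-probes/current/index.md`: `|` in a pattern appears as `&#124;`, `required: True` appears as a check mark, `is_qa_qc: True` adds a **[QA/QC]** prefix, and `example` is appended to the pattern cell. A rough sketch of that mapping follows. The function `render_row` is hypothetical and is inferred only from comparing the two files in this diff; it is not the repository's documentation generator.

```python
def render_row(entry):
    """Render one schema entry (a dict) as a Markdown table row."""
    # A literal pipe would end the table cell, so it is written as an entity.
    pattern_cell = "<code>" + entry["pattern"].replace("|", "&#124;") + "</code>"
    if "example" in entry:
        pattern_cell += f" (example: <code>{entry['example']}</code>)"

    required_cell = "✓" if entry.get("required") else ""

    description_cell = entry["description"]
    if entry.get("is_qa_qc"):
        description_cell = f"**[QA/QC]** {description_cell}"

    cells = [pattern_cell, required_cell, description_cell]
    # Empty cells are rendered as a single space, as in the table in this diff.
    return "|" + "|".join(f" {cell} " if cell else " " for cell in cells) + "|"


row = render_row({
    "pattern": r"raw\/images\/[^\/]+\.(?:xml|scn|vsi|svs|czi|tiff)",
    "required": True,
    "description": "Raw microscope file for the experiment",
    "is_qa_qc": False,
})
```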
@@ -0,0 +1,86 @@
files:
  -
    pattern: extras\/.*
    required: True
    description: Folder for general lab-specific files related to the dataset
  -
    pattern: extras\/microscope_hardware\.json
    required: True
    description: A file generated by the micro-meta app that contains a description of the hardware components of the microscope. Email HuBMAP Consortium Help Desk <[email protected]> if help is required in generating this document.
    is_qa_qc: True
  -
    pattern: extras\/microscope_settings\.json
    required: False
    description: A file generated by the micro-meta app that contains a description of the settings that were used to acquire the image data. Email HuBMAP Consortium Help Desk <[email protected]> if help is required in generating this document.
    is_qa_qc: True
  -
    pattern: raw\/.*
    required: True
    description: All raw data files for the experiment.
  -
    pattern: raw\/[^\/]+\.gpr
    required: True
    description: This is a 10X Genomics layout file that's generated by 10X and individualized for each Visium slide. This is a text file and can be generated using this 10X web form <https://support.10xgenomics.com/spatial-gene-expression/software/pipelines/latest/using/slidefile-download> along with the unique 10X Visium slide ID.
    is_qa_qc: False
  -
    pattern: raw\/additional_panels_used\.csv
    required: False
    description: If multiple commercial probe panels were used, then the primary probe panel should be selected in the "oligo_probe_panel" metadata field. The additional panels must be included in this file. Each panel record should include:manufacturer, model/name, product code.
  -
    pattern: raw\/custom_probe_set\.csv
    required: False
    description: This file should contain any custom probes used and must be included if the metadata field "is_custom_probes_used" is "Yes". The file should minimally include:target gene id, probe seq, probe id. The contents of this file are modeled after the 10x Genomics probe set file (see <https://support.10xgenomics.com/spatial-gene-expression-ffpe/probe-sets/probe-set-file-descriptions/probe-set-file-descriptions#probe_set_csv_file>).
  -
    pattern: raw\/fastq\/.*
    required: True
    description: Raw sequencing files for the experiment
  -
    pattern: raw\/fastq\/oligo\/
    required: True
    description: Directory containing fastq files pertaining to oligo sequencing.
  -
    pattern: raw\/fastq\/oligo\/[^\/]+\.fastq\.gz
    required: True
    description: This is a gzip version of the fastq file. This file contains the cell barcode and unique molecular identifier (technical).
    is_qa_qc: False
  -
    pattern: raw\/images\/.*
    required: True
    description: Directory containing raw image files. This directory should include at least one raw file.
  -
    pattern: raw\/images\/[^\/]+_tissue\.(?:tif|tiff)
    required: False
    description: Raw microscope file for the experiment. For 10X Visium CytAssist, this would be the high resolution image produced.
    is_qa_qc: False
  -
    pattern: raw\/images\/[^\/]+_fiducial\.(?:tif|tiff)
    required: True
    description: This is the low resolution image from the 10X CytAssist instrument that includes the fiducial markings.
  -
    pattern: raw\/images\/[^\/]+\.ndpi
    required: False
    description: Raw microscope file for the experiment
    is_qa_qc: False
  -
    pattern: lab_processed\/.*
    required: True
    description: Experiment files that were processed by the lab generating the data.
  -
    pattern: lab_processed\/alignment\.json
    required: True
    description: JSON file for the manual tissue alignment created using Loupe browser and used as input to Space Ranger.
  -
    pattern: lab_processed\/images\/.*
    required: True
    description: Processed image files
  -
    pattern: lab_processed\/images\/[^\/]+\.ome\.tiff
    required: True
    description: OME-TIFF files (multichannel, multi-layered) produced by the microscopy experiment. If compressed, must use a lossless compression algorithm. For Visium this stitched file should only include the single capture area relevant to the current dataset. For GeoMx there will be one OME TIFF file per slide, with each slide including multiple AOIs. See the following link for the set of fields that are required in the OME TIFF file XML header. <https://docs.google.com/spreadsheets/d/1YnmdTAA0Z9MKN3OjR3Sca8pz-LNQll91wdQoRPSP6Q4/edit#gid=0>
    is_qa_qc: False
    example: lab_processed/images/HBM892.MDXS.293.ome.tiff
  -
    pattern: lab_processed\/images\/[^\/]*ome-tiff\.channels\.csv
    required: True
    description: This file provides essential documentation pertaining to each channel of the accompanying OME TIFF. The file should contain one row per OME TIFF channel. The required fields are detailed at <https://docs.google.com/spreadsheets/d/1xEJSb0xn5C5fB3k62pj1CyHNybpt4-YtvUs5SUMS44o/edit#gid=0>
    is_qa_qc: False
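
One detail in this schema is worth checking against whatever matcher consumes it: the directory pattern `raw\/fastq\/oligo\/` ends at the slash, whereas the no-probes schema writes its counterpart as `raw\/fastq\/RNA\/.*`. Which matching mode the validator uses is not shown in this diff, so the sketch below simply contrasts the two common options in Python's `re` module. The helper names `full` and `prefix` are hypothetical.

```python
import re

OLIGO_DIR = re.compile(r"raw\/fastq\/oligo\/")        # with-probes schema
RNA_DIR = re.compile(r"raw\/fastq\/RNA\/.*")          # no-probes schema
OLIGO_FILE = re.compile(r"raw\/fastq\/oligo\/[^\/]+\.fastq\.gz")


def full(rx, path):
    """True if the entire path matches the pattern."""
    return rx.fullmatch(path) is not None


def prefix(rx, path):
    """True if the pattern matches at the start of the path."""
    return rx.match(path) is not None


fastq = "raw/fastq/oligo/sample_R1.fastq.gz"

# Under full-string matching the directory pattern accepts only the
# directory path itself, not the files inside it.
dir_matches_file_full = full(OLIGO_DIR, fastq)
dir_matches_dir_full = full(OLIGO_DIR, "raw/fastq/oligo/")

# Under prefix matching the same pattern accepts any path below the folder.
dir_matches_file_prefix = prefix(OLIGO_DIR, fastq)

# The file-level pattern matches the fastq file in either mode.
file_matches_full = full(OLIGO_FILE, fastq)
```

If the consumer uses full-string matching on file paths only, the required `raw\/fastq\/oligo\/` entry would rely on directory paths being listed with a trailing slash; under prefix matching it behaves like the `.*` form.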