Add globus_token to get_assaytype_data (#1343)
* adding token to assayclassifier call, switching string-concatenated URLs to urllib

* fixed tests

* updated tests

* linters

* fixing query params

* forgot changed file

* General: Update Visium w/ probes documentation

* General: Modify CHANGELOG since we're on 0.0.23

---------

Co-authored-by: Juan Puerto <=>
gesinaphillips committed Jul 2, 2024
1 parent 31853d8 commit 8a14a9a
Showing 11 changed files with 56 additions and 45 deletions.
CHANGELOG.md: 3 additions & 0 deletions
@@ -1,5 +1,8 @@
 # Changelog
 
+## v0.0.23
+- Add token to validation_utils.get_assaytype_data, replace URL string concatenation with urllib
+
 ## v0.0.22
 - Fix logging length issue in Upload.multi_parent
 - Minor change to Visium with probes directory schema
docs/visium-with-probes/current/index.md: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ Related files:
 - [📝 TSV template](https://raw.githubusercontent.com/hubmapconsortium/dataset-metadata-spreadsheet/main/visium-with-probes/latest/visium-with-probes.tsv): Alternative for metadata entry.
 
 
-REQUIRED - For this assay, you must also prepare and submit two additional metadata.tsv files following the metadata schemas linked here for [RNAseq](https://hubmapconsortium.github.io/ingest-validation-tools/rnaseq/current/) and [Histology](https://hubmapconsortium.github.io/ingest-validation-tools/histology/current/). [This link](https://docs.google.com/spreadsheets/d/1YnmdTAA0Z9MKN3OjR3Sca8pz-LNQll91wdQoRPSP6Q4/edit#gid=0) lists the set of fields that are required in the OME TIFF file XML header.
+REQUIRED - For this assay, you must also prepare and submit two additional metadata.tsv files following the metadata schemas linked here for [RNAseq (with probes)](https://hubmapconsortium.github.io/ingest-validation-tools/rnaseq-with-probes/current/) and [Histology](https://hubmapconsortium.github.io/ingest-validation-tools/histology/current/). [This link](https://docs.google.com/spreadsheets/d/1YnmdTAA0Z9MKN3OjR3Sca8pz-LNQll91wdQoRPSP6Q4/edit#gid=0) lists the set of fields that are required in the OME TIFF file XML header.
 
 ## Metadata schema
 
@@ -1,7 +1,7 @@
 ```
 Directory Errors:
 ? examples/dataset-examples/bad-cedar-multi-assay-visium-bad-dir-structure/upload/Visium_9OLC_A4_S1
-  (as visium-no-probes-v2.0)
+  (as visium-no-probes-v2.1)
 : - Required but missing:
   - lab_processed\/.*.
   - lab_processed\/images\/.*.
examples/dataset-examples/bad-scrnaseq-v0/fixtures.json: 1 addition & 1 deletion
@@ -1 +1 @@
-{"assaytype": {"scRNAseq-10xGenomics": {"assaytype": "scRNAseq-10xGenomics-v3", "contains-pii": true, "dataset-type": "RNAseq", "description": "RNAseq (10x Genomics v3)", "dir-schema": "scrnaseq-v0", "primary": true, "tbl-schema": "scrnaseq-v0", "vitessce-hints": []}}, "validation": {"scRNAseq-10xGenomics-v3": {}, "contributors": {}}}
+{"assaytype": {"scRNAseq-10xGenomics": {"assaytype": "scRNAseq-10xGenomics-v3", "contains-pii": true, "dataset-type": "RNAseq", "description": "scRNAseq (10x Genomics v3)", "dir-schema": "scrnaseq-v0", "primary": true, "tbl-schema": "scrnaseq-v0", "vitessce-hints": []}}, "validation": {"scRNAseq-10xGenomics-v3": {}, "contributors": {}}}
@@ -7,14 +7,14 @@ TSVs:
   good-visium-assay-metadata.tsv:
     Schema: visium-no-probes-v2
     Metadata schema version: '2'
-    Directory schema version: visium-no-probes-v2.0
+    Directory schema version: visium-no-probes-v2.1
   good-visium-histology-metadata.tsv:
     Schema: h-and-e-v2
     Metadata schema version: '2'
-    Directory schema version: visium-no-probes-v2.0
+    Directory schema version: visium-no-probes-v2.1
   good-visium-rnaseq-metadata.tsv:
     Schema: rnaseq-visium-no-probes-v2
     Metadata schema version: '2'
-    Directory schema version: visium-no-probes-v2.0
+    Directory schema version: visium-no-probes-v2.1
 ```
@@ -1,4 +1,4 @@
-description_md: 'REQUIRED - For this assay, you must also prepare and submit two additional metadata.tsv files following the metadata schemas linked here for [RNAseq](https://hubmapconsortium.github.io/ingest-validation-tools/rnaseq/current/) and [Histology](https://hubmapconsortium.github.io/ingest-validation-tools/histology/current/). [This link](https://docs.google.com/spreadsheets/d/1YnmdTAA0Z9MKN3OjR3Sca8pz-LNQll91wdQoRPSP6Q4/edit#gid=0) lists the set of fields that are required in the OME TIFF file XML header.'
+description_md: 'REQUIRED - For this assay, you must also prepare and submit two additional metadata.tsv files following the metadata schemas linked here for [RNAseq (with probes)](https://hubmapconsortium.github.io/ingest-validation-tools/rnaseq-with-probes/current/) and [Histology](https://hubmapconsortium.github.io/ingest-validation-tools/histology/current/). [This link](https://docs.google.com/spreadsheets/d/1YnmdTAA0Z9MKN3OjR3Sca8pz-LNQll91wdQoRPSP6Q4/edit#gid=0) lists the set of fields that are required in the OME TIFF file XML header.'
 fields:
 - name: is_cedar
   description: 'Identifies whether the version is hosted by CEDAR'
src/ingest_validation_tools/upload.py: 23 additions & 15 deletions
@@ -10,6 +10,7 @@
 from functools import cached_property
 from pathlib import Path
 from typing import DefaultDict, Dict, List, Optional, Union
+from urllib.parse import urljoin, urlsplit
 
 import requests
 
@@ -178,8 +179,14 @@ def get_errors(self, **kwargs) -> ErrorDict:
     def get_app_context(self, submitted_app_context: Dict):
         """
         Ensure that all default values are present, but privilege any
-        submitted values.
+        submitted values (after making a basic validity check).
         """
+        for url_type in ["entities_url", "ingest_url", "constraints_url"]:
+            if submitted_app_context.get(url_type):
+                split_url = urlsplit(submitted_app_context[url_type])
+                assert (
+                    split_url.scheme and split_url.netloc
+                ), f"{url_type} URL is incomplete: {submitted_app_context[url_type]}"
         self.app_context = {
             "entities_url": "https://entity.api.hubmapconsortium.org/entities/",
             "ingest_url": "https://ingest.api.hubmapconsortium.org/",
@@ -371,8 +378,10 @@ def _constraint_checks(self, schema: SchemaVersion):
             "Authorization": f"Bearer {self.globus_token}",
             "Content-Type": "application/json",
         }
-        url = f"{self.app_context['constraints_url']}?match=True&order={CONSTRAINTS_CHECK_METHOD}"
-        response = requests.post(url, headers=headers, data=data)
+        params = {"match": True, "order": CONSTRAINTS_CHECK_METHOD}
+        response = requests.post(
+            self.app_context["constraints_url"], headers=headers, data=data, params=params
+        )
         if self.verbose:
             print("Ancestor-Descendant pairs sent:")
             self._print_constraint_pairs(payload)
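For comparison, the two styles of building this request side by side: handing `requests` a `params` dict yields the same URL as the old f-string while also URL-encoding values (a standalone sketch with placeholder values; `prepare()` shows the resulting URL without sending anything):

```python
import requests

CONSTRAINTS_CHECK_METHOD = "ancestors"  # placeholder for the module constant
constraints_url = "http://constraints_test/"  # placeholder endpoint

# Before: query string assembled by hand, values not URL-encoded.
manual = f"{constraints_url}?match=True&order={CONSTRAINTS_CHECK_METHOD}"

# After: requests builds and encodes the query string itself.
prepared = requests.Request(
    "POST",
    constraints_url,
    params={"match": True, "order": CONSTRAINTS_CHECK_METHOD},
).prepare()

print(manual)        # http://constraints_test/?match=True&order=ancestors
print(prepared.url)  # http://constraints_test/?match=True&order=ancestors
```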
@@ -563,27 +572,24 @@ def _get_constrained_fields(self, schema: SchemaVersion):
 
         constrained_fields = {}
         schema_name = schema.schema_name
+        entities_url = self.app_context.get("entities_url")
 
         if schema_name in [
             OtherTypes.SOURCE,
             *Sample.with_parent_type(),
         ]:
-            constrained_fields["source_id"] = self.app_context.get("entities_url")
+            constrained_fields["source_id"] = entities_url
             if schema_name in Sample.with_parent_type():
-                constrained_fields["sample_id"] = self.app_context.get("entities_url")
+                constrained_fields["sample_id"] = entities_url
         elif schema_name in OtherTypes.ORGAN:  # Deprecated, included for backward-compatibility
-            constrained_fields["organ_id"] = self.app_context.get("entities_url")
+            constrained_fields["organ_id"] = entities_url
         elif schema_name == OtherTypes.CONTRIBUTORS:
             if schema.is_cedar:
-                constrained_fields["orcid"] = (
-                    "https://pub.orcid.org/v3.0/expanded-search/?q=orcid:"
-                )
+                constrained_fields["orcid"] = "https://pub.orcid.org/v3.0/expanded-search/"
             else:
-                constrained_fields["orcid_id"] = (
-                    "https://pub.orcid.org/v3.0/expanded-search/?q=orcid:"
-                )
+                constrained_fields["orcid_id"] = "https://pub.orcid.org/v3.0/expanded-search/"
         else:
-            constrained_fields["parent_sample_id"] = self.app_context.get("entities_url")
+            constrained_fields["parent_sample_id"] = entities_url
         return constrained_fields
 
     def _get_url_fields(
@@ -611,7 +617,7 @@ def _check_url(
         """
         Returns entity_type if checking a field in check_fields.
         """
-        url = constrained_fields[field] + value
+        url = urljoin(constrained_fields[field], value)
         if field in self.check_fields:
             headers = self.app_context.get("request_header", {})
             response = get_entity_api_data(url, self.globus_token, headers)
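One subtlety of moving from `+` to `urljoin`: the join resolves against the last path segment, so the trailing slash on the base URL (which the app-context defaults above keep) is what preserves the old concatenation behavior (standalone illustration; the entity ID is made up):

```python
from urllib.parse import urljoin

# With a trailing slash, the value is appended, matching the old "+":
print(urljoin("https://entity.api.hubmapconsortium.org/entities/", "HBM123.ABCD.456"))
# -> https://entity.api.hubmapconsortium.org/entities/HBM123.ABCD.456

# Without one, urljoin replaces the final path segment instead:
print(urljoin("https://entity.api.hubmapconsortium.org/entities", "HBM123.ABCD.456"))
# -> https://entity.api.hubmapconsortium.org/HBM123.ABCD.456
```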
@@ -628,7 +634,9 @@
             )
         elif field in ["orcid_id", "orcid"]:
             headers = {"Accept": "application/json"}
-            response = requests.get(url, headers=headers)
+            response = requests.get(
+                constrained_fields[field], headers=headers, params={"q": f"orcid:{value}"}
+            )
             num_found = response.json().get("num-found")
             if num_found == 1:
                 return
src/ingest_validation_tools/validation_utils.py: 6 additions & 11 deletions
@@ -4,6 +4,7 @@
 from csv import DictReader
 from pathlib import Path, PurePath
 from typing import Dict, List, Optional, Union
+from urllib.parse import urljoin
 
 import requests
 
@@ -92,10 +93,7 @@ def get_schema_version(
         message.append('Has "orcid_id": Contributors TSV found where metadata TSV expected.')
     if message:
         raise PreflightError(" ".join([msg for msg in message]))
-    assay_type_data = get_assaytype_data(
-        rows[0],
-        ingest_url,
-    )
+    assay_type_data = get_assaytype_data(rows[0], ingest_url, globus_token)
     if not assay_type_data:
         message.append(f"No match found in assayclassifier for TSV {path}.")
         if "assay_type" in rows[0]:
@@ -151,20 +149,17 @@ def get_other_schema_data(
         raise PreflightError(f"Metadata TSV {path} contains invalid field: {unique_field}")
     if entity_type == OtherTypes.SAMPLE:
         # Sample types require additional data
-        return get_entity_info_from_entity_api(url + row[unique_field], globus_token)
+        return get_entity_info_from_entity_api(urljoin(url, row[unique_field]), globus_token)
     else:
         return EntityTypeInfo(entity_type)
 
 
-def get_assaytype_data(
-    row: Dict,
-    ingest_url: str,
-) -> Dict:
+def get_assaytype_data(row: Dict, ingest_url: str, globus_token: str) -> Dict:
     if not ingest_url:
         ingest_url = "https://ingest.api.hubmapconsortium.org/"
     response = requests.post(
-        f"""{ingest_url.strip("/")}/assaytype""",
-        headers={"Content-Type": "application/json"},
+        urljoin(ingest_url, "assaytype"),
+        headers={"Content-Type": "application/json", "Authorization": f"Bearer {globus_token}"},
         data=json.dumps(row),
     )
     response.raise_for_status()
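A hypothetical call against the new signature (the row values and token here are made up for illustration): passing an empty `ingest_url` falls through to the production default, and the token is forwarded as a bearer header.

```python
from ingest_validation_tools.validation_utils import get_assaytype_data

# Illustrative metadata row and token; real rows come from a metadata TSV.
row = {"dataset_type": "RNAseq", "parent_sample_id": "HBM123.ABCD.456"}
assay_type_data = get_assaytype_data(row, "", "my-globus-token")
# POSTs the row as JSON to https://ingest.api.hubmapconsortium.org/assaytype
# with the header "Authorization: Bearer my-globus-token".
```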
tests-manual/test-plugins.py: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ def test_info_reporting(self):
         ):
             with patch("ingest_validation_tools.upload.Upload.online_checks"):
                 fake_now = datetime.now()
-                for test_dir in glob.glob(f"examples/plugin-tests/**"):
+                for test_dir in glob.glob("examples/plugin-tests/**"):
                     upload = Upload(
                         Path(f"{test_dir}/upload"), **PLUGIN_EXAMPLES_OPTS, verbose=False
                     )
tests/test_dataset_examples.py: 11 additions & 7 deletions
@@ -279,8 +279,8 @@ def test_validate_dataset_examples(self, verbose: bool = False, full_diff: bool
                 opts = {}
             with patch(
                 "ingest_validation_tools.validation_utils.get_assaytype_data",
-                side_effect=lambda row, ingest_url: assaytype_side_effect(
-                    test_dir, row, ingest_url
+                side_effect=lambda row, ingest_url, globus_token: assaytype_side_effect(
+                    test_dir, row, ingest_url, globus_token
                 ),
             ) as mock_assaytype_data:
                 with patch("ingest_validation_tools.upload.Upload.online_checks"):
@@ -332,7 +332,7 @@ def multi_dataset_assert(self, tsv_paths: List[str], mock_assaytype_data: Mock):
             with open(tsv_path, encoding="ascii") as f:
                 rows = list(DictReader(f, dialect="excel-tab"))
                 f.close()
-            calls.append(call(rows[0], "https://ingest.api.hubmapconsortium.org/"))
+            calls.append(call(rows[0], "https://ingest.api.hubmapconsortium.org/", ""))
         try:
             mock_assaytype_data.assert_has_calls(calls, any_order=True)
         except AssertionError as e:
@@ -343,7 +343,9 @@ def multi_dataset_assert(self, tsv_paths: List[str], mock_assaytype_data: Mock):
 def prep_offline_upload(test_dir: str, opts: Dict) -> Upload:
     with patch(
         "ingest_validation_tools.validation_utils.get_assaytype_data",
-        side_effect=lambda row, ingest_url: assaytype_side_effect(test_dir, row, ingest_url),
+        side_effect=lambda row, ingest_url, globus_token: assaytype_side_effect(
+            test_dir, row, ingest_url, globus_token
+        ),
     ):
         with patch("ingest_validation_tools.upload.Upload.online_checks"):
             upload = Upload(Path(f"{test_dir}/upload"), **opts)
@@ -354,7 +356,9 @@ def prep_offline_upload(test_dir: str, opts: Dict) -> Upload:
     def prep_dir_schema_upload(self, test_dir: str, opts: Dict, patch_data: Dict) -> Upload:
         with patch(
             "ingest_validation_tools.validation_utils.get_assaytype_data",
-            side_effect=lambda row, ingest_url: assaytype_side_effect(test_dir, row, ingest_url),
+            side_effect=lambda row, ingest_url, globus_token: assaytype_side_effect(
+                test_dir, row, ingest_url, globus_token
+            ),
         ):
             with patch(
                 "ingest_validation_tools.validation_utils.get_possible_directory_schemas",
@@ -484,14 +488,14 @@ def test_bad_multi_assay_parents(self):
             ),
         ):
             bad_upload = Upload(
-                Path(f"test_path"),
+                Path("test_path"),
                 tsv_paths=["repeated_parent_fake_path_1", "repeated_parent_fake_path_2"],
                 **DATASET_EXAMPLES_OPTS,
             )
             with self.assertRaises(PreflightError):
                 bad_upload.multi_parent
             good_upload = Upload(
-                Path(f"test_path"),
+                Path("test_path"),
                 tsv_paths=["unique_parent_fake_path_1", "unique_parent_fake_path_2"],
                 **DATASET_EXAMPLES_OPTS,
             )
tests/test_single_tsv.py: 5 additions & 4 deletions
@@ -17,7 +17,6 @@
     BAD_DATASET_SCHEMA_WITH_ANCESTORS,
     GOOD_DATASET_EXPECTED_PAYLOAD,
     GOOD_DATASET_SCHEMA_WITH_ANCESTORS,
-    SAMPLE_BLOCK_CONSTRAINTS_RESPONSE_BAD,
     SAMPLE_BLOCK_CONSTRAINTS_RESPONSE_GOOD,
     SAMPLE_BLOCK_PARTIAL_CEDAR_RESPONSE_GOOD,
     SAMPLE_BLOCK_PARTIAL_ENTITY_API_RESPONSE,
@@ -26,7 +25,7 @@
     TEST_GET_TSV_ERRORS_PARAMS,
 )
 
-CONSTRAINTS_URL_PARAMS = "?match=True&order=ancestors"
+CONSTRAINTS_URL_PARAMS = {"match": True, "order": "ancestors"}
 CONSTRAINTS_URL = "http://constraints_test/"
 ENTITIES_URL = "http://entities_test/"
 
@@ -78,9 +77,10 @@ def test_constraints_bad(self, mock_request):
             ],
         )
         mock_request.assert_any_call(
-            CONSTRAINTS_URL + CONSTRAINTS_URL_PARAMS,
+            CONSTRAINTS_URL,
             headers={"Authorization": "Bearer test", "Content-Type": "application/json"},
             data=json.dumps(BAD_DATASET_EXPECTED_PAYLOAD),
+            params=CONSTRAINTS_URL_PARAMS,
         )
 
     @patch("ingest_validation_tools.upload.requests.post")
@@ -89,9 +89,10 @@ def test_constraints_good(self, mock_request):
         # Shouldn't return anything
         self.upload._constraint_checks(GOOD_DATASET_SCHEMA_WITH_ANCESTORS)
         mock_request.assert_any_call(
-            CONSTRAINTS_URL + CONSTRAINTS_URL_PARAMS,
+            CONSTRAINTS_URL,
             headers={"Authorization": "Bearer test", "Content-Type": "application/json"},
             data=json.dumps(GOOD_DATASET_EXPECTED_PAYLOAD),
+            params=CONSTRAINTS_URL_PARAMS,
         )
 
     @patch("ingest_validation_tools.upload.requests.post")