6_43519367_43519367_A_T not correctly parsed #275

Open
pnrobinson opened this issue Sep 19, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@pnrobinson
Member

6_43519367_43519367_A_T

is shown as None in GPSEA for POLR1A.

Variant Validator (queried with GRCh38:6:43519367:A:T) shows

NM_203290.4:c.176A>T

However, this transcript is Homo sapiens RNA polymerase I and III subunit C (POLR1C), transcript variant 1, mRNA, not POLR1A.

NP_976035.1:p.(Asn59Ile)

There is an error somewhere, possibly in the upstream data, but GPSEA should probably emit a warning here. I will try to figure this out.
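A warning of this kind could be sketched roughly as below. This is only an illustration and not GPSEA code: the function `warn_if_wrong_gene` and its parameters are hypothetical, and it assumes the caller already has the gene symbols that the annotation service reported for the variant.

```python
import warnings
from typing import Iterable


def warn_if_wrong_gene(
    variant_key: str,
    expected_symbol: str,
    annotated_symbols: Iterable[str],
) -> bool:
    """Return True if the variant is annotated on the expected gene.

    Otherwise emit a UserWarning naming the genes that were found
    and return False. Hypothetical helper, not part of GPSEA.
    """
    symbols = sorted(set(annotated_symbols))
    if expected_symbol in symbols:
        return True
    found = ", ".join(symbols) if symbols else "no gene"
    warnings.warn(
        f"{variant_key} is not annotated on {expected_symbol} (found: {found})",
        UserWarning,
        stacklevel=2,
    )
    return False


# Values taken from this issue: the variant maps to POLR1C, not POLR1A.
ok = warn_if_wrong_gene("6_43519367_43519367_A_T", "POLR1A", ["POLR1C"])
```

With the values from this report the call returns False and emits a warning instead of silently yielding None.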

pnrobinson added the bug label on Sep 19, 2024
@pnrobinson
Member Author

Fetching the POLR1A MANE transcript also leads to a crash:

POLR1A_MANE_transcript = 'NM_015425.6' # Homo sapiens RNA polymerase I subunit A (POLR1A), mRNA
(...)
tx_coordinates = txc_service.fetch(POLR1A_MANE_transcript)

leads to

ValueError                                Traceback (most recent call last)
Cell In[8], line 5
      3 txc_service = VVMultiCoordinateService(genome_build=GRCh38)
      4 pms = configure_default_protein_metadata_service()
----> 5 tx_coordinates = txc_service.fetch(POLR1A_MANE_transcript)
      6 #protein_meta = pms.annotate(POLR1A_protein_id)

File ~/GIT/gpsea/src/gpsea/preprocessing/_vv.py:164, in VVMultiCoordinateService.fetch(self, tx)
    162 tx_id = self._parse_tx(tx)
    163 response_json = self.get_response(tx_id)
--> 164 return self.parse_response(tx_id, response_json)

File ~/GIT/gpsea/src/gpsea/preprocessing/_vv.py:195, in VVMultiCoordinateService.parse_response(self, tx_id, response)
    193     raise ValueError(error_string)
    194 if 'transcripts' not in transcript_response:
--> 195     VVMultiCoordinateService._handle_missing_field(
    196         response=response,
    197         field='transcripts',
    198     )
    199 tx_data = self._find_tx_data(tx_id, transcript_response['transcripts'])
    200 if 'genomic_spans' not in tx_data:

File ~/GIT/gpsea/src/gpsea/preprocessing/_vv.py:259, in VVMultiCoordinateService._handle_missing_field(response, field)
    257 json_formatted_str = json.dumps(response, indent=2)
...
ValueError: A required `transcripts` field is missing in the response from Variant Validator API: 
{
  "error": "Unable to recognise gene symbol LOC90784",
  "requested_symbol": "NM_015425.6"
}
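
The payload above carries an `error` key instead of `transcripts`, so one option would be to check for that key first and raise a shorter, more specific message. A minimal sketch, assuming the error payload has been decoded into a dict as printed above; `raise_on_vv_error` is a hypothetical helper and not part of GPSEA or Variant Validator:

```python
from typing import Any, Mapping


def raise_on_vv_error(tx_id: str, response: Mapping[str, Any]) -> None:
    """Raise a concise ValueError if the decoded response reports an error.

    Hypothetical helper: it only inspects the `error` and
    `requested_symbol` keys seen in the payload from this issue.
    """
    if "error" in response:
        requested = response.get("requested_symbol", tx_id)
        raise ValueError(
            f"Variant Validator could not resolve {requested}: {response['error']}"
        )


# The payload reported in this issue.
payload = {
    "error": "Unable to recognise gene symbol LOC90784",
    "requested_symbol": "NM_015425.6",
}

try:
    raise_on_vv_error("NM_015425.6", payload)
except ValueError as e:
    message = str(e)
```

A response without an `error` key passes through untouched, so the existing check for the missing `transcripts` field would still apply afterwards.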
