query API returns empty summary on some genes #129

Closed
jingjingbic opened this issue Aug 24, 2022 · 3 comments

@jingjingbic

We are using this query (https://mygene.info/v3/query?q=symbol:POLA2&size=1&species=human&fields=name,summary) to get the summary of gene POLA2, but the call returns no summary. However, if we look up this gene in NCBI, there is a summary on its page: https://www.ncbi.nlm.nih.gov/gene/23649. Does MyGene.info pull the summary value from NCBI or from another data source? A query on DRG1 has a similar issue.
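For anyone who wants to reproduce this, here is a minimal Python sketch of the same query. The URL and parameters are the ones quoted above; the helper names (`build_query_url`, `hits_missing_summary`, `fetch`) are made up for illustration, and the code assumes the response is a JSON object with a `hits` list, so treat it as a rough check rather than an official client.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://mygene.info/v3/query"


def build_query_url(symbol, species="human"):
    """Build the query URL from the report for a given gene symbol."""
    params = {
        "q": f"symbol:{symbol}",
        "size": 1,
        "species": species,
        "fields": "name,summary",
    }
    # Keep ':' and ',' unescaped so the URL matches the one quoted above.
    return f"{BASE_URL}?{urlencode(params, safe=':,')}"


def hits_missing_summary(response):
    """Return the hits that have no non-empty 'summary' field.

    Assumes the response is a dict with a 'hits' list of dicts.
    """
    return [hit for hit in response.get("hits", []) if not hit.get("summary")]


def fetch(symbol):
    """Run the query over the network and decode the JSON body."""
    with urlopen(build_query_url(symbol), timeout=30) as resp:
        return json.load(resp)
```

Calling `hits_missing_summary(fetch("POLA2"))` and `hits_missing_summary(fetch("DRG1"))` would list the hits that came back without a summary at the time of the query.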

@newgene
Member

newgene commented Aug 26, 2022

@jingjingbic thanks for reporting this to us. We did some investigation and found out why this happens.

The summary field in MyGene.info is obtained from NCBI's RefSeq records.

For example, the summary of gene CDK2 comes from NM_001798 (under the "COMMENT" section, starting with "Summary:").
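To illustrate the idea, here is a rough Python sketch of pulling the text after "Summary:" out of a COMMENT block. It is not the actual MyGene.info parser: the function name `extract_summary` is hypothetical, and the rule that the summary ends at the first `##` marker (or at the end of the comment) is an assumption made for this example.

```python
import re

# Assumption for this sketch: the summary runs from "Summary:" to the first
# "##" marker, or to the end of the comment if no such marker follows.
_SUMMARY_RE = re.compile(r"Summary:\s*(.*?)(?:\s*##|$)")


def extract_summary(comment):
    """Return the text following 'Summary:' in a COMMENT block, or None."""
    # Flat-file comments are hard-wrapped, so collapse all whitespace first.
    text = " ".join(comment.split())
    match = _SUMMARY_RE.search(text)
    return match.group(1) if match else None
```

A record whose COMMENT has no "Summary:" text makes `extract_summary` return `None`, which would surface downstream as an empty summary field like the one reported here.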

This worked for nearly all genes in the past. However, as you pointed out, we are now starting to see some gene summaries that do not come from the corresponding RefSeq record.

I think there could be two reasons:

  1. NCBI may be slow to add the summary to some RefSeq records (or it could be a mistake on their side). In this case we will just wait for RefSeq to update. MyGene.info stays closely synced with NCBI, so once RefSeq is updated (the current release is 213), MyGene.info should pick up the changes within a week or so.

  2. NCBI likely stores some gene summary data somewhere other than the RefSeq records. We could not find the POLA2 summary in any of the data files we sync from NCBI, so we will have to reach out to NCBI about this.

Either way, it looks like this is something we should double-check with NCBI. Depending on their response, we can decide whether any changes are needed on the MyGene.info side.

@newgene
Member

newgene commented Sep 4, 2022

We contacted NCBI helpdesk and confirmed this:

We currently do not add the summaries imported from the Alliance of Genome Resources onto the RefSeq transcript records. Summaries are also not added to model RefSeqs.

The complete set of gene summaries is available from NCBI's ASN.1 binary dump files rather than from RefSeq records. We can modify our pipeline to extract gene summaries from these files instead. A separate issue, #130, was created for this task.

@jal347
Contributor

jal347 commented Sep 27, 2022

A temporary fix for human genes is done.

@jal347 jal347 closed this as completed Sep 27, 2022