Create message/output when downstream and QC analysis is not complete or is interrupted due to taxa not assigned (taxa is not bacterial) #123

Open · MOREYCK opened this issue Oct 2, 2023 · 1 comment
Labels: Enhancement (New feature or request)

Comments


MOREYCK commented Oct 2, 2023

Describe the current status:
In v2.0.2, when given a negative control sample (C. auris), PHX fails at the calculate assembly ratio step.


The kraken database does not include C. auris, so the kraken report shows 98% of reads as unclassified. For FastANI, PHX only has the bacterial genomes from RefSeq, so no taxon is assigned to this sample.
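For reference, the unclassified fraction mentioned above can be read from the "U" line of a Kraken2 report. This is a minimal sketch that assumes the default six-column Kraken2 report layout (percent, clade reads, direct reads, rank code, taxid, name); the 90% cutoff and the function names are hypothetical and are not part of PHX.

```python
import csv
import io
from typing import TextIO

# Hypothetical cutoff for illustration only; not a PHX setting.
UNCLASSIFIED_CUTOFF = 90.0


def unclassified_percent(report: TextIO) -> float:
    """Percent of reads on the 'U' (unclassified) line of a Kraken2 report.

    Assumes the default six tab-separated columns:
    percent, clade reads, direct reads, rank code, taxid, name.
    Returns 0.0 if no unclassified line is present.
    """
    for row in csv.reader(report, delimiter="\t"):
        if len(row) >= 6 and row[3].strip() == "U":
            return float(row[0])  # float() tolerates the leading padding spaces
    return 0.0


def mostly_unclassified(report: TextIO, cutoff: float = UNCLASSIFIED_CUTOFF) -> bool:
    """True when the unclassified share meets or exceeds the cutoff."""
    return unclassified_percent(report) >= cutoff


if __name__ == "__main__":
    example = (
        " 98.00\t980\t980\tU\t0\tunclassified\n"
        "  2.00\t20\t0\tR\t1\troot\n"
    )
    print(unclassified_percent(io.StringIO(example)))  # 98.0
    print(mostly_unclassified(io.StringIO(example)))  # True
```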


Describe the solution you'd like:
When the pipeline can't identify a taxon, report "no taxa found" in the GRiPHin_Summary.xlsx file and FAIL the sample.

@jvhagey jvhagey added the Enhancement New feature or request label Oct 4, 2023
@jvhagey jvhagey self-assigned this Oct 4, 2023
jvhagey (Collaborator) commented Oct 26, 2023

Hi @MOREYCK,

I have built handling for negative controls into the v2.1.0-dev version. PHX first tries to assign a taxon with FastANI, using the top 20 MASH hits against a sketch built from all bacterial genomes in RefSeq. I tried a few yeast samples, and they had either 0 MASH hits or a very poor hit (<80% ANI match with VERY low coverage). When PHX can't get a good match with FastANI, it falls back to reporting the taxon that Kraken2 assigned using the weighted scaffolds (for the yeast samples, the Kraken2 match was human). In both cases, 0 MASH hits or a hit with <80% ANI, the "error" shows up in the "WARNINGS" column of the GRiPHin summary.
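The fallback described above can be sketched as follows. This is a hypothetical illustration, not PHX code: `AniHit`, `TaxonCall`, and `assign_taxon` are invented names, the FastANI results are assumed to be listed in MASH rank order, and coverage is not modeled. Only the 20-hit limit, the 80% ANI cutoff, the warning strings, and the `Taxa_Source` labels come from this thread.

```python
from dataclasses import dataclass, field
from typing import List

# Warning strings and source labels as quoted in this thread; all names and
# structure below are a hypothetical illustration, not PHX's implementation.
NO_MASH_HIT = "No MASH hit found"
NO_GOOD_ANI = "No hits with >=80% ANI"
MIN_ANI = 80.0
TOP_N = 20  # FastANI is run against the top 20 MASH hits


@dataclass
class AniHit:
    organism: str
    ani: float  # percent identity reported by FastANI


@dataclass
class TaxonCall:
    taxon: str
    taxa_source: str
    warnings: List[str] = field(default_factory=list)


def assign_taxon(ani_hits: List[AniHit], kraken2_wtasmbld_taxon: str) -> TaxonCall:
    """Prefer a FastANI call; fall back to the Kraken2 weighted-assembly taxon.

    ani_hits: FastANI results for references picked from MASH hits,
    assumed to be in MASH rank order (best first).
    """
    candidates = ani_hits[:TOP_N]
    if not candidates:
        # No MASH hits at all, so there is nothing for FastANI to compare against.
        return TaxonCall(kraken2_wtasmbld_taxon, "kraken2_wtasmbld", [NO_MASH_HIT])
    best = max(candidates, key=lambda h: h.ani)
    if best.ani < MIN_ANI:
        # A hit exists but is too distant to trust.
        return TaxonCall(kraken2_wtasmbld_taxon, "kraken2_wtasmbld", [NO_GOOD_ANI])
    return TaxonCall(best.organism, "ANI_REFSEQ")


if __name__ == "__main__":
    yeast = assign_taxon([], "Homo sapiens")
    print(yeast.taxa_source, yeast.warnings)  # kraken2_wtasmbld ['No MASH hit found']
```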

We are also working on a new entry point in PHX with a more limited database, which will make picking a negative control easier. This is the version we plan to use for our validation, but I don't have a timeline for it yet; most likely early 2024.

So your options right now are:

  1. Wait and validate the new entry point in PHX with a non-HAI bacterial isolate.
  2. Use the negative control you have already picked with v2.1.0, which will be released soon. For this, your validation plan will need to state that a sample counts as a negative result when you get the warning "No MASH hit found" or "No hits with >=80% ANI". In either case, the "Taxa_Source" column in the GRiPHin summary will read "kraken2_wtasmbld" rather than "ANI_REFSEQ". In other words, you would only accept a result when the taxon was identified by FastANI.
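Option 2 amounts to a simple rule over the GRiPHin summary rows. This is a minimal sketch, assuming each row is available as a dict keyed by column name; the `WGS_ID` identifier column and the function names are assumptions, while `Taxa_Source`, `WARNINGS`, and the quoted values come from this thread.

```python
from typing import Dict, Iterable, Mapping

NEG_WARNINGS = ("No MASH hit found", "No hits with >=80% ANI")


def is_negative_result(row: Mapping[str, object]) -> bool:
    """Treat a row as a negative result unless FastANI (ANI_REFSEQ) made the call."""
    source = str(row.get("Taxa_Source", "")).strip()
    warnings = str(row.get("WARNINGS", "") or "")  # None/missing -> empty string
    has_neg_warning = any(w in warnings for w in NEG_WARNINGS)
    return source != "ANI_REFSEQ" or has_neg_warning


def classify(rows: Iterable[Mapping[str, object]], id_column: str = "WGS_ID") -> Dict[str, str]:
    """Map each sample ID (hypothetical column name) to 'negative' or 'accepted'."""
    return {
        str(row.get(id_column, "")): "negative" if is_negative_result(row) else "accepted"
        for row in rows
    }


if __name__ == "__main__":
    rows = [
        {"WGS_ID": "neg_ctrl", "Taxa_Source": "kraken2_wtasmbld", "WARNINGS": "No MASH hit found"},
        {"WGS_ID": "isolate_1", "Taxa_Source": "ANI_REFSEQ", "WARNINGS": ""},
    ]
    print(classify(rows))  # {'neg_ctrl': 'negative', 'isolate_1': 'accepted'}
```

If pandas (with openpyxl) is available, rows could be loaded with `pandas.read_excel("GRiPHin_Summary.xlsx").to_dict("records")`, assuming the column headers are on the first row of the first sheet; check this against your actual file.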

Let me know your thoughts about this.
