Pipeline Test files #216

Open
arunbodd opened this issue Jul 20, 2024 · 2 comments

Comments

@arunbodd

Hello Developer,

Can you please provide at least a test.config with test fasta files, so I can run this pipeline and understand the output?

Thank you.

@erinyoung
Member

erinyoung commented Jul 22, 2024

My apologies for my late reply!

Generally we use Grandeur with fasta files for two things:

  1. QC and species estimation from long-read assembly
  2. Phylogenetic analysis

I don't have test data built into Grandeur (it's a long story, but a lot of sites, such as the ENA, are blocked locally).

For phylogenetic analysis, this is what we use for testing with GitHub Actions (I'm assuming you're curious about the phylogenetic analysis):

Step 1. Get fasta files for the same species (they need to share 1500 genes)

mkdir fastas
cd fastas
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/013/783/245/GCA_013783245.1_ASM1378324v1/GCA_013783245.1_ASM1378324v1_genomic.fna.gz && gzip -d GCA_013783245.1_ASM1378324v1_genomic.fna.gz
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/026/626/185/GCA_026626185.1_ASM2662618v1/GCA_026626185.1_ASM2662618v1_genomic.fna.gz && gzip -d GCA_026626185.1_ASM2662618v1_genomic.fna.gz
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/020/808/985/GCA_020808985.1_ASM2080898v1/GCA_020808985.1_ASM2080898v1_genomic.fna.gz && gzip -d GCA_020808985.1_ASM2080898v1_genomic.fna.gz
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/904/863/225/GCA_904863225.1_KSB1_6J/GCA_904863225.1_KSB1_6J_genomic.fna.gz && gzip -d GCA_904863225.1_KSB1_6J_genomic.fna.gz
cd ../
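
Before starting the workflow, it can help to confirm that each download decompressed into a usable fasta file. The following is a minimal sketch, not part of Grandeur: `count_contigs` is a hypothetical helper name, and it assumes the files end in `.fna` as in the commands above.

```shell
# Hypothetical helper (not part of Grandeur): print "<file><TAB><number of sequences>"
# for every *.fna file in a directory, and return non-zero if any file has no sequences.
count_contigs() {
  dir="$1"
  status=0
  for f in "$dir"/*.fna; do
    # If the glob matched nothing, "$f" is the literal pattern and does not exist.
    [ -e "$f" ] || { echo "no .fna files in $dir" >&2; return 1; }
    # Each fasta record starts with a '>' header line; grep -c prints 0 when none match.
    n=$(grep -c '^>' "$f" || true)
    printf '%s\t%s\n' "$(basename "$f")" "$n"
    [ "$n" -gt 0 ] || status=1
  done
  return "$status"
}
```

Running `count_contigs fastas` after Step 1 should list four files, each with at least one sequence.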

Step 2A. Then run the workflow

nextflow run . -profile docker,msa --fastas fastas
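
The command above needs both `nextflow` and, because of the `docker` profile, `docker` on the PATH. A small pre-flight check along these lines may save a confusing failure; `require_tools` is a hypothetical name used only for illustration.

```shell
# Hypothetical pre-flight check: confirm each named command is on PATH.
# Returns non-zero and names the missing tools on stderr if any are absent.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example use (not executed here):
#   require_tools nextflow docker && nextflow run . -profile docker,msa --fastas fastas
```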

OR

Step 2B. Create the list of fastas and then run the workflow

Rather than pointing the workflow to a directory, you can give it a list of fasta files. This option must be used when running on cloud resources.

Creating the fasta list

ls fastas/* > fastas.txt

fastas.txt should have file contents like so

fastas/GCA_013783245.1_ASM1378324v1_genomic.fna
fastas/GCA_026626185.1_ASM2662618v1_genomic.fna
fastas/GCA_020808985.1_ASM2080898v1_genomic.fna
fastas/GCA_904863225.1_KSB1_6J_genomic.fna
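
Since the list holds relative paths, it is only valid from the directory where it was created. A quick sanity check, sketched here with a hypothetical helper `check_fasta_list`, is to confirm that every listed path resolves to a non-empty file from the current directory.

```shell
# Hypothetical helper: report every path in a list file that is missing or empty.
# Paths are resolved relative to the current working directory.
check_fasta_list() {
  list="$1"
  bad=0
  # The "|| [ -n ... ]" part also handles a last line without a trailing newline.
  while IFS= read -r path || [ -n "$path" ]; do
    [ -n "$path" ] || continue          # skip blank lines
    if [ ! -s "$path" ]; then
      echo "missing or empty: $path" >&2
      bad=1
    fi
  done < "$list"
  return "$bad"
}
```

For example, `check_fasta_list fastas.txt && echo "list looks fine"`.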

Running the workflow

nextflow run . -profile docker,msa --fasta_list fastas.txt

Step 3. Looking at results:

This gives a summary file with 1-2 key results from each analysis.

sample	file	version	per_core_genome_genes	warnings	amrfinder_genes_(per_cov/per_ident)	predicted_organism	mlst_matching_pubmlst_scheme	mlst_st	fastani_top_organism	fastani_top_reference	fastani_top_ani_estimate	fastani_top_total_query_sequence_fragments	fastani_top_fragments_aligned_as_orthologous_matches	mash_reference	mash_mash-distance	mash_p-value	mash_matching-hashes	mash_organism	plasmidfinder_plasmid_(identity)	kleborate_virulence_score	kleborate_resistance_score
GCA_013783245.1_ASM1378324v1_genomic	GCA_013783245.1_ASM1378324v1_genomic.fna	4.5.24184	84.42	Multiple FastANI hits,Low core genes,	['arsA (100.00/98.63)', 'arsB (100.00/98.83)', 'arsC (100.00/99.29)', 'arsD (100.00/90.83)', 'arsR (100.00/98.28)', 'blaSHV-11 (100.00/100.00)', 'emrD (100.00/99.49)', 'fieF (100.00/100.00)', 'fosA (100.00/99.28)', 'oqxA (100.00/100.00)', 'oqxB (100.00/100.00)', 'pcoA (100.00/100.00)', 'pcoB (100.00/100.00)', 'pcoC (100.00/100.00)', 'pcoD (100.00/99.68)', 'pcoE (100.00/97.92)', 'pcoR (100.00/100.00)', 'pcoS (100.00/99.57)', 'pmrB_R256G (100.00/99.45)', 'silA (100.00/98.85)', 'silB (100.00/97.91)', 'silC (100.00/99.35)', 'silE (100.00/92.31)', 'silF (100.00/99.15)', 'silP (99.64/94.53)', 'silR (100.00/98.23)', 'silS (100.00/98.78)']	Klebsiella_pneumoniae	klebsiella	37	Klebsiella_pneumoniae	Klebsiella_pneumoniae_GCF_000240185.1.fna.gz	99.124	1653	1791	refseq-NZ-1328379-PRJNA224116-SAMN02138587-GCF_000567645.1-.-Klebsiella_pneumoniae_MGH_47.fna	0.00305472	0	883/1000	Klebsiella_pneumoniae	['Col440I (92.73)', 'IncFIB(K) (98.93)', 'IncFII(K) (100.0)']	0	0
GCA_020808985.1_ASM2080898v1_genomic	GCA_020808985.1_ASM2080898v1_genomic.fna	4.5.24184	84.62	Multiple FastANI hits,Low core genes,	['blaSHV-11 (100.00/100.00)', 'emrD (100.00/99.49)', 'fieF (100.00/100.00)', 'fosA (100.00/100.00)', 'fosA7 (65.71/91.30)', 'oqxA (100.00/100.00)', 'oqxB (100.00/99.81)']	Klebsiella_pneumoniae	klebsiella	1017	Klebsiella_pneumoniae	Klebsiella_pneumoniae_GCF_022869665.1.fna.gz	99.0812	1579	1779	refseq-NZ-1438805-PRJNA224116-SAMN02581266-NZ_JJNJ-.-Klebsiella_pneumoniae_UCI_60.fna	0.00823165	0	726/1000	Klebsiella_pneumoniae	['IncFIB(pKPHS1) (99.46)']	0	0
GCA_026626185.1_ASM2662618v1_genomic	GCA_026626185.1_ASM2662618v1_genomic.fna	4.5.24184	82.57	Multiple FastANI hits,Low core genes,	"['aac(3)-IVa (100.00/100.00)', 'aadA1 (100.00/100.00)', 'aadA2 (100.00/100.00)', 'aadA2 (100.00/100.00)', ""aph(3'')-Ib (100.00/100.00)"", ""aph(3'')-Ib (100.00/100.00)"", ""aph(3'')-Ib (100.00/99.63)"", ""aph(3')-IIa (100.00/100.00)"", ""aph(3')-Ia (100.00/100.00)"", 'aph(4)-Ia (100.00/100.00)', 'aph(6)-Id (100.00/100.00)', 'aph(6)-Id (100.00/100.00)', 'aph(6)-Id (100.00/100.00)', 'armA (100.00/100.00)', 'blaCTX-M-14 (100.00/100.00)', 'blaDHA-1 (100.00/100.00)', 'blaSHV-25 (100.00/100.00)', 'blaTEM-1 (100.00/100.00)', 'ble (61.11/96.20)', 'cmlA1 (100.00/100.00)', 'dfrA12 (100.00/100.00)', 'emrD (100.00/99.49)', 'fieF (100.00/100.00)', 'floR (100.00/99.75)', 'fosA (99.28/100.00)', 'fosA3 (100.00/100.00)', 'gyrA_S83I (100.00/99.77)', 'mph(A) (100.00/100.00)', 'mph(E) (100.00/100.00)', 'msr(E) (100.00/100.00)', 'oqxA (100.00/100.00)', 'oqxB (100.00/100.00)', 'parC_S80I (98.84/99.41)', 'qacE (82.61/95.79)', 'qacEdelta1 (100.00/100.00)', 'qacL (100.00/100.00)', 'qnrB4 (100.00/100.00)', 'qnrS1 (100.00/100.00)', 'rmtB1 (100.00/100.00)', 'sul1 (100.00/100.00)', 'sul1 (100.00/100.00)', 'sul2 (100.00/100.00)', 'sul3 (100.00/100.00)', 'terB (100.00/100.00)', 'terC (100.00/99.13)', 'terD (100.00/98.96)', 'terE (100.00/99.48)', 'tet(A) (100.00/99.75)', 'tmexC (100.00/99.74)', 'tmexD (100.00/99.90)', 'toprJ1 (100.00/100.00)']"	Klebsiella_pneumoniae	klebsiella	789	Klebsiella_pneumoniae	Klebsiella_pneumoniae_GCF_000240185.1.fna.gz	99.1677	1665	1861	refseq-NZ-573-PRJNA224116-SAMN02777842-GCF_000739495.1-.-Klebsiella_pneumoniae.fna	0.0083078	0	724/1000	Klebsiella_pneumoniae	['Col(pHAD28) (91.6)', 'Col440I (91.23)', 'IncFIB(pNDM-Mar) (99.32)', 'IncHI1B(pNDM-MAR) (100.0)', 'IncR (100.0)', 'IncX1 (98.4)']	0	1
GCA_904863225.1_KSB1_6J_genomic	GCA_904863225.1_KSB1_6J_genomic.fna	4.5.24184	83.44	Multiple FastANI hits,Low core genes,	"[""aac(6')-Ib-cr5 (100.00/100.00)"", ""aph(3'')-Ib (100.00/100.00)"", 'aph(6)-Id (100.00/100.00)', 'arsA (100.00/100.00)', 'arsB (100.00/100.00)', 'arsC (100.00/100.00)', 'arsD (100.00/91.67)', 'arsR (100.00/100.00)', 'blaCTX-M-15 (100.00/100.00)', 'blaOXA-1 (100.00/100.00)', 'blaSHV-1 (100.00/100.00)', 'blaTEM-1 (100.00/100.00)', 'catB3 (70.00/100.00)', 'clpK (97.89/99.25)', 'crcB (100.00/100.00)', 'dfrA14 (100.00/100.00)', 'emrD (100.00/99.49)', 'fieF (100.00/100.00)', 'fosA (100.00/99.28)', 'fosA7 (100.00/91.43)', 'hsp20 (100.00/100.00)', 'oqxA (100.00/100.00)', 'oqxB19 (100.00/100.00)', 'pcoA (100.00/100.00)', 'pcoB (100.00/100.00)', 'pcoC (100.00/100.00)', 'pcoD (100.00/99.68)', 'pcoE (100.00/94.44)', 'pcoR (100.00/100.00)', 'pcoS (100.00/99.14)', 'qnrB1 (100.00/100.00)', 'silA (100.00/98.85)', 'silB (100.00/97.91)', 'silC (100.00/100.00)', 'silE (100.00/91.61)', 'silF (100.00/99.15)', 'silP (99.64/94.18)', 'silR (100.00/100.00)', 'silS (100.00/100.00)', 'sul2 (100.00/100.00)', 'tet(A) (100.00/100.00)']"	Klebsiella_pneumoniae	klebsiella	323	Klebsiella_pneumoniae	Klebsiella_pneumoniae_GCF_000240185.1.fna.gz	99.0536	1616	1825	refseq-NZ-573-PRJNA224116-SAMEA2602936-NZ_CCGN-.-Klebsiella_pneumoniae.fna	0.000434439	0	982/1000	Klebsiella_pneumoniae	['Col(pHAD28) (100.0)', 'IncFIB(K) (98.93)', 'IncFII(K) (95.95)']	0	1
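
The summary is wide, so it is easier to read a few columns at a time. Here is a rough sketch that selects columns by header name with awk. `summary_cols` is a hypothetical helper, and the path to the summary file is left as a placeholder because it depends on your output directory.

```shell
# Hypothetical helper: print selected columns of a tab-separated file by header name.
# Usage: summary_cols <summary.tsv> <column> [<column> ...]
summary_cols() {
  file="$1"; shift
  # Join the requested names with tabs so awk can split them again.
  wanted=$(printf '%s\t' "$@")
  awk -F'\t' -v OFS='\t' -v wanted="$wanted" '
    NR == 1 {
      n = split(wanted, want, "\t") - 1      # the trailing tab leaves one empty element
      for (i = 1; i <= NF; i++) idx[$i] = i  # header name -> column number
      for (k = 1; k <= n; k++)
        if (!(want[k] in idx)) { print "no column: " want[k] > "/dev/stderr"; exit 1 }
    }
    {
      # The header row is printed too, so the output stays self-describing.
      line = $(idx[want[1]])
      for (k = 2; k <= n; k++) line = line OFS $(idx[want[k]])
      print line
    }
  ' "$file"
}
```

With the header shown above, something like `summary_cols path/to/summary.tsv sample mlst_st per_core_genome_genes warnings` would give a compact per-sample view.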

There is also a newick file generated with iqtree2:

(GCA_020808985.1_ASM2080898v1_genomic:0.0038882302,(((GCA_013783245.1_ASM1378324v1_genomic:0.0035602662,GCA_026626185.1_ASM2662618v1_genomic:0.0030635049)67.8/75:0.0004092349,Klebsiella_pneumoniae_GCF_000240185.1:0.0032600322)100/100:0.0009262356,Klebsiella_pneumoniae_GCF_022869665.1:0.0046128026)99.6/99:0.0005900806,GCA_904863225.1_KSB1_6J_genomic:0.0039588357);
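
Note that the tree has six tips even though only four fasta files went in. The two `Klebsiella_pneumoniae_GCF_...` tips appear to correspond to the `fastani_top_reference` genomes in the summary. To list the tips of a tree without opening a viewer, a sketch like this works for simple Newick files; `newick_tips` is a hypothetical helper and assumes unquoted labels.

```shell
# Hypothetical helper: list the tip (leaf) labels of a Newick tree, one per line.
# Assumes labels are unquoted and contain none of the characters ( ) , ; :
newick_tips() {
  # 1. Drop whatever follows each ")" up to the next delimiter, i.e. the support
  #    values and branch lengths attached to internal nodes.
  # 2. Break the string at "(", ")", "," and ";".
  # 3. Keep the label in front of ":" and discard empty lines.
  sed 's/)[^,();]*/)/g' "$1" | tr '(),;' '\n\n\n\n' | cut -d: -f1 | grep -v '^$'
}
```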

A SNP matrix generated with snp-dists:

snp-dists 0.8.2,GCA_020808985.1_ASM2080898v1_genomic,GCA_013783245.1_ASM1378324v1_genomic,Klebsiella_pneumoniae_GCF_022869665.1,GCA_026626185.1_ASM2662618v1_genomic,Klebsiella_pneumoniae_GCF_000240185.1,GCA_904863225.1_KSB1_6J_genomic
GCA_020808985.1_ASM2080898v1_genomic,0,26202,26340,24554,25128,24777
GCA_013783245.1_ASM1378324v1_genomic,26202,0,26648,21896,22221,26669
Klebsiella_pneumoniae_GCF_022869665.1,26340,26648,0,26209,26246,26393
GCA_026626185.1_ASM2662618v1_genomic,24554,21896,26209,0,20967,25626
Klebsiella_pneumoniae_GCF_000240185.1,25128,22221,26246,20967,0,25609
GCA_904863225.1_KSB1_6J_genomic,24777,26669,26393,25626,25609,0
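
A common first question about a matrix like this is which two genomes are closest. As a sketch, assuming the comma-separated layout shown above (a header row, then one row per sample), awk can scan the upper triangle. `closest_pair` is a hypothetical helper name.

```shell
# Hypothetical helper: print the pair of samples with the smallest SNP distance
# from a comma-separated snp-dists matrix, as "<sample A><TAB><sample B><TAB><distance>".
closest_pair() {
  awk -F',' '
    NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; next }   # column labels
    {
      # On data row NR the diagonal sits in field NR, so fields NR+1..NF are
      # the upper triangle; this visits each pair once and skips self-comparisons.
      for (i = NR + 1; i <= NF; i++) {
        if (!found || $i + 0 < best) { best = $i + 0; a = $1; b = name[i]; found = 1 }
      }
    }
    END { if (found) print a "\t" b "\t" best }
  ' "$1"
}
```

In the matrix above, the smallest off-diagonal value is 20967, between GCA_026626185.1_ASM2662618v1_genomic and Klebsiella_pneumoniae_GCF_000240185.1.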

And more.

More information can be found on our wiki pages https://github.com/UPHL-BioNGS/Grandeur/wiki/Phylogenetic-Analysis, https://github.com/UPHL-BioNGS/Grandeur/wiki/USAGE#fasta-files, and https://github.com/UPHL-BioNGS/Grandeur/wiki/phylogenetic_analysis.

@erinyoung
Member

Did this work for you?
