ska annotate
Simon Harris edited this page Sep 4, 2018 · 6 revisions
The annotate subcommand locates split kmers in a reference genome sequence and writes them, with annotation, to a vcf (v4.3) format output file.
If the reference is supplied as a gff file, split kmers matching CDS, tRNA or rRNA features will be annotated with the following information where available. All of it is written to the info field of the vcf.
- Feature ID
- Feature type (CDS, tRNA or rRNA)
- Strand
- Position of base in feature
For CDS features the following will also be included where available:
- Locus tag
- Systematic ID
- Gene name
- Position of amino acid in feature
- Position of base in codon
- Reference amino acid
- Alternate amino acids (comma separated list matching the alt bases in the 5th column of the vcf file)
- Product (only output when the -p flag is used)
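Since all of these annotations land in the info field, a downstream script usually has to split that column apart. The following is a minimal sketch of doing so in Python, assuming only the generic vcf v4.3 layout (tab-separated columns, alt bases in column 5, info in column 8, info entries separated by `;` and written as `key=value`). The key names `TYPE`, `STRAND`, `REF_AA` and `ALT_AA` and the example record are hypothetical placeholders, not the keys or values SKA actually writes; check the header of a real output file for the true names.

```python
def parse_info(info_field):
    """Split a vcf info column into a dict; flags without a value map to True."""
    parsed = {}
    if info_field in ("", "."):
        return parsed
    for entry in info_field.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        parsed[key] = value if sep else True
    return parsed


def pair_alt_amino_acids(vcf_line, alt_aa_key):
    """Pair each alt base (column 5) with the alternate amino acid listed for it.

    alt_aa_key is whatever info key holds the comma separated alternate
    amino acids; the name is supplied by the caller because it is not
    documented here.
    """
    columns = vcf_line.rstrip("\n").split("\t")
    alt_bases = columns[4].split(",")
    info = parse_info(columns[7])
    raw = info.get(alt_aa_key, "")
    alt_aas = raw.split(",") if isinstance(raw, str) and raw else []
    return dict(zip(alt_bases, alt_aas))


# Made-up record with hypothetical info keys, for illustration only.
example = "ref\t101\t.\tA\tC,G\t.\t.\tTYPE=CDS;STRAND=+;REF_AA=K;ALT_AA=N,R"
pairs = pair_alt_amino_acids(example, "ALT_AA")
print(pairs)
```

The pairing relies on the alternate amino acid list being in the same order as the alt bases, as described in the list above.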
ska annotate [options] <kmer files>
Options:
  -h         Print this help.
  -f <file>  File of split kmer file names. These will be added to or
             used as an alternative input to the list provided on the
             command line.
  -i         Include kmers in repetitive reference regions.
  -o <file>  Prefix for output files. [Default = found]
  -p         Include product in output.
  -r <file>  Reference fasta/gff file name. [Required]
  -v         Only output variant sites.
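As an illustration of how the options above combine, here is a small Python sketch that assembles an annotate command line from the documented flags. The helper name `build_annotate_command` and the file names `reference.gff`, `sample1.skf` and `sample2.skf` are hypothetical; only the flags themselves come from the help text. The sketch builds and prints the command without running it.

```python
import shlex


def build_annotate_command(reference, kmer_files=(), file_of_files=None,
                           prefix=None, include_product=False,
                           variants_only=False, include_repeats=False):
    """Assemble a `ska annotate` argument list from the documented options."""
    if not kmer_files and file_of_files is None:
        raise ValueError("give kmer files on the command line, via -f, or both")
    command = ["ska", "annotate", "-r", reference]  # -r is required
    if file_of_files is not None:
        command += ["-f", file_of_files]
    if include_repeats:
        command.append("-i")
    if prefix is not None:
        command += ["-o", prefix]
    if include_product:
        command.append("-p")
    if variants_only:
        command.append("-v")
    command.extend(kmer_files)
    return command


# Hypothetical inputs: a gff reference and two split kmer files.
command = build_annotate_command(
    "reference.gff",
    ["sample1.skf", "sample2.skf"],
    prefix="annotated",
    include_product=True,
    variants_only=True,
)
print(shlex.join(command))
# prints: ska annotate -r reference.gff -o annotated -p -v sample1.skf sample2.skf
```

The returned list can be passed to `subprocess.run(command, check=True)` on a machine where `ska` is installed and on the PATH.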
SKA is currently only available as a preprint, so for now, if you use it, please cite: Harris SR. 2018. SKA: Split Kmer Analysis Toolkit for Bacterial Genomic Epidemiology. bioRxiv 453142 doi: https://doi.org/10.1101/453142