
ska annotate

Simon Harris edited this page Sep 4, 2018 · 6 revisions

SKA annotate

The annotate subcommand locates split kmers in a reference genome sequence and writes them, with annotations, to a vcf (v4.3) format output file.

If the reference is a gff file, split kmers matching CDS, tRNA or rRNA features will be annotated with the following information where available. All of it is written to the INFO field of the vcf.

  • Feature ID
  • Feature type (CDS, tRNA or rRNA)
  • Strand
  • Position of base in feature
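Since these annotations live in the INFO column, a downstream script has to split that column before it can use them. The sketch below relies only on the general vcf v4.3 layout (tab-separated columns, INFO as the eighth column, semicolon-separated key=value entries). The record and the key names `ID`, `TYPE`, `STRAND` and `BASE_POS` are placeholders invented for illustration. They are not taken from ska's output, so check the header of a real output file for the actual keys.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO string into a dict; entries without '=' are stored as True."""
    fields = {}
    if info in ("", "."):
        # "." marks an empty INFO column in VCF
        return fields
    for entry in info.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        fields[key] = value if sep else True
    return fields


# Hypothetical record: the INFO key names are placeholders, not ska's real keys.
record = "ref\t120\t.\tA\tG\t.\t.\tID=feat1;TYPE=CDS;STRAND=+;BASE_POS=42"
columns = record.split("\t")
info = parse_info(columns[7])  # INFO is the 8th column of a VCF record
print(info["TYPE"], info["BASE_POS"])
```

Values are kept as strings, so numeric fields such as a base position need an explicit `int()` conversion before use.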

For CDS features the following will also be included where available:

  • Locus tag
  • Systematic ID
  • Gene name
  • Position of amino acid in feature
  • Position of base in codon
  • Reference amino acid
  • Alternate amino acids (comma separated list matching the alt bases in the 5th column of the vcf file)
  • Product (only output when the -p flag is used)
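The CDS fields above are related by simple arithmetic, which the following sketch illustrates. It assumes 1-based positions counted from the first base of the first codon, with no phase offset. That is a common convention, but it is an assumption here and not something this page confirms for ska. The second helper pairs each alt base with its amino acid by list position, as described for the alternate amino acids field. The function names and the one-letter amino acid codes in the example are made up for illustration.

```python
def codon_coordinates(base_pos: int) -> tuple:
    """Map a 1-based base position in a CDS to (amino acid position, position in codon)."""
    if base_pos < 1:
        raise ValueError("base_pos must be 1-based and positive")
    offset = base_pos - 1
    return offset // 3 + 1, offset % 3 + 1


def pair_alternates(alt_bases: str, alt_amino_acids: str) -> dict:
    """Pair each alt base with the amino acid at the same position in the list."""
    bases = alt_bases.split(",")
    amino_acids = alt_amino_acids.split(",")
    if len(bases) != len(amino_acids):
        raise ValueError("expected one amino acid per alt base")
    return dict(zip(bases, amino_acids))


print(codon_coordinates(42))          # (14, 3): third base of the 14th codon
print(pair_alternates("G,T", "V,L"))  # {'G': 'V', 'T': 'L'}
```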

Usage

ska annotate [options] <kmer files>

Options:
-h		Print this help.
-f <file>	File of split kmer file names. These will be added to or 
		used as an alternative input to the list provided on the 
		command line.
-i		Include kmers in repetitive reference regions.
-o <file>	Prefix for output files. [Default = found]
-p		Include product in output.
-r <file>	Reference fasta/gff file name. [Required]
-v		Only output variant sites.
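As a rough illustration of how the options combine, the sketch below assembles a command line from Python using only the flags listed above. The helper name `build_annotate_command` and the file names `reference.gff`, `sample1.skf` and `sample2.skf` are hypothetical. The helper only builds the argument list, so it can be inspected without ska installed. Passing the list to `subprocess.run` would execute it.

```python
import shlex


def build_annotate_command(reference, kmer_files, prefix=None,
                           variants_only=False, include_product=False,
                           include_repeats=False):
    """Build an argument list for `ska annotate` from the documented options."""
    cmd = ["ska", "annotate", "-r", reference]  # -r is required
    if prefix is not None:
        cmd += ["-o", prefix]                   # prefix for output files
    if variants_only:
        cmd.append("-v")                        # only output variant sites
    if include_product:
        cmd.append("-p")                        # include product in output
    if include_repeats:
        cmd.append("-i")                        # include kmers in repetitive regions
    cmd += list(kmer_files)
    return cmd


# Hypothetical file names, for illustration only.
cmd = build_annotate_command("reference.gff", ["sample1.skf", "sample2.skf"],
                             prefix="annotated", variants_only=True,
                             include_product=True)
print(shlex.join(cmd))
```

Building the command as a list avoids shell quoting problems when file names contain spaces.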