ska distance

SKA distance

The distance subcommand calculates pairwise distances between split kmer files and clusters the files based on SNP and identity cutoffs.

Distance output columns

Column	Description
File 1	The name of the first split kmer file being compared
File 2	The name of the second split kmer file being compared
Matches	Number of split kmers found in both files where the middle base is an A, C, G or T and matches between files
Mismatches	Number of split kmers found in only one of the files
SNPs	Number of split kmers found in both files where the middle base is an A, C, G or T but differs between files
Ns	Number of split kmers found in both files where the middle base is an N in at least one of the files
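To make the column definitions concrete, here is a minimal Python sketch that tallies the four categories for one pair of files. It is not SKA's own code. It assumes each split kmer file has already been loaded into a dictionary that maps a split kmer (its two flanking halves, written here as one string) to its middle base. The function name `compare_split_kmers` and this in-memory representation are hypothetical; real split kmer files use their own binary format.

```python
def compare_split_kmers(a: dict[str, str], b: dict[str, str]) -> dict[str, int]:
    """Tally the distance output columns for two split kmer files.

    Hypothetical representation: each dict maps a split kmer
    (flanking sequence) to the base found at its middle position.
    """
    acgt = set("ACGT")
    counts = {"Matches": 0, "Mismatches": 0, "SNPs": 0, "Ns": 0}
    for kmer in a.keys() | b.keys():
        if kmer not in a or kmer not in b:
            # Split kmer present in only one of the files.
            counts["Mismatches"] += 1
            continue
        x, y = a[kmer], b[kmer]
        if x == "N" or y == "N":
            # Middle base is N in at least one file.
            counts["Ns"] += 1
        elif x in acgt and y in acgt:
            # Both middle bases are A, C, G or T: same base or a SNP.
            counts["Matches" if x == y else "SNPs"] += 1
        # Other ambiguity codes are not described by the table, so they are not counted here.
    return counts
```

For example, comparing `{"AAAT": "A", "CCCG": "C", "GGGT": "N", "TTTA": "G"}` with `{"AAAT": "A", "CCCG": "T", "GGGT": "G", "ACGT": "C"}` gives one match, one SNP, one N and two mismatches.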

Cluster output columns

Column	Description
File	The name of the split kmer file
Cluster	An index for the cluster containing the file

Usage

ska distance [options] <split kmer files>

Options:
-c <file>	Clusters output file name (tsv format).
-d <file>	Distances output file name (tsv format).
-h		Print this help
-f <file>	File of split kmer file names. These will be added to, or 
		used instead of, the list provided on the command line.
-i <float>	Identity cutoff for defining clusters. Isolates will be 
		clustered if they share at least this proportion of the 
		kmers of the isolate with fewer kmers and pass the SNP 
		cutoff.
-s <int>	SNP cutoff for defining clusters. Isolates will be clustered 
		if they are separated by fewer than this number of SNPs and 
		pass the identity cutoff.
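The way the -i and -s cutoffs combine can be illustrated with a short Python sketch. This is not SKA's implementation, and several details are assumptions: "shared" kmers are taken to be all split kmers present in both files (matches, SNPs and Ns), SNPs are counted as in the distance output, and clusters are taken to be the connected components (single linkage) of the graph of isolate pairs that pass both cutoffs. The function name `cluster_isolates`, the dictionary input, and the 1-based cluster indices are hypothetical.

```python
from itertools import combinations


def cluster_isolates(
    kmers: dict[str, dict[str, str]], snp_cutoff: int, identity_cutoff: float
) -> dict[str, int]:
    """Assign each file a cluster index (hypothetical single-linkage sketch).

    kmers maps a file name to {split kmer: middle base}.
    """
    acgt = set("ACGT")
    names = sorted(kmers)
    parent = {n: n for n in names}  # union-find forest

    def find(x: str) -> str:
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for n1, n2 in combinations(names, 2):
        a, b = kmers[n1], kmers[n2]
        shared = a.keys() & b.keys()
        snps = sum(
            1 for k in shared if a[k] in acgt and b[k] in acgt and a[k] != b[k]
        )
        # Identity: proportion of the smaller isolate's kmers found in both.
        smaller = min(len(a), len(b))
        identity = len(shared) / smaller if smaller else 0.0
        if identity >= identity_cutoff and snps < snp_cutoff:
            parent[find(n1)] = find(n2)  # merge the two clusters

    # Number clusters 1, 2, ... in order of first appearance.
    roots: dict[str, int] = {}
    return {n: roots.setdefault(find(n), len(roots) + 1) for n in names}
```

Because linkage is single, two isolates can end up in the same cluster without passing the cutoffs against each other directly, as long as a chain of passing pairs connects them.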

Citation

SKA is currently only available as a preprint, so for now, if you use it, please cite: Harris SR. 2018. SKA: Split Kmer Analysis Toolkit for Bacterial Genomic Epidemiology. bioRxiv 453142 doi: https://doi.org/10.1101/453142
