Brent Pedersen edited this page Jan 20, 2022 · 1 revision

why echtvar

One of the first steps after variant calling in many pipelines is filtering on allele frequency. This requires annotating with large datasets (for example, gnomAD genomes is over 1TB of data). Echtvar uses integer compression, variant encoding, and genomic chunking to make this stupid fast.

To make this simpler, smaller, and faster, echtvar encodes and compresses the variant, allele-frequency, and other (user-specified) columns from a population VCF/BCF into an efficient format. This enables rapid annotation. In our tests, echtvar can annotate ~1 million variants per second, but this is highly dependent on disk speed.
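
To make "genomic chunking" and "integer compression" concrete, here is a minimal Python sketch of the general ideas. It is not echtvar's actual on-disk format: the chunk size (`CHUNK_BITS`), the function names, and the choice of delta encoding are all assumptions made for illustration. Positions are grouped into fixed-size chunks so that only a small offset has to be stored for each variant, and sorted offsets are turned into small differences, which integer compressors handle well.

```python
from collections import defaultdict

CHUNK_BITS = 20  # assumption: chunks of 2**20 bases; not taken from echtvar itself
CHUNK_MASK = (1 << CHUNK_BITS) - 1


def chunk_positions(positions):
    """Group positions by chunk index, keeping only the offset within each chunk."""
    chunks = defaultdict(list)
    for pos in sorted(positions):
        chunks[pos >> CHUNK_BITS].append(pos & CHUNK_MASK)
    return dict(chunks)


def delta_encode(sorted_values):
    """Replace each value with its difference from the previous value."""
    deltas, prev = [], 0
    for value in sorted_values:
        deltas.append(value - prev)
        prev = value
    return deltas


def delta_decode(deltas):
    """Invert delta_encode by taking a running sum."""
    values, total = [], 0
    for delta in deltas:
        total += delta
        values.append(total)
    return values


positions = [1_048_570, 1_048_580, 1_048_600, 2_097_200]
chunks = chunk_positions(positions)
encoded = {chunk: delta_encode(offsets) for chunk, offsets in chunks.items()}
```

Because a query variant only needs the chunk that contains it, an annotator built this way can decompress one small block at a time instead of parsing the whole annotation file.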

versus slivar

slivar has a feature similar to echtvar, but with the following limitations, which echtvar overcomes:

  • slivar reads each chromosome into memory. This can make memory use quite high when there are many attributes and many variants (for example with CADD, which has 3 variants per genomic location).
  • slivar only uses general-purpose gzip (zlib) compression.
  • slivar uses 64 bits for small variants with overflow to a text table. echtvar uses 32 bits for small variants with overflow to an efficient binary format.

In our experience, an echtvar file is about 60-70% of the size of the corresponding slivar-encoded file, and echtvar is substantially (often 5X) faster than slivar.
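
The idea of fitting a small variant into 32 bits, with a fallback for anything larger, can be sketched as follows. The bit layout here (20 bits of offset within a chunk, 8 bits of 2-bit-encoded bases, and two 2-bit length fields) is invented for this example, as are the names `encode32` and `decode32`; echtvar's real layout may differ.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"
MAX_BASES = 4     # assumption: len(ref) + len(alt) must fit in 8 bits of bases
OFFSET_BITS = 20  # assumption: offset of the variant within its genomic chunk


def encode32(offset, ref, alt):
    """Pack a small variant into one 32-bit integer, or return None on overflow."""
    seq = ref + alt
    if not 0 <= offset < (1 << OFFSET_BITS):
        return None
    if not ref or not alt or len(seq) > MAX_BASES:
        return None  # too long: the caller stores it in a separate overflow table
    if any(base not in BASE_CODE for base in seq):
        return None  # non-ACGT alleles also go to the overflow table
    bases = 0
    for base in seq:
        bases = (bases << 2) | BASE_CODE[base]
    bases <<= 2 * (MAX_BASES - len(seq))  # left-align inside the 8-bit field
    # layout, high to low bits: offset:20 | bases:8 | ref_len:2 | alt_len:2
    return (offset << 12) | (bases << 4) | (len(ref) << 2) | len(alt)


def decode32(value):
    """Unpack an integer produced by encode32 into (offset, ref, alt)."""
    offset = value >> 12
    bases = (value >> 4) & 0xFF
    ref_len = (value >> 2) & 0b11
    alt_len = value & 0b11
    seq = "".join(
        CODE_BASE[(bases >> (2 * (MAX_BASES - 1 - i))) & 0b11]
        for i in range(ref_len + alt_len)
    )
    return offset, seq[:ref_len], seq[ref_len:]


snv = encode32(4, "A", "T")              # fits in 32 bits
long_indel = encode32(4, "ACGTA", "A")   # None: would go to the overflow table
```

Since most variants in a population VCF are SNVs or short indels, nearly all of them take the compact path, and only the rare long alleles pay the cost of the overflow storage.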

versus vcfanno/bcftools/snpSift

Other tools like vcfanno, bcftools annotate, and snpSift can annotate a query VCF with one or more VCFs. Each of these must parse much of the original (often huge) annotation files, so their speed is limited by that parsing.

expressions

Other tools like bcftools, snpSift, and slivar support filtering expressions. The expressions in echtvar are stupid fast. In fact, it is often faster to apply an expression than to skip it: writing to disk is the bottleneck, and an expression filters out variants so that fewer are written.

Other tools, especially slivar, provide more flexible and complete filtering. The intent with echtvar is to cover the most common use-cases with extreme speed. Expressions are evaluated with the fasteval Rust library.
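
The claim that filtering can make a run faster comes down to how many records reach the writer. The Python sketch below illustrates this under stated assumptions: a plain Python predicate stands in for an echtvar expression (it does not use fasteval), and the record fields, including `af`, are hypothetical.

```python
import io


def write_variants(variants, out, keep=lambda v: True):
    """Write variants that pass `keep` as tab-separated lines; return the count written."""
    written = 0
    for v in variants:
        if keep(v):
            out.write(f"{v['chrom']}\t{v['pos']}\t{v['ref']}\t{v['alt']}\t{v['af']}\n")
            written += 1
    return written


# Hypothetical annotated records; `af` is an illustrative field name.
variants = [
    {"chrom": "chr1", "pos": 100, "ref": "A", "alt": "T", "af": 0.25},
    {"chrom": "chr1", "pos": 200, "ref": "G", "alt": "C", "af": 0.0004},
    {"chrom": "chr1", "pos": 300, "ref": "C", "alt": "T", "af": 0.08},
]

unfiltered, filtered = io.StringIO(), io.StringIO()
n_all = write_variants(variants, unfiltered)
n_rare = write_variants(variants, filtered, keep=lambda v: v["af"] < 0.01)
```

Every variant is still read and annotated in both runs, but the filtered run serializes and writes only the rare one, which is where the time is saved when output is the bottleneck.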
