
Population stats

jmmut edited this page Oct 7, 2020 · 2 revisions

How to parse population stats in aggregated VCFs

Say you have an aggregated VCF like this:

#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
chr1	100	.	A	T	.	.	AF=0.3;AF_AFR=0.4;AF_EUR=0.1

The pipeline for aggregated VCFs will load only the AF=0.3 statistic. However, you can tell the pipeline how to parse the other frequencies (here AF_AFR and AF_EUR) and store them as population frequencies.

Create the mapping file

First, you have to write a mapping file (e.g. stats-mapping.properties) like this:

AFR.AF=AF_AFR
EUR.AF=AF_EUR
ALL.AF=AF

where:

  • The string before the dot (e.g. AFR) is the population name as it will appear in the EVA DB and website. The EVA operator can choose it freely, except for the fixed name ALL, which denotes the whole sample set.
  • The string between the dot and the = character (e.g. AF) is the variable. It must be one of AN (allele number), AC (allele count), or AF (allele frequency). If you provide AC, you must also provide AN so that the frequency can be computed as AC/AN.
  • The string after the = character (e.g. AF_AFR) is the tag as it appears in the VCF, and can be whatever the submitter used.

Note that the tag from the VCF is arbitrary. It could appear in the VCF as FREQ_AFRICAN=0.4 and the line in the mapping file would be AFR.AF=FREQ_AFRICAN.
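To make the mapping rules above concrete, here is a minimal Python sketch of how such a file could be read and applied to an INFO column. It is an illustration only, not the pipeline's implementation (which is written in Java); the function names parse_mapping, parse_info and population_stats are hypothetical, and the sketch assumes a single ALT allele so that every mapped tag holds one number.

```python
ALLOWED_VARIABLES = {"AN", "AC", "AF"}


def parse_mapping(text):
    """Parse lines like 'AFR.AF=AF_AFR' into {(population, variable): vcf_tag}."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, tag = line.partition("=")
        population, _, variable = key.partition(".")
        population, variable, tag = population.strip(), variable.strip(), tag.strip()
        if variable not in ALLOWED_VARIABLES:
            raise ValueError(f"unsupported variable in line: {line!r}")
        mapping[(population, variable)] = tag
    return mapping


def parse_info(info):
    """Split a VCF INFO column into a dict; flag-only entries map to ''."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def population_stats(mapping, info):
    """Return {population: {variable: value}} for the tags present in INFO."""
    fields = parse_info(info)
    stats = {}
    for (population, variable), tag in mapping.items():
        if tag in fields:
            # Assumption: one ALT allele, so the tag holds a single number.
            stats.setdefault(population, {})[variable] = float(fields[tag])
    for values in stats.values():
        # When only AC and AN are mapped, derive the frequency as AC/AN.
        if "AF" not in values and "AC" in values and values.get("AN"):
            values["AF"] = values["AC"] / values["AN"]
    return stats


mapping = parse_mapping("AFR.AF=AF_AFR\nEUR.AF=AF_EUR\nALL.AF=AF\n")
print(population_stats(mapping, "AF=0.3;AF_AFR=0.4;AF_EUR=0.1"))
# {'AFR': {'AF': 0.4}, 'EUR': {'AF': 0.1}, 'ALL': {'AF': 0.3}}
```

With a mapping such as AFR.AC=AC_AFR and AFR.AN=AN_AFR, the same sketch would derive AF for AFR from the two counts.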

Pass it to the pipeline

Once the mapping file is ready, you have to put its path in the pipeline parameter properties file as:

input.vcf.aggregation.mapping-path=/path/to/stats-mapping.properties

Note that the job has to be the one for aggregated VCFs (spring.batch.job.names=aggregated-vcf-job), and the aggregation type has to be one of input.vcf.aggregation=BASIC, input.vcf.aggregation=EVS, or input.vcf.aggregation=EXAC. In other words, it cannot be input.vcf.aggregation=NONE.
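The three requirements above can be checked before launching a job. The following Python sketch is a hypothetical pre-flight check, not part of the pipeline: the helper names load_properties and check_aggregation_settings are made up for illustration, and the properties parsing is deliberately simplified (plain key=value lines only, no escapes or line continuations).

```python
ALLOWED_AGGREGATIONS = {"BASIC", "EVS", "EXAC"}


def load_properties(text):
    """Read simple key=value lines, skipping blank lines and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_aggregation_settings(props):
    """Return a list of problems; an empty list means the settings look consistent."""
    problems = []
    if props.get("spring.batch.job.names") != "aggregated-vcf-job":
        problems.append("spring.batch.job.names must be aggregated-vcf-job")
    if props.get("input.vcf.aggregation") not in ALLOWED_AGGREGATIONS:
        problems.append("input.vcf.aggregation must be BASIC, EVS or EXAC")
    if not props.get("input.vcf.aggregation.mapping-path"):
        problems.append("input.vcf.aggregation.mapping-path is not set")
    return problems


example = """
spring.batch.job.names=aggregated-vcf-job
input.vcf.aggregation=BASIC
input.vcf.aggregation.mapping-path=/path/to/stats-mapping.properties
"""
for problem in check_aggregation_settings(load_properties(example)):
    print(problem)
```

The example properties satisfy all three checks, so the loop prints nothing; changing the aggregation type to NONE would report one problem.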

More details

For more details about this feature, look at the source code at https://github.com/EBIvariation/eva-pipeline/blob/master/src/main/java/uk/ac/ebi/eva/pipeline/io/mappers/VariantAggregatedVcfFactory.java#L65 . If that file no longer exists, the class in use is most likely https://github.com/EBIvariation/variation-commons/blob/master/variation-commons-core/src/main/java/uk/ac/ebi/eva/commons/core/models/factories/VariantAggregatedVcfFactory.java#L69 .
