Code to convert Hg19 reference/alternative alleles to ancestral/derived alleles.

David-Peede/hg19_aa_calls

Hg19 Ancestral Allele Calls

This repository contains the code to convert the chimpanzee (panTro6) alignment to Hg19 and the Enredo-Pecan-Ortheus (EPO) ancestral sequences into all-sites VCF files. If you don't want to run the code yourself, all of the processed VCF files and tables can be downloaded from my Dropbox. Otherwise, follow the instructions below.
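As a rough illustration of what converting reference/alternative alleles to ancestral/derived alleles involves, the sketch below polarizes a single biallelic site given an ancestral allele call. The function name `polarize_site` and the 0/1 genotype encoding are assumptions made for this example; they are not taken from the scripts in this repository.

```python
def polarize_site(ref, alt, ancestral, genotypes):
    """Recode 0/1 (ref/alt) allele codes as 0/1 (ancestral/derived).

    Hypothetical helper for illustration only. Returns None when the
    ancestral call is missing or matches neither allele.
    """
    anc = ancestral.upper()
    if anc == ref.upper():
        # The reference allele is ancestral, so alt is already "derived".
        return list(genotypes)
    if anc == alt.upper():
        # The alternative allele is ancestral, so flip the encoding.
        return [1 - g for g in genotypes]
    # Ancestral state unknown or inconsistent with both alleles.
    return None
```

For example, `polarize_site("A", "G", "G", [0, 1, 1])` flips the codes because the alternative allele matches the ancestral call, while an ancestral call of `N` leaves the site unpolarized.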

data

All of the data is publicly available for download.

# Download the latest Hg19 reference genome from UCSC.
wget -P ./data/hg19 https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/latest/hg19.fa.gz

# Download the panTro6 MAF file.
wget -P ./data/panTro https://hgdownload.soe.ucsc.edu/goldenPath/hg19/vsPanTro6/hg19.panTro6.synNet.maf.gz

# Download the Enredo-Pecan-Ortheus (EPO) ancestral sequences.
wget -P ./data https://ftp.ensembl.org/pub/release-74/fasta/ancestral_alleles/homo_sapiens_ancestor_GRCh37_e71.tar.bz2
# Extract the EPO tar file into ./data and then delete it (run from the repository root).
tar -xvjf ./data/homo_sapiens_ancestor_GRCh37_e71.tar.bz2 -C ./data
rm ./data/homo_sapiens_ancestor_GRCh37_e71.tar.bz2

aa_calls

This directory contains two subdirectories, tables and vcfs. The code that processes the MAF files in this repo was inspired by Simon Martin's genomics_general repo, which I adapted and optimized for my own use.

tables

This directory contains gzipped CSV files with the Hg19 reference, EPO, and panTro6 alleles for comparison.

# Generate tables with the Hg19 reference sequence, EPO ancestral sequence, and panTro6 sequence.
for CHR in {1..22} X Y; do
    python ./aa_tools/hg19_epo_panTro6_table.py -c ${CHR}
done
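Once the tables exist, one way to compare the two ancestral calls is to count the sites where EPO and panTro6 agree. The sketch below is only an illustration: the column names `EPO` and `PANTRO6` and the function `count_agreement` are assumptions, so check the header of the generated CSV files and adjust accordingly.

```python
import csv
import gzip

VALID_BASES = {"A", "C", "G", "T"}


def count_agreement(path, epo_col="EPO", chimp_col="PANTRO6"):
    """Return (agree, disagree) counts over sites where both calls are A/C/G/T.

    The column names are hypothetical defaults, not the repo's documented schema.
    """
    agree = disagree = 0
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.DictReader(handle):
            epo = row[epo_col].upper()
            chimp = row[chimp_col].upper()
            # Skip sites where either call is missing or not a single base.
            if epo not in VALID_BASES or chimp not in VALID_BASES:
                continue
            if epo == chimp:
                agree += 1
            else:
                disagree += 1
    return agree, disagree
```

Case is ignored here, so lowercase calls are treated the same as uppercase ones; drop the `.upper()` calls if case carries meaning in your tables.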

vcfs

This directory contains the all-sites VCF files for the EPO and panTro6 alleles. Note that you will need bgzip, which is distributed with tabix (HTSlib), to compress the VCF files.

# Generate VCF files for the panTro6 ancestral sequence.
for CHR in {1..22} X Y; do
    python ./aa_tools/hg19_panTro6_vcf.py -c ${CHR} | bgzip > ./aa_calls/vcfs/hg19_panTro6_chr${CHR}.vcf.gz
done

# Generate VCF files for the EPO ancestral sequence.
for CHR in {1..22} X Y; do
    python ./aa_tools/hg19_epo_vcf.py -c ${CHR} | bgzip > ./aa_calls/vcfs/hg19_epo_chr${CHR}.vcf.gz
done
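To spot-check the output, the compressed files can be read back with Python's standard `gzip` module, since bgzip output is gzip-compatible. The sketch below relies only on the first five fixed VCF columns (CHROM, POS, ID, REF, ALT); the helper name `iter_vcf_sites` is an assumption for this example, and how the ancestral allele is encoded in those columns should be confirmed against the generated files.

```python
import gzip


def iter_vcf_sites(path):
    """Yield (chrom, pos, ref, alt) for each record in a gzip/bgzip-compressed VCF."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # Skip meta-information and header lines.
            if line.startswith("#"):
                continue
            fields = line.rstrip("\n").split("\t")
            chrom, pos, _id, ref, alt = fields[:5]
            yield chrom, int(pos), ref, alt
```

For example, iterating over `iter_vcf_sites("./aa_calls/vcfs/hg19_epo_chr22.vcf.gz")` would stream the chromosome 22 EPO records written by the loop above.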
