Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hg19 and hg38 analysis from same workflow branch #8

Open
wants to merge 53 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 47 commits
Commits
Show all changes
53 commits
Select commit Hold shift + click to select a range
ba8b0f5
Removing ExAC and EVS from annotation and no-control filtering
NagaComBio Aug 19, 2019
8584941
Removing DUKE, DAC and HiSeqDepth based confidence annotaions
NagaComBio Oct 11, 2019
7e49b5d
Removing getRefGenomeAndChrPrefixFromHeader functions
NagaComBio Oct 11, 2019
ce97964
Updating annotation file paths to hg38 directory
NagaComBio Oct 11, 2019
c514297
Removing the hg38 encode blacklist filter
NagaComBio Jan 21, 2020
3bebd1f
Variant calling from CRAM files
NagaComBio Jan 7, 2021
b0ac261
Checking for nonREFnonALT in BoolCounter class
NagaComBio Feb 11, 2021
674900f
updating pysam to 0.16.0.1
NagaComBio Feb 11, 2021
128c1c0
Moving files to ngs_share
NagaComBio Feb 11, 2021
2de0727
Updating BoolCounter class
NagaComBio Mar 11, 2021
d8f3cf1
REF name via BAM header
NagaComBio Mar 11, 2021
b99be8f
Reverting generic xml to hg19
NagaComBio Mar 11, 2021
98bb76e
New xml for GRCh38 files
NagaComBio Mar 11, 2021
90b9978
chrLength file with 'chr' prefix
NagaComBio Mar 11, 2021
58b77cd
Removing hard-coded header parsing
NagaComBio Mar 16, 2021
1af4b68
Liftover local control for hg38 WES and WGS
NagaComBio Mar 16, 2021
baabd82
hg19 specific annotations
NagaComBio Mar 16, 2021
5a764ef
User defined threshold-based 'SNP_support_germline' annotations
NagaComBio Mar 16, 2021
da88940
Removed extra spaces
NagaComBio Mar 16, 2021
ad3f650
Merge branch 'master' into hg38
NagaComBio Mar 16, 2021
aef5111
Uncommenting reference detection
NagaComBio Apr 26, 2021
22a55ec
updating refgenome help
NagaComBio Apr 30, 2021
71900b3
Raise error: unknown alignment suffix
NagaComBio Apr 30, 2021
dbd835d
Reformatting in_dbSNPcounter.pl
NagaComBio Apr 30, 2021
df184d9
Reformatting IMD R file
NagaComBio Apr 30, 2021
f93779c
Increasing the mem requirement
NagaComBio Sep 29, 2021
7beae93
safer IO for in_dbSNPcounter.pl
NagaComBio Oct 18, 2021
d270358
Add variant calling in HLA/ALT contigs
NagaComBio May 4, 2022
b683df2
Update ngs_share path
NagaComBio May 4, 2022
1c0bc26
Remove RE/MAP from hg38 penalties
NagaComBio May 30, 2022
6d47982
Add m2e2,HLA/ALT mappability files
NagaComBio Jul 8, 2022
5fcc9e8
Merge branch 'mappability branch' into hg38
NagaComBio Jul 8, 2022
01567ac
Fix the diagnostic plots for GRCh38
NagaComBio Sep 16, 2022
6321e0a
Remove the quote for RAW_SNV_FILTER_OPTIONS
NagaComBio Sep 16, 2022
0adf1e4
Upgrade to gencodev39 for hg38
NagaComBio Oct 6, 2022
72a4135
Merge branch 'master' into hg38
NagaComBio Oct 27, 2022
c4fb48b
Bug fix with dbSNP counter
NagaComBio Nov 7, 2022
dc2ca36
Update WGS local control
NagaComBio Nov 7, 2022
973ae9c
hg38: Add local control and gnomAD based confidence annotation
NagaComBio Nov 7, 2022
56aa726
Update README
NagaComBio Nov 7, 2022
da45d2a
Exempt classification with FREQ
NagaComBio Nov 7, 2022
45fa141
Reverting SNP based confidence scoring
NagaComBio Mar 20, 2023
5479faf
Add exception for 'N' in createErrorPlots.py
NagaComBio Mar 20, 2023
f633856
Remove NA values in quantile calculation
NagaComBio Mar 20, 2023
ea7154b
Update reference
NagaComBio Mar 20, 2023
3f8419c
Update raw_filter_punishment in accordance with RAW_SNV_FILTER_OPTIONS
NagaComBio Mar 20, 2023
13bdc5f
Move python env
NagaComBio Mar 23, 2023
e2f7db9
Update virtual env path
NagaComBio Jun 12, 2023
44614b6
XML comments to description
NagaComBio Jun 12, 2023
dfff910
Merge branch 'master' into hg38
NagaComBio Jun 12, 2023
3a47349
Minor update
NagaComBio Jun 12, 2023
b8d4664
Update readme with hg38 calls
NagaComBio Jun 12, 2023
4dca636
Fixing the missed conflict
NagaComBio Jun 27, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 11 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -131,6 +131,17 @@ The optional configuration JSON file defaults to the `convertToStdVCF.json` resi

## Changelog

* 3.0.0
* Major
* Support for hg38/GRCh38 reference genome and variant calling from ALT and HLA contigs.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any new mandatory configuration values? At least list them. Details should be in the XML.
Was the semantics of conf. values changed?

* Minor
* For hg38: Removing mappability and repeat elements' annotations from penalty calculations.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any new optional configuration values? (at least list them)

* skipREMAP: Option to remove repeat elements and mappability from confidence annotations in hg19.
* Removing EVS And ExAC AF from the annotations and no-control workflow filtering
* Support for variant calling from CRAM files
* Bug fix: Removing "quote" around the raw filter option `<RAW_SNV_FILTER_OPTIONS>`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. But this is a technical/code description. What was the effect of the bug to the user? (probably that multiple RAW_SNV_FILTER_OPTIONS could not be used).

* Update `COWorkflowsBasePlugin` to `1.4.2`

* 2.2.0

* minor: Added `vcfConvertToStd.py`; slightly modified from code by @juleskers and Sophia Stahl
Expand Down
2 changes: 1 addition & 1 deletion buildinfo.txt
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
dependson=COWorkflowsBasePlugin:1.3.0
dependson=COWorkflowsBasePlugin:1.4.2
JDKVersion=1.8
GroovyVersion=2.4
RoddyAPIVersion=3.4
2 changes: 1 addition & 1 deletion buildversion.txt
Original file line number Diff line number Diff line change
@@ -1,2 +1,2 @@
2.1
3.0
0
4 changes: 4 additions & 0 deletions resources/analysisTools/snvPipeline/PurityReloaded.py
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is passing over the VCF once. Not sure how much memory it uses, but maybe it is worthwhile limiting the memory with this approach.

Original file line number Diff line number Diff line change
Expand Up @@ -251,6 +251,10 @@ def parseVcf(file,num):
while (l!= ""):
t=l.split('\t')
if (t[0][0] != "#") and isValid(t):
# Skiping the non-primary assembly variants from purity calculations
NagaComBio marked this conversation as resolved.
Show resolved Hide resolved
if t[0].startswith('HLA') or t[0].endswith('_alt'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

t -> fields
l -> line

BTW (2 lines up): line[0] != "#" is much clearer than fields[0][0] != "#" (not to speak of t[0][0]).

l=vcf.readline()
continue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is i (next line)? Maybe mapped_chromosome?

i = chromMap[t[0]]
if (t[12]=="germline"):
#DP5=string.split(string.split(t[11],";")[1],",")
Expand Down
256 changes: 143 additions & 113 deletions resources/analysisTools/snvPipeline/confidenceAnnotation_SNVs.py

Large diffs are not rendered by default.

6 changes: 5 additions & 1 deletion resources/analysisTools/snvPipeline/createErrorPlots.py
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ def calculateLogSize(size, max_val, base):
return math.log(((size/max_val)*(base-1.))+1., base)

def calculateRootedSize(size, max_val):
if(float(size) != 0.0):
if(float(size) != 0.0 and float(max_val) != 0.0):
return np.sqrt(size/max_val)
else:
return 0.0
Expand Down Expand Up @@ -199,6 +199,10 @@ def calculateErrorMatrix(vcfFilename, referenceFilename, errorType):
# 23.05.2016 JB: Excluded multiallelic SNVs
if ',' in split_line[header.index("ALT")]: continue

# 21.02.2023 NP: Excluded SNVs with 'N' before or after "," in context
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a short word about why you exclude these? I guess it is not obvious because it was added to the SNV workflow only after years of operation.

if 'N,' in split_line[header.index("SEQUENCE_CONTEXT")] or ',N' in split_line[header.index("SEQUENCE_CONTEXT")]:
NagaComBio marked this conversation as resolved.
Show resolved Hide resolved
continue

chrom = split_line[header.index("CHROM")]
pos = int(split_line[header.index("POS")])
context = ""
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@ module load bedtools/"${BEDTOOLS_VERSION:?BEDTOOLS_VERSION undefined}"
module load pypy/"${PYPY_VERSION:?PYPY_VERSION undefined}"

set +ue
source /dkfz/cluster/virtualenvs/paramasi/python_2.7.9_SNVCalling_pysam_0.16.0.1/bin/activate
source /omics/groups/OE0246/shared/paramasi/compute_env/python_2.7.9_SNVCalling_pysam_0.16.0.1/bin/activate
vinjana marked this conversation as resolved.
Show resolved Hide resolved
set -ue

export BGZIP_BINARY=bgzip
Expand Down
87 changes: 46 additions & 41 deletions resources/analysisTools/snvPipeline/filter_PEoverlap.py
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you try limiting the (heap) memory consumption of this Python process. See here.

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion resources/analysisTools/snvPipeline/filter_vcf.sh
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,7 @@ getRefGenomeAndChrPrefixFromHeader ${TUMOR_BAMFILE_FULLPATH_BP} # Sets CHR_PREFI
ALIGNMENT_FOLDER=`dirname ${TUMOR_BAMFILE_FULLPATH_BP}`

# bugfix: ensure to interpret CHROMOSOME_INDICES as array - otherwise TOOL_INTERMUTATION_DISTANCE_COORD_COLOR will fail...
declare -a CHROMOSOME_INDICES="${CHROMOSOME_INDICES}"
declare -a CHROMOSOME_INDICES="${CHROMOSOME_INDICES_PLOTTING}"
NagaComBio marked this conversation as resolved.
Show resolved Hide resolved
numberOfChromosomes=${CHROMOSOME_INDICES[@]}
outputFilenamePrefix=${mpileupDirectory}/${SNVFILE_PREFIX}${PID}
outputFilenamePrefix_original=${outputFilenamePrefix}
Expand Down
132 changes: 63 additions & 69 deletions resources/analysisTools/snvPipeline/in_dbSNPcounter.pl
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@
{
$minscore = 8;
}
open (FH, $file) or die "Could not open $file: $!\n";
open (my $FH, $file) or die "Could not open $file: $!\n";

# count all, somatic, s. with minconfidence, in dbSNP, in 1KG
my $all = 0;
Expand All @@ -37,85 +37,80 @@


my $header = "";
while ($header = <FH>)
{
last if ($header =~ /^\#CHROM/); # that is the line with the column names
}
chomp $header;

my $DBSBP = 13;
my $KG = 14;
my $CONFIDENCE = 28;
my $CANNOTATION = -1;
my $RECLASSIFICATION = -1;
my $CLASSIFICATION;
my ($DBSBP, $KG, $CONFIDENCE, $CLASSIFICATION);

@help = split(/\t/, $header);
for (my $c = 0; $c < @help; $c++)
{
# fixed column labels (from vcf_pileup_compare)
if ($help[$c] eq "CONFIDENCE")
{
$CONFIDENCE = $c;
print STDERR "CONFIDENCE in column $c\n";
}
if ($help[$c] eq "1K_GENOMES")
while(!eof($FH)){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect!

my $line = readline($FH) || die "Could not read line: $?";
chomp $line;
if ($line =~ /^#/)
{
$KG = $c;
print STDERR "INFO in column $c\n";
}
if ($help[$c] eq "DBSNP")
{
$DBSBP = $c;
print STDERR "DBSNP_COL in column $c\n";
}
if ($help[$c] eq "ANNOTATION_control")
{
$CANNOTATION = $c;
print STDERR "ANNOTATION_control in column $c\n";
}
if ($help[$c] eq "RECLASSIFICATION")
{
$RECLASSIFICATION = $c;
print STDERR "RECLASSIFICATION in column $c\n";
}
}
#print STDOUT "HELLO\n";
if ($line =~/^#CHROM/){
chomp $line;

if ($CANNOTATION > 0) {
$CLASSIFICATION = $CANNOTATION;
} else {
$CLASSIFICATION = $RECLASSIFICATION;
}
my @head=split("\t", $line);
my %col;

# I know that it's evilly hardcoded!
while (<FH>)
{
if ($_ =~ /^#/) # skip header lines
{
next;
}
$all++;
@help = split ("\t", $_);
if ($help[$CLASSIFICATION] =~ /somatic/)
{
$somatic++;
if ($help[$CONFIDENCE] >= $minscore)
{
$scored++;
#print STDERR $_;
if ($help[$DBSBP] =~ /MATCH=exact/)
my $i = 0;
while($i < @head)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure. I try to avoid this and always use explicit cast

Suggested change
while($i < @head)
while($i < scalar(@head))

{
$dbSNP++;
if($head[$i] eq "DBSNP"){
$col{"DBSNP"} = $i;
print STDERR "DBSNP in column ", $i+1,"\n";
}
if($head[$i] eq "1K_GENOMES"){
$col{"1K_GENOMES"} = $i;
print STDERR "1K_GENOMES in column ", $i+1,"\n";
}
if($head[$i] eq "RECLASSIFICATION"){
$col{"RECLASSIFICATION"} = $i;
print STDERR "RECLASSIFICATION in column ", $i+1,"\n";
}
if($head[$i] eq "ANNOTATION_control"){
$col{"ANNOTATION_control"} = $i;
print STDERR "ANNOTATION_control in column ", $i+1,"\n";
}
if($head[$i] eq "CONFIDENCE"){
$col{"CONFIDENCE"} = $i;
print STDERR "CONFIDENCE in column ", $i+1,"\n";
}
$i++;
}
if ($help[$KG] =~ /MATCH=exact/)

$DBSBP = $col{"DBSNP"};
$KG = $col{"1K_GENOMES"};
$CONFIDENCE = $col{"CONFIDENCE"};

if ($col{"ANNOTATION_control"} > 0) {
$CLASSIFICATION = $col{"ANNOTATION_control"};
} else {
$CLASSIFICATION = $col{"RECLASSIFICATION"};
}
}
}
else{
$all++;
@help = split ("\t", $line);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also rename this to @line. I think, it also has advantages if two variables that differ only by the type, but not by the content (in principle) are called the same. Some for @head above.

if ($help[$CLASSIFICATION] =~ /somatic/)
NagaComBio marked this conversation as resolved.
Show resolved Hide resolved
{
$somatic++;
if ($help[$CONFIDENCE] >= $minscore)
{
$oneKG++;
$scored++;
if ($help[$DBSBP] =~ /MATCH=exact/)
{
$dbSNP++;
}
if ($help[$KG] =~ /MATCH=exact/)
{
$oneKG++;
}
}
}
}
}
close $FH;

close FH;
if ($scored > 0)
{
$perc_dbSNP = sprintf ("%.2f", ($dbSNP/$scored*100));
Expand All @@ -126,6 +121,5 @@
else # there might be no ...
{
die "!!! There are no SNVs with confidence >= $minscore in the input, it makes no sense to do further analyses with the current settings on these data!!!\n";

}
exit;
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,14 @@ c.array <- unlist(strsplit(opt$chrArray, ',', fixed = TRUE))
data <- read.table(opt$chrLengthFile, header=FALSE)
chrLength <- data.frame(data)
rownames(chrLength) <- chrLength$V1

if(opt$chrPrefix != "" &&
grepl(opt$chrPrefix, chrLength$V1[1]) &&
!grepl(opt$chrPrefix, c.array[1])){

c.array <- paste0(opt$chrPrefix, c.array, opt$chrSuffix)
}

xtotal <- sum(chrLength[c.array,2]/10)
xoffsets <- c(0)

Expand Down Expand Up @@ -84,7 +92,7 @@ pdf(opt$outFilePrefix, width=20, height=8)
plotColors = c("CA"="blue","CG"="black","CT"="red","TA"="purple","TC"="orange","TG"="green")
baseChanges <- c("CA", "CG", "CT", "TA", "TC","TG")

c.name <- paste0(opt$chrPrefix, c.array[1], opt$chrSuffix)
c.name <- c.array[1]

sel <- which(dat$diff > 0)
if(length(sel)){
Expand All @@ -101,7 +109,7 @@ if(length(sel)){
# chromosomes <- c(2:22, 'X')

for (chr in c.array) {
c.name <- paste0(opt$chrPrefix, chr, opt$chrSuffix)
c.name <- chr

sel <- which(dat$X.CHROM == c.name & dat$diff > 0)
points((dat$POS[sel]/10+xoffset)/xtotal, log10(dat$diff[sel]), col = plotColors[dat$change[sel]], pch=16,cex=0.8)
Expand Down
25 changes: 14 additions & 11 deletions resources/analysisTools/snvPipeline/snvAnnotation.sh
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@ declare -r outputDirectory=`dirname ${FILENAME_VCF_OUT}`
relinked=false
# (Re-)link bam file and index. The pipeline cannot handle i.e. .mdup.bam and .mdup.bai
# Look up the index with .bam.bai. If the file exists nothing happens.
if [[ ! -f ${TUMOR_BAMFILE_FULLPATH_BP}.bai ]]
if [[ ! -f ${TUMOR_BAMFILE_FULLPATH_BP}.bai && ! -f ${TUMOR_BAMFILE_FULLPATH_BP}.crai ]]
NagaComBio marked this conversation as resolved.
Show resolved Hide resolved
then
shortBamIdx=`dirname ${TUMOR_BAMFILE_FULLPATH_BP}`"/"`basename ${TUMOR_BAMFILE_FULLPATH_BP} .bam`".bai"
[[ ! -f ${shortBamIdx} ]] && echo "Bam index file cannot be found!" && exit -15
Expand Down Expand Up @@ -91,12 +91,10 @@ fi

cmdFilter="${cmdFilter} | ${PERL_BINARY} ${TOOL_ANNOTATE_VCF_FILE} --tabix_bin=${TABIX_BINARY} -a - -b ${DBSNP} --columnName=${DBSNP_COL} --reportMatchType --bAdditionalColumn=2 --reportLevel 4 | \
${PERL_BINARY} ${TOOL_ANNOTATE_VCF_FILE} --tabix_bin=${TABIX_BINARY} -a - -b ${KGENOME} --columnName=${KGENOMES_COL} --reportMatchType --bAdditionalColumn=2 --reportLevel 4 | \
${PERL_BINARY} ${TOOL_ANNOTATE_VCF_FILE} --tabix_bin=${TABIX_BINARY} -a - -b ${ExAC} --columnName ${ExAC_COL} --bFileType vcf --reportMatchType --reportLevel 4 | \
${PERL_BINARY} ${TOOL_ANNOTATE_VCF_FILE} --tabix_bin=${TABIX_BINARY} -a - -b ${EVS} --columnName ${EVS_COL} --bFileType vcf --reportMatchType --reportLevel 4 | \
${PERL_BINARY} ${TOOL_ANNOTATE_VCF_FILE} -a - -b ${GNOMAD_WES_ALL_SNV} --columnName=${GNOMAD_WES_COL} --bFileType vcf --reportMatchType --reportLevel 4 | \
${PERL_BINARY} ${TOOL_ANNOTATE_VCF_FILE} -a - -b ${GNOMAD_WGS_ALL_SNV} --columnName=${GNOMAD_WGS_COL} --bFileType vcf --reportMatchType --reportLevel 4 | \
${PERL_BINARY} ${TOOL_ANNOTATE_VCF_FILE} --tabix_bin=${TABIX_BINARY} -a - -b ${GNOMAD_WES_ALL_SNV} --columnName=${GNOMAD_WES_COL} --bFileType vcf --reportMatchType --reportLevel 4 | \
${PERL_BINARY} ${TOOL_ANNOTATE_VCF_FILE} --tabix_bin=${TABIX_BINARY} -a - -b ${GNOMAD_WGS_ALL_SNV} --columnName=${GNOMAD_WGS_COL} --bFileType vcf --reportMatchType --reportLevel 4 | \
${PERL_BINARY} ${TOOL_ANNOTATE_VCF_FILE} --tabix_bin=${TABIX_BINARY} -a - -b ${LOCALCONTROL_WGS} --columnName ${LOCALCONTROL_WGS_COL} --bFileType vcf --reportMatchType --reportLevel 4 | \
${PERL_BINARY} ${TOOL_ANNOTATE_VCF_FILE} --tabix_bin=${TABIX_BINARY} -a - -b ${LOCALCONTROL_WES} --columnName ${LOCALCONTROL_WES_COL} --bFileType vcf --reportMatchType --reportLevel 4"
${PERL_BINARY} ${TOOL_ANNOTATE_VCF_FILE} --tabix_bin=${TABIX_BINARY} -a - -b ${LOCALCONTROL_WES} --columnName ${LOCALCONTROL_WES_COL} --bFileType vcf --reportMatchType --reportLevel 4 "

if [[ -f ${RECURRENCE} ]]; then
cmdFilter="${cmdFilter} | ${PERL_BINARY} ${TOOL_ANNOTATE_VCF_FILE} -a - -b ${RECURRENCE} --columnName ${RECURRENCE_COL} --bFileType vcf"
Expand Down Expand Up @@ -199,7 +197,8 @@ mkfifo ${npConfidence}
if [[ ${isNoControlWorkflow-false} == "false" ]]; then
cat ${filenameSNVVCF} > ${npConfidence} &
else
cat ${filenameSNVVCF} | ${PYPY_OR_PYTHON_BINARY} -u ${TOOL_CONFIDENCE_ANNOTATION} ${noControlFlag} -i - ${CONFIDENCE_OPTS} -a 0 > ${npConfidence} &
cat ${filenameSNVVCF} | ${PYPY_OR_PYTHON_BINARY} -u ${TOOL_CONFIDENCE_ANNOTATION} ${noControlFlag} -i - ${CONFIDENCE_OPTS} -a 0 \
--gnomAD_WGS_maxMAF=${CRIT_GNOMAD_GENOMES_maxMAF} --gnomAD_WES_maxMAF=${CRIT_GNOMAD_EXOMES_maxMAF} --localControl_WGS_maxMAF=${CRIT_LOCALCONTROL_maxMAF} --localControl_WES_maxMAF=${CRIT_LOCALCONTROL_maxMAF} --1000genome_maxMAF=${CRIT_1KGENOMES_maxMAF} > ${npConfidence} &
fi

# create BaseScore FIFOs and their consumer processes (zip and write to target file)
Expand All @@ -214,7 +213,8 @@ cat ${filenameReferenceAlleleReadPositions}_NP | ${BGZIP_BINARY} -f >${filenameR
if [[ ${runArtifactFilter-true} == true ]]
then
cat ${npConfidence} | filterPeOverlap ${noControlFlag} --alignmentFile=${tumorbamfullpath} --mapq=$mapqual --baseq=$basequal --qualityScore=phred --maxNumberOfMismatchesInRead=${NUMBER_OF_MISMACTHES_THRESHOLD--1} --altBaseQualFile=${filenameAlternativeAlleleBaseScores}_NP --refBaseQualFile=${filenameReferenceAlleleBaseScores}_NP --altBasePositionsFile=${filenameAlternativeAlleleReadPositions}_NP --refBasePositionsFile ${filenameReferenceAlleleReadPositions}_NP --referenceFile=${REFERENCE_GENOME} \
| ${PYPY_OR_PYTHON_BINARY} -u ${TOOL_CONFIDENCE_ANNOTATION} ${noControlFlag} -i - ${CONFIDENCE_OPTS} -a 0 -f ${filenameSomaticSNVsTmp} > ${filenameSNVVCFTemp}.tmp
| ${PYPY_OR_PYTHON_BINARY} -u ${TOOL_CONFIDENCE_ANNOTATION} ${noControlFlag} -i - ${CONFIDENCE_OPTS} -a 0 -f ${filenameSomaticSNVsTmp} \
--gnomAD_WGS_maxMAF=${CRIT_GNOMAD_GENOMES_maxMAF} --gnomAD_WES_maxMAF=${CRIT_GNOMAD_EXOMES_maxMAF} --localControl_WGS_maxMAF=${CRIT_LOCALCONTROL_maxMAF} --localControl_WES_maxMAF=${CRIT_LOCALCONTROL_maxMAF} --1000genome_maxMAF=${CRIT_1KGENOMES_maxMAF} > ${filenameSNVVCFTemp}.tmp

[[ $? != 0 ]] && echo "Error in first iteration of confidence annotation" && exit 2

Expand Down Expand Up @@ -243,7 +243,8 @@ then
fi

cat ${filenameSNVVCFTemp} | ${PYTHON_BINARY} ${TOOL_FLAG_BIAS} --vcfFile="/dev/stdin" --referenceFile=${REFERENCE_GENOME} --sequence_specificFile=${filenamePCRerrorMatrix} --sequencing_specificFile=${filenameSequencingErrorMatrix} --numReads=${nReads} --numMuts=${nMuts} --biasPValThreshold=${biasPValThreshold} --biasRatioThreshold=${biasRatioThreshold} --biasRatioMinimum=${biasRatioMinimum} --maxNumOppositeReadsSequencingWeakBias=${maxNumOppositeReadsSequencingWeakBias} --maxNumOppositeReadsSequenceWeakBias=${maxNumOppositeReadsSequenceWeakBias} --maxNumOppositeReadsSequencingStrongBias=${maxNumOppositeReadsSequencingStrongBias} --maxNumOppositeReadsSequenceStrongBias=${maxNumOppositeReadsSequenceStrongBias} --ratioVcf=${rVcf} --bias_matrixSeqFile=${filenameBiasMatrixSeqFile} --bias_matrixSeqingFile=${filenameBiasMatrixSeqingFile} --vcfFileFlagged="/dev/stdout" | \
${PYPY_OR_PYTHON_BINARY} -u ${TOOL_CONFIDENCE_ANNOTATION} ${noControlFlag} -i - ${CONFIDENCE_OPTS} -a 1 -f ${filenameSomaticSNVsTmp} > ${filenameSNVVCFTemp}.tmp
${PYPY_OR_PYTHON_BINARY} -u ${TOOL_CONFIDENCE_ANNOTATION} ${noControlFlag} -i - ${CONFIDENCE_OPTS} -a 1 -f ${filenameSomaticSNVsTmp} \
--gnomAD_WGS_maxMAF=${CRIT_GNOMAD_GENOMES_maxMAF} --gnomAD_WES_maxMAF=${CRIT_GNOMAD_EXOMES_maxMAF} --localControl_WGS_maxMAF=${CRIT_LOCALCONTROL_maxMAF} --localControl_WES_maxMAF=${CRIT_LOCALCONTROL_maxMAF} --1000genome_maxMAF=${CRIT_1KGENOMES_maxMAF} > ${filenameSNVVCFTemp}.tmp

[[ $? != 0 ]] && echo "Error in first filtering and/or second interation of confidence annotation" && exit 5

Expand All @@ -267,7 +268,8 @@ then
fi

cat ${filenameSNVVCFTemp} | ${PYTHON_BINARY} -u ${TOOL_FLAG_BIAS} --vcfFile="/dev/stdin" --referenceFile=${REFERENCE_GENOME} --sequence_specificFile=${filenamePCRerrorMatrix} --sequencing_specificFile=${filenameSequencingErrorMatrix} --numReads=${nReads} --numMuts=${nMuts} --biasPValThreshold=${biasPValThreshold} --biasRatioThreshold=${biasRatioThreshold} --biasRatioMinimum=${biasRatioMinimum} --maxNumOppositeReadsSequencingWeakBias=${maxNumOppositeReadsSequencingWeakBias} --maxNumOppositeReadsSequenceWeakBias=${maxNumOppositeReadsSequenceWeakBias} --maxNumOppositeReadsSequencingStrongBias=${maxNumOppositeReadsSequencingStrongBias} --maxNumOppositeReadsSequenceStrongBias=${maxNumOppositeReadsSequenceStrongBias} --ratioVcf=${rVcf} --bias_matrixSeqFile=${filenameBiasMatrixSeqFile} --bias_matrixSeqingFile=${filenameBiasMatrixSeqingFile} --vcfFileFlagged="/dev/stdout" | \
${PYPY_OR_PYTHON_BINARY} -u ${TOOL_CONFIDENCE_ANNOTATION} ${noControlFlag} -i - ${CONFIDENCE_OPTS_PANCAN} -a 2 > ${filenameSNVVCF}
${PYPY_OR_PYTHON_BINARY} -u ${TOOL_CONFIDENCE_ANNOTATION} ${noControlFlag} -i - ${CONFIDENCE_OPTS_PANCAN} -a 2 \
--gnomAD_WGS_maxMAF=${CRIT_GNOMAD_GENOMES_maxMAF} --gnomAD_WES_maxMAF=${CRIT_GNOMAD_EXOMES_maxMAF} --localControl_WGS_maxMAF=${CRIT_LOCALCONTROL_maxMAF} --localControl_WES_maxMAF=${CRIT_LOCALCONTROL_maxMAF} --1000genome_maxMAF=${CRIT_1KGENOMES_maxMAF} > ${filenameSNVVCF}

[[ $? != 0 ]] && echo "Error in second filtering and/or third iteration of confidence annotation" && exit 8

Expand All @@ -289,7 +291,8 @@ then
fi
else
cat ${npConfidence} | filterPeOverlap ${noControlFlag} --alignmentFile=${tumorbamfullpath} --mapq=$mapqual --baseq=$basequal --qualityScore=phred --maxNumberOfMismatchesInRead=${NUMBER_OF_MISMACTHES_THRESHOLD--1} --altBaseQualFile=${filenameAlternativeAlleleBaseScores}_NP --refBaseQualFile=${filenameReferenceAlleleBaseScores}_NP --altBasePositionsFile=${filenameAlternativeAlleleReadPositions}_NP --refBasePositionsFile=${filenameReferenceAlleleReadPositions}_NP --referenceFile=${REFERENCE_GENOME} | \
${PYPY_OR_PYTHON_BINARY} -u ${TOOL_CONFIDENCE_ANNOTATION} ${noControlFlag} -i - ${CONFIDENCE_OPTS_PANCAN} > ${filenameSNVVCFTemp}
${PYPY_OR_PYTHON_BINARY} -u ${TOOL_CONFIDENCE_ANNOTATION} ${noControlFlag} -i - ${CONFIDENCE_OPTS_PANCAN} \
--gnomAD_WGS_maxMAF=${CRIT_GNOMAD_GENOMES_maxMAF} --gnomAD_WES_maxMAF=${CRIT_GNOMAD_EXOMES_maxMAF} --localControl_WGS_maxMAF=${CRIT_LOCALCONTROL_maxMAF} --localControl_WES_maxMAF=${CRIT_LOCALCONTROL_maxMAF} --1000genome_maxMAF=${CRIT_1KGENOMES_maxMAF} > ${filenameSNVVCFTemp}

exitCode=$?
[[ $exitCode == 0 ]] && [[ -f ${filenameSNVVCFTemp} ]] && mv ${filenameSNVVCFTemp} ${filenameSNVVCF}
Expand Down
Loading