Hg19 and hg38 analysis from same workflow branch #8
base: master
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,6 @@ | ||
# DKFZ SNVCalling Workflow | ||
|
||
An SNV calling workflow developed in the Applied Bioinformatics and Theoretical Bioinformatics groups at the DKFZ. An earlier version (pre Github) of this workflow was used in the [Pancancer](https://dockstore.org/containers/quay.io/pancancer/pcawg-dkfz-workflow) project. The workflow is only suited for human data (hg37/hg19; some later versions also hg38), because of the big role annotations play in this workflow. | ||
An SNV calling workflow developed in the Applied Bioinformatics and Theoretical Bioinformatics groups at the DKFZ. An earlier version (pre Github) of this workflow was used in the [Pancancer](https://dockstore.org/containers/quay.io/pancancer/pcawg-dkfz-workflow) project. The workflow is only suited for human data (hg37/GRCh37/hg19; and also for hg38/GRCh38), because of the big role annotations play in this workflow. | ||
|
||
> <table><tr><td><a href="https://www.denbi.de/"><img src="docs/images/denbi.png" alt="de.NBI logo" width="300" align="left"></a></td><td><strong>Your opinion matters!</strong> The development of this workflow is supported by the <a href="https://www.denbi.de/">German Network for Bioinformatic Infrastructure (de.NBI)</a>. By completing <a href="https://www.surveymonkey.de/r/denbi-service?sc=hd-hub&tool=SNVCallingWorkflow">this very short (30-60 seconds) survey</a> you support our efforts to improve this tool.</td></tr></table> | ||
|
||
|
@@ -114,6 +114,8 @@ Please have a look at the `resources/configurationFiles/analysisSNVCalling.xml` | |
|
||
## Example Call | ||
|
||
For hg19, | ||
|
||
```bash | ||
roddy.sh run projectConfigurationName@analysisName patientId \ | ||
--useconfig=/path/to/your/applicationProperties.ini \ | ||
|
@@ -123,6 +125,22 @@ roddy.sh run projectConfigurationName@analysisName patientId \ | |
--cvalues="bamfile_list:/path/to/your/control.bam;/path/to/your/tumor.bam,sample_list:normal;tumor,possibleTumorSampleNamePrefixes:tumor,possibleControlSampleNamePrefixes:normal,REFERENCE_GENOME:/reference/data/hs37d5_PhiX.fa,CHROMOSOME_LENGTH_FILE:/reference/data/hs37d5_PhiX.chromSizes,extractSamplesFromOutputFiles:false" | ||
``` | ||
|
||
For hg38, add the following workflow configs to the `projectConfigs` | ||
|
||
```xml | ||
<availableAnalyses> | ||
<analysis id="snvCalling" configuration="snvCallingAnalysisGRCh38" useplugin="SNVCallingWorkflow:3.0.0"/> | ||
</availableAnalyses> | ||
``` | ||
|
||
```bash | ||
roddy.sh run projectConfigurationName@analysisName patientId \ | ||
--useconfig=/path/to/your/applicationProperties.ini \ | ||
--configurationDirectories=/path/to/your/projectConfigs \ | ||
--useiodir=/input/directory,/output/directory/snv \ | ||
--cvalues="bamfile_list:/path/to/your/control.bam;/path/to/your/tumor.bam,sample_list:normal;tumor,possibleTumorSampleNamePrefixes:tumor,possibleControlSampleNamePrefixes:normal,REFERENCE_GENOME:/reference/data/GRCh38_decoy_ALT_HLA_PhiX.fa,CHROMOSOME_LENGTH_FILE:/reference/data/GRCh38_decoy_ALT_HLA_PhiX.chromSizes,extractSamplesFromOutputFiles:false" | ||
``` | ||
|
||
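The `--cvalues` argument in both example calls packs `key:value` pairs separated by commas, with semicolons separating list items such as the BAM files. A minimal Python sketch for assembling such a string, assuming that format holds in general (it is inferred only from the example calls; `build_cvalues` is a hypothetical helper and not part of Roddy):

```python
def build_cvalues(values):
    """Join configuration values into a --cvalues style string.

    Assumption: pairs are written 'key:value' and joined by ',', and list
    values are joined by ';'. This is inferred from the example calls only.
    """
    parts = []
    for key, value in values.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        elif isinstance(value, (list, tuple)):
            value = ";".join(str(item) for item in value)
        parts.append("{}:{}".format(key, value))
    return ",".join(parts)


# Values copied from the hg38 example call.
hg38_values = {
    "bamfile_list": ["/path/to/your/control.bam", "/path/to/your/tumor.bam"],
    "sample_list": ["normal", "tumor"],
    "possibleTumorSampleNamePrefixes": "tumor",
    "possibleControlSampleNamePrefixes": "normal",
    "REFERENCE_GENOME": "/reference/data/GRCh38_decoy_ALT_HLA_PhiX.fa",
    "CHROMOSOME_LENGTH_FILE": "/reference/data/GRCh38_decoy_ALT_HLA_PhiX.chromSizes",
    "extractSamplesFromOutputFiles": False,
}

print('--cvalues="{}"'.format(build_cvalues(hg38_values)))
```

Switching between the hg19 and hg38 calls then only means changing the `REFERENCE_GENOME` and `CHROMOSOME_LENGTH_FILE` entries.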
### No Control | ||
|
||
* Set the parameter `isNoControlWorkflow` to `true`. | ||
|
@@ -157,6 +175,20 @@ The optional configuration JSON file defaults to the `convertToStdVCF.json` resi | |
|
||
* patch: Remove all code related to PyPy and hts-python (including `copysam.py` and `PYPY_OR_PYTHON_BINARY`) | ||
|
||
|
||
* 3.0.0 | ||
|
||
* Major | ||
* Support for hg38/GRCh38 reference genome and variant calling from ALT and HLA contigs. | ||
* Minor | ||
* For hg38: Removing mappability and repeat elements' annotations from penalty calculations. | ||
Review comment: Are there any new optional configuration values? (At least list them.) |
||
* skipREMAP: Option to remove repeat elements and mappability from confidence annotations in hg19. | ||
* Removing EVS And ExAC AF from the annotations and no-control workflow filtering | ||
* Support for variant calling from CRAM files | ||
* Bug fix: Removing "quote" around the raw filter option `<RAW_SNV_FILTER_OPTIONS>` | ||
Review comment: Right. But this is a technical/code description. What was the effect of the bug on the user? (probably that multiple |
||
* Update `COWorkflowsBasePlugin` to `1.4.2` | ||
|
||
|
||
* 2.2.0 | ||
|
||
* minor: Update virtualenv | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,4 @@ | ||
dependson=COWorkflowsBasePlugin:1.3.0 | ||
dependson=COWorkflowsBasePlugin:1.4.2 | ||
JDKVersion=1.8 | ||
GroovyVersion=2.4 | ||
RoddyAPIVersion=3.4 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,2 @@ | ||
2.1 | ||
3.0 | ||
0 |
Review comment: This is passing over the VCF once. Not sure how much memory it uses, but maybe it is worthwhile limiting the memory with this approach. |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -251,6 +251,10 @@ def parseVcf(file,num): | |
while (l!= ""): | ||
t=l.split('\t') | ||
if (t[0][0] != "#") and isValid(t): | ||
# Skipping the non-primary assembly variants from purity calculations | ||
if t[0].startswith('HLA') or t[0].endswith('_alt'): | ||
Review comment: BTW (2 lines up): |
||
l=vcf.readline() | ||
continue | ||
Review comment: What is |
||
i = chromMap[t[0]] | ||
if (t[12]=="germline"): | ||
#DP5=string.split(string.split(t[11],";")[1],",") | ||
|
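The contig test added in this hunk can be read in isolation. The sketch below is a minimal Python illustration, assuming tab-separated VCF lines with the contig name in the first column; `is_non_primary_contig` and `primary_records` are hypothetical names, and the `isValid` check and purity logic from `parseVcf` are left out:

```python
def is_non_primary_contig(chrom):
    """Return True for HLA and ALT contigs, using the same name test as the diff."""
    return chrom.startswith("HLA") or chrom.endswith("_alt")


def primary_records(lines):
    """Yield split VCF records on primary-assembly contigs, one line at a time."""
    for line in lines:
        # Skip blank lines and header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if is_non_primary_contig(fields[0]):
            continue
        yield fields


# Made-up example records; only the contig names matter here.
example_lines = [
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "chr1\t100\t.\tA\tT\n",
    "HLA-A*01:01:01:01\t5\t.\tC\tG\n",
    "chr1_KI270762v1_alt\t7\t.\tG\tA\n",
]

kept = [record[0] for record in primary_records(example_lines)]
print(kept)
```

Because `primary_records` is a generator, it holds one line at a time rather than the whole file.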
Large diffs are not rendered by default.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -31,7 +31,7 @@ def calculateLogSize(size, max_val, base): | |
return math.log(((size/max_val)*(base-1.))+1., base) | ||
|
||
def calculateRootedSize(size, max_val): | ||
if(float(size) != 0.0): | ||
if(float(size) != 0.0 and float(max_val) != 0.0): | ||
return np.sqrt(size/max_val) | ||
else: | ||
return 0.0 | ||
|
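The extra `max_val` check guards the division: with plain Python numbers, `size/max_val` raises `ZeroDivisionError` when `max_val` is zero. A standalone sketch of the guarded version, assuming plain numeric inputs and using `math.sqrt` in place of `np.sqrt` so it runs without NumPy (`calculate_rooted_size` is an illustrative rename, not the function in the repository):

```python
import math


def calculate_rooted_size(size, max_val):
    """Square-root scaling of size relative to max_val, returning 0.0 if either is zero."""
    if float(size) != 0.0 and float(max_val) != 0.0:
        return math.sqrt(float(size) / float(max_val))
    return 0.0


print(calculate_rooted_size(4, 16))
print(calculate_rooted_size(4, 0))
```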
@@ -199,6 +199,10 @@ def calculateErrorMatrix(vcfFilename, referenceFilename, errorType): | |
# 23.05.2016 JB: Excluded multiallelic SNVs | ||
if ',' in split_line[header.index("ALT")]: continue | ||
|
||
# 21.02.2023 NP: Excluded SNVs with 'N' before or after "," in context | ||
Review comment: Could you add a short word about why you exclude these? I guess it is not obvious because it was added to the SNV workflow only after years of operation. |
||
if {'N,', ',N'}.intersection(split_line[header.index("SEQUENCE_CONTEXT")]): | ||
continue | ||
|
||
chrom = split_line[header.index("CHROM")] | ||
pos = int(split_line[header.index("POS")]) | ||
context = "" | ||
|
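One point that may be worth checking in the added `SEQUENCE_CONTEXT` test: `set.intersection` iterates over its argument, and iterating over a string yields single characters, so two-character members such as `'N,'` can never match. A small Python sketch of the difference, assuming the field is a string of the form `left,right` (the example values are made up, and `has_n_next_to_comma` is a hypothetical helper):

```python
def has_n_next_to_comma(sequence_context):
    """Return True if an 'N' sits directly before or after the comma."""
    return "N," in sequence_context or ",N" in sequence_context


context = "ACGTACGTAN,CGTACGTACG"

# Intersecting with a string compares against its single characters,
# so the two-character patterns are never found.
by_intersection = {"N,", ",N"}.intersection(context)

# A substring test finds the pattern.
by_substring = has_n_next_to_comma(context)

print(by_intersection, by_substring)
```

If the intent is to skip such records, a substring test along these lines is one option.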
Review comment: Are there any new mandatory configuration values? At least list them; details should be in the XML.
Were the semantics of any configuration values changed?