Commit

Merge pull request #211 from nf-core/dev
PR for release 2.4.0
ggabernet committed Dec 6, 2022
2 parents 8c1185f + 04c0566 commit a6fdad9
Showing 31 changed files with 669 additions and 149 deletions.
4 changes: 4 additions & 0 deletions .github/workflows/awsfulltest.yml
@@ -25,3 +25,7 @@ jobs:
"outdir": "s3://${{ secrets.AWS_S3_BUCKET }}/airrflow/results-${{ github.sha }}"
}
profiles: test_full,aws_tower
- uses: actions/upload-artifact@v3
with:
name: Tower debug log file
path: tower_action_*.log
4 changes: 4 additions & 0 deletions .github/workflows/awstest.yml
@@ -23,3 +23,7 @@ jobs:
"outdir": "s3://${{ secrets.AWS_S3_BUCKET }}/airrflow/results-test-${{ github.sha }}"
}
profiles: test,aws_tower
- uses: actions/upload-artifact@v3
with:
name: Tower debug log file
path: tower_action_*.log
1 change: 1 addition & 0 deletions .prettierignore
@@ -1,4 +1,5 @@
email_template.html
adaptivecard.json
.nextflow*
work/
data/
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,14 @@
The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).

## [2.4.0] - 2022-12-05 "Aparecium"

### `Added`

- [#209](https://github.com/nf-core/airrflow/pull/209) Template update to nf-core tools v2.6.
- [#210](https://github.com/nf-core/airrflow/pull/210) Add fastp for read QC, adapter trimming and read clipping.
- [#212](https://github.com/nf-core/airrflow/pull/212) Bump versions to 2.4.0

## [2.3.0] - 2022-09-22 "Expelliarmus"

### `Added`
56 changes: 0 additions & 56 deletions CITATION.cff

This file was deleted.

4 changes: 4 additions & 0 deletions CITATIONS.md
@@ -12,6 +12,10 @@

- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

- [Fastp](https://doi.org/10.1093/bioinformatics/bty560)

> Shifu Chen, Yanqing Zhou, Yaru Chen, Jia Gu, fastp: an ultra-fast all-in-one FASTQ preprocessor, Bioinformatics. 2018 Sept 1; 34(17):i884–i890. doi: 10.1093/bioinformatics/bty560.

- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)

> Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.
3 changes: 2 additions & 1 deletion README.md
@@ -24,7 +24,7 @@ On release, automated continuous integration tests run the pipeline on a full-si

By default, the pipeline currently performs the following steps:

- Raw read quality control (`FastQC`)
- Raw read quality control, adapter trimming and read clipping (`fastp`)
- Pre-processing (`pRESTO`)
- Filtering sequences by sequencing quality.
- Masking amplicon primers.
@@ -35,6 +35,7 @@ By default, the pipeline currently performs the following steps:
- Assembling R1 and R2 read mates.
- Removing and annotating read duplicates with different UMI barcodes.
- Filtering out sequences that do not have at least 2 duplicates.
- Post-assembly read quality control (`FastQC`)
- Assigning gene segment alleles with `IgBlast` using the IMGT database (`Change-O`).
- Finding the Hamming distance threshold for clone definition (`SHazaM`).
- Clonal assignment: defining clonal lineages of the B-cell / T-cell populations (`Change-O`).
67 changes: 67 additions & 0 deletions assets/adaptivecard.json
@@ -0,0 +1,67 @@
{
"type": "message",
"attachments": [
{
"contentType": "application/vnd.microsoft.card.adaptive",
"contentUrl": null,
"content": {
"\$schema": "http://adaptivecards.io/schemas/adaptive-card.json",
"msteams": {
"width": "Full"
},
"type": "AdaptiveCard",
"version": "1.2",
"body": [
{
"type": "TextBlock",
"size": "Large",
"weight": "Bolder",
"color": "<% if (success) { %>Good<% } else { %>Attention<%} %>",
"text": "nf-core/airrflow v${version} - ${runName}",
"wrap": true
},
{
"type": "TextBlock",
"spacing": "None",
"text": "Completed at ${dateComplete} (duration: ${duration})",
"isSubtle": true,
"wrap": true
},
{
"type": "TextBlock",
"text": "<% if (success) { %>Pipeline completed successfully!<% } else { %>Pipeline completed with errors. The full error message was: ${errorReport}.<% } %>",
"wrap": true
},
{
"type": "TextBlock",
"text": "The command used to launch the workflow was as follows:",
"wrap": true
},
{
"type": "TextBlock",
"text": "${commandLine}",
"isSubtle": true,
"wrap": true
}
],
"actions": [
{
"type": "Action.ShowCard",
"title": "Pipeline Configuration",
"card": {
"type": "AdaptiveCard",
"\$schema": "http://adaptivecards.io/schemas/adaptive-card.json",
"body": [
{
"type": "FactSet",
"facts": [<% out << summary.collect{ k,v -> "{\"title\": \"$k\", \"value\" : \"$v\"}"}.join(",\n") %>
]
}
]
}
}
]
}
}
]
}
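The `facts` array in the template above is produced by interpolating each key/value pair of `summary` directly into a JSON string, so a value containing a double quote or backslash would have to be escaped for the card to stay valid JSON. The following Python sketch illustrates the same mapping with escaping handled by a JSON serializer. It is not part of the pipeline; `build_facts` and the sample `summary` values are hypothetical names and data chosen for illustration.

```python
import json


def build_facts(summary):
    """Turn a summary mapping into Adaptive Card FactSet entries."""
    return [{"title": str(key), "value": str(value)} for key, value in summary.items()]


# Hypothetical summary values, for illustration only.
summary = {
    "runName": "test_run",
    "outdir": "./results",
    "note": 'value with "quotes"',
}

fact_set = {"type": "FactSet", "facts": build_facts(summary)}

# json.dumps escapes quotes and backslashes, so the output is always valid JSON.
rendered = json.dumps(fact_set, indent=2)
print(rendered)
```

Serializing the whole structure, rather than concatenating strings, is one way to keep the message well-formed regardless of what the summary contains.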
25 changes: 25 additions & 0 deletions assets/methods_description_template.yml
@@ -0,0 +1,25 @@
id: "nf-core-airrflow-methods-description"
description: "Suggested text and references to use when describing pipeline usage within the methods section of a publication."
section_name: "nf-core/airrflow Methods Description"
section_href: "https://github.com/nf-core/airrflow"
plot_type: "html"
## TODO nf-core: Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
## You can inject any metadata in the Nextflow '${workflow}' object
data: |
<h4>Methods</h4>
<p>Data was processed using nf-core/airrflow v${workflow.manifest.version} ${doi_text} of the nf-core collection of workflows (<a href="https://doi.org/10.1038/s41587-020-0439-x">Ewels <em>et al.</em>, 2020</a>).</p>
<p>The pipeline was executed with Nextflow v${workflow.nextflow.version} (<a href="https://doi.org/10.1038/nbt.3820">Di Tommaso <em>et al.</em>, 2017</a>) with the following command:</p>
<pre><code>${workflow.commandLine}</code></pre>
<h4>References</h4>
<ul>
<li>Di Tommaso, P., Chatzou, M., Floden, E. W., Barja, P. P., Palumbo, E., & Notredame, C. (2017). Nextflow enables reproducible computational workflows. Nature Biotechnology, 35(4), 316-319. <a href="https://doi.org/10.1038/nbt.3820">https://doi.org/10.1038/nbt.3820</a></li>
<li>Ewels, P. A., Peltzer, A., Fillinger, S., Patel, H., Alneberg, J., Wilm, A., Garcia, M. U., Di Tommaso, P., & Nahnsen, S. (2020). The nf-core framework for community-curated bioinformatics pipelines. Nature Biotechnology, 38(3), 276-278. <a href="https://doi.org/10.1038/s41587-020-0439-x">https://doi.org/10.1038/s41587-020-0439-x</a></li>
</ul>
<div class="alert alert-info">
<h5>Notes:</h5>
<ul>
${nodoi_text}
<li>The command above does not include parameters contained in any configs or profiles that may have been used. Ensure the config file is also uploaded with your publication!</li>
<li>You should also cite all software used within this run. Check the "Software Versions" of this report to get version information.</li>
</ul>
</div>
6 changes: 4 additions & 2 deletions assets/multiqc_config.yml
@@ -19,9 +19,11 @@ module_order:
- "./*_ASSEMBLED_fastqc.zip"

report_section_order:
software_versions:
"nf-core-airrflow-methods-description":
order: -1000
nf-core-airrflow-summary:
software_versions:
order: -1001
"nf-core-airrflow-summary":
order: -1002

export_plots: true
32 changes: 30 additions & 2 deletions conf/modules.config
@@ -36,8 +36,36 @@ process {
]
}

withName: FASTQC {
ext.args = '--quiet'
withName: 'FASTP' {
publishDir = [
[
path: { "${params.outdir}/fastp/${meta.id}" },
mode: params.publish_dir_mode,
pattern: "*.{html,json,log}"
],
[
enabled: params.save_trimmed,
path: { "${params.outdir}/fastp/${meta.id}/" },
mode: params.publish_dir_mode,
pattern: "*.fastp.fastq.gz"
]
]
ext.args = [ "--disable_quality_filtering --disable_length_filtering",
params.trim_fastq ? "" : "--disable_adapter_trimming",
params.clip_r1 > 0 ? "--trim_front1 ${params.clip_r1}" : "", // Remove bp from the 5' end of read 1
params.clip_r2 > 0 ? "--trim_front2 ${params.clip_r2}" : "", // Remove bp from the 5' end of read 2
params.three_prime_clip_r1 > 0 ? "--trim_tail1 ${params.three_prime_clip_r1}" : "", // Remove bp from the 3' end of read 1 AFTER adapter/quality trimming has been performed
params.three_prime_clip_r2 > 0 ? "--trim_tail2 ${params.three_prime_clip_r2}" : "", // Remove bp from the 3' end of read 2 AFTER adapter/quality trimming has been performed
params.trim_nextseq ? "--trim_poly_g" : "", // Trim poly-G tails, which are common in NextSeq and NovaSeq data
].join(" ").trim()
}

withName: 'GUNZIP_*' {
publishDir = [
[
enabled: false
]
]
}

withName: FASTQC_POSTASSEMBLY {
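The way the `FASTP` block in `conf/modules.config` assembles `ext.args` from pipeline parameters can be sketched in Python as follows. This is an illustration rather than pipeline code, and it assumes the intent is that `--disable_adapter_trimming` is passed only when `params.trim_fastq` is false. The function name `fastp_args` and the dictionary-based `params` are hypothetical. Unlike the Groovy `join(" ").trim()`, the sketch drops empty entries so the result contains single spaces only.

```python
def fastp_args(params):
    """Assemble a fastp argument string from pipeline-style parameters."""
    parts = [
        # Quality and length filtering are always disabled.
        "--disable_quality_filtering --disable_length_filtering",
        # Adapter trimming is disabled unless trim_fastq is set (assumed intent).
        "" if params.get("trim_fastq") else "--disable_adapter_trimming",
        # Clip bases from the 5' end of read 1 / read 2.
        f"--trim_front1 {params['clip_r1']}" if params.get("clip_r1", 0) > 0 else "",
        f"--trim_front2 {params['clip_r2']}" if params.get("clip_r2", 0) > 0 else "",
        # Clip bases from the 3' end of read 1 / read 2.
        f"--trim_tail1 {params['three_prime_clip_r1']}"
        if params.get("three_prime_clip_r1", 0) > 0 else "",
        f"--trim_tail2 {params['three_prime_clip_r2']}"
        if params.get("three_prime_clip_r2", 0) > 0 else "",
        # Force poly-G tail trimming.
        "--trim_poly_g" if params.get("trim_nextseq") else "",
    ]
    # Drop empty entries so the options are separated by single spaces.
    return " ".join(part for part in parts if part)


# Hypothetical parameter values, for illustration only.
print(fastp_args({"trim_fastq": True}))
# → --disable_quality_filtering --disable_length_filtering
print(fastp_args({"trim_fastq": False, "clip_r1": 5, "trim_nextseq": True}))
# → --disable_quality_filtering --disable_length_filtering --disable_adapter_trimming --trim_front1 5 --trim_poly_g
```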
48 changes: 31 additions & 17 deletions docs/output.md
@@ -10,7 +10,7 @@ The directories listed below will be created in the results directory after the

The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps:

- [FastQC](#fastqc) - read quality control
- [fastp](#fastp) - read quality control, adapter trimming and read clipping
- [pRESTO](#presto) - read pre-processing
- [Filter by sequence quality](#filter-by-sequence-quality) - filter sequences by quality
- [Mask primers](#mask-primers) - Masking primers
@@ -21,6 +21,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [Assemble mates](#assemble-mates) - Assemble sequence mates.
- [Remove duplicates](#remove-duplicates) - Remove and annotate read duplicates.
- [Filter sequences for at least 2 representative](#filter-sequences-for-at-least-2-representative) Filter sequences that do not have at least 2 duplicates.
- [FastQC](#fastqc) - read quality control post-assembly
- [Change-O](#change-o) - Assign genes and clonotyping
- [Assign genes with Igblast](#assign-genes-with-igblast)
- [Make database from assigned genes](#make-database-from-assigned-genes)
@@ -39,29 +40,20 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [MultiQC](#MultiQC) - MultiQC
- [Pipeline information](#pipeline-information) - Pipeline information

## FastQC
## Fastp

<details markdown="1">
<summary>Output files</summary>

- `fastqc/`
- `*_fastqc.html`: FastQC report containing quality metrics for the raw unmated reads.
- `*_fastqc.zip`: Zip archive containing the FastQC report, tab-delimited data file and plot images for the raw unmated reads.
- `postassembly/`
- `*_ASSEMBLED_fastqc.html`: FastQC report containing quality metrics for the mated and quality filtered reads.
- `*_ASSEMBLED_fastqc.zip`: Zip archive containing the FastQC report, tab-delimited data file and plot images for the mated and quality filtered reads.
- `fastp/`
- `<sample_id>/`
- `*.fastp.html`: fastp report in HTML format, containing quality metrics for the reads before and after processing.
- `*.fastp.json`: fastp report in JSON format, containing the quality metrics in machine-readable form.
- `*.fastp.log`: fastp log file.

</details>

[FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) gives general quality metrics about your sequenced reads. It provides information about the quality score distribution across your reads, per base sequence content (%A/T/G/C), adapter contamination and overrepresented sequences. For further reading and documentation see the [FastQC help pages](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/).

![MultiQC - FastQC sequence counts plot](images/mqc_fastqc_counts.png)

![MultiQC - FastQC mean quality scores plot](images/mqc_fastqc_quality.png)

![MultiQC - FastQC adapter content plot](images/mqc_fastqc_adapter.png)

> **NB:** Two sets of FastQC plots are displayed in the MultiQC report: first for the raw _untrimmed_ and unmated reads and secondly for the assembled and QC filtered reads (but before collapsing duplicates). They may contain adapter sequence and potentially regions with low quality.
[fastp](https://doi.org/10.1093/bioinformatics/bty560) reports general quality metrics for your sequenced reads and can also filter reads by quality, trim adapters and clip reads at the 5' or 3' end. It provides information about the quality score distribution across your reads, per base sequence content (%A/T/G/C), adapter contamination and overrepresented sequences. For further reading and documentation see the [fastp documentation](https://github.com/OpenGene/fastp).
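
The `*.fastp.json` file can be inspected programmatically. The following Python sketch computes the fraction of reads retained by fastp, assuming the report exposes `summary.before_filtering.total_reads` and `summary.after_filtering.total_reads`. Check these keys against a report from your own run. The function name `read_retention` and the embedded example report are hypothetical and heavily truncated.

```python
import json


def read_retention(report):
    """Return the fraction of reads remaining after fastp processing."""
    before = report["summary"]["before_filtering"]["total_reads"]
    after = report["summary"]["after_filtering"]["total_reads"]
    return after / before if before else 0.0


# Hypothetical, truncated report content, for illustration only.
# In practice, load a real report with: json.load(open("<sample_id>.fastp.json"))
example_report = json.loads(
    '{"summary": {"before_filtering": {"total_reads": 1000},'
    ' "after_filtering": {"total_reads": 950}}}'
)

print(f"{read_retention(example_report):.1%}")
# → 95.0%
```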

## presto

@@ -193,6 +185,28 @@ Remove duplicates using [CollapseSeq](https://presto.readthedocs.io/en/version-0

Remove sequences that do not have at least 2 representatives using [SplitSeq](https://presto.readthedocs.io/en/version-0.5.11/tools/SplitSeq.html) from the pRESTO Immcantation toolset.

## FastQC

<details markdown="1">
<summary>Output files</summary>

- `fastqc/`
- `postassembly/`
- `*_ASSEMBLED_fastqc.html`: FastQC report containing quality metrics for the mated and quality filtered reads.
- `*_ASSEMBLED_fastqc.zip`: Zip archive containing the FastQC report, tab-delimited data file and plot images for the mated and quality filtered reads.

</details>

[FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) gives general quality metrics about your sequenced reads. It provides information about the quality score distribution across your reads, per base sequence content (%A/T/G/C), adapter contamination and overrepresented sequences. For further reading and documentation see the [FastQC help pages](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/).

![MultiQC - FastQC sequence counts plot](images/mqc_fastqc_counts.png)

![MultiQC - FastQC mean quality scores plot](images/mqc_fastqc_quality.png)

![MultiQC - FastQC adapter content plot](images/mqc_fastqc_adapter.png)

> **NB:** Two sets of FastQC plots are displayed in the MultiQC report: first for the raw _untrimmed_ and unmated reads and secondly for the assembled and QC filtered reads (but before collapsing duplicates). They may contain adapter sequence and potentially regions with low quality.

## Change-O

### Assign genes with Igblast