Merge pull request #211 from nf-core/dev

PR for release 2.4.0
nf-core · Dec 6, 2022 · a6fdad9
2 parents 8c1185f + 04c0566
commit a6fdad9

Showing 31 changed files with 669 additions and 149 deletions.
diff --git a/.github/workflows/awsfulltest.yml b/.github/workflows/awsfulltest.yml
@@ -25,3 +25,7 @@ jobs:
               "outdir": "s3://${{ secrets.AWS_S3_BUCKET }}/airrflow/results-${{ github.sha }}"
             }
           profiles: test_full,aws_tower
+      - uses: actions/upload-artifact@v3
+        with:
+          name: Tower debug log file
+          path: tower_action_*.log
diff --git a/.github/workflows/awstest.yml b/.github/workflows/awstest.yml
@@ -23,3 +23,7 @@ jobs:
               "outdir": "s3://${{ secrets.AWS_S3_BUCKET }}/airrflow/results-test-${{ github.sha }}"
             }
           profiles: test,aws_tower
+      - uses: actions/upload-artifact@v3
+        with:
+          name: Tower debug log file
+          path: tower_action_*.log
diff --git a/.prettierignore b/.prettierignore
@@ -1,4 +1,5 @@
 email_template.html
+adaptivecard.json
 .nextflow*
 work/
 data/

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -3,6 +3,14 @@
 The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
 
+## [2.4.0] - 2022-12-05 "Aparecium"
+
+### `Added`
+
+- [#209](https://github.com/nf-core/airrflow/pull/209) Template update to nf-core tools v2.6.
+- [#210](https://github.com/nf-core/airrflow/pull/210) Add fastp for read QC, adapter trimming and read clipping.
+- [#212](https://github.com/nf-core/airrflow/pull/212) Bump versions to 2.4.0
+
 ## [2.3.0] - 2022-09-22 "Expelliarmus"
 
 ### `Added`

diff --git a/CITATION.cff b/CITATION.cff
diff --git a/CITATIONS.md b/CITATIONS.md
@@ -12,6 +12,10 @@
 
 - [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
 
+- [Fastp](https://doi.org/10.1093/bioinformatics/bty560)
+
+  > Shifu Chen, Yanqing Zhou, Yaru Chen, Jia Gu, fastp: an ultra-fast all-in-one FASTQ preprocessor, Bioinformatics. 2018 Sept 1; 34(17):i884–i890. doi: 10.1093/bioinformatics/bty560.
+
 - [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)
 
   > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.

diff --git a/README.md b/README.md
@@ -24,7 +24,7 @@ On release, automated continuous integration tests run the pipeline on a full-si
 
 By default, the pipeline currently performs the following steps:
 
-- Raw read quality control (`FastQC`)
+- Raw read quality control, adapter trimming and read clipping (`fastp`)
 - Pre-processing (`pRESTO`)
   - Filtering sequences by sequencing quality.
   - Masking amplicon primers.
@@ -35,6 +35,7 @@ By default, the pipeline currently performs the following steps:
   - Assembling R1 and R2 read mates.
   - Removing and annotating read duplicates with different UMI barcodes.
   - Filtering out sequences that do not have at least 2 duplicates.
+- Post-assembly read quality control (`FastQC`)
 - Assigning gene segment alleles with `IgBlast` using the IMGT database (`Change-O`).
 - Finding the Hamming distance threshold for clone definition (`SHazaM`).
 - Clonal assignment: defining clonal lineages of the B-cell / T-cell populations (`Change-O`).

diff --git a/assets/adaptivecard.json b/assets/adaptivecard.json
@@ -0,0 +1,67 @@
+{
+    "type": "message",
+    "attachments": [
+        {
+            "contentType": "application/vnd.microsoft.card.adaptive",
+            "contentUrl": null,
+            "content": {
+                "\$schema": "http://adaptivecards.io/schemas/adaptive-card.json",
+                "msteams": {
+                    "width": "Full"
+                },
+                "type": "AdaptiveCard",
+                "version": "1.2",
+                "body": [
+                    {
+                        "type": "TextBlock",
+                        "size": "Large",
+                        "weight": "Bolder",
+                        "color": "<% if (success) { %>Good<% } else { %>Attention<%} %>",
+                        "text": "nf-core/airrflow v${version} - ${runName}",
+                        "wrap": true
+                    },
+                    {
+                        "type": "TextBlock",
+                        "spacing": "None",
+                        "text": "Completed at ${dateComplete} (duration: ${duration})",
+                        "isSubtle": true,
+                        "wrap": true
+                    },
+                    {
+                        "type": "TextBlock",
+                        "text": "<% if (success) { %>Pipeline completed successfully!<% } else { %>Pipeline completed with errors. The full error message was: ${errorReport}.<% } %>",
+                        "wrap": true
+                    },
+                    {
+                        "type": "TextBlock",
+                        "text": "The command used to launch the workflow was as follows:",
+                        "wrap": true
+                    },
+                    {
+                        "type": "TextBlock",
+                        "text": "${commandLine}",
+                        "isSubtle": true,
+                        "wrap": true
+                    }
+                ],
+                "actions": [
+                    {
+                        "type": "Action.ShowCard",
+                        "title": "Pipeline Configuration",
+                        "card": {
+                            "type": "AdaptiveCard",
+                            "\$schema": "http://adaptivecards.io/schemas/adaptive-card.json",
+                            "body": [
+                                {
+                                    "type": "FactSet",
+                                    "facts": [<% out << summary.collect{ k,v -> "{\"title\": \"$k\", \"value\" : \"$v\"}"}.join(",\n") %>
+                                    ]
+                                }
+                            ]
+                        }
+                    }
+                ]
+            }
+        }
+    ]
+}
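Review note: the `FactSet` line in the template above emits one `{"title", "value"}` object per entry of the `summary` map and joins them with `",\n"`. The following is a minimal Python sketch of the same idea, not the pipeline's implementation. The function name `render_facts` and the sample `summary` values are hypothetical. The sketch serializes each fact with `json.dumps` instead of plain string interpolation, so a value containing a double quote or backslash still yields valid JSON.

```python
import json


def render_facts(summary):
    """Render a mapping as a comma-separated run of Adaptive Card facts.

    Hypothetical helper: mirrors the collect/join idiom of the template,
    but lets json.dumps handle escaping of quotes and backslashes.
    """
    facts = [{"title": str(key), "value": str(value)} for key, value in summary.items()]
    return ",\n".join(json.dumps(fact) for fact in facts)


# Sample data (made up for illustration); the second value contains quotes.
summary = {"runName": "test_run", "outdir": 'results "v1"'}

fragment = render_facts(summary)

# Wrapping the fragment in brackets gives the "facts" array of the card.
card_facts = json.loads(f"[{fragment}]")
print(card_facts)
```

With plain interpolation, the second sample value would break the payload, which is the main reason to serialize each fact rather than paste strings together.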
diff --git a/assets/methods_description_template.yml b/assets/methods_description_template.yml
@@ -0,0 +1,25 @@
+id: "nf-core-airrflow-methods-description"
+description: "Suggested text and references to use when describing pipeline usage within the methods section of a publication."
+section_name: "nf-core/airrflow Methods Description"
+section_href: "https://github.com/nf-core/airrflow"
+plot_type: "html"
+## TODO nf-core: Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
+## You can inject any metadata in the Nextflow '${workflow}' object
+data: |
+  <h4>Methods</h4>
+  <p>Data was processed using nf-core/airrflow v${workflow.manifest.version} ${doi_text} of the nf-core collection of workflows (<a href="https://doi.org/10.1038/s41587-020-0439-x">Ewels <em>et al.</em>, 2020</a>).</p>
+  <p>The pipeline was executed with Nextflow v${workflow.nextflow.version} (<a href="https://doi.org/10.1038/nbt.3820">Di Tommaso <em>et al.</em>, 2017</a>) with the following command:</p>
+  <pre><code>${workflow.commandLine}</code></pre>
+  <h4>References</h4>
+  <ul>
+    <li>Di Tommaso, P., Chatzou, M., Floden, E. W., Barja, P. P., Palumbo, E., & Notredame, C. (2017). Nextflow enables reproducible computational workflows. Nature Biotechnology, 35(4), 316-319. <a href="https://doi.org/10.1038/nbt.3820">https://doi.org/10.1038/nbt.3820</a></li>
+    <li>Ewels, P. A., Peltzer, A., Fillinger, S., Patel, H., Alneberg, J., Wilm, A., Garcia, M. U., Di Tommaso, P., & Nahnsen, S. (2020). The nf-core framework for community-curated bioinformatics pipelines. Nature Biotechnology, 38(3), 276-278. <a href="https://doi.org/10.1038/s41587-020-0439-x">https://doi.org/10.1038/s41587-020-0439-x</a></li>
+  </ul>
+  <div class="alert alert-info">
+    <h5>Notes:</h5>
+    <ul>
+      ${nodoi_text}
+      <li>The command above does not include parameters contained in any configs or profiles that may have been used. Ensure the config file is also uploaded with your publication!</li>
+      <li>You should also cite all software used within this run. Check the "Software Versions" of this report to get version information.</li>
+    </ul>
+  </div>
diff --git a/assets/multiqc_config.yml b/assets/multiqc_config.yml
@@ -19,9 +19,11 @@ module_order:
         - "./*_ASSEMBLED_fastqc.zip"
 
 report_section_order:
-  software_versions:
+  "nf-core-airrflow-methods-description":
     order: -1000
-  nf-core-airrflow-summary:
+  software_versions:
     order: -1001
+  "nf-core-airrflow-summary":
+    order: -1002
 
 export_plots: true
diff --git a/conf/modules.config b/conf/modules.config
@@ -36,8 +36,36 @@ process {
             ]
         }
 
-        withName: FASTQC {
-            ext.args = '--quiet'
+        withName: 'FASTP' {
+            publishDir = [
+                [
+                    path: { "${params.outdir}/fastp/${meta.id}" },
+                    mode: params.publish_dir_mode,
+                    pattern: "*.{html,json,log}"
+                ],
+                [
+                    enabled: params.save_trimmed,
+                    path: { "${params.outdir}/fastp/${meta.id}/" },
+                    mode: params.publish_dir_mode,
+                    pattern: "*.fastp.fastq.gz"
+                ]
+            ]
+            ext.args = [ "--disable_quality_filtering --disable_length_filtering",
+                params.trim_fastq              ? ""                                           : "--disable_adapter_trimming", // Disable adapter trimming unless requested (an Elvis operator here would insert the literal value "true")
+                params.clip_r1 > 0             ? "--trim_front1 ${params.clip_r1}"            : "", // Remove bp from the 5' end of read 1
+                params.clip_r2 > 0             ? "--trim_front2 ${params.clip_r2}"            : "", // Remove bp from the 5' end of read 2
+                params.three_prime_clip_r1 > 0 ? "--trim_tail1 ${params.three_prime_clip_r1}" : "", // Remove bp from the 3' end of read 1
+                params.three_prime_clip_r2 > 0 ? "--trim_tail2 ${params.three_prime_clip_r2}" : "", // Remove bp from the 3' end of read 2
+                params.trim_nextseq            ? "--trim_poly_g"                              : "", // Force poly-G tail trimming (poly-G tails are typical of NextSeq/NovaSeq reads)
+            ].join(" ").trim()
+        }
+
+        withName: 'GUNZIP_*' {
+            publishDir = [
+                [
+                    enabled: false
+                ]
+            ]
         }
 
         withName: FASTQC_POSTASSEMBLY {
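
Review note: the `ext.args` value for `FASTP` is assembled by listing one string per option, using an empty string when an option is off, then joining with spaces and trimming. The Python sketch below reproduces that list-join-trim idiom under stated assumptions: `build_fastp_args` is a hypothetical name, `params` is modelled as a plain dict, and missing keys are assumed to default to `0` or `False`. It illustrates the logic only and is not code from the pipeline.

```python
def build_fastp_args(params):
    """Assemble a fastp argument string from pipeline-style parameters.

    Hypothetical helper mirroring the Groovy list/join/trim idiom.
    Disabled options contribute an empty string.
    """
    parts = [
        "--disable_quality_filtering --disable_length_filtering",
        # Explicit conditional: a true flag must yield "", not the flag value itself.
        "" if params.get("trim_fastq", False) else "--disable_adapter_trimming",
        f"--trim_front1 {params['clip_r1']}" if params.get("clip_r1", 0) > 0 else "",
        f"--trim_front2 {params['clip_r2']}" if params.get("clip_r2", 0) > 0 else "",
        f"--trim_tail1 {params['three_prime_clip_r1']}"
        if params.get("three_prime_clip_r1", 0) > 0 else "",
        f"--trim_tail2 {params['three_prime_clip_r2']}"
        if params.get("three_prime_clip_r2", 0) > 0 else "",
        "--trim_poly_g" if params.get("trim_nextseq", False) else "",
    ]
    # Joining keeps the empty slots, so runs of spaces can appear mid-string;
    # strip() removes only the leading and trailing ones.
    return " ".join(parts).strip()


args_clipped = build_fastp_args({"trim_fastq": False, "clip_r1": 5, "trim_nextseq": True})
args_trimmed = build_fastp_args({"trim_fastq": True})
print(args_clipped.split())
print(args_trimmed.split())
```

Groovy's Elvis operator `a ?: b` evaluates to `a` whenever `a` is truthy, so it cannot express "emit nothing when the flag is set". That is why the sketch uses a full conditional for `trim_fastq`.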

diff --git a/docs/output.md b/docs/output.md
@@ -10,7 +10,7 @@ The directories listed below will be created in the results directory after the
 
 The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps:
 
-- [FastQC](#fastqc) - read quality control
+- [fastp](#fastp) - read quality control, adapter trimming and read clipping
 - [pRESTO](#presto) - read pre-processing
   - [Filter by sequence quality](#filter-by-sequence-quality) - filter sequences by quality
   - [Mask primers](#mask-primers) - Masking primers
@@ -21,6 +21,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
   - [Assemble mates](#assemble-mates) - Assemble sequence mates.
   - [Remove duplicates](#remove-duplicates) - Remove and annotate read duplicates.
   - [Filter sequences for at least 2 representative](#filter-sequences-for-at-least-2-representative) Filter sequences that do not have at least 2 duplicates.
+- [FastQC](#fastqc) - read quality control post-assembly
 - [Change-O](#change-o) - Assign genes and clonotyping
   - [Assign genes with Igblast](#assign-genes-with-igblast)
   - [Make database from assigned genes](#make-database-from-assigned-genes)
@@ -39,29 +40,20 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
 - [MultiQC](#MultiQC) - MultiQC
 - [Pipeline information](#pipeline-information) - Pipeline information
 
-## FastQC
+## fastp
 
 <details markdown="1">
 <summary>Output files</summary>
 
-- `fastqc/`
-  - `*_fastqc.html`: FastQC report containing quality metrics for the raw unmated reads.
-  - `*_fastqc.zip`: Zip archive containing the FastQC report, tab-delimited data file and plot images for the raw unmated reads.
-  - `postassembly/`
-    - `*_ASSEMBLED_fastqc.html`: FastQC report containing quality metrics for the mated and quality filtered reads.
-    - `*_ASSEMBLED_fastqc.zip`: Zip archive containing the FastQC report, tab-delimited data file and plot images for the mated and quality filtered reads.
+- `fastp/`
+  - `<sample_id>/`
+    - `*.fastp.html`: fastp report in HTML format, containing quality metrics for the reads before and after trimming.
+    - `*.fastp.json`: fastp report in JSON format, containing the report data in machine-readable form.
+    - `*.fastp.log`: fastp log file.
 
 </details>
 
-[FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) gives general quality metrics about your sequenced reads. It provides information about the quality score distribution across your reads, per base sequence content (%A/T/G/C), adapter contamination and overrepresented sequences. For further reading and documentation see the [FastQC help pages](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/).
-
-![MultiQC - FastQC sequence counts plot](images/mqc_fastqc_counts.png)
-
-![MultiQC - FastQC mean quality scores plot](images/mqc_fastqc_quality.png)
-
-![MultiQC - FastQC adapter content plot](images/mqc_fastqc_adapter.png)
-
-> **NB:** Two sets of FastQC plots are displayed in the MultiQC report: first for the raw _untrimmed_ and unmated reads and secondly for the assembled and QC filtered reads (but before collapsing duplicates). They may contain adapter sequence and potentially regions with low quality.
+[fastp](https://doi.org/10.1093/bioinformatics/bty560) provides general quality metrics about your sequenced reads and can also filter reads by quality, trim adapters and clip reads at the 5' or 3' ends. It reports the quality score distribution across your reads, per base sequence content (%A/T/G/C), adapter contamination and overrepresented sequences. For further reading and documentation see the [fastp documentation](https://github.com/OpenGene/fastp).
 
 ## presto
 
@@ -193,6 +185,28 @@ Remove duplicates using [CollapseSeq](https://presto.readthedocs.io/en/version-0
 
 Remove sequences which do not have 2 representative using [SplitSeq](https://presto.readthedocs.io/en/version-0.5.11/tools/SplitSeq.html) from the pRESTO Immcantation toolset.
 
+## FastQC
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `fastqc/`
+  - `postassembly/`
+    - `*_ASSEMBLED_fastqc.html`: FastQC report containing quality metrics for the mated and quality filtered reads.
+    - `*_ASSEMBLED_fastqc.zip`: Zip archive containing the FastQC report, tab-delimited data file and plot images for the mated and quality filtered reads.
+
+</details>
+
+[FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) gives general quality metrics about your sequenced reads. It provides information about the quality score distribution across your reads, per base sequence content (%A/T/G/C), adapter contamination and overrepresented sequences. For further reading and documentation see the [FastQC help pages](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/).
+
+![MultiQC - FastQC sequence counts plot](images/mqc_fastqc_counts.png)
+
+![MultiQC - FastQC mean quality scores plot](images/mqc_fastqc_quality.png)
+
+![MultiQC - FastQC adapter content plot](images/mqc_fastqc_adapter.png)
+
+> **NB:** The FastQC plots displayed in the MultiQC report are for the assembled and QC filtered reads (but before collapsing duplicates). They may contain adapter sequence and potentially regions with low quality.
+
 ## Change-O
 
 ### Assign genes with Igblast