Upgrade simpleaf and alevinqc #361

Open. Wants to merge 7 commits into base: dev

8 changes: 8 additions & 0 deletions CITATIONS.md
@@ -18,6 +18,14 @@

> Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.

+ * [Alevin-fry](https://doi.org/10.1038/s41592-022-01408-3)
+
+ > He, D., Zakeri, M., Sarkar, H. et al. Alevin-fry unlocks rapid, accurate and memory-frugal quantification of single-cell RNA-seq data. Nat Methods 19, 316–322 (2022).
+
+ * [Simpleaf](https://doi.org/10.1093/bioinformatics/btad614)
+
+ > He, D., Patro, R. simpleaf: a simple, flexible, and scalable framework for single-cell data processing using alevin-fry, Bioinformatics, Volume 39, Issue 10, October 2023, btad614.
+
* [Alevin](https://doi.org/10.1186/s13059-019-1670-y)

> Srivastava, A., Malik, L., Smith, T. et al. Alevin efficiently estimates accurate gene abundances from dscRNA-seq data. Genome Biol 20, 65 (2019).
Binary file modified docs/images/nf-core-scrnaseq_logo_light.png
4 changes: 2 additions & 2 deletions docs/output.md
@@ -86,13 +86,13 @@ For details on how to load these into R and perform further downstream analysis,
**Output directory: `results/alevin`**

- `alevin`
- - Contains the created Salmon Alevin pseudo-aligned output
+ - Contains the created alevin-fry pseudo-aligned output
- `alevinqc`
- Contains the QC report for the aforementioned Salmon Alevin output data

**Output directory: `results/reference_genome`**

- - `salmon_index`
+ - `simpleaf_index`
- Contains the indexed reference transcriptome for Salmon Alevin
- `alevin/txp2gene.tsv`
- The transcriptome to gene mapping TSV file utilized by Salmon Alevin
10 changes: 5 additions & 5 deletions docs/usage.md
@@ -39,15 +39,15 @@ An [example samplesheet](../assets/samplesheet.csv) has been provided with the p

This parameter is currently supported by

- - [Salmon Alevin](https://salmon.readthedocs.io/en/latest/alevin.html#expectcells)
+ - [Alevin-fry](https://alevin-fry.readthedocs.io/en/latest/generate_permit_list.html#:~:text=%2D%2Dexpect%2Dcells%20%3Cncells%3E)
- [STARsolo](https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md)
- [Cellranger](https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/what-is-cell-ranger)

Note that since cellranger v7, it is **not recommended** anymore to supply the `--expected-cells` parameter.

## Aligning options

- By default, the pipeline uses [Salmon Alevin](https://salmon.readthedocs.io/en/latest/alevin.html) (i.e. --aligner alevin) to perform pseudo-alignment of reads to the reference genome and to perform the downstream BAM-level quantification. Then QC reports are generated with AlevinQC.
+ By default, the pipeline uses [Alevin-fry](https://alevin-fry.readthedocs.io/en/latest/) (i.e. --aligner alevin) via [Simpleaf](https://simpleaf.readthedocs.io/en/latest/) to perform pseudo-alignment of reads to the reference genome and to perform the downstream BAM-level quantification. Then QC reports are generated with AlevinQC.

Other aligner options for running the pipeline are:

@@ -100,11 +100,11 @@ The command `kb --list` shows all supported, preconfigured protocols. Additional

For more details, please refer to the [Kallisto/bustools documentation](https://pachterlab.github.io/kallisto/manual#bus).

- #### Alevin/fry
+ #### Alevin-fry

- Alevin/fry also supports custom chemistries in a slighly different format, e.g. `1{b[16]u[12]x:}2{r:}`.
+ Alevin-fry also supports custom chemistries in a slightly different format, e.g. `1{b[16]u[12]x:}2{r:}`.

- For more details, see the [simpleaf documentation](https://simpleaf.readthedocs.io/en/latest/quant-command.html#a-note-on-the-chemistry-flag)
+ For more details, see the [simpleaf documentation](https://simpleaf.readthedocs.io/en/latest/quant-command.html#a-note-on-the-chemistry-flag) and the [language specification](https://hackmd.io/@PI7Og0l1ReeBZu_pjQGUQQ/rJMgmvr13).
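
Reviewer note (illustration only, not part of this diff): the custom chemistry string `1{b[16]u[12]x:}2{r:}` can be read as a per-read layout. The sketch below is a minimal Python parser for that one shape. It assumes the tag meanings as I understand them from the linked simpleaf documentation (`b` = barcode, `u` = UMI, `r` = biological read, `x` = discarded bases, `[n]` = fixed length, `:` = to the end of the read). It deliberately ignores other constructs in the language specification, and `parse_geometry` is a hypothetical helper name, not a simpleaf API.

```python
import re

# Tag meanings are an assumption based on the linked simpleaf documentation.
TAGS = {"b": "barcode", "u": "umi", "r": "read", "x": "discard"}

READ_RE = re.compile(r"([12])\{([^{}]*)\}")        # e.g. 1{...}
PART_RE = re.compile(r"([burx])(?:\[(\d+)\]|(:))")  # e.g. b[16] or x:


def parse_geometry(spec):
    """Return {read_number: [(tag_name, length_or_None), ...]}.

    A length of None means "everything up to the end of the read".
    Raises ValueError for anything this simplified sketch does not understand.
    """
    layout = {}
    pos = 0
    for read in READ_RE.finditer(spec):
        if read.start() != pos:
            raise ValueError(f"unexpected text at offset {pos} in {spec!r}")
        pos = read.end()

        inner = read.group(2)
        inner_pos = 0
        parts = []
        for part in PART_RE.finditer(inner):
            if part.start() != inner_pos:
                raise ValueError(f"unexpected text in read description {inner!r}")
            inner_pos = part.end()
            tag, length, _unbounded = part.groups()
            parts.append((TAGS[tag], int(length) if length else None))
        if inner_pos != len(inner):
            raise ValueError(f"unexpected text in read description {inner!r}")

        layout[int(read.group(1))] = parts
    if pos != len(spec):
        raise ValueError(f"unexpected text at offset {pos} in {spec!r}")
    return layout


layout = parse_geometry("1{b[16]u[12]x:}2{r:}")
for read_number, parts in layout.items():
    print(read_number, parts)
```

Under these assumptions, read 1 carries a 16-base barcode, a 12-base UMI and a discarded remainder, while read 2 is entirely biological sequence.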

#### UniverSC

8 changes: 4 additions & 4 deletions modules/local/alevinqc.nf
@@ -3,10 +3,10 @@ process ALEVINQC {
label 'process_low'

//The alevinqc 1.14.0 container is broken, missing some libraries - thus reverting this to previous 1.12.1 version
- conda "bioconda::bioconductor-alevinqc=1.12.1"
+ conda "bioconda::bioconductor-alevinqc=1.18.0"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
- 'https://depot.galaxyproject.org/singularity/bioconductor-alevinqc:1.12.1--r41h9f5acd7_0' :
- 'biocontainers/bioconductor-alevinqc:1.12.1--r41h9f5acd7_0' }"
+ 'https://depot.galaxyproject.org/singularity/bioconductor-alevinqc:1.18.0--r43hf17093f_0' :
+ 'biocontainers/bioconductor-alevinqc:1.18.0--r43hf17093f_0' }"

input:
tuple val(meta), path(alevin_results)
@@ -43,4 +43,4 @@ process ALEVINQC {
"versions.yml"
)
"""
- }
+ }
13 changes: 7 additions & 6 deletions modules/local/simpleaf_index.nf
@@ -2,10 +2,10 @@ process SIMPLEAF_INDEX {
tag "$transcript_gtf"
label "process_medium"

- conda 'bioconda::simpleaf=0.10.0-1'
Member:

Would it be possible to switch to the central nf-core/modules version at this point (see also #296)?

There are modules for simpleaf_index and simpleaf_quant already, though they might also need slight updates.

Reply:

The modules look good, and switching to them would reduce the maintenance burden of simpleaf for nf-core.

Should I submit a PR there? How should I merge the changes there back into scrnaseq?

Member:

Yes, please! Once the module PRs are merged, you can simply install them with the nf-core tools CLI that you also use for linting:

nf-core modules install

Member Author:

Cool! Thanks for the information. I will work on this next week! BTW, the other account, an-altosian, is also me :)

+ conda 'bioconda::simpleaf=0.17.2-0'
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
- 'https://depot.galaxyproject.org/singularity/simpleaf:0.10.0--h9f5acd7_1' :
- 'biocontainers/simpleaf:0.10.0--h9f5acd7_1' }"
+ 'https://depot.galaxyproject.org/singularity/simpleaf:0.17.2--h919a2d8_0' :
+ 'biocontainers/simpleaf:0.17.2--h919a2d8_0' }"

input:
path genome_fasta
@@ -14,7 +14,7 @@ process SIMPLEAF_INDEX {

output:
path "salmon/index" , emit: index
- path "salmon/ref/*_t2g_3col.tsv" , emit: transcript_tsv
+ path "salmon/ref/*t2g_3col.tsv" , emit: transcript_tsv
path "versions.yml" , emit: versions
path "salmon" , emit: salmon

@@ -23,7 +23,8 @@ process SIMPLEAF_INDEX {

script:
def args = task.ext.args ?: ''
- def seq_inputs = (params.transcript_fasta) ? "--refseq $transcript_fasta" : "--gtf $transcript_gtf"
+ def seq_inputs = (params.transcript_fasta) ? "--refseq $transcript_fasta" : "--fasta $genome_fasta --gtf $transcript_gtf"
+ def no_piscem = (params.no_piscem) ? '--no-piscem' : ''
"""
# export required var
export ALEVIN_FRY_HOME=.
@@ -36,8 +37,8 @@ process SIMPLEAF_INDEX {
simpleaf \\
index \\
--threads $task.cpus \\
- --fasta $genome_fasta \\
$seq_inputs \\
+ $no_piscem \\
$args \\
-o salmon
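
Reviewer note (illustration only, not part of this diff): as I read the change above, `--fasta` moves out of the fixed argument list and into the GTF branch, so that `--refseq` is no longer combined with `--fasta`. The Python sketch below mirrors that selection logic. The function name, the default output directory and the file names are hypothetical; only the flags are taken from the diff.

```python
def simpleaf_index_args(genome_fasta, transcript_gtf, transcript_fasta=None,
                        no_piscem=False, threads=1, extra_args=(), out_dir="salmon"):
    """Assemble the `simpleaf index` argument list in the same order as the process script."""
    if transcript_fasta:
        # A transcriptome FASTA was supplied: index it directly.
        seq_inputs = ["--refseq", transcript_fasta]
    else:
        # Otherwise build the reference from the genome FASTA plus the GTF.
        seq_inputs = ["--fasta", genome_fasta, "--gtf", transcript_gtf]

    cmd = ["simpleaf", "index", "--threads", str(threads), *seq_inputs]
    if no_piscem:
        cmd.append("--no-piscem")
    cmd.extend(extra_args)
    cmd.extend(["-o", out_dir])
    return cmd


# Hypothetical file names, for illustration only.
print(" ".join(simpleaf_index_args("genome.fa", "genes.gtf")))
print(" ".join(simpleaf_index_args("genome.fa", "genes.gtf",
                                   transcript_fasta="txome.fa", no_piscem=True)))
```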

9 changes: 6 additions & 3 deletions modules/local/simpleaf_quant.nf
@@ -2,10 +2,10 @@ process SIMPLEAF_QUANT {
tag "$meta.id"
label 'process_high'

- conda 'bioconda::simpleaf=0.10.0-1'
+ conda 'bioconda::simpleaf=0.17.2-0'
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
- 'https://depot.galaxyproject.org/singularity/simpleaf:0.10.0--h9f5acd7_1' :
- 'biocontainers/simpleaf:0.10.0--h9f5acd7_1' }"
+ 'https://depot.galaxyproject.org/singularity/simpleaf:0.17.2--h919a2d8_0' :
+ 'biocontainers/simpleaf:0.17.2--h919a2d8_0' }"

input:
//
@@ -29,6 +29,8 @@ process SIMPLEAF_QUANT {
def args = task.ext.args ?: ''
def args_list = args.tokenize()
def prefix = task.ext.prefix ?: "${meta.id}"
+ // selective alignment is only available in salmon
+ def use_selective_alignment = (params.no_piscem && params.use_selective_alignment) ? '-s' : ''

//
// check if users are using one of the mutually excludable parameters:
@@ -70,6 +72,7 @@ process SIMPLEAF_QUANT {
-c "$protocol" \\
$expect_cells \\
$unfiltered_command \\
+ $use_selective_alignment \\
$args

$save_whitelist
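
Reviewer note (illustration only, not part of this diff): with the condition added above, `-s` is passed only when both `no_piscem` and `use_selective_alignment` are set, so setting `use_selective_alignment` alone is silently ignored. The Python sketch below reproduces that truth table and, as one possible design, warns in the ignored case. The helper name and the warning are hypothetical and do not exist in the pipeline.

```python
import warnings


def selective_alignment_flag(no_piscem, use_selective_alignment):
    """Return '-s' only when salmon is the mapper and selective alignment is requested."""
    if use_selective_alignment and not no_piscem:
        # Hypothetical addition: the diff itself drops the flag without any message.
        warnings.warn("use_selective_alignment has no effect unless no_piscem is set")
        return ""
    return "-s" if use_selective_alignment else ""


with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    for no_piscem in (False, True):
        for use_sa in (False, True):
            flag = selective_alignment_flag(no_piscem, use_sa)
            print(f"no_piscem={no_piscem} use_selective_alignment={use_sa} -> {flag!r}")
```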
4 changes: 3 additions & 1 deletion nextflow.config
@@ -25,7 +25,9 @@ params {
// salmon alevin parameters (simpleaf)
simpleaf_rlen = 91
barcode_whitelist = null
- salmon_index = null
+ simpleaf_index = null
+ no_piscem = false
+ use_selective_alignment = false

// kallisto bustools parameters
kallisto_index = null
14 changes: 12 additions & 2 deletions nextflow_schema.json
@@ -158,9 +158,9 @@
"description": "",
"default": "",
"properties": {
- "salmon_index": {
+ "simpleaf_index": {
"type": "string",
- "description": "This can be used to specify a precomputed Salmon index in the pipeline, in order to skip the generation of required indices by Salmon itself.",
+ "description": "This can be used to specify a precomputed Simpleaf index in the pipeline, in order to skip the generation of required indices by Simpleaf itself.",
"fa_icon": "fas fa-fish",
"format": "path",
"exists": true
@@ -178,6 +178,16 @@
"default": 91,
"description": "It is the target read length the index will be built for, using simpleaf.",
"fa_icon": "fas fa-map-marked-alt"
+ },
+ "no_piscem": {
+ "type": "boolean",
+ "fa_icon": "fas fa-map-marked-alt",
+ "description": "Don't use the default piscem mapper, instead use salmon-alevin"
+ },
+ "use_selective_alignment": {
+ "type": "boolean",
+ "fa_icon": "fas fa-map-marked-alt",
+ "description": "Use selective-alignment for mapping instead of pseudoalignment with structural constraints (only if using salmon alevin as the underlying mapper)."
}
}
},
10 changes: 5 additions & 5 deletions subworkflows/local/alevin.nf
@@ -16,7 +16,7 @@ workflow SCRNASEQ_ALEVIN {
genome_fasta
gtf
transcript_fasta
- salmon_index
+ simpleaf_index
txp2gene
barcode_whitelist
protocol
@@ -26,16 +26,16 @@ workflow SCRNASEQ_ALEVIN {
main:
ch_versions = Channel.empty()

- assert (genome_fasta && gtf && salmon_index && txp2gene) || (genome_fasta && gtf) || (genome_fasta && gtf && transcript_fasta && txp2gene):
+ assert (genome_fasta && gtf && simpleaf_index && txp2gene) || (genome_fasta && gtf) || (genome_fasta && gtf && transcript_fasta && txp2gene):
"""Must provide a genome fasta file ('--fasta') and a gtf file ('--gtf'), or a genome fasta file
and a transcriptome fasta file ('--transcript_fasta`) if no index and txp2gene is given!""".stripIndent()

/*
* Build salmon index
*/
- if (!salmon_index) {
+ if (!simpleaf_index) {
SIMPLEAF_INDEX( genome_fasta, transcript_fasta, gtf )
- salmon_index = SIMPLEAF_INDEX.out.index.collect()
+ simpleaf_index = SIMPLEAF_INDEX.out.index.collect()
transcript_tsv = SIMPLEAF_INDEX.out.transcript_tsv.collect()
ch_versions = ch_versions.mix(SIMPLEAF_INDEX.out.versions)

@@ -51,7 +51,7 @@ workflow SCRNASEQ_ALEVIN {
*/
SIMPLEAF_QUANT (
ch_fastq,
- salmon_index,
+ simpleaf_index,
txp2gene,
protocol,
barcode_whitelist
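
Reviewer note (illustration only, not part of this diff): the `assert` condition in this subworkflow is a disjunction of three clauses that all contain `genome_fasta && gtf`, so by absorption it appears to reduce to `genome_fasta && gtf` alone. The Python sketch below checks this over all 32 input combinations. It assumes each input can be treated as a plain boolean, which may not hold exactly for Nextflow file objects and channels, and both function names are hypothetical.

```python
from itertools import product


def original_condition(genome_fasta, gtf, transcript_fasta, simpleaf_index, txp2gene):
    """The three-clause condition as written in the assert."""
    return bool(
        (genome_fasta and gtf and simpleaf_index and txp2gene)
        or (genome_fasta and gtf)
        or (genome_fasta and gtf and transcript_fasta and txp2gene)
    )


def simplified_condition(genome_fasta, gtf, transcript_fasta, simpleaf_index, txp2gene):
    """The middle clause alone; the other inputs are accepted but unused."""
    return bool(genome_fasta and gtf)


# Enumerate every True/False combination of the five inputs.
mismatches = [
    values
    for values in product([False, True], repeat=5)
    if original_condition(*values) != simplified_condition(*values)
]
print("combinations that differ:", len(mismatches))
```

If that reading is right, the assertion never accepts the "transcriptome FASTA without a GTF" case that its error message describes; whether that is intended is a question for the authors.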
4 changes: 2 additions & 2 deletions workflows/scrnaseq.nf
@@ -68,7 +68,7 @@ workflow SCRNASEQ {
kb_workflow = params.kb_workflow

//salmon params
- ch_salmon_index = params.salmon_index ? file(params.salmon_index) : []
+ ch_simpleaf_index = params.simpleaf_index ? file(params.simpleaf_index) : []

//star params
star_index = params.star_index ? file(params.star_index, checkIfExists: true) : null
@@ -147,7 +147,7 @@ workflow SCRNASEQ {
ch_genome_fasta,
ch_filter_gtf,
ch_transcript_fasta,
- ch_salmon_index,
+ ch_simpleaf_index,
ch_txp2gene,
ch_barcode_whitelist,
protocol_config['protocol'],