Commit

Merge pull request #13 from itrujnara/dev
Multiple tweaks
JoseEspinosa committed May 6, 2024
2 parents 0f5159b + 8fc2254 commit 61581ea
Showing 59 changed files with 721 additions and 189 deletions.
46 changes: 42 additions & 4 deletions CITATIONS.md
@@ -10,13 +10,51 @@
## Pipeline tools

- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
- [OMA](https://omabrowser.org)

> Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online].
> Adrian M Altenhoff, Clément-Marie Train, Kimberly J Gilbert, Ishita Mediratta, Tarcisio Mendes de Farias, David Moi, Yannis Nevers, Hale-Seda Radoykova, Victor Rossier, Alex Warwick Vesztrocy, Natasha M Glover, Christophe Dessimoz, OMA orthology in 2021: website overhaul, conserved isoforms, ancestral gene order and more, Nucleic Acids Research, Volume 49, Issue D1, 8 January 2021, Pages D373–D379, https://doi.org/10.1093/nar/gkaa1007
- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)
- [PANTHER](https://pantherdb.org)

> Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.
> Thomas PD, Ebert D, Muruganujan A, Mushayahama T, Albou L-P, Mi H. PANTHER: Making genome-scale phylogenetics accessible to all. Protein Science. 2022; 31: 8–22. https://doi.org/10.1002/pro.4218
- [OrthoInspector](https://lbgi.fr/orthoinspector)

> Yannis Nevers, Arnaud Kress, Audrey Defosset, Raymond Ripp, Benjamin Linard, Julie D Thompson, Olivier Poch, Odile Lecompte, OrthoInspector 3.0: open portal for comparative genomics, Nucleic Acids Research, Volume 47, Issue D1, 08 January 2019, Pages D411–D418, https://doi.org/10.1093/nar/gky1068
- [EggNOG](https://eggnog5.embl.de)

> Jaime Huerta-Cepas, Damian Szklarczyk, Davide Heller, Ana Hernández-Plaza, Sofia K Forslund, Helen Cook, Daniel R Mende, Ivica Letunic, Thomas Rattei, Lars J Jensen, Christian von Mering, Peer Bork, eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses, Nucleic Acids Research, Volume 47, Issue D1, 08 January 2019, Pages D309–D314, https://doi.org/10.1093/nar/gky1085
- [UniProt](https://uniprot.org)

> The UniProt Consortium, UniProt: the Universal Protein Knowledgebase in 2023, Nucleic Acids Research, Volume 51, Issue D1, 6 January 2023, Pages D523–D531, https://doi.org/10.1093/nar/gkac1052
- [UniProt ID Mapping](https://uniprot.org/id-mapping)

> Huang H, McGarvey PB, Suzek BE, Mazumder R, Zhang J, Chen Y, Wu CH. A comprehensive protein-centric ID mapping service for molecular data integration. Bioinformatics. 2011 Apr 15;27(8):1190-1. doi: 10.1093/bioinformatics/btr101. PMID: 21478197; PMCID: PMC3072559.
- [AlphaFold](https://deepmind.google/technologies/alphafold)

> Jumper, J., Evans, R., Pritzel, A. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021). https://doi.org/10.1038/s41586-021-03819-2
- [AlphaFold Database](https://alphafold.ebi.ac.uk)

> Mihaly Varadi, Stephen Anyango, Mandar Deshpande, Sreenath Nair, Cindy Natassia, Galabina Yordanova, David Yuan, Oana Stroe, Gemma Wood, Agata Laydon, Augustin Žídek, Tim Green, Kathryn Tunyasuvunakool, Stig Petersen, John Jumper, Ellen Clancy, Richard Green, Ankur Vora, Mira Lutfi, Michael Figurnov, Andrew Cowie, Nicole Hobbs, Pushmeet Kohli, Gerard Kleywegt, Ewan Birney, Demis Hassabis, Sameer Velankar, AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models, Nucleic Acids Research, Volume 50, Issue D1, 7 January 2022, Pages D439–D444, https://doi.org/10.1093/nar/gkab1061
- [T-COFFEE](https://tcoffee.org)

> Notredame C, Higgins DG, Heringa J. T-Coffee: A novel method for fast and accurate multiple sequence alignment. J Mol Biol. 2000 Sep 8;302(1):205-17. doi: 10.1006/jmbi.2000.4042. PMID: 10964570.
- [IQTREE](https://iqtree.org)

> B.Q. Minh, H.A. Schmidt, O. Chernomor, D. Schrempf, M.D. Woodhams, A. von Haeseler, R. Lanfear (2020) IQ-TREE 2: New models and efficient methods for phylogenetic inference in the genomic era. Mol. Biol. Evol., 37:1530-1534. https://doi.org/10.1093/molbev/msaa015
> D.T. Hoang, O. Chernomor, A. von Haeseler, B.Q. Minh, L.S. Vinh (2018) UFBoot2: Improving the ultrafast bootstrap approximation. Mol. Biol. Evol., 35:518–522. https://doi.org/10.1093/molbev/msx281
- [FastME](https://atgc-montpellier.fr/fastme/)

> Vincent Lefort, Richard Desper, Olivier Gascuel, FastME 2.0: A Comprehensive, Accurate, and Fast Distance-Based Phylogeny Inference Program, Molecular Biology and Evolution, Volume 32, Issue 10, October 2015, Pages 2798–2800, https://doi.org/10.1093/molbev/msv150
## Software packaging/containerisation tools

16 changes: 4 additions & 12 deletions README.md
@@ -27,8 +27,6 @@

![nf-core-reportho tube map](docs/images/reportho_tube_map.svg?raw=true "nf-core-reportho tube map")

<!-- TODO nf-core: Fill in short bullet-pointed list of the default steps in the pipeline -->

1. **Obtain Query Information**: (depends on provided input) identification of Uniprot ID and taxon ID for the query or its closest homolog.
2. **Fetch Orthologs**: fetching of ortholog predictions from public databases, either through API or from local snapshot.
3. **Compare and Assemble**: calculation of agreement statistics, creation of ortholog lists, selection of the consensus list.
@@ -66,8 +64,6 @@ If using the latter format, you must set `--uniprot_query` to true.

Now, you can run the pipeline using:

<!-- TODO nf-core: update the following command to include all required parameters for a minimal example -->

```bash
nextflow run nf-core/reportho \
-profile <docker/singularity/.../institute> \
@@ -89,15 +85,13 @@ For more details about the output files and reports, please refer to the

## Credits

nf-core/reportho was originally written by itrujnara.
nf-core/reportho was originally written by Igor Trujnara (@itrujnara).

We thank the following people for their extensive assistance in the development of this pipeline:

@lsantus

@avignoli

@JoseEspinosa
- Luisa Santus (@lsantus)
- Alessio Vignoli (@avignoli)
- Jose Espinosa-Carrasco (@JoseEspinosa)

## Contributions and Support

@@ -110,8 +104,6 @@ For further information or help, don't hesitate to get in touch on the [Slack `#
<!-- TODO nf-core: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file. -->
<!-- If you use nf-core/reportho for your analysis, please cite it using the following doi: [10.5281/zenodo.XXXXXX](https://doi.org/10.5281/zenodo.XXXXXX) -->

<!-- TODO nf-core: Add bibliography of tools and data used in your pipeline -->

An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.

You can cite the `nf-core` publication as follows:
5 changes: 2 additions & 3 deletions assets/samplesheet.csv
@@ -1,3 +1,2 @@
sample,fastq_1,fastq_2
SAMPLE_PAIRED_END,/path/to/fastq/files/AEG588A1_S1_L002_R1_001.fastq.gz,/path/to/fastq/files/AEG588A1_S1_L002_R2_001.fastq.gz
SAMPLE_SINGLE_END,/path/to/fastq/files/AEG588A4_S4_L003_R1_001.fastq.gz,
id,query
BicD2,Q8TD16
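This commit replaces the nf-core template's FASTQ samplesheet with a two-column `id,query` sheet, where `query` in the example row is a UniProt accession. How the pipeline itself validates this file is not part of the diff. As an illustration only, here is a minimal Python sketch of reading and checking such a sheet, assuming exactly these two columns and the accession form shown in the example row; `read_samplesheet` is a hypothetical helper, and the regular expression is the accession format UniProt documents.

```python
import csv
import io
import re

# Accession format published in UniProt's documentation (6- or 10-character forms).
UNIPROT_ACCESSION = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)


def read_samplesheet(handle):
    """Hypothetical helper: return (id, query) pairs from an id,query samplesheet."""
    reader = csv.DictReader(handle, restval="")  # short rows get "" instead of None
    if reader.fieldnames != ["id", "query"]:
        raise ValueError(f"expected header 'id,query', got {reader.fieldnames}")
    rows, seen = [], set()
    for row in reader:  # DictReader skips completely blank lines
        sample_id, query = row["id"].strip(), row["query"].strip()
        if not sample_id or sample_id in seen:
            raise ValueError(f"line {reader.line_num}: empty or duplicate id {sample_id!r}")
        if not UNIPROT_ACCESSION.match(query):
            raise ValueError(f"line {reader.line_num}: {query!r} is not a UniProt accession")
        seen.add(sample_id)
        rows.append((sample_id, query))
    return rows


print(read_samplesheet(io.StringIO("id,query\nBicD2,Q8TD16\n")))
# [('BicD2', 'Q8TD16')]
```

Rejecting the old `sample,fastq_1,fastq_2` header up front makes a stale template sheet fail with a clear message rather than a confusing lookup error later.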
3 changes: 3 additions & 0 deletions bin/clustal2fasta.py
@@ -1,5 +1,8 @@
#!/usr/bin/env python3

# Written by Igor Trujnara, released under the MIT license
# See https://opensource.org/license/mit for details

import sys

from Bio import SeqIO
3 changes: 3 additions & 0 deletions bin/clustal2phylip.py
@@ -1,5 +1,8 @@
#!/usr/bin/env python3

# Written by Igor Trujnara, released under the MIT license
# See https://opensource.org/license/mit for details

import sys

from Bio import SeqIO
3 changes: 3 additions & 0 deletions bin/csv_adorn.py
@@ -1,5 +1,8 @@
#!/usr/bin/env python3

# Written by Igor Trujnara, released under the MIT license
# See https://opensource.org/license/mit for details

import sys


3 changes: 3 additions & 0 deletions bin/ensembl2uniprot.py
@@ -1,5 +1,8 @@
#!/usr/bin/env python3

# Written by Igor Trujnara, released under the MIT license
# See https://opensource.org/license/mit for details

import sys

import requests
3 changes: 3 additions & 0 deletions bin/fetch_afdb_structures.py
@@ -1,5 +1,8 @@
#!/usr/bin/env python3

# Written by Igor Trujnara, released under the MIT license
# See https://opensource.org/license/mit for details

import sys

import requests
3 changes: 3 additions & 0 deletions bin/fetch_inspector_group.py
@@ -1,5 +1,8 @@
#!/usr/bin/env python3

# Written by Igor Trujnara, released under the MIT license
# See https://opensource.org/license/mit for details

import sys

import requests
3 changes: 3 additions & 0 deletions bin/fetch_oma_by_sequence.py
@@ -1,5 +1,8 @@
#!/usr/bin/env python3

# Written by Igor Trujnara, released under the MIT license
# See https://opensource.org/license/mit for details

import sys
from warnings import warn

3 changes: 3 additions & 0 deletions bin/fetch_oma_group.py
@@ -1,5 +1,8 @@
#!/usr/bin/env python3

# Written by Igor Trujnara, released under the MIT license
# See https://opensource.org/license/mit for details

import sys

import requests
3 changes: 3 additions & 0 deletions bin/fetch_oma_groupid.py
@@ -1,5 +1,8 @@
#!/usr/bin/env python3

# Written by Igor Trujnara, released under the MIT license
# See https://opensource.org/license/mit for details

import sys

from utils import fetch_seq
3 changes: 3 additions & 0 deletions bin/fetch_oma_taxid_by_id.py
@@ -1,5 +1,8 @@
#!/usr/bin/env python3

# Written by Igor Trujnara, released under the MIT license
# See https://opensource.org/license/mit for details

import sys

from utils import fetch_seq
3 changes: 3 additions & 0 deletions bin/fetch_panther_group.py
@@ -1,5 +1,8 @@
#!/usr/bin/env python3

# Written by Igor Trujnara, released under the MIT license
# See https://opensource.org/license/mit for details

import sys

import requests
3 changes: 3 additions & 0 deletions bin/fetch_sequences.py
@@ -1,5 +1,8 @@
#!/usr/bin/env python3

# Written by Igor Trujnara, released under the MIT license
# See https://opensource.org/license/mit for details

import sys

import requests
3 changes: 3 additions & 0 deletions bin/filter_fasta.py
@@ -1,5 +1,8 @@
#!/usr/bin/env python3

# Written by Igor Trujnara, released under the MIT license
# See https://opensource.org/license/mit for details

import sys

from Bio import SeqIO
3 changes: 3 additions & 0 deletions bin/get_oma_version.py
@@ -1,5 +1,8 @@
#!/usr/bin/env python3

# Written by Igor Trujnara, released under the MIT license
# See https://opensource.org/license/mit for details

import requests


3 changes: 3 additions & 0 deletions bin/make_score_table.py
@@ -1,5 +1,8 @@
#!/usr/bin/env python3

# Written by Igor Trujnara, released under the MIT license
# See https://opensource.org/license/mit for details

import csv
import re
import sys
3 changes: 3 additions & 0 deletions bin/make_stats.py
@@ -1,5 +1,8 @@
#!/usr/bin/env python3

# Written by Igor Trujnara, released under the MIT license
# See https://opensource.org/license/mit for details

import csv
import sys

3 changes: 3 additions & 0 deletions bin/map_uniprot.py
@@ -1,5 +1,8 @@
#!/usr/bin/env python3

# Written by Igor Trujnara, released under the MIT license
# See https://opensource.org/license/mit for details

import sys

from ensembl2uniprot import ensembl2uniprot
3 changes: 3 additions & 0 deletions bin/oma2uniprot_local.py
@@ -1,5 +1,8 @@
#!/usr/bin/env python3

# Written by Igor Trujnara, released under the MIT license
# See https://opensource.org/license/mit for details

import gzip
import sys

18 changes: 11 additions & 7 deletions bin/plot_orthologs.R
@@ -1,5 +1,8 @@
#!/usr/bin/env Rscript

# Written by Igor Trujnara, released under the MIT license
# See https://opensource.org/license/mit for details

suppressMessages(library(ggplot2))
suppressMessages(library(reshape2))
suppressMessages(library(dplyr))
@@ -15,6 +18,7 @@ if (length(args) < 2) {
# Styles
text_color <- "#DDDDDD"
bg_color <- "transparent"
font_size <- 16

# Load the data
data <- read.csv(args[1], header = TRUE, stringsAsFactors = FALSE)
@@ -38,9 +42,9 @@ p <- ggplot(melted_crosstable, aes(x = method, y = count, fill = score)) +
labs(title = "Support for predictions", x = "Database", y = "Number of orthologs", fill = "Support") +
scale_fill_manual(values = c("#59B4C3", "#74E291", "#8F7AC2", "#EFF396", "#FF9A8D")) +
theme(legend.position = "right",
text = element_text(size = 12, color = text_color),
axis.text.x = element_text(color = text_color),
axis.text.y = element_text(color = text_color),
text = element_text(size = font_size, color = text_color),
axis.text.x = element_text(size = font_size, color = text_color),
axis.text.y = element_text(size = font_size, color = text_color),
plot.background = element_rect(color = bg_color, fill = bg_color),
panel.background = element_rect(color = bg_color, fill = bg_color))

@@ -54,7 +58,7 @@ for (i in colnames(data)[4:ncol(data)-1]) {
}
venn.plot <- ggVennDiagram(venn.data, set_color = text_color) +
theme(legend.position = "none",
text = element_text(size = 12, color = text_color),
text = element_text(size = font_size, color = text_color),
plot.background = element_rect(color = bg_color, fill = bg_color),
panel.background = element_rect(color = bg_color, fill = bg_color))
ggsave(paste0(args[2], "_venn.png"), plot = venn.plot, width = 6, height = 6, dpi = 300)
@@ -81,9 +85,9 @@ p <- ggplot(jaccard, aes(x = method1, y = method2, fill = jaccard)) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(title = "Jaccard Index", x = "", y = "", fill = "Jaccard Index") +
theme(legend.position = "right",
text = element_text(size = 12, color = text_color),
axis.text.x = element_text(color = text_color),
axis.text.y = element_text(color = text_color),
text = element_text(size = font_size, color = text_color),
axis.text.x = element_text(size = font_size, color = text_color),
axis.text.y = element_text(size = font_size, color = text_color),
plot.background = element_rect(color = bg_color, fill = bg_color),
panel.background = element_rect(color = bg_color, fill = bg_color))
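The hunk above restyles a heatmap titled "Jaccard Index" with one cell per pair of databases (`method1`, `method2`). How the `jaccard` data frame is computed lies outside this diff; the quantity named is the standard Jaccard index, the size of the intersection of two sets divided by the size of their union. A minimal Python sketch, assuming each database's predictions are available as a set of IDs; the database names and IDs are placeholders, and the value returned for two empty sets is a convention chosen here, not taken from the script.

```python
from itertools import product


def jaccard(a, b):
    """Size of the intersection over size of the union; 1.0 for two empty sets (chosen convention)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_jaccard(calls):
    """One (method1, method2, jaccard) row per ordered pair of databases: long format for a heatmap."""
    return [(m1, m2, jaccard(calls[m1], calls[m2])) for m1, m2 in product(calls, repeat=2)]


# Hypothetical per-database ortholog sets; names and IDs are placeholders.
calls = {
    "db_a": {"P1", "P2", "P3"},
    "db_b": {"P2", "P3", "P4"},
    "db_c": {"P3"},
}
for m1, m2, score in pairwise_jaccard(calls):
    print(f"{m1}\t{m2}\t{score:.2f}")
```

Emitting every ordered pair, including the diagonal, gives a square grid that maps directly onto `x = method1, y = method2` without any reshaping.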

3 changes: 3 additions & 0 deletions bin/plot_tree.R
@@ -1,5 +1,8 @@
#!/usr/bin/env Rscript

# Written by Igor Trujnara, released under the MIT license
# See https://opensource.org/license/mit for details

library(treeio)
library(ggtree)
library(ggplot2)
3 changes: 3 additions & 0 deletions bin/refseq2uniprot.py
@@ -1,5 +1,8 @@
#!/usr/bin/env python3

# Written by Igor Trujnara, released under the MIT license
# See https://opensource.org/license/mit for details

import sys

import requests
3 changes: 3 additions & 0 deletions bin/score_hits.py
@@ -1,5 +1,8 @@
#!/usr/bin/env python3

# Written by Igor Trujnara, released under the MIT license
# See https://opensource.org/license/mit for details

import csv
import sys
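The README's "Compare and Assemble" step mentions agreement statistics and selection of a consensus list, and `plot_orthologs.R` plots "Support for predictions" per database, filled by `score`. The body of `score_hits.py` is not shown in this diff, so the following is only a hypothetical Python sketch of that idea. It assumes support means the number of databases predicting an ID and that the consensus is a plain minimum-support threshold; the database names, IDs and `min_support` value are placeholders.

```python
from collections import Counter


def support_scores(calls):
    """Map each predicted ID to the number of databases that report it."""
    scores = Counter()
    for predicted in calls.values():
        scores.update(set(predicted))  # a database counts at most once per ID
    return dict(scores)


def support_crosstab(calls, scores):
    """Per database, how many of its predictions have each support score."""
    return {db: Counter(scores[p] for p in set(predicted)) for db, predicted in calls.items()}


def consensus(scores, min_support):
    """Hypothetical consensus rule: keep IDs reported by at least min_support databases."""
    return sorted(p for p, s in scores.items() if s >= min_support)


# Hypothetical per-database predictions; names and IDs are placeholders.
calls = {
    "db_a": {"P1", "P2", "P3"},
    "db_b": {"P2", "P3", "P4"},
    "db_c": {"P3"},
}
scores = support_scores(calls)
print(support_crosstab(calls, scores))
print(consensus(scores, min_support=2))  # ['P2', 'P3']
```

The crosstab is the shape a stacked "orthologs per database by support" bar chart needs, while the threshold turns the same scores into a single list.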

3 changes: 3 additions & 0 deletions bin/uniprot2oma_local.py
@@ -1,5 +1,8 @@
#!/usr/bin/env python3

# Written by Igor Trujnara, released under the MIT license
# See https://opensource.org/license/mit for details

import gzip
import sys

3 changes: 3 additions & 0 deletions bin/uniprot2uniprot.py
@@ -1,5 +1,8 @@
#!/usr/bin/env python3

# Written by Igor Trujnara, released under the MIT license
# See https://opensource.org/license/mit for details

import sys

import requests
3 changes: 3 additions & 0 deletions bin/uniprotize_oma_local.py
@@ -1,5 +1,8 @@
#!/usr/bin/env python3

# Written by Igor Trujnara, released under the MIT license
# See https://opensource.org/license/mit for details

import gzip
import sys

3 changes: 3 additions & 0 deletions bin/uniprotize_oma_online.py
@@ -1,5 +1,8 @@
#!/usr/bin/env python3

# Written by Igor Trujnara, released under the MIT license
# See https://opensource.org/license/mit for details

import sys

from map_uniprot import map_uniprot
4 changes: 4 additions & 0 deletions bin/utils.py
@@ -1,3 +1,7 @@
# Written by Igor Trujnara, released under the MIT license
# See https://opensource.org/license/mit for details
# Includes code written by UniProt contributors published under CC-BY 4.0 license

import time
from typing import Any
