Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

trimgalore and cutadapt in two steps? #147

Open
marwa38 opened this issue Nov 21, 2022 · 6 comments
Open

trimgalore and cutadapt in two steps? #147

marwa38 opened this issue Nov 21, 2022 · 6 comments

Comments

@marwa38
Copy link

marwa38 commented Nov 21, 2022

Hi team
I know that trimgalore is a wrapper around cutadapt but I want to check what I ran is considered fine
I ran trimgalore to remove adaptors and primers at 5' and 3' ends but then

trim_galore -q 30 --phred33 --paired --gzip --fastqc --fastqc_args "--nogroup" -o ./trimGalored --length 20 --stringency 1 -e 0 --clip_R1 17 --clip_R2 21 *_S"$i"_L001_R1_001.fastq.gz *_S"$i"_L001_R2_001.fastq.gz

I found that I need to check for primers within the sequences themselves and I found and I removed them
Things are ok that I did my analysis in that way? I don't want to reran the analysis again but I want to make sure that this is fine.

# Place filtered files in filtN/ subdirectory
fnFs.filtN <- file.path(path, "filtN", basename(fnFs)) # Put N-filterd files in filtN/ subdirectory
fnRs.filtN <- file.path(path, "filtN", basename(fnRs))
filterAndTrim(fnFs, fnFs.filtN, fnRs, fnRs.filtN, maxN = 0, multithread = TRUE)

#Check if primers are still inside (N, W and H nucleotides are identified by cutadapt)
FWD <- "CCTACGGGNGGCWGCAG"  
REV <- "GACTACHVGGGTATCTAATCC"

allOrients <- function(primer) {
  # Create all orientations of the input sequence
  require(Biostrings)
  dna <- DNAString(primer)  # The Biostrings works w/ DNAString objects rather than character vectors
  orients <- c(Forward = dna, Complement = complement(dna), Reverse = reverse(dna), 
               RevComp = reverseComplement(dna))
  return(sapply(orients, toString))  # Convert back to character vector
}
FWD.orients <- allOrients(FWD)
REV.orients <- allOrients(REV)
FWD.orients

# now ready to count the number of times the primers appear in the forward and reverse read, while considering all possible primer orientations. 
#Identifying and counting the primers on one set of paired end FASTQ files is sufficient, assuming all the files were created using the same library preparation, so we’ll just process the first sample.

primerHits <- function(primer, fn) {
  # Counts number of reads in which the primer is found
  nhits <- vcountPattern(primer, sread(readFastq(fn)), fixed = FALSE)
  return(sum(nhits > 0))
}
library (ShortRead)
rbind(FWD.ForwardReads = sapply(FWD.orients, primerHits, fn = fnFs.filtN[[1]]), 
      FWD.ReverseReads = sapply(FWD.orients, primerHits, fn = fnRs.filtN[[1]]), 
      REV.ForwardReads = sapply(REV.orients, primerHits, fn = fnFs.filtN[[1]]), 
      REV.ReverseReads = sapply(REV.orients, primerHits, fn = fnRs.filtN[[1]]))
#                   Forward Complement Reverse RevComp
# FWD.ForwardReads       0          0       0       0
# FWD.ReverseReads       0          0       0     112
# REV.ForwardReads       0          0       0       8
# REV.ReverseReads      99          0       0       0

@FelixKrueger
Copy link
Owner

Hi @marwa38

I am just trying to understand exactly what you were trying to achieve here. Just to say, Trim Galore attempts to identify the adapter used in your system, by looking for exact matches either the Illumina standard, Nextera or Illumina small RNA adapters. If there is a clear 'winner', that adapter sequence is selected for removal (without any IUPAC ambiguity codes).

Also, the sequences you specified are lacking the A from the A-tailing procedure. In other words, Trim Galore is not looking for these sequences, as they tend not to be read-through contaminations that inhibit read mapping, but often result from primer-dimers and the like. Such sequences do not align to any genome, so they get effectively pruned by the alignment process itself.

Is this the answer you were looking for?

@marwa38
Copy link
Author

marwa38 commented Nov 23, 2022

Thank you for the discussion @FelixKrueger

Trim Galore attempts to identify the adapter used in your system,

Yeah I was having Nextera adaptors which is good trimGalore have removed.

Also, the sequences you specified are lacking the A from the A-tailing procedure.

I didn't mention any sequences in trimGalore code
trim_galore -q 30 --phred33 --paired --gzip --fastqc --fastqc_args "--nogroup" -o ./trimGalored --length 20 --stringency 1 -e 0 --clip_R1 17 --clip_R2 21 *_S"$i"_L001_R1_001.fastq.gz *_S"$i"_L001_R2_001.fastq.gz

What do you mean regarding "A"? you mean that I can't use V3-V4 primers that I above mentioned in my cutadapt codes to remove primers? you mean that TrimGalore --clip_R1 17 --clip_R2 21 have already removed all primers?

@FelixKrueger
Copy link
Owner

I was referring the code snippets in the original post:

#Check if primers are still inside (N, W and H nucleotides are identified by cutadapt)
FWD <- "CCTACGGGNGGCWGCAG"  
REV <- "GACTACHVGGGTATCTAATCC"

but at second glance these sequences to not appear to be related to adapters after all - so please just ignore the comment about A-tailing.

I suppose my question is: why do you want to remove primer sequences at all? Trim Galore is really only meant to remove read-through adapter contamination and poor sequences from the 3'-end. Do you have a setup that requires you to remove and filter a lot more, including biases at the starts as well as ends of sequences, adapter contamination plus primer sequences?

@marwa38
Copy link
Author

marwa38 commented Nov 30, 2022

Thanks for your reply and sorry for responding after a week @FelixKrueger .
I think that removing primers along with the adaptors will make my dataset cleaner as part of the quality control. Is it a problem to remove primers?
I am following the primer removal pipeline from here https://benjjneb.github.io/dada2/ITS_workflow.html

Thank you
M

@FelixKrueger
Copy link
Owner

I just took a look at the dada2 workflow for ITS amplicon sequencing. I have to admit that I really don't feel that I know enough about 16S sequencing to give much advice on what to do, but it would appear that trimming of the amplicon primer sequences seems necessary. This doesn't necessarily mean that you couldn't accomplish your goals with Trim Galore, but you would have to know exactly what you want to be doing. Maybe it would just be easier to follow their workflow instead?

@marwa38
Copy link
Author

marwa38 commented Dec 5, 2022

Thanks for your comments
Just for more clarification;
I removed the Nexetra adaptors using trimGalore as I don't know their sequences
but I removed the primers using cutadapt as I know their sequences

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants