valX files vs trimmed files? diff output same code? #162

Open
desmodus1984 opened this issue Apr 27, 2023 · 4 comments

@desmodus1984

Hi,

I want to trim EM-seq FASTQ files.
I used the same command twice, first for a single pair and then for a batch.
The command for the first pair was:

trim_galore --2colour 20 --illumina -o trim --paired V00001_R1.fastq.gz V00001_R2.fastq.gz

and the output was:
V00001_R1_val_1.fq.gz
V00001_R2_val_2.fq.gz

The summary reported the trimming mode as paired-end:

SUMMARISING RUN PARAMETERS

Input filename: V00001_R1.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.6.10
Cutadapt version: 1.18
Number of cores used for trimming: 1
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; user defined)
2-colour high quality G-trimming enabled, with quality cutoff: --nextseq-trim=20
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
Output file will be GZIP compressed

Then, for a second pair, I used the command:

trim_galore --2colour 20 --illumina --output_dir=trim -j 4 --paired V00021_R1.fastq.gz V00021_R2.fastq.gz

The output files were:
V00021_R1_trimmed.fq.gz
V00021_R2_trimmed.fq.gz

And the summary:

SUMMARISING RUN PARAMETERS

Input filename: V00021_R1.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.6.10
Cutadapt version: 1.18
Python version: could not detect
Number of cores used for trimming: 4
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; user defined)
2-colour high quality G-trimming enabled, with quality cutoff: --nextseq-trim=20
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
Output file will be GZIP compressed

Why did the output files for the first pair get the suffixes _val_1 and _val_2, while those for the second pair only got _trimmed?

Is there something about the command that I am missing, or is this an effect of running in multithreaded mode?

Thanks;
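For the batch case mentioned above, a minimal sketch of a loop over read pairs might look like the following. It only prints the commands (a dry run) so they can be inspected first. It assumes that every pair is named `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz`, as in the two examples in this post; `print_trim_commands` is a hypothetical helper name, and the flags are simply the ones copied from the second command.

```shell
# Print (dry run) one trim_galore command per R1/R2 pair found in a directory.
# Assumption: pairs are named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz.
# print_trim_commands is a hypothetical helper, not part of Trim Galore.
print_trim_commands() (
    indir=$1
    outdir=$2
    for r1 in "$indir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue              # nothing matched the pattern
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"   # derive the mate's file name
        if [ ! -e "$r2" ]; then
            echo "skipping $r1: no matching R2 file" >&2
            continue
        fi
        echo "trim_galore --2colour 20 --illumina -o $outdir -j 4 --paired $r1 $r2"
    done
)
```

Calling `print_trim_commands . trim` lists the commands for the current directory. Once they look right, the output could be piped to `sh`, provided the file names contain no spaces or other characters that need quoting.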

@FelixKrueger
Owner

If you still have files called *trimmed.fq.gz around in paired-end mode, the run most likely has not finished. Once the validation step is complete, both intermediate trimmed.fq.gz files are deleted, and the *_val_1.fq.gz and *_val_2.fq.gz files remain as the final output.

As a side note, if this trimming is for methylation alignments, I would recommend the trimming settings described here: http://felixkrueger.github.io/Bismark/bismark/library_types/#em-seq-neb
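The check described above can be sketched as a small shell function. It assumes the file naming seen in this thread (`<sample>_R1_val_1.fq.gz`, `<sample>_R2_val_2.fq.gz`, and `<sample>_R1_trimmed.fq.gz` / `<sample>_R2_trimmed.fq.gz` in the output directory); `pair_status` is a hypothetical helper name and not part of Trim Galore.

```shell
# Report whether a paired-end Trim Galore run looks finished for one sample.
# Assumption: inputs were named <sample>_R1.fastq.gz / <sample>_R2.fastq.gz,
# so outputs follow the naming shown in this thread.
# pair_status is a hypothetical helper, not part of Trim Galore.
pair_status() (
    outdir=$1
    sample=$2
    val1="$outdir/${sample}_R1_val_1.fq.gz"
    val2="$outdir/${sample}_R2_val_2.fq.gz"
    trim1="$outdir/${sample}_R1_trimmed.fq.gz"
    trim2="$outdir/${sample}_R2_trimmed.fq.gz"

    if [ -e "$trim1" ] || [ -e "$trim2" ]; then
        echo "incomplete"   # intermediate files are still present
    elif [ -e "$val1" ] && [ -e "$val2" ]; then
        echo "complete"     # validated pair present, intermediates removed
    else
        echo "missing"      # no output found for this sample
    fi
)
```

For example, `pair_status trim V00021` would print `incomplete` while the `_trimmed.fq.gz` files from the second run are still in `trim/`.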

@tamuanand

tamuanand commented May 18, 2023

Hi @FelixKrueger

Related questions specific to EM-Seq:

  1. I assume one has to run trim_galore on the R1/R2 files first and then pass the trimmed R1/R2 files to bismark.
  2. Based on your comment above, should I explicitly pass --clip_R1 10 --clip_R2 10 --three_prime_clip_R1 10 --three_prime_clip_R2 10 to trim_galore, or not? The legend below the table at https://felixkrueger.github.io/Bismark/bismark/library_types/ suggests "Default settings (nothing in particular is required, just use Trim Galore or Bismark default parameters)".
  3. If it is OK with you to answer: do you know what the equivalent command would be with bbduk.sh? Given that bbduk is Java-based, I would expect this step to be much faster.
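To make item 2 concrete, here is a minimal sketch of what a paired-end call with the explicit clipping options would look like. It prints the command rather than running it. The 10 bp values are the ones quoted in the question and are not confirmed here; they should be checked against the linked Bismark page. `emseq_trim_command` is a hypothetical helper name.

```shell
# Print a paired-end trim_galore command with explicit 5' and 3' clipping.
# Assumption: the clip length defaults to the 10 bp quoted in the question;
# verify it against the Bismark library-types page before use.
# emseq_trim_command is a hypothetical helper, not part of Trim Galore.
emseq_trim_command() (
    r1=$1
    r2=$2
    clip=${3:-10}   # optional third argument overrides the clip length
    echo "trim_galore --paired --clip_R1 $clip --clip_R2 $clip" \
         "--three_prime_clip_R1 $clip --three_prime_clip_R2 $clip $r1 $r2"
)
```

For example, `emseq_trim_command V00001_R1.fastq.gz V00001_R2.fastq.gz` prints the full command for the first pair from this thread.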

Thanks.

@FelixKrueger
Owner

FelixKrueger commented May 19, 2023

You don't necessarily have to use Trim Galore, but yes, some trimming is recommended. The nf-core/methylseq pipeline has an EM-seq switch that should work equally well:

--em_seq

@tamuanand

> the nf-core/methylseq pipeline has an EM-seq switch which should work equally:

I think this still uses Trim Galore under the hood.
