
How to specify non-internal/partial adapter/primer sequences to be trimmed when supplying a fasta file input #807

Open
molly-hetheringtonrauth opened this issue Sep 9, 2024 · 1 comment


@molly-hetheringtonrauth

How do I trim partial primer sequences when the primers are supplied as a FASTA file?

I have amplicon sequencing data and I want to use cutadapt to remove the FWD and REV amplicon primer sequences. I've read through the recipe in the documentation (https://cutadapt.readthedocs.io/en/stable/recipes.html#trimming-amplicon-primers-from-paired-end-reads), which anchors the primer sequences. However, what if only part of the primer sequence is present? For example, if the full primer sequence is ADAPTOR and a read starts with APTORmysequence, I want APTOR removed (leaving mysequence) even though AD is missing. I'm asking because I used the anchoring method and not all primer sequences were removed, so I suspect there may be partial primer sequences at the beginning of the reads.

When specifying a file, I see in the documentation how to anchor the sequences:

cutadapt -g ^ATTCCGTAC # if there was no file specified
cutadapt -g ^file:primer.fa # when a file is specified

However, I ran into a problem when I tried to apply the same logic to non-internal (partial) adapter sequences:

cutadapt -g XATTCCGTAC # if there was no file specified
cutadapt -g Xfile:primer.fa # when a file is specified

# my full command with bash variables
cutadapt --cores=4 -g ^file:${FWD} -G ^file:${REV} -o ${out1} -p ${out2} ${R1} ${R2}

I get the following error:

Character 'F' in adapter sequence 'FNLE:/HOME/FWD.PRNMERS.FASTA' is not a valid IUPAC code. Use only characters 'ABCDGHIKMNRSTUVWXY'.
Something weird about the error is that FWD.PRNMERS.FASTA is not the name of the file I specified. The name is FWD.primers.fasta (there is an "N" substituted for the "i" in primers).


Versions
cutadapt v4.9
python v3.10.14
installed via conda

@marcelm
Owner

marcelm commented Sep 9, 2024

Hi, the Xfile: syntax is not supported at the moment (see #361). You would need to manually add the X to each sequence in your fwd.primers.fasta file instead.
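Adding the X by hand could also be scripted. Below is a minimal sketch that assumes a FASTA file in which every record starts with a ">" header line. The function name add_x_prefix and the output file name fwd.primers.X.fasta are hypothetical and not part of Cutadapt.

```shell
# Hypothetical helper (not part of Cutadapt): prefix "X" to the first sequence
# line of every FASTA record, turning each 5' primer PRIMER into XPRIMER.
# Header lines (starting with ">") are copied unchanged. Continuation lines of
# wrapped sequences are also copied unchanged, so only the 5' end gets the X.
add_x_prefix() {
    awk '
        /^>/ { print; first = 1; next }
        first && NF { print "X" $0; first = 0; next }
        { print }
    ' "$1"
}

# Usage (hypothetical file names):
#   add_x_prefix fwd.primers.fasta > fwd.primers.X.fasta
```

The new file could then be passed as -g file:fwd.primers.X.fasta instead of ^file:${FWD}, and the same can be done for the reverse primers given to -G.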

That said, because you just want to check whether there might be partial primer occurrences, I would try it without the X first as an initial check. Without the X, the position of the 5' primer within the read is not restricted at all, so this is less strict than using the X. If running it without the X does not give you an improvement, then adding the X will not help either.

Character 'F' in adapter sequence 'FNLE:/HOME/FWD.PRNMERS.FASTA' is not a valid IUPAC code. Use only characters 'ABCDGHIKMNRSTUVWXY'.

Something weird about the error is that FWD.PRNMERS.FASTA is not the name of the file I specified.

Yeah, that looks a bit weird because Cutadapt did not understand that you wanted it to read the adapters from a file. Instead, it interpreted the file:/home/fwd.primers.fasta string directly as an adapter sequence. It then did a couple of transformations (for example, converting all characters to uppercase) and complains about the first character that it doesn’t know how to interpret.

The name is FWD.primers.fasta (there is an "N" substituted for the "i" in primers).

Yes, the "I" is for inosine, which Cutadapt treats like an "N" wildcard, see #546.
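The two transformations explained above (uppercasing, then treating inosine "I" as "N") are enough to turn the path into the string quoted in the error. This is only an illustration using the standard tr tool; the function name mangle_like_adapter is hypothetical, and this is not how Cutadapt implements the conversion internally.

```shell
# Illustration only (hypothetical helper): reproduce the two transformations
# on the path string that Cutadapt treated as an adapter sequence.
mangle_like_adapter() {
    # Step 1: uppercase every character.
    # Step 2: replace inosine "I" with the wildcard "N".
    printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]' | tr 'I' 'N'
}

mangle_like_adapter 'file:/home/fwd.primers.fasta'
# prints FNLE:/HOME/FWD.PRNMERS.FASTA
```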

Maybe I can improve that error message a bit.
