Align expanded sequences, trim them and then compute the consensus sequence #30

AndreaGuarracino · 2021-10-11T19:06:58Z

Hi Yaon,

I've expanded few sequences with flanking nucleotides, then I've aligned them, obtaining the MSA and consensus sequence.

However, after aligning the expanded sequences, I would like to first delete the additional nucleotides and then calculate the consensus sequence with only the unexpanded sequences, without having to recalculate the whole alignment.

Is it possible?

yangao07 · 2021-10-13T08:53:14Z

I don't fully get your idea.
Do all the aligned sequences have flanking nucleotides? Or only a part of the sequences have?

Then, I guess what you want is to obtain the alignment/consensus for only a subset of all the input aligned sequences that have no flanking nucleotides, am I right?

Yan (I know Yao is very famous:) )

AndreaGuarracino · 2021-10-13T09:16:23Z

All the aligned sequences have flanking nucleotides.

Let me try with an example.

These are the original sequences.

example_abpoa.fa

>Sequence1
AAAAAAAATTTT
>Sequence2
TTGGTTTTTT

abpoa --match 1 --mismatch 9 --gap-open 16,41 --gap-ext 2,1 example_abpoa.fa -r 2

>Multiple_sequence_alignment
------AAAAAAAATTTT
TTGGTT--------TTTT
------AAAAAAAATTTT

Instead of aligning the original sequences, I am extending them with few flanking nucleotides upstream and downstream, and then aligning the expanded sequences. For example, adding 4 flanking nucleotides:

example_abpoa_padded.fa

>Sequence1
GGGGAAAAAAAATTTTGGGG
>Sequence2
TTTTTTGGTTTTTTGGGG

abpoa --match 1 --mismatch 9 --gap-open 16,41 --gap-ext 2,1 example_abpoa_padded.fa -r 2

>Multiple_sequence_alignment
------GGGGAAAAAAAATTTTGGGG
TTTTTTGG--------TTTTTTGGGG
TTTTTTGGGGAAAAAAAATTTTGGGG

Concerning the sequences, I am taking this alignment and removing the added flanking nucleotides. However, regarding the consensus sequence, I can't just trim the flanking nucleotides out (this leads to rare, but blocking bugs).

I would like to trim the expanded sequences in the abPOA graph (where the alignment was already computed with the expanded sequences) and then re-ask for the consensus sequence generation using the trimmed graph.

Yan (I know Yao is very famous:) )

LOL, mutations, you've got me!

yangao07 · 2021-10-13T11:55:07Z

>Multiple_sequence_alignment
------GGGGAAAAAAAATTTTGGGG
TTTTTTGG--------TTTTTTGGGG
TTTTTTGGGGAAAAAAAATTTTGGGG

I guess, from this new graph, what you want is the consensus of the original (unexpanded) sequences, right? Then the graph should be transformed to:

>Multiple_sequence_alignment
----------AAAAAAAATTTT----
----TTGG--------TTTTTT----
>Consensus_sequence
----TTGG--AAAAAAAATTTT----

I think this is possible.
We need to record all the flanking-base nodes and then manipulate their weight/base before calling the consensus:

w = w-1, if w > 1
set base as - if w == 1

Edit: this manipulation may not be correct, I need to think about it.

Then for the final consensus, we just need to remove all the - symbols.

AndreaGuarracino · 2022-03-25T15:43:18Z

Hi @yangao07, any update on this?

yangao07 · 2022-03-28T03:13:45Z

See above, is that what you want?

AndreaGuarracino · 2022-03-28T07:34:54Z

I apologize, I didn't realize you were waiting for confirmation from me. Yes, I want the consensus of the original (unexpanded) sequences, as you wrote.

>Multiple_sequence_alignment
----------AAAAAAAATTTT----
----TTGG--------TTTTTT----
>Consensus_sequence
----TTGG--AAAAAAAATTTT---- <---- This one, generated from the trimmed (unexpanded) sequences

As I said, the consensus sequence can't be simply trimmed (this leads to rare bugs) but has to be generated starting from the original (unexpanded) sequences, using the alignment that we've got when we aligned the same sequences together with the flanking nucleotides.

After your considerations, do you think is it still doable?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Align expanded sequences, trim them and then compute the consensus sequence #30

Align expanded sequences, trim them and then compute the consensus sequence #30

AndreaGuarracino commented Oct 11, 2021 •

edited

Loading

yangao07 commented Oct 13, 2021

AndreaGuarracino commented Oct 13, 2021

yangao07 commented Oct 13, 2021 •

edited

Loading

AndreaGuarracino commented Mar 25, 2022

yangao07 commented Mar 28, 2022

AndreaGuarracino commented Mar 28, 2022

Align expanded sequences, trim them and then compute the consensus sequence #30

Align expanded sequences, trim them and then compute the consensus sequence #30

Comments

AndreaGuarracino commented Oct 11, 2021 • edited Loading

yangao07 commented Oct 13, 2021

AndreaGuarracino commented Oct 13, 2021

yangao07 commented Oct 13, 2021 • edited Loading

AndreaGuarracino commented Mar 25, 2022

yangao07 commented Mar 28, 2022

AndreaGuarracino commented Mar 28, 2022

AndreaGuarracino commented Oct 11, 2021 •

edited

Loading

yangao07 commented Oct 13, 2021 •

edited

Loading