Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Align expanded sequences, trim them and then compute the consensus sequence #30

Open
AndreaGuarracino opened this issue Oct 11, 2021 · 6 comments

Comments

@AndreaGuarracino
Copy link

AndreaGuarracino commented Oct 11, 2021

Hi Yaon,

I've expanded few sequences with flanking nucleotides, then I've aligned them, obtaining the MSA and consensus sequence.

However, after aligning the expanded sequences, I would like to first delete the additional nucleotides and then calculate the consensus sequence with only the unexpanded sequences, without having to recalculate the whole alignment.

Is it possible?

@yangao07
Copy link
Owner

I don't fully get your idea.
Do all the aligned sequences have flanking nucleotides? Or only a part of the sequences have?

Then, I guess what you want is to obtain the alignment/consensus for only a subset of all the input aligned sequences that have no flanking nucleotides, am I right?

Yan (I know Yao is very famous:) )

@AndreaGuarracino
Copy link
Author

All the aligned sequences have flanking nucleotides.

Let me try with an example.

These are the original sequences.

example_abpoa.fa

>Sequence1
AAAAAAAATTTT
>Sequence2
TTGGTTTTTT

abpoa --match 1 --mismatch 9 --gap-open 16,41 --gap-ext 2,1 example_abpoa.fa -r 2

>Multiple_sequence_alignment
------AAAAAAAATTTT
TTGGTT--------TTTT
------AAAAAAAATTTT

Instead of aligning the original sequences, I am extending them with few flanking nucleotides upstream and downstream, and then aligning the expanded sequences. For example, adding 4 flanking nucleotides:

example_abpoa_padded.fa

>Sequence1
GGGGAAAAAAAATTTTGGGG
>Sequence2
TTTTTTGGTTTTTTGGGG

abpoa --match 1 --mismatch 9 --gap-open 16,41 --gap-ext 2,1 example_abpoa_padded.fa -r 2

>Multiple_sequence_alignment
------GGGGAAAAAAAATTTTGGGG
TTTTTTGG--------TTTTTTGGGG
TTTTTTGGGGAAAAAAAATTTTGGGG

Concerning the sequences, I am taking this alignment and removing the added flanking nucleotides. However, regarding the consensus sequence, I can't just trim the flanking nucleotides out (this leads to rare, but blocking bugs).

I would like to trim the expanded sequences in the abPOA graph (where the alignment was already computed with the expanded sequences) and then re-ask for the consensus sequence generation using the trimmed graph.

Yan (I know Yao is very famous:) )

LOL, mutations, you've got me!

@yangao07
Copy link
Owner

yangao07 commented Oct 13, 2021

>Multiple_sequence_alignment
------GGGGAAAAAAAATTTTGGGG
TTTTTTGG--------TTTTTTGGGG
TTTTTTGGGGAAAAAAAATTTTGGGG

I guess, from this new graph, what you want is the consensus of the original (unexpanded) sequences, right? Then the graph should be transformed to:

>Multiple_sequence_alignment
----------AAAAAAAATTTT----
----TTGG--------TTTTTT----
>Consensus_sequence
----TTGG--AAAAAAAATTTT----

I think this is possible.
We need to record all the flanking-base nodes and then manipulate their weight/base before calling the consensus:

  1. w = w-1, if w > 1
  2. set base as - if w == 1

Edit: this manipulation may not be correct, I need to think about it.

Then for the final consensus, we just need to remove all the - symbols.

@AndreaGuarracino
Copy link
Author

Hi @yangao07, any update on this?

@yangao07
Copy link
Owner

See above, is that what you want?

@AndreaGuarracino
Copy link
Author

I apologize, I didn't realize you were waiting for confirmation from me. Yes, I want the consensus of the original (unexpanded) sequences, as you wrote.

>Multiple_sequence_alignment
----------AAAAAAAATTTT----
----TTGG--------TTTTTT----
>Consensus_sequence
----TTGG--AAAAAAAATTTT---- <---- This one, generated from the trimmed (unexpanded) sequences

As I said, the consensus sequence can't be simply trimmed (this leads to rare bugs) but has to be generated starting from the original (unexpanded) sequences, using the alignment that we've got when we aligned the same sequences together with the flanking nucleotides.

After your considerations, do you think is it still doable?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants