get_paths: new method for traversal
ktmeaton committed Jul 26, 2024
1 parent 5db962d commit b2b8631
Showing 3 changed files with 147 additions and 158 deletions.
218 changes: 61 additions & 157 deletions README.md
@@ -35,202 +35,106 @@ graph LR;

## Why pangonet?

1. **Quickly look up ancestors and descendants of any lineage.**
1. **Quickly look up phylogenetic relationships between lineages.**

```python
from pangonet import PangoNet
pango = PangoNet().build()

# All descendants
pango.get_descendants("JN.1.1")
['JN.1.1.1', 'XDK', 'XDK.1', 'XDK.1.1', 'XDK.1.2', 'XDK.2', 'XDK.3', 'XDK.4', 'XDK.4.1', 'XDK.5', 'XDK.6', 'JN.1.1.2', 'JN.1.1.3', 'LT.1', 'JN.1.1.4', 'JN.1.1.5', 'KR.1', 'KR.1.1', 'KR.1.2', 'KR.3', 'KR.4', 'KR.5', 'JN.1.1.6', 'KZ.1', 'KZ.1.1', 'KZ.1.1.1', 'JN.1.1.7', 'LC.1', 'JN.1.1.8', 'JN.1.1.9', 'JN.1.1.10', 'XDZ', 'XDN', 'XDR', 'XDR.1']

# All paths to the root. Recombination means there might be multiple!
pango.get_ancestors("XDB")
['XBB.1.16.19', 'XBB.1.16', 'XBB.1', 'XBB', 'BJ.1', 'BA.2.10.1', 'BA.2.10', 'BA.2', 'B.1.1.529', 'B.1.1', 'B.1', 'B', 'root', 'BM.1.1.1', 'BM.1.1', 'BM.1', 'BA.2.75.3', 'BA.2.75']

# Most recent common ancestor. Recombination means there might be multiple!
pango.get_mrca(["XE", "XG"])
["BA.1", "BA.2"]
```

Multiple parents due to recombination are handled, as is recursive recombination where a lineage has experienced recombination multiple times in its evolutionary history.

1. **Lots of output formats.**

The pango network can be exported to: `json`, `tsv`, `mermaid`, `dot` (graphviz), `newick` and [`extended newick`](https://en.wikipedia.org/wiki/Newick_format#Extended_Newick) for recombination.

1. **A command-line interface that requires no input files.**
1. **Command-line interface and python library that require no input files.**

All the required resources will be downloaded for you from [pango-designation](https://github.com/cov-lineages/pango-designation)!

```bash
pangonet --output-all --output-prefix output/pango

2024-07-18 14:05:20,587 INFO:Begin
2024-07-18 14:05:20,591 INFO:Downloading alias key: output/alias_key.json
2024-07-18 14:05:20,845 INFO:Downloading lineage notes: output/lineage_notes.txt
2024-07-18 14:05:21,298 INFO:Creating aliases.
2024-07-18 14:05:21,301 INFO:Creating network.
2024-07-18 14:05:21,517 INFO:Exporting table: output/pango.tsv
2024-07-18 14:05:21,569 INFO:Exporting standard newick: output/pango.nwk
2024-07-18 14:05:21,580 INFO:Exporting extended newick: output/pango.enwk
2024-07-18 14:05:21,589 INFO:Exporting mermaid: output/pango.mermaid
2024-07-18 14:05:21,597 INFO:Exporting dot: output/pango.dot
2024-07-18 14:05:21,602 INFO:Exporting json: output/pango.json
2024-07-18 14:05:21,662 INFO:Exporting condensed json: output/pango.condensed.json
2024-07-18 14:05:21,757 INFO:Done
```

- You can always use `--alias-key` and `--lineage-notes` to snapshot your network to a particular designation.
All the required resources will be downloaded for you from [pango-designation](https://github.com/cov-lineages/pango-designation)!

1. **`pangonet` is a single script with no dependencies aside from `python`.**

Sometimes you just need a really simple utility script for a quick lineage query, without having to bother with creating a `conda` environment. You can stick the single `pangonet.py` script in any python project and be good to go. Full package installation is still provided for those who need it.

## Install

- `pangonet` is written in standard python and has no dependencies aside from `python>=3.7`.
- PyPI and conda packages are coming soon!

1. `pangonet` can be installed from source as a CLI tool and python package.

```bash
git clone https://github.com/phac-nml/pangonet.git
cd pangonet
pip install .
pangonet --help
```

1. `pangonet` can also be downloaded and run as a standalone script.

```bash
wget https://raw.githubusercontent.com/phac-nml/pangonet/main/src/pangonet/pangonet.py
python pangonet.py --help
```

## Usage

### Command-Line Interface

The command-line interface `pangonet` can be used to download the latest designated lineages and export a network for downstream applications.

1. Display help and usage.

```bash
$ pangonet --help

Create and manipulate SARS-CoV-2 pango lineages in a phylogenetic network.

options:
-h, --help show this help message and exit
--lineage-notes LINEAGE_NOTES
Path to the lineage_notes.txt
--alias-key ALIAS_KEY
Path to the alias_key.json
--output-prefix OUTPUT_PREFIX
Output prefix
--output-all Output all formats
--tsv Output metadata TSV
--json Output json
--nwk Output newick tree
--enwk Output extended newick tree for IcyTree
--mermaid Output mermaid graph
--dot Output dot for graphviz
-v, --version Print version
```

1. Create a network from the latest designated lineages.

```bash
$ pangonet --output-prefix output/pango --output-all

2024-07-18 14:05:20,587 INFO:Begin
2024-07-18 14:05:20,591 INFO:Downloading alias key: output/alias_key.json
2024-07-18 14:05:20,845 INFO:Downloading lineage notes: output/lineage_notes.txt
2024-07-18 14:05:21,298 INFO:Creating aliases.
2024-07-18 14:05:21,301 INFO:Creating network.
2024-07-18 14:05:21,517 INFO:Exporting table: output/pango.tsv
2024-07-18 14:05:21,569 INFO:Exporting standard newick: output/pango.nwk
2024-07-18 14:05:21,580 INFO:Exporting extended newick: output/pango.enwk
2024-07-18 14:05:21,589 INFO:Exporting mermaid: output/pango.mermaid
2024-07-18 14:05:21,597 INFO:Exporting dot: output/pango.dot
2024-07-18 14:05:21,602 INFO:Exporting json: output/pango.json
2024-07-18 14:05:21,662 INFO:Exporting condensed json: output/pango.condensed.json
2024-07-18 14:05:21,757 INFO:Done
```

Please see the [Visualize](#visualize) section for more information on the various output formats.

### Package

The `pangonet` python package provides functions to construct and manipulate a network of pango lineages.
### Python Library

```python
from pangonet import PangoNet

# Build the network by downloading the latest designations
pango = PangoNet().build()

# Or, build the network from local files
pango = PangoNet().build(alias_key="alias_key.json", lineage_notes="lineage_notes.txt")
```

Compress and uncompress aliases.

> ❗ See [pango_aliasor](https://github.com/corneliusroemer/pango_aliasor) for a more sophisticated approach to alias compression.

```python
# Alias manipulation
pango.uncompress("KP.3.1")
'B.1.1.529.2.86.1.1.11.1.3.1'

pango.compress('B.1.1.529.2.86.1.1.11')
'JN.1.11'
```
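
The alias round trip above can be sketched without the library. This is a minimal illustration, not `pangonet`'s implementation: `ALIAS_KEY` is a hypothetical three-entry subset written by hand, the function names are chosen to mirror the calls above, and recombinant aliases (which map to several parents) are ignored.

```python
# Hypothetical hand-written subset of an alias key (assumption, not the real file).
ALIAS_KEY = {
    "BA": "B.1.1.529",
    "JN": "B.1.1.529.2.86.1",
    "KP": "B.1.1.529.2.86.1.1.11.1",
}


def uncompress(lineage: str, alias_key: dict = ALIAS_KEY) -> str:
    """Expand the leading alias (e.g. 'KP') into its full lineage prefix."""
    prefix, _, suffix = lineage.partition(".")
    full_prefix = alias_key.get(prefix, prefix)
    return f"{full_prefix}.{suffix}" if suffix else full_prefix


def compress(lineage: str, alias_key: dict = ALIAS_KEY) -> str:
    """Swap the longest matching full prefix for its alias."""
    best, best_len = lineage, 0
    for alias, full in alias_key.items():
        # An alias must be followed by at least one numeric level.
        if lineage.startswith(full + ".") and len(full) > best_len:
            best, best_len = alias + lineage[len(full):], len(full)
    return best


print(uncompress("KP.3.1"))
print(compress("B.1.1.529.2.86.1.1.11"))
```

Picking the longest matching prefix is what yields `JN.1.11` rather than `BA.2.86.1.1.11` for the second call.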

Get immediate children and parents.
# Get immediate parents and children
pango.get_parents("JN.1")
['BA.2.86.1']

```python
# Using a compressed lineage name
pango.get_children("JN.1.1")
pango.get_children("JN.1")
['JN.1.1.1', 'JN.1.1.2', 'JN.1.1.3', 'JN.1.1.4', 'JN.1.1.5', 'JN.1.1.6', 'JN.1.1.7', 'JN.1.1.8', 'JN.1.1.9', 'JN.1.1.10', 'XDN', 'XDR']

# Using the full lineage name
pango.get_parents('B.1.1.529.2.86.1.1.1')
['BA.2.86.1']
# Get specific paths between lineages, recombination means there might be multiple routes!
pango.get_paths(start="XE", end="B.1.1")
[['XE', 'BA.1', 'B.1.1.529', 'B.1.1'], ['XE', 'BA.2', 'B.1.1.529', 'B.1.1']]

# Recombinants (X*) will have multiple
pango.get_parents('XBL')
['XBB.1.5.57', 'BA.2.75']
```
# Or get all ancestors and descendants as one flat list
pango.get_ancestors("XE")
['BA.1', 'B.1.1.529', 'B.1.1', 'B.1', 'B', 'root', 'BA.2']

Get comprehensive descendants and ancestors, following all possible paths.
pango.get_descendants("KP.1")
['KP.1.1', 'KP.1.1.1', 'MG.1', 'KP.1.1.2', 'KP.1.1.3', 'LP.1', 'LP.1.1', 'LP.2', 'LP.3', 'KP.1.1.4', 'KP.1.1.5', 'KP.1.2']

```python
# Follow all possible paths to terminals
pango.get_descendants("JN.1.1")
['JN.1.1.1', 'XDK', 'XDK.1', 'XDK.1.1', 'XDK.1.2', 'XDK.2', 'XDK.3', 'XDK.4', 'XDK.4.1', 'XDK.5', 'XDK.6', 'JN.1.1.2', 'JN.1.1.3', 'LT.1', 'JN.1.1.4', 'JN.1.1.5', 'KR.1', 'KR.1.1', 'KR.1.2', 'KR.3', 'KR.4', 'KR.5', 'JN.1.1.6', 'KZ.1', 'KZ.1.1', 'KZ.1.1.1', 'JN.1.1.7', 'LC.1', 'JN.1.1.8', 'JN.1.1.9', 'JN.1.1.10', 'XDZ', 'XDN', 'XDR', 'XDR.1']
# Most recent common ancestor(s) MRCA, recombination means there might be multiple!
pango.get_mrca(["BQ.1", "BA.2.4"])
['B.1.1.529']

# Follow all possible paths to the root
pango.get_ancestors("XDB")
['XBB.1.16.19', 'XBB.1.16', 'XBB.1', 'XBB', 'BJ.1', 'BA.2.10.1', 'BA.2.10', 'BA.2', 'B.1.1.529', 'B.1.1', 'B.1', 'B', 'root', 'BM.1.1.1', 'BM.1.1', 'BM.1', 'BA.2.75.3', 'BA.2.75']
pango.get_mrca(["XE", "XG"])
["BA.1", "BA.2"]
```
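
Why a recombinant pair can have more than one MRCA is easier to see on a toy network. The sketch below is an assumption-laden illustration, not the library's `get_mrca`: `PARENTS`, `ancestors`, and `mrca` are hypothetical names, the network is a hand-written fragment, and each lineage is treated as its own ancestor when intersecting.

```python
# Hypothetical hand-written fragment of the network: child -> list of parents.
PARENTS = {
    "B.1.1.529": [],
    "BA.1": ["B.1.1.529"],
    "BA.2": ["B.1.1.529"],
    "BA.2.4": ["BA.2"],
    "BA.5": ["B.1.1.529"],
    "XE": ["BA.1", "BA.2"],
    "XG": ["BA.1", "BA.2"],
}


def ancestors(lineage, parents=PARENTS):
    """All ancestors, following every parent; order kept, duplicates dropped."""
    found = []
    for parent in parents[lineage]:
        found += [parent] + ancestors(parent, parents)
    return list(dict.fromkeys(found))


def mrca(lineages, parents=PARENTS):
    """Shared ancestors that are not themselves ancestors of another shared one."""
    shared = None
    for lineage in lineages:
        lineage_set = [lineage] + ancestors(lineage, parents)
        shared = lineage_set if shared is None else [a for a in shared if a in lineage_set]
    shared = shared or []
    return [a for a in shared
            if not any(a in ancestors(other, parents) for other in shared)]


print(mrca(["XE", "XG"]))
print(mrca(["BA.5", "BA.2.4"]))
```

`XE` and `XG` share both `BA.1` and `BA.2`, and neither is an ancestor of the other, so both are kept; `B.1.1.529` is dropped because it sits above them.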

Filter the network to lineages of interest.

```python
# Create a network of the following lineages and their ancestors.
lineages = []
for l in ["XDB", "XBL", "AY.4"]:
    lineages += [l]
    lineages += pango.network[l]["ancestors"]
pango_filter = pango.filter(lineages)
### Command-Line Interface

print(list(pango_filter.network))
['root', 'B', 'B.1', 'B.1.1', 'B.1.1.529', 'BA.2', 'BA.2.10', 'BA.2.10.1', 'BJ.1', 'BA.2.75', 'BA.2.75.3', 'BM.1', 'BM.1.1', 'BM.1.1.1', 'B.1.617', 'B.1.617.2', 'AY.4', 'XBB', 'XBB.1', 'XBB.1.5', 'XBB.1.5.57', 'XBB.1.16', 'XBB.1.16.19', 'XBL', 'XDB']
```bash
$ pangonet --output-prefix output/pango --output-all

2024-07-18 14:05:20,587 INFO:Begin
2024-07-18 14:05:20,591 INFO:Downloading alias key: output/alias_key.json
2024-07-18 14:05:20,845 INFO:Downloading lineage notes: output/lineage_notes.txt
2024-07-18 14:05:21,298 INFO:Creating aliases.
2024-07-18 14:05:21,301 INFO:Creating network.
2024-07-18 14:05:21,517 INFO:Exporting table: output/pango.tsv
2024-07-18 14:05:21,569 INFO:Exporting standard newick: output/pango.nwk
2024-07-18 14:05:21,580 INFO:Exporting extended newick: output/pango.enwk
2024-07-18 14:05:21,589 INFO:Exporting mermaid: output/pango.mermaid
2024-07-18 14:05:21,597 INFO:Exporting dot: output/pango.dot
2024-07-18 14:05:21,602 INFO:Exporting json: output/pango.json
2024-07-18 14:05:21,662 INFO:Exporting condensed json: output/pango.condensed.json
2024-07-18 14:05:21,757 INFO:Done
```

### Visualize
## Install

- `pangonet` is written in standard python and has no dependencies aside from `python>=3.7`.
- PyPI and conda packages are coming soon!

1. `pangonet` can be installed from source as a CLI tool and python package.

```bash
git clone https://github.com/phac-nml/pangonet.git
cd pangonet
pip install .
pangonet --help
```

1. `pangonet` can also be downloaded and run as a standalone script.

```bash
wget https://raw.githubusercontent.com/phac-nml/pangonet/main/src/pangonet/pangonet.py
python pangonet.py --help
```

## Visualize

`pangonet` also allows you to export the network in a wide variety of formats. We will filter down the lineages to better demonstrate visualization.
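
To give a feel for what a graph export involves, here is a small sketch that writes Graphviz `dot` text from a child-to-parents mapping. It is not the exporter that `pangonet` ships: `PARENTS` and `to_dot` are hypothetical names, and the layout options are assumptions picked for readability.

```python
# Hypothetical hand-written fragment of the network: child -> list of parents.
PARENTS = {
    "B.1.1.529": [],
    "BA.1": ["B.1.1.529"],
    "BA.2": ["B.1.1.529"],
    "XE": ["BA.1", "BA.2"],
}


def to_dot(parents):
    """Render one parent -> child edge per line, quoting names that contain dots."""
    lines = ["digraph pango {", "  rankdir=LR;"]
    for child, child_parents in parents.items():
        for parent in child_parents:
            lines.append(f'  "{parent}" -> "{child}";')
    lines.append("}")
    return "\n".join(lines)


print(to_dot(PARENTS))
```

Because a recombinant simply contributes one edge per parent, no special handling is needed for `XE` in this format.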

56 changes: 55 additions & 1 deletion src/pangonet/pangonet.py
@@ -7,13 +7,19 @@
import copy
import logging
import urllib.request
from enum import Enum

logging.basicConfig(level=logging.INFO, stream=sys.stdout, format='%(asctime)s %(levelname)s:%(message)s')

# Github Download setup and credentials
ALIAS_KEY_URL = "https://api.github.com/repos/cov-lineages/pango-designation/contents/pango_designation/alias_key.json"
LINEAGE_NOTES_URL = "https://api.github.com/repos/cov-lineages/pango-designation/contents/lineage_notes.txt"

class Direction(Enum):
    ToRoot = 0
    ToTips = 1
    Unknown = 2

class PangoNet:

    def __init__(self, root: str = "root"):
@@ -234,7 +240,7 @@ def get_ancestors(self, lineage: str, network : OrderedDict = None):
ancestors += [parent] + parent_ancestors
# remove duplicates (python 3.7+ preserves order)
ancestors = list(dict.fromkeys(ancestors))
return ancestors
return ancestors

def get_children(self, lineage: str, network : OrderedDict = None):
if not network:
@@ -298,6 +304,54 @@ def get_parents(self, lineage: str, network : OrderedDict = None):
            network = self.network
        return network[lineage]["parents"]

    def get_paths(
        self,
        start: str,
        end: str,
        network : OrderedDict = None,
        direction: Direction = Direction.Unknown,
        depth: int = 0,
    ):
        '''
        Return every path from start to end as a list of lineage lists.

        Paths run towards the root when end is an ancestor of start, and
        towards the tips when end is a descendant of start. Returns
        [[start]] when start == end, and [] when neither relationship holds.
        '''
        if not network:
            network = self.network

        # Recursion bottom out, found our target
        if start == end:
            return [[start]]

        # If we don't know the direction yet
        if direction == Direction.Unknown:
            # If end is an ancestor of start, we need to move towards the root
            if end in self.get_ancestors(network=network, lineage=start):
                direction = Direction.ToRoot
            # If end is a descendant of start, we need to move towards the tips
            elif end in self.get_descendants(network=network, lineage=start):
                direction = Direction.ToTips
            # Otherwise, unclear relationship for movement, stop now
            else:
                return []

        # Figure out where we should go next in our search
        next_nodes = []
        if direction == Direction.ToRoot:
            parents = self.get_parents(network=network, lineage=start)
            next_nodes = [p for p in parents if p == end or end in self.get_ancestors(network=network, lineage=p)]
        elif direction == Direction.ToTips:
            children = self.get_children(network=network, lineage=start)
            next_nodes = [c for c in children if c == end or end in self.get_descendants(network=network, lineage=c)]

        # Recursively search and update paths
        paths = []
        for lineage in next_nodes:
            next_paths = self.get_paths(network=network, start=lineage, end=end, direction=direction, depth=depth + 1)
            for p in next_paths:
                p = [start] + p
                paths.append(p)

        return paths

    def get_recombinants(self, descendants=False, network: OrderedDict = None):
        '''
        '''
31 changes: 31 additions & 0 deletions tests/test_pangonet.py
@@ -51,6 +51,37 @@ def test_pangonet_get_parents():
    assert pango.get_parents("XBB") == ['BJ.1', 'BM.1.1.1']
    assert pango.get_parents("XBB.1.5") == ['XBB.1']

def test_pangonet_get_paths():
    pango = PangoNet().build(alias_key=new_alias_key, lineage_notes=new_lineage_notes)
    # Going towards root
    assert pango.get_paths(start="BA.1", end="BA.1") == [["BA.1"]]
    assert pango.get_paths(start="BA.1", end="B.1.1") == [["BA.1", "B.1.1.529", "B.1.1"]]
    assert pango.get_paths(start="XE", end="B.1") == [
        ['XE', 'BA.1', 'B.1.1.529', 'B.1.1', 'B.1'],
        ['XE', 'BA.2', 'B.1.1.529', 'B.1.1', 'B.1']
    ]
    # Going towards root, recursive recombination
    assert pango.get_paths(start="XBL", end="B.1.1") == [
        ['XBL', 'XBB.1.5.57', 'XBB.1.5', 'XBB.1', 'XBB', 'BJ.1', 'BA.2.10.1', 'BA.2.10', 'BA.2', 'B.1.1.529', 'B.1.1'],
        ['XBL', 'XBB.1.5.57', 'XBB.1.5', 'XBB.1', 'XBB', 'BM.1.1.1', 'BM.1.1', 'BM.1', 'BA.2.75.3', 'BA.2.75', 'BA.2', 'B.1.1.529', 'B.1.1'],
        ['XBL', 'BA.2.75', 'BA.2', 'B.1.1.529', 'B.1.1']
    ]

    # Going towards tips
    assert pango.get_paths(start="B.1.1.529", end="BA.2.3") == [["B.1.1.529", "BA.2", "BA.2.3"]]
    assert pango.get_paths(start="B.1.1.529", end="BQ.1") == [['B.1.1.529', 'BA.5', 'BA.5.3', 'BA.5.3.1', 'BE.1', 'BE.1.1', 'BE.1.1.1', 'BQ.1']]
    # Going towards tip, recursive recombination
    assert pango.get_paths(start="B.1.1.529", end="XDB") == [
        ['B.1.1.529', 'BA.2', 'BA.2.10', 'BA.2.10.1', 'BJ.1', 'XBB', 'XBB.1', 'XBB.1.16', 'XBB.1.16.19', 'XDB'],
        ['B.1.1.529', 'BA.2', 'BA.2.10', 'BA.2.10.1', 'BJ.1', 'XBB', 'XDB'],
        ['B.1.1.529', 'BA.2', 'BA.2.75', 'BA.2.75.3', 'BM.1', 'BM.1.1', 'BM.1.1.1', 'XBB', 'XBB.1', 'XBB.1.16', 'XBB.1.16.19', 'XDB'],
        ['B.1.1.529', 'BA.2', 'BA.2.75', 'BA.2.75.3', 'BM.1', 'BM.1.1', 'BM.1.1.1', 'XBB', 'XDB']
    ]

    # Going sideways, nope?
    assert pango.get_paths(start="BA.1", end="BA.2") == []

def test_pangonet_get_recombinants():
...
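
The path search that `get_paths` adds in this commit can be tried out on a toy network without building the full pango graph. The sketch below is a simplified variant under stated assumptions, not the committed method: `PARENTS`, `paths_to_ancestor`, and `paths_to_descendant` are hypothetical names, and instead of pruning with ancestor lookups it recurses through every parent and lets dead ends return no paths.

```python
# Hypothetical hand-written fragment of the network: child -> list of parents.
PARENTS = {
    "B.1.1": [],
    "B.1.1.529": ["B.1.1"],
    "BA.1": ["B.1.1.529"],
    "BA.2": ["B.1.1.529"],
    "XE": ["BA.1", "BA.2"],
}


def paths_to_ancestor(start, end, parents=PARENTS):
    """Every route from start up to end, one list of lineages per route."""
    if start == end:
        return [[start]]
    paths = []
    for parent in parents[start]:
        # A parent that never reaches end contributes an empty list here.
        for tail in paths_to_ancestor(parent, end, parents):
            paths.append([start] + tail)
    return paths


def paths_to_descendant(start, end, parents=PARENTS):
    """Routes towards the tips are the upward routes from end, reversed."""
    return [list(reversed(p)) for p in paths_to_ancestor(end, start, parents)]


print(paths_to_ancestor("XE", "B.1.1"))
print(paths_to_descendant("B.1.1.529", "XE"))
print(paths_to_ancestor("BA.1", "BA.2"))
```

Reusing the upward search for the downward direction avoids needing a separate children index, at the cost of exploring branches the committed method would prune early.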
