heathcliff233/phylopandas

Bringing the Pandas DataFrame to phylogenetics.

PhyloPandas provides a Pandas-like interface for reading sequence and phylogenetic tree data into pandas DataFrames. This enables easy manipulation of phylogenetic data using familiar Python/Pandas functions. Finally, phylogenetics for humans!

How does it work?

Don't worry, we didn't reinvent the wheel. PhyloPandas is simply a DataFrame (great for human-accessible data storage) interface on top of Biopython (great for parsing/writing sequence data) and DendroPy (great for reading tree data).

PhyloPandas does two things:

  1. It offers new read functions to read sequence/tree data directly into a DataFrame.
  2. It attaches a new phylo accessor to the Pandas DataFrame. This accessor provides writing methods for sequence/tree data (powered by Biopython and DendroPy).
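
The accessor mechanism in point 2 can be sketched as follows. This is a minimal illustration, not PhyloPandas' actual implementation: it uses pandas' built-in `register_dataframe_accessor` rather than pandas_flavor, and the accessor name `demo_phylo`, the class `DemoPhyloAccessor`, and the `id`/`sequence` column names are all assumptions made for the example.

```python
import io

import pandas as pd


# Hypothetical accessor name; the real library registers "phylo".
@pd.api.extensions.register_dataframe_accessor("demo_phylo")
class DemoPhyloAccessor:
    def __init__(self, df):
        # pandas passes the DataFrame the accessor is attached to.
        self._df = df

    def to_fasta(self, handle):
        """Write each row as a two-line FASTA record to an open text handle."""
        # Assumes the DataFrame has "id" and "sequence" columns.
        for rec_id, seq in zip(self._df["id"], self._df["sequence"]):
            handle.write(f">{rec_id}\n{seq}\n")


df = pd.DataFrame({"id": ["seq1", "seq2"], "sequence": ["ACGT", "TTGA"]})
buffer = io.StringIO()
df.demo_phylo.to_fasta(buffer)
fasta_text = buffer.getvalue()
```

Once registered, the accessor is available on every DataFrame, which is why the writing methods in the usage examples are called as `df.phylo.<method>`.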

Basic Usage

Sequence data:

Read in a sequence file.

import phylopandas as ph

df1 = ph.read_fasta('sequences.fasta')    # for fasta
df2 = ph.read_fasta_dev('sequence.fasta') # for two-line fasta
df3 = ph.read_phylip('sequences.phy')
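
To make the difference between wrapped and two-line FASTA concrete, here is a standard-library sketch that turns FASTA text into row-like dictionaries. The function name `fasta_to_rows` and the `id`/`description`/`sequence` keys are assumptions for illustration; PhyloPandas itself delegates parsing to Biopython, and its exact column set may differ.

```python
def fasta_to_rows(text):
    """Parse FASTA text into a list of dicts, one per record."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            # The ID is the first whitespace-delimited word of the header.
            rec_id = header.split(None, 1)[0] if header else ""
            rows.append({"id": rec_id, "description": header, "sequence": ""})
        elif rows:
            # Sequence lines are concatenated, so wrapped and
            # two-line files yield the same records.
            rows[-1]["sequence"] += line
    return rows


wrapped = ">seq1 example protein\nMKV\nLAA\n>seq2\nMKT\n"
two_line = ">seq1 example protein\nMKVLAA\n>seq2\nMKT\n"
rows = fasta_to_rows(wrapped)
```

A list of dictionaries like `rows` can be passed straight to `pandas.DataFrame(rows)` to get one row per sequence.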

Write to various sequence file formats.

df1.phylo.to_clustal('sequences.clustal')

Convert between formats.

# Read a format.
df = ph.read_fasta('sequences.fasta')

# Write to a different format.
df.phylo.to_phylip('sequences.phy')
df.phylo.to_fasta_dev('seq_two_line.fasta') # two-line FASTA
df.phylo.to_embl(mtype='protein', filename='sequences.embl') # mtype sets the molecule-type annotation on each SeqRecord

Tree data:

Read Newick tree data.

df = ph.read_newick('tree.newick')
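
As a rough picture of how a tree can live in a table, the sketch below flattens a simple Newick string into one row per node, with each row pointing at its parent. It is an assumption-laden illustration, not the PhyloPandas reader (which relies on DendroPy): the function name `newick_to_rows` and the `type`/`label`/`length`/`parent` keys are hypothetical, and quoted labels and comments are not handled.

```python
import re


def newick_to_rows(newick):
    """Flatten a simple Newick string into a list of node dicts."""
    tokens = re.findall(r"[(),;]|[^(),;]+", newick)
    rows = []
    open_nodes = []  # row indices of internal nodes still being read
    current = None   # row index that the next label/length applies to
    for tok in tokens:
        if not tok.strip():
            continue  # ignore whitespace between symbols
        if tok == "(":
            parent = open_nodes[-1] if open_nodes else None
            rows.append({"type": "node", "label": None,
                         "length": None, "parent": parent})
            open_nodes.append(len(rows) - 1)
            current = None
        elif tok == ",":
            current = None
        elif tok == ")":
            # A label after ")" belongs to the node being closed.
            current = open_nodes.pop()
        elif tok == ";":
            break
        else:
            if current is None:
                # A bare label starts a new leaf under the open node.
                parent = open_nodes[-1] if open_nodes else None
                rows.append({"type": "leaf", "label": None,
                             "length": None, "parent": parent})
                current = len(rows) - 1
            label, _, length = tok.partition(":")
            rows[current]["label"] = label.strip() or None
            rows[current]["length"] = float(length) if length.strip() else None
    return rows


tree_rows = newick_to_rows("((A:0.1,B:0.2)AB:0.3,C:0.4)root;")
```

Here `parent` is the index of the parent row, so the root is the only row whose `parent` is `None`.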

Contributing

If you have ideas for the project, please share them on the project's Gitter chat.

It's easy to create new read/write functions and methods for PhyloPandas. If you have a format you'd like to add, please submit PRs! There are many more formats in Biopython that I haven't had the time to add myself, so please don't be afraid to add them! I thank you ahead of time!

Testing

PhyloPandas includes a small pytest suite. Run the tests from the repository's base directory.

$ cd phylopandas
$ pytest

Install

Install from PyPI:

pip install phylopandas

Install from source:

git clone https://github.com/heathcliff233/phylopandas
cd phylopandas
pip install -e .

Dependencies

  • Biopython: Library for managing and manipulating biological data.
  • DendroPy: Library for phylogenetic scripting, simulation, data processing, and manipulation.
  • Pandas: Flexible and powerful data analysis and manipulation library for Python.
  • pandas_flavor: Flavor pandas objects with new accessors using pandas' new register API (with backwards compatibility).
