Pandas for phylogenetics
Bringing the Pandas DataFrame to phylogenetics.
PhyloPandas provides a Pandas-like interface for reading sequence and phylogenetic tree data into pandas DataFrames. This enables easy manipulation of phylogenetic data using familiar Python/Pandas functions. Finally, phylogenetics for humans!
How does it work?
Don't worry, we didn't reinvent the wheel. PhyloPandas is simply a DataFrame (great for human-accessible data storage) interface on top of Biopython (great for parsing/writing sequence data) and DendroPy (great for reading tree data).
PhyloPandas does two things:
- It offers new read functions to read sequence/tree data directly into a DataFrame.
- It attaches a new phylo accessor to the Pandas DataFrame. This accessor provides writing methods for sequence/tree data (powered by Biopython and DendroPy).
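To illustrate the accessor idea, here is a minimal sketch built on pandas' public `register_dataframe_accessor` API. It is not PhyloPandas' implementation: the accessor name `fasta_demo`, the method `to_fasta_text`, and the column names `id` and `sequence` are all hypothetical, chosen only for this example.

```python
import pandas as pd


# Hypothetical accessor name ("fasta_demo"), chosen so it cannot clash
# with PhyloPandas' real "phylo" accessor.
@pd.api.extensions.register_dataframe_accessor("fasta_demo")
class FastaDemoAccessor:
    def __init__(self, pandas_obj):
        # pandas passes in the DataFrame the accessor was reached from.
        self._df = pandas_obj

    def to_fasta_text(self, id_col="id", sequence_col="sequence"):
        """Return the rows as FASTA-formatted text (column names are assumed)."""
        lines = []
        for seq_id, seq in zip(self._df[id_col], self._df[sequence_col]):
            lines.append(f">{seq_id}")
            lines.append(str(seq))
        return "\n".join(lines) + "\n"


# Any DataFrame now exposes the accessor as an attribute.
df = pd.DataFrame({"id": ["seq1", "seq2"], "sequence": ["ACGT", "GGCA"]})
fasta_text = df.fasta_demo.to_fasta_text()
print(fasta_text)
```

Registering an accessor keeps the extra methods in their own namespace, so they do not collide with the DataFrame's built-in methods.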
Basic Usage
Sequence data:
Read in a sequence file.
import phylopandas as ph
df1 = ph.read_fasta('sequences.fasta')
df2 = ph.read_phylip('sequences.phy')
Write to various sequence file formats.
df1.phylo.to_clustal('sequences.clustal')
Convert between formats.
# Read a format.
df = ph.read_fasta('sequences.fasta')
# Write to a different format.
df.phylo.to_phylip('sequences.phy')
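Between reading and writing, the data is an ordinary DataFrame, so standard pandas operations apply. The sketch below uses a hand-built DataFrame as a stand-in for the result of a read function. The column names `id` and `sequence` are assumptions made for illustration and may not match the columns PhyloPandas actually produces.

```python
import pandas as pd

# Hand-built stand-in for a DataFrame returned by a read function.
# The column names are assumed for this example.
df = pd.DataFrame(
    {
        "id": ["seq1", "seq2", "seq3"],
        "sequence": ["ACGTACGT", "ACG", "ACGTTTGA"],
    }
)

# Add a length column using a vectorized string operation.
df["length"] = df["sequence"].str.len()

# Keep only sequences with at least 8 residues.
long_df = df[df["length"] >= 8].reset_index(drop=True)
print(long_df)
```

The filtered DataFrame could then be written back out through the accessor's writing methods.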
Tree data:
Read Newick tree data.
df = ph.read_newick('tree.newick')
Plot Newick data (using PhyloVega).
# Import PhyloVega.
from phylovega import VegaTree
# Initialize a Vega Tree object.
vt = VegaTree(df)
# Display the tree.
vt.display()
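As a rough picture of what tree data in a table can look like, the sketch below stores one node per row with a link to its parent, then computes root-to-tip distances with plain pandas. The table is built by hand, and the column names (`id`, `parent`, `length`, `type`) are illustrative assumptions rather than the schema returned by `read_newick`.

```python
import pandas as pd

# Hand-built table for the Newick tree ((A:1.0,B:2.0)AB:0.5,C:3.0)root;
# Column names are assumptions made for this example.
tree = pd.DataFrame(
    {
        "id": ["root", "AB", "A", "B", "C"],
        "parent": [None, "root", "AB", "AB", "root"],
        "length": [0.0, 0.5, 1.0, 2.0, 3.0],
        "type": ["root", "node", "leaf", "leaf", "leaf"],
    }
).set_index("id")


def distance_to_root(table, node_id):
    """Sum branch lengths while walking parent links up to the root."""
    total = 0.0
    # The root has a missing parent, which ends the walk.
    while pd.notna(node_id):
        total += table.at[node_id, "length"]
        node_id = table.at[node_id, "parent"]
    return total


# Select the leaves with an ordinary boolean filter.
leaves = tree[tree["type"] == "leaf"]
tip_distances = {leaf: distance_to_root(tree, leaf) for leaf in leaves.index}
print(tip_distances)
```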
Contributing
If you have ideas for the project, please share them on the project's Gitter chat.
It's easy to create new read/write functions and methods for PhyloPandas. If you have a format you'd like to add, please submit PRs! There are many more formats in Biopython that I haven't had the time to add myself, so please don't be afraid to add them! I thank you ahead of time!
Testing
PhyloPandas includes a small pytest suite. Run the tests from the base directory.
$ cd phylopandas
$ pytest
Install
Install from PyPI:
pip install phylopandas
Install from source:
git clone https://github.com/Zsailer/phylopandas
cd phylopandas
pip install -e .
Dependencies
- Biopython: Library for managing and manipulating biological data.
- DendroPy: Library for phylogenetic scripting, simulation, data processing and manipulation.
- Pandas: Flexible and powerful data analysis / manipulation library for Python.
- pandas_flavor: Flavor pandas objects with new accessors using pandas' new register API (with backwards compatibility).