Pandas for phylogenetics

# PhyloPandas #

**Bringing the [Pandas](https://github.com/pandas-dev/pandas) `DataFrame` to phylogenetics.**

PhyloPandas provides a Pandas-like interface for reading various sequence formats into DataFrames. This enables easy manipulation of phylogenetic data using familiar Python/Pandas functions. Finally, phylogenetics for humans!

<img src='docs/_images/jlab.png' align="middle">

## How does it work?

Don't worry, we didn't reinvent the wheel. **PhyloPandas** is simply a [DataFrame](https://github.com/pandas-dev/pandas)
(great for human-accessible data storage) interface on top of [Biopython](https://github.com/biopython/biopython) (great for parsing/writing sequence data).

When you import PhyloPandas, you import Pandas with a PhyloPandas flavor. That means the usual `read_` functions
are available (`read_csv`, `read_excel`, etc.), but the returned DataFrame includes extra `to_` methods (`to_fasta`, `to_phylip`, etc.).
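
To make the "DataFrame on top of Biopython" idea concrete, here is a minimal sketch of how parsed records could be turned into a DataFrame. It is not PhyloPandas's actual implementation: the helper name `fasta_to_dataframe` and the `id` and `description` columns are assumptions made for illustration, and the example sequences are made up. It only relies on Biopython's `SeqIO.parse` and the Pandas `DataFrame` constructor.

```python
from io import StringIO

import pandas as pd
from Bio import SeqIO


def fasta_to_dataframe(handle):
    """Hypothetical helper: parse FASTA records into a DataFrame."""
    # Biopython does the parsing; each record becomes one row.
    rows = [
        {"id": rec.id, "description": rec.description, "sequence": str(rec.seq)}
        for rec in SeqIO.parse(handle, "fasta")
    ]
    return pd.DataFrame(rows, columns=["id", "description", "sequence"])


# Made-up FASTA content, read from memory instead of a file.
fasta_text = ">seq1 first\nACGT\n>seq2 second\nACGA\n"
df = fasta_to_dataframe(StringIO(fasta_text))
```

Once the data is in a DataFrame, filtering, sorting, and joining work with ordinary Pandas calls.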

## Basic Usage

1. Read any format:
```python
import phylopandas as pd

df1 = pd.read_fasta('sequences.fasta')
df2 = pd.read_phylip('sequences.phy')
```
2. Write any format:
```python
df1.to_clustal('sequences.clustal')
```
3. Convert formats:
```python
df = pd.read_fasta('sequences.fasta')
df.to_phylip('sequences.phy')
```
4. Merge two **ordered** sequence files (such as a raw sequence file and its alignment):
```python
# Read sequence file into dataframe
df = pd.read_fasta('sequences.fasta')

# Read alignment into dataframe
align = pd.read_fasta('alignment.fasta')

# Add alignment using standard pandas functions
# NOTE: this assumes the alignment and sequence
# file are ordered.
df = df.assign(alignment=align['sequence'])
```
5. Write out the alignment from the previous example:
```python
df.to_fasta('new_alignment.fasta', sequence_col='alignment')
```
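
The merge in step 4 depends on both files listing their sequences in the same order. When that cannot be guaranteed, one alternative is to join on a shared identifier using plain Pandas. The sketch below assumes both DataFrames carry an `id` column; that column name is an assumption and is not confirmed by this README, so check the actual columns first. The data are made up.

```python
import pandas as pd

# Made-up stand-ins for the DataFrames read from the sequence and alignment files.
# The `id` column is an assumption about the column layout.
df = pd.DataFrame({"id": ["a", "b", "c"], "sequence": ["ACG", "AC", "CG"]})
align = pd.DataFrame({"id": ["c", "a", "b"], "sequence": ["-CG", "ACG", "AC-"]})

# Rename the aligned sequences, then join on `id` so row order no longer matters.
aligned = align[["id", "sequence"]].rename(columns={"sequence": "alignment"})
merged = df.merge(aligned, on="id", how="left", validate="one_to_one")
```

Passing `validate="one_to_one"` makes Pandas raise an error if an identifier is duplicated in either table, which catches mismatched inputs early.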

## Contributing

It's *easy* to create new read/write functions and methods for PhyloPandas. If you
have a format you'd like to add, please submit PRs! There are many more formats
in Biopython that I haven't had the time to add myself, so please don't be afraid
to add them! I thank you ahead of time!

## Testing

PhyloPandas includes a small [pytest](https://docs.pytest.org) suite. Run the tests from the base directory:
```
$ cd phylopandas
$ pytest
```

## Install

Install from PyPI:
```
pip install phylopandas
```

Install from source:

```
git clone https://github.com/Zsailer/phylopandas
cd phylopandas
pip install -e .
```

## Dependencies

* [Biopython](https://github.com/biopython/biopython): Library for managing and manipulating biological data.
* [Pandas](https://github.com/pandas-dev/pandas): Flexible and powerful data analysis and manipulation library for Python.

