Molecular sequence analysis with fast C++ backends
fastphylo
Molecular sequence analysis for Python — fast pairwise distances and neighbour-joining trees for DNA, RNA, and protein sequences.
import fastphylo
aln = fastphylo.read("sequences.fasta") # FASTA, Stockholm, or Phylip
dm = fastphylo.distance_matrix(aln) # k2p for DNA, WAG for protein
tree = fastphylo.fnj(dm)
print(tree.to_newick())
Features
- Reads FASTA, Stockholm, and Phylip files; format auto-detected from extension and content
- DNA/RNA distances: Hamming, Jukes-Cantor, Kimura 2-parameter (default), Tamura-Nei 93 — computed by a fast C++ backend with SIMD (SSE2 / NEON)
- Protein distances: maximum-likelihood estimation under WAG (default), LG, JTT, Dayhoff, BLOSUM62, VT, cpREV, MtREV, RtREV, HIVb, HIVw, DCMUT, JTT-DCMut, and PMB, using a Brent optimizer written in C++
- Tree reconstruction: Neighbour-Joining and Fast NJ (topology only) and BioNJ (with branch lengths) via the FastPhylo library
- Branch length estimation: fit_branch_lengths fits branch lengths to any tree topology by L1-minimisation against a distance matrix (requires scipy)
- Distance matrix access: integer or taxon-name indexing, copy(), zeros() factory, NumPy and Phylip export
- Multiple alignment: optional FAMSA integration for unaligned input
- Pure-Python sequence types with Stockholm annotation support (organism, AC, description); edge-set Tree with to_newick() and merge()
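To illustrate what "auto-detected from extension and content" can mean in practice, here is a minimal sketch. It is not fastphylo's code: the helper guess_format, the extension table, and the choice to look at content before the extension are all assumptions made for this example. It relies only on well-known format markers (a Stockholm file starts with "# STOCKHOLM", a FASTA record with ">", and a Phylip file with two integers).

```python
from pathlib import Path

# Hypothetical extension table for this sketch; fastphylo's own list may differ.
EXTENSIONS = {
    ".fasta": "fasta", ".fa": "fasta", ".fas": "fasta",
    ".sto": "stockholm", ".stk": "stockholm",
    ".phy": "phylip", ".phylip": "phylip",
}


def guess_format(path, text):
    """Guess 'fasta', 'stockholm', or 'phylip' from file content, then extension.

    Hypothetical helper, not part of the fastphylo API.
    """
    # Look at the first non-blank line of the content.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    first = lines[0] if lines else ""
    if first.startswith("# STOCKHOLM"):
        return "stockholm"
    if first.startswith(">"):
        return "fasta"
    fields = first.split()
    if len(fields) == 2 and all(field.isdigit() for field in fields):
        return "phylip"  # header line: <number of sequences> <alignment length>
    # Content was inconclusive: fall back on the file extension.
    suffix = Path(path).suffix.lower()
    if suffix in EXTENSIONS:
        return EXTENSIONS[suffix]
    raise ValueError(f"cannot determine sequence format of {path!r}")


print(guess_format("reads.txt", ">seq1\nACGT\n"))
print(guess_format("family.dat", "# STOCKHOLM 1.0\nseq1 ACGT\n//\n"))
```

Checking content first means a mislabelled file is still read correctly; the extension is only a tie-breaker when the content gives no clear signal.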
Installation
pip install fastphylo
Optional extras:
pip install fastphylo[align] # multiple-sequence alignment (pyfamsa)
pip install scipy # branch length fitting (fit_branch_lengths)
Quick start
Aligned input → tree
import fastphylo
aln = fastphylo.read("alignment.sto") # Stockholm, FASTA, or Phylip
dm = fastphylo.distance_matrix(aln, model="k2p")
tree = fastphylo.fnj(dm)
print(tree.to_newick())
Unaligned input → align → tree
import fastphylo
seqs = fastphylo.read("sequences.fasta") # unaligned OK
aln = fastphylo.align(seqs) # requires fastphylo[align]
dm = fastphylo.distance_matrix(aln)
tree = fastphylo.fnj(dm)
print(tree.to_newick())
Protein sequences
import fastphylo
aln = fastphylo.read("proteins.fasta")
dm = fastphylo.distance_matrix(aln, model="LG")
tree = fastphylo.bionj(dm) # BioNJ with branch lengths
print(tree.to_newick())
Branch length fitting
NJ and FNJ return topology only (branch lengths are not computed). Use
fit_branch_lengths to fit branch lengths to any tree topology by minimising
the L1 deviation from the distance matrix:
import fastphylo
aln = fastphylo.read("alignment.fasta")
dm = fastphylo.distance_matrix(aln)
tree = fastphylo.fnj(dm) # fast topology, no lengths
tree = fastphylo.fit_branch_lengths(tree, dm) # requires scipy
print(tree.to_newick()) # now includes branch lengths
This solves a linear program: branch lengths are chosen to minimise
sum |path_distance(i,j) − dm[i,j]| over all leaf pairs, subject to
non-negative branch lengths.
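The objective can be evaluated by hand for a small case. The sketch below is plain Python and does not call fastphylo or scipy; the function l1_objective and the way the tree is encoded (each leaf mapped to the set of branches between it and a fixed root) are assumptions made for this illustration. It uses a three-leaf star tree, where the branch lengths that fit the matrix exactly have a closed form.

```python
from itertools import combinations


def l1_objective(paths, lengths, dm):
    """Sum of |path_distance(i, j) - dm[i, j]| over all leaf pairs.

    paths[leaf] is the set of branch ids between that leaf and a fixed root.
    The path between two leaves is the symmetric difference of their sets,
    because branches shared on the way to the root cancel out.
    """
    total = 0.0
    for a, b in combinations(sorted(paths), 2):
        path_distance = sum(lengths[edge] for edge in paths[a] ^ paths[b])
        total += abs(path_distance - dm[a, b])
    return total


# Three-taxon example distances, keyed by alphabetically ordered leaf pairs.
dm = {("human", "mouse"): 0.15, ("human", "rat"): 0.22, ("mouse", "rat"): 0.08}

# Star tree: every leaf hangs off the central node by its own branch.
paths = {"human": {"e_human"}, "mouse": {"e_mouse"}, "rat": {"e_rat"}}

# For three taxa the exact fit is l_x = (d_xy + d_xz - d_yz) / 2.
exact = {
    "e_human": (0.15 + 0.22 - 0.08) / 2,
    "e_mouse": (0.15 + 0.08 - 0.22) / 2,
    "e_rat": (0.22 + 0.08 - 0.15) / 2,
}
uniform = {"e_human": 0.1, "e_mouse": 0.1, "e_rat": 0.1}

print(l1_objective(paths, exact, dm))    # zero up to rounding error
print(l1_objective(paths, uniform, dm))  # a worse fit gives a larger value
```

With more than three leaves the distances are generally not tree-like, no exact fit exists, and the minimiser has to be found numerically, which is where the linear program comes in.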
Distance matrix access
dm = fastphylo.distance_matrix(aln)
# Integer or name-based indexing
d = dm[0, 1]
d = dm["human", "mouse"]
# Set elements
dm["human", "mouse"] = 0.15
dm["mouse", "human"] = 0.15
# Build a matrix manually
dm = fastphylo.DistanceMatrix.zeros(["human", "mouse", "rat"])
dm["human", "mouse"] = dm["mouse", "human"] = 0.15
dm["human", "rat"] = dm["rat", "human"] = 0.22
dm["mouse", "rat"] = dm["rat", "mouse"] = 0.08
# Copy, NumPy array, Phylip string
dm2 = dm.copy()
arr = dm.to_numpy()
text = dm.to_phylip()
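For readers unfamiliar with the Phylip distance-matrix layout, the following sketch writes the conventional square form: a first line with the number of taxa, then one row per taxon with the name followed by its distances. The function phylip_square is hypothetical and written for this example; the exact padding and number formatting that to_phylip() produces is not specified here and may differ.

```python
def phylip_square(names, rows):
    """Format a square distance matrix in the conventional Phylip layout.

    Hypothetical helper for illustration, not part of the fastphylo API.
    """
    lines = [f"{len(names):5d}"]  # header: number of taxa
    for name, row in zip(names, rows):
        values = " ".join(f"{value:.6f}" for value in row)
        lines.append(f"{name:<10s} {values}")  # name padded to 10 characters
    return "\n".join(lines) + "\n"


names = ["human", "mouse", "rat"]
rows = [
    [0.00, 0.15, 0.22],
    [0.15, 0.00, 0.08],
    [0.22, 0.08, 0.00],
]
print(phylip_square(names, rows), end="")
```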
API overview
| Function / class | Description |
|---|---|
| fastphylo.read(path) | Read FASTA / Stockholm / Phylip → SequenceCollection or Alignment |
| fastphylo.align(seqs) | Align with FAMSA → Alignment |
| fastphylo.distance_matrix(aln, model=…) | Compute pairwise distances → DistanceMatrix |
| fastphylo.nj(dm) / fastphylo.fnj(dm) / fastphylo.bionj(dm) | Tree reconstruction → Tree |
| fastphylo.fit_branch_lengths(tree, dm) | L1-optimal branch lengths for a given topology |
| DistanceMatrix.zeros(names) | Create an all-zero matrix with taxon names |
| DistanceMatrix.copy() | Deep copy |
| DistanceMatrix.to_numpy() | Export as NumPy array |
| DistanceMatrix.to_phylip() | Export as Phylip-format string |
| Tree.to_newick() | Newick string |
| Tree.merge(other) | Union of two edge-set trees |
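As background for the tree reconstruction functions, here is a compact sketch of the textbook neighbour-joining algorithm in plain Python. It is an illustration only, not the FastPhylo implementation: the function neighbour_joining and its dict-of-dicts input are assumptions for this example, and Fast NJ and BioNJ differ from it in how they search for or weight the pair to join. Unlike the library's NJ, which returns topology only, the textbook version also yields branch lengths.

```python
def neighbour_joining(names, d):
    """Textbook neighbour joining; returns an unrooted Newick string.

    names: list of at least three taxon names.
    d: symmetric dict of dicts, d[a][b] is the distance between a and b.
    """
    nodes = list(names)
    d = {a: dict(d[a]) for a in nodes}   # work on a copy
    newick = {a: a for a in nodes}       # Newick text of each active subtree
    next_id = 0
    while len(nodes) > 3:
        n = len(nodes)
        # Row sums over the currently active nodes.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Join the pair minimising Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from a and b to their new parent u.
        length_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        length_b = d[a][b] - length_a
        u = f"_u{next_id}"
        next_id += 1
        # Distances from u to every other active node.
        d[u] = {}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        newick[u] = f"({newick[a]}:{length_a:.4f},{newick[b]}:{length_b:.4f})"
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    # Three nodes remain: connect them to a single central node.
    a, b, c = nodes
    length_a = 0.5 * (d[a][b] + d[a][c] - d[b][c])
    length_b = 0.5 * (d[a][b] + d[b][c] - d[a][c])
    length_c = 0.5 * (d[a][c] + d[b][c] - d[a][b])
    return (f"({newick[a]}:{length_a:.4f},{newick[b]}:{length_b:.4f},"
            f"{newick[c]}:{length_c:.4f});")


# Distances generated by the tree ((A:1,B:2):1,(C:3,D:4)).
pairs = {("A", "B"): 3, ("A", "C"): 5, ("A", "D"): 6,
         ("B", "C"): 6, ("B", "D"): 7, ("C", "D"): 7}
names = ["A", "B", "C", "D"]
d = {name: {} for name in names}
for (a, b), value in pairs.items():
    d[a][b] = d[b][a] = value

print(neighbour_joining(names, d))
```

Because the example distances are exactly additive, the sketch recovers the generating tree, with A and B as one cherry and C and D as the other.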
Distance models
| Sequences | Model string | Notes |
|---|---|---|
| DNA / RNA | "hamming" | Raw mismatch count |
| DNA / RNA | "jc" | Jukes-Cantor |
| DNA / RNA | "k2p" (default) | Kimura 2-parameter |
| DNA / RNA | "tn93" | Tamura-Nei 93 |
| Protein | "WAG" (default) | Whelan & Goldman |
| Protein | "LG", "JTT", "Dayhoff", … | 14 models total |
RNA is handled transparently (U → T at the C++ boundary).
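To make the two simplest corrected distances concrete, the sketch below computes Jukes-Cantor and Kimura 2-parameter distances in plain Python using the standard closed-form formulas. It is a reference illustration, not the C++ backend: the function names are made up for this example, and skipping columns that contain gaps or ambiguity codes is an assumption, since the library's handling of such columns is not described here. RNA input is mapped U → T first, mirroring the note above.

```python
import math

TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def _base_pairs(seq1, seq2):
    """Return aligned (a, b) pairs of unambiguous bases, reading U as T."""
    seq1 = seq1.upper().replace("U", "T")
    seq2 = seq2.upper().replace("U", "T")
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption for this sketch: drop columns with gaps or ambiguity codes.
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable columns")
    return pairs


def jukes_cantor(seq1, seq2):
    """d = -3/4 ln(1 - 4p/3), where p is the proportion of mismatches."""
    pairs = _base_pairs(seq1, seq2)
    p = sum(a != b for a, b in pairs) / len(pairs)
    return -0.75 * math.log(1 - 4 * p / 3)


def kimura_2p(seq1, seq2):
    """d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)).

    P is the proportion of transitions, Q the proportion of transversions.
    """
    pairs = _base_pairs(seq1, seq2)
    n = len(pairs)
    transitions = sum((a, b) in TRANSITIONS for a, b in pairs)
    transversions = sum(a != b for a, b in pairs) - transitions
    P, Q = transitions / n, transversions / n
    return -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))


a = "ACGTACGTAC"
b = "ACGTACGTAT"  # one transition (C -> T) in ten sites
print(jukes_cantor(a, b))
print(kimura_2p(a, b))
```

Both formulas break down for highly diverged pairs: once the argument of the logarithm is no longer positive, math.log raises ValueError, which signals that the sequences are saturated under that model.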
Requirements
- Python ≥ 3.12
- NumPy ≥ 1.24
- A C++ compiler (for the bundled FastPhylo extension, built automatically by pip)
Optional:
- pyfamsa ≥ 0.6.0: multiple-sequence alignment (pip install fastphylo[align])
- scipy: branch length fitting via fit_branch_lengths (pip install scipy)
License
GPLv3 — see FastPhylo for the upstream C++ library.