
Molecular sequence analysis with fast C++ backends


fastphylo

Molecular sequence analysis for Python — fast pairwise distances and neighbour-joining trees for DNA, RNA, and protein sequences.

import fastphylo

aln  = fastphylo.read("sequences.fasta")          # FASTA, Stockholm, or Phylip
dm   = fastphylo.distance_matrix(aln)             # k2p for DNA, WAG for protein
tree = fastphylo.fnj(dm)
print(tree.to_newick())

Features

  • Reads FASTA, Stockholm, and Phylip files; format auto-detected from extension and content
  • DNA/RNA distances: Hamming, Jukes-Cantor, Kimura 2-parameter (default), Tamura-Nei 93 — computed by a fast C++ backend with SIMD (SSE2 / NEON)
  • Protein distances: maximum-likelihood estimation under WAG (default), LG, JTT, Dayhoff, BLOSUM62, VT, cpREV, MtREV, RtREV, HIVb, HIVw, DCMUT, JTT-DCMut, and PMB, using a Brent optimizer written in C++
  • Tree reconstruction: Neighbour-Joining, Fast NJ, and BioNJ (with branch lengths) via the FastPhylo library
  • Branch length estimation: fit_branch_lengths fits branch lengths to any tree topology by L1-minimisation against a distance matrix (requires scipy)
  • Distance matrix access: integer or taxon-name indexing, copy(), zeros() factory, NumPy and Phylip export
  • Multiple alignment: optional FAMSA integration for unaligned input
  • Pure-Python sequence types with Stockholm annotation support (organism, AC, description); edge-set Tree with to_newick() and merge()
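
As a rough illustration of what detecting a format "from extension and content" can mean, here is a minimal sketch. The helper `guess_format`, its extension table, and the order of checks are all assumptions made for this example; they are not taken from fastphylo's implementation. The content rules rely only on well-known properties of the formats: FASTA records start with `>`, Stockholm files start with a `# STOCKHOLM` header, and Phylip files start with two integers (taxon count and site count).

```python
from pathlib import Path

# Hypothetical helper, not part of fastphylo: guess a sequence-file format
# from the extension first, then fall back to the first non-blank line.
EXTENSIONS = {
    ".fasta": "fasta", ".fa": "fasta", ".fas": "fasta",
    ".sto": "stockholm", ".stk": "stockholm",
    ".phy": "phylip", ".phylip": "phylip",
}

def guess_format(path, text):
    """Return 'fasta', 'stockholm', 'phylip', or None if undecided."""
    by_ext = EXTENSIONS.get(Path(path).suffix.lower())
    if by_ext is not None:
        return by_ext
    # Content sniffing: look at the first non-blank line only.
    first = next((ln.strip() for ln in text.splitlines() if ln.strip()), "")
    if first.startswith(">"):
        return "fasta"            # FASTA records start with '>'
    if first.startswith("# STOCKHOLM"):
        return "stockholm"        # Stockholm header line
    fields = first.split()
    if len(fields) == 2 and all(f.isdigit() for f in fields):
        return "phylip"           # Phylip header: <n_taxa> <n_sites>
    return None
```

In practice you should not need anything like this: fastphylo.read performs the detection itself.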

Installation

pip install fastphylo

Optional extras:

pip install fastphylo[align]    # multiple-sequence alignment (pyfamsa)
pip install scipy               # branch length fitting (fit_branch_lengths)

Quick start

Aligned input → tree

import fastphylo

aln  = fastphylo.read("alignment.sto")            # Stockholm, FASTA, or Phylip
dm   = fastphylo.distance_matrix(aln, model="k2p")
tree = fastphylo.fnj(dm)
print(tree.to_newick())
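
For readers new to neighbour-joining, the sketch below shows the pair-selection step of the textbook NJ algorithm in NumPy. It is an illustration only and not FastPhylo's code: the function name `nj_pick_pair` is made up for this example, and Fast NJ in particular is designed to avoid the full search over all pairs that is done here.

```python
import numpy as np

def nj_pick_pair(d):
    """Return (i, j, len_i, len_j) for the pair classic NJ joins first."""
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    r = d.sum(axis=1)                            # row sums of the matrix
    q = (n - 2) * d - r[:, None] - r[None, :]    # NJ Q-criterion
    np.fill_diagonal(q, np.inf)                  # never join a taxon with itself
    i, j = np.unravel_index(np.argmin(q), q.shape)
    i, j = int(i), int(j)
    # Branch lengths from the two joined taxa to their new parent node.
    len_i = 0.5 * d[i, j] + (r[i] - r[j]) / (2 * (n - 2))
    len_j = d[i, j] - len_i
    return i, j, float(len_i), float(len_j)

# Small five-taxon example matrix (symmetric, zero diagonal).
d = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]
i, j, len_i, len_j = nj_pick_pair(d)
print(i, j, len_i, len_j)   # joins taxa 0 and 1 with branch lengths 2.0 and 3.0
```

A full NJ run repeats this step, replacing the joined pair with their parent node in the matrix, until only three nodes remain.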

Unaligned input → align → tree

import fastphylo

seqs = fastphylo.read("sequences.fasta")          # unaligned OK
aln  = fastphylo.align(seqs)                      # requires fastphylo[align]
dm   = fastphylo.distance_matrix(aln)
tree = fastphylo.fnj(dm)
print(tree.to_newick())

Protein sequences

import fastphylo

aln  = fastphylo.read("proteins.fasta")
dm   = fastphylo.distance_matrix(aln, model="LG")
tree = fastphylo.bionj(dm)                        # BioNJ with branch lengths
print(tree.to_newick())

Branch length fitting

NJ and FNJ return topology only (branch lengths are not computed). Use fit_branch_lengths to fit branch lengths to any tree topology by minimising the L1 deviation from the distance matrix:

import fastphylo

aln  = fastphylo.read("alignment.fasta")
dm   = fastphylo.distance_matrix(aln)
tree = fastphylo.fnj(dm)                          # fast topology, no lengths

tree = fastphylo.fit_branch_lengths(tree, dm)     # requires scipy
print(tree.to_newick())                           # now includes branch lengths

This solves a linear program: branch lengths are chosen to minimise sum |path_distance(i,j) − dm[i,j]| over all leaf pairs, subject to non-negative branch lengths.
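
The linear program described above can be written out with scipy.optimize.linprog. The sketch below is one standard way to pose an L1 fit, shown under assumptions: the function `fit_l1` and the 0/1 path matrix `paths` are hypothetical names for this example, and fastphylo's own formulation may differ. Each absolute deviation is bounded by a slack variable t, so that |paths·b − d| ≤ t, and the sum of the slacks is minimised.

```python
import numpy as np
from scipy.optimize import linprog

def fit_l1(paths, dists):
    """paths: (pairs x edges) 0/1 matrix; dists: target distance per pair."""
    a = np.asarray(paths, dtype=float)
    d = np.asarray(dists, dtype=float)
    n_pairs, n_edges = a.shape
    # Variables: [branch lengths b | slacks t]; minimise sum(t).
    c = np.concatenate([np.zeros(n_edges), np.ones(n_pairs)])
    eye = np.eye(n_pairs)
    # |a @ b - d| <= t, written as two blocks of linear inequalities.
    a_ub = np.block([[a, -eye], [-a, -eye]])
    b_ub = np.concatenate([d, -d])
    # bounds=(0, None) keeps every branch length and slack non-negative.
    res = linprog(c, A_ub=a_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n_edges], res.fun

# Star tree on three leaves; one edge per leaf (human, mouse, rat).
paths = [[1, 1, 0],    # human-mouse path uses edges 0 and 1
         [1, 0, 1],    # human-rat
         [0, 1, 1]]    # mouse-rat
dists = [0.15, 0.22, 0.08]
lengths, total = fit_l1(paths, dists)
print(lengths, total)  # approximately [0.145, 0.005, 0.075] with total deviation 0
```

With three leaves the distances can be matched exactly, so the objective is zero. On larger trees the fit is generally not exact and the optimum need not be unique.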

Distance matrix access

dm = fastphylo.distance_matrix(aln)

# Integer or name-based indexing
d = dm[0, 1]
d = dm["human", "mouse"]

# Set elements
dm["human", "mouse"] = 0.15
dm["mouse", "human"] = 0.15

# Build a matrix manually
dm = fastphylo.DistanceMatrix.zeros(["human", "mouse", "rat"])
dm["human", "mouse"] = dm["mouse", "human"] = 0.15
dm["human", "rat"]   = dm["rat",   "human"] = 0.22
dm["mouse", "rat"]   = dm["rat",   "mouse"] = 0.08

# Copy, NumPy array, Phylip string
dm2  = dm.copy()
arr  = dm.to_numpy()
text = dm.to_phylip()
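
The examples above always assign both dm[a, b] and dm[b, a], which suggests that an assignment is not mirrored automatically. Assuming that is the case, a small helper can keep a matrix symmetric. `set_symmetric` is a hypothetical name for this sketch, not a fastphylo function. It needs only `dm[a, b] = value` to work, so it is demonstrated here on a plain dict.

```python
def set_symmetric(dm, a, b, value):
    """Write value to both dm[a, b] and dm[b, a]."""
    if value < 0:
        raise ValueError("distances must be non-negative")
    dm[a, b] = value
    dm[b, a] = value

# Demonstrated on a plain dict, which also accepts dm[a, b] = value;
# a fastphylo DistanceMatrix is assumed to accept the same assignment.
demo = {}
set_symmetric(demo, "human", "mouse", 0.15)
set_symmetric(demo, "human", "rat", 0.22)
print(demo["mouse", "human"])   # 0.15
```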

API overview

Function / class                                              Description
fastphylo.read(path)                                          Read FASTA / Stockholm / Phylip → SequenceCollection or Alignment
fastphylo.align(seqs)                                         Align with FAMSA → Alignment
fastphylo.distance_matrix(aln, model=…)                       Compute pairwise distances → DistanceMatrix
fastphylo.nj(dm) / fastphylo.fnj(dm) / fastphylo.bionj(dm)    Tree reconstruction → Tree
fastphylo.fit_branch_lengths(tree, dm)                        L1-optimal branch lengths for a given topology
DistanceMatrix.zeros(names)                                   Create an all-zero matrix with taxon names
DistanceMatrix.copy()                                         Deep copy
DistanceMatrix.to_numpy()                                     Export as NumPy array
DistanceMatrix.to_phylip()                                    Export as Phylip-format string
Tree.to_newick()                                              Newick string
Tree.merge(other)                                             Union of two edge-set trees

Distance models

Sequences    Model string                  Notes
DNA / RNA    "hamming"                     Raw mismatch count
DNA / RNA    "jc"                          Jukes-Cantor
DNA / RNA    "k2p" (default)               Kimura 2-parameter
DNA / RNA    "tn93"                        Tamura-Nei 93
Protein      "WAG" (default)               Whelan & Goldman
Protein      "LG", "JTT", "Dayhoff", …     14 models total

RNA is handled transparently (U → T at the C++ boundary).

Requirements

  • Python ≥ 3.12
  • NumPy ≥ 1.24
  • A C++ compiler (for the bundled FastPhylo extension, which pip builds automatically when installing from source)

Optional:

  • pyfamsa ≥ 0.6.0 — multiple-sequence alignment (pip install fastphylo[align])
  • scipy — branch length fitting via fit_branch_lengths (pip install scipy)

License

GPLv3 — see FastPhylo for the upstream C++ library.
