Dataframe-based tools for working with phylogenetic trees.

These details have not been verified by PyPI

Project links

Project description

phyloframe

Dataframe-based tools for working with phylogenetic trees.

Free software: MIT license
Documentation: https://phyloframe.readthedocs.io
Repository: https://github.com/mmore500/phyloframe

Why a DataFrame-based Tree Representation?

The R ecosystem's success with the ape data structure demonstrates the utility of edge matrix tree representations --- phyloframe pushes this idea further with a fully tabular format hosted within DataFrame objects (e.g., pd.DataFrame, pl.LazyFrame, pl.DataFrame, etc.).

DataFrames are scripting-friendly and end-user extensible, enabling a composable, interoperable, high-performance ecosystem for phylogenetic analysis --- in applications to our work, scalable to billion-tip phylogenies.

Fast and highly portable load/save. Use pandas.read_csv, polars.read_parquet, etc. --- libraries transparently fetch from URLs, cloud providers (S3, Google Cloud, etc.). Fast, full-featured Newick format I/O support.

Benefit from modern tabular data formats. Granular deserialization of selected columns (e.g., Parquet), transparent compression configuration (e.g., Parquet), columnar compression for efficient storage, categorical strings, and explicit column typing.

Benefit from modern high-performance dataframe tooling. Lazy query optimization (e.g. Polars), larger-than-memory streaming operations (e.g., Polars), distributed computing operations (e.g., Dask), multithreaded operations (e.g., Polars), vectorized operations (e.g., NumPy), and just-in-time compilation (e.g., Numba).

Benefit from rich, expressive dataframe functionality. Leverage powerful querying and transformation APIs (e.g., Polars expressions, Pandas indexing), enabling flexible filtering, bulk column calculations, grouped aggregations, join/merge operations, and chained transformations directly over tree data without manual loops.

Cache-friendly, memory-efficient, flexible data structure. Data occupies contiguous arrays, expediting tree creation and topological order traversals (e.g., parents before children or vice versa). Base memory footprint is lightweight (e.g., as little as 32 bits per node), but can be dynamically augmented to expedite traversals and calculations (e.g., linked list over children via DataFrame columns for first child/next sibling indices).

Rich interoperative ecosystem. Multi-language interoperation (e.g., possible future support for zero-copy interop between R and Python via reticulate and Arrow; possible future support for zero-copy Polars DataFrames shared between Rust and Python). Multi-library interoperation (e.g., highly-optimized or zero-copy interoperation between Polars and Pandas; Python dataframe protocol). Compatibility with existing alife data standards ecosystem.

Performance

benchmark results Computational throughput (left) and memory efficiency (right) across tree sizes. Higher is better.

At large tree sizes, phyloframe improves speed and memory-efficiency for most operations.

Notably, newick reads and topological-order tree traversal (i.e., parents before children) are up to 10× faster than existing tools --- including implementations backed by native code. Newick writes are up to 2× faster.

Benchmarked operations include tree traversals, newick read/write, and pairwise operations.

Install

python3 -m pip install "phyloframe[jit]==0.8.0"

The [jit] extra installs Numba for just-in-time compilation, providing native-level performance for many operations. Jit dependency is strongly recommended.

A containerized release of phyloframe is available via ghcr.io

singularity exec docker://ghcr.io/mmore500/phyloframe:v0.8.0 python3 -m phyloframe --help

Quickstart

Phyloframe represents phylogenies as DataFrames in the alife standard format.

from phyloframe import legacy as pfl

# Parse a Newick tree (already in working format)
df = pfl.alifestd_from_newick("((A:1,B:2):3,(C:4,D:5):6);")

# Mark properties and transform using df.pipe() (pandas syntactic sugar)
df = (
    df.pipe(pfl.alifestd_mark_leaves)
    .pipe(pfl.alifestd_mark_node_depth_asexual)
    .pipe(pfl.alifestd_collapse_unifurcations)
)

print("leaf count:", pfl.alifestd_count_leaf_nodes(df))
print(df[["id", "ancestor_id", "is_leaf", "node_depth"]].head())

The legacy module (from phyloframe import legacy) provides all current phyloframe operations. The legacy API is stable and will continue to be maintained for backward compatibility. A redesigned API will accompany phyloframe v1.0.0.

For a deeper introduction covering tree representation semantics, tree creation, tree computations, tree transforms, Polars, CLI, JIT compilation, and more, see the full quickstart guide.

Citing

If phyloframe contributes to a scholarly work, please cite it as

Matthew Andres Moreno. (2026). mmore500/phyloframe. Zenodo. https://doi.org/10.5281/zenodo.18842674

@software{moreno2026phyloframe,
  author = {Matthew Andres Moreno},
  title = {mmore500/phyloframe},
  month = mar,
  year = 2026,
  publisher = {Zenodo},
  doi = {10.5281/zenodo.18842674},
  url = {https://doi.org/10.5281/zenodo.18842674}
}

And don't forget to leave a star on GitHub!

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.8.0

Apr 17, 2026

0.7.0

Apr 3, 2026

0.6.1

Mar 17, 2026

0.6.0

Mar 17, 2026

0.5.3

Mar 16, 2026

0.5.2

Mar 16, 2026

0.5.1

Mar 15, 2026

0.5.0

Mar 15, 2026

0.4.0

Mar 14, 2026

0.3.0

Mar 14, 2026

0.2.3

Mar 14, 2026

0.2.2

Mar 13, 2026

0.2.1

Mar 11, 2026

0.2.0

Mar 11, 2026

0.1.1

Mar 9, 2026

0.1.0

Mar 2, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

phyloframe-0.8.0.tar.gz (1.4 MB view details)

Uploaded Apr 17, 2026 Source

File details

Details for the file phyloframe-0.8.0.tar.gz.

File metadata

Download URL: phyloframe-0.8.0.tar.gz
Upload date: Apr 17, 2026
Size: 1.4 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for phyloframe-0.8.0.tar.gz
Algorithm	Hash digest
SHA256	`77b7c0c7973cba0269b9ec223592ce574a1be81028b427beeb8b60f9527cf2a3`
MD5	`ed98ec79646b6ce3636d351df1e6aa90`
BLAKE2b-256	`b45c5ce93127e9edd2ea461ebc42e5b57f2bb16b4452399d107e175547b2fcd6`

See more details on using hashes here.

phyloframe 0.8.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

phyloframe

Why a DataFrame-based Tree Representation?

Performance

Install

Quickstart

Citing

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

File details

File metadata

File hashes