epistasis-v2

High-performance Python library for fitting high-order epistatic interactions in genotype-phenotype maps. A clean-break rewrite of harmslab/epistasis.

Status: alpha. The Phase 1 port, the Phase 2 Rust kernel, the Phase 3 Walsh-Hadamard OLS fast path, sparse design matrices for high-order Lasso/ElasticNet, the power and spline nonlinear variants, a monotonic global-epistasis variant (MAVE-NN style), and the remaining classifiers (LDA / QDA / GP / GMM) are all implemented.

A multi-page Streamlit showcase lives under examples/ and is published at epistasis-v2.streamlit.app.

What changed from v1

  • Rust hot-path kernels via PyO3 (epistasis._core) instead of a shipped Cython .c blob.
  • uv + maturin build. pyproject.toml only; no setup.py.
  • Python 3.10 through 3.13. Older interpreters dropped.
  • Type hints on the public API; mypy --strict in CI.
  • Composition over @use_sklearn MRO injection. Concrete models hold an sklearn estimator as an attribute and forward calls explicitly. This restores compatibility with modern sklearn (>=1.2), where the removal of normalize= broke the v1 trick.
  • Walsh-Hadamard fast-path for Hadamard-encoded OLS fits: an O(n log n) closed-form solve over n genotypes, with no dense design matrix. Auto-engaged in EpistasisLinearRegression.fit when the attached GPM is a full-order biallelic library under global encoding; everything else falls back to the sklearn path.
  • Sparse design-matrix path for Lasso / ElasticNet via scipy.sparse.csc_matrix. sparse="auto" (the default) engages the sparse path for model_type="local", where the per-site product columns are 0/1; pass sparse=True / False to override. This is the memory fix for L >= 20, where the dense float64 design matrix used to run out of memory.
  • Monotonic global-epistasis variant EpistasisMonotonicGE modeled as a sum of K tanh sigmoids with b_k, c_k >= 0, following Tareen et al. 2022 (MAVE-NN). Identifiable by construction; modern alternative to the power transform when the nonlinearity isn't a clean Box-Cox shape.
  • Coordinated rewrite of the gpmap dependency as gpmap-v2. Consumes binary_packed (uint8 2D) and encoding_table with site_index instead of the deprecated genotype_index.
  • No backward compatibility with v1. Pin the v1 package if you need that behavior.
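The monotonic global-epistasis nonlinearity described in the list above can be sketched in plain Python. This is an illustration of the tanh-sum form only, not the EpistasisMonotonicGE implementation: the function name monotonic_ge, the offsets a and d_k, and the parameter values are assumptions made for the example. The source states only the sum of K tanh sigmoids and the constraints b_k, c_k >= 0.

```python
import math
from typing import Sequence


def monotonic_ge(phi: float, a: float, b: Sequence[float],
                 c: Sequence[float], d: Sequence[float]) -> float:
    """Evaluate g(phi) = a + sum_k b_k * tanh(c_k * phi + d_k).

    The offsets a and d_k are assumptions of this sketch; only the tanh-sum
    form and the b_k, c_k >= 0 constraints come from the README.
    """
    if not (len(b) == len(c) == len(d)):
        raise ValueError("b, c and d must all have length K")
    if any(v < 0 for v in b) or any(v < 0 for v in c):
        raise ValueError("b_k and c_k must be non-negative")
    return a + sum(bk * math.tanh(ck * phi + dk) for bk, ck, dk in zip(b, c, d))


# Hypothetical parameters for K = 2 sigmoids.
params = dict(a=0.5, b=[1.0, 0.3], c=[2.0, 0.5], d=[-1.0, 0.2])
grid = [x / 10 for x in range(-30, 31)]
values = [monotonic_ge(phi, **params) for phi in grid]

# Non-negative b_k and c_k make every term non-decreasing in phi,
# so the sum is non-decreasing as well.
is_monotone = all(lo <= hi for lo, hi in zip(values, values[1:]))
```

Because tanh is increasing and each term is scaled by non-negative b_k and c_k, monotonicity holds for any parameter values that satisfy the constraints, without a separate penalty term.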

Repository layout

epistasis-v2/
├── pyproject.toml          uv + maturin build, ruff + mypy + pytest config
├── Cargo.toml              Rust workspace
├── python/epistasis/       Python source (installed as `epistasis`)
├── crates/epistasis-core/  Rust crate, exposed as `epistasis._core`
├── tests/                  pytest suite
├── benches/                pytest-benchmark suites (matrix kernels + FWHT)
├── docs/                   Sphinx docs (Phase 5)
├── .github/workflows/      CI (lint, test, matrix) + release (semantic-release, maturin wheels, PyPI OIDC)
├── CHANGELOG.md            generated by python-semantic-release
└── CONTRIBUTING.md         commit conventions, dev workflow

Installation (dev)

Requires Python >= 3.10 and a Rust toolchain. gpmap-v2 is pulled from PyPI.

uv sync
uv run maturin develop --release
uv run pytest

For lint and type-check:

uv run ruff check .
uv run ruff format --check .
uv run mypy python/epistasis

Current progress

Phase 0 (scaffold), Phase 1 (port), Phase 2 (Rust kernels), and Phase 3 (FWHT fast path + sparse design matrices for Lasso/ElasticNet) are complete.

Ported modules:

  • epistasis.mapping (sites, coefficients, EpistasisMap)
  • epistasis.matrix (encoded vectors and design matrix; Rust-backed)
  • epistasis.exceptions (EpistasisError, XMatrixError, FittingError)
  • epistasis.utils (genotypes_to_X)
  • epistasis.models.base (AbstractEpistasisModel, EpistasisBaseModel)
  • epistasis.models.linear (EpistasisLinearRegression with analytic coefficient standard errors and a Walsh-Hadamard fast path for full-order biallelic fits, EpistasisRidge, EpistasisLasso and EpistasisElasticNet with an auto-engaged scipy.sparse design-matrix path)
  • epistasis.models.nonlinear (EpistasisNonlinearRegression, FunctionMinimizer, EpistasisPowerTransform (Sailer & Harms 2017), EpistasisSpline (smoothing spline via scipy.interpolate.UnivariateSpline), EpistasisMonotonicGE (monotone tanh-sum global epistasis, Tareen et al. 2022))
  • epistasis.models.classifiers (EpistasisLogisticRegression, EpistasisLDA, EpistasisQDA, EpistasisGaussianProcess, EpistasisGaussianMixture)
  • epistasis.simulate (simulate_linear_gpm, simulate_random_linear_gpm)
  • epistasis.stats (Pearson, R^2, RMSD, SS residuals, AIC, split_gpm)
  • epistasis.validate (k_fold, holdout)
  • epistasis.sampling.bayesian (BayesianSampler via emcee 3)
  • epistasis.fast (fwht_ols_coefficients: closed-form OLS via FWHT)
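To make the encoding and design-matrix modules above more concrete, here is a small pure-Python sketch of how a site-product design matrix can be built under the two encodings. The helper names encode, design_matrix, and density are hypothetical and are not the library's API. The sketch assumes that local encoding keeps 0/1 per site and that global (Hadamard) encoding maps wildtype to +1 and mutant to -1.

```python
from itertools import combinations, product
from math import prod


def encode(genotype, model_type):
    """Map a binary genotype (0 = wildtype, 1 = mutant) to per-site values.

    Assumed convention: "local" keeps 0/1, "global" (Hadamard) uses +1/-1.
    """
    if model_type == "local":
        return list(genotype)
    if model_type == "global":
        return [1 if g == 0 else -1 for g in genotype]
    raise ValueError(f"unknown model_type: {model_type!r}")


def design_matrix(genotypes, order, model_type):
    """One column per site subset up to `order`; entries are site products."""
    n_sites = len(genotypes[0])
    sites = [s for k in range(order + 1) for s in combinations(range(n_sites), k)]
    rows = []
    for g in genotypes:
        enc = encode(g, model_type)
        # The empty subset gives an empty product, which is 1 (the intercept).
        rows.append([prod(enc[i] for i in s) for s in sites])
    return sites, rows


def density(matrix):
    """Fraction of non-zero entries."""
    total = len(matrix) * len(matrix[0])
    return sum(v != 0 for row in matrix for v in row) / total


L = 4
genotypes = list(product([0, 1], repeat=L))
_, x_local = design_matrix(genotypes, order=L, model_type="local")
_, x_global = design_matrix(genotypes, order=L, model_type="global")
local_density = density(x_local)
global_density = density(x_global)
```

Under these assumptions a full-order local matrix has density (3/4)^L, so it grows sparser as L increases, while the global matrix has no zeros at all. That is consistent with the sparse path engaging only for model_type="local".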

Rust hot-path kernels in epistasis._core:

  • encode_vectors (uint8 binary_packed to int8 Hadamard/local encoding)
  • build_model_matrix (parallel site-product over genotype rows; flat ragged sites layout)
  • fwht (iterative butterfly Fast Walsh-Hadamard Transform)
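As a reference for what the fwht kernel computes, the following pure-Python sketch implements an iterative butterfly transform and uses it for the closed-form full-order solve. It is not the Rust code and not the signature of fwht_ols_coefficients. The names fwht_ols and hadamard_entry are hypothetical, and the sketch assumes that genotypes and coefficients are both indexed by bitmask (natural Hadamard ordering), which may differ from the library's coefficient order.

```python
def fwht(values):
    """Unnormalized Walsh-Hadamard transform in natural (Sylvester) order."""
    out = list(values)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Each pass combines elements h apart: (a, b) -> (a + b, a - b).
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


def fwht_ols(phenotypes):
    """Closed-form OLS for a full-order Hadamard design: beta = H y / n."""
    n = len(phenotypes)
    return [v / n for v in fwht(phenotypes)]


def hadamard_entry(genotype, subset):
    """Design-matrix entry: product of +1/-1 site values over the subset bits."""
    return -1 if bin(genotype & subset).count("1") % 2 else 1


# Hypothetical phenotypes for L = 3; the index bits are the genotype.
y = [0.0, 1.0, 0.5, 2.5, 0.2, 1.1, 0.9, 4.0]
beta = fwht_ols(y)
n = len(y)

# Multiplying the explicit matrix by beta should reproduce y up to rounding.
y_hat = [sum(hadamard_entry(g, j) * beta[j] for j in range(n)) for g in range(n)]
max_error = max(abs(a - b) for a, b in zip(y, y_hat))
```

The solve works because the Hadamard matrix H is symmetric and satisfies H H = n I, so the least-squares solution reduces to one transform and a division by n. The explicit matrix appears here only to check the result.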

Benchmarks vs v1

Measured on Windows 11 against epistasis==0.7.5 + gpmap==0.7.0. Full biallelic space (AT alphabet), timeit best-of-5. See benchmarks/vs_v1.py for reproducible scripts and setup instructions.

Note on v1 times: the Cython extension in epistasis 0.7.5 requires MSVC to compile and has no pre-built Windows wheel, so the v1 times below use the pure-Python fallback, which is slower than v1 with Cython. Even so, the FWHT fast path in v2 is orders of magnitude faster at full order.
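For readers unfamiliar with the best-of-5 convention, a minimal sketch with the standard-library timeit module follows. It is not the project's benchmark script: best_of and workload are hypothetical names, and the workload is a stand-in for a model fit.

```python
import timeit


def best_of(func, repeat=5, number=1):
    """Return the fastest per-call time in milliseconds over `repeat` runs."""
    times = timeit.repeat(func, repeat=repeat, number=number)
    return min(times) / number * 1e3


def workload():
    # Stand-in for a model fit; purely illustrative.
    return sum(i * i for i in range(10_000))


elapsed_ms = best_of(workload)
```

Taking the minimum rather than the mean is the usual choice for this kind of measurement, since slower runs mostly reflect interference from other processes rather than the code under test.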

fit() order=1 (sklearn lstsq path in both versions)

| L | genotypes | v1 (ms) | v2 (ms) | speedup |
|---|---|---|---|---|
| 8 | 256 | 12.98 | 1.81 | 7x |
| 10 | 1,024 | 44.07 | 2.02 | 22x |
| 12 | 4,096 | 183.13 | 2.61 | 70x |
| 14 | 16,384 | 807.37 | 5.08 | 159x |
| 16 | 65,536 | 3,771.14 | 19.35 | 195x |

fit() full order (v1: dense lstsq, v2: FWHT O(N log N))

| L | genotypes | v1 (ms) | v2 (ms) | speedup |
|---|---|---|---|---|
| 8 | 256 | 195.16 | 1.75 | 111x |
| 10 | 1,024 | 3,004.81 | 3.10 | 969x |
| 12 | 4,096 | 59,344.00 | 8.97 | >6,000x |
| 14 | 16,384 | (hours) | 35.50 | |
| 16 | 65,536 | (hours) | 154.15 | |
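A back-of-the-envelope calculation shows why the dense path stops scaling at full order. The sketch below is plain arithmetic, not a measurement. It assumes a full-order biallelic library with 2^L genotypes and 2^L coefficients stored as float64, and it counts one addition and one subtraction per FWHT butterfly. The function names are hypothetical.

```python
def dense_design_matrix_bytes(L, itemsize=8):
    """Bytes needed for a dense 2^L x 2^L matrix (float64 by default)."""
    n = 2 ** L
    return n * n * itemsize


def fwht_add_sub_count(L):
    """Additions plus subtractions in an n-point FWHT, n = 2^L.

    There are L passes of n / 2 butterflies, each doing one add and one subtract.
    """
    n = 2 ** L
    return n * L


rows = []
for L in (8, 10, 12, 14, 16):
    gib = dense_design_matrix_bytes(L) / 2 ** 30
    rows.append((L, 2 ** L, gib, fwht_add_sub_count(L)))

for L, n, gib, ops in rows:
    print(f"L={L:2d}  genotypes={n:6d}  dense matrix={gib:10.4f} GiB  FWHT ops={ops}")
```

At L = 16 the dense matrix alone would take 32 GiB, while the transform works in place on a vector of 65,536 values with about a million additions and subtractions.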

Rust kernel vs NumPy reference (internal; release build, 16 threads; see benches/)

| kernel | input | Rust | NumPy reference | speedup |
|---|---|---|---|---|
| build_model_matrix | L=12, order=3 | 1.7 ms | 10.1 ms | ~6x |
| build_model_matrix | L=16, order=3 | 50 ms | 283 ms | ~5.7x |
| encode_vectors | L=16 (65k genotypes) | 1.06 ms | 3.24 ms | ~3x |
| EpistasisLinearRegression.fit | full-order L=10 | 0.78 ms | 292 ms (lstsq) | ~375x |
| EpistasisLinearRegression.fit | full-order L=12 | 3.4 ms | 15.4 s (lstsq) | ~4500x |

Contributing

See CONTRIBUTING.md. Commits follow Conventional Commits; releases and the changelog are automated by python-semantic-release.

License

Unlicense (public domain). See UNLICENSE.
