epistasis-v2

High-performance Python library for fitting high-order epistatic interactions in genotype-phenotype maps. A clean-break rewrite of harmslab/epistasis.

Status: alpha. Phase 1 port, Phase 2 Rust kernel, and the Phase 3 Walsh-Hadamard OLS fast path are all in. Sparse design matrices for high-order Lasso and remaining polish items are still to come.

A multi-page Streamlit showcase lives under examples/ and is published at epistasis-v2.streamlit.app.

What changed from v1

  • Rust hot-path kernels via PyO3 (epistasis._core) instead of a shipped Cython .c blob.
  • uv + maturin build. pyproject.toml only; no setup.py.
  • Python 3.10 through 3.13. Older interpreters dropped.
  • Type hints on the public API; mypy --strict in CI.
  • Composition over @use_sklearn MRO injection. Concrete models hold an sklearn estimator as an attribute and forward calls explicitly, which unlocks modern sklearn (>=1.2) that broke the v1 trick when normalize= was removed.
  • Walsh-Hadamard fast path for Hadamard-encoded OLS fits: an O(N log N) closed-form solve in the number of genotypes N = 2^L, with no dense design matrix. It is engaged automatically in EpistasisLinearRegression.fit when the attached GPM is a full-order biallelic library under global encoding; everything else falls back to the sklearn path.
  • Sparse design matrix path for Lasso / ElasticNet at high order (pending; a memory concern at L >= 20).
  • Coordinated rewrite of the gpmap dependency as gpmap-v2. Consumes binary_packed (uint8 2D) and encoding_table with site_index instead of the deprecated genotype_index.
  • No backward compatibility with v1. Pin the v1 package if you need that behavior.
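
The composition pattern described above can be sketched in a few lines. This is a minimal illustration, not the library's code: ComposedModel, MeanEstimator, and the Estimator protocol are hypothetical names, and a toy estimator stands in for an sklearn one so the snippet runs without scikit-learn installed.

```python
from __future__ import annotations

from typing import Any, Protocol, Sequence


class Estimator(Protocol):
    """Minimal fit/predict surface; sklearn estimators also provide it."""

    def fit(self, X: Any, y: Any) -> Any: ...
    def predict(self, X: Any) -> Any: ...


class MeanEstimator:
    """Toy stand-in for an sklearn estimator: predicts the training mean."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> MeanEstimator:
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X: Sequence[Sequence[float]]) -> list[float]:
        return [self.mean_ for _ in X]


class ComposedModel:
    """Hypothetical model that holds an estimator instead of inheriting from it."""

    def __init__(self, estimator: Estimator) -> None:
        self._estimator = estimator  # an attribute, not a base class

    def fit(self, X: Any, y: Any) -> ComposedModel:
        self._estimator.fit(X, y)  # explicit forwarding, no MRO tricks
        return self

    def predict(self, X: Any) -> Any:
        return self._estimator.predict(X)


model = ComposedModel(MeanEstimator()).fit([[0], [1], [2]], [1.0, 2.0, 3.0])
predictions = model.predict([[5], [6]])
```

Because the wrapper only depends on fit and predict, a change in the wrapped estimator's constructor arguments (such as the removal of normalize=) does not reach the model class.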

Repository layout

epistasis-v2/
├── pyproject.toml          uv + maturin build, ruff + mypy + pytest config
├── Cargo.toml              Rust workspace
├── python/epistasis/       Python source (installed as `epistasis`)
├── crates/epistasis-core/  Rust crate, exposed as `epistasis._core`
├── tests/                  pytest suite
├── benches/                pytest-benchmark suites (matrix kernels + FWHT)
├── docs/                   Sphinx docs (Phase 5)
├── .github/workflows/      CI (lint, test, matrix) + release (semantic-release, maturin wheels, PyPI OIDC)
├── CHANGELOG.md            generated by python-semantic-release
└── CONTRIBUTING.md         commit conventions, dev workflow

Installation (dev)

Requires Python >= 3.10 and a Rust toolchain. gpmap-v2 is pulled from PyPI.

uv sync
uv run maturin develop --release
uv run pytest

For lint and type-check:

uv run ruff check .
uv run ruff format --check .
uv run mypy python/epistasis

Current progress

Phase 0 (scaffold), Phase 1 (port), Phase 2 (Rust kernels), and most of Phase 3 (FWHT fast path) are complete.

Ported modules:

  • epistasis.mapping (sites, coefficients, EpistasisMap)
  • epistasis.matrix (encoded vectors and design matrix; Rust-backed)
  • epistasis.exceptions (EpistasisError, XMatrixError, FittingError)
  • epistasis.utils (genotypes_to_X)
  • epistasis.models.base (AbstractEpistasisModel, EpistasisBaseModel)
  • epistasis.models.linear (EpistasisLinearRegression with analytic coefficient standard errors and a Walsh-Hadamard fast path for full-order biallelic fits, EpistasisRidge, EpistasisLasso, EpistasisElasticNet)
  • epistasis.models.nonlinear (EpistasisNonlinearRegression, FunctionMinimizer; power and spline variants deferred)
  • epistasis.models.classifiers (EpistasisLogisticRegression; LDA, QDA, Gaussian Process, and GMM deferred)
  • epistasis.simulate (simulate_linear_gpm, simulate_random_linear_gpm)
  • epistasis.stats (Pearson, R^2, RMSD, SS residuals, AIC, split_gpm)
  • epistasis.validate (k_fold, holdout)
  • epistasis.sampling.bayesian (BayesianSampler via emcee 3)
  • epistasis.fast (fwht_ols_coefficients: closed-form OLS via FWHT)
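
To show why a closed-form OLS via FWHT is possible, here is a pure-Python sketch under stated assumptions: genotypes are indexed by bitmask, the wild-type state is encoded as +1 and the mutant as -1, and terms are ordered by the same bitmask. Under those assumptions the full-order design matrix is the Sylvester Hadamard matrix H, and H·H = N·I, so the coefficients are FWHT(y) / N. The function names are hypothetical and the ordering used by fwht_ols_coefficients may differ.

```python
def fwht(values):
    """Unnormalized Fast Walsh-Hadamard Transform in natural (Sylvester) order."""
    out = list(values)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Butterfly pass: combine elements that are h apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


def ols_via_fwht(phenotypes):
    """Solve y = H @ beta for beta using H @ H = N * I."""
    n = len(phenotypes)
    return [v / n for v in fwht(phenotypes)]


def design_entry(genotype, term):
    """Design-matrix entry: product of the +1/-1 site values included in `term`."""
    return -1 if bin(genotype & term).count("1") % 2 else 1


# L = 3 sites, so N = 8 genotypes; bit i of an index marks a mutation at site i.
true_beta = [1.0, 0.5, -0.25, 0.0, 0.75, 0.0, 0.0, 0.125]
N = len(true_beta)
# Build phenotypes the slow way, with an explicit O(N^2) matrix-vector product.
y = [sum(design_entry(g, s) * true_beta[s] for s in range(N)) for g in range(N)]
recovered = ols_via_fwht(y)
```

The explicit product is only there to generate test data; the solve itself never materializes the N x N matrix.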

Rust hot-path kernels in epistasis._core:

  • encode_vectors (uint8 binary_packed to int8 Hadamard/local encoding)
  • build_model_matrix (parallel site-product over genotype rows; flat ragged sites layout)
  • fwht (iterative butterfly Fast Walsh-Hadamard Transform)
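
As a rough reference for what the first two kernels compute, the following pure-Python sketch encodes 0/1 genotype rows as +1/-1 and builds a design matrix by multiplying the encoded values at each term's sites. It assumes the Hadamard convention 0 -> +1, 1 -> -1; the function names and signatures are hypothetical, and the Rust kernels' flat ragged sites layout and parallelism are not reproduced here.

```python
from itertools import combinations
from math import prod


def encode_hadamard_py(binary_rows):
    """Map 0/1 genotype rows to +1/-1 (0 -> +1, 1 -> -1)."""
    return [[1 - 2 * bit for bit in row] for row in binary_rows]


def enumerate_sites(n_sites, order):
    """All site subsets up to `order`, starting with the empty (intercept) term."""
    return [
        subset
        for k in range(order + 1)
        for subset in combinations(range(n_sites), k)
    ]


def site_product_matrix(encoded_rows, sites):
    """One column per term: the product of the encoded values at that term's sites."""
    # prod() of an empty term is 1, which yields the intercept column.
    return [[prod(row[i] for i in term) for term in sites] for row in encoded_rows]


binary = [[0, 0], [0, 1], [1, 0], [1, 1]]
sites = enumerate_sites(n_sites=2, order=2)
X = site_product_matrix(encode_hadamard_py(binary), sites)
```

For two sites at full order this produces the 4 x 4 matrix with columns for the intercept, site 0, site 1, and the pairwise term.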

Benchmarks vs v1

Measured on Windows 11 against epistasis==0.7.5 + gpmap==0.7.0. Full biallelic space (AT alphabet), timeit best-of-5. See benchmarks/vs_v1.py for reproducible scripts and setup instructions.

Note on v1 times: the Cython extension in epistasis 0.7.5 requires MSVC to compile on Windows and ships no pre-built Windows wheel, so the v1 times below use its pure-Python fallback, which is slower than v1 with Cython. Even so, the v2 FWHT fast path is orders of magnitude faster at full order.
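
For readers unfamiliar with the "best-of-5" convention, a timing helper along these lines is one way to take such a measurement. It is a sketch only: best_of and workload are hypothetical names, and the actual benchmark script may be structured differently.

```python
import timeit


def best_of(func, repeat=5, number=1):
    """Return the fastest per-call wall time in milliseconds over `repeat` runs."""
    timings = timeit.repeat(func, repeat=repeat, number=number)
    return min(timings) / number * 1e3


def workload():
    # Hypothetical stand-in for a model fit.
    return sum(i * i for i in range(10_000))


elapsed_ms = best_of(workload)
```

Taking the minimum rather than the mean reduces the influence of unrelated system activity on the reported time.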

fit() order=1 (sklearn lstsq path in both versions)

 L   genotypes     v1 (ms)   v2 (ms)   speedup
 8         256       12.98      1.81        7x
10       1,024       44.07      2.02       22x
12       4,096      183.13      2.61       70x
14      16,384      807.37      5.08      159x
16      65,536    3,771.14     19.35      195x

fit() full order (v1: dense lstsq, v2: FWHT O(N log N))

 L   genotypes     v1 (ms)   v2 (ms)   speedup
 8         256      195.16      1.75      111x
10       1,024    3,004.81      3.10      969x
12       4,096   59,344.00      8.97   >6,000x
14      16,384     (hours)     35.50
16      65,536     (hours)    154.15

Rust kernel vs NumPy reference (internal; release build, 16 threads; see benches/)

kernel                          input                  Rust      NumPy reference   speedup
build_model_matrix              L=12, order=3          1.7 ms    10.1 ms           ~6x
build_model_matrix              L=16, order=3          50 ms     283 ms            ~5.7x
encode_vectors                  L=16 (65k genotypes)   1.06 ms   3.24 ms           ~3x
EpistasisLinearRegression.fit   full-order L=10        0.78 ms   292 ms (lstsq)    ~375x
EpistasisLinearRegression.fit   full-order L=12        3.4 ms    15.4 s (lstsq)    ~4500x

Pending:

  • Sparse design matrix path for Lasso / ElasticNet (memory at L >= 20)
  • power.py and spline.py nonlinear variants
  • Remaining classifier implementations if demand surfaces
  • ReadTheDocs build
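
The memory concern behind the first pending item can be estimated with simple arithmetic. The sketch below assumes a biallelic library observed over the full 2^L genotype space and a dense matrix with one column per term up to the given order; the helper names are hypothetical and the element sizes are illustrative.

```python
from math import comb


def n_terms(n_sites, order):
    """Number of columns for a biallelic model: sum of C(L, k) for k = 0..order."""
    return sum(comb(n_sites, k) for k in range(order + 1))


def dense_bytes(n_sites, order, bytes_per_entry):
    """Size in bytes of a dense design matrix over the full 2**L genotype space."""
    return (2 ** n_sites) * n_terms(n_sites, order) * bytes_per_entry


cols = n_terms(20, 3)               # 1 + 20 + 190 + 1140 = 1351 columns
as_float64 = dense_bytes(20, 3, 8)  # 8 bytes per entry
as_int8 = dense_bytes(20, 3, 1)     # 1 byte per entry
```

At L = 20 and order 3 this comes to 1,351 columns by 1,048,576 rows: about 11.3 GB as float64 and about 1.4 GB as int8, before any solver workspace is counted.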

Contributing

See CONTRIBUTING.md. Commits follow Conventional Commits; releases and the changelog are automated by python-semantic-release.

License

Unlicense (public domain). See UNLICENSE.
