Precomputed-P supervised-ADMIXTURE projection cache: build slow once, project fast per target.

admixture-cache

Precomputed-P supervised-ADMIXTURE projection cache. Build the slow training pass once per panel × K × clusters_yaml combo; project new targets in ~2 seconds.

Why this exists

Supervised ADMIXTURE training on a real-world panel takes hours to days per restart (K=21 regional cache: ~12-14 hr × 5 restarts; K=4 ancestral_cluster: ~5-7 hr × 5 restarts). For consumer pipelines serving many users, re-running this training per target is wasteful — the P matrix is determined almost entirely by the panel, not the target.

admixture-cache splits the supervised-ADMIXTURE workflow into:

- Panel cache build (operator, slow, one-time per panel update): stock ADMIXTURE × N restarts → cache best-LL P matrix + multimodality SD + manifest.
- Per-target projection (consumer, fast, every run): align target.bed to cached panel variants + axes (plink2), load dosages, solve for Q via scipy SLSQP under the standard binomial admixture likelihood.
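
The build-side bookkeeping can be sketched as follows. This is a minimal illustration, not the library's code: pick_best_restart and its inputs are hypothetical names, and it assumes the "multimodality SD" is the standard deviation, across restarts, of each cluster's mean Q. The real definition may differ.

```python
import numpy as np

def pick_best_restart(log_likelihoods, q_matrices, sd_threshold=0.02):
    """Return (best_index, per_cluster_sd, stable) for a set of restarts.

    log_likelihoods: final log-likelihood of each restart.
    q_matrices: one N x K Q matrix per restart, columns in the same
                cluster order (assumed to hold for supervised runs).
    """
    best = int(np.argmax(log_likelihoods))
    # Mean ancestry per cluster for each restart: shape (restarts, K).
    cluster_means = np.stack([np.asarray(q).mean(axis=0) for q in q_matrices])
    # ASSUMPTION: spread of those means across restarts is the stability metric.
    per_cluster_sd = cluster_means.std(axis=0, ddof=1)
    stable = bool(per_cluster_sd.max() <= sd_threshold)
    return best, per_cluster_sd, stable
```

A large SD for any cluster would suggest the restarts converged to different modes, in which case the best-LL P matrix deserves less trust.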

The projection math matches stock ADMIXTURE Q values to within ~1e-5 absolute on representative workloads (15K × 850K matrix at K=4).
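
For a single target with dosages g_j in {0, 1, 2} and a fixed panel matrix P, the binomial admixture log-likelihood is the sum over variants of g_j·log(f_j) + (2 − g_j)·log(1 − f_j), where f_j = Σ_k q_k·p_jk, maximized over Q on the probability simplex. The sketch below solves that problem with scipy's SLSQP. It is an illustration under stated assumptions, not the package's implementation: project_q is a hypothetical name, missing genotypes are assumed to be NaN, and P is assumed to hold the frequency of the counted allele.

```python
import numpy as np
from scipy.optimize import minimize

def project_q(dosages: np.ndarray, p: np.ndarray) -> np.ndarray:
    """Estimate the K-vector Q for one target given a fixed M x K matrix P."""
    keep = ~np.isnan(dosages)                      # drop missing genotypes
    g = dosages[keep]
    p = np.clip(p[keep], 1e-6, 1 - 1e-6)           # keep logs finite
    k = p.shape[1]

    def neg_log_lik(q):
        f = np.clip(p @ q, 1e-9, 1 - 1e-9)
        return -np.sum(g * np.log(f) + (2 - g) * np.log1p(-f))

    def grad(q):
        f = np.clip(p @ q, 1e-9, 1 - 1e-9)
        return -(p.T @ (g / f - (2 - g) / (1 - f)))

    res = minimize(
        neg_log_lik,
        np.full(k, 1.0 / k),                       # start at the simplex centre
        jac=grad,
        method="SLSQP",
        bounds=[(0.0, 1.0)] * k,
        constraints=[{"type": "eq", "fun": lambda q: q.sum() - 1.0}],
        options={"ftol": 1e-12, "maxiter": 500},
    )
    return res.x
```

Because P is held fixed, the problem has only K unknowns, which is why projection can run in seconds while training takes hours.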

Install

pip install admixture-cache

Python 3.11 through 3.14 are supported. End-to-end paths require ADMIXTURE (for build) and plink2 (for project / verify) on PATH. Pure-library use without those binaries is fine — only the build/projection orchestrators shell out.
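
Before running an end-to-end path, a caller might check that both binaries are reachable. The helper below is a hypothetical convenience, not part of the package; the binary names admixture and plink2 are the ones the default runner looks for on PATH.

```python
import shutil

def missing_binaries(names=("admixture", "plink2")) -> list[str]:
    """Return the subset of `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]

if __name__ == "__main__":
    missing = missing_binaries()
    if missing:
        print("not on PATH:", ", ".join(missing))
```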

Quickstart — library

from pathlib import Path
from admixture_cache import build_panel_cache, project_target

# One-time, slow (~hours per restart per cache)
manifest = build_panel_cache(
    panel_bed=Path("panel.bed"),
    panel_pop_file=Path("panel.pop"),
    clusters_yaml=Path("clusters.yaml"),
    k=21,
    cache_dir=Path("data/regional_k21_cache/"),
    admixture_runner=my_tool_runner,  # see ToolRunner Protocol below
    track="regional",
    panel_id="aadr_v66_ho",
    panel_version="v66.0",
    admixture_version="1.4.0",
    seeds=[1, 2, 3, 4, 5],
    sd_threshold=0.02,
)

# Per-target, fast (~2 seconds end-to-end)
result = project_target(
    target_bed=Path("target.bed"),
    cache_dir=Path("data/regional_k21_cache/"),
    plink2_runner=my_plink2_runner,
    work_dir=Path("scratch/projection/"),
)
print(result.target_q)               # K-vector
print(result.cluster_order)          # K names
print(result.panel_stability_max_sd) # cached panel restart_sd
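
A small follow-up sketch for reading the result: it pairs cluster names with their Q values and lists the largest components. top_clusters is a hypothetical helper, and it assumes target_q and cluster_order are equal-length sequences in the same column order. A stand-in object is used here so the snippet runs without a cache.

```python
from types import SimpleNamespace

def top_clusters(result, n=3):
    """Pair cluster names with Q values, largest first, and keep the top n."""
    pairs = sorted(
        zip(result.cluster_order, result.target_q),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return pairs[:n]

# Stand-in for a real project_target(...) result (illustrative values only).
demo = SimpleNamespace(cluster_order=["A", "B", "C"], target_q=[0.2, 0.7, 0.1])
for name, q in top_clusters(demo):
    print(f"{name}\t{q:.3f}")
```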

Quickstart — CLI

Installing the package registers the admixture-cache console script with four subcommands:

# 1. Build a panel cache (slow, one-time).
admixture-cache build \
    --panel-bed panel.bed \
    --panel-pop panel.pop \
    --clusters-yaml clusters.yaml \
    --k 21 \
    --cache-dir data/regional_k21_cache/ \
    --track regional \
    --panel-id aadr_v66_ho \
    --panel-version v66.0 \
    --seeds 1,2,3,4,5

# 2. Project a target against an existing cache (fast).
admixture-cache project \
    --target-bed target.bed \
    --cache-dir data/regional_k21_cache/ \
    --work-dir scratch/projection/

# 3. Check whether a cache matches the current panel/YAML/K config.
admixture-cache verify \
    --panel-bed panel.bed \
    --clusters-yaml clusters.yaml \
    --k 21 \
    --cache-dir data/regional_k21_cache/

# 4. Fetch a canonical published cache from GitHub Releases.
admixture-cache download --list                            # enumerate
admixture-cache download regional_k21_aadr_v66_ho          # install
admixture-cache download regional_k21_aadr_v66_ho \
    --cache-root ~/.admixture-cache/caches \
    --cache-version v2 \
    --force                                                # pin + overwrite

Caches install at <cache-root>/<name>/ (default: ~/.admixture-cache/caches/, or $ADMIXTURE_CACHE_ROOT if set). The downloader streams the tarball, verifies its SHA-256, validates the extracted manifest, and atomically renames into place — partial downloads never leave a half-installed cache.
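
The verify-then-rename pattern described above can be sketched on a single file. This is not the downloader's code: sha256_of and install_verified are hypothetical names, and the real tool works on an extracted cache directory and also validates the manifest.

```python
import hashlib
import os
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large tarballs never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def install_verified(staged: Path, final: Path, expected_sha256: str) -> None:
    """Move `staged` to `final` only if its SHA-256 matches the expected digest."""
    actual = sha256_of(staged)
    if actual != expected_sha256.lower():
        staged.unlink()  # discard the bad download; `final` is never touched
        raise ValueError(f"SHA-256 mismatch: expected {expected_sha256}, got {actual}")
    # os.replace is atomic when both paths live on the same filesystem.
    os.replace(staged, final)
```

Staging next to the destination and renaming last is what keeps a failed or interrupted download from leaving a partially installed cache.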

Publishing your own canonical caches: see docs/PUBLISH_CACHE.md for the tag convention + tarball format the discovery code expects.
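
The Status section gives the release tag convention as cache-<name>-<version>. A minimal parser for that shape might look like the sketch below. parse_cache_tag is a hypothetical helper, the example tag is made up for illustration, and the split assumes the version part contains no hyphen; docs/PUBLISH_CACHE.md remains the authority on the exact format.

```python
def parse_cache_tag(tag: str) -> tuple[str, str]:
    """Split a 'cache-<name>-<version>' tag into (name, version)."""
    prefix = "cache-"
    if not tag.startswith(prefix):
        raise ValueError(f"not a cache tag: {tag!r}")
    # ASSUMPTION: the version is everything after the last hyphen.
    name, sep, version = tag[len(prefix):].rpartition("-")
    if not sep or not name or not version:
        raise ValueError(f"malformed cache tag: {tag!r}")
    return name, version

print(parse_cache_tag("cache-regional_k21_aadr_v66_ho-v2"))
```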

The default SubprocessToolRunner runs the local admixture / plink2 binaries on PATH; override with --admixture-binary / --plink2-binary to point at a specific build.

build, project, and verify all surface a non-zero exit code on failure with a descriptive error: … line on stderr. project --json emits machine-readable JSON instead of human-readable text.
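
A wrapper script can lean on the exit-code contract above. The sketch below builds the project command from the documented flags and raises on a non-zero exit. The function names are hypothetical, and it assumes the --json payload is written to stdout; the JSON schema is not described here, so the result is returned unparsed beyond json.loads.

```python
import json
import subprocess

def project_command(target_bed, cache_dir, work_dir) -> list[str]:
    """Build the argv for `admixture-cache project --json`."""
    return [
        "admixture-cache", "project",
        "--target-bed", str(target_bed),
        "--cache-dir", str(cache_dir),
        "--work-dir", str(work_dir),
        "--json",
    ]

def run_project(target_bed, cache_dir, work_dir):
    """Run the projection and return the decoded JSON, or raise on failure."""
    proc = subprocess.run(
        project_command(target_bed, cache_dir, work_dir),
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        # Surface the CLI's stderr message to the caller.
        raise RuntimeError(proc.stderr.strip() or f"exit code {proc.returncode}")
    return json.loads(proc.stdout)  # ASSUMPTION: JSON goes to stdout
```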

ToolRunner Protocol

When calling the library from Python (rather than via the CLI), pass any object satisfying the ToolRunner Protocol:

from collections.abc import Callable
from pathlib import Path

class MyToolRunner:
    def run(
        self,
        *,
        args: list[str],
        cwd: Path,
        log_dir: Path,
        timeout_seconds: int = 600,
        # The two kwargs below may be omitted for serial builds but are
        # required for parallel `build_panel_cache` (max_parallel_restarts > 1):
        log_name: str | None = None,
        pid_callback: Callable[[int], None] | None = None,
    ) -> object:
        ...

- log_name: admixture-cache passes the per-restart canonical log filename (e.g. restart_3.out). Honor it when set; fall back to your own naming scheme when None. Required for parallel mode, where concurrent restarts share log_dir and need distinct filenames.
- pid_callback: call it with the subprocess PID immediately after spawning. admixture-cache uses this to SIGTERM in-flight restarts on first-failure cancellation. Required for parallel mode.
- Spawn subprocesses with start_new_session=True so each child gets its own process group. The cancellation path signals the pgid (via os.killpg) rather than the bare PID, which avoids the classic UNIX PID-recycle race when a subprocess exits between PID capture and the cancellation pass.
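
The process-group mechanics can be illustrated in a few lines. This is a POSIX-only sketch with hypothetical helper names, not the library's cancellation code. It relies on the fact that a child started with start_new_session=True becomes a session and group leader, so its process-group ID equals its PID.

```python
import os
import signal
import subprocess
import sys

def spawn_in_own_group(args: list[str]) -> subprocess.Popen:
    """Start a child as leader of a new session and process group."""
    return subprocess.Popen(args, start_new_session=True)

def cancel_group(pgid: int) -> None:
    """Send SIGTERM to every process in the group; ignore a group that is gone."""
    try:
        os.killpg(pgid, signal.SIGTERM)
    except ProcessLookupError:
        pass  # the group already exited

if __name__ == "__main__":
    child = spawn_in_own_group([sys.executable, "-c", "import time; time.sleep(30)"])
    cancel_group(child.pid)  # pgid == pid for a session leader
    child.wait(timeout=10)
    print("child exit status:", child.returncode)
```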

Adapters that forward via **kwargs (e.g. def run(self, **kwargs): return self._inner.run(**kwargs)) are recognized as supporting both extensions — but the inner runner MUST actually honor them. A **kwargs forwarder that silently strips unknown kwargs will pass the parallel-mode guard but produce incoherent logs and broken cancellation.

For non-parallel use (max_parallel_restarts=1), both extensions are optional — only the four baseline kwargs are required.

Cache directory layout

After build_panel_cache succeeds, cache_dir contains:

cache_dir/
├── panel.K.P              # Best-LL restart's allele freqs (M × K)
├── panel.K.Q              # Best-LL restart's non-target Q (N × K)
├── panel.bim              # Variant set + REF/ALT axes (alignment ref)
├── restart_sd.json        # Per-cluster SD across restarts
├── cluster_order.json     # K column → cluster name mapping
├── manifest.json          # Panel SHA + YAML SHA + K + version pins
└── build_logs/            # ADMIXTURE stdout/stderr per restart
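
For inspection outside the library, the cached matrix can be read with NumPy, since ADMIXTURE writes .P files as whitespace-delimited text. The sketch below is illustrative only: load_cached_p is a hypothetical helper, the caller supplies the actual file paths, and it assumes cluster_order.json is a JSON array of names in column order.

```python
import json
from pathlib import Path

import numpy as np

def load_cached_p(p_path: Path, cluster_order_path: Path):
    """Load the M x K allele-frequency matrix and its K column names."""
    p = np.loadtxt(p_path, ndmin=2)  # rows = variants, columns = clusters
    # ASSUMPTION: the JSON file is a plain list of cluster names.
    names = json.loads(cluster_order_path.read_text())
    if p.shape[1] != len(names):
        raise ValueError(
            f"P has {p.shape[1]} columns but {len(names)} cluster names were found"
        )
    return p, names
```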

Cache validity is determined by manifest.json SHAs matching the current config (panel.bim, clusters_yaml, K, optional geo-filter YAMLs). Any mismatch → consumer code can fall back to a full ADMIXTURE training pass or rebuild the cache.
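
A validity check along those lines could be sketched as below. The manifest key names (panel_bim_sha256, clusters_yaml_sha256, k) are hypothetical placeholders chosen for illustration; the real manifest schema may use different names, and the admixture-cache verify subcommand is the supported way to run this check.

```python
import hashlib
import json
from pathlib import Path

def file_sha256(path: Path) -> str:
    """SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def cache_matches(manifest_path: Path, panel_bim: Path,
                  clusters_yaml: Path, k: int) -> bool:
    """True only when every pinned value agrees with the current config."""
    manifest = json.loads(manifest_path.read_text())
    # ASSUMPTION: hypothetical key names, see the note above.
    return (
        manifest.get("panel_bim_sha256") == file_sha256(panel_bim)
        and manifest.get("clusters_yaml_sha256") == file_sha256(clusters_yaml)
        and manifest.get("k") == k
    )
```

On a False result, the caller would rebuild the cache or fall back to a full training pass, as described above.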

When to use this

- Multi-user services: cache once, project for every user (~5,000× per-target speedup at scale)
- Reproducibility: canonical caches published via GitHub Releases (installable with admixture-cache download since v1.3) give byte-identical P across consumers
- CI/CD: faster integration tests once you have a cache

When NOT to use this

- One-time analyses with a custom panel that won't be reused: full ADMIXTURE is simpler
- Novel methodologies requiring per-target P refinement: the projection assumes P is fully determined by the panel

Status

- v1.4: Current. Drops the consumer-specific track enum constraint; the track and continent manifest fields are now free-text provenance labels.
- v1.3: Adds the admixture-cache download command + download_cache / list_available_caches / CacheRelease Python API. Canonical caches are published as GitHub Releases following the cache-<name>-<version> tag convention (see docs/PUBLISH_CACHE.md).
- v1.2: End-to-end integration suite against real ADMIXTURE 1.4 + plink2.
- v1.1: NUMA pinning, PGEN target format support, Hypothesis property tests.
- v1.0: First PyPI release. Cache directory layout stable at schema v1; numerical parity validated against stock ADMIXTURE.

See CHANGELOG.md for the full per-release detail.

Contributing

See CONTRIBUTING.md for dev setup, the three local validation gates (pytest / ruff / mypy), commit conventions, and the tag → OIDC PyPI release procedure. See DEVELOPMENT.md for the architecture map, design rationale, and module-level walkthroughs.

Acknowledgments

This library was extracted from ancestry-pipeline's in-pipeline supervised-ADMIXTURE projection module (pop_automation/admixture_projection.py, ~744 LOC, validated against real-world workloads). The split lets sibling projects depend on the cache layer without pulling in the larger orchestrator.

License

MIT. See LICENSE.

Project details

Maintainers

carstenerickson

Release history

1.4.2

May 29, 2026

This version

1.4.1

May 29, 2026

1.4.0

May 27, 2026

1.3.0

May 27, 2026

1.2.0

May 26, 2026

1.1.1

May 26, 2026

1.1.0

May 26, 2026

1.0.0

May 26, 2026

Download files

Source Distribution

admixture_cache-1.4.1.tar.gz (55.7 kB)

Uploaded May 29, 2026 Source

Built Distribution

admixture_cache-1.4.1-py3-none-any.whl (58.2 kB)

Uploaded May 29, 2026 Python 3

File details

Details for the file admixture_cache-1.4.1.tar.gz.

File metadata

Download URL: admixture_cache-1.4.1.tar.gz
Upload date: May 29, 2026
Size: 55.7 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for admixture_cache-1.4.1.tar.gz
Algorithm	Hash digest
SHA256	`ec2a10698619daacc2db8dc05acb39586ecae055e995ba854f4d7f220e14c09b`
MD5	`1ad165e3616e0872436438c7108e3500`
BLAKE2b-256	`9c20315bc024b3c1b8433dde95acd7c43bd937383233bc8c58c3299a222bde8d`

Provenance

The following attestation bundles were made for admixture_cache-1.4.1.tar.gz:

Publisher: release.yml on carstenerickson/admixture-cache

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: admixture_cache-1.4.1.tar.gz
- Subject digest: ec2a10698619daacc2db8dc05acb39586ecae055e995ba854f4d7f220e14c09b
- Sigstore transparency entry: 1672178751
- Sigstore integration time: May 29, 2026
Source repository:
- Permalink: carstenerickson/admixture-cache@6780d66012c829628ee6e0ab08aaca71e33dfaae
- Branch / Tag: refs/tags/v1.4.1
- Owner: https://github.com/carstenerickson
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@6780d66012c829628ee6e0ab08aaca71e33dfaae
- Trigger Event: push

File details

Details for the file admixture_cache-1.4.1-py3-none-any.whl.

File metadata

Download URL: admixture_cache-1.4.1-py3-none-any.whl
Upload date: May 29, 2026
Size: 58.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for admixture_cache-1.4.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ec5e92abeb48b5d186cdaaf3c96e2315cb914967a9d54389d1877c23ad753974`
MD5	`d106f9ef7a6a268e72f04b59a34562da`
BLAKE2b-256	`cc2bd9e98461402043853aaa92e5d6d7934cb3e4a0f9c646e0a7502ffa7e7e4c`

Provenance

The following attestation bundles were made for admixture_cache-1.4.1-py3-none-any.whl:

Publisher: release.yml on carstenerickson/admixture-cache

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: admixture_cache-1.4.1-py3-none-any.whl
- Subject digest: ec5e92abeb48b5d186cdaaf3c96e2315cb914967a9d54389d1877c23ad753974
- Sigstore transparency entry: 1672178761
- Sigstore integration time: May 29, 2026
Source repository:
- Permalink: carstenerickson/admixture-cache@6780d66012c829628ee6e0ab08aaca71e33dfaae
- Branch / Tag: refs/tags/v1.4.1
- Owner: https://github.com/carstenerickson
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@6780d66012c829628ee6e0ab08aaca71e33dfaae
- Trigger Event: push
