Biological extensions for manylatents: popgen and central dogma encoders

  A T G . . C A T        .  . .
  . G C A T . G .   -->  . .. .  -->  λ(·)
   T . A G C . A T        .  . .
     A T G . C A

     m a n y l a t e n t s - o m i c s

        from sequence to manifold


Population genetics, single-cell, and foundation model encoders for manylatents. Extends the core dimensionality reduction (DR) framework with biological data types and domain-specific metrics.

Install

uv add manylatents-omics

Optional extras:

uv add "manylatents-omics[popgen]"      # population genetics
uv add "manylatents-omics[singlecell]"  # single-cell (scanpy, anndata)
uv add "manylatents-omics[dogma]"       # protein + RNA encoders (ESM3, Orthrus)

The DNA encoder (Evo2) requires a separate venv because of torch version conflicts. See scripts/setup-dna-venv.sh.

Or from the core manylatents repo:

uv sync --extra omics   # installs manylatents-omics as a namespace extension

Development install

git clone https://github.com/latent-reasoning-works/manylatents-omics.git
cd manylatents-omics && uv sync

Architecture

manylatents-omics is a namespace extension of manylatents. It lives alongside the core repo and adds domain-specific modules under the manylatents.* namespace via pkgutil.extend_path().
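
How a pkgutil.extend_path() namespace lets two separately installed distributions share one import root can be sketched with a throwaway package. Everything named here (demo_ns, core_dist, omics_dist, the engine and popgen modules) is hypothetical and built in a temporary directory. It illustrates the mechanism only and is not the actual layout of manylatents or this package.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Each distribution ships the same __init__.py, which asks pkgutil to merge
# every "demo_ns" directory found on sys.path into one package __path__.
INIT = "from pkgutil import extend_path\n__path__ = extend_path(__path__, __name__)\n"


def build_demo(root: Path) -> list[str]:
    """Create two fake distributions that both contribute to demo_ns."""
    core = root / "core_dist" / "demo_ns"    # stands in for the core package
    ext = root / "omics_dist" / "demo_ns"    # stands in for an extension
    for pkg_dir, module, value in [(core, "engine", "core"), (ext, "popgen", "omics")]:
        pkg_dir.mkdir(parents=True)
        (pkg_dir / "__init__.py").write_text(INIT)
        (pkg_dir / f"{module}.py").write_text(f"ORIGIN = {value!r}\n")
    return [str(core.parent), str(ext.parent)]


def load_demo() -> tuple[str, str]:
    """Import one module from each distribution through the shared namespace."""
    with tempfile.TemporaryDirectory() as tmp:
        entries = build_demo(Path(tmp))
        sys.path[:0] = entries
        try:
            importlib.invalidate_caches()
            engine = importlib.import_module("demo_ns.engine")
            popgen = importlib.import_module("demo_ns.popgen")
            return engine.ORIGIN, popgen.ORIGIN
        finally:
            # Undo the sys.path and sys.modules changes made for the demo.
            for entry in entries:
                sys.path.remove(entry)
            for name in [n for n in sys.modules if n == "demo_ns" or n.startswith("demo_ns.")]:
                del sys.modules[name]


if __name__ == "__main__":
    print(load_demo())
```

Both modules resolve under the same top-level name even though they live in different directories, which is the property that lets an extension add modules without touching the core package's files.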

lrw/
├── manylatents/    # core DR engine
├── omics/          # this repo — popgen, singlecell, dogma encoders
└── shop/           # cluster infrastructure

Design decision: The core engine stays domain-agnostic. Each "flavor pack" (omics, vision, etc.) is a separate repo/package that extends the manylatents namespace without polluting the core with domain-specific dependencies. Experiment configs (ClinVar pipelines, fusion sweeps, cluster resource presets) belong in downstream experiment repos, not here — this package ships only instantiation configs that define what encoders, datasets, and algorithms are.

Quick start

Omics configs are auto-discovered when the package is installed:

python -m manylatents.main --config-name=config \
  experiment=single_algorithm data=pbmc_3k

# Sweep on cluster
python -m manylatents.main -m \
  cluster=tamia resources=gpu \
  data=hgdp,pbmc_10k algorithms/latent=umap,phate
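
To estimate how many jobs the sweep above would launch, the following sketch assumes that -m behaves like a Hydra-style multirun, where comma-separated override values expand into a full grid. That behavior is an assumption about the CLI, and expand_sweep is a hypothetical helper, not part of manylatents.

```python
from itertools import product


def expand_sweep(overrides: dict[str, list[str]]) -> list[list[str]]:
    """Expand comma-style sweep values into one override list per job."""
    keys = list(overrides)
    return [
        [f"{key}={value}" for key, value in zip(keys, combo)]
        for combo in product(*(overrides[key] for key in keys))
    ]


# The two swept overrides from the command above.
sweep = {
    "data": ["hgdp", "pbmc_10k"],
    "algorithms/latent": ["umap", "phate"],
}

jobs = expand_sweep(sweep)
for job in jobs:
    print(" ".join(job))
```

Under that assumption, two datasets times two algorithms gives four jobs, each of which would also carry the fixed cluster=tamia and resources=gpu overrides.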

Modules

popgen — Population genetics via the manifold-genetics CSV pipeline. HGDP+1KGP, UK Biobank, All of Us. Admixture proportions, geographic metadata, QC/relatedness filtering. Requires preprocessing via manifold-genetics (a separate tool, not a Python dependency). Configs: popgen/configs/

singlecell — AnnData .h5ad loader for scRNA-seq, scATAC-seq, CITE-seq. Ships with PBMC 3k/10k/68k and Embryoid Body. Any .h5ad works via AnnDataset. Configs: singlecell/configs/

dogma — Foundation model encoders for DNA, RNA, and protein sequences. Supports single-modality encoding, multi-layer extraction, and cross-modal fusion. All encoders inherit from FoundationEncoder — lazy model loading, batched encoding with OOM retry, standard fit()/transform() interface. Configs: dogma/configs/

  • ESM3 — Protein, 1536-dim, masked mean-pool, true batched forward
  • Evo2 — DNA, 1920/4096/8192-dim (1B/7B/40B), multi-layer extraction, 1M bp context
  • Orthrus — RNA, 256/512-dim (4-track/6-track), Mamba SSM re-implementation for mamba-ssm 2.x
  • AlphaGenome — DNA, 1536/3072-dim (1bp/128bp), JAX-based, regulatory track predictions, chunked encoding
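
The encoder contract described above (lazy model loading, batched encoding with OOM retry, fit()/transform()) can be sketched without any model weights. LazyBatchedEncoder and its toy model are hypothetical stand-ins written for this illustration. They are not the package's FoundationEncoder, whose real signatures may differ. The out-of-memory condition is simulated with MemoryError, where a real encoder would catch its framework's own OOM error.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]


class LazyBatchedEncoder:
    """Hypothetical encoder: loads its model on first use, retries on OOM."""

    def __init__(self, batch_size: int = 8) -> None:
        self.batch_size = batch_size
        self._model: Optional[Callable[[Sequence[str]], List[Vector]]] = None

    def _load_model(self) -> Callable[[Sequence[str]], List[Vector]]:
        # Toy "model": embeds a sequence as [length, GC fraction] and
        # simulates running out of memory for batches larger than 2.
        def model(batch: Sequence[str]) -> List[Vector]:
            if len(batch) > 2:
                raise MemoryError("simulated out-of-memory")
            return [
                [float(len(s)), sum(c in "GC" for c in s) / max(len(s), 1)]
                for s in batch
            ]

        return model

    @property
    def model(self) -> Callable[[Sequence[str]], List[Vector]]:
        # Lazy loading: nothing is built until the first encode call.
        if self._model is None:
            self._model = self._load_model()
        return self._model

    def fit(self, sequences: Sequence[str]) -> "LazyBatchedEncoder":
        # A pretrained encoder has nothing to fit; return self for chaining.
        return self

    def transform(self, sequences: Sequence[str]) -> List[Vector]:
        out: List[Vector] = []
        size = self.batch_size
        start = 0
        while start < len(sequences):
            batch = sequences[start:start + size]
            try:
                out.extend(self.model(batch))
                start += len(batch)
            except MemoryError:
                if size == 1:
                    raise  # a single item does not fit; give up
                size = max(1, size // 2)  # halve the batch and retry
        return out


encoder = LazyBatchedEncoder(batch_size=8)
embeddings = encoder.fit([]).transform(["ATGC", "GGCC", "AT", "A", "GC"])
```

Halving the batch size on failure keeps the caller's interface unchanged while adapting to whatever memory is available. The sketch keeps the reduced size for the remaining batches instead of growing it back.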

ClinVar pipeline

Reference pipeline for variant-effect analysis via geometric methods. Encodes DNA and protein sequences flanking ClinVar variants, then applies dimensionality reduction to study how pathogenic vs. benign variants separate in embedding space. Three stages: DNA encoding, protein encoding, and geometric analysis (fusion + DR). See docs/clinvar_pipeline.md for full details. Experiment configs live in downstream repos (e.g. merging_dogma), not in this package.
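
As a rough picture of the fusion step, the sketch below joins per-variant DNA and protein embeddings by L2-normalizing each modality and concatenating them. This is one common fusion choice, assumed here for illustration. The pipeline's actual fusion method is documented in docs/clinvar_pipeline.md, and fuse and l2_normalize are hypothetical helpers.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(vec: Sequence[float]) -> Vector:
    """Scale a vector to unit length; all-zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(dna: Sequence[Sequence[float]], protein: Sequence[Sequence[float]]) -> List[Vector]:
    """Concatenate normalized DNA and protein embeddings, one row per variant."""
    if len(dna) != len(protein):
        raise ValueError("both modalities must cover the same variants")
    return [l2_normalize(d) + l2_normalize(p) for d, p in zip(dna, protein)]


# Two toy variants with 2-dim "DNA" and 3-dim "protein" embeddings.
fused = fuse(
    [[3.0, 4.0], [1.0, 0.0]],
    [[0.0, 2.0, 0.0], [0.0, 0.0, 5.0]],
)
```

Normalizing before concatenation keeps a high-dimensional or large-magnitude modality from dominating distances in the fused space. The fused rows would then go to the dimensionality reduction stage.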

Development

uv sync
pytest tests/ -v

Citing

If manylatents-omics was useful in your research, a citation goes a long way:

@software{manylatents_omics2026,
  title     = {manyLatents-Omics: Biological Extensions for Unified Dimensionality Reduction},
  author    = {Scicluna, Matthew and Valdez C{\'o}rdova, C{\'e}sar Miguel},
  year      = {2026},
  url       = {https://github.com/latent-reasoning-works/manylatents-omics},
  license   = {MIT}
}

MIT License · Latent Reasoning Works
