Biological extensions for manylatents: popgen and central dogma encoders

Project description

  A T G . . C A T        .  . .
  . G C A T . G .   -->  . .. .  -->  λ(·)
   T . A G C . A T        .  . .
     A T G . C A

     m a n y l a t e n t s - o m i c s

        from sequence to manifold


Population genetics, single-cell, and foundation model encoders for manylatents. Extends the core dimensionality reduction (DR) framework with biological data types and domain-specific metrics.

Install

uv add manylatents-omics

Optional extras:

uv add "manylatents-omics[popgen]"      # population genetics
uv add "manylatents-omics[singlecell]"  # single-cell (scanpy, anndata)
uv add "manylatents-omics[dogma]"       # protein + RNA encoders (ESM3, Orthrus)

DNA encoder (Evo2) requires a separate venv due to torch version conflicts. See scripts/setup-dna-venv.sh.

Or from the core manylatents repo:

uv sync --extra omics   # installs manylatents-omics as a namespace extension

Development install:
git clone https://github.com/latent-reasoning-works/manylatents-omics.git
cd manylatents-omics && uv sync

Architecture

manylatents-omics is a namespace extension of manylatents. It lives alongside the core repo and adds domain-specific modules under the manylatents.* namespace via pkgutil.extend_path().

lrw/
├── manylatents/    # core DR engine
├── omics/          # this repo — popgen, singlecell, dogma encoders
└── shop/           # cluster infrastructure
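To make the pkgutil.extend_path() mechanism concrete, here is a minimal, self-contained sketch that is not taken from manylatents itself. It writes two throwaway directories, each holding one portion of the same package, and imports a submodule from each under a single package name. The package name demo_manylatents and the modules engine and popgen are hypothetical; the sketch assumes each portion's __init__.py calls extend_path(), which is the standard pattern for pkgutil-style namespace packages.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package name, used instead of "manylatents" so the demo
# cannot collide with a real installation.
PKG = "demo_manylatents"

# Each portion ships the same __init__.py: extend_path() scans sys.path for
# other directories named PKG and appends them to the package's __path__.
INIT = "from pkgutil import extend_path\n__path__ = extend_path(__path__, __name__)\n"


def make_portion(root: str, module: str, body: str) -> None:
    """Write root/PKG/__init__.py plus a single submodule root/PKG/<module>.py."""
    pkg_dir = Path(root) / PKG
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text(INIT)
    (pkg_dir / f"{module}.py").write_text(body)


with tempfile.TemporaryDirectory() as core_root, tempfile.TemporaryDirectory() as omics_root:
    # Two separate install locations, standing in for the core and omics distributions.
    make_portion(core_root, "engine", "NAME = 'core engine'\n")
    make_portion(omics_root, "popgen", "NAME = 'omics popgen'\n")

    sys.path[:0] = [core_root, omics_root]
    importlib.invalidate_caches()
    try:
        # Both submodules resolve under one package name despite living in different directories.
        engine = importlib.import_module(f"{PKG}.engine")
        popgen = importlib.import_module(f"{PKG}.popgen")
    finally:
        sys.path.remove(core_root)
        sys.path.remove(omics_root)

print(engine.NAME, "|", popgen.NAME)  # core engine | omics popgen
```

In this pkgutil style, __path__ is a plain list computed when the package is first imported, so a portion that appears on sys.path afterwards is not picked up without a reload.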

Design decision: The core engine stays domain-agnostic. Each "flavor pack" (omics, vision, etc.) is a separate repo/package that extends the manylatents namespace without polluting the core with domain-specific dependencies. Experiment configs (ClinVar pipelines, fusion sweeps, cluster resource presets) belong in downstream experiment repos, not here — this package ships only instantiation configs that define what encoders, datasets, and algorithms are.

Quick start

Omics configs are auto-discovered when the package is installed:

python -m manylatents.main --config-name=config \
  experiment=single_algorithm data=pbmc_3k

# Sweep on cluster
python -m manylatents.main -m \
  cluster=tamia resources=gpu \
  data=hgdp,pbmc_10k algorithms/latent=umap,phate
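With -m, Hydra's default sweeper runs one job per combination of the comma-separated override values. The sketch below only reproduces that expansion for the sweep command above using itertools.product, to show that it launches four runs; it does not call Hydra, and the order in which Hydra actually schedules the jobs may differ.

```python
from itertools import product

# Overrides from the sweep command above; comma-separated values become sweep axes.
sweep = {
    "data": ["hgdp", "pbmc_10k"],
    "algorithms/latent": ["umap", "phate"],
}
fixed = ["cluster=tamia", "resources=gpu"]  # single-valued overrides shared by every job

# Cartesian product of the sweep axes: one override list per job.
jobs = [
    fixed + [f"{key}={value}" for key, value in zip(sweep, combo)]
    for combo in product(*sweep.values())
]
for job in jobs:
    print(" ".join(job))
```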

Modules

popgen: Population genetics via the manifold-genetics CSV pipeline. Datasets: HGDP+1KGP, UK Biobank, All of Us. Includes admixture proportions, geographic metadata, and QC/relatedness filtering. Requires preprocessing with manifold-genetics (a separate tool, not a Python dependency). Configs: popgen/configs/
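Relatedness filtering generally means dropping samples until no remaining pair is closer than a kinship cutoff. The sketch below is a generic greedy version of that idea, not the manifold-genetics implementation: the function name unrelated_subset, the (sample_a, sample_b, kinship) input format, and the 0.0884 cutoff (the conventional KING kinship threshold for second-degree relatives) are all assumptions for illustration.

```python
from collections import Counter

# Assumed cutoff: KING kinship >= 0.0884 is conventionally "second-degree or closer".
KINSHIP_CUTOFF = 0.0884


def unrelated_subset(samples, kinship_pairs, cutoff=KINSHIP_CUTOFF):
    """Greedily drop samples until no remaining pair reaches `cutoff`.

    kinship_pairs: iterable of (sample_a, sample_b, kinship) tuples.
    At each step the sample involved in the most related pairs is removed.
    """
    related = [(a, b) for a, b, k in kinship_pairs if k >= cutoff]
    removed = set()
    while related:
        counts = Counter(s for pair in related for s in pair)
        # Most-connected sample first; ties broken by name for determinism.
        worst = min(counts, key=lambda s: (-counts[s], s))
        removed.add(worst)
        related = [(a, b) for a, b in related if worst not in (a, b)]
    return [s for s in samples if s not in removed]


pairs = [("s1", "s2", 0.25), ("s2", "s3", 0.12), ("s3", "s4", 0.01)]
print(unrelated_subset(["s1", "s2", "s3", "s4"], pairs))  # ['s1', 's3', 's4']
```

Removing the most-connected sample first tends to keep more samples than dropping one member of each pair in arbitrary order, since one sample related to several others resolves all of those pairs at once.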

singlecell: AnnData .h5ad loader for scRNA-seq, scATAC-seq, and CITE-seq. Ships with PBMC 3k/10k/68k and Embryoid Body. Any other .h5ad file can be loaded via AnnDataset. Configs: singlecell/configs/

dogma: Foundation model encoders for DNA, RNA, and protein sequences. Supports single-modality encoding, multi-layer extraction, and cross-modal fusion. All encoders inherit from FoundationEncoder, which provides lazy model loading, batched encoding with out-of-memory (OOM) retry, and a standard fit()/transform() interface. Configs: dogma/configs/

  • ESM3 — Protein, 1536-dim, masked mean-pool, true batched forward
  • Evo2 — DNA, 1920/4096/8192-dim (1B/7B/40B), multi-layer extraction, 1M bp context
  • Orthrus — RNA, 256/512-dim (4-track/6-track), Mamba SSM re-implementation for mamba-ssm 2.x
  • AlphaGenome — DNA, 1536/3072-dim (1bp/128bp), JAX-based, regulatory track predictions, chunked encoding
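The FoundationEncoder behaviour described above (lazy loading, batched encoding with OOM retry, fit()/transform()) and ESM3-style masked mean pooling can be sketched together. This is a hypothetical stand-in, not the package's class: SketchFoundationEncoder, the character-lookup "model", the batch-halving retry policy, and the use of Python's MemoryError in place of a GPU out-of-memory error are all assumptions.

```python
import numpy as np


class SketchFoundationEncoder:
    """Hypothetical stand-in for the FoundationEncoder pattern; not the package's API."""

    def __init__(self, batch_size: int = 8, min_batch_size: int = 1):
        self.batch_size = batch_size
        self.min_batch_size = min_batch_size
        self._model = None   # nothing is loaded until the first encode call
        self.load_count = 0  # lets us check that loading happens only once

    def _load_model(self):
        # Toy "model": a fixed random 4-dim vector per ASCII character, standing
        # in for per-token hidden states from a real foundation model.
        self.load_count += 1
        table = np.random.default_rng(0).normal(size=(256, 4))
        return lambda tokens: table[tokens]

    @property
    def model(self):
        # Lazy loading: build the model on first access, then reuse it.
        if self._model is None:
            self._model = self._load_model()
        return self._model

    def _encode_batch(self, seqs):
        # Pad every sequence to the longest one and record which positions are real.
        max_len = max(len(s) for s in seqs)
        tokens = np.zeros((len(seqs), max_len), dtype=np.int64)
        mask = np.zeros((len(seqs), max_len), dtype=bool)
        for i, s in enumerate(seqs):
            tokens[i, : len(s)] = [ord(c) for c in s]
            mask[i, : len(s)] = True
        hidden = self.model(tokens)  # shape (batch, length, dim)
        # Masked mean-pool: sum over real tokens only, divide by the real length.
        summed = (hidden * mask[..., None]).sum(axis=1)
        return summed / mask.sum(axis=1, keepdims=True)

    def fit(self, seqs, y=None):
        return self  # pretrained encoder: nothing to learn

    def transform(self, seqs):
        out, start, bs = [], 0, self.batch_size
        while start < len(seqs):
            try:
                out.append(self._encode_batch(seqs[start : start + bs]))
                start += bs
            except MemoryError:
                # OOM retry: halve the batch size and re-encode the same slice.
                if bs <= self.min_batch_size:
                    raise
                bs = max(self.min_batch_size, bs // 2)
        return np.concatenate(out, axis=0)


encoder = SketchFoundationEncoder(batch_size=2)
seqs = ["ACGT", "AC", "G"]
embeddings = encoder.fit(seqs).transform(seqs)
print(embeddings.shape)  # (3, 4)
```

The mask matters because sequences in a batch are padded to a common length; averaging over padding positions would make a sequence's embedding depend on which other sequences it happened to be batched with.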

ClinVar pipeline

Reference pipeline for variant-effect analysis via geometric methods. Encodes DNA and protein sequences flanking ClinVar variants, then applies dimensionality reduction to study how pathogenic vs. benign variants separate in embedding space. Three stages: DNA encoding, protein encoding, and geometric analysis (fusion + DR). See docs/clinvar_pipeline.md for full details. Experiment configs live in downstream repos (e.g. merging_dogma), not in this package.
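The geometric-analysis stage (fusion + DR) can be illustrated on toy data. This sketch assumes early fusion, concatenating per-variant DNA and protein embeddings after standardizing each feature, and uses PCA via numpy.linalg.svd as the DR step; the package's actual fusion strategy and DR algorithms may differ, and the labelled embeddings below are random numbers, not ClinVar results.

```python
import numpy as np


def zscore(x: np.ndarray) -> np.ndarray:
    """Standardize each feature so neither modality dominates by scale."""
    mu = x.mean(axis=0)
    sd = x.std(axis=0)
    sd[sd == 0] = 1.0  # avoid division by zero for constant features
    return (x - mu) / sd


def fuse(dna_emb: np.ndarray, protein_emb: np.ndarray) -> np.ndarray:
    """Early fusion: concatenate standardized DNA and protein embeddings per variant."""
    assert dna_emb.shape[0] == protein_emb.shape[0], "need one row per variant in both"
    return np.concatenate([zscore(dna_emb), zscore(protein_emb)], axis=1)


def pca(x: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project centered data onto its top principal axes via SVD."""
    xc = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    return xc @ vt[:n_components].T


rng = np.random.default_rng(0)
n = 100
labels = rng.integers(0, 2, size=n)                       # toy labels: 1 = "pathogenic", 0 = "benign"
dna = rng.normal(size=(n, 16)) + labels[:, None] * 2.0    # toy DNA embeddings
protein = rng.normal(size=(n, 8)) + labels[:, None] * 2.0  # toy protein embeddings

coords = pca(fuse(dna, protein), n_components=2)
# Distance between class means along the first component: a crude separation score.
gap = abs(coords[labels == 1, 0].mean() - coords[labels == 0, 0].mean())
print(coords.shape, round(float(gap), 2))
```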

Development

uv sync
pytest tests/ -v

Citing

If manylatents-omics was useful in your research, a citation goes a long way:

@software{manylatents_omics2026,
  title     = {manyLatents-Omics: Biological Extensions for Unified Dimensionality Reduction},
  author    = {Scicluna, Matthew and Valdez C{\'o}rdova, C{\'e}sar Miguel},
  year      = {2026},
  url       = {https://github.com/latent-reasoning-works/manylatents-omics},
  license   = {MIT}
}

MIT License · Latent Reasoning Works

Download files

Source Distribution

manylatents_omics-0.1.2.tar.gz (120.5 kB)

Uploaded Source

Built Distribution

manylatents_omics-0.1.2-py3-none-any.whl (124.5 kB)

Uploaded Python 3

File details

Details for the file manylatents_omics-0.1.2.tar.gz.

File metadata

  • Download URL: manylatents_omics-0.1.2.tar.gz
  • Upload date:
  • Size: 120.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for manylatents_omics-0.1.2.tar.gz
Algorithm Hash digest
SHA256 5971e60886e1eccdee3c992d345531a278080a48de1cc866c1fed50fd7b31ee6
MD5 6ce97a433b524aec8287820f95d4771f
BLAKE2b-256 a7a971d27e4bd017d5eb3a622e8761b6c1770fbec85865ef18187170464057df

Provenance

The following attestation bundles were made for manylatents_omics-0.1.2.tar.gz:

Publisher: publish.yml on latent-reasoning-works/manylatents-omics

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file manylatents_omics-0.1.2-py3-none-any.whl.

File metadata

File hashes

Hashes for manylatents_omics-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 c5e3ae695b903b7b1b2b0baad18ac4c4e71df761606a3c41ef5ac4f75e207b42
MD5 0f55bb66daff64d07bb11923a1e6983e
BLAKE2b-256 9a552904f30be3f88986998ec832f099d43776a6874d423fc29bc2ab3396fe31

Provenance

The following attestation bundles were made for manylatents_omics-0.1.2-py3-none-any.whl:

Publisher: publish.yml on latent-reasoning-works/manylatents-omics

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
