Biological extensions for manylatents: popgen and central dogma encoders
A T G . . C A T . . .
. G C A T . G . --> . .. . --> λ(·)
T . A G C . A T . . .
A T G . C A
m a n y l a t e n t s - o m i c s
from sequence to manifold
Population genetics, single-cell, and foundation model encoders for manylatents. Extends the core dimensionality reduction (DR) framework with biological data types and domain-specific metrics.
Install
uv add manylatents-omics
Optional extras:
uv add "manylatents-omics[popgen]" # population genetics
uv add "manylatents-omics[singlecell]" # single-cell (scanpy, anndata)
uv add "manylatents-omics[dogma]" # protein + RNA encoders (ESM3, Orthrus)
The DNA encoder (Evo2) requires a separate venv due to torch version conflicts. See scripts/setup-dna-venv.sh.
Or from the core manylatents repo:
uv sync --extra omics # installs manylatents-omics as a namespace extension
Development install
git clone https://github.com/latent-reasoning-works/manylatents-omics.git
cd manylatents-omics && uv sync
Architecture
manylatents-omics is a namespace extension of manylatents. It lives alongside the core repo and adds domain-specific modules under the manylatents.* namespace via pkgutil.extend_path().
lrw/
├── manylatents/ # core DR engine
├── omics/ # this repo — popgen, singlecell, dogma encoders
└── shop/ # cluster infrastructure
Design decision: The core engine stays domain-agnostic. Each "flavor pack" (omics, vision, etc.) is a separate repo/package that extends the manylatents namespace without polluting the core with domain-specific dependencies. Experiment configs (ClinVar pipelines, fusion sweeps, cluster resource presets) belong in downstream experiment repos, not here — this package ships only instantiation configs that define what encoders, datasets, and algorithms are.
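The namespace mechanism described above can be illustrated with a small, self-contained sketch. It does not touch the real manylatents package: the package name `demo_latents`, the two directories, and the module names `engine` and `popgen` are all hypothetical. The only real API used is `pkgutil.extend_path()`, which appends every same-named package directory found on `sys.path` to a package's `__path__`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Each distribution ships the same __init__.py, which merges all
# same-named package directories on sys.path into one __path__.
INIT = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
)


def build_demo(root: Path, pkg: str = "demo_latents") -> list[str]:
    """Create two separate install locations that share one package name."""
    core = root / "core" / pkg      # stands in for the core repo
    omics = root / "omics" / pkg    # stands in for the extension repo
    for directory in (core, omics):
        directory.mkdir(parents=True)
        (directory / "__init__.py").write_text(INIT)
    (core / "engine.py").write_text("NAME = 'core engine'\n")
    (omics / "popgen.py").write_text("NAME = 'popgen extension'\n")
    return [str(root / "core"), str(root / "omics")]


def load_demo(pkg: str = "demo_latents") -> tuple[str, str]:
    """Import one module from each location through the shared namespace."""
    root = Path(tempfile.mkdtemp())
    sys.path[:0] = build_demo(root, pkg)
    importlib.invalidate_caches()
    engine = importlib.import_module(f"{pkg}.engine")
    popgen = importlib.import_module(f"{pkg}.popgen")
    return engine.NAME, popgen.NAME


if __name__ == "__main__":
    print(load_demo())
```

Both modules resolve under the same top-level name even though they live in different directories, which is what lets an extension add modules without the core package depending on it.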
Quick start
Omics configs are auto-discovered when the package is installed:
python -m manylatents.main --config-name=config \
    experiment=single_algorithm data=pbmc_3k

# Sweep on cluster
python -m manylatents.main -m \
    cluster=tamia resources=gpu \
    data=hgdp,pbmc_10k algorithms/latent=umap,phate
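The `-m` flag and comma-separated overrides in the sweep command resemble Hydra's multirun syntax. Assuming that is what `manylatents.main` uses, each comma-separated list is one sweep axis and the launcher runs one job per combination. The sketch below only mimics that expansion with the standard library; `expand_sweep` is a hypothetical helper, not part of manylatents or Hydra.

```python
from itertools import product


def expand_sweep(overrides: dict[str, str]) -> list[dict[str, str]]:
    """Expand comma-separated override values into one dict per job."""
    keys = list(overrides)
    value_lists = [overrides[key].split(",") for key in keys]
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


# Overrides copied from the sweep command above.
jobs = expand_sweep(
    {
        "cluster": "tamia",
        "resources": "gpu",
        "data": "hgdp,pbmc_10k",
        "algorithms/latent": "umap,phate",
    }
)

for job in jobs:
    print(" ".join(f"{key}={value}" for key, value in job.items()))
```

Under this assumption the command launches four jobs: two datasets times two algorithms, each with the same cluster and resource settings.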
Modules
popgen — Population genetics via the manifold-genetics CSV pipeline. HGDP+1KGP, UK Biobank, All of Us. Admixture proportions, geographic metadata, QC/relatedness filtering. Requires preprocessing via manifold-genetics (a separate tool, not a Python dependency). Configs: popgen/configs/
- GeographicPreservation — Spearman correlation between haversine and embedding distances
- AdmixturePreservation — Geodesic distance fidelity in admixture simplex vs. latent space
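As a rough illustration of the GeographicPreservation idea, the sketch below computes a Spearman correlation between pairwise haversine distances and pairwise Euclidean embedding distances, using only the standard library. It is not the package's implementation: the function names, the (latitude, longitude) input format in degrees, the Euclidean embedding metric, and the use of all unordered sample pairs are assumptions made for the example.

```python
import math
from itertools import combinations


def haversine_km(a, b, radius_km=6371.0):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * radius_km * math.asin(math.sqrt(h))


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def geographic_preservation(coords, embedding):
    """Spearman correlation over all sample pairs (Pearson on ranks)."""
    pairs = list(combinations(range(len(coords)), 2))
    geo = [haversine_km(coords[i], coords[j]) for i, j in pairs]
    emb = [math.dist(embedding[i], embedding[j]) for i, j in pairs]
    return pearson(average_ranks(geo), average_ranks(emb))
```

A score near 1 means the embedding orders sample pairs by distance much as geography does; the score is undefined if either distance list is constant.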
singlecell — AnnData .h5ad loader for scRNA-seq, scATAC-seq, CITE-seq. Ships with PBMC 3k/10k/68k and Embryoid Body. Any .h5ad works via AnnDataset. Configs: singlecell/configs/
dogma — Foundation model encoders for DNA, RNA, and protein sequences. Supports single-modality encoding, multi-layer extraction, and cross-modal fusion. All encoders inherit from FoundationEncoder — lazy model loading, batched encoding with OOM retry, standard fit()/transform() interface. Configs: dogma/configs/
- ESM3 — Protein, 1536-dim, masked mean-pool, true batched forward
- Evo2 — DNA, 1920/4096/8192-dim (1B/7B/40B), multi-layer extraction, 1M bp context
- Orthrus — RNA, 256/512-dim (4-track/6-track), Mamba SSM re-implementation for mamba-ssm 2.x
- AlphaGenome — DNA, 1536/3072-dim (1bp/128bp), JAX-based, regulatory track predictions, chunked encoding
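The encoder behaviours listed above (lazy model loading, batched encoding with OOM retry, and a fit()/transform() interface) can be sketched with a toy class. This is a hypothetical stand-in, not the package's FoundationEncoder: the class name, constructor arguments, the simulated memory limit, the use of `MemoryError` in place of a GPU out-of-memory error, and the halve-the-batch retry policy are all assumptions for illustration.

```python
class SketchEncoder:
    """Toy encoder: loads its model on first use and shrinks batches on OOM."""

    def __init__(self, batch_size=8, max_batch_that_fits=2):
        self.batch_size = batch_size
        self._max_fit = max_batch_that_fits  # simulated memory limit
        self._model = None
        self.load_count = 0

    def _load_model(self):
        # Stand-in for an expensive checkpoint load; "embeds" each
        # sequence as a one-dimensional vector holding its length.
        self.load_count += 1
        return lambda seqs: [[float(len(s))] for s in seqs]

    @property
    def model(self):
        # Lazy loading: nothing is built until the first encode call.
        if self._model is None:
            self._model = self._load_model()
        return self._model

    def _encode_batch(self, batch):
        if len(batch) > self._max_fit:
            raise MemoryError("simulated out-of-memory")
        return self.model(batch)

    def fit(self, sequences):
        # Pretrained encoder: nothing to fit, kept for interface symmetry.
        return self

    def transform(self, sequences):
        out, i, bs = [], 0, self.batch_size
        while i < len(sequences):
            batch = sequences[i:i + bs]
            try:
                out.extend(self._encode_batch(batch))
                i += len(batch)
            except MemoryError:
                if bs == 1:
                    raise  # cannot shrink further
                bs = max(1, bs // 2)  # retry the same slice with a smaller batch
        return out


enc = SketchEncoder(batch_size=8, max_batch_that_fits=2)
embeddings = enc.fit([]).transform(["ACGT", "AC", "A", "ACGTACGT", "ACG"])
print(embeddings, enc.load_count)
```

In this toy run the batch size drops from 8 to 4 to 2 before encoding succeeds, and the model is loaded only once despite the retries.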
ClinVar pipeline
Reference pipeline for variant-effect analysis via geometric methods. Encodes DNA and protein sequences flanking ClinVar variants, then applies dimensionality reduction to study how pathogenic vs. benign variants separate in embedding space. Three stages: DNA encoding, protein encoding, and geometric analysis (fusion + DR). See docs/clinvar_pipeline.md for full details. Experiment configs live in downstream repos (e.g. merging_dogma), not in this package.
Development
uv sync
pytest tests/ -v
Citing
If manylatents-omics was useful in your research, a citation goes a long way:
@software{manylatents_omics2026,
title = {manyLatents-Omics: Biological Extensions for Unified Dimensionality Reduction},
author = {Scicluna, Matthew and Valdez C{\'o}rdova, C{\'e}sar Miguel},
year = {2026},
url = {https://github.com/latent-reasoning-works/manylatents-omics},
license = {MIT}
}
MIT License · Latent Reasoning Works