# multigedi

Multi-modal Gene Expression Decomposition for Integration: scverse-compliant
multi-omics integration (CPU, with an optional GPU backend).

Multi-modal joint factor analysis for single-cell data. Python port of the R
multigedi package, with an optional CUDA backend for accelerated training.
**Status: alpha.** The API may change. Bit-identical numerical outputs to the R
reference are validated by `tests/test_cpu_vs_r.py` (relative diff ~1e-12 after
20 iterations).
## What it does

multigedi jointly factorizes several modalities of single-cell data into a
shared latent space (K-dimensional cell embeddings) plus modality-specific
metagenes. The hub-and-spoke MultiGEDI model handles three observation types:

- `M` — count matrix (e.g. gene expression)
- `M_paired` — paired counts (e.g. spliced / unspliced for RNA velocity)
- `X` — binary indicator matrix (e.g. cluster / chromatin state)

Output: a joint PCA in `mdata[<first_modality>].obsm["X_multigedi_pca"]` that
plugs into the standard scanpy workflow (`sc.pp.neighbors`, `sc.tl.umap`,
`sc.tl.leiden`).
## Install

### CPU only

Requires Python ≥ 3.10, a C++14 compiler, Eigen3 ≥ 3.3, and pybind11
(pulled in automatically by the build).

```shell
# On HPC with a module system:
module load eigen/3.4.0   # or set CMAKE_PREFIX_PATH to your Eigen install
pip install -e .
```

The build uses scikit-build-core; no separate `setup.py` step is needed.
### GPU (optional)

The CUDA backend (`libmultigedi_gpu.so`) is opt-in at install time.
It requires CUDA ≥ 11.8 and an NVCC toolchain on the build machine.

Recommended (bundled in the wheel):

```shell
pip install --config-settings=cmake.define.MULTIGEDI_BUILD_GPU=ON .
```

The CMake option `MULTIGEDI_BUILD_GPU=ON` triggers a build of
`libmultigedi_gpu.so` and bundles it inside the wheel at
`<site-packages>/multigedi/_gpu/libmultigedi_gpu.so` with
`INSTALL_RPATH "$ORIGIN"`. The Python wrapper (`_ctypes_api._find_lib`)
locates it automatically; no `MULTIGEDI_GPU_LIB` environment variable is needed.

Legacy (standalone build, `.so` outside the wheel):

```shell
cd src/_multigedi_gpu
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j 4
export MULTIGEDI_GPU_LIB=$(pwd)/build/libmultigedi_gpu.so
```

This path is still supported for HPC sites that ship a custom build
alongside a stock CPU-only wheel: set `MULTIGEDI_GPU_LIB` to override
the wheel-bundled `.so`.
The library auto-detects up to 16 modalities with an arbitrary mix of
`obs_type`s at runtime. Tested on an NVIDIA H100; it should work on any
device with compute capability ≥ 7.0.
### Production deployment

The default build targets a fatbin spanning Volta through Hopper, so a
single `libmultigedi_gpu.so` runs on the common production fleet:

| Compute capability | Generation | Examples |
|---|---|---|
| `sm_70` | Volta | V100 |
| `sm_75` | Turing | T4, RTX 20-series |
| `sm_80` | Ampere | A100 |
| `sm_86` | Ampere | A40, RTX 30-series |
| `sm_90` | Hopper | H100 |
CUDA toolkit ≥ 12.0 is recommended; 11.8 is the floor. The build
matrix is overridable: pass `-DCMAKE_CUDA_ARCHITECTURES="<arches>"`
at configure time to trim the fatbin (e.g. `"90"` for H100-only CI,
or `"native"` to autodetect the local device).
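For example, the standalone build from the Install section could be trimmed to a single architecture roughly like this (a sketch only; it assumes the legacy `src/_multigedi_gpu` CMake build and an H100-only target):

```shell
# Standalone GPU build restricted to Hopper (sm_90).
# Omit -DCMAKE_CUDA_ARCHITECTURES to keep the default Volta-through-Hopper fatbin.
cd src/_multigedi_gpu
cmake -B build -DCMAKE_BUILD_TYPE=Release -DCMAKE_CUDA_ARCHITECTURES="90"
cmake --build build -j 4
```

A single-architecture build compiles noticeably faster and yields a smaller `.so`, at the cost of portability across GPU generations.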
For HPC deployments where the same `.so` is shared across login and
compute nodes, set `MULTIGEDI_GPU_LIB=/path/to/libmultigedi_gpu.so`
in the user's environment instead of relying on `LD_LIBRARY_PATH`;
the Python wrapper checks this env var first (see
`src/multigedi/_gpu/_ctypes_api.py::_find_lib` for the full search
order).
Stay on the default `gpu_low_memory=True` (arena mode) for any
production run with more than two modalities; the legacy
high-memory path is unsupported there. See
`docs/architecture/gpu_backend.md` for the v2 metadata layout, the
MAX_MODALITIES bump procedure, and the full list of known limitations.
## Quickstart

```python
import mudata as md
import multigedi as gd

mdata = md.read_h5mu("your_data.h5mu")  # 'sample' column required in .obs

gd.tl.multigedi(
    mdata,
    modalities={
        "gene": {"obs_type": "M", "orthoZ": True},
        "splicing": {"obs_type": "M_list", "orthoZ": False, "layers": (None, "M2")},
    },
    sample_key="sample",
    K=20,
    max_iterations=30,
    use_gpu=False,  # set True to use libmultigedi_gpu.so
)

# Joint embedding (cells × K) — feed to scanpy
import scanpy as sc
sc.pp.neighbors(mdata["gene"], use_rep="X_multigedi_pca")
sc.tl.umap(mdata["gene"])
```

Full results land in `mdata["gene"].uns["multigedi"]["model"]` (per-modality
Z, D, Bi, sigma2, tracking) and the joint PCA in
`mdata["gene"].obsm["X_multigedi_pca"]`.
## Tutorial

See `notebooks/multigedi_tutorial.ipynb` for a runnable end-to-end walkthrough
on bundled 5K-cell test data: load HDF5 counts → assemble MuData → fit on CPU →
fit on GPU → UMAP visualization.
## Reproducibility & relationship to R multigedi

multigedi's BCD optimizer is mathematically identical to the R reference's:
given the same iter_0 state, both produce bit-identical (≤ 1e-12 relative)
output after any number of iterations. The `tests/test_cpu_vs_r.py` regression
check loads R's exported iter_0 directly into Python via the
`add_modality(init_state=...)` path and verifies this at machine precision.
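A parity check of this kind reduces to a maximum element-wise relative difference between the two backends' outputs. A minimal numpy sketch, illustrative only and not the actual test code:

```python
import numpy as np

def max_relative_diff(a: np.ndarray, b: np.ndarray, eps: float = 1e-300) -> float:
    """Largest element-wise relative difference between two arrays."""
    return float(np.max(np.abs(a - b) / (np.abs(b) + eps)))

rng = np.random.default_rng(0)
z_ref = rng.standard_normal((100, 20))     # stand-in for the R reference output
z_test = z_ref * (1 + 1e-13)               # simulate a near-bit-identical result

assert max_relative_diff(z_test, z_ref) < 1e-12
```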
Cross-backend independent runs do not converge to the same factorization: the
randomized SVD initializer draws its random projection matrix from a
language-specific RNG (R's Mersenne-Twister vs numpy's PCG64), and different
draws land BCD in different deep local minima of the non-convex objective.
After 50 iterations on the bundled 5K-cell dataset:

- Both backends reach virtually identical sigma² (R 0.366 / Py 0.370 for gene)
- Both backends are internally converged (col-correlation 0.86-0.92 between
  iter_20 and iter_50 within each)
- But cross-backend Z subspaces remain ~89° apart, and pairwise cell distances
  in the joint embedding correlate at only ~0.03 between R and Python
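The ~89° figure refers to principal angles between the two Z subspaces, which can be measured with scipy. An illustrative sketch using random stand-ins for the two backends' embeddings (not the project's benchmarking code):

```python
import numpy as np
from scipy.linalg import subspace_angles

rng = np.random.default_rng(1)
z_r = rng.standard_normal((5000, 20))   # stand-in for an R-trained Z
z_py = rng.standard_normal((5000, 20))  # stand-in for a Python-trained Z

# Principal angles (radians, descending) between the two column spaces.
angles_deg = np.degrees(subspace_angles(z_r, z_py))

# Independent random 20-dim subspaces of R^5000 are nearly orthogonal,
# so the largest principal angle is close to 90 degrees.
print(round(float(angles_deg.max()), 1))
```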
Implication for users: pick one tool per dataset. Don't mix R-trained and
Python-trained multigedi outputs in the same downstream analysis. Within a
single backend, the same seed parameter produces identical results. If you
need bit-exact agreement with a specific R-multigedi reference run, export
its iter_0 and load it into Python via
`MultiGEDIModel.add_modality(..., init_state=ref_iter_0)`.
## Tests

```shell
# CPU bit-identicality vs R reference (~1 minute)
python tests/test_cpu_vs_r.py

# GPU end-to-end (needs MULTIGEDI_GPU_LIB and a GPU device)
python tests/test_gpu_e2e.py

# GPU bit-identicality vs R at iter_10 (needs the v2 R fixture).
# Maintainer-only: regenerates the fixture (stored in tests/data/);
# ordinary contributors skip this step.
python scripts/maintainer/migrate_tiny_1iter_to_v2.py
python tests/test_gpu_vs_r.py
```
## Layout

```
multigedi/
├── pyproject.toml          scikit-build-core + pybind11
├── CMakeLists.txt          top-level CPU build
├── src/
│   ├── multigedi/          pure-Python API
│   │   ├── _core/          MultiGEDIModel class
│   │   ├── _gpu/           ctypes wrapper for libmultigedi_gpu
│   │   ├── tools/          tl.multigedi(), tl.pca, tl.umap, ...
│   │   ├── plotting/       pl.*
│   │   └── preprocessing/  pp.*
│   ├── _multigedi_cpp/     CPU C++ extension (pybind11, Eigen, OpenMP)
│   └── _multigedi_gpu/     CUDA library (separate CMake build)
├── notebooks/              end-user tutorials
├── tests/                  pytest-style + R-parity tests
├── scripts/                data prep & one-off utilities
└── benchmarks/             performance harness
```
## Credits
Algorithm and reference R implementation by the
multigedi authors. The CUDA backend
was originally a standalone library (cumultigedi) and is folded in here
under a single coherent name. Python port and packaging maintained in this
repository.
## License

MIT — see LICENSE.