Gaussian Mixture VAE for multi-modal biological module discovery in omics data.
BSVAE: Gaussian Mixture VAE for Module Discovery
BSVAE is a PyTorch package for feature-level module discovery in omics data. The main model, GMMModuleVAE, learns latent structure across features such as genes, transcripts, or proteins, then supports downstream network extraction, module assignment, latent analysis, and simulation-based benchmarking.
What The Package Provides
- bsvae-train for training a two-phase GMM-VAE
- bsvae-sweep-k for selecting the number of modules (K) with held-out validation and optional stability replicates
- bsvae-networks for post-training network extraction, module extraction, latent export, and latent analysis
- bsvae-simulate for synthetic data generation, scenario-grid simulation, and module-recovery benchmarking
Installation
Install from PyPI:
pip install bsvae
Install from source:
git clone https://github.com/heart-gen/BSVAE.git
cd BSVAE
pip install -e .
Optional dependency:
- anndata is only needed for .h5ad input or .h5ad latent export helpers
Input Data Contract
Training and most downstream commands expect an expression matrix in features x samples orientation:
- rows are features such as genes, transcripts, or proteins
- columns are sample IDs
- the first column is the feature index when using CSV/TSV
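The layout above can be checked before training with a few lines of standard-library Python. This is a minimal sketch and not part of BSVAE: the helper name check_features_x_samples and the example gene and sample IDs are hypothetical, and the package's own loaders may apply different or stricter rules.

```python
import csv
import io


def check_features_x_samples(handle, delimiter=","):
    """Check a CSV/TSV stream against the features x samples layout.

    Returns (feature_ids, sample_ids); raises ValueError on a malformed table.
    Hypothetical helper for illustration, not a BSVAE function.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader, None)
    if header is None or len(header) < 2:
        raise ValueError("expected an index column and at least one sample ID")
    sample_ids = header[1:]  # the first header cell labels the feature index
    if len(set(sample_ids)) != len(sample_ids):
        raise ValueError("sample IDs must be unique")

    feature_ids = []
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} cells, found {len(row)}"
            )
        feature_ids.append(row[0])
        for cell in row[1:]:
            float(cell)  # raises ValueError if a value is not numeric
    return feature_ids, sample_ids


# Tiny in-memory example: two features (rows) by three samples (columns).
example = io.StringIO(
    "feature,S1,S2,S3\n"
    "GENE_A,1.0,2.5,0.3\n"
    "GENE_B,0.0,4.1,2.2\n"
)
features, samples = check_features_x_samples(example)
print(features, samples)
```

For a TSV file, pass delimiter="\t"; for a gzipped file, open it with gzip.open(path, "rt", newline="") and pass the resulting handle.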
Supported loaders:
- .csv / .csv.gz
- .tsv / .tsv.gz
- .h5 / .hdf5
- .h5ad with optional anndata
CLI Entry Points
- bsvae-train
- bsvae-sweep-k
- bsvae-networks
- bsvae-simulate
Quickstart
Train a small model:
bsvae-train pilot_run \
--dataset data/expression.csv \
--epochs 50 \
--n-modules 12 \
--latent-dim 16
Recommended production flow: select K first, then use the retrained final model.
bsvae-sweep-k sweep1 \
--dataset data/expression.csv \
--k-grid 8,12,16,24,32 \
--sweep-epochs 60 \
--stability-reps 5 \
--val-frac 0.1
This writes sweep outputs under results/sweep1/sweep_k/ and retrains the selected model under results/sweep1/final_k<K>/.
Extract module assignments from the final model:
bsvae-networks extract-modules \
--model-path results/sweep1/final_k16 \
--dataset data/expression.csv \
--output-dir results/sweep1/final_k16/modules \
--expr data/expression.csv \
--soft-eigengenes
Export latents:
bsvae-networks export-latents \
--model-path results/sweep1/final_k16 \
--dataset data/expression.csv \
--output results/sweep1/final_k16/latents
This command writes results/sweep1/final_k16/latents.npz containing mu, logvar, gamma, and feature_ids.
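The exported archive can be read back with NumPy. The sketch below derives a hard module label per feature from gamma. It assumes gamma has shape (n_features, K) with one row of module weights per feature; the README does not state the array shapes, so verify them on a real export. The helper name hard_assignments is hypothetical, and the demo writes a small synthetic archive rather than reading real output.

```python
import os
import tempfile

import numpy as np


def hard_assignments(npz_path):
    """Map each feature ID to the index of its highest-weight module.

    Assumption (not confirmed by the README): gamma is (n_features, K).
    """
    # If feature_ids was saved as an object array, np.load needs
    # allow_pickle=True to read it.
    with np.load(npz_path) as data:
        gamma = data["gamma"]
        feature_ids = data["feature_ids"]
    if gamma.ndim != 2 or gamma.shape[0] != len(feature_ids):
        raise ValueError("gamma does not have one row per feature")
    modules = gamma.argmax(axis=1)  # ties resolve to the lowest module index
    return {str(f): int(m) for f, m in zip(feature_ids, modules)}


# Synthetic stand-in for latents.npz with 3 features and K = 2.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "latents.npz")
    np.savez(
        path,
        mu=np.zeros((3, 2)),
        logvar=np.zeros((3, 2)),
        gamma=np.array([[0.7, 0.3], [0.2, 0.8], [0.5, 0.5]]),
        feature_ids=np.array(["g1", "g2", "g3"]),
    )
    demo = hard_assignments(path)
print(demo)
```

For module assignments produced by the package itself, use bsvae-networks extract-modules; the argmax here is only a quick way to inspect the export.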
Extract feature-feature networks:
bsvae-networks extract-networks \
--model-path results/sweep1/final_k16 \
--dataset data/expression.csv \
--output-dir results/sweep1/final_k16/networks \
--methods mu_cosine gamma_knn \
--top-k 50
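The method name mu_cosine suggests cosine similarity between per-feature latent means, with --top-k limiting how many neighbours are kept per feature. The sketch below illustrates that general idea only. It is an assumption about what the option names mean, not the package's implementation, and cosine_topk_edges is a hypothetical helper.

```python
import numpy as np


def cosine_topk_edges(mu, feature_ids, top_k):
    """Return (source, target, similarity) for each feature's top_k neighbours.

    Illustrative only: BSVAE's mu_cosine method may differ in detail.
    """
    mu = np.asarray(mu, dtype=float)
    norms = np.linalg.norm(mu, axis=1, keepdims=True)
    unit = mu / np.clip(norms, 1e-12, None)  # guard against zero vectors
    sim = unit @ unit.T                      # dense n x n cosine similarities
    np.fill_diagonal(sim, -np.inf)           # a feature is not its own neighbour
    k = min(top_k, len(feature_ids) - 1)

    edges = []
    for i in range(sim.shape[0]):
        neighbours = np.argsort(-sim[i])[:k]  # indices by descending similarity
        for j in neighbours:
            edges.append((feature_ids[i], feature_ids[int(j)], float(sim[i, j])))
    return edges


# Three features in a 2-D latent space: a and b point in nearly the same direction.
edges = cosine_topk_edges(
    mu=[[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]],
    feature_ids=["a", "b", "c"],
    top_k=1,
)
print([(s, t) for s, t, _ in edges])
```

The dense similarity matrix needs memory quadratic in the number of features, so this form only suits small feature sets.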
Run latent-space analysis:
bsvae-networks latent-analysis \
--model-path results/sweep1/final_k16 \
--dataset data/expression.csv \
--output-dir results/sweep1/final_k16/latent_analysis \
--kmeans-k 16 \
--umap
Training Outputs
bsvae-train writes to results/<experiment>/ by default:
- model.pt
- specs.json
- train_losses.csv
- model-<epoch>.pt checkpoints when --checkpoint-every is greater than zero
bsvae-sweep-k writes:
- results/<name>/sweep_k/sweep_results.csv
- results/<name>/sweep_k/sweep_summary.json
- results/<name>/sweep_k/k<K>/rep_<rep>/... per-sweep run outputs
- results/<name>/final_k<K>/... for the final retrained model when --train-final is enabled
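Because the selected K is encoded in the directory name, downstream scripts can find the retrained model without hard-coding it. The following is a minimal sketch that relies only on the final_k<K> naming described above; find_final_model_dirs is a hypothetical helper, and the demo builds an empty directory tree rather than running a sweep.

```python
import re
import tempfile
from pathlib import Path


def find_final_model_dirs(results_root, name):
    """Return {K: path} for every final_k<K> directory under <results_root>/<name>/."""
    base = Path(results_root) / name
    found = {}
    for child in base.glob("final_k*"):
        match = re.fullmatch(r"final_k(\d+)", child.name)
        if child.is_dir() and match:
            found[int(match.group(1))] = child
    return found


# Demo on an empty stand-in for results/sweep1/.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "results"
    (root / "sweep1" / "sweep_k").mkdir(parents=True)
    (root / "sweep1" / "final_k16").mkdir()
    selected = sorted(find_final_model_dirs(root, "sweep1"))
print(selected)
```

The returned path can then be passed as --model-path to the bsvae-networks commands shown in the Quickstart.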
Simulation and Benchmarking
Generate one synthetic dataset:
bsvae-simulate generate \
--output data/sim_expr.csv \
--save-ground-truth data/sim_truth.csv
Generate a scenario grid:
bsvae-simulate init-config --output sim.yaml
bsvae-simulate generate-grid \
--config sim.yaml \
--outdir results/sim_pub_v1 \
--reps 30 \
--base-seed 13
Validate the generated grid:
bsvae-simulate validate-grid --grid-dir results/sim_pub_v1
Each scenario replicate writes method-ready files under results/sim_pub_v1/scenarios/<scenario_id>/rep_<rep>/, including:
- expr/features_x_samples.tsv.gz
- expr/samples_x_features.tsv.gz
- truth/modules_hard.csv
- method_inputs.json
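A benchmarking script can confirm that each replicate directory contains the files listed above before handing them to a method. This sketch checks only those four paths and does not replace bsvae-simulate validate-grid. The helper name missing_replicate_files, the scenario ID s001, and the replicate names rep_0 and rep_1 are hypothetical.

```python
import tempfile
from pathlib import Path

# Relative paths taken from the list above; a real replicate may hold more files.
EXPECTED_FILES = (
    "expr/features_x_samples.tsv.gz",
    "expr/samples_x_features.tsv.gz",
    "truth/modules_hard.csv",
    "method_inputs.json",
)


def missing_replicate_files(grid_dir):
    """Return {replicate_dir: [missing relative paths]} for incomplete replicates."""
    problems = {}
    for rep_dir in sorted(Path(grid_dir).glob("scenarios/*/rep_*")):
        if not rep_dir.is_dir():
            continue
        missing = [rel for rel in EXPECTED_FILES if not (rep_dir / rel).is_file()]
        if missing:
            problems[str(rep_dir)] = missing
    return problems


# Demo: one complete replicate and one that lacks method_inputs.json.
with tempfile.TemporaryDirectory() as tmp:
    grid = Path(tmp) / "sim_pub_v1"
    complete = grid / "scenarios" / "s001" / "rep_0"
    partial = grid / "scenarios" / "s001" / "rep_1"
    for rep_dir, files in ((complete, EXPECTED_FILES), (partial, EXPECTED_FILES[:3])):
        for rel in files:
            target = rep_dir / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            target.touch()
    report = {Path(k).name: v for k, v in missing_replicate_files(grid).items()}
print(report)
```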
Documentation
See the docs for the full workflow and command reference:
- docs/index.md
- docs/quickstart.md
- docs/tutorial.md
- docs/cli.md
- docs/networks.md
- docs/hyperparameters.md
License
This project is licensed under the GNU General Public License v3.0.