Gaussian Mixture VAE for multi-modal biological module discovery in omics data.

BSVAE: Gaussian Mixture VAE for Module Discovery

BSVAE is a PyTorch package for feature-level module discovery in omics data. The main model, GMMModuleVAE, learns latent structure across features such as genes, transcripts, or proteins, then supports downstream network extraction, module assignment, latent analysis, and simulation-based benchmarking.

What the Package Provides

  • bsvae-train for training a two-phase GMM-VAE
  • bsvae-sweep-k for selecting the number of modules (K) with held-out validation and optional stability replicates
  • bsvae-networks for post-training network extraction, module extraction, latent export, and latent analysis
  • bsvae-simulate for synthetic data generation, scenario-grid simulation, and module-recovery benchmarking

Installation

Install from PyPI:

pip install bsvae

Install from source:

git clone https://github.com/heart-gen/BSVAE.git
cd BSVAE
pip install -e .

Optional dependency:

  • anndata is only needed for .h5ad input or .h5ad latent export helpers

Input Data Contract

Training and most downstream commands expect an expression matrix in features x samples orientation:

  • rows are features such as genes, transcripts, or proteins
  • columns are sample IDs
  • the first column is the feature index when using CSV/TSV

Supported loaders:

  • .csv / .csv.gz
  • .tsv / .tsv.gz
  • .h5 / .hdf5
  • .h5ad with optional anndata
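
The orientation contract above can be checked before training. The following is a minimal sketch using only the Python standard library. `check_features_by_samples` is a hypothetical helper, not part of BSVAE, and it assumes a delimited text file whose header row holds an index label followed by the sample IDs, with numeric values in every other cell.

```python
import csv
import gzip
from pathlib import Path


def check_features_by_samples(path):
    """Return (n_features, n_samples) for a features x samples CSV/TSV file.

    Hypothetical helper, not part of BSVAE. Assumes the header row is an
    index label followed by sample IDs, and that each later row is a
    feature ID followed by one numeric value per sample.
    """
    path = Path(path)
    name = path.name.lower()
    delimiter = "\t" if ".tsv" in name else ","
    opener = gzip.open if name.endswith(".gz") else open

    with opener(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = next(reader, None)
        if not header or len(header) < 2:
            raise ValueError("expected an index column plus at least one sample column")
        sample_ids = header[1:]  # the first column is the feature index
        if len(set(sample_ids)) != len(sample_ids):
            raise ValueError("duplicate sample IDs in header")

        n_features = 0
        for row in reader:
            if not row:
                continue  # tolerate blank lines
            if len(row) != len(header):
                raise ValueError(
                    f"feature {row[0]!r} has {len(row) - 1} values, "
                    f"expected {len(sample_ids)}"
                )
            for value in row[1:]:
                float(value)  # raises ValueError on a non-numeric entry
            n_features += 1

    return n_features, len(sample_ids)
```

A file with many more columns than rows is a common sign that the matrix is in samples x features orientation and needs transposing before it is passed to --dataset.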

CLI Entry Points

  • bsvae-train
  • bsvae-sweep-k
  • bsvae-networks
  • bsvae-simulate

Quickstart

Train a small model:

bsvae-train pilot_run \
  --dataset data/expression.csv \
  --epochs 50 \
  --n-modules 12 \
  --latent-dim 16

Recommended production flow: select K first, then use the retrained final model.

bsvae-sweep-k sweep1 \
  --dataset data/expression.csv \
  --k-grid 8,12,16,24,32 \
  --sweep-epochs 60 \
  --stability-reps 5 \
  --val-frac 0.1

This writes sweep outputs under results/sweep1/sweep_k/ and retrains the selected model under results/sweep1/final_k<K>/.

Extract module assignments from the final model (the paths below assume the sweep selected K = 16):

bsvae-networks extract-modules \
  --model-path results/sweep1/final_k16 \
  --dataset data/expression.csv \
  --output-dir results/sweep1/final_k16/modules \
  --expr data/expression.csv \
  --soft-eigengenes

Export latents:

bsvae-networks export-latents \
  --model-path results/sweep1/final_k16 \
  --dataset data/expression.csv \
  --output results/sweep1/final_k16/latents

This command writes results/sweep1/final_k16/latents.npz containing mu, logvar, gamma, and feature_ids.
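
The exported archive can be read back with NumPy. This is a minimal sketch, not BSVAE's own API: the four array names come from the description above, but the helper names are hypothetical, and the code assumes that gamma has shape (n_features, n_modules) with one row of module weights per feature. If feature_ids was saved as an object array, `np.load` would additionally need `allow_pickle=True`.

```python
import numpy as np

# Array names as listed in the README for latents.npz.
LATENT_KEYS = ("mu", "logvar", "gamma", "feature_ids")


def load_latents(path):
    """Load the documented arrays from a latents .npz file into a dict."""
    with np.load(path) as archive:
        return {key: archive[key] for key in LATENT_KEYS}


def hard_assignments(latents):
    """Map each feature ID to the index of its largest gamma entry.

    Assumption: gamma is (n_features, n_modules), so argmax over axis 1
    picks one module per feature.
    """
    labels = np.argmax(latents["gamma"], axis=1)
    return dict(zip(latents["feature_ids"].tolist(), labels.tolist()))
```

For the run above, `load_latents("results/sweep1/final_k16/latents.npz")` would be the starting point; checking each array's `.shape` first is a quick way to confirm the layout assumption before using `hard_assignments`.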

Extract feature-feature networks:

bsvae-networks extract-networks \
  --model-path results/sweep1/final_k16 \
  --dataset data/expression.csv \
  --output-dir results/sweep1/final_k16/networks \
  --methods mu_cosine gamma_knn \
  --top-k 50

Run latent-space analysis:

bsvae-networks latent-analysis \
  --model-path results/sweep1/final_k16 \
  --dataset data/expression.csv \
  --output-dir results/sweep1/final_k16/latent_analysis \
  --kmeans-k 16 \
  --umap

Training Outputs

bsvae-train writes to results/<experiment>/ by default:

  • model.pt
  • specs.json
  • train_losses.csv
  • model-<epoch>.pt checkpoints when --checkpoint-every is greater than zero

bsvae-sweep-k writes:

  • results/<name>/sweep_k/sweep_results.csv
  • results/<name>/sweep_k/sweep_summary.json
  • results/<name>/sweep_k/k<K>/rep_<rep>/... per-sweep run outputs
  • results/<name>/final_k<K>/... for the final retrained model when --train-final is enabled
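
For scripting against these outputs, the documented layout can be captured in a small path helper. This is a sketch under stated assumptions: `sweep_paths` is a hypothetical function, not part of BSVAE, the results root is assumed to be results/, and the replicate label is passed through as given because the README does not specify how <rep> is formatted.

```python
from pathlib import Path


def sweep_paths(name, k, rep=None, results_root="results"):
    """Build the sweep output paths described in the README.

    Hypothetical helper. `rep` is inserted verbatim into rep_<rep>;
    its exact formatting is not documented here.
    """
    base = Path(results_root) / name
    paths = {
        "sweep_results": base / "sweep_k" / "sweep_results.csv",
        "sweep_summary": base / "sweep_k" / "sweep_summary.json",
        "final_model_dir": base / f"final_k{k}",
    }
    if rep is not None:
        paths["rep_dir"] = base / "sweep_k" / f"k{k}" / f"rep_{rep}"
    return paths
```

For example, `sweep_paths("sweep1", 16)["final_model_dir"]` yields the directory used as --model-path in the Quickstart commands.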

Simulation and Benchmarking

Generate one synthetic dataset:

bsvae-simulate generate \
  --output data/sim_expr.csv \
  --save-ground-truth data/sim_truth.csv

Generate a scenario grid:

bsvae-simulate init-config --output sim.yaml

bsvae-simulate generate-grid \
  --config sim.yaml \
  --outdir results/sim_pub_v1 \
  --reps 30 \
  --base-seed 13

Validate the generated grid:

bsvae-simulate validate-grid --grid-dir results/sim_pub_v1

Each scenario replicate writes method-ready files under results/sim_pub_v1/scenarios/<scenario_id>/rep_<rep>/, including:

  • expr/features_x_samples.tsv.gz
  • expr/samples_x_features.tsv.gz
  • truth/modules_hard.csv
  • method_inputs.json
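
When a benchmarking script consumes these replicate directories, a quick presence check can catch incomplete runs early. The sketch below is illustrative only and does not replace bsvae-simulate validate-grid: `missing_replicate_files` is a hypothetical helper, and it checks only the four files listed above, which the README describes as a partial list.

```python
from pathlib import Path

# Relative paths listed in the README for each scenario replicate.
DOCUMENTED_FILES = (
    "expr/features_x_samples.tsv.gz",
    "expr/samples_x_features.tsv.gz",
    "truth/modules_hard.csv",
    "method_inputs.json",
)


def missing_replicate_files(rep_dir):
    """Return the documented files that are absent from one replicate directory."""
    rep_dir = Path(rep_dir)
    return [rel for rel in DOCUMENTED_FILES if not (rep_dir / rel).is_file()]
```

An empty list means all four documented files are present in that replicate directory.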

Documentation

See the docs for the full workflow and command reference:

  • docs/index.md
  • docs/quickstart.md
  • docs/tutorial.md
  • docs/cli.md
  • docs/networks.md
  • docs/hyperparameters.md

License

This project is licensed under the GNU General Public License v3.0.
