Gaussian Mixture VAE for multi-modal biological module discovery in omics data.

BSVAE: Gaussian Mixture VAE for Module Discovery

BSVAE is a PyTorch package centered on GMMModuleVAE, a Gaussian-mixture variational autoencoder for feature-level module discovery in omics data.

What It Does

  • Trains a two-phase GMM-VAE (bsvae-train)
  • Extracts feature-feature networks from trained models (bsvae-networks extract-networks)
  • Extracts module assignments and optional eigengenes (bsvae-networks extract-modules)
  • Exports latents (mu, logvar, gamma) as .npz (bsvae-networks export-latents)
  • Runs latent analysis (UMAP/t-SNE, clustering, covariate correlation) (bsvae-networks latent-analysis)
  • Simulates synthetic datasets and benchmarks module recovery (bsvae-simulate)

Installation

From PyPI:

pip install bsvae

From source:

git clone https://github.com/heart-gen/BSVAE.git
cd BSVAE
pip install -e .

CLI Entry Points

  • bsvae-train
  • bsvae-networks
  • bsvae-simulate
  • bsvae-sweep-k

Quickstart

For a full walkthrough (minimal run, production run, post-training analysis, simulation benchmark, troubleshooting, and migration), see docs/tutorial.md.

Tune K (Recommended)

--n-modules (K) sets the expected number of modules. The recommended way to choose K is to run bsvae-sweep-k with stability replicates:

bsvae-sweep-k sweep1 \
  --dataset data/expression.csv \
  --k-grid 8,12,16,24,32 \
  --sweep-epochs 60 \
  --stability-reps 5 \
  --val-frac 0.1

The model with the selected K is retrained on the full dataset and written to results/sweep1/final_k<K>/.

1. Train

The input matrix must be features x samples, with feature IDs in the row index and sample IDs as the column headers.

bsvae-train exp1 \
  --dataset data/expression.csv \
  --epochs 100 \
  --n-modules 20 \
  --latent-dim 32
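
The features x samples layout described above can be checked before training. The following is a minimal sketch using only the standard library; the function name check_features_by_samples and the example IDs are hypothetical and not part of BSVAE, and the check assumes a comma-separated file whose first column holds the feature IDs.

```python
import csv
import io


def check_features_by_samples(handle):
    """Return (feature_ids, sample_ids) for a features x samples CSV.

    Hypothetical helper, not part of BSVAE. Assumes the first column holds
    feature IDs and the header row holds sample IDs.
    """
    reader = csv.reader(handle)
    header = next(reader)
    sample_ids = header[1:]  # the first header cell labels the row index
    feature_ids = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) != len(header):
            raise ValueError(
                f"row {row[0]!r} has {len(row) - 1} values, "
                f"expected {len(sample_ids)}"
            )
        for value in row[1:]:
            float(value)  # raises ValueError on non-numeric entries
        feature_ids.append(row[0])
    if len(set(feature_ids)) != len(feature_ids):
        raise ValueError("duplicate feature IDs")
    if len(set(sample_ids)) != len(sample_ids):
        raise ValueError("duplicate sample IDs")
    return feature_ids, sample_ids


# Tiny in-memory example: 2 features (rows) by 3 samples (columns).
example = io.StringIO(
    "feature_id,sample_1,sample_2,sample_3\n"
    "GENE_A,1.2,0.4,3.1\n"
    "GENE_B,0.0,2.2,1.7\n"
)
features, samples = check_features_by_samples(example)
print(len(features), "features x", len(samples), "samples")
```

If the matrix is stored samples x features instead, it needs to be transposed before it is passed to --dataset.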

2. Extract networks

bsvae-networks extract-networks \
  --model-path results/exp1 \
  --dataset data/expression.csv \
  --output-dir results/exp1/networks \
  --methods mu_cosine gamma_knn
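
The method name mu_cosine suggests a network built from cosine similarity between per-feature latent means. The sketch below illustrates that idea with NumPy under that assumption only; it is an interpretation of the name, not BSVAE's confirmed implementation. The function cosine_topk_adjacency and its top_k sparsification rule are hypothetical.

```python
import numpy as np


def cosine_topk_adjacency(mu, top_k):
    """Sparse cosine-similarity adjacency between rows of mu.

    Hypothetical illustration: each row keeps its top_k most similar
    neighbours, then the edge set is symmetrised.
    """
    mu = np.asarray(mu, dtype=float)
    n = mu.shape[0]
    norms = np.linalg.norm(mu, axis=1, keepdims=True)
    unit = mu / np.clip(norms, 1e-12, None)  # guard against zero vectors
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # never select a feature as its own neighbour

    k = min(top_k, n - 1)
    if k <= 0:
        return np.zeros((n, n))

    # Column indices of the k largest similarities in each row.
    idx = np.argpartition(-sim, k - 1, axis=1)[:, :k]
    mask = np.zeros((n, n), dtype=bool)
    mask[np.arange(n)[:, None], idx] = True
    mask |= mask.T  # keep an edge if either endpoint selected it
    return np.where(mask, sim, 0.0)


# Three features in a 2-D latent space; features 0 and 1 point the same way.
mu = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 1.0]])
adj = cosine_topk_adjacency(mu, top_k=1)
print(adj)
```

Cosine similarity ignores vector length, so features 0 and 1 receive a similarity of 1 even though their latent means differ in magnitude.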

3. Extract modules

bsvae-networks extract-modules \
  --model-path results/exp1 \
  --dataset data/expression.csv \
  --output-dir results/exp1/modules

4. Export latents

bsvae-networks export-latents \
  --model-path results/exp1 \
  --dataset data/expression.csv \
  --output results/exp1/latents.npz
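
An exported .npz archive can be read back with NumPy. The sketch below assumes the arrays are stored under the keys mu, logvar, and gamma (the names listed under What It Does) and that rows correspond to features; both the key names and the shapes are assumptions, so check archive.files on a real export. A stand-in archive is written first so that the example runs without a trained model.

```python
import tempfile
from pathlib import Path

import numpy as np


def hard_assignments(gamma):
    """Index of the most probable mixture component for each row of gamma."""
    return np.asarray(gamma).argmax(axis=1)


# Stand-in data (assumed shapes): 6 features, 4 latent dims, 3 components.
rng = np.random.default_rng(0)
n_features, latent_dim, n_modules = 6, 4, 3
gamma = rng.dirichlet(np.ones(n_modules), size=n_features)

with tempfile.TemporaryDirectory() as tmp:
    path = str(Path(tmp) / "latents.npz")
    np.savez(
        path,
        mu=rng.normal(size=(n_features, latent_dim)),
        logvar=rng.normal(size=(n_features, latent_dim)),
        gamma=gamma,
    )
    # Read every array into memory before the temporary file is removed.
    with np.load(path) as archive:
        latents = {key: archive[key] for key in archive.files}

labels = hard_assignments(latents["gamma"])
print({key: value.shape for key, value in latents.items()})
print(labels)
```

Taking the argmax of gamma is one simple way to turn soft responsibilities into hard labels; bsvae-networks extract-modules may apply a different rule.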

5. Simulate and benchmark

Single dataset:

bsvae-simulate generate \
  --output data/sim_expr.csv \
  --save-ground-truth data/sim_truth.csv

bsvae-simulate benchmark \
  --dataset data/sim_expr.csv \
  --ground-truth data/sim_truth.csv \
  --model-path results/exp1 \
  --output results/exp1/sim_metrics.json

Scenario grid for publication-style benchmarking:

bsvae-simulate init-config --output sim.yaml

bsvae-simulate generate-grid \
  --config sim.yaml \
  --outdir results/sim_pub_v1 \
  --reps 30 \
  --base-seed 13

bsvae-simulate validate-grid --grid-dir results/sim_pub_v1

Each scenario replicate writes method-ready files under results/sim_pub_v1/scenarios/<scenario_id>/rep_<rep>/, including:

  • expr/features_x_samples.tsv.gz (BSVAE, GNVAE)
  • expr/samples_x_features.tsv.gz (WGCNA)
  • truth/modules_hard.csv (hard labels for ARI/NMI)
  • method_inputs.json (canonical paths for each method)
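
The hard labels in truth/modules_hard.csv are intended for ARI/NMI scoring. As an illustration of what ARI measures, here is a small pure-Python sketch of the adjusted Rand index computed from the contingency table of two labelings. The function name adjusted_rand_index is hypothetical, and this is not necessarily the code path that bsvae-simulate benchmark uses.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, pred):
    """Adjusted Rand index between two hard labelings of the same items."""
    if len(truth) != len(pred):
        raise ValueError("labelings must have the same length")
    n = len(truth)
    if n < 2:
        return 1.0  # no pairs to disagree on

    # Pair counts within contingency cells and within each marginal.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    sum_truth = sum(comb(c, 2) for c in Counter(truth).values())
    sum_pred = sum(comb(c, 2) for c in Counter(pred).values())

    expected = sum_truth * sum_pred / comb(n, 2)
    max_index = (sum_truth + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are one cluster
    return (sum_cells - expected) / (max_index - expected)


# Same partition under different label names, then a maximally crossed one.
same = adjusted_rand_index([0, 0, 1, 1], ["b", "b", "a", "a"])
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(same, crossed)
```

ARI depends only on which items are grouped together, so renaming the modules leaves the score unchanged.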

Training Outputs

bsvae-train writes to results/<experiment>/:

  • model.pt (weights)
  • specs.json (metadata and run args)
  • train_losses.csv (epoch/component losses)
  • model-<epoch>.pt checkpoints when --checkpoint-every is set
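
The metadata and loss files listed above can be inspected with the standard library. This is a minimal sketch: the helper load_run_outputs is hypothetical, and the column names and spec keys in the stand-in files are invented for illustration, since the actual contents of specs.json and train_losses.csv are not documented here.

```python
import csv
import json
import tempfile
from pathlib import Path


def load_run_outputs(run_dir):
    """Return (specs, losses) read from a training output directory.

    specs is the parsed specs.json; losses is a list of dicts, one per
    row of train_losses.csv, with values left as strings.
    """
    run_dir = Path(run_dir)
    with open(run_dir / "specs.json") as fh:
        specs = json.load(fh)
    with open(run_dir / "train_losses.csv", newline="") as fh:
        losses = list(csv.DictReader(fh))
    return specs, losses


# Stand-in run directory; keys and columns here are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    run = Path(tmp)
    (run / "specs.json").write_text(json.dumps({"epochs": 2}))
    (run / "train_losses.csv").write_text("epoch,loss\n1,10.5\n2,8.25\n")
    specs, losses = load_run_outputs(run)

print(specs)
print(losses[-1])
```

For a real run, pass the experiment directory, for example results/exp1, and inspect losses[0].keys() to see which loss components were logged.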

Data Formats

The loader supports:

  • .csv / .csv.gz
  • .tsv / .tsv.gz
  • .h5 / .hdf5
  • .h5ad (optional anndata dependency)
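
The extensions above include compound suffixes such as .csv.gz, which a lookup on the last suffix alone would miss. The following sketch shows one way to classify a path by extension; detect_format and the format labels are hypothetical and do not describe how the BSVAE loader is implemented.

```python
from pathlib import Path

# Extension to format label, following the list of supported inputs.
FORMATS = {
    ".csv": "csv",
    ".csv.gz": "csv",
    ".tsv": "tsv",
    ".tsv.gz": "tsv",
    ".h5": "hdf5",
    ".hdf5": "hdf5",
    ".h5ad": "anndata",
}


def detect_format(path):
    """Classify a dataset path by its (possibly gzip-compressed) extension."""
    suffixes = [s.lower() for s in Path(path).suffixes]
    if not suffixes:
        raise ValueError(f"no file extension: {path}")
    # Join the last two suffixes when the file is gzip-compressed.
    key = "".join(suffixes[-2:]) if suffixes[-1] == ".gz" else suffixes[-1]
    if key not in FORMATS:
        raise ValueError(f"unsupported extension {key!r}: {path}")
    return FORMATS[key]


for name in ["data/expression.csv", "data/my.cohort.tsv.gz", "data/cells.h5ad"]:
    print(name, "->", detect_format(name))
```

Only the trailing suffixes are used, so file names that contain extra dots are still classified by their real extension.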

Python API (Minimal)

from bsvae.utils.modelIO import load_model
from bsvae.networks.extract_networks import create_dataloader_from_expression, run_extraction

# Load the trained model from the experiment directory, without a GPU.
model = load_model("results/exp1", is_gpu=False)
# Build a data loader from the expression matrix and keep the feature IDs.
loader, feature_ids, _ = create_dataloader_from_expression("data/expression.csv", batch_size=128)
# Run a single extraction method; one result is returned per method.
results = run_extraction(model, loader, feature_ids=feature_ids, methods=["mu_cosine"], top_k=50)
print(results[0].method, results[0].adjacency.shape)

License

This project is licensed under the GNU General Public License v3.0.
