A VAE framework for batch effect correction in biological data

BioBatchNet

Requires Python 3.9+. License: MIT.

A dual-encoder VAE framework for batch effect correction in single-cell RNA-seq and Imaging Mass Cytometry (IMC) data.


Installation

pip install biobatchnet

For development:

git clone https://github.com/UoM-HealthAI/BioBatchNet
cd BioBatchNet
pip install -e .

Quick Start

import scanpy as sc
from biobatchnet import correct_batch_effects

adata = sc.read_h5ad('your_data.h5ad')

bio_emb, batch_emb = correct_batch_effects(
    adata,
    batch_key='BATCH',
    data_type='imc',    # 'imc' or 'seq'
)

adata.obsm['X_biobatchnet'] = bio_emb

# Visualize
sc.pp.neighbors(adata, use_rep='X_biobatchnet')
sc.tl.umap(adata)
sc.pl.umap(adata, color=['BATCH', 'celltype'])

See tutorial.ipynb for an interactive walkthrough.


Full Parameters

bio_emb, batch_emb = correct_batch_effects(
    adata,                      # AnnData object
    batch_key='BATCH',          # obs column for batch labels
    cell_type_key='celltype',   # optional: obs column for cell types
    data_type='imc',            # 'imc' or 'seq' (seq auto-applies HVG + normalize + log1p)
    latent_dim=20,              # latent space dimension
    epochs=100,                 # max training epochs
    lr=1e-4,                    # learning rate
    batch_size=128,             # training batch size
    device='auto',              # 'auto', 'cuda', or 'cpu'
    loss_weights=None,          # optional: override default loss weights
)

Custom Loss Weights

Pass a dict to loss_weights to override defaults:

bio_emb, batch_emb = correct_batch_effects(
    adata,
    batch_key='BATCH',
    data_type='imc',
    loss_weights={
        'discriminator': 0.1,   # batch mixing strength (lower = more mixing)
        'kl_bio': 0.01,         # bio encoder regularization
    },
)

Parameter      IMC default  scRNA-seq default  Description
recon          10.0         10.0               Reconstruction quality
discriminator  0.3          0.04               Batch mixing strength
classifier     1.0          1.0                Batch info retention
kl_bio         0.005        1e-6               Bio encoder regularization
kl_batch       0.1          0.01               Batch encoder regularization
ortho          0.01         0.0002             Bio/batch orthogonality
kl_size        -            0.002              Size factor regularization (seq only)
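
To make the override behaviour concrete, here is a minimal sketch of how a partial loss_weights dict could be combined with the defaults listed above. It assumes that keys you omit keep their default values; the README does not state the merge rule explicitly. DEFAULT_LOSS_WEIGHTS and resolve_loss_weights are hypothetical names used for illustration and are not part of the biobatchnet API.

```python
from typing import Dict, Optional

# Defaults copied from the table above. The dict and the function below are
# hypothetical illustrations, not part of the biobatchnet API.
DEFAULT_LOSS_WEIGHTS: Dict[str, Dict[str, float]] = {
    "imc": {
        "recon": 10.0,
        "discriminator": 0.3,
        "classifier": 1.0,
        "kl_bio": 0.005,
        "kl_batch": 0.1,
        "ortho": 0.01,
    },
    "seq": {
        "recon": 10.0,
        "discriminator": 0.04,
        "classifier": 1.0,
        "kl_bio": 1e-6,
        "kl_batch": 0.01,
        "ortho": 0.0002,
        "kl_size": 0.002,  # seq only
    },
}


def resolve_loss_weights(
    data_type: str, overrides: Optional[Dict[str, float]] = None
) -> Dict[str, float]:
    """Return the defaults for data_type with any overrides applied on top."""
    if data_type not in DEFAULT_LOSS_WEIGHTS:
        raise ValueError(f"data_type must be 'imc' or 'seq', got {data_type!r}")
    weights = dict(DEFAULT_LOSS_WEIGHTS[data_type])  # copy, defaults stay untouched
    for name, value in (overrides or {}).items():
        if name not in weights:
            raise KeyError(f"unknown loss weight {name!r} for data_type={data_type!r}")
        weights[name] = float(value)
    return weights


# Same overrides as in the example above; all other IMC weights keep their defaults.
weights = resolve_loss_weights("imc", {"discriminator": 0.1, "kl_bio": 0.01})
print(weights)
```

Rejecting unknown keys is a design choice of this sketch: a typo such as 'kl-bio' then fails loudly instead of being silently ignored.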

Config-based Training

For reproducible experiments using preset hyperparameters:

python -m biobatchnet.train --config <preset> --data <path_to_h5ad>

Available Presets

Preset      Data type  Epochs
damond      IMC        30
hoch        IMC        30
immucan     IMC        30
pancreas    scRNA-seq  50
macaque     scRNA-seq  50
lung        scRNA-seq  100
mousebrain  scRNA-seq  50

Override Parameters

python -m biobatchnet.train --config immucan --data data.h5ad \
    --loss.discriminator 0.1 \
    --loss.kl_bio 0.01 \
    --trainer.epochs 50 \
    --seed 42
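
When launching several runs from a script, the same command line can be assembled programmatically. The sketch below only builds the argument list from the flags shown above (--config, --data, dotted override flags, --seed). build_train_command and PRESETS are hypothetical helpers, not part of biobatchnet, and the sketch does not check that an override key is one the trainer accepts.

```python
import shlex
import sys
from typing import Dict, List, Optional

# Preset names taken from the "Available Presets" table.
PRESETS = {"damond", "hoch", "immucan", "pancreas", "macaque", "lung", "mousebrain"}


def build_train_command(
    preset: str,
    data_path: str,
    overrides: Optional[Dict[str, object]] = None,
    seed: Optional[int] = None,
) -> List[str]:
    """Build the argv list for `python -m biobatchnet.train` (hypothetical helper)."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset {preset!r}; expected one of {sorted(PRESETS)}")
    cmd = [sys.executable, "-m", "biobatchnet.train", "--config", preset, "--data", data_path]
    # Overrides use dotted keys such as 'loss.discriminator' or 'trainer.epochs'.
    for key, value in (overrides or {}).items():
        cmd += [f"--{key}", str(value)]
    if seed is not None:
        cmd += ["--seed", str(seed)]
    return cmd


cmd = build_train_command(
    "immucan",
    "data.h5ad",
    overrides={"loss.discriminator": 0.1, "loss.kl_bio": 0.01, "trainer.epochs": 50},
    seed=42,
)
print(shlex.join(cmd))
# To actually start training: subprocess.run(cmd, check=True)
```

Passing the command as a list rather than a single string avoids shell quoting problems when the data path contains spaces.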

Data Format

Input: AnnData object (.h5ad file) with:

  • adata.X: expression matrix (cells x features), dense or sparse
  • adata.obs[batch_key]: batch labels (string or integer)
  • adata.obs[cell_type_key] (optional): cell type annotations
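
Before training, it can help to confirm that the batch column really contains more than one batch and no missing labels. The following is a small sanity check under those assumptions; summarize_batches is a hypothetical helper, not part of biobatchnet, and it works on any iterable of labels.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def summarize_batches(labels: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Count cells per batch and fail early on unusable batch labels."""
    counts = Counter(labels)
    if None in counts:
        raise ValueError(f"{counts[None]} cells have no batch label")
    if len(counts) < 2:
        raise ValueError("fewer than two batches found; there is nothing to correct")
    return dict(counts)


# Toy labels standing in for a batch column.
counts = summarize_batches(["slide1", "slide1", "slide2", "slide3", "slide2", "slide1"])
print(counts)
```

With an AnnData object, the call would be summarize_batches(adata.obs['BATCH']), since iterating over the column yields one label per cell.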

Note: For scRNA-seq data (data_type='seq'), preprocessing is applied automatically: highly variable gene selection (top 2000), normalization (target sum 1e4), and log1p transform. IMC data is used as-is.
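
The normalization and log1p steps named in the note can be illustrated on a toy count matrix. This is a worked arithmetic sketch in plain Python, not the library's implementation: the README does not say which calls biobatchnet uses internally (in scanpy the usual equivalents are sc.pp.normalize_total with target_sum=1e4 followed by sc.pp.log1p), and highly variable gene selection is left out here. normalize_log1p is a hypothetical name.

```python
import math
from typing import List


def normalize_log1p(counts: List[List[float]], target_sum: float = 1e4) -> List[List[float]]:
    """Scale each cell (row) to sum to target_sum, then apply log(1 + x)."""
    out = []
    for cell in counts:
        total = sum(cell)
        # Cells with zero total counts are left as all zeros.
        scale = target_sum / total if total > 0 else 0.0
        out.append([math.log1p(value * scale) for value in cell])
    return out


# Two cells, three genes. Both rows sum to 10, so each count is scaled by 1000.
raw = [[1, 3, 6], [0, 5, 5]]
norm = normalize_log1p(raw)

# Undoing the log shows that every cell now sums to the target of 1e4.
for row in norm:
    print(round(sum(math.expm1(v) for v in row), 6))
```

After this transform, cells with different sequencing depth are on a comparable scale, which is why the step is applied to count data but not to IMC intensities.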

Output:

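  • bio_embeddings (first return value, bio_emb in the examples above): shape (n_cells, latent_dim), batch-corrected representations
  • batch_embeddings (second return value, batch_emb in the examples above): shape (n_cells, latent_dim), batch-specific information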

Citation

If you use BioBatchNet in your research, please cite:

Liu H, Zhang S, Mao S, et al. BioBatchNet: A Dual-Encoder Framework for Robust Batch Effect
Correction in Imaging Mass Cytometry. bioRxiv, 2025: 2025.03.15.643447.

License

MIT License
