A VAE framework for batch effect correction in biological data
BioBatchNet
A dual-encoder VAE framework for batch effect correction in single-cell RNA-seq and Imaging Mass Cytometry (IMC) data.
Installation
pip install biobatchnet
For development:
git clone https://github.com/UoM-HealthAI/BioBatchNet
cd BioBatchNet
pip install -e .
Quick Start
import scanpy as sc
from biobatchnet import correct_batch_effects
adata = sc.read_h5ad('your_data.h5ad')
bio_emb, batch_emb = correct_batch_effects(
    adata,
    batch_key='BATCH',
    data_type='imc',  # 'imc' or 'seq'
)
adata.obsm['X_biobatchnet'] = bio_emb
# Visualize
sc.pp.neighbors(adata, use_rep='X_biobatchnet')
sc.tl.umap(adata)
sc.pl.umap(adata, color=['BATCH', 'celltype'])
See tutorial.ipynb for an interactive walkthrough.
Full Parameters
bio_emb, batch_emb = correct_batch_effects(
    adata,                     # AnnData object
    batch_key='BATCH',         # obs column for batch labels
    cell_type_key='celltype',  # optional: obs column for cell types
    data_type='imc',           # 'imc' or 'seq' (seq auto-applies HVG + normalize + log1p)
    latent_dim=20,             # latent space dimension
    epochs=100,                # max training epochs
    lr=1e-4,                   # learning rate
    batch_size=128,            # training batch size
    device='auto',             # 'auto', 'cuda', or 'cpu'
    loss_weights=None,         # optional: override default loss weights
)
Custom Loss Weights
Pass a dict to loss_weights to override defaults:
bio_emb, batch_emb = correct_batch_effects(
    adata,
    batch_key='BATCH',
    data_type='imc',
    loss_weights={
        'discriminator': 0.1,  # batch mixing strength (lower = more mixing)
        'kl_bio': 0.01,        # bio encoder regularization
    },
)
| Parameter | IMC default | scRNA-seq default | Description |
|---|---|---|---|
| recon | 10.0 | 10.0 | Reconstruction quality |
| discriminator | 0.3 | 0.04 | Batch mixing strength |
| classifier | 1.0 | 1.0 | Batch info retention |
| kl_bio | 0.005 | 1e-6 | Bio encoder regularization |
| kl_batch | 0.1 | 0.01 | Batch encoder regularization |
| ortho | 0.01 | 0.0002 | Bio/batch orthogonality |
| kl_size | - | 0.002 | Size factor regularization (seq only) |
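A partial `loss_weights` dict only needs the keys you want to change. The sketch below shows one plausible way such a dict could be merged with the defaults from the table. `DEFAULT_LOSS_WEIGHTS` and `resolve_loss_weights` are hypothetical names for illustration; the package's internal merge logic is not documented here and may differ.

```python
# Defaults copied from the table above. The merge logic itself is an
# assumption for illustration, not BioBatchNet's actual implementation.
DEFAULT_LOSS_WEIGHTS = {
    "imc": {
        "recon": 10.0,
        "discriminator": 0.3,
        "classifier": 1.0,
        "kl_bio": 0.005,
        "kl_batch": 0.1,
        "ortho": 0.01,
    },
    "seq": {
        "recon": 10.0,
        "discriminator": 0.04,
        "classifier": 1.0,
        "kl_bio": 1e-6,
        "kl_batch": 0.01,
        "ortho": 0.0002,
        "kl_size": 0.002,  # seq only
    },
}


def resolve_loss_weights(data_type, overrides=None):
    """Hypothetical helper: defaults for data_type with overrides applied."""
    defaults = DEFAULT_LOSS_WEIGHTS[data_type]
    overrides = overrides or {}
    # Reject misspelled or inapplicable keys instead of silently ignoring them.
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"Unknown loss weight(s) for {data_type!r}: {sorted(unknown)}")
    # Later keys win, so overrides replace the matching defaults.
    return {**defaults, **overrides}


weights = resolve_loss_weights("imc", {"discriminator": 0.1, "kl_bio": 0.01})
print(weights)
```

Rejecting unknown keys is a design choice of this sketch; it catches typos such as `kl_size` being passed for IMC data.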
Config-based Training
For reproducible experiments using preset hyperparameters:
python -m biobatchnet.train --config <preset> --data <path_to_h5ad>
Available Presets
| Preset | Data type | Epochs |
|---|---|---|
| damond | IMC | 30 |
| hoch | IMC | 30 |
| immucan | IMC | 30 |
| pancreas | scRNA-seq | 50 |
| macaque | scRNA-seq | 50 |
| lung | scRNA-seq | 100 |
| mousebrain | scRNA-seq | 50 |
Override Parameters
python -m biobatchnet.train --config immucan --data data.h5ad \
    --loss.discriminator 0.1 \
    --loss.kl_bio 0.01 \
    --trainer.epochs 50 \
    --seed 42
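The dotted flags suggest that each override addresses a key inside a nested configuration. The sketch below illustrates that idea with a plain dict. `apply_overrides` and `parse_value` are hypothetical helpers, and the preset values are illustrative rather than read from the real `immucan` preset; the actual parser in `biobatchnet.train` may work differently.

```python
import copy


def parse_value(text):
    """Interpret a CLI string as int, then float, else keep it as a string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def apply_overrides(config, argv):
    """Hypothetical helper: apply `--a.b value` pairs to a copy of a nested dict."""
    if len(argv) % 2 != 0:
        raise ValueError("expected flag/value pairs")
    result = copy.deepcopy(config)  # leave the preset itself untouched
    for flag, raw in zip(argv[0::2], argv[1::2]):
        if not flag.startswith("--"):
            raise ValueError(f"not a flag: {flag!r}")
        keys = flag[2:].split(".")
        node = result
        # Walk (or create) the intermediate sections, e.g. "loss" or "trainer".
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = parse_value(raw)
    return result


# Illustrative preset contents (assumed, not taken from the package).
preset = {"loss": {"discriminator": 0.3, "kl_bio": 0.005}, "trainer": {"epochs": 30}}
args = [
    "--loss.discriminator", "0.1",
    "--loss.kl_bio", "0.01",
    "--trainer.epochs", "50",
    "--seed", "42",
]
config = apply_overrides(preset, args)
print(config)
```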
Data Format
Input: AnnData object (.h5ad file) with:
- adata.X: expression matrix (cells x features), dense or sparse
- adata.obs[batch_key]: batch labels (string or integer)
- adata.obs[cell_type_key] (optional): cell type annotations
Note: For scRNA-seq data (data_type='seq'), preprocessing is applied automatically: highly variable gene selection (top 2000), normalization (target sum 1e4), and log1p transform. IMC data is used as-is.
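To make the normalization and log1p steps concrete, here is a minimal pure-Python sketch of what they compute on a toy count matrix. It is an illustration of the arithmetic only: the highly variable gene selection step is omitted, `normalize_and_log1p` is a hypothetical name, and the package may well rely on scanpy's `sc.pp.normalize_total` and `sc.pp.log1p` instead.

```python
import math


def normalize_and_log1p(counts, target_sum=1e4):
    """Scale each cell (row) to target_sum total counts, then apply log(1 + x)."""
    processed = []
    for row in counts:
        total = sum(row)
        # Cells with no counts are left at zero rather than dividing by zero.
        scale = target_sum / total if total > 0 else 0.0
        processed.append([math.log1p(value * scale) for value in row])
    return processed


# Two toy cells with three genes each; the second was sequenced 10x deeper
# but has the same relative expression profile.
raw = [[1, 3, 6], [10, 30, 60]]
for row in normalize_and_log1p(raw):
    print([round(value, 3) for value in row])
```

Because both toy cells have the same proportions, they end up with matching values after normalization, which is the point of scaling to a common target sum before the log transform.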
Output:
- bio_embeddings: (n_cells, latent_dim), batch-corrected representations
- batch_embeddings: (n_cells, latent_dim), batch-specific information
Data
- IMC datasets: Bodenmiller Group IMC datasets
- scRNA-seq datasets: Google Drive
Citation
If you use BioBatchNet in your research, please cite:
Liu H, Zhang S, Mao S, et al. BioBatchNet: A Dual-Encoder Framework for Robust Batch Effect
Correction in Imaging Mass Cytometry[J]. bioRxiv, 2025: 2025.03.15.643447.
License
MIT License