BioBatchNet
BioBatchNet is a VAE framework for batch effect correction in biological data, supporting both single-cell RNA-seq (scRNA-seq) and Imaging Mass Cytometry (IMC) data.
Features
- Multi-modal Support: Works with both scRNA-seq and IMC data
- Easy-to-Use API: One-line batch correction with correct_batch_effects()
- Flexible Architecture: Customizable neural network parameters
- Adaptive Loss Weights: Automatically adjusts based on dataset characteristics
- Comprehensive Documentation: Detailed usage examples and interactive tutorials
Installation
Create Environment (Required for All Users)
conda env create -f environment.yml
conda activate biobatchnet
Install BioBatchNet
For Users (Recommended):
pip install biobatchnet
For Development:
git clone https://github.com/Manchester-HealthAI/BioBatchNet
cd BioBatchNet
pip install -e .
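After either install route, it can help to confirm that the package resolves in the active environment before running anything longer. The following is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of BioBatchNet.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "biobatchnet" is the import name used in the usage example of this README.
    status = "found" if is_installed("biobatchnet") else "missing"
    print(f"biobatchnet: {status}")
```

If the check reports `missing`, the most likely cause is that the `biobatchnet` conda environment is not the active one.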
Usage
Python API (Recommended for Users)
The simplest way to use BioBatchNet is through the high-level API:
import pandas as pd
import numpy as np
import anndata as ad
from biobatchnet import correct_batch_effects
# Load your data
adata = ad.read_h5ad('your_data.h5ad')
X = adata.X.toarray() if hasattr(adata.X, 'toarray') else adata.X
# Prepare batch labels (must be integers)
unique_batches = np.unique(adata.obs['BATCH'].values)
batch_to_int = {batch: i for i, batch in enumerate(unique_batches)}
batch_labels = np.array([batch_to_int[b] for b in adata.obs['BATCH'].values])
# Correct batch effects
bio_embeddings, batch_embeddings = correct_batch_effects(
    data=pd.DataFrame(X),
    batch_info=pd.DataFrame({'BATCH': batch_labels}),
    batch_key='BATCH',
    data_type='imc',  # 'imc' or 'scrna'
    latent_dim=20,
    epochs=100,
    device='cuda'  # or 'cpu'
)
# Add embeddings to AnnData
adata.obsm['X_biobatchnet'] = bio_embeddings
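The two preparation steps in the example above (integer-encoding the batch labels and choosing a device) can be factored into small helpers. This is a sketch, not part of the BioBatchNet API: `encode_batches` and `pick_device` are hypothetical names, the sample labels are made up, and the GPU probe assumes a PyTorch backend, which the `device='cuda'` argument suggests but this README does not state.

```python
import numpy as np


def encode_batches(batches):
    """Map arbitrary batch labels to integer codes 0..k-1.

    Returns (codes, categories) such that categories[codes[i]] == batches[i].
    Codes follow the sorted order of the unique labels, matching the
    dictionary-based mapping in the example above.
    """
    categories, codes = np.unique(np.asarray(batches), return_inverse=True)
    return codes, categories


def pick_device():
    """Return 'cuda' when PyTorch reports a usable GPU, otherwise 'cpu'."""
    try:
        import torch  # assumption: only used here to probe for a GPU
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


# Illustrative labels only; real labels would come from adata.obs['BATCH'].
codes, categories = encode_batches(["slide_B", "slide_A", "slide_B", "slide_C"])
print(codes)       # one integer code per cell
print(categories)  # sorted unique batch names
print(pick_device())
```

Keeping `categories` around makes it possible to translate the integer codes back to the original batch names after correction.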
For detailed documentation and examples:
- 📖 USAGE.md - Complete API documentation and parameter guide
- 📓 tutorial.ipynb - Interactive tutorial with three usage patterns
Config-based Training (For Development/Research)
For reproducing research results or training with specific configurations:
# For IMC data
python biobatchnet/IMC.py --config biobatchnet/config/IMC/IMMUcan.yaml
# For scRNA-seq data
python biobatchnet/Gene.py --config biobatchnet/config/scRNA/pancreas.yaml
Configuration files:
- IMC datasets: biobatchnet/config/IMC/
- scRNA-seq datasets: biobatchnet/config/scRNA/
These scripts expect datasets under the Data/ directory (see the YAML files for exact paths).
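To see which configurations are available before picking one for `--config`, the two config directories can be listed from the repository root. A minimal sketch, assuming a cloned repository with the directory layout named above; `list_configs` is a hypothetical helper, not something BioBatchNet ships.

```python
from pathlib import Path


def list_configs(config_root, modality):
    """Return sorted YAML config paths under <config_root>/<modality>/."""
    folder = Path(config_root) / modality
    if not folder.is_dir():
        return []
    return sorted(p for p in folder.iterdir() if p.suffix in {".yaml", ".yml"})


if __name__ == "__main__":
    # Directory names are the ones listed in this README; run from the repo root.
    for modality in ("IMC", "scRNA"):
        for path in list_configs("biobatchnet/config", modality):
            print(path)
```

Each printed path can be passed directly to the corresponding training script.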
CPC Usage
To use CPC, ensure you are running in the same environment as BioBatchNet.
All experiment results can be found in the following directory:
cd CPC/IMC_experiment
✅ Key Notes:
- CPC requires embeddings from BioBatchNet as input
- Sample data includes batch-corrected IMMUcan IMC embeddings
- Ensure the same computational environment as BioBatchNet before running CPC
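Because CPC consumes embeddings produced by BioBatchNet, one way to hand them over is to write the array to disk after correction. The sketch below uses NumPy's `.npy` format purely as an assumption; this README does not say which file format or layout CPC expects, so check the CPC scripts before relying on it. The helper names are hypothetical.

```python
import tempfile
from pathlib import Path

import numpy as np


def save_embeddings(embeddings, path):
    """Save a (cells x latent_dim) array as .npy and return its shape."""
    arr = np.asarray(embeddings, dtype=np.float32)
    if arr.ndim != 2:
        raise ValueError(f"expected a 2-D array, got {arr.ndim} dimension(s)")
    np.save(path, arr)
    return arr.shape


def load_embeddings(path):
    """Load embeddings previously written by save_embeddings."""
    return np.load(path)


if __name__ == "__main__":
    # Random stand-in for bio_embeddings; real values come from BioBatchNet.
    fake = np.random.default_rng(0).normal(size=(5, 20))
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "bio_embeddings.npy"  # assumed file name
        print(save_embeddings(fake, target))
        print(load_embeddings(target).shape)
```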
Data
Download scRNA-seq Data:
- Available on Google Drive: Download Link
Download IMC Data:
The IMC dataset can be accessed from the Bodenmiller Group IMC datasets repository. Visit the link below to explore and download the datasets:
🔗 IMC Datasets - Bodenmiller Group
Citation
If you use BioBatchNet in your research, please cite:
Liu H, Zhang S, Mao S, et al. BioBatchNet: A Dual-Encoder Framework for Robust Batch Effect Correction in Imaging Mass Cytometry[J]. bioRxiv, 2025: 2025.03.15.643447.
License
MIT License