BioBatchNet

Python 3.9+ | License: MIT

BioBatchNet is a variational autoencoder (VAE) framework for batch effect correction in biological data. It supports both single-cell RNA-seq (scRNA-seq) and Imaging Mass Cytometry (IMC) data.


Features

  • Multi-modal Support: Works with both scRNA-seq and IMC data
  • Easy-to-Use API: One-line batch correction with correct_batch_effects()
  • Flexible Architecture: Customizable neural network parameters
  • Adaptive Loss Weights: Automatically adjusts based on dataset characteristics
  • Comprehensive Documentation: Detailed usage examples and interactive tutorials

Installation

Create Environment (Required for All Users)

conda env create -f environment.yml
conda activate biobatchnet

Install BioBatchNet

For Users (Recommended):

pip install biobatchnet

For Development:

git clone https://github.com/UoM-HealthAI/BioBatchNet
cd BioBatchNet
pip install -e .
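
After either install route, it can help to confirm that the package is visible to the active environment. The following is a minimal sketch using only the Python standard library; the helper name installed_version is illustrative and is not part of BioBatchNet.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("biobatchnet")
    if found is None:
        print("biobatchnet is not installed in this environment")
    else:
        print(f"biobatchnet {found} is installed")
```

If the check reports that the package is missing, the most likely cause is that the biobatchnet conda environment is not the active one.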

Usage

Python API (Recommended for Users)

The simplest way to use BioBatchNet is through the high-level API:

import pandas as pd
import numpy as np
import anndata as ad
from biobatchnet import correct_batch_effects

# Load your data
adata = ad.read_h5ad('your_data.h5ad')
X = adata.X.toarray() if hasattr(adata.X, 'toarray') else adata.X

# Prepare batch labels (must be integers)
unique_batches = np.unique(adata.obs['BATCH'].values)
batch_to_int = {batch: i for i, batch in enumerate(unique_batches)}
batch_labels = np.array([batch_to_int[b] for b in adata.obs['BATCH'].values])

# Correct batch effects
bio_embeddings, batch_embeddings = correct_batch_effects(
    data=pd.DataFrame(X),
    batch_info=pd.DataFrame({'BATCH': batch_labels}),
    batch_key='BATCH',
    data_type='imc',        # 'imc' or 'scrna'
    latent_dim=20,
    epochs=100,
    device='cuda'           # or 'cpu'
)

# Add embeddings to AnnData
adata.obsm['X_biobatchnet'] = bio_embeddings
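
The batch-label preparation in the example above can be factored into a small reusable helper. The sketch below is one possible way to do it in plain Python; encode_batches is a hypothetical name, not a BioBatchNet function. It assumes the labels are hashable and mutually comparable (for example, all strings), and it numbers them in sorted order, as np.unique does in the example.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def encode_batches(labels: Sequence[Hashable]) -> Tuple[List[int], Dict[Hashable, int]]:
    """Map arbitrary batch labels to contiguous integers 0..k-1.

    Labels are numbered in sorted order so the encoding is reproducible
    across runs. Returns the integer codes and the label-to-code mapping.
    """
    mapping = {label: i for i, label in enumerate(sorted(set(labels)))}
    codes = [mapping[label] for label in labels]
    return codes, mapping


# Example: three batches given as strings.
codes, mapping = encode_batches(["b2", "b1", "b2", "b3"])
print(codes)    # [1, 0, 1, 2]
print(mapping)  # {'b1': 0, 'b2': 1, 'b3': 2}
```

The resulting codes can be passed as pd.DataFrame({'BATCH': codes}) for batch_info. Keeping the mapping makes it possible to translate integer codes back to the original batch names later.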

For detailed documentation and examples:
(Note: PyPI does not render relative links from the repository, so absolute links and nbviewer previews are used here. If the original relative links fail to open on PyPI, that is expected.)

Config-based Training (For Development/Research)

For reproducing research results or training with specific configurations:

# For IMC data
python biobatchnet/IMC.py --config biobatchnet/config/IMC/IMMUcan.yaml

# For scRNA-seq data
python biobatchnet/Gene.py --config biobatchnet/config/scRNA/pancreas.yaml

Configuration files:

  • IMC datasets: biobatchnet/config/IMC/
  • scRNA-seq datasets: biobatchnet/config/scRNA/

These scripts expect the datasets to be under the Data/ directory (see the YAML files for the exact paths).
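
When running several configurations, it can be convenient to assemble the training command programmatically. The sketch below only builds the argument list from the script and config locations listed in this section; build_train_command is a hypothetical helper, not part of BioBatchNet, and it does not check that the config file exists.

```python
from pathlib import Path
from typing import List

# Script and config locations as listed in this README.
SCRIPTS = {"imc": "biobatchnet/IMC.py", "scrna": "biobatchnet/Gene.py"}
CONFIG_DIRS = {"imc": "biobatchnet/config/IMC", "scrna": "biobatchnet/config/scRNA"}


def build_train_command(data_type: str, config_name: str) -> List[str]:
    """Build the command line for config-based training.

    data_type is 'imc' or 'scrna'; config_name is a YAML file name
    inside the matching config directory.
    """
    if data_type not in SCRIPTS:
        raise ValueError(f"data_type must be one of {sorted(SCRIPTS)}, got {data_type!r}")
    config_path = Path(CONFIG_DIRS[data_type]) / config_name
    return ["python", SCRIPTS[data_type], "--config", config_path.as_posix()]


print(" ".join(build_train_command("imc", "IMMUcan.yaml")))
```

The returned list can be passed to subprocess.run(cmd, check=True) from the repository root, so that the relative script and config paths resolve.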


CPC Usage

To use CPC, ensure you are running in the same environment as BioBatchNet.

All experiment results can be found in the following directory:

cd CPC/IMC_experiment

✅ Key Notes:

  • CPC requires embeddings from BioBatchNet as input
  • Sample data includes batch-corrected IMMUcan IMC embeddings
  • Ensure the same computational environment as BioBatchNet before running CPC

Data

Download scRNA-seq Data:

Download IMC Data:

The IMC dataset can be accessed from the Bodenmiller Group IMC datasets repository. Visit the link below to explore and download the datasets:

🔗 IMC Datasets - Bodenmiller Group


Citation

If you use BioBatchNet in your research, please cite:

Liu H, Zhang S, Mao S, et al. BioBatchNet: A Dual-Encoder Framework for Robust Batch Effect Correction in Imaging Mass Cytometry. bioRxiv, 2025: 2025.03.15.643447.

License

MIT License
