Single-Cell Biological Insights via Optimal Transport and Omics Transformers
scBIOT
scBIOT is a lightweight Python library for single-cell omics integration. It bundles the preprocessing, embedding, and label-transfer workflows we routinely apply to RNA, ATAC, and paired or unpaired multi-omics datasets. The library emphasizes reproducible data preparation, single-cell clustering on embeddings derived from optimal transport and Transformer-based VAEs, and concise APIs that work out of the box on AnnData objects.
Highlights
- Batteries-included preprocessing: scATAC-seq peak processing, iterative LSI, and gene activity annotation.
- Accurate atlas integration: high-fidelity alignment with rare cell-type protection.
- Unified scBIOT framework: a single framework for embedding RNA, ATAC, transfer learning, and paired or unpaired multi-omics.
- Fast integration via Optimal Transport (OT): scalable alignment for large single-cell datasets.
- Transformer-VAE: further enhances integration with stronger representation learning and improved robustness.
- Scales to 100M cells locally: memory-efficient, scalable processing.
- Label transfer: across multi-omics modalities and between spatial data and scRNA-seq references.
Installation
pip install scbiot
For documentation builds, run pip install scbiot[docs].
Optional extras
Depending on your workflow you can pull in heavier scientific stacks as extras:
pip install scbiot installs the CUDA-enabled FAISS + PyTorch combo (CUDA 12): faiss-gpu-cu12 scib_metrics==0.5.1 leidenalg jaxlib scikit-misc "jax[cuda12]" pyranges.
For an exact replica of our Conda dev environment, run pip install -r requirements.txt inside a fresh virtual environment.
Quick start
- Detailed documentation is published at scbiot.readthedocs.io and mirrors the examples below.
- See the examples/ folder for a runnable, notebook-friendly end-to-end script.
import numpy as np
import pandas as pd
import scbiot as scb
import scanpy as sc
adata = sc.datasets.pbmc3k()
adata.obs['batch'] = np.where(np.arange(adata.n_obs) % 2 == 0, 'b1', 'b2')  # pbmc3k ships without a batch column; demo labels
sc.pp.highly_variable_genes(adata, n_top_genes=2000, flavor="seurat_v3", batch_key='batch')
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
sc.pp.scale(adata)
sc.tl.pca(adata, n_comps=50, use_highly_variable=True)
adata, metrics = scb.ot.integrate(adata, preset='rna', obsm_key='X_pca', batch_key='batch', out_key='X_ot')
print(metrics)
sc.pp.neighbors(adata, use_rep='X_ot')
sc.tl.umap(adata)
sc.tl.leiden(adata, resolution=0.8, key_added='leiden_X_ot')
scb.models.setup_anndata(adata, var_key='X_ot', batch_key='batch', true_key=None)
model = scb.models.vae(adata, verbose=True)
model.train()
SCBIOT_LATENT_KEY = "scBIOT"
adata.obsm[SCBIOT_LATENT_KEY] = model.get_latent_representation(n_components=50, svd_solver='arpack', random_state=42)
sc.pp.neighbors(adata, use_rep=SCBIOT_LATENT_KEY)
sc.tl.umap(adata)
sc.tl.leiden(adata, resolution=0.8, key_added=f'leiden_{SCBIOT_LATENT_KEY}')
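The "Label transfer" highlight can be approximated outside scBIOT with a plain kNN majority vote in any shared embedding, such as the latent space computed above. A minimal NumPy sketch (a generic technique, not the library's own API; `transfer_labels` is a hypothetical helper):

```python
import numpy as np

def transfer_labels(ref_emb, ref_labels, query_emb, k=15):
    """Majority-vote kNN label transfer in a shared embedding space."""
    # pairwise squared distances query x reference
    d = ((query_emb[:, None, :] - ref_emb[None, :, :]) ** 2).sum(-1)
    nn = np.argsort(d, axis=1)[:, :k]  # k nearest reference cells per query
    out = []
    for row in nn:
        vals, counts = np.unique(ref_labels[row], return_counts=True)
        out.append(vals[np.argmax(counts)])  # most frequent neighbor label
    return np.array(out)

rng = np.random.default_rng(1)
# two well-separated toy "cell types" in a 5-D embedding
ref = np.vstack([rng.normal(0, 0.3, (100, 5)), rng.normal(4, 0.3, (100, 5))])
lab = np.array(["T cell"] * 100 + ["B cell"] * 100)
qry = np.vstack([rng.normal(0, 0.3, (10, 5)), rng.normal(4, 0.3, (10, 5))])
pred = transfer_labels(ref, lab, qry)
```

For real data you would replace the toy arrays with `adata.obsm` embeddings from the reference and query objects.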
For stable tuning, use the meta-parameter interface:
adata, metrics = scb.ot.integrate(
adata,
preset="rna",
epsilon=0.03,
tau=0.40,
knn_scale=1.0,
batch_strength=1.0,
gate_temperature=1.0,
# optional supervision:
label_key="semi_cell_type",
unlabeled_category="Unknown",
sup_strength=0.10,
)
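The `epsilon` parameter above is the entropic-regularization strength familiar from Sinkhorn-style OT: larger values produce denser, higher-entropy transport plans, smaller values approach a sharp (near one-to-one) matching. A small self-contained demonstration of that effect with a generic Sinkhorn solver (not scBIOT internals):

```python
import numpy as np

def sinkhorn_plan(C, epsilon, n_iter=300):
    """Entropic OT plan between uniform marginals for cost matrix C."""
    n, m = C.shape
    a, b = np.full(n, 1 / n), np.full(m, 1 / m)
    K = np.exp(-C / epsilon)
    u = np.ones(n)
    for _ in range(n_iter):  # alternating marginal scaling
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
C = rng.random((30, 30))  # toy cost matrix
entropies = {}
for eps in (0.01, 0.3):
    P = sinkhorn_plan(C, eps)
    entropies[eps] = -(P[P > 0] * np.log(P[P > 0])).sum()
    print(f"epsilon={eps}: plan entropy={entropies[eps]:.2f}")
```

The small-epsilon plan concentrates mass on few pairings (low entropy); the large-epsilon plan spreads mass broadly, which is more robust but blurrier.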
Scaling options
For ultra-large datasets, use centroid-level OT:
adata, metrics = scb.ot.integrate(
adata,
preset="centroid",
obsm_key="X_pca",
batch_key="batch",
out_key="scBIOT",
)
You can also enable centroid OT while keeping another preset's OT hyperparameters via
centroid_ot=True.
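Centroid-level OT is a standard scaling trick: cluster each batch, solve OT between the much smaller sets of centroids, then move every cell by its centroid's displacement. A self-contained NumPy sketch of the idea, using a toy Lloyd k-means in place of whatever clustering scBIOT uses internally (`centroid_ot_align` is a hypothetical helper, not the library API):

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Minimal Lloyd k-means: returns centers and per-point labels."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return centers, labels

def centroid_ot_align(X, Y, k=8, epsilon=0.05, n_iter=200):
    """Align batch X to batch Y via entropic OT between k-means centroids."""
    Cx, lx = kmeans(X, k, seed=0)
    Cy, ly = kmeans(Y, k, seed=1)
    a = np.bincount(lx, minlength=k) / len(X)  # cluster mass = marginal
    b = np.bincount(ly, minlength=k) / len(Y)
    C = ((Cx[:, None] - Cy[None]) ** 2).sum(-1)
    K = np.exp(-C / C.max() / epsilon)
    u = np.ones(k)
    for _ in range(n_iter):  # Sinkhorn on the k x k problem only
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]
    rs = np.clip(P.sum(1, keepdims=True), 1e-12, None)
    target = (P @ Cy) / rs                    # barycentric image of each centroid
    return X + (target - Cx)[lx]              # shift cells by centroid displacement

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))                   # batch 1
Y = rng.normal(size=(500, 2)) + np.array([5.0, 0.0])  # batch 2, shifted
X_aligned = centroid_ot_align(X, Y)
```

The OT problem here is k x k instead of n x n, which is what makes the centroid preset cheap on very large datasets.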
For a faster approximate OT run on large datasets, enable the approximate OT solver while keeping your preset's data keys:
adata, metrics = scb.ot.integrate(
adata,
preset="atac",
obsm_key="X_lsi",
batch_key="batchname_all",
out_key="X_ot",
approximate_ot=True,
)
To process an snATAC-seq dataset:
# Remove promoter-proximal peaks
adata_top = scb.pp.remove_promoter_proximal_peaks(
adata_atac,
f"{dir}/inputs/gencode.vM25.chr_patch_hapl_scaff.annotation.gtf.gz"
)
# Peak selection
scb.pp.find_variable_features(adata_top, batch_key="batchname_all")
# TF-IDF + iterative LSI
scb.pp.add_iterative_lsi(adata_top, n_components=31, drop_first_component=True, add_key="X_lsi")
# Save back
adata.obsm["X_lsi"] = adata_top.obsm["X_lsi"]
adata.obsm["Unintegrated"] = adata_top.obsm["X_lsi"]
# Optimal transport
adata, metrics = scb.ot.integrate(
adata,
preset='atac',
obsm_key="X_lsi",
batch_key="batchname_all",
out_key="X_ot",
reference="largest",
)
print(metrics)
# Compute neighbors on the OT-integrated embedding
sc.pp.neighbors(adata, use_rep='X_ot', metric='cosine')
sc.tl.umap(adata)
sc.tl.leiden(adata, resolution=0.02, key_added='leiden_X_ot')
# Model training
scb.models.setup_anndata(adata, var_key='X_ot', batch_key='batchname_all', true_key=None)
model = scb.models.vae(adata, prior_pcr=5., verbose=True)
model.train()
SCBIOT_LATENT_KEY = "scBIOT"
adata.obsm[SCBIOT_LATENT_KEY] = model.get_latent_representation(n_components=30, svd_solver='arpack', random_state=42)
sc.pp.neighbors(adata, use_rep=SCBIOT_LATENT_KEY)
sc.tl.umap(adata)
sc.tl.leiden(adata, resolution=0.8, key_added=f'leiden_{SCBIOT_LATENT_KEY}')
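One round of the TF-IDF + LSI step used above can be sketched in plain NumPy; scBIOT's iterative variant additionally re-selects variable features between rounds. `tfidf_lsi` is a hypothetical helper illustrating the standard recipe, not the library function:

```python
import numpy as np

def tfidf_lsi(counts, n_components=31, drop_first=True):
    """TF-IDF normalization of a cell x peak matrix followed by truncated SVD (LSI)."""
    tf = counts / counts.sum(axis=1, keepdims=True).clip(min=1)   # term frequency
    idf = np.log1p(counts.shape[0] / (1 + (counts > 0).sum(axis=0)))
    X = np.log1p(tf * idf * 1e4)
    X = X - X.mean(axis=0)                                        # center before SVD
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    emb = U[:, :n_components] * S[:n_components]
    # the first LSI component often tracks sequencing depth, hence the drop
    return emb[:, 1:] if drop_first else emb

rng = np.random.default_rng(0)
counts = rng.poisson(0.2, size=(200, 1000))  # toy cell x peak count matrix
emb = tfidf_lsi(counts, n_components=31)     # 30 dims after dropping component 1
```

This mirrors the `n_components=31, drop_first_component=True` call above, which likewise yields a 30-dimensional `X_lsi`.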
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file scbiot-1.1.8.tar.gz.
File metadata
- Download URL: scbiot-1.1.8.tar.gz
- Upload date:
- Size: 73.9 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6ebb878b664f27efdf4d988fdcf0e3d69531914da670ea08ea6736650f4a6f8b |
| MD5 | 5f641fafa5e37f5d844ed10b3d3308a2 |
| BLAKE2b-256 | c74e195bfafb0dca7d2cc4303a5612d6fd54ba5b324064250db6c235615883c7 |
File details
Details for the file scbiot-1.1.8-py3-none-any.whl.
File metadata
- Download URL: scbiot-1.1.8-py3-none-any.whl
- Upload date:
- Size: 115.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ab64196bd126f6a78d514fd2ef133654b8da10076cc3b0f55af49b9d6c5f9549 |
| MD5 | e7d78391a55598e4d1b5700ced27a7eb |
| BLAKE2b-256 | 2dd526938b77b530e5e118b28ee40b051820af58d609d9ea422f64b7bb45c969 |