
Rust-backed ComBat batch-effect correction for dense biological matrices

combaters

License: MIT

combaters is a Rust/PyO3 rewrite of Bioconductor's sva::ComBat that performs ComBat batch-effect correction on dense matrices from Python. It keeps the familiar ComBat behavior while moving the numerical core into Rust for predictable packaging, memory use, and runtime performance.

The public matrix contract is row-major samples x features: the value for a given sample and feature sits at flat index values[sample * n_features + feature].
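To make the layout concrete, the sketch below flattens a small samples x features table by hand and checks the flat-index rule. It uses plain Python lists only; flat_index is a hypothetical helper written for this illustration and is not part of combaters.

```python
def flat_index(sample: int, feature: int, n_features: int) -> int:
    """Row-major offset of (sample, feature) in a samples x features matrix."""
    return sample * n_features + feature


# 2 samples x 3 features, written row by row.
matrix = [
    [1.0, 2.0, 3.0],  # sample 0
    [4.0, 5.0, 6.0],  # sample 1
]
n_samples, n_features = len(matrix), len(matrix[0])

# Row-major flattening: all features of sample 0, then all features of sample 1.
values = [x for row in matrix for x in row]

# Every (sample, feature) pair is found at sample * n_features + feature.
for s in range(n_samples):
    for f in range(n_features):
        assert values[flat_index(s, f, n_features)] == matrix[s][f]
```

Because samples are the slow axis, all features of one sample are adjacent in memory, while the values of one feature across samples are n_features elements apart.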

Python Compatibility

Release wheels target CPython 3.10 through 3.14. The extension is built with PyO3 abi3-py310; expand this range only after the pinned PyO3 version supports building against the newer Python minor version.

Python API

import numpy as np
from combaters import combat

values = np.asarray(..., dtype=np.float64).reshape((n_samples, n_features))
batch = np.asarray(...)
mod = np.asarray(..., dtype=np.float64).reshape((n_samples, n_covariates))

result = combat(values, batch, mod=mod, par_prior=True, mean_only=False, ref_batch=None)
adjusted = result["adjusted"]

All public matrix inputs use shape (n_samples, n_features): rows are samples and columns are features. The Python wrapper converts numeric arrays to C-contiguous float64 before entering the Rust core.
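The normalization described above can be reproduced with NumPy alone. The following is a sketch of that step under the stated contract, not the wrapper's actual code; to_core_layout is a hypothetical name introduced here.

```python
import numpy as np


def to_core_layout(values) -> np.ndarray:
    """Return a 2-D, C-contiguous float64 array, copying only when needed."""
    arr = np.ascontiguousarray(values, dtype=np.float64)
    if arr.ndim != 2:
        raise ValueError("expected a (n_samples, n_features) matrix")
    return arr


# A transposed int32 array is neither float64 nor C-contiguous.
raw = np.arange(6, dtype=np.int32).reshape(3, 2).T  # shape (2, 3): 2 samples, 3 features
dense = to_core_layout(raw)

n_samples, n_features = dense.shape
# After conversion the flat buffer follows the row-major contract.
flat = dense.ravel()
assert flat[1 * n_features + 2] == dense[1, 2]
```

Passing an array that is already C-contiguous float64 avoids the extra copy, which can matter for large matrices.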

Parameters

| Parameter | Applies to | Description |
| --- | --- | --- |
| values | combat, combat_frame | Samples x features data. Lists, tuples, NumPy arrays, pandas DataFrame, and SciPy sparse matrices are accepted by combat; sparse matrices are densified. combat_frame requires a pandas DataFrame. |
| adata | combat_anndata | AnnData-like object read from adata.X or adata.layers[layer]; it is not mutated. |
| batch | all | One-dimensional labels of length n_samples. Strings, negative integers, categories, and strided arrays are accepted. combat_anndata also accepts an obs column name. |
| mod | all | Optional sample covariates with n_samples rows. Numeric columns are used directly; non-numeric DataFrame-like columns are dummy-coded with the first observed level dropped. |
| formula | all | Optional patsy formula for constructing mod; requires optional patsy support and mod data. |
| par_prior | all | True uses parametric empirical Bayes; False uses non-parametric empirical Bayes. |
| mean_only | all | True adjusts batch location only. Singleton batches and degenerate feature cases can also force effective mean-only behavior. |
| ref_batch | all | Optional reference batch using the original batch label. Reference-batch rows are returned unchanged. |
| layer | combat_anndata | Optional AnnData layer name. None reads adata.X. |
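The dummy-coding rule for non-numeric mod columns (first observed level dropped) can be illustrated in plain Python. This is a minimal sketch of the rule as described, not the package's implementation; dummy_code is a hypothetical helper, and the order of the remaining columns (order of first appearance) is an assumption.

```python
def dummy_code(column):
    """Dummy-code one categorical column, dropping the first observed level.

    Returns (kept_levels, rows), where rows[i][j] is 1.0 if column[i] equals
    kept_levels[j] and 0.0 otherwise.
    """
    levels = list(dict.fromkeys(column))  # unique levels in order of first appearance
    kept = levels[1:]                     # the first observed level becomes the baseline
    rows = [[1.0 if value == level else 0.0 for level in kept] for value in column]
    return kept, rows


treatment = ["placebo", "drug_b", "drug_a", "placebo", "drug_a"]
kept, rows = dummy_code(treatment)
# "placebo" appears first, so it is the dropped baseline and its rows are all zeros.
```

Note that "first observed" depends on sample order, so reordering the samples can change which level serves as the baseline.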

Returns

All entry points return a dictionary with adjusted, n_samples, n_features, and report. combat returns a NumPy array for array-like input and preserves pandas labels when values is a DataFrame. combat_frame always returns adjusted as a DataFrame. combat_anndata returns a DataFrame when pandas and AnnData labels are available; otherwise it returns a NumPy array.

from combaters import combat_frame

result = combat_frame(values_df, batch_series)
adjusted_df = result["adjusted"]

from combaters import combat_anndata

result = combat_anndata(adata, "batch", layer=None)
adjusted = result["adjusted"]

A formula keyword can be used when patsy is installed:

result = combat(values, batch, mod=metadata, formula="~ age + C(treatment)")

Install combaters[ecosystem] to pull in the optional pandas and SciPy helpers.

Missing values in values are ignored during fitting and preserved in adjusted; infinite values are rejected. Features with zero variance inside any multi-sample batch are copied unchanged and reported in result["report"]["zero_variance_features"]. The sva::ComBat arguments prior.plots and BPPARAM are not exposed: plotting is not implemented, and parallel execution is automatic inside the Rust core.
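The missing-value and zero-variance rules above can be pre-checked on your own data before calling the package. The sketch below is an illustration in plain Python, not the Rust core's logic; zero_variance_features is a hypothetical function, and two details are assumptions: variance is tested by exact equality, and a batch with fewer than two observed (non-NaN) values for a feature is not flagged.

```python
import math
from collections import defaultdict


def zero_variance_features(values, batch):
    """Indices of features that are constant within any batch of >= 2 samples.

    `values` is a list of sample rows. NaN entries are skipped, mirroring the
    "ignored during fitting" rule. Singleton batches are not checked.
    """
    rows_by_batch = defaultdict(list)
    for row, label in zip(values, batch):
        rows_by_batch[label].append(row)

    n_features = len(values[0])
    flagged = []
    for f in range(n_features):
        for rows in rows_by_batch.values():
            if len(rows) < 2:
                continue  # singleton batch: no within-batch variance to test
            observed = [r[f] for r in rows if not math.isnan(r[f])]
            if len(observed) >= 2 and max(observed) == min(observed):
                flagged.append(f)
                break  # one constant batch is enough to flag the feature
    return flagged


nan = float("nan")
values = [
    [1.0, 5.0, nan],  # batch "a"
    [2.0, 5.0, 3.0],  # batch "a"
    [3.0, 7.0, 4.0],  # batch "b"
    [4.0, 8.0, 6.0],  # batch "b"
]
batch = ["a", "a", "b", "b"]
flagged = zero_variance_features(values, batch)
```

Here only feature 1 is flagged, because it is constant inside batch "a"; feature 2 has a single observed value in batch "a" and is left alone under the assumption stated above.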

Parallel Execution

Parallelism is decided automatically inside the Rust core; it is not configured through a Python argument or an R-style BPPARAM parameter. Small matrices stay on the serial path. The core switches to Rayon only when the matrix has at least 65,536 cells and at least 64 independent feature-by-batch jobs.

The parallel loops write fixed output indices for feature selection, projection, posterior fitting, adjustment, and feature reinsertion, so results are deterministic for the same inputs. For operational testing only, COMBATERS_PARALLEL=off forces the serial path and COMBATERS_PARALLEL=parallel forces the parallel path; unset or auto keeps the size-based policy.
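The policy described in this section can be summarized as a small decision function. This is a Python sketch of the documented behavior, not the Rust code: use_parallel is a hypothetical name, counting jobs as n_features * n_batches is an assumed reading of "feature-by-batch jobs", and treating unrecognized values of COMBATERS_PARALLEL like auto is also an assumption.

```python
import os

MIN_CELLS = 65_536  # cell threshold stated in this section
MIN_JOBS = 64       # job threshold stated in this section


def use_parallel(n_samples, n_features, n_batches, env=None):
    """Decide between the serial and parallel paths (illustrative policy only)."""
    env = os.environ if env is None else env
    mode = env.get("COMBATERS_PARALLEL", "auto")
    if mode == "off":
        return False  # forced serial path
    if mode == "parallel":
        return True   # forced parallel path
    # Unset or "auto": size-based policy. Other values fall through to the
    # same branch here, which is an assumption of this sketch.
    cells = n_samples * n_features
    jobs = n_features * n_batches  # assumed meaning of "feature-by-batch jobs"
    return cells >= MIN_CELLS and jobs >= MIN_JOBS
```

For example, under these assumptions a 256 x 256 matrix with two batches has exactly 65,536 cells and 512 jobs, so it would take the parallel path, while a 255 x 256 matrix would stay serial.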

Rust Layout

  • crates/combaters-core: pure Rust ComBat core
  • src/lib.rs: thin PyO3 binding layer
  • combaters/: Python package wrapper

Citation

If you use combaters, cite the original ComBat method and the Bioconductor sva package that provides the reference sva::ComBat implementation:

@article{johnson2007combat,
  title = {Adjusting batch effects in microarray expression data using empirical Bayes methods},
  author = {Johnson, W. Evan and Li, Cheng and Rabinovic, Ariel},
  journal = {Biostatistics},
  volume = {8},
  number = {1},
  pages = {118--127},
  year = {2007},
  doi = {10.1093/biostatistics/kxj037}
}

@article{leek2012sva,
  title = {The sva package for removing batch effects and other unwanted variation in high-throughput experiments},
  author = {Leek, Jeffrey T. and Johnson, W. Evan and Parker, Hilary S. and Jaffe, Andrew E. and Storey, John D.},
  journal = {Bioinformatics},
  volume = {28},
  number = {6},
  pages = {882--883},
  year = {2012},
  doi = {10.1093/bioinformatics/bts034}
}
