cnvturbo: GPU/Numba-accelerated scRNA-seq CNV inference with HMM i6 cell-level tumor calling and R inferCNV-compatible raw-count pipeline. Fully compatible with Scanpy/AnnData. 10-100x faster than alternatives.
cnvturbo
cnvturbo — A Python re-implementation of R inferCNV for single-cell RNA-seq copy-number variation analysis. Algorithmically faithful to R inferCNV's HMM i6 pipeline, ~100× faster, and fully integrated with the Scanpy / AnnData ecosystem.
Rewritten in pure Python with R-exact algorithm alignment (hspike emission calibration, gene-level Viterbi in copy-ratio space, R-equivalent denoise + subcluster Tumor calling), plus Numba/CUDA-accelerated kernels.
Why cnvturbo?
| Feature | R inferCNV | infercnvpy | cnvturbo |
|---|---|---|---|
| Cell-level Tumor/Normal HMM | ✓ | ✗ (cluster score only) | ✓ |
| HMM i6 + hspike emission | ✓ | ✗ | ✓ (analytic + MAD-robust) |
| Per-chromosome Viterbi (copy-ratio) | ✓ | ✗ | ✓ |
| Denoise (segment-length filter) | ✓ | ✗ | ✓ |
| Reference subcluster handling | ✓ | partial | ✓ |
| GPU / Numba acceleration | ✗ | ✗ | ✓ |
| Runtime (P12, 7,269 cells) | ~5 hr | ~9 min | ~86 s |
| Cell-level concordance with R | 1.000 (ref) | 0.81 | 1.000 |
Verified on 3 PDAC samples (15,135 cells total): cell-level Tumor/Normal classification is 100% identical to R inferCNV's HMM output, while running roughly 140–230× faster. See Benchmark below.
Installation
From PyPI (recommended)
pip install cnvturbo
With acceleration backends
# CPU acceleration (Numba)
pip install "cnvturbo[hmm-cpu]"
# GPU acceleration (PyTorch)
pip install "cnvturbo[hmm-gpu]"
# All accelerators + EM fitting
pip install "cnvturbo[hmm]"
Development install
git clone https://github.com/LogicByteCraft/cnvturbo.git
cd cnvturbo
pip install -e ".[dev,test]"
Requirements
- Python ≥ 3.10
- scanpy ≥ 1.10, anndata ≥ 0.7.3, numpy ≥ 1.20, pandas ≥ 1
- Optional: numba ≥ 0.57 (CPU), torch ≥ 2.0 (GPU), hmmlearn ≥ 0.3 (EM)
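To see which of the optional accelerators listed above are importable in the current environment, a quick standard-library check is enough. This is a minimal sketch and not part of cnvturbo: the helper name `available_backends` is hypothetical, and how `backend="auto"` resolves internally is not shown here.

```python
from importlib.util import find_spec

# Optional dependencies named in the requirements list and the role each plays.
OPTIONAL = {"numba": "CPU kernels", "torch": "GPU kernels", "hmmlearn": "EM fitting"}

def available_backends():
    """Return {module_name: True/False} without importing the heavy modules."""
    return {name: find_spec(name) is not None for name in OPTIONAL}

for name, present in available_backends().items():
    status = "found" if present else "missing"
    print(f"{name:<9} {OPTIONAL[name]:<12} {status}")
```

A missing module only means the matching extra (`hmm-cpu`, `hmm-gpu`, or `hmm`) has not been installed.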
Quick start
import scanpy as sc
import cnvturbo
from cnvturbo import tl as cnv_tl, pl as cnv_pl
adata = sc.read_h5ad("my_sample.h5ad")
adata.layers["counts"] = adata.X.copy()
cnv_tl.infercnv_r_compat(
adata,
raw_layer="counts",
reference_key="cell_type",
reference_cat=["NK", "Endothelial", "Fibroblast"],
window_size=101,
min_mean_expr_cutoff=0.1, # R inferCNV default for 10x; use 1.0 for Smart-seq2
apply_2x_transform=True,
n_jobs=16,
)
emit_means, emit_stds = cnv_tl.compute_hspike_emission_params(
adata,
raw_layer="counts",
reference_key="cell_type",
reference_cat=["NK", "Endothelial", "Fibroblast"],
min_mean_expr_cutoff=0.1, # must match the value passed to infercnv_r_compat
output_space="copy_ratio",
)
cnv_tl.hmm_call_subclusters(
adata,
use_rep="cnv",
reference_key="cell_type",
reference_cat=["NK", "Endothelial", "Fibroblast"],
precomputed_emit_means=emit_means,
precomputed_emit_stds=emit_stds,
leiden_resolution="auto",
cluster_by_groups=True,
min_segment_length=5,
min_segments_for_tumor=1,
key_added="cnv_call",
n_jobs=16,
)
print(adata.obs["cnv_call"].value_counts())
After this, adata.obs["cnv_call"] contains "Tumor" / "Normal" per cell, and adata.obs["cnv_call_score"] carries a continuous CNV burden score (mean(|X_cnv − 1.0|) in copy-ratio space).
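The burden score described above is a per-cell mean absolute deviation from the neutral copy ratio of 1.0. A minimal NumPy sketch on a toy matrix, assuming rows are cells and columns are genes (the function name `cnv_burden` is hypothetical, and this is not cnvturbo's internal code):

```python
import numpy as np

def cnv_burden(x_cnv, neutral=1.0):
    """Mean absolute deviation from the neutral copy ratio, one value per cell."""
    return np.abs(np.asarray(x_cnv, dtype=float) - neutral).mean(axis=1)

toy = np.array([
    [1.0, 1.0, 1.0, 1.0],   # flat profile
    [1.5, 1.5, 1.0, 1.0],   # gain over half of the genes
    [0.5, 1.0, 1.0, 1.5],   # one loss and one gain
])
print(cnv_burden(toy))
```

The flat profile scores 0, and the other two cells both score 0.25: gains and losses contribute equally because the deviation is taken in absolute value.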
Detailed usage
1. Prepare AnnData
cnvturbo requires:
- Raw integer counts in adata.X or adata.layers["counts"].
- Gene coordinates in adata.var: columns chromosome, start, end.
- A reference annotation in adata.obs: a column identifying normal cells (e.g., NK / Endothelial / Fibroblast).
Add gene coordinates from a GTF:
from cnvturbo.io import genomic_position_from_gtf
genomic_position_from_gtf(
gtf_file="Homo_sapiens.GRCh38.110.gtf.gz",
adata=adata,
)
2. R-compatible preprocessing (infercnv_r_compat)
Reproduces R inferCNV's pipeline exactly:
- Low-expression gene filter: drop genes with mean(raw_count) < min_mean_expr_cutoff (R require_above_min_mean_expr_cutoff; 10x default 0.1, Smart-seq2 1.0)
- Library-size normalization → median depth
- log2(x + 1)
- First reference subtraction (gene-space, "bounds" mode)
- Clip to ±3 (default)
- Per-chromosome same-length pyramid smoothing (window=101)
- Per-cell median centering
- Second reference subtraction (gene-space)
- 2^x → copy-ratio (neutral ≈ 1.0)
cnv_tl.infercnv_r_compat(
adata,
raw_layer="counts",
reference_key="cell_type",
reference_cat=["NK", "Endothelial"],
max_ref_threshold=3.0,
window_size=101,
exclude_chromosomes=("chrX", "chrY"),
min_mean_expr_cutoff=0.1, # R inferCNV default for 10x; set 1.0 for Smart-seq2; 0 to disable
apply_2x_transform=True,
n_jobs=16,
key_added="cnv",
)
Output:
- adata.obsm["X_cnv"]: (n_cells × n_genes_filtered) copy-ratio matrix
- adata.uns["cnv"]["chr_pos"]: gene-level chromosome offsets
- adata.uns["cnv"]["kept_var_names"]: original var_names that survived min_mean_expr_cutoff + chrX/chrY exclusion (matches obsm["X_cnv"] columns)
- adata.uns["cnv"]["min_mean_expr_cutoff"]: actual cutoff applied (provenance)
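To make the order of the preprocessing steps concrete, here is a simplified NumPy sketch of the same sequence on a toy count matrix. It is an illustration under stated assumptions, not cnvturbo's implementation: all genes are treated as one chromosome, the helper names (`pyramid_smooth`, `subtract_reference_bounds`, `toy_pipeline`) are hypothetical, and the edge handling of the smoother is one plausible choice.

```python
import numpy as np

def pyramid_smooth(row, window=5):
    """Triangular-weighted moving average; truncated edge windows are renormalized."""
    half = window // 2
    kernel = np.concatenate([np.arange(1, half + 2), np.arange(half, 0, -1)]).astype(float)
    num = np.convolve(row, kernel, mode="same")
    den = np.convolve(np.ones_like(row), kernel, mode="same")
    return num / den

def subtract_reference_bounds(x, ref_groups):
    """'Bounds' mode: values inside [min, max] of the per-group reference means become 0."""
    means = np.stack([x[idx].mean(axis=0) for idx in ref_groups])
    lo, hi = means.min(axis=0), means.max(axis=0)
    return np.where(x > hi, x - hi, np.where(x < lo, x - lo, 0.0))

def toy_pipeline(counts, ref_groups, cutoff=0.1, clip=3.0, window=5):
    keep = counts.mean(axis=0) >= cutoff                    # low-expression gene filter
    x = counts[:, keep].astype(float)
    depth = x.sum(axis=1, keepdims=True)
    x = x / depth * np.median(depth)                        # library size -> median depth
    x = np.log2(x + 1.0)                                    # log2(x + 1)
    x = subtract_reference_bounds(x, ref_groups)            # first reference subtraction
    x = np.clip(x, -clip, clip)                             # clip to +/- 3
    x = np.apply_along_axis(pyramid_smooth, 1, x, window)   # smoothing along genes
    x = x - np.median(x, axis=1, keepdims=True)             # per-cell median centering
    x = subtract_reference_bounds(x, ref_groups)            # second reference subtraction
    return 2.0 ** x, keep                                   # back to copy-ratio space

counts = np.full((20, 60), 10.0)
counts[10:, 20:41] = 20.0                  # cells 10-19 carry a two-fold gain over genes 20-40
ref_groups = [np.arange(0, 5), np.arange(5, 10)]
ratio, keep = toy_pipeline(counts, ref_groups)
print(ratio[0, :3].round(3), ratio[15, 28:31].round(3))
```

In this noise-free toy the reference cells come out at a copy ratio of 1.0 everywhere, while the gained block in cells 10-19 sits clearly above 1.0. The median centering step is what pulls the unaffected genes of those cells back to neutral after library-size normalization has pushed them down.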
3. hspike emission calibration (compute_hspike_emission_params)
Mirrors R's hidden_spike simulation: builds a synthetic genome (50% CNV / 50% neutral chromosomes), samples the simulation base from real reference cells, runs the full pipeline, and extracts emission parameters per CNV state.
emit_means, emit_stds = cnv_tl.compute_hspike_emission_params(
adata,
raw_layer="counts",
reference_key="cell_type",
reference_cat=["NK", "Endothelial"],
min_mean_expr_cutoff=0.1, # must match the value passed to infercnv_r_compat
n_sim_cells=100,
n_genes_per_chr=400,
output_space="copy_ratio",
)
4. HMM cell-level Tumor calling (hmm_call_subclusters)
R-equivalent decoder: per-group Leiden subclustering (cluster_by_groups=True, auto resolution), per-chromosome Viterbi with R's pnorm-based emission, segment-length denoising, and the rule "subcluster contains ≥1 CNV segment ⇒ Tumor".
cnv_tl.hmm_call_subclusters(
adata,
use_rep="cnv",
reference_key="cell_type",
reference_cat=["NK", "Endothelial"],
precomputed_emit_means=emit_means,
precomputed_emit_stds=emit_stds,
leiden_resolution="auto",
cluster_by_groups=True,
n_neighbors=20,
n_pcs=10,
min_segment_length=5,
min_segments_for_tumor=1,
use_r_viterbi=True,
key_added="cnv_call",
backend="auto",
n_jobs=16,
)
Output (added to adata.obs):
- cnv_call: "Tumor" / "Normal" per cell
- cnv_call_score: continuous CNV burden (mean(|X_cnv − 1.0|))
- cnv_call_subcluster: Leiden subcluster id used for HMM
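The decode, denoise, and call sequence can be illustrated on a single copy-ratio profile. The sketch below is deliberately simplified and is not the library's decoder: it uses three states instead of the six of HMM i6, a plain Gaussian log-density instead of R's pnorm-based emission, and a hypothetical `stay` probability. All function names are illustrative.

```python
import numpy as np

def viterbi_gaussian(obs, means, stds, stay=0.999):
    """Most likely state path for a 1-D profile under Gaussian emissions (log-space)."""
    means, stds = np.asarray(means, float), np.asarray(stds, float)
    n = len(means)
    log_trans = np.full((n, n), np.log((1.0 - stay) / (n - 1)))
    np.fill_diagonal(log_trans, np.log(stay))
    log_emit = -0.5 * ((obs[:, None] - means) / stds) ** 2 - np.log(stds * np.sqrt(2 * np.pi))
    score = log_emit[0] - np.log(n)                 # uniform start distribution
    back = np.zeros((len(obs), n), dtype=int)
    for t in range(1, len(obs)):
        cand = score[:, None] + log_trans           # cand[i, j]: arrive in j from i
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[t]
    path = np.empty(len(obs), dtype=int)
    path[-1] = score.argmax()
    for t in range(len(obs) - 1, 0, -1):            # backtrace
        path[t - 1] = back[t, path[t]]
    return path

def denoise(path, neutral, min_len):
    """Reset non-neutral runs shorter than min_len to the neutral state."""
    out, start = path.copy(), 0
    for t in range(1, len(path) + 1):
        if t == len(path) or path[t] != path[start]:
            if path[start] != neutral and t - start < min_len:
                out[start:t] = neutral
            start = t
    return out

def count_segments(path, neutral):
    """Number of contiguous non-neutral runs."""
    starts = np.concatenate([[0], np.flatnonzero(np.diff(path)) + 1])
    return int(np.sum(path[starts] != neutral))

obs = np.concatenate([np.full(20, 1.0), np.full(10, 1.5), np.full(10, 1.0),
                      np.full(3, 0.5), np.full(10, 1.0)])
raw = viterbi_gaussian(obs, means=[0.5, 1.0, 1.5], stds=[0.1, 0.1, 0.1])
clean = denoise(raw, neutral=1, min_len=5)
n_seg = count_segments(clean, neutral=1)
label = "Tumor" if n_seg >= 1 else "Normal"
print(count_segments(raw, neutral=1), n_seg, label)
```

The raw path contains two segments (a 10-gene gain and a 3-gene loss). With `min_len=5` the short loss is discarded, one segment remains, and the profile is labelled Tumor, mirroring the roles of `min_segment_length` and `min_segments_for_tumor` in the call above.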
5. Visualization
cnv_tl.pca(adata, use_rep="cnv")
cnv_tl.umap(adata)
cnv_pl.chromosome_heatmap(adata, groupby="cnv_call")
import scanpy as sc
sc.pl.embedding(adata, basis="cnv_umap", color=["cnv_call", "cnv_call_score"])
Benchmark
Three pancreatic adenocarcinoma samples (P07 = 3,659 cells, P12 = 7,269 cells, P30 = 4,207 cells); reference group = NK + Endothelial + Fibroblast (~50% of all cells).
| Sample | R inferCNV (runtime) | cnvturbo (runtime) | Speed-up | cnvturbo cell-level Accuracy vs R |
|---|---|---|---|---|
| P07CRX_T (3,659) | 2.5 h | 64 s | 140× | 1.000 |
| P12HWZ_T (7,269) | 5.0 h | 86 s | 210× | 1.000 |
| P30WJJ_T (4,207) | 3.5 h | 54 s | 230× | 1.000 |
cnvturbo's per-cell Tumor / Normal classification is identical to R inferCNV's HMM output across all 15,135 cells.
The "ground truth" was reconstructed directly from R's pred_cnv_regions.dat + cell_groupings to bypass a known fuzzy-match bug in some user post-processing scripts.
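The speed-up column in the table above is simply the ratio of the two runtimes once both are expressed in seconds. A short check, using only the figures from the table (the helper name `speedup` is illustrative):

```python
# (sample, R inferCNV runtime in hours, cnvturbo runtime in seconds), copied from the table
runs = [("P07CRX_T", 2.5, 64), ("P12HWZ_T", 5.0, 86), ("P30WJJ_T", 3.5, 54)]

def speedup(r_hours, py_seconds):
    """Ratio of wall-clock times after converting hours to seconds."""
    return r_hours * 3600 / py_seconds

for name, r_hours, py_seconds in runs:
    print(f"{name}: {speedup(r_hours, py_seconds):.0f}x")
```

The unrounded ratios are about 141, 209, and 233; the table reports them to two significant figures.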
API overview
cnvturbo
├── tl # tools
│ ├── infercnv # original sliding-window scoring
│ ├── infercnv_r_compat # R-exact 8-step pipeline (recommended)
│ ├── compute_hspike_emission_params # hspike-based HMM emission calibration
│ ├── hmm_call_subclusters # subcluster-level R-equivalent HMM caller
│ ├── hmm_call_cells # cell-level HMM caller (no subclustering)
│ ├── cnv_score, cnv_score_cell # CNV burden scores
│ ├── ithcna, ithgex # intra-tumor heterogeneity
│ ├── pca, umap, tsne, leiden # CNV-space embeddings (Scanpy wrappers)
│ └── copykat # CopyKAT integration (optional, requires R)
├── pp # preprocessing utilities
├── pl # plotting
├── io # GTF / genomic-position helpers
└── datasets # bundled tutorial data
Design highlights
- R-exact pipeline: infercnv_r_compat reproduces the full 8 R inferCNV steps in gene-space copy-ratio (vs. window-space log2 used by older Python ports).
- HMM i6 cell-level calling: hmm_call_subclusters reproduces R's HMM Viterbi decoder, denoising, and per-subcluster Tumor classification, which are typically absent from existing Python implementations.
- Performance kernels: Numba parallel CPU / PyTorch GPU back-ends for sliding-window convolution and batched Viterbi (backend="auto" | "cpu" | "cuda").
- Robust to reference contamination: emission std uses MAD (median absolute deviation) × 1.4826 instead of plain std, so reference cells contaminated by tumor cells don't inflate state widths.
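The effect of the MAD-based width can be seen on synthetic numbers. This is a minimal sketch with made-up data (a neutral population around 1.0 plus a small tumor-like fraction at 1.5); the function name `mad_std` is hypothetical and the snippet does not reproduce cnvturbo's emission code.

```python
import numpy as np

def mad_std(values):
    """Robust standard deviation: 1.4826 * median(|x - median(x)|)."""
    values = np.asarray(values, dtype=float)
    return 1.4826 * np.median(np.abs(values - np.median(values)))

rng = np.random.default_rng(0)
clean = rng.normal(loc=1.0, scale=0.05, size=1000)           # neutral reference values
contaminated = np.concatenate([clean, np.full(100, 1.5)])    # ~9% tumor-like values mixed in

print("plain std:", clean.std(), "->", contaminated.std())
print("MAD std:  ", mad_std(clean), "->", mad_std(contaminated))
```

The factor 1.4826 makes the MAD a consistent estimator of the standard deviation for normally distributed data. The plain standard deviation grows several-fold once the contaminating values are added, while the MAD-based estimate moves only slightly.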
A high-level infercnv / cnv_score / chromosome_heatmap API similar to the de facto Python convention is also exposed for ease of migration.
Citation
If you use cnvturbo in your research, please cite this implementation:
@software{cnvturbo,
title = {cnvturbo: GPU/Numba-accelerated scRNA-seq CNV inference with R inferCNV-compatible HMM i6},
url = {https://github.com/LogicByteCraft/cnvturbo},
year = {2026}
}
cnvturbo's algorithm is a faithful port of R inferCNV; please cite the upstream methodology as well when relevant.
License
BSD 3-Clause License — see LICENSE.
Acknowledgements
cnvturbo is inspired by and stays algorithmically aligned with:
- inferCNV: reference R implementation of the HMM i6 pipeline.
- Scanpy / AnnData: single-cell analysis ecosystem.
Contributing
Issues and pull requests are welcome at https://github.com/LogicByteCraft/cnvturbo. Before contributing:
pip install -e ".[dev,test]"
pre-commit install
pytest