Copy number alteration inference from single-cell RNA-seq data
inferCNAsc
Copy number alteration (CNA) inference from single-cell RNA-seq data. A Rust core with optional Python bindings via PyO3.
The pipeline is a chromosome-aware sliding-window smoother, per-gene z-score
thresholding, and a parallel run-length merge that assembles per-cell CNA
regions. The Rust core is parallelized over genes (smoothing, z-scoring) and
over cells (region assembly) with rayon; the Python layer handles AnnData
adaptation, Ensembl annotation lookup, evaluation, and plotting.
Installation
pip install infercnasc
Wheels are built for Linux x86_64/aarch64, macOS universal2, and Windows x86_64 against Python's abi3-py310 stable ABI, so a single wheel serves Python 3.10+.
For the Rust API:
[dependencies]
infercnasc = "0.2"
No feature flags are needed for the native Rust API.
Python
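from infercnasc import CNAInferrer
import infercnasc.plot as icplot
# Either construct the inferrer from an AnnData object ...
inferrer = CNAInferrer.from_anndata(adata)
# ... or fit on an expression matrix plus a gene annotation table.
inferrer = CNAInferrer(window_size=50).fit(expression_matrix, gene_df)
# Retrieve the inferred CNA regions and plot the CNA matrix.
cnas = inferrer.cna_df()
icplot.cna_matrix(inferrer)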
gene_df is a DataFrame with columns gene, chrom, start, end.
infercnasc.io.annotate_genes(gene_ids) fetches these from Ensembl with a
local requests-cache backing store.
Sparse AnnData.X is supported natively. Coordinate-annotation filtering
runs on the sparse matrix first, so the eventual dense materialization is
limited to genes that survive annotation; this avoids the standard scRNA
memory blow-up of an unconditional .toarray().
Rust
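use infercnasc::{smooth_expression, find_cnas, assign_cnas_to_cells, InferError};
// Chromosome-aware sliding-window smoothing; the only fallible step.
let smoothed = smooth_expression(&expression, &chroms, window_size)?;
// Per-gene z-score thresholding flags gains and losses.
let (gains, losses) = find_cnas(&smoothed, z_score_threshold);
// Merge consecutive flagged genes into per-cell CNA regions.
let cnas = assign_cnas_to_cells(
    &gains, &losses, &chroms, &starts, &ends, &gene_names, min_region_size,
);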
smooth_expression returns Result<Array2<f64>, InferError>. find_cnas and
assign_cnas_to_cells are infallible.
Pipeline
- Gene annotation. Gene identifiers are resolved to genomic coordinates via the Ensembl REST API, with responses cached locally under the platform cache directory.
- Smoothing. A sliding-window mean is applied along each chromosome. The window resets at chromosome boundaries. Columns are processed in parallel.
- CNA calling. Per-gene z-scores are computed across cells. Entries above +z_threshold are flagged as gains, entries below -z_threshold as losses. Zero-variance genes are skipped.
- Region assembly. Consecutive flagged genes on the same chromosome are merged into CnaRecord regions by a parallel per-cell run-length scan. Runs shorter than min_region_size are dropped.
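The calling and region-assembly steps can be illustrated with a minimal pure-Python sketch. It is not the crate's implementation: the function names call_cnas and merge_runs are hypothetical, the population standard deviation and the (chrom, first_index, last_index) output tuple are assumptions, and the loops run sequentially rather than in parallel.

```python
from statistics import mean, pstdev

def call_cnas(smoothed, z_threshold):
    """Flag gains/losses from per-gene z-scores computed across cells.

    smoothed: list of rows (one per cell), each a list of per-gene values.
    Returns (gains, losses) as boolean matrices of the same shape.
    """
    n_cells, n_genes = len(smoothed), len(smoothed[0])
    gains = [[False] * n_genes for _ in range(n_cells)]
    losses = [[False] * n_genes for _ in range(n_cells)]
    for g in range(n_genes):
        column = [row[g] for row in smoothed]
        mu, sigma = mean(column), pstdev(column)  # assumption: population std
        if sigma == 0:  # zero-variance gene: skipped, nothing flagged
            continue
        for c, value in enumerate(column):
            z = (value - mu) / sigma
            gains[c][g] = z > z_threshold
            losses[c][g] = z < -z_threshold
    return gains, losses

def merge_runs(flags, chroms, min_region_size):
    """Run-length scan over one cell's flags.

    Returns (chrom, first_gene_index, last_gene_index) for every run of
    consecutive flagged genes on one chromosome with at least
    min_region_size genes.
    """
    regions = []
    start = None
    for i in range(len(flags) + 1):
        in_run = i < len(flags) and flags[i]
        # A run ends at an unflagged gene, the end of the row, or a new chromosome.
        if start is not None and (not in_run or chroms[i] != chroms[start]):
            if i - start >= min_region_size:
                regions.append((chroms[start], start, i - 1))
            start = None
        if in_run and start is None:
            start = i
    return regions

# Four cells x four genes; the last cell is elevated on the first three genes.
smoothed = [[1.0, 1.0, 1.0, 5.0]] * 3 + [[4.0, 4.0, 4.0, 5.0]]
chroms = ["1", "1", "2", "2"]
gains, losses = call_cnas(smoothed, z_threshold=1.5)
regions = merge_runs(gains[3], chroms, min_region_size=2)
```

In this toy input the run of three flagged genes in the last cell is split at the chromosome boundary, and the single-gene remainder on chromosome 2 falls below min_region_size and is dropped.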
Benchmarks
End-to-end pipeline (smoothing + calling + region assembly) on real public tumor scRNA-seq data: the Tirosh 2016 oligodendroglioma dataset shipped with the inferCNV R package (184 cells x ~10,000 annotated genes). A planted-chr1-loss synthetic matrix is also run to show scaling at larger sizes. Single local run on a Ryzen laptop; your numbers will differ.
| implementation | Tirosh (184 x 10,338) | synth (2000 x 10,000) |
|---|---|---|
| _core direct FFI call | 0.032 s | 0.62 s |
| CNAInferrer.fit (wrapper) | 0.111 s | 0.84 s |
| infercnvpy.tl.infercnv | 1.134 s | 1.95 s |
| pure-numpy reference | 0.547 s | (skip) |
_core direct is the straight FFI call; CNAInferrer.fit adds the
coordinate filter, DataFrame sort, and DataFrame assembly around it.
infercnvpy.tl.infercnv uses a different algorithmic core (log
fold-change on sparse windows) and is the nearest published
Python-ecosystem comparator. The pure-numpy reference is a faithful
per-chromosome cumulative-sum smoothing + z-score reimplementation used
as an apples-to-apples control for the algorithm itself.
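As a rough illustration of per-chromosome cumulative-sum smoothing, the sketch below computes a sliding-window mean for one cell's gene vector using prefix sums. It is not the benchmark's reference code: the function name smooth_by_chromosome is hypothetical, and the centered window and the truncation of windows at chromosome edges are assumptions.

```python
from itertools import accumulate

def smooth_by_chromosome(values, chroms, window_size):
    """Sliding-window mean along one cell's genes, reset at chromosome boundaries.

    values: per-gene expression for one cell, genes ordered by position.
    chroms: chromosome label per gene, in the same order.
    Assumption: the window is centered and truncated at chromosome edges.
    """
    out = [0.0] * len(values)
    half = window_size // 2
    block_start = 0
    for i in range(1, len(values) + 1):
        # Close the current chromosome block at a label change or at the end.
        if i == len(values) or chroms[i] != chroms[block_start]:
            block = values[block_start:i]
            prefix = [0.0] + list(accumulate(block))  # prefix[k] == sum(block[:k])
            for j in range(len(block)):
                lo = max(0, j - half)
                hi = min(len(block), j + half + 1)
                # Window sum in O(1) from two prefix sums.
                out[block_start + j] = (prefix[hi] - prefix[lo]) / (hi - lo)
            block_start = i
    return out

values = [1.0, 2.0, 3.0, 10.0, 20.0]
chroms = ["1", "1", "1", "2", "2"]
smoothed = smooth_by_chromosome(values, chroms, window_size=3)
```

Because the prefix sums are rebuilt for each chromosome block, values from chromosome 2 never leak into the windows of chromosome 1, which is the boundary reset described in the pipeline.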
Reproduce:
python benchmarks/compare.py # real Tirosh data
python benchmarks/compare.py --synth --cells 2000 --genes 10000
cargo bench --no-default-features # native Rust criterion
Evaluation
metrics = inferrer.evaluate(simulated_df)
# {"true_positives": ..., "precision": ..., "recall": ..., "f1": ...}
Matching is any-overlap on genomic coordinates within the same chromosome and label. The implementation is O(n_inferred + n_truth) via a chromosome-and-label-indexed bucket sweep.
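The any-overlap matching can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the package's implementation: evaluate_overlap and its tuple input are hypothetical (the real method takes a DataFrame), coordinates are treated as inclusive, precision and recall are counted as the fraction of inferred and truth regions with at least one overlapping partner, and the sketch sorts each bucket and uses binary search, so it runs in O(n log n) rather than the linear sweep described above.

```python
from bisect import bisect_right
from collections import defaultdict
from itertools import accumulate

def _bucket(regions):
    """Group (chrom, start, end, label) tuples by (chrom, label), sorted by start."""
    buckets = defaultdict(list)
    for chrom, start, end, label in regions:
        buckets[(chrom, label)].append((start, end))
    return {key: sorted(spans) for key, spans in buckets.items()}

def _count_hits(queries, targets):
    """Count query regions overlapping at least one target in the same bucket."""
    hits = 0
    for key, spans in queries.items():
        others = targets.get(key, [])
        starts = [s for s, _ in others]
        # Running maximum of target ends, in start order.
        max_end = list(accumulate((e for _, e in others), max))
        for start, end in spans:
            k = bisect_right(starts, end)  # targets starting at or before our end
            if k and max_end[k - 1] >= start:
                hits += 1
    return hits

def evaluate_overlap(inferred, truth):
    """Any-overlap precision/recall/F1 within matching chromosome and label."""
    inf_b, tru_b = _bucket(inferred), _bucket(truth)
    tp = _count_hits(inf_b, tru_b)        # inferred regions that hit a truth region
    recalled = _count_hits(tru_b, inf_b)  # truth regions that were hit
    precision = tp / len(inferred) if inferred else 0.0
    recall = recalled / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"true_positives": tp, "precision": precision, "recall": recall, "f1": f1}

inferred = [("1", 100, 200, "gain"), ("1", 500, 600, "gain"), ("2", 100, 200, "loss")]
truth = [("1", 150, 300, "gain"), ("2", 100, 200, "gain")]
metrics = evaluate_overlap(inferred, truth)
```

In the example, the chromosome 2 regions share coordinates but differ in label, so they land in different buckets and do not match.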
Acknowledgements
A pre-release Python prototype predating this crate was developed with Raeann Kalinowski and Amy Liu as a 2025 course project at Johns Hopkins; this repository is a full independent rewrite and is not affiliated with that coursework.
License
MIT