Copy number alteration inference from single-cell RNA-seq data

inferCNAsc

Copy number alteration (CNA) inference from single-cell RNA-seq data. A Rust core with optional Python bindings via PyO3.

The pipeline has three stages: a chromosome-aware sliding-window smoother, per-gene z-score thresholding, and a parallel run-length merge that assembles per-cell CNA regions. The Rust core is parallelized with rayon over genes (smoothing, z-scoring) and over cells (region assembly); the Python layer handles AnnData adaptation, Ensembl annotation lookup, evaluation, and plotting.

Installation

pip install infercnasc

Wheels are built for Linux x86_64/aarch64, macOS universal2, and Windows x86_64 against Python's abi3-py310 stable ABI, so a single wheel serves Python 3.10+.

For the Rust API:

[dependencies]
infercnasc = "0.2"

No feature flags are needed for the native Rust API.

Python

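from infercnasc import CNAInferrer
import infercnasc.plot as icplot

# Option 1: start from an AnnData object.
inferrer = CNAInferrer.from_anndata(adata)

# Option 2: fit on an expression matrix plus a gene-annotation DataFrame.
inferrer = CNAInferrer(window_size=50).fit(expression_matrix, gene_df)

# Retrieve the inferred CNAs and plot them.
cnas = inferrer.cna_df()
icplot.cna_matrix(inferrer)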

gene_df is a DataFrame with columns gene, chrom, start, end. infercnasc.io.annotate_genes(gene_ids) fetches these from Ensembl with a local requests-cache backing store.

Sparse AnnData.X is supported natively. Coordinate-annotation filtering runs on the sparse matrix first, so the eventual dense materialization is limited to genes that survive annotation; this avoids the standard scRNA memory blow-up of an unconditional .toarray().
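
The filter-before-densify idea can be illustrated with a minimal pure-Python sketch. This is not the package's code: the sparse matrix is modeled as a list of (row, column, value) triplets, and `filter_then_densify`, the gene names, and the `annotated` set are all hypothetical names chosen for the illustration.

```python
# Hypothetical sketch: a sparse matrix kept as COO-style (row, col, value) triplets.
def filter_then_densify(triplets, n_cells, gene_ids, annotated):
    """Drop unannotated gene columns while still sparse, then densify the rest."""
    # Map each surviving gene's old column index to its new, compacted index.
    keep = {}
    for old_col, gene in enumerate(gene_ids):
        if gene in annotated:
            keep[old_col] = len(keep)

    # The dense matrix is allocated only for the surviving columns.
    dense = [[0.0] * len(keep) for _ in range(n_cells)]
    for row, col, value in triplets:
        new_col = keep.get(col)
        if new_col is not None:
            dense[row][new_col] = value

    kept_genes = [g for g in gene_ids if g in annotated]
    return dense, kept_genes


# Two cells, three genes; GENE_B has no coordinate annotation (made-up data).
triplets = [(0, 0, 2.0), (0, 2, 1.0), (1, 1, 5.0), (1, 2, 3.0)]
gene_ids = ["GENE_A", "GENE_B", "GENE_C"]
annotated = {"GENE_A", "GENE_C"}

dense, kept_genes = filter_then_densify(triplets, 2, gene_ids, annotated)
```

The dense allocation scales with the number of annotated genes rather than with the full gene axis, which is the point of filtering first.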

Rust

use infercnasc::{smooth_expression, find_cnas, assign_cnas_to_cells, InferError};

let smoothed = smooth_expression(&expression, &chroms, window_size)?;
let (gains, losses) = find_cnas(&smoothed, z_score_threshold);
let cnas = assign_cnas_to_cells(
    &gains, &losses, &chroms, &starts, &ends, &gene_names, min_region_size,
);

smooth_expression returns Result<Array2<f64>, InferError>. find_cnas and assign_cnas_to_cells are infallible.

Pipeline

  1. Gene annotation. Gene identifiers are resolved to genomic coordinates via the Ensembl REST API, with responses cached locally under the platform cache directory.
  2. Smoothing. A sliding-window mean is applied along each chromosome. The window resets at chromosome boundaries. Columns are processed in parallel.
  3. CNA calling. Per-gene z-scores are computed across cells. Entries above +z_threshold are flagged as gains, entries below -z_threshold as losses. Zero-variance genes are skipped.
  4. Region assembly. Consecutive flagged genes on the same chromosome are merged into CnaRecord regions by a parallel per-cell run-length scan. Runs shorter than min_region_size are dropped.
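
Steps 3 and 4 can be sketched in plain Python as follows. This is a serial illustration under stated assumptions, not the crate's implementation: `call_cnas` and `merge_runs` are hypothetical names, the population standard deviation is an assumed estimator, genes are assumed to be pre-sorted by chromosome and start, and a region's end is taken from the last gene in the run.

```python
from statistics import mean, pstdev


def call_cnas(smoothed, z_threshold):
    """smoothed is cells x genes (list of rows). Returns boolean gain/loss matrices."""
    n_cells, n_genes = len(smoothed), len(smoothed[0])
    gains = [[False] * n_genes for _ in range(n_cells)]
    losses = [[False] * n_genes for _ in range(n_cells)]
    for g in range(n_genes):
        column = [row[g] for row in smoothed]
        mu = mean(column)
        sd = pstdev(column)  # assumption: population SD
        if sd == 0.0:
            continue  # zero-variance gene: skipped, nothing flagged
        for c in range(n_cells):
            z = (column[c] - mu) / sd
            gains[c][g] = z > z_threshold
            losses[c][g] = z < -z_threshold
    return gains, losses


def merge_runs(flags, chroms, starts, ends, min_region_size):
    """Run-length scan over one cell's flags; runs never cross a chromosome boundary."""
    regions = []
    i, n = 0, len(flags)
    while i < n:
        if not flags[i]:
            i += 1
            continue
        j = i
        # Extend the run while the next gene is flagged and on the same chromosome.
        while j + 1 < n and flags[j + 1] and chroms[j + 1] == chroms[i]:
            j += 1
        n_genes = j - i + 1
        if n_genes >= min_region_size:
            regions.append((chroms[i], starts[i], ends[j], n_genes))
        i = j + 1
    return regions


# Made-up data: 4 cells x 2 genes; the second gene has zero variance.
smoothed = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0], [5.0, 2.0]]
gains, losses = call_cnas(smoothed, z_threshold=1.5)

# Made-up flags for one cell across five genes on two chromosomes.
chroms = ["1", "1", "1", "2", "2"]
starts = [100, 200, 300, 100, 200]
ends = [150, 250, 350, 150, 250]
regions = merge_runs([False, True, True, True, False], chroms, starts, ends, 2)
```

Because each cell's scan reads only that cell's flags, the per-cell calls are independent, which is what makes region assembly straightforward to parallelize over cells.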

Benchmarks

Timings cover the end-to-end pipeline (smoothing + calling + region assembly) on real public tumor scRNA-seq data: the Tirosh 2016 oligodendroglioma dataset shipped with the inferCNV R package (184 cells x ~10,000 annotated genes). A synthetic matrix with a planted chr1 loss is also run to show scaling at larger sizes. All figures come from a single local run on a Ryzen laptop; your numbers will differ.

| implementation | Tirosh (184 x 10,338) | synth (2000 x 10,000) |
|---|---|---|
| _core direct FFI call | 0.032 s | 0.62 s |
| CNAInferrer.fit (wrapper) | 0.111 s | 0.84 s |
| infercnvpy.tl.infercnv | 1.134 s | 1.95 s |
| pure-numpy reference | 0.547 s | (skip) |

_core direct is the straight FFI call; CNAInferrer.fit adds the coordinate filter, DataFrame sort, and DataFrame assembly around it. infercnvpy.tl.infercnv uses a different algorithmic core (log fold-change on sparse windows) and is the nearest published Python-ecosystem comparator. The pure-numpy reference is a faithful per-chromosome cumulative-sum smoothing + z-score reimplementation used as an apples-to-apples control for the algorithm itself.
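
The cumulative-sum approach to per-chromosome smoothing can be sketched in plain Python. This is an illustration, not the benchmark's reference code: `smooth_row` is a hypothetical name, and the window is assumed to be centered and truncated at chromosome edges, which the text above does not specify. With this convention an even `window_size` covers `window_size + 1` genes, so the sketch is best read with odd sizes.

```python
def smooth_row(values, chroms, window_size):
    """Centered moving average for one cell, restarted on every chromosome."""
    n = len(values)
    out = [0.0] * n
    half = window_size // 2
    block_start = 0
    while block_start < n:
        # Find the end of the current chromosome block (genes assumed sorted).
        block_end = block_start
        while block_end < n and chroms[block_end] == chroms[block_start]:
            block_end += 1

        # prefix[k] holds the sum of the first k values in this block.
        prefix = [0.0]
        for v in values[block_start:block_end]:
            prefix.append(prefix[-1] + v)

        length = block_end - block_start
        for k in range(length):
            lo = max(0, k - half)           # window is clipped at block edges
            hi = min(length, k + half + 1)
            out[block_start + k] = (prefix[hi] - prefix[lo]) / (hi - lo)

        block_start = block_end
    return out


# Made-up expression values for five genes on two chromosomes.
values = [1.0, 2.0, 3.0, 10.0, 20.0]
chroms = ["1", "1", "1", "2", "2"]
smoothed = smooth_row(values, chroms, window_size=3)
```

Each window mean costs two prefix lookups regardless of window size, and no window ever mixes values from different chromosomes.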

Reproduce:

python benchmarks/compare.py                  # real Tirosh data
python benchmarks/compare.py --synth --cells 2000 --genes 10000
cargo bench --no-default-features             # native Rust criterion

Evaluation

metrics = inferrer.evaluate(simulated_df)
# {"true_positives": ..., "precision": ..., "recall": ..., "f1": ...}

Matching is any-overlap on genomic coordinates within the same chromosome and label. The implementation is O(n_inferred + n_truth) via a chromosome- and label-indexed bucket sweep.
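
Any-overlap matching with per-(chromosome, label) buckets can be sketched as below. Several things here are assumptions rather than the package's behavior: the function names are hypothetical, intervals are treated as closed, precision is the fraction of inferred regions that hit any truth region, recall is the fraction of truth regions hit by any inferred region, and `true_positives` counts matched inferred regions. The sketch also uses a sort plus binary search, so it is O(n log n) instead of the linear sweep described above.

```python
from bisect import bisect_right
from collections import defaultdict


def _bucket(regions):
    """Group (chrom, start, end, label) tuples by (chrom, label)."""
    buckets = defaultdict(list)
    for chrom, start, end, label in regions:
        buckets[(chrom, label)].append((start, end))
    return buckets


def _count_hits(queries, targets):
    """Count query intervals that overlap at least one target interval."""
    targets = sorted(targets)
    target_starts = [s for s, _ in targets]
    # running_max_end[i] is the largest end among targets[0..i].
    running_max_end = []
    for _, e in targets:
        running_max_end.append(e if not running_max_end else max(running_max_end[-1], e))
    hits = 0
    for s, e in queries:
        k = bisect_right(target_starts, e)  # number of targets starting at or before e
        if k > 0 and running_max_end[k - 1] >= s:
            hits += 1
    return hits


def evaluate(inferred, truth):
    inf_b, tru_b = _bucket(inferred), _bucket(truth)
    matched_inferred = sum(_count_hits(q, tru_b.get(key, [])) for key, q in inf_b.items())
    matched_truth = sum(_count_hits(q, inf_b.get(key, [])) for key, q in tru_b.items())
    precision = matched_inferred / len(inferred) if inferred else 0.0
    recall = matched_truth / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"true_positives": matched_inferred, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up regions: only the first inferred gain overlaps a truth region
# with the same chromosome and label.
inferred = [("1", 100, 200, "gain"), ("1", 500, 600, "gain"), ("2", 100, 200, "loss")]
truth = [("1", 150, 300, "gain"), ("2", 100, 200, "gain")]
metrics = evaluate(inferred, truth)
```

Bucketing first means a gain on chr2 can never match a loss on chr2, even when the coordinates coincide, as the last inferred region in the example shows.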

Acknowledgements

A pre-release Python prototype predating this crate was developed with Raeann Kalinowski and Amy Liu as a 2025 course project at Johns Hopkins; this repository is a full independent rewrite and is not affiliated with that coursework.

License

MIT
