
rust-scHiCluster

Rust re-implementation of the inner numerical pipeline of scHiCluster (Zhou et al. 2019, PNAS) for single-cell Hi-C contact-matrix imputation. Roughly 10× faster on long chromosomes during single-cell imputation, and numerically equivalent to upstream within float-32 epsilon.

This is a separate package that re-implements the algorithm in Rust; it is not a fork of the upstream Python tree. The original Python implementation continues to be available as pip install schicluster (and is used here only as the parity baseline in the test suite).

What it computes

schicluster_rs.impute_chromosome(...) runs the same end-to-end pipeline as schicluster.impute.impute_chromosome.impute_chromosome:

  1. Read raw single-cell contact matrix from a .cool file.
  2. Drop the diagonal.
  3. 2-D Gaussian convolution (mirror padding) — replaces scipy.ndimage.gaussian_filter.
  4. Drop the diagonal again.
  5. Row-normalise → P.
  6. Random-walk-with-restart fixed point: Q = (1−rp)·P·Q + rp·P for up to 30 iterations or until ‖Q_t − Q_{t-1}‖_F < tol.
  7. Symmetrise: E = Q + Qᵀ.
  8. SQRTVC normalise: E ← D^{-1/2} · E · D^{-1/2} where D = diag(Eᵀ𝟙).
  9. Filter the upper triangle to entries with j − i ≤ output_dist_bins.
  10. Write the result to an HDF5 file (cooler-compatible).

Steps 2–9 run inside Rust; only the cooler read in step 1 and the HDF5 write in step 10 cross the Python boundary.
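As a reference, steps 2–8 can be sketched densely in NumPy/SciPy. This is illustrative only: the Rust implementation operates on sparse CSR matrices, and the function name here is ours, not part of either package's API.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def impute_core(A, rp=0.5, tol=0.01, std=1.0, max_iter=30):
    """Dense sketch of pipeline steps 2-8 (the real code is sparse)."""
    A = A.astype(np.float32)
    np.fill_diagonal(A, 0)                        # step 2: drop diagonal
    A = gaussian_filter(A, std, mode='mirror')    # step 3: Gaussian, mirror padding
    np.fill_diagonal(A, 0)                        # step 4: drop diagonal again
    rowsum = A.sum(axis=1, keepdims=True)
    rowsum[rowsum == 0] = 1
    P = A / rowsum                                # step 5: row-normalise
    Q = P.copy()
    for _ in range(max_iter):                     # step 6: RWR fixed point
        Q_next = (1 - rp) * (P @ Q) + rp * P
        done = np.linalg.norm(Q_next - Q) < tol   # Frobenius norm
        Q = Q_next
        if done:
            break
    E = Q + Q.T                                   # step 7: symmetrise
    d = E.sum(axis=0)                             # column sums: D = diag(E^T 1)
    d[d == 0] = 1
    s = 1.0 / np.sqrt(d)
    return E * s[:, None] * s[None, :]            # step 8: SQRTVC
```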

Speed

Real Chang 2024 LC462 mouse cortex Droplet Hi-C, 25 kb resolution:

workload                              upstream (scipy)   rust     speedup
chr1 (n = 7820 bins)                  30.5 s             3.2 s    9.6×
chr19 (n = 2461 bins)                 0.4 s              0.27 s   1.5×
20 chromosomes end-to-end, per cell   87 s               33 s     2.7×

Multi-process parallelism (8 workers × 2 rayon threads = 16 cores total): imputing 8 copies of chr1 in parallel drops from 29 s to 9.4 s, an additional 3.1× on top of the per-cell speedup, gained by avoiding rayon thread oversubscription.

Accuracy

Numerically equivalent to upstream within float-32 ε. On real Chang chr1 (n = 7820, ~1.7 M output non-zeros):

  • max |E_rust − E_scipy| = 8.94 × 10⁻⁸
  • Pearson correlation = 1.000000
  • nnz match exactly.

tests/test_parity.py runs random_walk_cpu over (n, rp) ∈ {50, 200, 500} × {0.05, 0.5, 0.9} and asserts max-relative-error < 1e-4 against scipy's reference implementation. All 11 tests pass.
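The SciPy side of that check looks roughly like the following sketch (the grid and names here are illustrative; the real test calls schicluster_rs.random_walk_cpu on the same P and asserts the max relative error against this kind of baseline):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import norm as spnorm

def rwr_reference(P, rp, tol=0.01, max_iter=30):
    """SciPy reference: iterate Q = (1-rp)*P@Q + rp*P to tolerance."""
    Q = P.copy()
    for _ in range(max_iter):
        Q_next = (1 - rp) * (P @ Q) + rp * P
        if spnorm(Q_next - Q) < tol:      # Frobenius norm
            return Q_next
        Q = Q_next
    return Q

for n, rp in [(50, 0.05), (200, 0.5), (50, 0.9)]:
    A = sp.random(n, n, density=0.05, format='csr', random_state=n)
    A = (A + A.T).tocsr()
    rows = np.asarray(A.sum(axis=1)).ravel()
    rows[rows == 0] = 1.0
    P = (sp.diags(1.0 / rows) @ A).tocsr()   # row-stochastic transition matrix
    Q_ref = rwr_reference(P, rp)
    # real test: Q_rust = random_walk_cpu(P, rp=rp, tol=0.01)
    #            assert max relative error vs Q_ref < 1e-4
```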

Install

Requires Rust ≥ 1.78 and maturin ≥ 1.4:

git clone https://github.com/omicverse/rust-scHiCluster
cd rust-scHiCluster
maturin develop --release   # build + install into the active venv

PyPI release coming soon (pip install schicluster-rs).

Use

Drop-in monkey-patch (recommended) — no code changes anywhere:

import schicluster_rs
schicluster_rs.set_num_threads(2)        # 8 workers × 2 = 16 cores
schicluster_rs.patch_schicluster()

# every downstream call to scHiCluster's impute_chromosome now uses Rust:
from schicluster.impute.impute_chromosome import impute_chromosome
impute_chromosome(scool_url=..., chrom='chr1', resolution=25_000,
                  output_path=..., rp=0.5, tol=0.01,
                  pad=1, std=1.0, output_dist=10_050_000)

Direct:

from schicluster_rs import random_walk_cpu, impute_chromosome

# Just the iterative RWR step (CSR → CSR):
Q = random_walk_cpu(P, rp=0.5, tol=0.01)

# Full inner pipeline (writes HDF5 like upstream):
impute_chromosome(scool_url='cell.cool', chrom='chr1',
                  resolution=25_000, output_path='chr1.hdf',
                  rp=0.5, tol=0.01, pad=1, std=1.0,
                  output_dist=10_050_000)

Multi-process tuning

schicluster's default workflow is ProcessPoolExecutor(max_workers=N). Each forked worker creates its own rayon thread pool — without explicit sizing, every worker spawns num_cpus threads, leading to N × num_cpus contending threads on a single node.

Set the per-worker rayon thread count via set_num_threads(n) in the worker initialiser. Recommended sizing: n = num_cpus // num_workers. Example: 16-core node with 8 workers → set_num_threads(2).

from concurrent.futures import ProcessPoolExecutor
import schicluster_rs

def worker_init():
    schicluster_rs.set_num_threads(2)
    schicluster_rs.patch_schicluster()

with ProcessPoolExecutor(max_workers=8, initializer=worker_init) as ex:
    list(ex.map(impute_one_cell, cells))

Layout

rust-scHiCluster/
├── pyproject.toml          maturin build config
├── README.md               this file
├── LICENSE                 MIT
├── rust/
│   ├── Cargo.toml
│   └── src/lib.rs          all algorithms (~500 LoC)
├── python/
│   └── schicluster_rs/__init__.py     thin Python wrapper + monkey-patch
└── tests/
    └── test_parity.py      parity vs scipy on random sparse matrices

Algorithm notes

The hot loop is the iterative random-walk-with-restart, which is implemented as a Sparse-times-Dense matrix multiplication (SpMM) with rayon row-wise parallelism:

  • P (sparse, ~7 nnz per row after Gaussian smoothing) stays as CSR.
  • Q (the iterate) is stored dense, since RWR diffuses it to ≥ 30 % density after 1–2 iterations anyway.
  • Each iteration: Q' = (1−rp) · (P · Q) + rp · P. The P · Q matmul is computed row-wise; each output row is independent, so rayon splits row-chunks across cores. Within each row, the inner AXPY (accumulate P[i,k] · Q[k, :] for sparse k) vectorises cleanly.

Other steps (Gaussian convolution, SQRTVC normalisation, triangle filter) are similarly multi-threaded over rows or chunks.

For users who can tolerate ≪1% deviation from the strict scipy result, a band_factor parameter is available that runs the RWR with a banded Q (only entries with |j − i| ≤ band_factor × output_dist_bins), giving an additional ~4× speedup. Default is 0 (off, strict).
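A dense sketch of what the band restriction does to the iterate (illustrative only: the Rust code keeps Q banded in place during the RWR loop rather than masking a dense matrix, and band here stands for the product band_factor × output_dist_bins):

```python
import numpy as np

def apply_band(Q, band):
    """Zero every entry of Q with |j - i| > band, keeping a diagonal band."""
    n = Q.shape[0]
    idx = np.arange(n)
    mask = np.abs(idx[:, None] - idx[None, :]) <= band
    return np.where(mask, Q, 0.0)
```

Applied after each iteration, every SpMM then touches O(n · band) entries instead of O(n²), which is where the extra ~4× on long chromosomes comes from.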

Citation

If you use this package, please cite the original scHiCluster paper:

Zhou, J., Ma, J., Chen, Y., Cheng, C., Bao, B., Peng, J., Sejnowski, T. J., Dixon, J. R. & Ecker, J. R. (2019). Robust single-cell Hi-C clustering by convolution- and random-walk-based imputation. PNAS, 116(28):14011-14018.

License

MIT.
