Pure-Python re-implementation of Seurat's CCA — canonical correlation analysis for single-cell integration, AnnData-native.

# pyccasc

A pure-Python re-implementation of Seurat's RunCCA (Stuart, Butler, Hoffman, Hafemeister et al., Cell 2019) — canonical correlation analysis for single-cell integration. Drop-in for the scanpy / AnnData ecosystem.

The PyPI distribution is `pyccasc` (CCA for single-cell); the Python import name is `cca_py` (so `from cca_py import run_cca`). The GitHub repo lives at omicverse/py-cca.
- AnnData-native — feeds directly into Scanpy / OmicVerse pipelines
- No `rpy2`, no R install, no Rcpp toolchain
- Numerical parity with `Seurat::RunCCA` validated across 9 (size × num_cc) configurations: singular values match to ~1e-7, subspaces match to ~1e-3 (rotation within near-degenerate eigenspaces is the only source of difference)
- Same upstream-mirror pattern as `pymclustR`, `monocle2-py`, `milor-py`: the canonical implementation lives in `omicverse`; this repo is the standalone slice for users who want CCA without the full omicverse stack
## Install

```shell
pip install pyccasc
```
## Quick-start

```python
import numpy as np
from cca_py import run_cca

# X, Y are (n_features, n_cells) matrices with matched genes
X = np.random.randn(2000, 500)  # batch 1: 2000 genes × 500 cells
Y = np.random.randn(2000, 700)  # batch 2: 2000 genes × 700 cells

result = run_cca(X, Y, num_cc=30)
print(result.ccv.shape)  # (1200, 30) — shared CC embedding
print(result.d.shape)    # (30,) — singular values
u, v = result.split()    # per-batch halves: (500, 30) and (700, 30)
```
## AnnData adapter

```python
from cca_py import run_cca_anndata

# adata1, adata2 are scanpy AnnData objects (cells × genes)
result = run_cca_anndata(adata1, adata2, num_cc=30, layer="log1p")

# adata1.obsm['X_cca'] now holds the (n_obs_1, 30) shared embedding
# adata2.obsm['X_cca'] holds the (n_obs_2, 30) embedding for the second batch
# adata.uns['cca'] carries the singular-value diagnostics
```
## Algorithm

Direct port of Seurat::RunCCA.default (Seurat R/dimensional_reduction.R, lines 506–541):

```r
object1 <- Standardize(object1)        # z-score per cell (column)
object2 <- Standardize(object2)
mat3    <- crossprod(object1, object2) # cells_1 × cells_2 cross-cov
cca.svd <- irlba(mat3, nv = num.cc)    # truncated SVD
ccv     <- rbind(cca.svd$u, cca.svd$v) # (n1 + n2) × num.cc
# sign-flip each column so its first entry is non-negative
return(list(ccv = ccv, d = cca.svd$d))
```

We use `scipy.sparse.linalg.svds` (ARPACK) in place of irlba. Both are Lanczos-based and produce numerically equivalent top-k SVD truncations.
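For orientation, the same pipeline can be sketched in NumPy/SciPy terms. This is a minimal re-derivation of the R snippet above, not the package's actual source; the function name `cca_sketch` is hypothetical, and it assumes dense inputs with no constant columns:

```python
import numpy as np
from scipy.sparse.linalg import svds

def cca_sketch(X, Y, num_cc=5):
    # Hypothetical re-derivation of the R steps above, not package code.
    # 1. z-score each column (per cell), mirroring Seurat's Standardize
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    # 2. cells_1 × cells_2 cross-product
    mat3 = Xs.T @ Ys
    # 3. truncated SVD; ARPACK does not guarantee ordering, so sort descending
    u, d, vt = svds(mat3, k=num_cc)
    order = np.argsort(d)[::-1]
    u, d, v = u[:, order], d[order], vt[order].T
    # 4. stack the per-batch halves and sign-flip each column so its
    #    first entry is non-negative
    ccv = np.vstack([u, v])
    signs = np.sign(ccv[0])
    signs[signs == 0] = 1.0
    return ccv * signs, d
```

Note the explicit descending sort: `svds` does not promise the ordering that `irlba` returns.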
⚠️ Standardize gotcha: Seurat's `Standardize` (in `src/data_manipulation.cpp`) z-scores per column (per cell), not per row (per gene) — a non-obvious choice that's load-bearing for CCA correctness. We replicated it.
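To make the direction concrete, here is a toy illustration (not package code): per-cell standardization operates on columns, so each cell ends up zero-mean and unit-variance across genes, while the per-gene variant acts on rows.

```python
import numpy as np

M = np.array([[1., 2.],
              [3., 4.],
              [5., 9.]])  # 3 genes (rows) × 2 cells (columns)

# per-cell (column-wise) z-score — the direction Seurat's Standardize uses
per_cell = (M - M.mean(axis=0)) / M.std(axis=0)

# per-gene (row-wise) z-score — the direction it does NOT use
per_gene = (M - M.mean(axis=1, keepdims=True)) / M.std(axis=1, keepdims=True)

print(per_cell.mean(axis=0))  # each cell (column) is now centred: ~[0, 0]
print(per_cell.std(axis=0))   # and unit-variance: ~[1, 1]
```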
## Module map

| Module | What it covers |
|---|---|
| `cca_py.cca` | core `run_cca()` + `standardize()` + `l2_normalize()` |
| `cca_py.anndata_adapter` | `run_cca_anndata()` for the scanpy / AnnData ecosystem |
## Seurat parity

`tests/r_parity_dump.R` runs `Seurat::RunCCA` on three synthetic dataset sizes (small / medium / large) at three num_cc values (5 / 10 / 20). `tests/test_r_parity.py` then runs py-CCA's `run_cca` on the same inputs and asserts:
| Quantity | Tolerance |
|---|---|
| singular values (per-component relative error) | < 1e-5 |
| per-component embedding correlation | > 0.999 |
| Frobenius distance between the two column-span projectors | < 5e-3 |
All 9 configurations × 2 assertion families = 18 parity tests pass. To reproduce:

```shell
# in CMAP env (R + Seurat)
Rscript tests/r_parity_dump.R
# then in omicdev env
pytest tests/ -v
```
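The projector-based tolerance in the table above compares column spans rather than raw embeddings, which makes it blind to rotations within the span. A short sketch of how such a check can be written (the helper name `projector_distance` is hypothetical; it assumes full-column-rank dense embeddings):

```python
import numpy as np

def projector_distance(A, B):
    """Frobenius distance between orthogonal projectors onto col(A) and col(B).

    Invariant to any rotation within the column spans, which is why a parity
    test can use it instead of comparing embeddings entry-wise.
    """
    Qa, _ = np.linalg.qr(A)  # orthonormal basis for col(A)
    Qb, _ = np.linalg.qr(B)  # orthonormal basis for col(B)
    return np.linalg.norm(Qa @ Qa.T - Qb @ Qb.T)

# rotating within the span leaves the distance at ~0
rng = np.random.default_rng(1)
A = rng.standard_normal((40, 5))
R, _ = np.linalg.qr(rng.standard_normal((5, 5)))  # random 5×5 orthogonal matrix
print(projector_distance(A, A @ R))  # ~0
```

This invariance is exactly why sign flips and near-degenerate eigenspace rotations do not trip the parity suite.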
## Roadmap

This first release covers the core SVD step of RunCCA. The full Seurat integration workflow uses CCA as the first step in FindIntegrationAnchors:

- ✅ `RunCCA` — shared CC embedding (this release)
- ⏳ `L2CCA` — provided as `cca_py.l2_normalize`; integration with the result struct pending
- ⏳ `FindIntegrationAnchors` — k-NN in CCA space → mutual nearest neighbours → anchor scoring
- ⏳ `IntegrateData` — anchor-weighted correction of the expression matrix
PRs welcome.
## Citation
If you use this package, please cite the original Seurat integration paper:
Stuart, T., Butler, A., Hoffman, P., Hafemeister, C. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902 (2019). https://doi.org/10.1016/j.cell.2019.05.031
and acknowledge omicverse / this repo for the Python port.
## License
GNU GPLv3 — matches both upstream omicverse and Seurat.
## File details

Details for the file pyccasc-0.1.0.tar.gz.

- Download URL: pyccasc-0.1.0.tar.gz
- Upload date:
- Size: 52.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.20

| Algorithm | Hash digest |
|---|---|
| SHA256 | `64a950402ca611980e755c7b4429dfd2298fc93f7a59f2ba97cc19324522ffbe` |
| MD5 | `5b9612d047f16ec19d23efca3c9898bc` |
| BLAKE2b-256 | `d5c3b6bb4e8df17186896c0a441191ffa899797fe8fadda8c6e12fce9ec1f4c3` |
## File details

Details for the file pyccasc-0.1.0-py3-none-any.whl.

- Download URL: pyccasc-0.1.0-py3-none-any.whl
- Upload date:
- Size: 36.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.20

| Algorithm | Hash digest |
|---|---|
| SHA256 | `f81d3a4d8c91d32342983d4129f340dfbca47aa8bbc371bd1be3706192bc49c8` |
| MD5 | `b02fd0cdb5e2f5f206eed80a4b380522` |
| BLAKE2b-256 | `03170666b8dcca19a96ea5a0d3b2494f4deccf02bdf3a27d4c7cb85166ce5e3b` |