Memory-efficient streaming analysis of large-scale CRISPR and Perturb-seq screens on disk-backed AnnData files

Maintainers

jaydu

Project description

crispyx

Motivation

Genome-wide CRISPR screens routinely produce datasets with hundreds of thousands of cells and tens of thousands of genes. Standard single-cell analysis toolkits (Scanpy, Pertpy) load the entire count matrix into memory, which can require 30–100+ GB of RAM and makes many screens impractical to analyse on commodity hardware or shared HPC nodes with per-job memory limits.
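As a rough back-of-envelope check on these figures, the sketch below estimates the footprint of a fully dense cells × genes matrix. The screen dimensions and value widths are illustrative assumptions, not measurements from any particular dataset; sparse storage would be smaller, while intermediate copies made during analysis can push real usage higher.

```python
def dense_matrix_gb(n_cells: int, n_genes: int, bytes_per_value: int = 4) -> float:
    """Size in GB (10**9 bytes) of a dense cells x genes matrix."""
    return n_cells * n_genes * bytes_per_value / 1e9


# Hypothetical screen: 300,000 cells x 20,000 genes.
print(dense_matrix_gb(300_000, 20_000))     # 4-byte (float32) values -> 24.0
print(dense_matrix_gb(300_000, 20_000, 8))  # 8-byte (float64) values -> 48.0
```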

crispyx solves this by streaming data directly from on-disk AnnData (.h5ad) files. Quality control, normalisation, pseudo-bulk aggregation, and differential expression all operate without materialising the full matrix in memory, so even the largest screens can be processed with modest resources.
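To illustrate the general idea of streaming, here is a minimal sketch that computes per-gene means and variances one row chunk at a time, merging chunk statistics with the standard pairwise update. It is not crispyx's implementation; the function names `streaming_gene_stats` and `iter_row_chunks` are hypothetical, and an in-memory array stands in for the on-disk file.

```python
import numpy as np


def streaming_gene_stats(chunks):
    """Per-gene mean and population variance from an iterable of row chunks.

    Each chunk is a 2-D array (cells x genes); only one chunk is held at a time.
    """
    n = 0
    mean = None
    m2 = None  # running sum of squared deviations from the mean
    for chunk in chunks:
        chunk = np.asarray(chunk, dtype=np.float64)
        k = chunk.shape[0]
        if k == 0:
            continue
        c_mean = chunk.mean(axis=0)
        c_m2 = ((chunk - c_mean) ** 2).sum(axis=0)
        if mean is None:
            n, mean, m2 = k, c_mean, c_m2
            continue
        # Merge the chunk's statistics into the running totals.
        delta = c_mean - mean
        total = n + k
        mean = mean + delta * k / total
        m2 = m2 + c_m2 + delta**2 * n * k / total
        n = total
    if mean is None:
        raise ValueError("no rows were provided")
    return mean, m2 / n


def iter_row_chunks(matrix, chunk_size):
    """Yield consecutive row blocks; stands in for reads from a disk-backed file."""
    for start in range(0, matrix.shape[0], chunk_size):
        yield matrix[start:start + chunk_size]


rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(1000, 5))
mean, var = streaming_gene_stats(iter_row_chunks(counts, chunk_size=128))
assert np.allclose(mean, counts.mean(axis=0))
assert np.allclose(var, counts.var(axis=0))
```

With a real .h5ad file, the chunks could plausibly come from AnnData's backed mode (`anndata.read_h5ad(path, backed="r")` followed by `chunked_X`, which yields `(chunk, start, end)` tuples); sparse chunks would need converting to dense, for example with `.toarray()`, before being passed to this sketch.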

Features

- Streaming QC & preprocessing – Filter cells, perturbations, and genes; normalise and log-transform; all without loading the full matrix into memory
- Pseudo-bulk aggregation – Average log expression and pseudo-bulk count matrices for effect size estimation
- Differential expression – t-test, Wilcoxon rank-sum, and negative binomial GLM with apeGLM LFC shrinkage; multi-core support and adaptive memory management; per-condition low-expression filtering to exclude genes that are near-zero in both groups
- Dimension reduction – Memory-efficient PCA and KNN graph construction on backed data
- Scanpy-compatible API & plotting – Familiar cx.pp, cx.pb, cx.tl, and cx.pl namespaces; Scanpy-style rank genes plots, volcano, MA, PCA, UMAP, QC summaries, and overlap heatmaps
- Data preparation utilities – Edit backed metadata without loading X; standardise gene names; normalise perturbation labels; auto-detect metadata columns
- HPC-ready – Resume/checkpoint for long-running jobs; configurable memory_limit_gb; Docker and Singularity support
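As one concrete illustration of the pseudo-bulk feature listed above, the sketch below sums counts per perturbation label while reading one block of cells at a time. It is a generic NumPy sketch under stated assumptions, not the crispyx API: `pseudobulk_sums`, the `(block, start, end)` chunk format, and the toy labels are all hypothetical.

```python
import numpy as np


def pseudobulk_sums(chunks, labels):
    """Sum counts per perturbation label, one row block at a time.

    chunks yields (block, start, end), where block holds rows start..end-1
    of the cells x genes matrix; labels holds one label per cell.
    Returns (groups, sums, n_cells).
    """
    labels = np.asarray(labels)
    groups, codes = np.unique(labels, return_inverse=True)
    n_cells = np.bincount(codes, minlength=len(groups))
    sums = None
    for block, start, end in chunks:
        block = np.asarray(block, dtype=np.float64)
        if sums is None:
            sums = np.zeros((len(groups), block.shape[1]))
        # Unbuffered scatter-add, so rows sharing a label all accumulate.
        np.add.at(sums, codes[start:end], block)
    if sums is None:
        raise ValueError("no chunks were provided")
    return groups, sums, n_cells


# Toy data: 4 cells x 3 genes, read in blocks of 3 rows.
counts = np.array([[1, 0, 2], [0, 3, 1], [4, 1, 0], [2, 2, 2]])
labels = ["control", "geneA", "control", "geneA"]
chunks = ((counts[s:s + 3], s, min(s + 3, 4)) for s in range(0, 4, 3))
groups, sums, n_cells = pseudobulk_sums(chunks, labels)
print(groups)   # group labels in sorted order
print(sums)     # one row of summed counts per group
```

Dividing each row of `sums` by the matching entry of `n_cells` gives per-group means; applying the same accumulation to log-transformed blocks would give an average log expression per perturbation.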

Quick Start

import crispyx as cx

# Open dataset without loading into memory
adata = cx.read_h5ad_ondisk("data/demo_benchmark.h5ad")

# Quality control with adaptive thresholds
adata = cx.pp.qc_summary(
    adata,
    perturbation_column="perturbation",
    min_genes=5,
    min_cells_per_perturbation=5,
)

# Differential expression
adata = cx.tl.rank_genes_groups(
    adata,
    perturbation_column="perturbation",
    method="wilcoxon",  # or "t-test", "nb_glm"
)

# Access results
print(adata.uns["rank_genes_groups"])
de_results = adata.uns["rank_genes_groups"].load()

For the full workflow (normalisation, PCA, pseudo-bulk, NB-GLM, LFC shrinkage, plotting, data preparation utilities), see the Usage Guide and the tutorial notebook.
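For readers unfamiliar with the `method="wilcoxon"` option used in the Quick Start, the following sketch shows what a rank-sum test computes for a single gene: tie-averaged ranks, the U statistic, and a z-score under the normal approximation. It is a textbook version written for illustration (no tie correction in the variance), not the crispyx implementation; `average_ranks` and `rank_sum_test` are hypothetical helper names.

```python
import math

import numpy as np


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    last = np.cumsum(counts)    # highest rank within each tie group
    first = last - counts + 1   # lowest rank within each tie group
    return ((first + last) / 2.0)[inverse]


def rank_sum_test(perturbed, control):
    """Return (U, z, two-sided p) using the normal approximation."""
    perturbed = np.asarray(perturbed, dtype=np.float64)
    control = np.asarray(control, dtype=np.float64)
    n1, n2 = perturbed.size, control.size
    ranks = average_ranks(np.concatenate([perturbed, control]))
    u = float(ranks[:n1].sum() - n1 * (n1 + 1) / 2.0)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return u, z, p


# Hypothetical expression values for one gene in two groups of cells.
u, z, p = rank_sum_test([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(u, z, p)
```

A negative z here means the perturbed cells rank lower than controls for this gene. Running such a test for every gene and perturbation is what makes chunked, multi-core execution worthwhile on large screens.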

Performance

Benchmarked across 12 CRISPR screen datasets (21k–1.97M cells), crispyx consistently outperforms Scanpy, Pertpy/PyDESeq2, and edgeR in both speed and memory:

Metric	crispyx vs Scanpy	crispyx vs Pertpy/PyDESeq2
t-test	2–11× faster	—
Wilcoxon	2–43× faster	—
NB-GLM	—	2× faster, completes where Pertpy OOMs
Peak memory	2–6× lower	Runs within 64 GB where Pertpy exceeds 120 GB
Accuracy	Pearson r > 0.999 vs Scanpy	Pearson r > 0.97 vs PyDESeq2

crispyx succeeds on all 12 datasets, while Scanpy times out or OOMs on the largest screens and Pertpy/edgeR fail on most genome-wide datasets.
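The accuracy row reports Pearson correlations between methods. The source does not say which per-gene quantity is correlated, so the sketch below simply assumes two vectors of log fold changes and shows how such a concordance figure could be computed; the values are made up for illustration and `pearson_r` is a hypothetical helper.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation over entries that are finite in both inputs."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    keep = np.isfinite(x) & np.isfinite(y)  # drop genes missing in either method
    return float(np.corrcoef(x[keep], y[keep])[0, 1])


# Hypothetical per-gene log fold changes from a reference and a candidate method.
reference = np.array([-1.2, 0.0, 0.5, 2.1, np.nan])
candidate = np.array([-1.1, 0.1, 0.4, 2.0, 0.3])
r = pearson_r(reference, candidate)
print(round(r, 4))
```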

[Figure: Benchmark results, crispyx vs reference methods]

See benchmarking/ for full results and reproduction scripts.

Installation

pip install crispyx

For development (editable install with all extras):

git clone https://github.com/jaydu1/crispyx.git
cd crispyx
pip install -e ".[test,benchmark,docs]"

Benchmarking

cd benchmarking
./run_benchmark.sh config/Adamson.yaml       # single dataset
./run_benchmark.sh config/*.yaml             # all datasets

See benchmarking/README.md for configuration options and output structure.

Testing

pytest

Documentation

sphinx-build docs docs/_build

Acknowledgements

crispyx builds on the foundational work of Scanpy (Wolf et al., 2018), Pertpy, PyDESeq2 (Muzellec et al., 2023), and AnnData (Virshup et al., 2024). We gratefully acknowledge these projects for establishing the single-cell analysis ecosystem in Python; crispyx extends their APIs and algorithmic designs to enable memory-efficient, streaming computation for large-scale CRISPR screen datasets.

Contributing

Suggestions, bug reports, and contributions are welcome! Please open an issue or submit a pull request.

Release history

Version	Release date
0.0.4	May 14, 2026
0.0.3 (this version)	May 13, 2026
0.0.2	Apr 28, 2026
0.0.1	Mar 30, 2026

Download files


Source Distribution

crispyx-0.0.3.tar.gz (245.9 kB)

Uploaded May 13, 2026 Source

Built Distribution


crispyx-0.0.3-py3-none-any.whl (196.1 kB)

Uploaded May 13, 2026 Python 3

File details

Details for the file crispyx-0.0.3.tar.gz.

File metadata

Download URL: crispyx-0.0.3.tar.gz
Upload date: May 13, 2026
Size: 245.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for crispyx-0.0.3.tar.gz
Algorithm	Hash digest
SHA256	`9a0db2739e671324ace39a06e2d45962f18515da3aaa583691f761818f716e67`
MD5	`9cc169d0b2d2297399ee291de0650542`
BLAKE2b-256	`f429ee013a358293e712c699c9ddbee424fa5b483d89d9acfe549c27a6a1e3f9`


Provenance

The following attestation bundles were made for crispyx-0.0.3.tar.gz:

Publisher: publish.yml on jaydu1/crispyx

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: crispyx-0.0.3.tar.gz
- Subject digest: 9a0db2739e671324ace39a06e2d45962f18515da3aaa583691f761818f716e67
- Sigstore transparency entry: 1523527634
- Sigstore integration time: May 13, 2026
Source repository:
- Permalink: jaydu1/crispyx@3553e0401f92cbb0565bc1abd848e8855585c7b8
- Branch / Tag: refs/tags/0.0.3
- Owner: https://github.com/jaydu1
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@3553e0401f92cbb0565bc1abd848e8855585c7b8
- Trigger Event: release

File details

Details for the file crispyx-0.0.3-py3-none-any.whl.

File metadata

Download URL: crispyx-0.0.3-py3-none-any.whl
Upload date: May 13, 2026
Size: 196.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for crispyx-0.0.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4b7f6ef29ffddaeca33e72bb36b63d343c5f94482b3e7f21bcd139800b68fec8`
MD5	`46cfbcefa42f4da54165aaccd30618cb`
BLAKE2b-256	`5830cbac7fe28507cf79c0ef02f9055ec03fa3832471c21f598f534aaa17c371`


Provenance

The following attestation bundles were made for crispyx-0.0.3-py3-none-any.whl:

Publisher: publish.yml on jaydu1/crispyx

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: crispyx-0.0.3-py3-none-any.whl
- Subject digest: 4b7f6ef29ffddaeca33e72bb36b63d343c5f94482b3e7f21bcd139800b68fec8
- Sigstore transparency entry: 1523527644
- Sigstore integration time: May 13, 2026
Source repository:
- Permalink: jaydu1/crispyx@3553e0401f92cbb0565bc1abd848e8855585c7b8
- Branch / Tag: refs/tags/0.0.3
- Owner: https://github.com/jaydu1
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@3553e0401f92cbb0565bc1abd848e8855585c7b8
- Trigger Event: release

crispyx 0.0.3
