
Opinionated, report-first CLI for single-cell and multi-omics analysis. Sane defaults baked in, defensible deliverables out.


scellrun


scellrun stops two analysts — or two LLM agents — from getting two different answers on the same single-cell data.

Why this exists

Single-cell analysis has a documented reproducibility problem. From the field's own retrospective (Perspectives on rigor and reproducibility in single cell genomics): "in my group's experience, it is not unusual for reanalysis to find 20% fewer or more clusters in datasets" — same raw data, different analyst, different answer. The same review notes that of ~50 high-impact single-cell papers surveyed, "just a handful" reported any external validation. Most of the choices that drive that 20% divergence — mt% ceiling, HVG count, integration method, clustering resolution, panel pick — are made ad-hoc in a notebook and never written down.

A modern LLM agent handed an .h5ad and scanpy will write working code and produce a report. That solves "can the work happen". It does not solve "will two agents on the same data produce the same answer", "can a reviewer reconstruct why mt% was 20 and not 10 six months later", or "is the panel choice consistent with this lab's working practice on this tissue". Vanilla agents improvise thresholds, do not record the rationale in any machine-readable form, and have no way to encode the consensus a clinical-bioinformatics team has built over years of dogfooding.

scellrun fills that gap:

  • Every threshold has a tested default with a one-sentence rationale.
  • Every choice the pipeline makes — auto, user-override, or LLM-recommended — is appended to a 00_decisions.jsonl file you can grep.
  • Tissue-specific working practice ships as profiles/ (cartilage today, contribute yours).
  • Each stage runs a self-check against PI-defined trigger thresholds and surfaces an actionable suggestion before the user sees the downstream finding.

This is a different layer from a workflow manager: if you need orchestration across a cluster, use nf-core; scellrun is what you call from inside one of those. It is not a replacement for scanpy either — it calls scanpy under the hood, with opinionated parameters and a decision log on top.

Who this is for

  • The LLM agent (Claude Code, Hermes, Codex) handling a clinician's request. This is the primary user. The agent SSHes to the machine holding the data, runs scellrun analyze, reads the artifacts, and translates them for the clinician. skills/scellrun/SKILL.md is the operational guide it reads.
  • The clinician-bioinformatics team that wants every project to look the same in a report. Same QC layout, same decision table, same provenance trail across samples, students, and rotations.
  • The reviewer asking "why mt% 20?" The answer is 00_decisions.jsonl line 14, verbatim: "mt% ceiling 20.0% — joint tissue is stress-prone, the textbook 10% silently drops real chondrocytes (PI cohort 2024-2026, AIO PM=20)".
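That grep really is the whole workflow for the reviewer. A self-contained sketch: the decision row below is copied from the sample log later in this README and written to a local file, since the real path (scellrun_out/run-<ts>/00_decisions.jsonl) depends on your run timestamp.

```shell
# Recreate one decision row locally (copied from the sample log in this README),
# then answer "why mt% 20?" the way a reviewer would: grep the log.
cat > 00_decisions.jsonl <<'EOF'
{"schema_version":1,"stage":"qc","key":"max_pct_mt","value":20.0,"default":20.0,"source":"auto","rationale":"mt% ceiling 20.0% — joint tissue is stress-prone, the textbook 10% silently drops real chondrocytes (PI cohort 2024-2026, AIO PM=20)","fix_payload":null,"attempt_id":"cae89793d0f9470a9c7f38894928f304","ts":"2026-04-30T15:20:32+00:00"}
EOF
grep '"key":"max_pct_mt"' 00_decisions.jsonl
```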

Quick start

conda create -n scellrun python=3.11 -y
conda activate scellrun
pip install scellrun

scellrun analyze data.h5ad --tissue "OA cartilage"
# → ./scellrun_out/run-<ts>/05_report/index.html

Don't have an .h5ad? Cell Ranger output works directly:

scellrun scrna convert path/to/cellranger_outs -o data.h5ad
scellrun analyze data.h5ad --tissue "OA cartilage"

Add --lang zh for a Chinese report. Add --profile joint-disease if the tissue is cartilage / synovium / subchondral bone (auto-loads the Fan 2024 chondrocyte panel and tightens hb% for avascular cartilage). Walkthrough in docs/quickstart.md; contribution notes in docs/contributing.md.

How an agent uses this

Drop skills/scellrun/SKILL.md into your agent's skills directory and the agent will know which command maps to which user intent, how to read the decision log, when to surface a self-check trigger before answering, and which profile to pick by tissue keyword. docs/agent-demo.md is a verbatim transcript of a Claude Code agent running scellrun end-to-end on real OA cartilage scRNA data — including the agent quoting the decision log when the user asks "why resolution 0.3?" and switching panels when the deterministic call is wrong.

What's in the decision log

00_decisions.jsonl is the single source of truth for every non-trivial choice the pipeline made. One JSON object per line. Sample shape (real, from docs/v1demo/decisions.jsonl):

{"schema_version":1,"stage":"qc","key":"max_pct_mt","value":20.0,"default":20.0,"source":"auto","rationale":"mt% ceiling 20.0% — joint tissue is stress-prone, the textbook 10% silently drops real chondrocytes (PI cohort 2024-2026, AIO PM=20)","fix_payload":null,"attempt_id":"cae89793d0f9470a9c7f38894928f304","ts":"2026-04-30T15:20:32+00:00"}
{"schema_version":1,"stage":"analyze","key":"method_downgrade","value":"none","default":"harmony","source":"auto","rationale":"no sample/batch column in obs — single-sample input; auto-downgraded --method from harmony to none","fix_payload":null,"attempt_id":"cae89793d0f9470a9c7f38894928f304","ts":"2026-04-30T15:18:51+00:00"}
{"schema_version":1,"stage":"analyze","key":"chosen_resolution_for_annotate","value":0.3,"default":null,"source":"auto","rationale":"fewest singletons → most balanced (every resolution fragmented) — picked res=0.3: n_clusters=13, largest=31.5%, smallest=0.2%, singletons=2","fix_payload":null,"attempt_id":"cae89793d0f9470a9c7f38894928f304","ts":"2026-04-30T15:20:52+00:00"}
{"schema_version":1,"stage":"analyze","key":"annotate.auto_panel","value":"celltype_broad","default":null,"source":"auto","rationale":"swapped to celltype_broad: chondrocyte_hits=2, broad_hits=9; required >=1.5x margin to keep chondrocyte panel.","fix_payload":null,"attempt_id":"cae89793d0f9470a9c7f38894928f304","ts":"2026-04-30T15:24:19+00:00"}
{"schema_version":1,"stage":"annotate","key":"panel","value":"celltype_broad","default":null,"source":"auto","rationale":"orchestrator-injected panel 'celltype_broad' (auto-pick or self-check fix)","fix_payload":null,"attempt_id":"cae89793d0f9470a9c7f38894928f304","ts":"2026-04-30T15:24:49+00:00"}

Every choice the pipeline made, with a one-sentence rationale, in a file you can grep. source is one of auto (a built-in heuristic), user (a CLI override), or ai (an LLM call). fix_payload is non-null only on self-check *.suggest rows — it carries the structured fix the orchestrator can mechanically apply when --auto-fix is on. attempt_id groups rows by invocation. The full schema is in skills/scellrun/SKILL.md.
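Because the log is one JSON object per line, it is trivial to load programmatically as well as to grep. A minimal sketch: the field names follow the schema shown above, but the sample row is abbreviated here (rationale shortened, attempt_id replaced) for readability — real rows carry the full sentence.

```python
import json

# Parse the decision log the way an agent or reviewer might.
# Field names follow the documented schema; this sample row is abbreviated.
sample = (
    '{"schema_version":1,"stage":"qc","key":"max_pct_mt","value":20.0,'
    '"default":20.0,"source":"auto","rationale":"mt% ceiling 20.0%",'
    '"fix_payload":null,"attempt_id":"demo","ts":"2026-04-30T15:20:32+00:00"}'
)

def load_decisions(lines):
    """One JSON object per line -> list of dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]

rows = load_decisions(sample.splitlines())
# Rows where source != "auto" are the user/LLM overrides worth reviewing first.
overrides = [r for r in rows if r["source"] != "auto"]
print(rows[0]["key"], "=", rows[0]["value"], f'({rows[0]["source"]})')
# → max_pct_mt = 20.0 (auto)
```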

Profiles

A profile is community-encoded working practice for a tissue domain — defaults plus marker panels — in one Python file. v1.0 ships two:

  • default — fresh-tissue 10x v3 chemistry, joint-tissue-aware mt% ceiling at 20% (the textbook 10% silently drops real chondrocytes; the OARSI working group ceiling is 20%).
  • joint-disease — same QC plus tighter hb% (cartilage is avascular), the Fan 2024 chondrocyte 11-subtype panel, and a 15-group celltype_broad panel. Auto-swaps from chondrocyte to broad when the data is immune-rich (subchondral bone, infiltrated synovium, joint fluid) so the report doesn't blindly mis-label pericytes / plasmacytoid DCs / osteoclasts as chondrocyte subtypes.

scellrun profiles list
scellrun profiles show joint-disease   # prints thresholds + panels

If your tissue or disease has working practice that diverges from the defaults, contribute a profile — one Python file under src/scellrun/profiles/.
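The profile file format itself isn't shown in this README, so the following is purely illustrative: a sketch of the two things the text says a profile encodes, QC thresholds plus marker panels. The names (QC_DEFAULTS, PANELS), the hb% value, and the marker genes are assumptions, not scellrun's actual API; only the 20% mt% ceiling and the panel name come from the text above.

```python
# Hypothetical sketch of what a tissue profile encodes. QC_DEFAULTS and
# PANELS are illustrative names, NOT scellrun's actual profile API.
QC_DEFAULTS = {
    "max_pct_mt": 20.0,  # joint tissue is stress-prone; the textbook 10% drops real chondrocytes
    "max_pct_hb": 1.0,   # cartilage is avascular, so hb% is tightened (value illustrative)
}

PANELS = {
    # A panel maps cell-type labels to marker genes (canonical examples shown).
    "celltype_broad": {
        "T_cell": ["CD3D", "CD3E"],
        "B_cell": ["CD79A", "MS4A1"],
    },
}
```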

Roadmap

v0.1 → v1.0.1 has shipped: per-stage QC / integrate / markers / annotate, the analyze one-shot, the decision log, self-check + --auto-fix, the joint-disease profile, panel auto-pick, single-sample auto-downgrade, agent demo, Dockerfile, and the v1.0.1 SKILL.md sync. The CLI surface is frozen for the v1.x series; new stages and profiles land additively. Post-v1.0 directions tracked in ROADMAP.md: conda-forge feedstock, registry-pushed Docker image, bulk RNA-seq subcommand, metabolomics composite scoring, proteomics integration.

License

MIT — see LICENSE.

Acknowledgements

Defaults trace to the in-house R AIO pipeline (Liu lab) and clinician-bioinformatics working practice for OARSI / MSK research. Built with assistance from Claude (Anthropic).
