
Opinionated, report-first CLI for single-cell and multi-omics analysis. Sane defaults baked in, defensible deliverables out.


scellrun


A scRNA-seq CLI with a decision log so two analysts — or two LLM agents — on the same data give the same answer.

Chinese README


The 30-second pitch

When two analysts independently analyze the same scRNA-seq dataset, their cluster counts differ by ~20% (Lähnemann et al., 2020 / PMC9122178). The reason is rarely the science; it's the dozens of tiny choices nobody writes down: mt% ceiling, HVG count, integration method, clustering resolution, panel pick. Six months later nobody remembers which knob got tweaked.

scellrun is what you put on top of scanpy so every one of those choices ends up in a single grep-able file with a reason next to it. Same data → same defaults → same output, every time. If a reviewer asks "why mt% 20?" you read line 14 of 00_decisions.jsonl to them.

It's not a workflow manager (use nf-core for cluster orchestration), and it doesn't replace scanpy — it calls scanpy under the hood with opinionated parameters and an audit trail.
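To make the "audit trail" idea concrete, here is a minimal sketch of appending one decision per line to a JSONL file. The helper name `log_decision` is hypothetical and is not scellrun's API; the field names mirror the decision-log rows shown further down in this README.

```python
import json
from pathlib import Path

# Source labels taken from the decision-log description in this README.
ALLOWED_SOURCES = {"auto", "user", "ai"}


def log_decision(log_path, stage, key, value, default, source, rationale):
    """Append one decision as a single JSON line.

    Hypothetical helper for illustration; not part of scellrun.
    """
    if source not in ALLOWED_SOURCES:
        raise ValueError(f"unknown source: {source!r}")
    row = {
        "stage": stage,
        "key": key,
        "value": value,
        "default": default,
        "source": source,
        "rationale": rationale,
    }
    # One JSON object per line keeps the file append-only and greppable.
    with Path(log_path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(row, ensure_ascii=False) + "\n")
    return row
```

Calling `log_decision("00_decisions.jsonl", "qc", "max_pct_mt", 20.0, 20.0, "auto", "mt% ceiling 20%")` would add one row; because rows are only ever appended, the line order doubles as a history of the run.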

Quick start

conda create -n scellrun python=3.11 -y
conda activate scellrun
pip install scellrun

scellrun analyze data.h5ad --tissue "OA cartilage"
# → ./scellrun_out/run-<ts>/05_report/index.html

Got cellranger output instead of an .h5ad?

scellrun scrna convert path/to/cellranger_outs -o data.h5ad
scellrun analyze data.h5ad --tissue "OA cartilage"

Add --lang zh for a Chinese report. Add --profile joint-disease for cartilage / synovium / subchondral bone. Walkthrough in docs/quickstart.md.

What you actually get

  • Five-stage one-shot: QC → integrate (Harmony) → markers → annotate → report. One command, one HTML you can email.
  • Decision log at <run>/00_decisions.jsonl — every non-trivial choice with a one-sentence reason. Greppable; auto/user/ai source labels.
  • Five tissue profiles: default, joint-disease (Fan 2024 chondrocyte panel), tumor, brain, kidney. One Python file each; contribute yours.
  • Self-check — each stage detects pathologies (panel mismatch, all-fragmented clusters, single-sample-no-batch) and proposes the cheapest fix. --auto-fix applies it.
  • Reviewer loop: scellrun review <run> runs a tiny local Flask app for cluster relabels, threshold tweaks, and notes; analyze --apply-overrides <json> re-runs with the human's edits as source="user" rows.
  • PDF export: scellrun export <run> --format pdf for publication.
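As an illustration of the self-check idea, the single-sample-no-batch case from the list above could be sketched roughly as follows. The function name, the candidate column names `sample` and `batch`, and the return shape are assumptions made for this example; scellrun's actual check may differ.

```python
def choose_integration_method(obs_columns, requested="harmony",
                              batch_keys=("sample", "batch")):
    """Return (method, rationale) for the integration stage.

    Hypothetical sketch: downgrade to "none" when the cell metadata has no
    column that could serve as a batch key.
    """
    if requested == "none":
        return "none", "integration disabled by request"
    present = [key for key in batch_keys if key in obs_columns]
    if not present:
        return "none", (
            "no sample/batch column in obs; single-sample input, "
            f"auto-downgraded method from {requested} to none"
        )
    return requested, f"batch column found: {present[0]}"


# Metadata without a batch column triggers the downgrade.
method, rationale = choose_integration_method(["n_genes", "pct_mt"])
print(method, "|", rationale)
```

Returning the rationale alongside the decision is what lets each stage write a reasoned row to the decision log instead of silently changing behaviour.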

Who this is for

  • The LLM agent (Claude Code, Hermes, Codex, OpenClaw) handling a clinician's request. This is the primary user. Drop skills/scellrun/SKILL.md into your agent's skill directory; the agent then knows which command maps to which user intent, how to read the decision log, and when to surface a self-check trigger.
  • The clinician-bioinformatics team that wants every project to look the same. Same QC layout, same decision table, same provenance trail across samples, students, rotations.
  • The reviewer asking "why mt% 20?" Open 00_decisions.jsonl; the answer is on line 14, verbatim.

docs/agent-demo.md is a verbatim Claude Code transcript running scellrun end-to-end on real OA cartilage data, including the agent quoting the decision log when the user asks "why res=0.3?".

What the decision log looks like

{"stage":"qc","key":"max_pct_mt","value":20.0,"default":20.0,"source":"auto",
 "rationale":"mt% ceiling 20% — joint tissue is stress-prone, the textbook 10% silently drops real chondrocytes (PI cohort 2024-2026, AIO PM=20)"}
{"stage":"analyze","key":"method_downgrade","value":"none","default":"harmony","source":"auto",
 "rationale":"no sample/batch column in obs — single-sample input; auto-downgraded --method from harmony to none"}
{"stage":"analyze","key":"chosen_resolution_for_annotate","value":0.3,"source":"auto",
 "rationale":"fewest singletons → most balanced (every resolution fragmented) — picked res=0.3: n_clusters=13, largest=31.5%, smallest=0.2%, singletons=2"}
{"stage":"analyze","key":"annotate.auto_panel","value":"celltype_broad","source":"auto",
 "rationale":"swapped to celltype_broad: chondrocyte_hits=2, broad_hits=9; required >=1.5x margin to keep chondrocyte panel"}

source is auto (built-in heuristic), user (a CLI / review override), or ai (an LLM call). attempt_id groups rows by invocation; fix_payload carries the structured fix on self-check *.suggest rows. Full schema in skills/scellrun/SKILL.md.
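Because the log is plain JSONL, it can be read with the standard library alone. The sketch below parses rows and looks up the rationale for a key; the helper names are hypothetical, the sample rationales are abbreviated from the rows above, and treating the last matching row as authoritative is an assumption rather than documented behaviour.

```python
import json
from collections import Counter


def load_decisions(lines):
    """Parse decision rows from an iterable of JSONL lines, skipping blanks."""
    return [json.loads(line) for line in lines if line.strip()]


def explain(rows, key):
    """Return the rationale of the last row recorded for `key`, or None.

    Assumption: a later row (for example a user override) supersedes
    earlier rows with the same key.
    """
    matches = [row for row in rows if row.get("key") == key]
    return matches[-1]["rationale"] if matches else None


# Two rows with rationales abbreviated from the example above.
sample = [
    '{"stage":"qc","key":"max_pct_mt","value":20.0,"default":20.0,'
    '"source":"auto","rationale":"mt% ceiling 20%"}',
    '{"stage":"analyze","key":"method_downgrade","value":"none",'
    '"default":"harmony","source":"auto",'
    '"rationale":"no sample/batch column in obs"}',
]
rows = load_decisions(sample)
print(explain(rows, "max_pct_mt"))
print(Counter(row["source"] for row in rows))
```

For a real run, pass an open file handle instead of the `sample` list, for example `load_decisions(open(path, encoding="utf-8"))`.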

From v1.3.2 onwards, the chosen_resolution_for_annotate rationale, panel auto-pick reasoning, and self-check triggers all surface in the HTML report's "At a glance" section, so a reader doesn't have to grep the JSONL file to learn why a particular resolution or panel was picked.
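The "fewest singletons, then most balanced" wording in the resolution rationale suggests a simple ranking. The sketch below is one possible reading, not scellrun's implementation: it assumes a singleton is a cluster with at most `singleton_max_cells` cells and that "most balanced" means the smallest share held by the largest cluster. The toy cluster sizes are invented for illustration.

```python
def pick_resolution(cluster_sizes_by_res, singleton_max_cells=1):
    """Pick the resolution with the fewest singleton clusters.

    Ties go to the resolution whose largest cluster holds the smallest
    share of cells, then to the lower resolution. Hypothetical sketch.
    """
    def score(item):
        res, sizes = item
        singletons = sum(1 for n in sizes if n <= singleton_max_cells)
        largest_share = max(sizes) / sum(sizes)
        return (singletons, largest_share, res)

    res, sizes = min(cluster_sizes_by_res.items(), key=score)
    singletons, largest_share, _ = score((res, sizes))
    rationale = (
        f"fewest singletons: picked res={res}: n_clusters={len(sizes)}, "
        f"largest={largest_share:.1%}, singletons={singletons}"
    )
    return res, rationale


# Toy cluster sizes (cells per cluster), invented for illustration only.
toy = {
    0.3: [60, 30, 9, 1],
    0.5: [50, 25, 20, 3, 1, 1],
    0.8: [40, 20, 20, 10, 7, 1, 1, 1],
}
chosen, why = pick_resolution(toy)
print(chosen, why)
```

Sorting on a tuple keeps the tie-break order explicit, and the returned rationale string has the same shape as the one recorded in the decision log.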

Profiles

A profile is community-encoded working practice for a tissue domain — defaults + marker panels in one Python file.

profile | mt% | hb% | panels | notes
default | 20% | | | fresh-tissue 10x v3 baseline (OARSI ceiling)
joint-disease | 20% | tight | Fan 2024 11-subtype chondrocyte + 15-group broad | cold-validated; auto-swaps to broad on immune-rich data
tumor | 20% | | TISCH/Sun 2021 pan-cancer TME (broad only) | not yet cold-validated
brain | 10% | | Tasic/Hodge cortical-hippocampal (broad only) | not yet cold-validated
kidney | 15% | | KPMP/Stewart 2019 nephron + immune (broad only) | not yet cold-validated

scellrun profiles list
scellrun profiles show joint-disease   # thresholds + panels

Tissue or disease working practice diverging from the defaults? Contribute a profile — one Python file under src/scellrun/profiles/. See docs/contributing.md.
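The joint-disease profile's auto-swap to the broad panel can be pictured as a margin test on marker hits. The sketch below reads the logged rationale ("required >=1.5x margin to keep chondrocyte panel") as "keep the chondrocyte panel only if its hits are at least 1.5 times the broad panel's hits". That reading, the function name, and the label "chondrocyte" for the specific panel are assumptions; only celltype_broad appears in the log example.

```python
def choose_panel(chondrocyte_hits, broad_hits, margin=1.5):
    """Return (panel, rationale) under an assumed margin rule.

    Hypothetical sketch: the specific panel is kept only when it has at
    least one hit and beats the broad panel by the required margin.
    """
    keep = chondrocyte_hits > 0 and chondrocyte_hits >= margin * broad_hits
    panel = "chondrocyte" if keep else "celltype_broad"
    action = "kept" if keep else "swapped to"
    rationale = (
        f"{action} {panel}: chondrocyte_hits={chondrocyte_hits}, "
        f"broad_hits={broad_hits}; required >={margin}x margin "
        "to keep chondrocyte panel"
    )
    return panel, rationale


# Hit counts taken from the decision-log example in this README.
panel, rationale = choose_panel(chondrocyte_hits=2, broad_hits=9)
print(panel)
print(rationale)
```

A margin above 1 biases the choice toward the broad panel unless the specific panel clearly dominates, which matches the note that immune-rich data falls back to broad labels.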

Status

v1.3 frozen surface — scRNA only. v1.x is now in maintenance mode. Bug fixes and additive scRNA profiles only; no new public commands. Bulk RNA-seq, metabolomics, and proteomics extensions are deferred to a future v2.0; see ROADMAP.md.

CLI surface (qc / integrate / markers / annotate / analyze / review / export / profiles) is locked for the v1.x series.

Distribution

License

MIT — see LICENSE.

Acknowledgements

Defaults trace back to the Liu-lab in-house R AIO pipeline and clinician-bioinformatics working practice for OARSI / musculoskeletal research. The Fan 2024 chondrocyte panel ships under the joint-disease profile.

