RNA tumor analysis driven by pirlygenes gene sets. Migrates the pirlygenes analyze pipeline into composable sub-commands with serializable intermediate state for incremental website-style runs.
trufflepig
RNA tumor analysis driven by pirlygenes gene sets.
What this is
trufflepig is the analysis, plotting, reporting, and CLI layer for RNA
tumor analysis. It loads curated gene sets and reference expression data
from the pirlygenes package,
which is now data-only.
The legacy pirlygenes analyze CLI has been fully migrated into
this repo as trufflepig run. Multi-sample longitudinal comparison
(pirlygenes compare-analyze) is trufflepig compare. Per-stage
extraction of the analyze pipeline (so a web UI can stream incremental
results) is the next track.
Install
pip install -e .
Pulls pirlygenes>=5.0.0 for the curated gene sets and reference data.
Usage
Single-sample analysis
trufflepig run \
--sample path/to/quant.sf \
--workspace out/patient_X_baseline \
--cancer-type BLCA
Output layout:
out/patient_X_baseline/
  meta.json    # trufflepig run metadata (versions + args)
  analyze/     # full analyze output: figures, markdown reports, TSVs
  records/     # (created, currently empty) reserved for per-stage
               # records once Phase 2 extraction lands
  figures/     # (created, currently empty) reserved for the
               # stage-level figure layout
Today, every analyze artifact (markdown, figures, TSVs, the bundled
PDF) lives under analyze/. The empty sibling directories are the
seam for per-stage extraction (trufflepig#2–#14); once stages start
writing their own records, analyze/ shrinks.
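The layout above can be inspected with a few lines of Python. This is a minimal sketch, not part of trufflepig: summarize_workspace is a hypothetical helper, and because the exact keys inside meta.json are not documented here, it only reports which top-level keys are present.

```python
import json
from pathlib import Path


def summarize_workspace(workspace):
    """Count files per top-level directory of a trufflepig workspace.

    Hypothetical helper for illustration; not a trufflepig API.
    """
    root = Path(workspace)
    meta_path = root / "meta.json"
    # The schema of meta.json is an unknown here, so only key names are reported.
    meta = json.loads(meta_path.read_text()) if meta_path.is_file() else {}
    summary = {"meta_keys": sorted(meta)}
    for name in ("analyze", "records", "figures"):
        directory = root / name
        if directory.is_dir():
            summary[name] = sum(1 for p in directory.rglob("*") if p.is_file())
        else:
            summary[name] = 0
    return summary
```

On a workspace produced today, the records and figures counts would be expected to be zero, with everything sitting under analyze.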
Common pass-through flags: --hla-types, --fusions, --alterations,
--alignment-qc, --sample-mode, --tumor-context, --site-hint,
--met-site, --decomposition-templates, --output-image-prefix,
--sample-id-col, --sample-id-value, --gene-id-col, --gene-name-col,
--label-genes, --genes, --transcripts,
--aggregate-gene-expression, --expression-qc-rescue,
--therapy-target-top-k, --therapy-target-tpm-threshold, --force.
All have the same meaning as in the old pirlygenes analyze.
Multi-sample (longitudinal)
trufflepig compare \
--workspace out/patient_X_longitudinal \
--inputs out/patient_X_baseline,out/patient_X_relapse \
--title "Patient X — baseline vs relapse"
--inputs accepts both trufflepig workspaces (auto-descends to
analyze/) and legacy pirlygenes output directories.
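One way the auto-descend behaviour could work is sketched here. The function names and the "has an analyze/ subdirectory" heuristic are assumptions for illustration; the actual detection logic in trufflepig may differ.

```python
from pathlib import Path


def resolve_compare_input(path):
    """Return the directory holding analyze artifacts for one input.

    Assumption: a trufflepig workspace is recognised by its analyze/
    subdirectory; anything else is treated as a legacy output directory.
    """
    root = Path(path)
    nested = root / "analyze"
    return nested if nested.is_dir() else root


def split_inputs(value):
    """Split a comma-separated --inputs value and resolve each entry."""
    return [resolve_compare_input(p.strip()) for p in value.split(",") if p.strip()]
```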
Reference / cohort introspection
trufflepig data # list bundled gene-set CSVs and TCGA cohorts
trufflepig cancers # browse the cancer-type registry
trufflepig cancers --family sarcoma --details
trufflepig plot-cancer-cohorts --output-prefix /tmp/cohort
Web UI
pip install 'pirl-trufflepig[web]'
trufflepig serve --port 8000
# open http://127.0.0.1:8000
Upload a TPM file or salmon quant in the browser, watch each pipeline
stage stream back, and read the rendered summary.md / analysis.md /
brief.md inline. Comparison runs work the same way — pick prior runs
by ID. Each run writes a self-contained workspace under
$TRUFFLEPIG_WEB_ROOT (default $HOME/trufflepig-web-runs).
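The root-directory rule can be sketched as follows. Only the TRUFFLEPIG_WEB_ROOT variable and its default come from the text above; web_root and run_dir are hypothetical names, and the run-ID format is an assumption.

```python
import os
from pathlib import Path


def web_root(environ=None):
    """Resolve the directory that holds web-run workspaces."""
    env = os.environ if environ is None else environ
    configured = env.get("TRUFFLEPIG_WEB_ROOT")
    if configured:
        return Path(configured).expanduser()
    # Documented default: $HOME/trufflepig-web-runs
    return Path.home() / "trufflepig-web-runs"


def run_dir(run_id, environ=None):
    """Path of the self-contained workspace for one run (not created here)."""
    return web_root(environ) / run_id
```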
Pipeline DAG
trufflepig list-stages
The DAG is the post-migration target for trufflepig stage <name>. The
top-level trufflepig run already runs the full pipeline; stage-level
execution is wired in one stage at a time as each is extracted from
the migrated codebase.
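A stage DAG expressed as name -> upstream dependencies can be ordered with the standard library. This sketch is illustrative only: the stage names are taken from the roadmap, but the dependency edges are assumptions, not the contents of pipeline.py.

```python
from graphlib import TopologicalSorter

# Assumed edges for illustration; each stage maps to its upstream stages.
STAGES = {
    "load_expression": [],
    "sample_context": ["load_expression"],
    "analyze": ["load_expression", "sample_context"],
    "decompose": ["analyze"],
    "ranges": ["decompose"],
    "confidence": ["analyze", "ranges"],
    "render_summary": ["confidence"],
    "bundle": ["render_summary"],
}


def stages_needed(target, dag=STAGES):
    """Return the target stage and everything upstream of it, in run order."""
    needed = set()
    stack = [target]
    while stack:
        name = stack.pop()
        if name not in needed:
            needed.add(name)
            stack.extend(dag[name])
    # TopologicalSorter takes node -> predecessors and yields predecessors first.
    subgraph = {name: dag[name] for name in needed}
    return list(TopologicalSorter(subgraph).static_order())
```

Under these assumed edges, asking for a single stage yields only the stages it depends on, which is what lets a web UI run and stream one stage at a time.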
Layout
trufflepig/
  cli.py            # argparse entry exposed as the `trufflepig` console script
  main.py           # migrated analyze/compare_analyze + report assembly
  workspace.py      # workspace layout (meta.json + records/ + figures/)
  pipeline.py       # stage DAG (name -> upstream dependencies)
  analyze/          # data contracts shared with the migrated pipeline
  decomposition/    # compartment-fit engine + panels + plot helpers
  stages/           # one module per stage (post-extraction)
  load_expression.py, sample_context.py, tumor_purity.py,
  decomposition/, plot*.py, brief.py, confidence.py, ...   # the analysis code
Roadmap
Phase 1 — Subsume pirlygenes analyze ✅
- Wire trufflepig run as a thin bridge to pirlygenes.cli.analyze (trufflepig#19)
- Wire trufflepig compare as a thin bridge to pirlygenes.cli.compare_analyze
- Mass-move analysis modules from pirlygenes to trufflepig (trufflepig#1). pirlygenes now ships data only.
- Native trufflepig run / trufflepig compare dispatch, no bridge
Phase 2 — Per-stage extraction
Break the migrated analyze function into the stage DAG so a web UI
can run and stream single stages:
- load_expression: parse sample TPM TSV/CSV into a canonical frame (#2)
- sample_context: infer library prep, preservation, degradation (#3)
- analyze: cancer-type call + purity (#4)
- decompose: compartment-level decomposition fit (#5)
- ranges: per-target tumor-expression ranges + attribution (#6)
- confidence: purity + per-target confidence tiers (#7)
- render_targets, render_summary, render_analysis, render_provenance, render_brief (#8–#12)
- bundle: figures into PDF + finalize meta.json (#13)
- Per-stage record schema documentation (#14)
Phase 3 — Multi-sample / longitudinal
trufflepig compare runs today; the richer layer still to come:
- Explicit delta tables — cancer-call shifts, purity drift, target gains/losses, MHC/HLA changes, immune / IFN / hypoxia / EMT / therapy-response axis movement, assay/library differences that limit comparability (extension of pirlygenes#230)
- Cohort-level comparisons (browse N samples with the same cancer type; surface outlier targets)
- Patient-level provenance graph linking baseline → progression samples
Phase 4 — Web UI
A single-page web frontend so a user can drop in a TPM or salmon quant, watch each stage stream back, and download the rendered markdown / PDF.
- FastAPI app + browser UI (trufflepig serve) with file upload, background analyze, server-sent-events progress stream, inline rendered reports, and longitudinal comparison launcher (#16)
- Streaming progress + per-stage output hooks (SSE stream of analyze stdout) (#15)
- Reference-data layout for lazy-load from R2/S3 with browser cache (#18)
- Pyensembl-free gene resolution (HGNC CSV dict lookup) for fast cold-start in serverless / browser contexts (#17)
- Auth + workspace persistence so a user can return to a prior run
- Production deploy target (serverless) replacing the local subprocess runner with a remote-job submission
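The SSE stream of analyze stdout mentioned in the list above amounts to wrapping each output line in the server-sent-events wire format. A minimal sketch, independent of FastAPI: the function names and the "progress" / "end" event names are assumptions, not what trufflepig serve actually emits.

```python
def sse_event(data, event=None):
    """Format one server-sent event; multi-line data gets one data: field per line."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    for part in data.splitlines() or [""]:
        lines.append(f"data: {part}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


def stream_stdout(lines):
    """Turn an iterable of stdout lines into SSE frames, then signal completion."""
    for line in lines:
        yield sse_event(line.rstrip("\n"), event="progress")
    yield sse_event("done", event="end")
```

A generator like stream_stdout could be fed from a subprocess's stdout and handed to a streaming HTTP response with the text/event-stream content type.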
Non-goals
- No JSON mirror of the markdown reports — the rendered markdown has named human audiences; a JSON mirror would have no real consumer.
- No change to the gene-set data in pirlygenes.
Local-report regeneration
Researcher workflow: replay a private manifest of analyses on local samples and write outputs outside the repo:
python scripts/regenerate_local_reports.py \
--source /path/to/pirlygenes/local_reports/<run>/manifest.json \
--root ~/trufflepig-local-reports/<stamp>
The script refuses to write inside the repo. The default --root is
$HOME/trufflepig-local-reports/<timestamp>/.
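The "refuses to write inside the repo" guard can be sketched as a path-containment check. This is an assumed implementation for illustration; check_output_root is a hypothetical name and the real script may detect the repo root differently.

```python
from pathlib import Path


def check_output_root(root, repo_root):
    """Return the resolved output root, or raise if it falls inside the repo."""
    resolved = Path(root).expanduser().resolve()
    repo = Path(repo_root).resolve()
    # Resolving first means symlinks and ".." segments cannot sneak past the check.
    if resolved == repo or repo in resolved.parents:
        raise ValueError(f"refusing to write inside the repo: {resolved}")
    return resolved
```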
License
Apache 2.0 — see LICENSE.