RNA tumor analysis driven by pirlygenes gene sets. Migrates the pirlygenes analyze pipeline into composable sub-commands with serializable intermediate state for incremental website-style runs.

trufflepig

RNA tumor analysis driven by pirlygenes gene sets.

What this is

trufflepig is the analysis, plotting, reporting, and CLI layer for RNA tumor analysis. It loads curated gene sets and reference expression data from the pirlygenes package, which is now data-only.

The legacy pirlygenes analyze CLI has been fully migrated into this repo as trufflepig run. Multi-sample longitudinal comparison (pirlygenes compare-analyze) is trufflepig compare. Per-stage extraction of the analyze pipeline (so a web UI can stream incremental results) is the next track.

Install

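From a source checkout:

pip install -e .

This pulls in pirlygenes>=5.0.0 for the curated gene sets and reference data. The published distribution is named pirl-trufflepig, so pip install pirl-trufflepig installs a release from PyPI instead.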

Usage

Single-sample analysis

trufflepig run \
    --sample path/to/quant.sf \
    --workspace out/patient_X_baseline \
    --cancer-type BLCA
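
For scripted runs, the same invocation can be assembled from Python. This is a minimal sketch that uses only the three flags shown above; `build_run_command` is a hypothetical helper, not part of trufflepig, and treating `--cancer-type` as optional is an assumption. The command is only built here, not executed.

```python
def build_run_command(sample, workspace, cancer_type=None):
    """Assemble the argv list for `trufflepig run` (hypothetical helper)."""
    cmd = [
        "trufflepig", "run",
        "--sample", str(sample),
        "--workspace", str(workspace),
    ]
    # Assumption: the cancer type may be omitted so the pipeline infers it.
    if cancer_type is not None:
        cmd += ["--cancer-type", cancer_type]
    return cmd


cmd = build_run_command("path/to/quant.sf", "out/patient_X_baseline", "BLCA")
print(" ".join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would launch the run without going through a shell.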

Output layout:

out/patient_X_baseline/
  meta.json            # trufflepig run metadata (versions + args)
  analyze/             # full analyze output: figures, markdown reports, TSVs
  records/             # (created, currently empty) — reserved for per-stage
                       #   records once Phase 2 extraction lands
  figures/             # (created, currently empty) — reserved for the
                       #   stage-level figure layout

Today, every analyze artifact (markdown, figures, TSVs, the bundled PDF) lives under analyze/. The empty sibling directories are the seam for per-stage extraction (trufflepig#2–#14); once stages start writing their own records, analyze/ shrinks.
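
A finished workspace can be inspected with nothing more than the standard library. The sketch below relies only on the layout shown above (meta.json plus an analyze/ directory); `describe_workspace` is a hypothetical helper, the keys inside meta.json are not assumed, and the file names in the demo are made up.

```python
import json
import tempfile
from pathlib import Path


def describe_workspace(workspace):
    """Summarize a workspace using only the documented layout."""
    root = Path(workspace)
    meta_path = root / "meta.json"
    # meta.json holds run metadata; its exact keys are not assumed here.
    meta = json.loads(meta_path.read_text()) if meta_path.is_file() else {}
    analyze_dir = root / "analyze"
    artifacts = []
    if analyze_dir.is_dir():
        artifacts = sorted(
            p.relative_to(analyze_dir).as_posix()
            for p in analyze_dir.rglob("*")
            if p.is_file()
        )
    return {"meta_keys": sorted(meta), "artifacts": artifacts}


# Demo on a throwaway directory that mimics the layout (contents are invented).
with tempfile.TemporaryDirectory() as tmp:
    ws = Path(tmp) / "patient_X_baseline"
    (ws / "analyze" / "figures").mkdir(parents=True)
    (ws / "records").mkdir()
    (ws / "meta.json").write_text(json.dumps({"args": {}, "versions": {}}))
    (ws / "analyze" / "summary.md").write_text("# summary\n")
    (ws / "analyze" / "figures" / "purity.png").write_bytes(b"")
    summary = describe_workspace(ws)

print(summary)
```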

Analysis stages

The reports follow these named stages:

Expression QC loads the input expression file, maps gene identifiers, checks TPM scale, and removes technical RNA to produce the clean TPM used downstream. Outputs: clean expression table and QC warnings.
RNA Prep and Preservation infers library prep, preservation, degradation, and assay caveats that affect confidence and expression interpretation. Outputs: prep/preservation calls, degradation flags, and widened uncertainty when needed.
Tissue Composition Screen compares the sample with normal tissues and cancer-expression references before the cancer-type call; it also adds tumor-evidence signals such as proliferation, CTA/oncofetal markers, and tumor-up markers. Outputs: healthy/tumor hint, top normal matches, and top cancer-reference matches.
Cancer-Type Evidence combines expression-reference matching, rare-marker/fusion evidence, exact local references, and registry relationships into one cancer-type call. Outputs: inferred cancer type, expression reference used for cohort math, and alternate hypotheses.
Tumor Purity and Coarse Composition estimates tumor fraction and broad non-tumor compartments such as immune, stromal, epithelial matched normal, and other background components. Outputs: purity interval and fitted compartment fractions.
Subtype and Background Refinements refines the coarse composition with activated background states such as CAF/TAM/Treg/MDSC and matched-normal compartments. Outputs: immune/stromal infiltration, subtype/background adjustments, and matched-normal splits used before target ranking.
Tumor-Attributed Expression subtracts fitted non-tumor signal and estimates how much observed expression is likely tumor-cell derived. Outputs: tumor-source TPM ranges, attribution flags, and confidence tiers.
Therapy Prioritization ranks actionable targets and pathway states using tumor-attributed expression, indication curation, antigen-presentation status, immune/background attribution, and pathway/treatment-state signals. Outputs: therapy shortlist, target tables, pathway/treatment-state evidence, and caveats.
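
To make the Tumor-Attributed Expression stage concrete, here is a sketch for a single gene under a simple linear mixing model: observed TPM = purity × tumor TPM + Σ (compartment fraction × compartment reference TPM). This model is swapped in for illustration and is not confirmed to be trufflepig's method; the function name, compartment names, and numbers are all hypothetical.

```python
import math


def tumor_attributed_tpm(observed_tpm, purity, fractions, reference_tpm):
    """Estimate tumor-cell TPM for one gene under a linear mixing assumption."""
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    if not math.isclose(purity + sum(fractions.values()), 1.0, abs_tol=1e-6):
        raise ValueError("purity and compartment fractions must sum to 1")
    # Expected contribution of the fitted non-tumor compartments.
    background = sum(frac * reference_tpm[name] for name, frac in fractions.items())
    # Clip at zero: background can exceed the observed signal for a gene.
    return max(observed_tpm - background, 0.0) / purity


# Hypothetical fit: 50% tumor, 30% immune, 20% stromal.
fractions = {"immune": 0.3, "stromal": 0.2}
reference_tpm = {"immune": 20.0, "stromal": 50.0}
estimate = tumor_attributed_tpm(100.0, 0.5, fractions, reference_tpm)
floor = tumor_attributed_tpm(10.0, 0.5, fractions, reference_tpm)
print(estimate, floor)
```

Under this model, one plausible way to obtain a range rather than a point estimate is to repeat the calculation with the fits at each end of the purity interval.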

Common pass-through flags: --hla-types, --fusions, --alterations, --alignment-qc, --sample-mode, --tumor-context, --site-hint, --met-site, --decomposition-templates, --output-image-prefix, --sample-id-col, --sample-id-value, --gene-id-col, --gene-name-col, --label-genes, --genes, --transcripts, --aggregate-gene-expression, --expression-qc-rescue, --therapy-target-top-k, --therapy-target-tpm-threshold, --force. All have the same meaning as in the old pirlygenes analyze.

Multi-sample (longitudinal)

trufflepig compare \
    --workspace out/patient_X_longitudinal \
    --inputs out/patient_X_baseline,out/patient_X_relapse \
    --title "Patient X — baseline vs relapse"

--inputs accepts both trufflepig workspaces (auto-descends to analyze/) and legacy pirlygenes output directories.
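
The auto-descend behaviour can be pictured as follows. This is a sketch, assuming a trufflepig workspace is recognized simply by the presence of an analyze/ subdirectory; the real detection rule may differ, and both function names are hypothetical.

```python
import tempfile
from pathlib import Path


def resolve_compare_input(path):
    """Return the directory holding analyze output for one --inputs entry."""
    root = Path(path)
    analyze_dir = root / "analyze"
    # Assumption: workspaces contain analyze/; legacy output dirs do not.
    return analyze_dir if analyze_dir.is_dir() else root


def parse_inputs(value):
    """Split a comma-separated --inputs value and resolve each entry."""
    return [resolve_compare_input(item.strip()) for item in value.split(",") if item.strip()]


with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp) / "patient_X_baseline"
    (workspace / "analyze").mkdir(parents=True)
    legacy = Path(tmp) / "legacy_output"
    legacy.mkdir()
    resolved = parse_inputs(f"{workspace},{legacy}")
    resolved_names = [p.relative_to(tmp).as_posix() for p in resolved]

print(resolved_names)
```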

Reference / cohort introspection

trufflepig data            # list bundled gene-set CSVs and TCGA cohorts
trufflepig cancers         # browse the cancer-type registry
trufflepig cancers --family sarcoma --details
trufflepig plot-cancer-cohorts --output-prefix /tmp/cohort

Expression references use one contract internally:

All analysis references are clean TPM. Raw TPM is only used in the early expression-QC stage.
Direct references keep their gene key explicit: pirlygenes observed cohorts and pan-cancer references are keyed by Ensembl ID + symbol; trufflepig subtype-deconvolved references are symbol-only because the source deconvolution artifacts are symbol-level.
Cancer-type context distinguishes the cancer label from the expression reference. If a registry code has no exact expression cohort, trufflepig records the compatible parent, curated, or family fallback used for cohort math.
The registry-completeness tests require every cancer type to have an effective expression reference and verify the normalization/gene-key contract for those references.
Cancer types without a direct expression cohort also carry a compact, literature-backed RNA signature tied to the related reference context they fall back to. These signatures can add marker evidence, but they are not treated as replacement expression cohorts.
Every registry tumor type is placed in a small ontology record with its parent/family, effective expression reference, expected high RNA markers, and expected low contrast markers. Reports use those markers as a sanity check on the inferred cancer type; expected-low genes are review prompts, not standalone exclusions, because high values can come from immune, stromal, or mixed-lineage background.
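
The distinction between the cancer label and its effective expression reference can be sketched as a small lookup. Everything here is illustrative: the record type, the function, the placeholder codes, and the precedence order (exact, then parent, then curated, then family) are assumptions rather than trufflepig's actual registry API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceContext:
    cancer_label: str          # registry code the sample was called as
    expression_reference: str  # cohort actually used for cohort math
    source: str                # "exact", "parent", "curated", or "family"


def resolve_reference(code, cohorts, parents, curated, families):
    """Pick the expression reference for a registry code (hypothetical)."""
    if code in cohorts:
        return ReferenceContext(code, code, "exact")
    # Assumed precedence among fallbacks; the real order is not documented here.
    for source, mapping in (("parent", parents), ("curated", curated), ("family", families)):
        candidate = mapping.get(code)
        if candidate in cohorts:
            return ReferenceContext(code, candidate, source)
    raise LookupError(f"no effective expression reference for {code!r}")


# Placeholder codes, not real registry entries.
cohorts = {"COHORT_A", "COHORT_B"}
parents = {"SUBTYPE_A1": "COHORT_A"}
curated = {"RARE_X": "COHORT_B"}
families = {"RARE_Y": "COHORT_B"}

exact = resolve_reference("COHORT_A", cohorts, parents, curated, families)
fallback = resolve_reference("SUBTYPE_A1", cohorts, parents, curated, families)
print(exact, fallback)
```

A completeness check in the spirit of the registry tests would call `resolve_reference` for every registry code and fail on any `LookupError`.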

Web UI

pip install 'pirl-trufflepig[web]'
trufflepig serve --port 8000
# open http://127.0.0.1:8000

Upload a TPM file or salmon quant in the browser, watch each pipeline stage stream back, and read the rendered summary.md / analysis.md / brief.md inline. Comparison runs work the same way — pick prior runs by ID. Each run writes a self-contained workspace under $TRUFFLEPIG_WEB_ROOT (default $HOME/trufflepig-web-runs).
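
The run-root convention can be mirrored in a script that needs to locate web runs. A minimal sketch, assuming an empty `TRUFFLEPIG_WEB_ROOT` is treated the same as an unset one; `web_root` is a hypothetical helper, not a trufflepig function.

```python
import os
from pathlib import Path


def web_root(environ=None):
    """Resolve the web-run root: $TRUFFLEPIG_WEB_ROOT or $HOME/trufflepig-web-runs."""
    env = os.environ if environ is None else environ
    configured = env.get("TRUFFLEPIG_WEB_ROOT")
    if configured:
        return Path(configured)
    # Fall back to the documented default under the home directory.
    home = env.get("HOME") or str(Path.home())
    return Path(home) / "trufflepig-web-runs"


custom = web_root({"TRUFFLEPIG_WEB_ROOT": "/data/runs"})
default = web_root({"HOME": "/home/user"})
print(custom, default)
```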

Pipeline DAG

trufflepig list-stages

The listed DAG is the post-migration target for trufflepig stage <name>. The top-level trufflepig run already executes the full pipeline; stage-level execution is wired in as each stage is extracted from the migrated codebase.

Layout

trufflepig/
  cli.py            # argparse entry exposed as the `trufflepig` console script
  main.py           # migrated analyze/compare_analyze + report assembly
  workspace.py      # workspace layout (meta.json + records/ + figures/)
  pipeline.py       # stage DAG (name -> upstream dependencies)
  analyze/          # data contracts shared with the migrated pipeline
  decomposition/    # compartment-fit engine + panels + plot helpers
  stages/           # one module per stage (post-extraction)
  load_expression.py, sample_context.py, tumor_purity.py,
  plot*.py, brief.py, confidence.py, ...   # the analysis code

Roadmap

Phase 1 — Subsume pirlygenes analyze ✅

Wire trufflepig run as a thin bridge to pirlygenes.cli.analyze (trufflepig#19)
Wire trufflepig compare as a thin bridge to pirlygenes.cli.compare_analyze
Mass-move analysis modules from pirlygenes to trufflepig (trufflepig#1). pirlygenes now ships data only.
Native trufflepig run / trufflepig compare dispatch — no bridge

Phase 2 — Per-stage extraction

Break the migrated analyze function into the stage DAG so a web UI can run and stream single stages:

load_expression — parse sample TPM TSV/CSV into a canonical frame (#2)
sample_context — infer library prep, preservation, degradation (#3)
analyze — cancer-type call + purity (#4)
decompose — compartment-level decomposition fit (#5)
ranges — per-target tumor-expression ranges + attribution (#6)
confidence — purity + per-target confidence tiers (#7)
render_targets, render_summary, render_analysis, render_provenance, render_brief (#8–#12)
bundle — figures into PDF + finalize meta.json (#13)
Per-stage record schema documentation (#14)
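
The stage list above maps naturally onto the "name -> upstream dependencies" shape that Layout describes for pipeline.py. The sketch below uses the stage names from this roadmap, but the dependency edges are guesses made for illustration, and `execution_order` and `upstream_of` are hypothetical helpers.

```python
from graphlib import TopologicalSorter

# Stage names come from the roadmap; the edges are assumed, not confirmed.
STAGES = {
    "load_expression": [],
    "sample_context": ["load_expression"],
    "analyze": ["load_expression", "sample_context"],
    "decompose": ["analyze"],
    "ranges": ["decompose"],
    "confidence": ["ranges"],
    "render_targets": ["confidence"],
    "render_summary": ["confidence"],
    "render_analysis": ["confidence"],
    "render_provenance": ["confidence"],
    "render_brief": ["confidence"],
    "bundle": [
        "render_targets", "render_summary", "render_analysis",
        "render_provenance", "render_brief",
    ],
}


def execution_order(stages):
    """Return one valid run order in which every stage follows its upstreams."""
    return list(TopologicalSorter(stages).static_order())


def upstream_of(stage, stages):
    """Collect every stage that must run before `stage` (transitively)."""
    seen = set()
    stack = list(stages[stage])
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(stages[name])
    return seen


order = execution_order(STAGES)
needed = upstream_of("decompose", STAGES)
print(order)
print(sorted(needed))
```

With a mapping like this, running a single stage for a web request only requires the records of the stages returned by `upstream_of`.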

Phase 3 — Multi-sample / longitudinal

trufflepig compare runs today. The richer layer still to come:

Explicit delta tables — cancer-call shifts, purity drift, target gains/losses, MHC/HLA changes, immune / IFN / hypoxia / EMT / therapy-response axis movement, assay/library differences that limit comparability (extension of pirlygenes#230)
Cohort-level comparisons (browse N samples with the same cancer type; surface outlier targets)
Patient-level provenance graph linking baseline → progression samples

Phase 4 — Web UI

A single-page web frontend so a user can drop in a TPM or salmon quant, watch each stage stream back, and download the rendered markdown / PDF.

FastAPI app + browser UI (trufflepig serve) with file upload, background analyze, server-sent-events progress stream, inline rendered reports, and longitudinal comparison launcher (#16)
Streaming progress + per-stage output hooks (SSE stream of analyze stdout) (#15)
Reference-data layout for lazy-load from R2/S3 with browser cache (#18)
Pyensembl-free gene resolution (HGNC CSV dict lookup) for fast cold-start in serverless / browser contexts (#17)
Auth + workspace persistence so a user can return to a prior run
Production deploy target (serverless) replacing the local subprocess runner with a remote-job submission
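
The server-sent-events progress stream mentioned above follows the standard text/event-stream framing: optional `event:` line, one `data:` line per line of payload, and a blank line to end the event. A minimal sketch of that framing for lines of stdout; the event name `progress` and both function names are assumptions, not trufflepig's actual wire format.

```python
def sse_event(data, event=None):
    """Format one server-sent event according to the text/event-stream rules."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # Multi-line payloads need one data: field per line.
    for part in data.splitlines() or [""]:
        lines.append(f"data: {part}")
    return "\n".join(lines) + "\n\n"


def stream_stdout(lines, event="progress"):
    """Turn an iterable of stdout lines into SSE frames (hypothetical helper)."""
    for line in lines:
        yield sse_event(line.rstrip("\n"), event=event)


frames = list(stream_stdout(["loading expression\n", "fitting decomposition\n"]))
print("".join(frames), end="")
```

A generator like this can be handed to a streaming HTTP response with media type `text/event-stream`.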

Non-goals

No JSON mirror of the markdown reports — the rendered markdown has named human audiences; a JSON mirror would have no real consumer.
No change to the gene-set data in pirlygenes.

Local-report regeneration

Researcher workflow: replay a private manifest of analyses on local samples and write outputs outside the repo:

python scripts/regenerate_local_reports.py \
    --source /path/to/pirlygenes/local_reports/<run>/manifest.json \
    --root ~/trufflepig-local-reports/<stamp>

The script refuses to write inside the repo. The default --root is $HOME/trufflepig-local-reports/<timestamp>/.
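
A guard of this kind can be written by resolving both paths and checking containment. This is a sketch of the idea only; `ensure_outside_repo` is a hypothetical name and the script's actual check may be implemented differently.

```python
import tempfile
from pathlib import Path


def ensure_outside_repo(root, repo_root):
    """Return the resolved output root, or raise if it lies inside the repo."""
    root_resolved = Path(root).expanduser().resolve()
    repo_resolved = Path(repo_root).expanduser().resolve()
    try:
        # relative_to succeeds only when root is the repo or a descendant of it.
        root_resolved.relative_to(repo_resolved)
    except ValueError:
        return root_resolved
    raise ValueError(f"refusing to write inside the repo: {root_resolved}")


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "repo"
    repo.mkdir()
    outside = ensure_outside_repo(Path(tmp) / "reports", repo)
    accepted = outside.name == "reports"
    try:
        ensure_outside_repo(repo / "local_reports", repo)
        refused = False
    except ValueError:
        refused = True

print(accepted, refused)
```

Resolving before comparing matters because a relative path or a symlink could otherwise point back into the repo unnoticed.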

License

Apache 2.0 — see LICENSE.

Release history

1.15.0 (Jul 1, 2026), this version
1.14.2 (Jun 23, 2026)
1.14.1 (Jun 22, 2026)
1.14.0 (Jun 22, 2026)
1.12.0 (Jun 12, 2026)
1.11.4 (Jun 11, 2026)
1.11.3 (Jun 10, 2026)
1.11.2 (Jun 10, 2026)
1.11.1 (Jun 10, 2026)
1.11.0 (Jun 9, 2026)
1.10.2 (Jun 9, 2026)
1.10.1 (Jun 9, 2026)
1.3.0 (May 15, 2026)
1.2.0 (May 14, 2026)
1.1.0 (May 14, 2026)
1.0.0 (May 14, 2026)
