Transform enrichment outputs into verifiable, auditable pathway claims with calibrated abstention.


LLM-PathwayCurator

Enrichment interpretations → audited, decision-grade pathway claims.



🚀 What this is

LLM-PathwayCurator is an interpretation QA layer for enrichment analysis (EA).
It does not introduce a new enrichment statistic. Instead, it turns EA outputs into auditable decision objects:

  • Input: enrichment term lists (ORA, fgsea/GSEA, etc.)
  • Output: typed, evidence-linked claims + PASS/ABSTAIN/FAIL decisions + reason-coded audit logs
  • Promise: we abstain when claims are unstable, under-supported, contradictory, or context-nonspecific

Selective prediction for pathway interpretation: calibrated abstention is a feature, not a failure.

Fig. 1a. Overview of LLM-PathwayCurator: EvidenceTable → modules → claims → audits (bioRxiv preprint)

🧭 Why this is different (and why it matters)

Enrichment tools return ranked term lists. In practice, interpretation breaks because:

  1. Representative terms are ambiguous under study context
  2. Gene support is opaque, enabling cherry-picking
  3. Related terms share or bridge evidence in non-obvious ways
  4. There is no mechanical stop condition for fragile narratives

LLM-PathwayCurator replaces narrative endorsement with audit-gated decisions.
We transform ranked terms into machine-auditable claims by enforcing:

  • Evidence-linked constraints: claims must resolve to valid term/module identifiers and supporting-gene evidence
  • Stability audits: supporting-gene perturbations yield stability proxies (operating point: τ)
  • Context validity stress tests: context swap reveals context dependence without external knowledge
  • Contradiction checks: internally inconsistent claims fail mechanically
  • Reason-coded outcomes: every decision is explainable by a finite audit code set

🔍 What this is not

  • Not an enrichment method; it audits enrichment outputs.
  • Not a free-text summarizer; claims are schema-bounded (typed JSON; no narrative prose as “evidence”).
  • Not a biological truth oracle; it checks internal consistency and evidence integrity, not mechanistic truth.

🧩 Core pipeline (A → B → C)

A) Stability distillation (evidence hygiene)
Perturb supporting genes (seeded) to compute stability proxies (e.g., LOO/jackknife-like survival scores).
Output: distilled.tsv
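The kind of stability proxy described above can be sketched as a leave-one-out survival score: drop each supporting gene in turn and record how often the term still clears a score threshold. This is an illustrative toy (the scoring function, threshold, and function name here are assumptions, not the shipped implementation):

```python
def jackknife_survival(genes, score_fn, threshold):
    """Leave-one-out stability proxy: the fraction of single-gene
    deletions under which the term still scores above threshold."""
    survived = 0
    for i in range(len(genes)):
        subset = genes[:i] + genes[i + 1:]
        if score_fn(subset) >= threshold:
            survived += 1
    return survived / len(genes)

# Toy score: fraction of the subset overlapping a "signal" gene set.
signal = {"TP53", "BRCA1", "ATM"}
score = lambda gs: len(signal.intersection(gs)) / max(len(gs), 1)

genes = ["TP53", "BRCA1", "ATM", "EGFR"]
print(jackknife_survival(genes, score, threshold=0.7))  # → 0.25
```

A term whose survival fraction falls below the operating point τ would be a candidate for ABSTAIN rather than PASS.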

B) Evidence factorization (modules)
Factorize the term–gene bipartite graph into evidence modules that preserve shared vs distinct support.
Outputs: modules.tsv, term_modules.tsv, term_gene_edges.tsv
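One way to picture the factorization step is as grouping terms through the genes they share in the term–gene bipartite graph. The sketch below uses plain connected components, which is a simplification: the actual factorization also preserves shared-vs-distinct support, and the function and identifiers here are illustrative only.

```python
from collections import defaultdict

def connected_modules(edges):
    """Group terms into modules via connected components of the
    term-gene bipartite graph (illustrative simplification)."""
    adj = defaultdict(set)
    for term, gene in edges:
        adj[("t", term)].add(("g", gene))
        adj[("g", gene)].add(("t", term))
    seen, modules = set(), []
    for node in adj:
        if node in seen or node[0] != "t":
            continue
        stack, comp = [node], set()
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            if n[0] == "t":
                comp.add(n[1])
            stack.extend(adj[n] - seen)
        modules.append(sorted(comp))
    return modules

edges = [("GO:0006977", "TP53"), ("GO:0042771", "TP53"),
         ("GO:0006979", "SOD1")]
print(connected_modules(edges))
# → [['GO:0006977', 'GO:0042771'], ['GO:0006979']]
```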

C) Claims → audit → report

  • C1 (proposal-only): deterministic baseline or optional LLM proposes typed claims with resolvable evidence links
  • C2 (audit/decider): mechanical rules assign PASS/ABSTAIN/FAIL with precedence (FAIL > ABSTAIN > PASS)
  • C3 (report): decision-grade report + audit log + provenance

⚡ Quick start (library entrypoint)

llm-pathway-curator run \
  --sample-card examples/demo/sample_card.json \
  --evidence-table examples/demo/evidence_table.tsv \
  --out out/demo/

Key outputs (stable contract)

  • audit_log.tsv — PASS/ABSTAIN/FAIL + reason codes (mechanical)
  • report.jsonl, report.md — decision objects (evidence-linked)
  • claims.proposed.tsv — proposed candidates (proposal-only; auditable)
  • distilled.tsv — stability proxies / evidence hygiene outputs
  • modules.tsv, term_modules.tsv, term_gene_edges.tsv — evidence structure
  • run_meta.json (+ optional manifest.json) — pinned params + provenance
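Because the outputs are plain TSV, downstream filtering needs nothing beyond the standard library. The snippet below parses a hypothetical audit_log.tsv excerpt; the column names are illustrative assumptions, so consult the header of your actual run output:

```python
import csv
import io

# Hypothetical audit_log.tsv excerpt; column and reason-code names
# are illustrative, not the pinned schema.
tsv = """claim_id\tdecision\treason_code
c001\tPASS\tOK
c002\tABSTAIN\tUNSTABLE_UNDER_DROPOUT
c003\tFAIL\tEVIDENCE_LINK_DRIFT
"""

rows = list(csv.DictReader(io.StringIO(tsv), delimiter="\t"))
passed = [r["claim_id"] for r in rows if r["decision"] == "PASS"]
print(passed)  # → ['c001']
```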

📊 Rank & visualize ranked terms (rank / plot-ranked)

LLM-PathwayCurator includes two small post-processing commands for ranking and publication-ready visualization of ranked terms/modules:

  • llm-pathway-curator rank — produces a ranked table (claims_ranked.tsv) for downstream plots and summaries.
  • llm-pathway-curator plot-ranked — renders ranked terms/modules as either:
    • bars (Metascape-like horizontal bars), or
    • packed circles (module-level circle packing with term circles inside).

A) Rank (produce claims_ranked.tsv)

Use rank to generate a deterministic ranked table from a run output directory.

llm-pathway-curator rank --help
# Typical workflow: point rank to a run directory and write claims_ranked.tsv
# (See --help for the exact flags supported by your installed version.)

B) Plot (bars or packed circles)

plot-ranked auto-detects claims_ranked.tsv (recommended) or falls back to audit_log.tsv under --run-dir.

Packed circles require an extra dependency: python -m pip install circlify

Bars (Metascape-like)

llm-pathway-curator plot-ranked \
  --mode bars \
  --run-dir out/demo \
  --out-png out/demo/plots/ranked_bars.png \
  --decision PASS \
  --group-by-module \
  --left-strip \
  --strip-labels \
  --bar-color-mode module

Packed circles (modules → terms)

llm-pathway-curator plot-ranked \
  --mode packed \
  --run-dir out/demo \
  --out-png out/demo/plots/ranked_packed.png \
  --decision PASS \
  --term-color-mode module

Packed circles (direction shading)

llm-pathway-curator plot-ranked \
  --mode packed \
  --run-dir out/demo \
  --out-png out/demo/plots/ranked_packed.direction.png \
  --decision PASS \
  --term-color-mode direction

Consistent module labels/colors across plots

plot-ranked assigns a single module display rank (M01, M02, ...) and a stable module color per module_id, so bars and packed circles can be placed side-by-side without label/color drift.
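A stable per-module color can be obtained by deriving the palette index deterministically from module_id, so repeated plots agree without shared state. This is an illustrative sketch of the idea, not the tool's actual assignment scheme; the palette and function name are assumptions:

```python
import hashlib

PALETTE = ["#1b9e77", "#d95f02", "#7570b3", "#e7298a"]

def stable_color(module_id):
    """Deterministic palette pick from module_id, so the same module
    gets the same color in every plot (illustrative sketch)."""
    digest = hashlib.sha256(module_id.encode()).hexdigest()
    return PALETTE[int(digest, 16) % len(PALETTE)]

assert stable_color("mod_a") == stable_color("mod_a")
```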


⚖️ Inputs (contracts)

EvidenceTable (minimum required columns)

Each row is one enriched term.

Required columns:

  • term_id, term_name, source
  • stat, qval, direction
  • evidence_genes (supporting genes; joined with ; in TSV)
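The contract above is small enough to check by hand. A minimal validation sketch for one row, assuming the documented columns and the ;-joined gene list (the shipped adapters do stricter checking; parse_row is a hypothetical helper):

```python
REQUIRED = ["term_id", "term_name", "source", "stat", "qval",
            "direction", "evidence_genes"]

def parse_row(row):
    """Validate one EvidenceTable row and split the ;-joined gene
    list (minimal sketch of the documented contract)."""
    missing = [c for c in REQUIRED if not row.get(c)]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    genes = row["evidence_genes"].split(";")
    return {**row, "evidence_genes": genes, "qval": float(row["qval"])}

row = {"term_id": "GO:0006977", "term_name": "DNA damage response",
       "source": "GO:BP", "stat": "2.1", "qval": "0.003",
       "direction": "up", "evidence_genes": "TP53;ATM;CHEK2"}
print(parse_row(row)["evidence_genes"])  # → ['TP53', 'ATM', 'CHEK2']
```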

Sample Card (study context)

Structured context record used for proposal and context gating, e.g.:

  • condition/disease, tissue, perturbation, comparison

Adapters for common tools live under src/llm_pathway_curator/adapters/.


🔧 Adapters (Input → EvidenceTable)

Adapters are intentionally conservative:

  • preserve evidence identity (term × genes)
  • avoid destructive parsing
  • keep TSV round-trips stable (contract drift is treated as a bug)

See: src/llm_pathway_curator/adapters/README.md


🛡️ Decisions: PASS / ABSTAIN / FAIL

LLM-PathwayCurator assigns decisions by mechanical audit gates:

  • FAIL: auditable violations (evidence-link drift, schema violations, contradictions, forbidden fields, etc.)
  • ABSTAIN: non-specific, under-supported, or unstable under perturbations / stress tests
  • PASS: survives all enabled gates at the chosen operating point (τ)

Important: the LLM (if enabled) never decides acceptance. It may propose candidates; the audit suite is the decider.
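The precedence rule (FAIL > ABSTAIN > PASS) means the final decision is simply the worst outcome across all enabled gates. A sketch of that fold, with hypothetical gate outcomes:

```python
PRECEDENCE = {"FAIL": 0, "ABSTAIN": 1, "PASS": 2}

def combine(gate_outcomes):
    """Fold per-gate outcomes into one decision with precedence
    FAIL > ABSTAIN > PASS (sketch; gate outcomes are illustrative)."""
    return min(gate_outcomes, key=PRECEDENCE.__getitem__)

print(combine(["PASS", "ABSTAIN", "PASS"]))   # → ABSTAIN
print(combine(["PASS", "FAIL", "ABSTAIN"]))   # → FAIL
```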


🧪 Built-in stress tests (counterfactuals without external knowledge)

  • Context swap: shuffle study context (e.g., BRCA → LUAD) to test context dependence
  • Evidence dropout: randomly remove supporting genes (seeded; min_keep enforced)
  • Contradiction injection (optional): introduce internally contradictory candidates to test FAIL gates

These are specification-driven perturbations intended to validate that the pipeline abstains for the right reasons, with stress-specific reason codes.
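Evidence dropout as described above can be sketched in a few lines: a seeded random sample that removes a fraction of supporting genes while never going below min_keep. The function and parameter names here are assumptions for illustration, not the pipeline's API:

```python
import random

def evidence_dropout(genes, drop_frac, min_keep, seed=0):
    """Seeded evidence-dropout stress test: remove a fraction of
    supporting genes while always keeping at least min_keep
    (illustrative sketch of the documented behaviour)."""
    rng = random.Random(seed)  # seeded for reproducibility
    n_keep = max(min_keep, round(len(genes) * (1 - drop_frac)))
    kept = rng.sample(genes, k=min(n_keep, len(genes)))
    return sorted(kept)

genes = ["TP53", "ATM", "CHEK2", "BRCA1", "MDM2"]
print(evidence_dropout(genes, drop_frac=0.4, min_keep=2, seed=42))
```

Re-running with the same seed returns the same subset, which is what lets the audit attach a stable reason code to an unstable claim.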


♻️ Reproducibility by default

LLM-PathwayCurator is deterministic by default:

  • fixed seeds (CLI + library defaults)
  • pinned parsing + hashing utilities
  • stable output schemas and reason codes
  • run metadata persisted to run_meta.json (and runner-level manifest.json when used)

Paper-side runners (e.g., paper/scripts/run_fig2_pipeline.py) orchestrate reproducible sweeps and do not implement scientific logic; they call the library entrypoint (llm_pathway_curator.pipeline.run_pipeline).


📦 Installation

Option A: PyPI (recommended)

pip install llm-pathway-curator

Option B: From source (development)

git clone https://github.com/<ORG>/LLM-PathwayCurator.git
cd LLM-PathwayCurator
pip install -e .

🐳 Docker (recommended for reproducibility)

docker compose -f docker/docker-compose.yml up -d
docker compose -f docker/docker-compose.yml logs -f --tail=50 llm-pathway-curator

(If GHCR images are published)

docker pull ghcr.io/<ORG>/llm-pathway-curator:<TAG>

🤖 LLM usage (proposal-only; optional)

If enabled, the LLM is confined to proposal steps and must emit schema-bounded JSON with resolvable EvidenceTable links.

Backends (example):

  • OpenAI: OPENAI_API_KEY
  • Gemini: GEMINI_API_KEY
  • Ollama: LLMPATH_OLLAMA_HOST, LLMPATH_OLLAMA_MODEL

Typical environment:

export LLMPATH_BACKEND="openai"   # openai|gemini|ollama
export OPENAI_API_KEY="sk-..."

Deterministic settings are used by default (e.g., temperature=0), and runs persist prompt/raw/meta artifacts alongside run_meta.json.


📄 Manuscript reproduction

paper/ contains manuscript-facing scripts, Source Data exports, and frozen/derived artifacts (when redistributable).

  • paper/README.md — how to reproduce figures
  • paper/FIGURE_MAP.csv — canonical mapping: panel ↔ inputs ↔ scripts ↔ outputs

🧾 Citation

If you use LLM-PathwayCurator, please cite:

  • Preprint: Transforming enrichment terms into audit-gated decision-grade claims with LLM-PathwayCurator, bioRxiv (2026). DOI: (to be added)
  • Software: LLM-PathwayCurator (v0.1.0). Zenodo. DOI: (to be added)
