
Transform enrichment outputs into verifiable, auditable pathway claims with calibrated abstention.


LLM-PathwayCurator

Enrichment interpretations → audited, decision-grade pathway claims.



🚀 What this is

LLM-PathwayCurator is an interpretation quality-assurance (QA) layer for enrichment analysis.
It does not introduce a new enrichment statistic. Instead, it turns enrichment-analysis (EA) outputs into auditable decision objects:

  • Input: enrichment term lists from ORA (e.g., Metascape) or rank-based enrichment (e.g., fgsea, an implementation of the GSEA method)
  • Output: typed, evidence-linked claims + PASS/ABSTAIN/FAIL decisions + reason-coded audit logs
  • Promise: we abstain when claims are unstable, under-supported, contradictory, or context-nonspecific

Selective prediction for pathway interpretation: calibrated abstention is a feature, not a failure.

Fig. 1a. Overview of the LLM-PathwayCurator workflow: EvidenceTable → modules → claims → audits (bioRxiv preprint).


🧭 Why this is different (and why it matters)

Enrichment tools return ranked term lists. In practice, interpretation breaks because:

  1. Representative terms are ambiguous under study context
  2. Gene support is opaque, enabling cherry-picking
  3. Related terms share / bridge evidence in non-obvious ways
  4. There is no mechanical stop condition for fragile narratives

LLM-PathwayCurator replaces narrative endorsement with audit-gated decisions.
We transform ranked terms into machine-auditable claims by enforcing:

  • Evidence-linked constraints: claims must resolve to valid term/module identifiers and supporting-gene evidence
  • Stability audits: supporting-gene perturbations yield stability proxies (operating point: τ)
  • Context validity stress tests: context swap reveals context dependence without external knowledge
  • Contradiction checks: internally inconsistent claims fail mechanically
  • Reason-coded outcomes: every decision is explainable by a finite audit code set
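The evidence-linked constraint can be sketched as a small gate that maps a claim to audit codes. This is a minimal illustration with hypothetical names (`audit_evidence_links`, the `E_*` codes); the library ships its own finite code set.

```python
# Sketch of an evidence-link audit gate (names are illustrative, not the
# library's actual API or reason codes).
def audit_evidence_links(claim, evidence_table):
    """Return a list of audit codes; an empty list means the gate passes."""
    codes = []
    if claim["term_id"] not in evidence_table:
        codes.append("E_TERM_UNRESOLVED")  # claim cites an unknown term
    else:
        known = set(evidence_table[claim["term_id"]])
        missing = [g for g in claim["evidence_genes"] if g not in known]
        if missing:
            codes.append("E_GENE_DRIFT")  # cited genes absent from the table
    return codes

table = {"GO:0006955": ["TNF", "IL6", "CXCL8"]}
ok = {"term_id": "GO:0006955", "evidence_genes": ["TNF", "IL6"]}
bad = {"term_id": "GO:0006955", "evidence_genes": ["TP53"]}
print(audit_evidence_links(ok, table))   # []
print(audit_evidence_links(bad, table))  # ['E_GENE_DRIFT']
```

Because every outcome is a code rather than prose, decisions stay machine-checkable and explainable.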

🔍 What this is not

  • Not an enrichment method; it audits enrichment outputs.
  • Not a free-text summarizer; claims are schema-bounded (typed JSON; no narrative prose as “evidence”).
  • Not a biological truth oracle; it checks internal consistency and evidence integrity, not mechanistic truth.

🧩 Core pipeline (A → B → C)

A) Stability distillation (evidence hygiene)
Perturb supporting genes (seeded) to compute stability proxies (e.g., LOO/jackknife-like survival scores).
Output: distilled.tsv
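A leave-one-out survival score of the kind referenced above can be sketched as follows. The enrichment criterion here is a toy lambda, not the library's actual scoring or seeding.

```python
# Minimal leave-one-out (LOO) style stability proxy: the fraction of
# leave-one-out gene subsets for which the term remains enriched.
def loo_survival(genes, enriched_fn):
    if not genes:
        return 0.0
    hits = 0
    for i in range(len(genes)):
        subset = genes[:i] + genes[i + 1:]  # drop one supporting gene
        if enriched_fn(subset):
            hits += 1
    return hits / len(genes)

# Toy criterion: the term "survives" while at least 3 supporting genes remain.
genes = ["TNF", "IL6", "CXCL8", "IL1B"]
print(loo_survival(genes, lambda g: len(g) >= 3))  # 1.0
```

A term whose enrichment collapses when any single gene is removed would score near 0 and be a candidate for ABSTAIN at the chosen operating point τ.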

B) Evidence factorization (modules)
Factorize the term–gene bipartite graph into evidence modules that preserve shared vs distinct support.
Outputs: modules.tsv, term_modules.tsv, term_gene_edges.tsv
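One simple way to factorize a term–gene bipartite graph is by connected components, sketched below with a small union-find. This is illustrative only; the library's factorization additionally tracks shared vs distinct support within a module.

```python
# Group terms into modules: terms that share supporting genes land in the
# same connected component of the term-gene bipartite graph.
from collections import defaultdict

def modules_from_edges(term_genes):
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for term, genes in term_genes.items():
        for g in genes:
            union(("term", term), ("gene", g))

    groups = defaultdict(set)
    for term in term_genes:
        groups[find(("term", term))].add(term)
    return sorted(sorted(g) for g in groups.values())

edges = {"T1": ["A", "B"], "T2": ["B", "C"], "T3": ["D"]}
print(modules_from_edges(edges))  # [['T1', 'T2'], ['T3']]
```

T1 and T2 bridge through gene B, so they share a module; T3 has distinct support and stands alone.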

C) Claims → audit → report

  • C1 (proposal-only): deterministic baseline or optional LLM proposes typed claims with resolvable evidence links
  • C2 (audit/decider): mechanical rules assign PASS/ABSTAIN/FAIL with precedence (FAIL > ABSTAIN > PASS)
  • C3 (report): decision-grade report + audit log (audit_log.tsv) + provenance
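The C2 precedence rule (FAIL > ABSTAIN > PASS) reduces to a few lines; the code names below are illustrative.

```python
# Mechanical decider: FAIL beats ABSTAIN beats PASS. The LLM never runs this.
def decide(fail_codes, abstain_codes):
    if fail_codes:
        return "FAIL"
    if abstain_codes:
        return "ABSTAIN"
    return "PASS"

print(decide([], []))                               # PASS
print(decide([], ["A_UNSTABLE"]))                   # ABSTAIN
print(decide(["F_CONTRADICTION"], ["A_UNSTABLE"]))  # FAIL
```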

⚡ Quick start (library entrypoint)

llm-pathway-curator run \
  --sample-card examples/demo/sample_card.json \
  --evidence-table examples/demo/evidence_table.tsv \
  --out out/demo/

Key outputs (stable contract)

  • distilled.tsv, modules.tsv, term_modules.tsv, term_gene_edges.tsv
  • claims + audit_log.tsv (reason-coded decisions)
  • run_meta.json (run provenance)


📊 Rank & visualize ranked terms (rank / plot-ranked)

LLM-PathwayCurator includes two small post-processing commands for ranking and publication-ready visualization of ranked terms/modules:

  • llm-pathway-curator rank — produces a ranked table (claims_ranked.tsv) for downstream plots and summaries.
  • llm-pathway-curator plot-ranked — renders ranked terms/modules as either:
    • bars (Metascape-like horizontal bars), or
    • packed circles (module-level circle packing with term circles inside).

A) Rank (produce claims_ranked.tsv)

Use rank to generate a deterministic ranked table from a run output directory.

llm-pathway-curator rank --help
# Typical workflow: point rank to a run directory and write claims_ranked.tsv
# (See --help for the exact flags supported by your installed version.)

B) Plot (bars or packed circles)

plot-ranked auto-detects claims_ranked.tsv (recommended) or falls back to audit_log.tsv under --run-dir.

Packed circles require an extra dependency: python -m pip install circlify

Bars (Metascape-like)

llm-pathway-curator plot-ranked \
  --mode bars \
  --run-dir out/demo \
  --out-png out/demo/plots/ranked_bars.png \
  --decision PASS \
  --group-by-module \
  --left-strip \
  --strip-labels \
  --bar-color-mode module

Packed circles (modules → terms)

llm-pathway-curator plot-ranked \
  --mode packed \
  --run-dir out/demo \
  --out-png out/demo/plots/ranked_packed.png \
  --decision PASS \
  --term-color-mode module

Packed circles (direction shading)

llm-pathway-curator plot-ranked \
  --mode packed \
  --run-dir out/demo \
  --out-png out/demo/plots/ranked_packed.direction.png \
  --decision PASS \
  --term-color-mode direction

Consistent module labels/colors across plots

plot-ranked assigns a single module display rank (M01, M02, ...) and a stable module color per module_id, so bars and packed circles can be placed side-by-side without label/color drift.
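Drift-free styling of this kind can be sketched as a deterministic mapping from `module_id` to a display rank and a color. The function and palette below are assumptions for illustration, not the library's actual scheme.

```python
# Derive a stable (rank, color) pair per module_id so repeated plots of the
# same run never swap labels or colors.
import hashlib

PALETTE = ["#1b9e77", "#d95f02", "#7570b3", "#e7298a", "#66a61e"]

def module_style(module_ids):
    styles = {}
    for rank, mid in enumerate(sorted(set(module_ids)), start=1):
        digest = int(hashlib.sha256(mid.encode()).hexdigest(), 16)
        styles[mid] = (f"M{rank:02d}", PALETTE[digest % len(PALETTE)])
    return styles

styles = module_style(["mod_b", "mod_a", "mod_b"])
print(styles["mod_a"][0])  # M01
```

Hashing the identifier (rather than enumeration order of the input) keeps the color fixed even when the set of plotted modules changes.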


⚖️ Inputs (contracts)

EvidenceTable (minimum required columns)

Each row is one enriched term.

Required columns:

  • term_id, term_name, source
  • stat, qval, direction
  • evidence_genes (supporting genes; TSV uses ; join)
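A minimal check of one EvidenceTable row against the required columns above, using only the stdlib. The sample row is fabricated for illustration; the library's loader enforces the full contract.

```python
# Validate an EvidenceTable TSV row: all required columns present, and
# evidence_genes joined with ';' as the contract specifies.
import csv
import io

REQUIRED = {"term_id", "term_name", "source", "stat", "qval",
            "direction", "evidence_genes"}

tsv = ("term_id\tterm_name\tsource\tstat\tqval\tdirection\tevidence_genes\n"
       "GO:0006955\timmune response\tGO_BP\t2.1\t0.003\tup\tTNF;IL6;CXCL8\n")

row = next(csv.DictReader(io.StringIO(tsv), delimiter="\t"))
missing = REQUIRED - set(row)
assert not missing, f"missing columns: {missing}"
genes = row["evidence_genes"].split(";")
print(genes)  # ['TNF', 'IL6', 'CXCL8']
```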

Sample Card (study context)

Structured context record used for proposal and context gating, e.g.:

  • condition/disease, tissue, perturbation, comparison
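An illustrative Sample Card following the bullets above; the exact field names and schema may differ in your installed version.

```python
# A structured study-context record used for proposal and context gating.
import json

sample_card = {
    "condition": "BRCA",
    "tissue": "breast",
    "perturbation": "anti-PD-1",
    "comparison": "responder_vs_nonresponder",
}
print(json.dumps(sample_card, indent=2))
```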

Adapters for common tools live under src/llm_pathway_curator/adapters/.


🔧 Adapters (Input → EvidenceTable)

Adapters are intentionally conservative:

  • preserve evidence identity (term × genes)
  • avoid destructive parsing
  • keep TSV round-trips stable (contract drift is treated as a bug)

See: src/llm_pathway_curator/adapters/README.md
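The round-trip invariant adapters must keep can be expressed directly: writing an EvidenceTable to TSV and reading it back yields identical rows. A minimal stdlib sketch:

```python
# Write rows to TSV, read them back, and require exact equality; any
# divergence would count as contract drift (a bug).
import csv
import io

rows = [{"term_id": "GO:0006955", "evidence_genes": "TNF;IL6"}]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["term_id", "evidence_genes"],
                        delimiter="\t", lineterminator="\n")
writer.writeheader()
writer.writerows(rows)

reread = list(csv.DictReader(io.StringIO(buf.getvalue()), delimiter="\t"))
assert reread == rows
print("round-trip stable")
```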


🛡️ Decisions: PASS / ABSTAIN / FAIL

LLM-PathwayCurator assigns decisions by mechanical audit gates:

  • FAIL: auditable violations (evidence-link drift, schema violations, contradictions, forbidden fields, etc.)
  • ABSTAIN: non-specific, under-supported, or unstable under perturbations / stress tests
  • PASS: survives all enabled gates at the chosen operating point (τ)

Important: the LLM (if enabled) never decides acceptance. It may propose candidates; the audit suite is the decider.


🧪 Built-in stress tests (counterfactuals without external knowledge)

  • Context swap: shuffle study context (e.g., BRCA → LUAD) to test context dependence
  • Evidence dropout: randomly remove supporting genes (seeded; min_keep enforced)
  • Contradiction injection (optional): introduce internally contradictory candidates to test FAIL gates

These are specification-driven perturbations intended to validate that the pipeline abstains for the right reasons, with stress-specific reason codes.
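The context-swap perturbation can be sketched as a seeded replacement of the Sample Card condition; `context_swap` and its arguments are illustrative names, not the library's API.

```python
# Seeded context swap: replace the study condition with a different one from
# a pool (e.g., BRCA -> LUAD), then re-audit. Context-dependent claims
# should flip to ABSTAIN with a stress-specific reason code.
import random

def context_swap(sample_card, pool, seed=0):
    rng = random.Random(seed)  # seeded, per the reproducibility contract
    alternatives = [c for c in pool if c != sample_card["condition"]]
    swapped = dict(sample_card)
    swapped["condition"] = rng.choice(alternatives)
    return swapped

card = {"condition": "BRCA", "tissue": "breast"}
swapped = context_swap(card, ["BRCA", "LUAD", "COAD"])
print(swapped["condition"] != "BRCA")  # True
```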


♻️ Reproducibility by default

LLM-PathwayCurator is deterministic by default:

  • fixed seeds (CLI + library defaults)
  • pinned parsing + hashing utilities
  • stable output schemas and reason codes
  • run metadata persisted to run_meta.json (and runner-level manifest.json when used)

Paper-side runners (e.g., paper/scripts/fig2_run_pipeline.py) orchestrate reproducible sweeps and do not implement scientific logic; they call the library entrypoint (llm_pathway_curator.pipeline.run_pipeline).
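The kind of provenance persisted per run can be sketched as a deterministic digest of the run inputs; the field names below are illustrative, not the actual run_meta.json schema.

```python
# Deterministic run metadata: same seed + same inputs -> identical record,
# so a run can be matched to its provenance byte-for-byte.
import hashlib
import json

def run_meta(seed, inputs):
    payload = json.dumps(inputs, sort_keys=True).encode()
    return {"seed": seed, "input_sha256": hashlib.sha256(payload).hexdigest()}

meta = run_meta(0, {"evidence_table": "evidence_table.tsv"})
print(sorted(meta))  # ['input_sha256', 'seed']
```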


📦 Installation


Option A: PyPI (recommended)

pip install llm-pathway-curator

(See PyPI project page: https://pypi.org/project/llm-pathway-curator/)

Option B: From source (development)

git clone https://github.com/kenflab/LLM-PathwayCurator.git
cd LLM-PathwayCurator
pip install -e .

🐳 Docker (recommended for reproducibility)

We provide an official Docker environment (Python + R + Jupyter), sufficient to run LLM-PathwayCurator and most paper figure generation.
Optionally includes Ollama for local LLM annotation (no cloud API key required).

  • Option A: Prebuilt image (recommended)

    Use the published image from GitHub Container Registry (GHCR).

    # pull the published image
    docker pull ghcr.io/kenflab/llm-pathway-curator:official
    

    Run Jupyter:

    docker run --rm -it \
      -p 8888:8888 \
      -v "$PWD":/work \
      -e GEMINI_API_KEY \
      -e OPENAI_API_KEY \
      ghcr.io/kenflab/llm-pathway-curator:official
    

    Open Jupyter: http://localhost:8888

    (Use the token printed in the container logs.)

    Note: for manuscript reproducibility, we also provide versioned tags (e.g., :0.1.0). Prefer a version tag when matching a paper release.

  • Option B: Build locally (development)

    • Option B-1: Build locally with Compose (recommended for dev)
      # from the repo root
      docker compose -f docker/docker-compose.yml build
      docker compose -f docker/docker-compose.yml up
      

      B-1.1) Open Jupyter: http://localhost:8888

      B-1.2) If prompted for "Password or token"

      • Get the tokenized URL from container logs:
        docker compose -f docker/docker-compose.yml logs -f llm-pathway-curator
        
      • Then either:
        • open the printed URL (contains ?token=...) in your browser, or
        • paste the token value into the login prompt.
    • Option B-2: Build locally without Compose (alternative)
      # from the repo root
      docker build -f docker/Dockerfile -t llm-pathway-curator:official .
      

      B-2.1) Run Jupyter

      docker run --rm -it \
        -p 8888:8888 \
        -v "$PWD":/work \
        -e GEMINI_API_KEY \
        -e OPENAI_API_KEY \
        llm-pathway-curator:official
      

      B-2.2) Open Jupyter: http://localhost:8888


🖥️ Apptainer / Singularity (HPC)

  • Option A: Prebuilt image (recommended)

    Use the published image from GitHub Container Registry (GHCR).

    apptainer build llm-pathway-curator.sif docker://ghcr.io/kenflab/llm-pathway-curator:official
    
  • Option B: Build a .sif from the locally built Docker image (development)

    docker compose -f docker/docker-compose.yml build
    apptainer build llm-pathway-curator.sif docker-daemon://llm-pathway-curator:official
    

Run Jupyter (either image):

apptainer exec --cleanenv \
  --bind "$PWD":/work \
  llm-pathway-curator.sif \
  bash -lc 'jupyter lab --ip=0.0.0.0 --port=8888 --no-browser'

🤖 LLM usage (proposal-only; optional)

If enabled, the LLM is confined to proposal steps and must emit schema-bounded JSON with resolvable EvidenceTable links.

Backends (examples):

  • Ollama: LLMPATH_OLLAMA_HOST, LLMPATH_OLLAMA_MODEL
  • Gemini: GEMINI_API_KEY
  • OpenAI: OPENAI_API_KEY

Typical environment:

export LLMPATH_BACKEND="ollama"   # ollama|gemini|openai

Deterministic settings are used by default (e.g., temperature=0), and runs persist prompt/raw/meta artifacts alongside run_meta.json.
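The "schema-bounded JSON with resolvable links" constraint can be sketched as a gate that rejects any proposal that is not valid JSON, carries fields outside the schema, or cites an unknown term. The function and allowed keys below are assumptions for illustration.

```python
# Reject LLM proposals that are not schema-bounded JSON with resolvable
# EvidenceTable links (illustrative names, not the library's schema).
import json

ALLOWED_KEYS = {"term_id", "claim_type", "evidence_genes"}

def accept_proposal(raw, known_terms):
    try:
        claim = json.loads(raw)
    except json.JSONDecodeError:
        return False  # free text is not evidence
    if set(claim) - ALLOWED_KEYS:
        return False  # forbidden fields fail mechanically
    return claim.get("term_id") in known_terms

good = ('{"term_id": "GO:0006955", "claim_type": "direction", '
        '"evidence_genes": ["TNF"]}')
print(accept_proposal(good, {"GO:0006955"}))                        # True
print(accept_proposal("The pathway is clearly activated.", set()))  # False
```

Note the asymmetry: the proposal step may be generative, but acceptance is purely mechanical.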


📄 Manuscript reproduction

paper/ contains manuscript-facing scripts, Source Data exports, and frozen/derived artifacts (when redistributable).


🧾 Citation

If you use LLM-PathwayCurator, please cite:



