ExposoGraph — LLM-powered extraction and manual curation of carcinogen metabolism pathways

ExposoGraph

Build, curate, and export carcinogen metabolism knowledge graphs using LLM-powered extraction and manual entry.

Part of the CarcinoGenomic Platform — a 5-layer computational pipeline for individualized carcinogen metabolism risk assessment from germline DNA.

Version: 0.0.3
Developed by: Data analysis team @ KaziLab
Contact: exposograph@kazilab.se
Copyright: KaziLab

Acknowledgement

Parts of this documentation and code were created with assistance from ChatGPT Codex and Claude Code.

Features

- Multi-LLM Extraction: Describe a carcinogen metabolism pathway in plain English; OpenAI (GPT-4o) or local Ollama models extract structured nodes and edges automatically
- Mode-Aware Ingestion: Use exploratory mode to keep provisional entities or strict mode to keep only canonically grounded content
- Manual Entry: Add and annotate nodes and edges with full provenance and curation tracking
- Graph Analysis: Shortest path, centrality, metabolism chain traversal, pathway subgraph, variant impact scoring
- Public DB Integration: KEGG pathway lookups, CTD chemical-gene interactions, IARC carcinogen classifications
- Interactive Preview: Color-coded Streamlit AGraph visualization with hover metadata, search/filter controls, zoom, and downloadable Plotly HTML exports
- Advanced Graph Viewer: Dash Cytoscape viewer with sidebar search, legends, carcinogen filters, detail panel, image export, and saved layout JSON
- Validated vs Exploratory Views: Filter the current graph to all, validated only, or exploratory only in the UI and export pipeline
- Rich Annotations: Structured provenance records, source manifests, curated KEGG pathway coverage, PubMed IDs, tissue context, pharmacogenomic variants, activity scores
- Multiple Export Formats: Standalone Plotly HTML, parseable app HTML, JSON, D3.js viewer (graph-data.js), GEXF (Gephi)
- Viewer Data Contract: Export a Cytoscape-ready JSON bundle and saved preset layout for richer web-style exploration without maintaining custom JavaScript
- Validation: Referential integrity checks at model level, dangling edge detection, carcinogen context validation
- Persistent Storage: SQLite-backed revision history with explicit export visibility tracking and atomic operations

Quick Start

Try Without an API Key

A pre-built Benzo[a]pyrene metabolism graph is included:

pip install -e .
python examples/build_bap_graph.py

This loads examples/bap_graph.json (20 nodes, 20 edges covering the full BaP → BPDE → DNA adduct pathway), runs graph analysis, and exports to HTML and JSON.

For a no-API-key demonstration of strict vs exploratory handling and validated-only exports:

python examples/mode_visibility_demo.py

Sample output

Loaded graph: 20 nodes, 20 edges

Shortest path CYP1A1 → BPDE-dG: CYP1A1 → BPDE → BPDE_dG

Top-5 nodes by degree centrality:
  CYP1A1        0.263
  BPDE          0.211
  CYP1B1        0.158
  BPDE_dG       0.158
  BPDE_GSH      0.158

BaP metabolism chain: 16 nodes, 13 edges
  Activation edges:    4
  Detoxification edges: 3
  Adduct edges:        1
  Repair edges:        3

Variant impact score for CYP1A1:
  Activity score:        1.0
  Downstream adducts:    1
  Impact score:          1.00
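
The degree centrality figures above are consistent with the usual normalisation, degree divided by (n - 1): in a 20-node graph, a node with 5 neighbours scores 5/19 ≈ 0.263 and one with 3 neighbours scores 3/19 ≈ 0.158. The following is a minimal standard-library sketch of that calculation, not ExposoGraph's own implementation; the node names `hub` and `n0` to `n18` are hypothetical, and edges are treated as undirected.

```python
from collections import defaultdict


def degree_centrality(nodes, edges):
    """Return degree / (n - 1) per node, treating edges as undirected."""
    neighbours = defaultdict(set)
    for source, target in edges:
        if source == target:
            continue  # ignore self-loops in this sketch
        neighbours[source].add(target)
        neighbours[target].add(source)
    denominator = max(len(nodes) - 1, 1)  # avoid dividing by zero for one node
    return {node: len(neighbours[node]) / denominator for node in nodes}


# Hypothetical 20-node graph: one hub linked to 5 of the 19 other nodes.
nodes = ["hub"] + [f"n{i}" for i in range(19)]
edges = [("hub", f"n{i}") for i in range(5)]

scores = degree_centrality(nodes, edges)
print(round(scores["hub"], 3))  # 5 / 19
print(round(scores["n0"], 3))   # 1 / 19
```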

Sample JSON

{
  "nodes": [
    {"id": "BaP", "label": "Benzo[a]pyrene", "type": "Carcinogen", "group": "PAH", "iarc": "Group 1"},
    {"id": "CYP1A1", "label": "CYP1A1", "type": "Enzyme", "phase": "I", "role": "Activation"},
    {"id": "BPDE", "label": "BPDE", "type": "Metabolite", "reactivity": "High"},
    {"id": "BPDE_dG", "label": "BPDE-N2-dG", "type": "DNA_Adduct"}
  ],
  "edges": [
    {"source": "CYP1A1", "target": "BPDE", "type": "ACTIVATES", "carcinogen": "BaP"},
    {"source": "BPDE", "target": "BPDE_dG", "type": "FORMS_ADDUCT", "carcinogen": "BaP"}
  ]
}
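
Because every edge refers to nodes by `id`, a file in this shape can be checked for dangling references before it is loaded. The sketch below uses only the standard library and a trimmed copy of the sample above; `find_dangling_edges` is a hypothetical helper, not part of the ExposoGraph API, and it assumes that an edge's `carcinogen` field names a node `id`, as it does in the sample.

```python
import json

SAMPLE = """
{
  "nodes": [
    {"id": "BaP", "type": "Carcinogen"},
    {"id": "CYP1A1", "type": "Enzyme"},
    {"id": "BPDE", "type": "Metabolite"},
    {"id": "BPDE_dG", "type": "DNA_Adduct"}
  ],
  "edges": [
    {"source": "CYP1A1", "target": "BPDE", "type": "ACTIVATES", "carcinogen": "BaP"},
    {"source": "BPDE", "target": "BPDE_dG", "type": "FORMS_ADDUCT", "carcinogen": "BaP"}
  ]
}
"""


def find_dangling_edges(graph):
    """Return edges whose source, target, or carcinogen is not a known node id."""
    node_ids = {node["id"] for node in graph["nodes"]}
    dangling = []
    for edge in graph["edges"]:
        referenced = [edge["source"], edge["target"]]
        if "carcinogen" in edge:
            referenced.append(edge["carcinogen"])
        if any(ref not in node_ids for ref in referenced):
            dangling.append(edge)
    return dangling


graph = json.loads(SAMPLE)
print(find_dangling_edges(graph))  # the sample has no dangling references

# An edge pointing at an undeclared node (BPDE_GSH here) is reported.
graph["edges"].append({"source": "BPDE", "target": "BPDE_GSH", "type": "DETOXIFIES"})
print(len(find_dangling_edges(graph)))
```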

Streamlit App

pip install -e ".[streamlit]"
streamlit run ExposoGraph/app.py

App mode defaults to stateless, which disables server-side saves and is appropriate for public web deployment. To enable local revision history and file saves on your own machine:

export ExposoGraph_MODE=local
streamlit run ExposoGraph/app.py

Jupyter

pip install -e ".[notebook]"
jupyter lab

No notebook file is currently bundled in this repository. Use the installed package from your own notebook, or start from the runnable examples in examples/.

Advanced Viewer

pip install -e ".[viewer]"

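from ExposoGraph import (
    GraphEngine,
    GraphVisibility,
    ViewerLayoutMode,
    launch_dash_viewer,
    write_cytoscape_bundle,
)

# Use an engine you have already populated (for example via engine.merge(...));
# an empty one is created here only so the snippet is self-contained.
engine = GraphEngine()

# Write a Cytoscape-ready JSON bundle for the whole graph.
write_cytoscape_bundle(
    engine,
    "exports/graph_cytoscape.json",
    visibility=GraphVisibility.ALL,
    layout_mode=ViewerLayoutMode.COSE,
)

# Start the Dash Cytoscape viewer on port 8050.
launch_dash_viewer(
    engine,
    visibility=GraphVisibility.ALL,
    layout_mode=ViewerLayoutMode.COSE,
    port=8050,
)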

Python Library

pip install -e .

from ExposoGraph import (
    GraphEngine,
    GraphMode,
    GraphVisibility,
    centrality,
    extract_graph,
    metabolism_chain,
    to_json,
)

# LLM extraction (requires OpenAI API key)
# exploratory: keep unmatched or custom content
kg = extract_graph(
    "Benzo[a]pyrene is activated by CYP1A1...",
    mode=GraphMode.EXPLORATORY,
)
engine = GraphEngine()
engine.merge(kg, mode=GraphMode.EXPLORATORY)

# Analysis
scores = centrality(engine, method="degree")
chain = metabolism_chain(engine, "BaP")

# Visibility-aware export
to_json(engine, "graph_validated.json", visibility=GraphVisibility.VALIDATED_ONLY)

Using Ollama (Local LLM)

from ExposoGraph import GraphMode, extract_graph
from ExposoGraph.llm_backend import OllamaBackend

backend = OllamaBackend(base_url="http://localhost:11434")
kg = extract_graph(
    "BaP is activated by CYP1A1...",
    backend=backend,
    model="llama3.1",
    mode=GraphMode.EXPLORATORY,
)

Public Database Integration

from ExposoGraph import GraphMode
from ExposoGraph.db_clients import IARCClassifier
from ExposoGraph.seeder import seed_from_ctd, seed_from_kegg_pathway

# Seed from KEGG pathway
kg = seed_from_kegg_pathway("hsa05204", mode=GraphMode.STRICT)

# Seed from CTD
kg = seed_from_ctd("Benzo(a)pyrene", mode=GraphMode.EXPLORATORY)

# IARC classification lookup
clf = IARCClassifier()
clf.classify("Benzo[a]pyrene")  # → IARCGroup.GROUP_1

Reference Curation Metadata

from ExposoGraph import CURATION_SOURCE_MANIFEST, REFERENCE_KEGG_PATHWAYS

primary_sources = CURATION_SOURCE_MANIFEST["primary_sources"]
kegg_ids = [entry["pathway_id"] for entry in REFERENCE_KEGG_PATHWAYS]

Manuscript-Aligned Showcase Summary

from ExposoGraph import build_full_legends_architecture_summary

summary = build_full_legends_architecture_summary()

summary.node_count          # 96
summary.edge_count          # 102
summary.node_type_counts    # {'Carcinogen': 15, 'Enzyme': 36, ...}
summary.edge_type_counts    # {'ACTIVATES': 30, 'DETOXIFIES': 23, ...}
summary.carcinogen_classes  # grouped class inventories for section 2.2 rewrites

Optional Androgen Module

from ExposoGraph import build_androgen_module_graph, build_full_legends_graph

androgen_only = build_androgen_module_graph()
showcase_with_androgen = build_full_legends_graph(include_androgen_module=True)

Configuration

Set your OpenAI API key as an environment variable:

export OPENAI_API_KEY="sk-..."

Or enter it in the Streamlit sidebar when running the app.

For Streamlit Cloud deployment, add the key to .streamlit/secrets.toml:

OPENAI_API_KEY = "sk-..."
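
If you call the library from your own script, one way to mirror this behaviour is to prefer an explicitly supplied key and fall back to the environment variable. This is a standard-library sketch under that assumption; `resolve_api_key` is a hypothetical helper and does not describe how ExposoGraph itself looks up the key.

```python
import os


def resolve_api_key(explicit=None, environ=None):
    """Prefer an explicitly supplied key, then the OPENAI_API_KEY variable."""
    environ = os.environ if environ is None else environ
    key = (explicit or environ.get("OPENAI_API_KEY", "")).strip()
    return key or None  # None signals that no key is configured


# Placeholder values only; never hard-code a real key.
print(resolve_api_key(environ={"OPENAI_API_KEY": "sk-example"}))
print(resolve_api_key(explicit="sk-from-sidebar", environ={}))
print(resolve_api_key(environ={}))
```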

Graph Modes and Visibility

Two separate controls shape how data moves through the system.

Graph mode controls ingestion behavior:
- exploratory keeps unmatched or custom entities and marks them as provisional
- strict keeps only canonically grounded nodes and edges

Graph visibility controls viewing and export behavior:
- all
- validated_only
- exploratory_only
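
The split between the two controls can be illustrated with a small standard-library sketch: mode decides what is kept at ingestion time, and visibility decides what is shown or exported afterwards. Everything here (`Mode`, `Visibility`, `Entity`, `CANONICAL`, `ingest`, `visible`) is hypothetical and only mimics the behaviour described above; the library's own enums are `GraphMode` and `GraphVisibility`.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    EXPLORATORY = "exploratory"
    STRICT = "strict"


class Visibility(Enum):
    ALL = "all"
    VALIDATED_ONLY = "validated_only"
    EXPLORATORY_ONLY = "exploratory_only"


@dataclass(frozen=True)
class Entity:
    name: str
    provisional: bool = False  # True when the entity could not be grounded


# Hypothetical canonical vocabulary used for grounding in this sketch.
CANONICAL = {"BaP", "CYP1A1", "BPDE"}


def ingest(names, mode):
    """Ground each name; strict drops unmatched names, exploratory flags them."""
    kept = []
    for name in names:
        if name in CANONICAL:
            kept.append(Entity(name))
        elif mode is Mode.EXPLORATORY:
            kept.append(Entity(name, provisional=True))
    return kept


def visible(entities, visibility):
    """Apply a visibility filter at view or export time."""
    if visibility is Visibility.VALIDATED_ONLY:
        return [e for e in entities if not e.provisional]
    if visibility is Visibility.EXPLORATORY_ONLY:
        return [e for e in entities if e.provisional]
    return list(entities)


names = ["BaP", "CYP1A1", "MysteryEnzyme"]
strict = ingest(names, Mode.STRICT)
exploratory = ingest(names, Mode.EXPLORATORY)

print([e.name for e in strict])
print([e.name for e in visible(exploratory, Visibility.VALIDATED_ONLY)])
print([e.name for e in visible(exploratory, Visibility.EXPLORATORY_ONLY)])
```

In this sketch, a strict ingest and an exploratory ingest filtered to validated-only yield the same entities; the difference is that the exploratory graph still retains the provisional entries for later curation.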

Typical pattern:

from ExposoGraph import (
    GraphEngine,
    GraphMode,
    GraphRepository,
    GraphVisibility,
    ViewerLayoutMode,
    launch_dash_viewer,
    write_cytoscape_bundle,
    extract_graph,
    to_interactive_html,
    to_plotly_html,
)

engine = GraphEngine()
kg = extract_graph(
    "BaP induces CYP1A1 and forms BPDE adducts",
    mode=GraphMode.STRICT,
)
engine.merge(kg, mode=GraphMode.STRICT)

to_interactive_html(
    engine,
    "validated_graph.html",
    visibility=GraphVisibility.VALIDATED_ONLY,
)

to_plotly_html(
    engine,
    "validated_graph_plotly.html",
    visibility=GraphVisibility.VALIDATED_ONLY,
)

write_cytoscape_bundle(
    engine,
    "validated_graph_cytoscape.json",
    visibility=GraphVisibility.VALIDATED_ONLY,
    layout_mode=ViewerLayoutMode.PRESET,
)

with GraphRepository("data/ExposoGraph.sqlite3") as repo:
    repo.save_engine(
        graph_key="bap_validated",
        graph_name="BaP Validated",
        engine=engine,
        visibility=GraphVisibility.VALIDATED_ONLY,
    )
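
For readers unfamiliar with revisioned storage, the idea behind a SQLite-backed history with atomic saves can be sketched with the standard library alone. The table layout and the functions `open_store`, `save_revision`, and `load_latest` are assumptions made for illustration; they are not the schema or API of `GraphRepository`.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Create the (hypothetical) revisions table if it does not exist."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS revisions (
            graph_key  TEXT NOT NULL,
            revision   INTEGER NOT NULL,
            visibility TEXT NOT NULL,
            payload    TEXT NOT NULL,
            PRIMARY KEY (graph_key, revision)
        )
        """
    )
    return conn


def save_revision(conn, graph_key, graph, visibility="all"):
    """Append a new revision; the write commits or rolls back as a unit."""
    with conn:  # commits on success, rolls back if an exception is raised
        (latest,) = conn.execute(
            "SELECT COALESCE(MAX(revision), 0) FROM revisions WHERE graph_key = ?",
            (graph_key,),
        ).fetchone()
        conn.execute(
            "INSERT INTO revisions VALUES (?, ?, ?, ?)",
            (graph_key, latest + 1, visibility, json.dumps(graph)),
        )
    return latest + 1


def load_latest(conn, graph_key):
    """Return (revision, visibility, graph) for the newest revision, or None."""
    row = conn.execute(
        "SELECT revision, visibility, payload FROM revisions "
        "WHERE graph_key = ? ORDER BY revision DESC LIMIT 1",
        (graph_key,),
    ).fetchone()
    if row is None:
        return None
    return row[0], row[1], json.loads(row[2])


conn = open_store()
save_revision(conn, "bap_validated", {"nodes": ["BaP"], "edges": []}, "validated_only")
save_revision(conn, "bap_validated", {"nodes": ["BaP", "BPDE"], "edges": []}, "validated_only")
print(load_latest(conn, "bap_validated"))
```

Earlier revisions are never overwritten in this design, so any previous state of a graph can be recovered by its revision number.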

Project Structure

ExposoGraph/
├── __init__.py          # Public API exports
├── app.py               # Streamlit UI orchestrator
├── branding.py          # Version and metadata
├── config.py            # App modes, graph modes, and visibility enums
├── engine.py            # NetworkX-backed graph engine
├── exporter.py          # JSON, D3.js, HTML, GEXF export
├── graph_filters.py     # Validated/exploratory graph filtering helpers
├── graph_analysis.py    # Shortest path, centrality, metabolism chains
├── grounding.py         # Canonical grounding and strict-mode preparation
├── llm_backend.py       # Pluggable LLM backends (OpenAI, Ollama)
├── llm_extractor.py     # LLM prompt + extraction pipeline
├── models.py            # Pydantic data models (Node, Edge, KnowledgeGraph)
├── reference_data.py    # Gene panels and activity scores
├── seeder.py            # DB-to-KnowledgeGraph conversion
├── storage.py           # SQLite revision history
├── db_clients/
│   ├── kegg.py          # KEGG REST API client
│   ├── ctd.py           # CTD chemical-gene interaction client
│   └── iarc.py          # Bundled IARC classification data
├── ui_extract.py        # Tab: LLM extraction
├── ui_manual.py         # Tab: manual node/edge entry
├── ui_preview.py        # Tab: interactive graph preview
└── ui_data.py           # Tab: raw data view
examples/
├── bap_graph.json       # Pre-built BaP metabolism graph (no API key needed)
├── build_bap_graph.py   # Demo script: load → analyze → export
└── mode_visibility_demo.py  # Demo script: strict ingestion + filtered export/save
tests/
├── test_integration.py  # End-to-end pipeline test
├── test_engine.py
├── test_models.py
├── test_exporter.py
├── test_graph_analysis.py
├── test_llm_backend.py
├── test_llm_extractor.py
├── test_db_clients.py
├── test_seeder.py
├── test_reference_data.py
├── test_config.py
└── test_storage.py

Node & Edge Types

Nodes: Carcinogen, Enzyme, Gene, Metabolite, DNA_Adduct, Pathway, Tissue

Edges: ACTIVATES, DETOXIFIES, TRANSPORTS, FORMS_ADDUCT, REPAIRS, PATHWAY, EXPRESSED_IN, INDUCES, INHIBITS, ENCODES, CUSTOM
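
A closed vocabulary like this is straightforward to enforce when records are constructed. The sketch below uses standard-library dataclasses rather than the Pydantic models in `models.py`, so the `Node` and `Edge` classes shown are simplified stand-ins with hypothetical fields, not the library's definitions.

```python
from dataclasses import dataclass

NODE_TYPES = {
    "Carcinogen", "Enzyme", "Gene", "Metabolite", "DNA_Adduct", "Pathway", "Tissue",
}
EDGE_TYPES = {
    "ACTIVATES", "DETOXIFIES", "TRANSPORTS", "FORMS_ADDUCT", "REPAIRS", "PATHWAY",
    "EXPRESSED_IN", "INDUCES", "INHIBITS", "ENCODES", "CUSTOM",
}


@dataclass(frozen=True)
class Node:
    id: str
    type: str

    def __post_init__(self):
        # Reject any type outside the documented node vocabulary.
        if self.type not in NODE_TYPES:
            raise ValueError(f"unknown node type: {self.type!r}")


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    type: str

    def __post_init__(self):
        # Reject any type outside the documented edge vocabulary.
        if self.type not in EDGE_TYPES:
            raise ValueError(f"unknown edge type: {self.type!r}")


enzyme = Node("CYP1A1", "Enzyme")
activation = Edge("CYP1A1", "BPDE", "ACTIVATES")

try:
    Edge("CYP1A1", "BPDE", "METABOLIZES")  # not in the list above
except ValueError as error:
    print(error)
```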

Development

pip install -e ".[all]"
pytest                           # runs with --cov, 85% threshold gate
ruff check .
mypy ExposoGraph/

Optional dependency groups

Group	Install	Provides
`llm`	`pip install -e ".[llm]"`	OpenAI API support
`ollama`	`pip install -e ".[ollama]"`	Ollama local LLM support
`db`	`pip install -e ".[db]"`	KEGG/CTD HTTP clients
`streamlit`	`pip install -e ".[streamlit]"`	Streamlit web app
`notebook`	`pip install -e ".[notebook]"`	Jupyter + Plotly/Matplotlib
`dev`	`pip install -e ".[dev]"`	pytest, ruff, mypy
`docs`	`pip install -e ".[docs]"`	Sphinx + Furo
`all`	`pip install -e ".[all]"`	Everything

License

MIT

This version: 0.0.3 (released Mar 21, 2026)
