
Complete pathway visualization: KEGG + SBGN + highlighting + splines


Pathview-plus: Complete Pathway Visualization

Full-featured Python implementation of R pathview + SBGNview with support for KEGG, Reactome, MetaCyc, and more.

Python 3.10+


🎯 Features

Core Capabilities

  • ✅ KEGG Pathways: Download and visualize any KEGG pathway
  • ✅ SBGN Pathways: Support for Reactome, MetaCyc, PANTHER, SMPDB
  • ✅ Multiple Formats: PNG (native overlay), SVG (vector), PDF (graph layout)
  • ✅ Gene & Metabolite Data: Overlay expression and abundance data
  • ✅ Multi-Condition: Visualize multiple experiments side-by-side
  • ✅ ID Conversion: Automatic mapping across Entrez ↔ Symbol ↔ UniProt ↔ Ensembl
  • ✅ Highlighting: Post-hoc emphasis of specific nodes/edges/paths
  • ✅ Spline Curves: Smooth Bezier edge routing
  • ✅ Custom Colors: Configurable diverging color scales

New in v2.0

  • 🆕 Full SBGN-ML support: Parse and render SBGN Process Description files
  • 🆕 Database integration: Direct download from Reactome, MetaCyc
  • 🆕 SVG vector output: Scalable graphics for web and publication
  • 🆕 Highlighting system: ggplot2-style composable modifications
  • 🆕 Spline rendering: Cubic Bezier and Catmull-Rom curves

📦 Installation

Quick install

pip install pathview-plus

Custom install

# Clone repository
git clone https://github.com/raw-lab/pathview-plus
cd pathview-plus

# Install dependencies
pip install -r requirements.txt
pip install .

# Or install specific packages
pip install polars numpy matplotlib seaborn Pillow networkx requests

Dependencies:

  • Python ≥ 3.10
  • polars ≥ 0.19.0
  • matplotlib ≥ 3.7.0
  • seaborn ≥ 0.12.0
  • numpy ≥ 1.24.0
  • Pillow ≥ 10.0.0
  • networkx ≥ 3.1
  • requests ≥ 2.31.0

🚀 Quick Start

1. Basic KEGG Pathway

import polars as pl
from pathview import pathview

# Load your data
gene_data = pl.read_csv("gene_expr.tsv", separator="\t")

# Visualize on KEGG pathway
result = pathview(
    pathway_id="04110",      # Cell cycle
    gene_data=gene_data,
    species="hsa",
    output_format="png"
)

2. Reactome SBGN Pathway

from pathview import download_reactome, parse_sbgn, sbgn_to_df, pathview

# Download Reactome pathway
path = download_reactome("R-HSA-109582")  # Hemostasis

# Parse and visualize
pathway = parse_sbgn(path)
node_df = sbgn_to_df(pathway)

# Overlay data
result = pathview(
    pathway_id="R-HSA-109582",
    gene_data=gene_data,
    output_format="svg"  # Vector graphics
)

3. Multi-Condition Comparison

# Three experimental conditions
gene_data = pl.DataFrame({
    "entrez": ["1956", "2099", "5594", "207"],
    "Control": [0.5, -0.3, 1.2, -0.8],
    "Treatment_A": [2.1, -1.5, 0.4, 1.3],
    "Treatment_B": [1.8, -0.9, 2.3, 0.7],
})

result = pathview(
    pathway_id="04010",  # MAPK signaling
    gene_data=gene_data,
    species="hsa",
    limit={"gene": 2.5, "cpd": 1.5},
)
# Each node shows 3 color bands (one per condition)
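
The band layout mentioned in the comment above can be pictured as splitting each node's rectangle into equal-width vertical slices, one per condition column. The following is a minimal sketch of that idea and not the library's renderer; `split_into_bands`, its arguments, and the node size are hypothetical and chosen only for illustration.

```python
def split_into_bands(x, y, width, height, n_conditions):
    """Split a node rectangle into equal-width vertical bands.

    (x, y) is the top-left corner. Returns one (x, y, width, height)
    tuple per condition, ordered left to right.
    """
    if n_conditions < 1:
        raise ValueError("n_conditions must be at least 1")
    band_width = width / n_conditions
    return [(x + i * band_width, y, band_width, height)
            for i in range(n_conditions)]


# A 46x17 node box (hypothetical size) split for three conditions.
bands = split_into_bands(100.0, 50.0, 46.0, 17.0, 3)
for name, band in zip(["Control", "Treatment_A", "Treatment_B"], bands):
    print(name, band)
```

Each band would then be filled with the color that its condition's value maps to on the color scale.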

4. Custom Color Schemes

result = pathview(
    pathway_id="04151",
    gene_data=gene_data,
    species="hsa",
    low={"gene": "#2166AC", "cpd": "#4575B4"},   # Blue
    mid={"gene": "#F7F7F7", "cpd": "#F7F7F7"},   # White
    high={"gene": "#D6604D", "cpd": "#B2182B"},  # Red
)

📖 Complete Examples

Example 1: Gene Symbol IDs

gene_data = pl.DataFrame({
    "symbol": ["TP53", "EGFR", "KRAS", "PIK3CA", "AKT1"],
    "log2fc": [-1.8, 2.4, 1.1, 1.5, 0.9],
})

result = pathview(
    pathway_id="04151",
    gene_data=gene_data,
    species="hsa",
    gene_idtype="SYMBOL",  # Automatic conversion to Entrez
)
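
Conceptually, the conversion re-keys the input by Entrez ID before anything is drawn. The sketch below shows that step with a tiny hand-written lookup table; `SYMBOL_TO_ENTREZ` and `symbols_to_entrez` are hypothetical names for illustration, and the package's own mapping may differ in how it resolves or reports unmapped symbols.

```python
# Tiny hand-written lookup for illustration only; a real run would rely on
# the package's own ID mapping rather than a hard-coded table.
SYMBOL_TO_ENTREZ = {
    "TP53": "7157",
    "EGFR": "1956",
    "KRAS": "3845",
    "PIK3CA": "5290",
    "AKT1": "207",
}


def symbols_to_entrez(values_by_symbol, lookup):
    """Re-key {symbol: value} by Entrez ID and collect unmapped symbols."""
    mapped, unmapped = {}, []
    for symbol, value in values_by_symbol.items():
        entrez_id = lookup.get(symbol.upper())
        if entrez_id is None:
            unmapped.append(symbol)
        else:
            mapped[entrez_id] = value
    return mapped, unmapped


log2fc = {"TP53": -1.8, "EGFR": 2.4, "KRAS": 1.1, "NOT_A_GENE": 0.3}
mapped, unmapped = symbols_to_entrez(log2fc, SYMBOL_TO_ENTREZ)
print(mapped)
print(unmapped)
```

Keeping the unmapped symbols separate makes it easy to see how much of the input never reaches the pathway.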

Example 2: Combined Gene + Metabolite

from pathview import sim_mol_data

gene_data = sim_mol_data(mol_type="gene", species="hsa", n_mol=80)
cpd_data = sim_mol_data(mol_type="cpd", n_mol=30)

result = pathview(
    pathway_id="00010",  # Glycolysis
    gene_data=gene_data,
    cpd_data=cpd_data,
    species="hsa",
    low={"gene": "green", "cpd": "blue"},
    high={"gene": "red", "cpd": "yellow"},
)

Example 3: SVG Vector Output

result = pathview(
    pathway_id="04110",
    gene_data=gene_data,
    species="hsa",
    output_format="svg",  # Scalable vector graphics
)
# Output: hsa04110.pathview.svg
# - Scalable without quality loss
# - Smaller file size
# - Editable in Inkscape/Illustrator

Example 4: Graph Layout (No PNG Background)

result = pathview(
    pathway_id="04010",
    gene_data=gene_data,
    species="hsa",
    kegg_native=False,     # Use NetworkX layout
    output_format="pdf",
)
# Output: hsa04010.pathview.pdf

Example 5: Highlighting (API Preview)

from pathview import highlight_nodes, highlight_path

result = pathview("04010", gene_data=gene_data, species="hsa")

# Composable modifications (ggplot2-style)
highlighted = (result
               + highlight_nodes(["1956", "2099"], color="red", width=4)
               + highlight_path(["1956", "2099", "5594"], color="orange"))

highlighted.save("highlighted.png")

Example 6: Spline Curves

from pathview import cubic_bezier, catmull_rom_spline
import matplotlib.pyplot as plt

# Smooth Bezier curve
curve = cubic_bezier((0,0), (1,2), (3,2), (4,0), n_points=100)

plt.plot(curve[:, 0], curve[:, 1], linewidth=2)
plt.title("Bezier Curve Edge Routing")
plt.savefig("bezier_example.png")
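
For readers who want to see the math behind the curve, a cubic Bezier is the weighted sum B(t) = (1-t)^3 p0 + 3(1-t)^2 t p1 + 3(1-t) t^2 p2 + t^3 p3 for t in [0, 1]. The following pure-Python sketch evaluates that formula; `cubic_bezier_points` is a hypothetical stand-in that returns a list of tuples, and it is not claimed to match the package's `cubic_bezier` implementation.

```python
def cubic_bezier_points(p0, p1, p2, p3, n_points=50):
    """Sample a cubic Bezier curve at n_points evenly spaced t values."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    points = []
    for i in range(n_points):
        t = i / (n_points - 1)
        u = 1.0 - t
        # Bernstein weights for the four control points.
        w0, w1, w2, w3 = u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3
        x = w0 * p0[0] + w1 * p1[0] + w2 * p2[0] + w3 * p3[0]
        y = w0 * p0[1] + w1 * p1[1] + w2 * p2[1] + w3 * p3[1]
        points.append((x, y))
    return points


# Same control points as the example above, sampled coarsely.
curve = cubic_bezier_points((0, 0), (1, 2), (3, 2), (4, 0), n_points=5)
for point in curve:
    print(point)
```

The curve starts at p0 and ends at p3; the two inner control points only pull the curve toward them without lying on it.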

Example 7: Batch Processing

pathways = ["04110", "04010", "04151", "00010"]

for pw_id in pathways:
    try:
        result = pathview(
            pathway_id=pw_id,
            gene_data=gene_data,
            species="hsa",
            out_suffix=f"batch_{pw_id}",
        )
        print(f"✓ Completed {pw_id}")
    except Exception as e:
        print(f"✗ Failed {pw_id}: {e}")

🖥️ Command Line Interface

# Basic usage
python pathview_cli.py --pathway-id 04110 --gene-data expr.tsv

# Specify species and ID type
python pathview_cli.py \
    --pathway-id 04110 \
    --species hsa \
    --gene-data expr.tsv \
    --gene-idtype SYMBOL

# Custom colors
python pathview_cli.py \
    --pathway-id 04010 \
    --gene-data expr.tsv \
    --low-gene '#2166AC' \
    --high-gene '#D6604D' \
    --output-format svg

# Simulate data (for testing)
python pathview_cli.py \
    --pathway-id 04110 \
    --simulate \
    --n-sim 200

# Display KEGG legend
python pathview_cli.py --legend

CLI Arguments:

Pathway:
  --pathway-id ID          KEGG pathway number (e.g., '04110')

Input data:
  --gene-data TSV          Gene expression file (TSV)
  --cpd-data TSV           Compound abundance file (TSV)
  --gene-idtype TYPE       Gene ID type: ENTREZ, SYMBOL, UNIPROT, ENSEMBL
  --cpd-idtype TYPE        Compound ID type: KEGG, PUBCHEM, CHEBI

Species & paths:
  --species CODE           KEGG species code (default: hsa)
  --kegg-dir DIR           Directory for files (default: .)
  --out-suffix SUFFIX      Output filename suffix (default: pathview)

Rendering:
  --kegg-native            Use KEGG PNG background (default: True)
  --output-format FORMAT   Output format: png, pdf, svg (default: png)
  --map-symbol             Replace Entrez with symbols (default: True)
  --node-sum METHOD        Aggregation: sum, mean, median, max
  --no-signature           Suppress watermark
  --no-col-key             Suppress color legend

Color scale:
  --limit-gene FLOAT       Color scale limit (default: 1.0)
  --bins-gene INT          Color bins (default: 10)
  --low-gene COLOR         Low-end color (default: green)
  --mid-gene COLOR         Mid-point color (default: gray)
  --high-gene COLOR        High-end color (default: red)
  --low-cpd COLOR          Low compound color (default: blue)
  --high-cpd COLOR         High compound color (default: yellow)

Utilities:
  --legend                 Display KEGG legend and exit
  --simulate               Generate simulated data
  --n-sim INT              Number of simulated molecules (default: 200)

📊 Input File Formats

Gene Data (TSV)

First column = gene IDs, remaining columns = numeric expression values.

entrez	Control	Treatment_A	Treatment_B
1956	2.31	0.45	1.82
2099	-1.14	-0.88	0.33
5594	0.72	1.33	-0.51
207	-0.88	1.21	0.94
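
To make the layout concrete, the sketch below reads a table of this shape with the standard library only: the first column becomes the key and the remaining columns become floats. `read_gene_table` is a hypothetical helper for illustration; the examples in this README load the same kind of file with polars.

```python
import csv
import io

SAMPLE_TSV = (
    "entrez\tControl\tTreatment_A\tTreatment_B\n"
    "1956\t2.31\t0.45\t1.82\n"
    "2099\t-1.14\t-0.88\t0.33\n"
    "5594\t0.72\t1.33\t-0.51\n"
    "207\t-0.88\t1.21\t0.94\n"
)


def read_gene_table(handle):
    """Return (condition names, {gene ID: [values]}) from a TSV handle."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    conditions = header[1:]
    values = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        # IDs stay strings so leading zeros and prefixes are preserved.
        values[row[0]] = [float(cell) for cell in row[1:]]
    return conditions, values


conditions, values = read_gene_table(io.StringIO(SAMPLE_TSV))
print(conditions)
print(values["1956"])
```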

Gene Symbols

gene_symbol	log2fc	p_value
TP53	-1.8	0.001
EGFR	2.4	0.0001
KRAS	1.1	0.01

Compound Data (TSV)

kegg	abundance
C00031	1.45
C00118	-0.83
C00022	2.11

🎨 Color Scale Configuration

Three-Point Diverging Scale

pathview(
    pathway_id="04110",
    gene_data=data,
    limit={"gene": 2.0, "cpd": 1.5},      # ±2.0 for genes, ±1.5 for compounds
    bins={"gene": 20, "cpd": 10},          # Color resolution
    low={"gene": "blue", "cpd": "green"},
    mid={"gene": "white", "cpd": "gray"},
    high={"gene": "red", "cpd": "yellow"},
)

The scale maps:

  • values at or below -limit → low color (default: green for genes, blue for compounds)
  • 0 → mid color (default: gray)
  • values at or above +limit → high color (default: red for genes, yellow for compounds)

One-Directional Scale

both_dirs={"gene": False, "cpd": False}
# Maps: 0 (mid) → max (high)
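
The interplay of `limit`, `bins`, and `both_dirs` can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the package's color mapping: `scale_color` is a hypothetical function, it accepts hex colors only, and it assumes the bins are split evenly between the two sides of a diverging scale.

```python
def _hex_to_rgb(color):
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def _blend(start, end, fraction):
    """Linearly interpolate between two hex colors (fraction in [0, 1])."""
    a, b = _hex_to_rgb(start), _hex_to_rgb(end)
    mixed = (round(x + (y - x) * fraction) for x, y in zip(a, b))
    return "#{:02x}{:02x}{:02x}".format(*mixed)


def scale_color(value, limit=1.0, bins=10, both_dirs=True,
                low="#2166ac", mid="#f7f7f7", high="#d6604d"):
    """Map a value to a hex color on a binned two- or one-sided scale."""
    # Clamp to [-1, 1] for a diverging scale, or [0, 1] for a one-sided one.
    floor = -1.0 if both_dirs else 0.0
    position = max(floor, min(1.0, value / limit))
    # Assumption: bins are split evenly between the two sides of the scale.
    steps = max(1, bins // 2 if both_dirs else bins)
    position = round(position * steps) / steps
    if position < 0:
        return _blend(mid, low, -position)
    return _blend(mid, high, position)


print(scale_color(0.0))                               # mid color
print(scale_color(3.0, limit=2.0))                    # clamped to high
print(scale_color(-3.0, limit=2.0))                   # clamped to low
print(scale_color(-3.0, limit=2.0, both_dirs=False))  # negatives stay at mid
```

Values beyond the limit saturate at the end colors, which is why choosing `limit` close to the bulk of the data keeps more of the color range in use.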

🗂️ Supported ID Types

Gene IDs

Type      Value     Example
Entrez    ENTREZ    1956
Symbol    SYMBOL    EGFR
UniProt   UNIPROT   P00533
Ensembl   ENSEMBL   ENSG00000146648
KEGG      KEGG      hsa:1956

Compound IDs

Type      Value     Example
KEGG      KEGG      C00031
PubChem   PUBCHEM   5793
ChEBI     CHEBI     4167

🧬 Supported Databases

KEGG

  • Format: KGML (XML)
  • Species: 500+ organisms
  • Download: Automatic via KEGG REST API
  • Example: pathway_id="hsa04110"

Reactome

  • Format: SBGN-ML
  • Species: Human, mouse, rat, and more
  • Download: download_reactome("R-HSA-109582")
  • Example: Hemostasis, Immune System, Signaling

MetaCyc

  • Format: SBGN-ML
  • Coverage: 2,800+ metabolic pathways
  • Download: download_metacyc("PWY-7210")
  • Example: Pyrimidine biosynthesis

PANTHER

  • Format: SBGN-ML
  • Coverage: 177 signaling and metabolic pathways
  • Note: Manual download required

SMPDB

  • Format: SBGN-ML
  • Coverage: Small molecule pathways
  • Note: Manual download from website

🏗️ Architecture

pathview/
├── __init__.py           # Public API exports
├── constants.py          # Type definitions
├── utils.py              # String/numeric utilities
│
├── id_mapping.py         # Gene/compound ID conversion
├── mol_data.py           # Data aggregation, simulation
│
├── kegg_api.py           # KEGG REST API
├── databases.py          # Reactome, MetaCyc downloaders
│
├── kgml_parser.py        # KEGG KGML (XML) parser
├── sbgn_parser.py        # SBGN-ML (XML) parser
│
├── color_mapping.py      # Colormaps, node coloring
├── node_mapping.py       # Map data onto nodes
│
├── rendering.py          # PNG/PDF renderers
├── svg_rendering.py      # SVG vector renderer
├── highlighting.py       # Post-hoc modifications
├── splines.py            # Bezier curve math
│
└── pathview.py           # Core orchestrator

pathview_cli.py           # Command-line interface
requirements.txt          # Dependencies
README.md                 # This file

Module Statistics:

  • 15 modules | 3,506 lines of code
  • Functional programming style
  • Full type hints
  • Comprehensive docstrings

🔧 API Reference

Core Function

pathview(
    pathway_id: str,
    gene_data: Optional[pl.DataFrame] = None,
    cpd_data: Optional[pl.DataFrame] = None,
    species: str = "hsa",
    kegg_dir: Path = ".",
    kegg_native: bool = True,
    output_format: str = "png",  # "png", "pdf", "svg"
    gene_idtype: str = "ENTREZ",
    cpd_idtype: str = "KEGG",
    out_suffix: str = "pathview",
    node_sum: str = "sum",
    map_symbol: bool = True,
    map_null: bool = True,
    min_nnodes: int = 3,
    new_signature: bool = True,
    plot_col_key: bool = True,
    # Color scale parameters
    limit: dict = {"gene": 1.0, "cpd": 1.0},
    bins: dict = {"gene": 10, "cpd": 10},
    both_dirs: dict = {"gene": True, "cpd": True},
    low: dict = {"gene": "green", "cpd": "blue"},
    mid: dict = {"gene": "gray", "cpd": "gray"},
    high: dict = {"gene": "red", "cpd": "yellow"},
    na_col: str = "transparent",
) -> dict

Data Functions

sim_mol_data(mol_type="gene", species="hsa", n_mol=100, n_exp=1) → pl.DataFrame
mol_sum(mol_data, id_map, sum_method="sum") → pl.DataFrame
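
When several genes map to the same pathway node, their values have to be collapsed into one number per node; the `node_sum` options listed in this README are sum, mean, median, and max. The sketch below illustrates that aggregation with plain dictionaries. `summarize_nodes` and the node labels are hypothetical, and the real `mol_sum` works on polars DataFrames with a signature that is not reproduced here.

```python
from statistics import mean, median

AGGREGATORS = {"sum": sum, "mean": mean, "median": median, "max": max}


def summarize_nodes(values_by_gene, genes_by_node, method="sum"):
    """Collapse per-gene values into one value per node."""
    aggregate = AGGREGATORS[method]
    summary = {}
    for node, genes in genes_by_node.items():
        observed = [values_by_gene[g] for g in genes if g in values_by_gene]
        if observed:  # nodes without any measured gene are left out
            summary[node] = aggregate(observed)
    return summary


values_by_gene = {"5594": 1.2, "5595": 0.4, "207": -0.8}
genes_by_node = {"ERK": ["5594", "5595"], "AKT": ["207", "208"]}

for method in ("sum", "mean", "max"):
    print(method, summarize_nodes(values_by_gene, genes_by_node, method))
```

The choice matters most for nodes that bundle many genes: a sum grows with the number of mapped genes, while mean, median, and max stay on the scale of a single measurement.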

ID Mapping

id2eg(ids, category, org="Hs") → pl.DataFrame
eg2id(eg_ids, category="SYMBOL", org="Hs") → pl.DataFrame
cpd_id_map(in_ids, in_type, out_type="KEGG") → pl.DataFrame

Parsing

# KEGG
parse_kgml(filepath) → KGMLPathway
node_info(pathway) → pl.DataFrame

# SBGN
parse_sbgn(filepath) → SBGNPathway
sbgn_to_df(pathway) → pl.DataFrame

Database Downloads

download_kegg(pathway_id, species="hsa", kegg_dir=".") → dict
download_reactome(pathway_id, output_dir=".") → Path
download_metacyc(pathway_id, output_dir=".") → Path
list_reactome_pathways(species="Homo sapiens") → list[dict]
detect_database(pathway_id) → str
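
One plausible way to route an ID to the right downloader is to match it against the ID shapes used in this README: `R-HSA-109582` for Reactome, `PWY-7210` for MetaCyc, and `04110` or `hsa04110` for KEGG. The sketch below does exactly that. The patterns, the returned labels, and the name `guess_database` are assumptions for illustration; how `detect_database` actually decides, and what strings it returns, is not documented here.

```python
import re

# Assumed ID shapes, based only on the example IDs shown in this README.
_PATTERNS = [
    ("reactome", re.compile(r"^R-[A-Z]{3}-\d+$")),
    ("metacyc", re.compile(r"^PWY[0-9A-Z]*-\d+$")),
    ("kegg", re.compile(r"^(?:[a-z]{2,4})?\d{5}$")),
]


def guess_database(pathway_id):
    """Return a database label for the ID, or "unknown" if nothing matches."""
    for name, pattern in _PATTERNS:
        if pattern.match(pathway_id):
            return name
    return "unknown"


for pw_id in ["R-HSA-109582", "PWY-7210", "04110", "hsa04110", "not-an-id"]:
    print(pw_id, "->", guess_database(pw_id))
```

MetaCyc in particular uses pathway IDs that do not follow the `PWY-` shape, so a pattern-based guess like this one would need a fallback for those.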

Highlighting

# API design (full implementation in progress)
result = pathview(...)
highlighted = result + highlight_nodes(["1956", "2099"], color="red")
highlighted.save("output.png")

Splines

cubic_bezier(p0, p1, p2, p3, n_points=50) → np.ndarray
quadratic_bezier(p0, p1, p2, n_points=50) → np.ndarray
catmull_rom_spline(points, n_points=50, alpha=0.5) → np.ndarray
route_edge_spline(source, target, obstacles, mode="orthogonal") → np.ndarray
bezier_to_svg_path(curve, close=False) → str
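
A sampled curve becomes an SVG edge once its points are written as a path string: a move-to command for the first point, line-to commands for the rest, and an optional close command. The sketch below shows one way to build that string; `points_to_svg_path` and its `precision` argument are hypothetical, and the output format of the package's `bezier_to_svg_path` may differ.

```python
def points_to_svg_path(points, close=False, precision=2):
    """Build an SVG path string of straight segments through the points."""
    if not points:
        return ""
    coords = [f"{x:.{precision}f},{y:.{precision}f}" for x, y in points]
    # "M" moves to the first point, each "L" draws a line to the next one.
    commands = [f"M {coords[0]}"] + [f"L {c}" for c in coords[1:]]
    if close:
        commands.append("Z")  # close the shape back to the first point
    return " ".join(commands)


path_d = points_to_svg_path([(0, 0), (1, 2), (4, 0)])
print(f'<path d="{path_d}" fill="none" stroke="black"/>')
```

With enough sample points the straight segments are visually indistinguishable from the smooth curve, at the cost of a longer path string.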

📈 Performance

  • KEGG pathways: ~2-5 seconds (download + render)
  • SBGN pathways: ~3-8 seconds (more complex)
  • Multi-condition: Linear scaling with # conditions
  • Batch processing: Parallel processing possible

Optimization tips:

  • Cache downloaded files (automatic)
  • Use output_format="svg" for faster rendering
  • Disable color key for batch jobs: plot_col_key=False
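
Since each pathway is processed independently, a batch can in principle be spread over worker threads. The sketch below shows the pattern with the standard library. `render_pathway` is a hypothetical stand-in for a call such as `pathview(pathway_id=..., ...)`, and whether the package itself is safe to call from several threads at once is an assumption that should be checked before relying on this.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def render_pathway(pathway_id):
    """Hypothetical stand-in for one pathview(...) call."""
    if not pathway_id.isdigit():
        raise ValueError(f"unexpected pathway ID: {pathway_id!r}")
    return f"hsa{pathway_id}.pathview.png"


def run_batch(pathway_ids, max_workers=4):
    """Run render_pathway for every ID; collect results and failures."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(render_pathway, pw): pw for pw in pathway_ids}
        for future in as_completed(futures):
            pw = futures[future]
            try:
                results[pw] = future.result()
            except Exception as exc:  # one failure should not stop the batch
                failures[pw] = str(exc)
    return results, failures


results, failures = run_batch(["04110", "04010", "04151", "00010", "bad-id"])
print(sorted(results))
print(failures)
```

Threads mainly help with the download portion of each job; for rendering-heavy batches, a process pool may be the better fit.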

🤝 Contributing

Contributions welcome! Areas for improvement:

  1. SBGN rendering: Improve glyph shape variety
  2. Edge routing: Implement A* pathfinding for splines
  3. Database integration: Add PANTHER, SMPDB auto-download
  4. Highlighting: Wire up image modification backend
  5. Performance: Parallel pathway processing

📄 License

Creative Commons Attribution-NonCommercial (CC BY-NC 4.0). See the LICENSE file.

Citations:

If you are publishing results obtained using Pathview-Plus, please cite:

  • Pre-Print Pathview-Plus: Figueroa III JL, Brouwer CR, White III RA. 2026. Pathview-plus: unlocking the metabolic pathways from cells to ecosystems. bioRxiv.

If you are using the R version, please cite:

  • Original Pathview R: Luo, W., & Brouwer, C. 2013. Pathview: an R/Bioconductor package for pathway-based data integration and visualization. Bioinformatics, 29(14), 1830–1831.
  • Original SBGNview R: Shashikant, T., et al. 2022. SBGNview: Data analysis, integration and visualization on all pathways using SBGN. Bioinformatics, 38(11), 3006–3008.

Contributing to Pathview-plus

We welcome contributions from other experts who want to expand the features of Pathview-plus, in both the R and Python versions. Please contact us via support.


📞 Support


Made with ❤️ for the pathway visualization community
