Publication-quality Sankey diagrams for scientific journals (Nature / Cell / Science style)

sankey — Create Flow Diagrams for Data Distribution Analysis

sankey is a Python package for producing publication-ready Sankey (alluvial) diagrams styled to match the visual conventions of leading scientific journals (Nature, Cell, Science). It provides a unified high-level API that accepts three common data representations (long-format edge tables, wide-format observational DataFrames, and capacity dictionaries), computes a flexible layout, applies journal-specific color schemes, and renders an interactive Plotly figure exportable to HTML, PNG, or PDF.

Highlights

Three input formats — long-format (edges, nodes), wide-format DataFrame, or capacity dictionaries
Journal-grade presets — ready-to-use schemes matching Nature, Cell, and Science aesthetics
Sinkhorn–Knopp flow generation — automatic flow matrix from per-node capacities
Flexible layout engine — fixed-gap, uniform, and proportional vertical spacing
Multi-format export — interactive HTML, high-resolution PNG, and vector PDF

Table of Contents

Motivation
Installation
Quick Start
API Reference
Journal Presets
- Built-in Schemes
- Custom Schemes
Layout Methods
Color System
- Palettes
- Gradient Utilities
Architecture
Testing
Exporting Figures
Dependencies
License

Motivation

Sankey diagrams are widely used in systems biology, epidemiology, clinical cohort studies, and multi-omics integration to visualise the flow of observations across a sequence of categorical variables. Leading journals employ distinctive visual grammars—restrained colour palettes, subtle opacity layering, reserved treatment of residual flows, and clean sans-serif typography—that are tedious to reproduce manually in general-purpose plotting libraries.

sankey encapsulates these design rules in a set of versioned presets so that researchers can focus on their data rather than on fine-tuning plot aesthetics.

Installation

pip install sankeyplot

The package is published on PyPI as sankeyplot and imported as sankey. For optional Matplotlib-backed colormap support and the development toolchain:

pip install "sankeyplot[colormaps]"
pip install "sankeyplot[dev]"

Requirements: Python ≥ 3.10, NumPy ≥ 1.24, Pandas ≥ 2.0, Plotly ≥ 5.14.

Quick Start

All three input formats below are normalised to the same internal representation (a pair of nodes and edges tables) and rendered through the same pipeline.

1. Long-format (edges + nodes)

Full control over node identities, layer membership, and individual edge weights:

import pandas as pd
from sankey import sankey

nodes = pd.DataFrame({
    "name":  ["Amp", "Mut", "Del", "Path_A", "Path_B", "Path_C", "Cancer"],
    "layer": ["Inputs", "Inputs", "Inputs", "H1", "H1", "H1", "Outcome"],
    "is_residual": [False, False, False, False, False, False, False],
})

edges = pd.DataFrame({
    # Indices refer to rows of `nodes`; inflow equals outflow at each H1 node
    "source": [0, 0, 1, 1, 2, 2, 3, 4, 5],
    "target": [3, 4, 4, 5, 5, 3, 6, 6, 6],
    "value":  [30, 25, 15, 12, 18, 25, 55, 40, 30],
})

fig = sankey((edges, nodes), preset="nature", height=500, width=1400,
             title="Long-format example")
fig.write_html("sankey.html")

2. Wide-format DataFrame

Each row is an observation; columns encode the successive categorical layers:

import pandas as pd
from sankey import sankey

df = pd.DataFrame({
    "Input":   ["Amp", "Amp", "Amp", "Mut", "Mut", "Mut", "Del", "Del"],
    "Process": ["Immune", "Metab", "Signal", "Immune", "Metab", "CellCycle", "Metab", "Signal"],
    "Outcome": ["Cancer"] * 8,
})

fig = sankey(df, layer_cols=["Input", "Process", "Outcome"], preset="cell")
fig.write_html("sankey_cell.html")

3. Capacity dictionary

Specify the total flow through each node (every layer must carry the same total, here 1000); flows between adjacent layers are inferred via the Sinkhorn–Knopp algorithm:

from sankey import sankey

caps = {
    "Inputs":  [550, 270, 180],
    "H1":      [400, 350, 250],
    "Outcome": [1000],
}

fig = sankey(caps, preset="nature", seed=0)
fig.write_html("sankey_capacity.html")

API Reference

`sankey()`

def sankey(
    data=None,
    preset: str | dict | None = None,
    *,
    main_palette=None,
    input_colors=None,
    residual_color=None,
    outcome_color=None,
    x_method="auto",
    y_method="fixed_gap",
    gap=0.02,
    layer_cols=None,
    layer_pairs=None,
    seed=42,
    node_thickness=None,
    node_pad=None,
    font_family=None,
    font_size=None,
    height=700,
    width=2000,
    title=None,
    **layout_kwargs,
) -> go.Figure

| Parameter | Type | Default | Description |
|---|---|---|---|
| `data` | `tuple`, `DataFrame`, or `dict` | `None` | Input data (see Quick Start). |
| `preset` | `str` \| `dict` \| `None` | `None` (`"nature"` styling) | Named preset (`"nature"`, `"cell"`, `"science"`) or a custom `Scheme` dict. |
| `main_palette` | `list[str]` | from preset | Override the main node colour palette. |
| `input_colors` | `list[str]` | from preset | Colours assigned to the first-layer nodes. |
| `residual_color` | `str` | from preset | Fill colour for residual nodes. |
| `outcome_color` | `str` | from preset | Fill colour for outcome-layer nodes. |
| `x_method` | `"auto"` \| `dict` | `"auto"` | X-positioning: `"auto"` spreads layers evenly; a dict maps each layer name to an x-coordinate. |
| `y_method` | `str` | `"fixed_gap"` | Vertical layout method (`"fixed_gap"`, `"uniform"`, `"proportional"`). |
| `gap` | `float` | `0.02` | Gap between nodes (used by `"fixed_gap"` only). |
| `layer_cols` | `list[str]` | `None` | Column order for wide-format DataFrames. |
| `layer_pairs` | `list[tuple]` | auto-derived | Adjacent-layer pairs for capacity input. |
| `seed` | `int` | `42` | Random seed for capacity-based flow generation. |
| `node_thickness` | `int` | from preset | Thickness of the node bars in pixels (their horizontal width in a left-to-right diagram). |
| `node_pad` | `int` | from preset | Padding between nodes, in pixels. |
| `font_family` | `str` | from preset | Font family for labels and annotations. |
| `font_size` | `int` | from preset | Base font size. |
| `height` | `int` | `700` | Figure height, in pixels. |
| `width` | `int` | `2000` | Figure width, in pixels. |
| `title` | `str` | `None` | Optional figure title. |

Returns: plotly.graph_objects.Figure

`from_wide()`

def from_wide(df: pd.DataFrame, layer_cols: list[str]) -> tuple[pd.DataFrame, pd.DataFrame]

Converts a wide-format observational DataFrame into (nodes, edges) tables. Nodes named "Residual" (case-insensitive) are automatically flagged with is_residual = True.
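A minimal sketch of how such a conversion could work, assuming each node is keyed by its (column, label) pair (so the same label in two columns yields two nodes) and that an edge's value is the number of rows sharing a pair of labels in adjacent columns. `from_wide_sketch` is a hypothetical name, not the package's implementation; its output columns mirror the long-format example in Quick Start.

```python
import pandas as pd


def from_wide_sketch(df: pd.DataFrame, layer_cols: list[str]) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Hypothetical re-implementation: wide observations -> (nodes, edges)."""
    if len(layer_cols) < 2:
        raise ValueError("at least two layer columns are needed to form edges")

    # One node per distinct (layer, label) pair, in column order
    node_rows = []
    for col in layer_cols:
        for label in pd.unique(df[col]):
            node_rows.append({
                "name": label,
                "layer": col,
                "is_residual": str(label).lower() == "residual",  # case-insensitive flag
            })
    nodes = pd.DataFrame(node_rows)
    index = {key: i for i, key in enumerate(zip(nodes["layer"], nodes["name"]))}

    # Count the observations flowing between each pair of adjacent columns
    edge_frames = []
    for left, right in zip(layer_cols[:-1], layer_cols[1:]):
        counts = df.groupby([left, right], sort=False).size().reset_index(name="value")
        counts["source"] = [index[(left, v)] for v in counts[left]]
        counts["target"] = [index[(right, v)] for v in counts[right]]
        edge_frames.append(counts[["source", "target", "value"]])
    edges = pd.concat(edge_frames, ignore_index=True)
    return nodes, edges
```

Applied to the wide-format example in Quick Start, this yields eight nodes and twelve edges, with each column transition carrying all eight observations.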

`from_capacity()`

def from_capacity(
    capacities: dict[str, list[float]],
    layer_pairs: list[tuple[str, str]],
    seed: int = 42,
) -> tuple[pd.DataFrame, pd.DataFrame]

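Generates (nodes, edges) from per-node total capacities. For each adjacent layer pair, the Sinkhorn–Knopp algorithm rescales an initial matrix (randomised via seed) until its row sums match the source-layer capacities and its column sums match the target-layer capacities, a generalisation of the doubly stochastic case to arbitrary margins. Such a matrix exists only if both layers carry the same total flow.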
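The scaling step can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the package's code: the hypothetical `sinkhorn_flows` starts from a seeded, strictly positive random matrix (the initialisation actually used by `from_capacity` is not documented), requires all capacities to be positive, and alternately rescales rows and columns until both margins match.

```python
import numpy as np


def sinkhorn_flows(row_caps, col_caps, seed=42, n_iter=1000, tol=1e-9):
    """Hypothetical helper: scale a positive matrix so its margins equal the capacities."""
    r = np.asarray(row_caps, dtype=float)
    c = np.asarray(col_caps, dtype=float)
    if (r <= 0).any() or (c <= 0).any():
        raise ValueError("this sketch requires strictly positive capacities")
    if not np.isclose(r.sum(), c.sum()):
        raise ValueError("both layers must carry the same total flow")

    rng = np.random.default_rng(seed)
    flows = rng.uniform(0.5, 1.5, size=(r.size, c.size))  # strictly positive start
    for _ in range(n_iter):
        flows *= (r / flows.sum(axis=1))[:, None]  # rescale rows to match r
        flows *= (c / flows.sum(axis=0))[None, :]  # rescale columns to match c
        # Column sums are now exact; stop once the row sums also agree
        if np.allclose(flows.sum(axis=1), r, rtol=0.0, atol=tol * r.sum()):
            break
    return flows
```

For the capacity example in Quick Start, `sinkhorn_flows([550, 270, 180], [400, 350, 250])` returns a 3 × 3 matrix whose entries could serve as the Inputs → H1 edge values. Different seeds give different flow patterns that all share the same margins.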

`load_preset()`

def load_preset(name: str, **overrides) -> Scheme

Returns a deep copy of the named preset, optionally merged with keyword overrides. Raises ValueError for unknown preset names.

`register_preset()`

def register_preset(name: str, config: Scheme) -> None

Registers a new named preset globally. Useful for institutional or lab-specific style guides.

Validation Utilities

def validate_conservation(edges: pd.DataFrame, nodes: pd.DataFrame) -> dict[str, float]
def validate_no_cycles(edges: pd.DataFrame) -> None

validate_conservation: checks that total flow is conserved across each adjacent layer pair; returns {"max_row_err": ..., "max_col_err": ...}.
validate_no_cycles: raises ValueError if any self-loop (source == target) is present. Self-loops are the only cycles this check detects; longer cycles such as A → B → A are not caught.
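Because validate_no_cycles() is documented to catch only self-loops, a stricter check may be useful for hand-built edge tables. The sketch below uses hypothetical helpers that are not part of the package: `is_acyclic` applies Kahn's topological-sort algorithm, which detects cycles of any length, and `conservation_errors` reports |inflow - outflow| for every intermediate node, a per-node variant of the layer-pair check that validate_conservation() performs.

```python
from collections import defaultdict, deque

import pandas as pd


def conservation_errors(edges: pd.DataFrame) -> dict[int, float]:
    """|inflow - outflow| for every node that has both incoming and outgoing edges."""
    inflow = edges.groupby("target")["value"].sum()
    outflow = edges.groupby("source")["value"].sum()
    middle = inflow.index.intersection(outflow.index)
    return {int(n): float(abs(inflow[n] - outflow[n])) for n in middle}


def is_acyclic(edges: pd.DataFrame) -> bool:
    """Kahn's algorithm: the graph is a DAG iff every node can be removed in topological order."""
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set(edges["source"]) | set(edges["target"])
    for s, t in zip(edges["source"], edges["target"]):
        succ[s].append(t)
        indeg[t] += 1

    queue = deque(n for n in nodes if indeg[n] == 0)  # start from nodes with no inflow
    removed = 0
    while queue:
        n = queue.popleft()
        removed += 1
        for t in succ[n]:
            indeg[t] -= 1
            if indeg[t] == 0:
                queue.append(t)
    # Nodes on a cycle never reach in-degree zero, so they are never removed
    return removed == len(nodes)
```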

Journal Presets

Built-in Schemes

| Preset | Primary hue | Font | Character |
|---|---|---|---|
| `"nature"` | Crimson-red | Arial | Warm, restrained, high contrast. |
| `"cell"` | Ocean-blue | Helvetica | Cool, clinical, minimalist. |
| `"science"` | Multichrome | Helvetica | Bold primaries, neutral grey background. |

Each preset defines a complete visual scheme:

{
    "main_palette":        [...],   # 6-hex list for main nodes
    "gradient_method":     "sequential",
    "gradient_lighten":    0.6,
    "input_colors":        [...],   # 3-hex list for input layer
    "residual_color":      "#...",  # fill for residual nodes
    "residual_link_alpha": 0.38,    # opacity for residual links
    "residual_link_color": "#...",  # stroke for residual links
    "outcome_color":       "#...",  # fill for outcome nodes
    "outcome_link_alpha":  0.35,    # opacity for outcome links
    "default_link_alpha":  0.18,    # opacity for standard links
    "font_family":         "Arial",
    "font_size":           18,
    "node_thickness":      25,
    "node_pad":            80,
}
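The Custom Schemes section refers to a `Scheme` TypedDict whose definition is not shown in this README. Assuming it covers exactly the keys listed above and makes each one optional (the `my_lab` example omits several), it could look like the hypothetical `SchemeSketch` below; `merge_scheme` shows one plausible way to layer a partial custom scheme over a complete preset.

```python
from typing import TypedDict


class SchemeSketch(TypedDict, total=False):
    # Hypothetical typing of the preset keys; total=False makes every key optional
    main_palette: list[str]
    gradient_method: str
    gradient_lighten: float
    input_colors: list[str]
    residual_color: str
    residual_link_alpha: float
    residual_link_color: str
    outcome_color: str
    outcome_link_alpha: float
    default_link_alpha: float
    font_family: str
    font_size: int
    node_thickness: int
    node_pad: int


def merge_scheme(base: SchemeSketch, custom: SchemeSketch) -> SchemeSketch:
    # Keys present in `custom` win; everything else falls back to `base`
    merged: SchemeSketch = {**base, **custom}
    return merged
```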

Custom Schemes

Pass a dictionary conforming to the Scheme TypedDict directly as the preset argument, or register it for reuse:

from sankey import sankey, register_preset

register_preset("my_lab", {
    "main_palette":   ["#2C3E50", "#E74C3C", "#3498DB", "#2ECC71", "#F39C12", "#9B59B6"],
    "input_colors":   ["#3498DB", "#2ECC71", "#F39C12"],
    "residual_color": "#F5F5F5",
    "outcome_color":  "#2C3E50",
    "font_family":    "Times New Roman",
    "font_size":      14,
    "node_thickness": 20,
    "node_pad":       60,
})

fig = sankey(data, preset="my_lab")  # data: any of the input formats from Quick Start

Layout Methods

Three vertical layout strategies are available:

| Method | Description |
|---|---|
| `"fixed_gap"` | Constant vertical gap between nodes; node height proportional to capacity. |
| `"uniform"` | All nodes have equal height regardless of capacity. |
| `"proportional"` | Nodes fill the available vertical space in proportion to capacity; no gaps. |

Horizontal (x) layout defaults to evenly spaced layers ("auto") or accepts a manual dict mapping each layer name to an x-coordinate in [0, 1].
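The three strategies can be read as slot allocation within one layer. The sketch below is an interpretation, not the package's `compute_y`: the hypothetical `layer_y_sketch` gives each node a vertical slot in [0, 1] (sized by capacity for "fixed_gap" and "proportional", equal for "uniform"), stacks the slots from y = 0, and returns each slot's centre. The hypothetical `auto_x_sketch` mirrors the "auto" x-spacing.

```python
def auto_x_sketch(n_layers: int) -> list[float]:
    """Evenly spaced layer positions in [0, 1]."""
    if n_layers == 1:
        return [0.5]
    return [i / (n_layers - 1) for i in range(n_layers)]


def layer_y_sketch(capacities, method="fixed_gap", gap=0.02):
    """Return the centre y of each node in one layer, stacking slots from y = 0."""
    n = len(capacities)
    if n == 0:
        return []
    total = float(sum(capacities))
    if method == "fixed_gap":
        usable = 1.0 - gap * (n - 1)  # space left after the n - 1 constant gaps
        heights = [usable * c / total for c in capacities]
        spacing = gap
    elif method == "uniform":
        heights = [1.0 / n] * n  # equal slots; capacity is ignored
        spacing = 0.0
    elif method == "proportional":
        heights = [c / total for c in capacities]  # slots fill [0, 1] exactly
        spacing = 0.0
    else:
        raise ValueError(f"unknown y_method {method!r}")

    centres, top = [], 0.0
    for h in heights:
        centres.append(top + h / 2)
        top += h + spacing
    return centres
```

For capacities [3, 1], "proportional" places the centres at 0.375 and 0.875, while "uniform" places them at 0.25 and 0.75.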

Color System

Palettes

The package bundles several perceptually informed colour palettes, each available as a standalone function:

from sankey import palette_nature, palette_cell, palette_science
from sankey import palette_lancet, palette_colorbrewer, palette_viridis, palette_batlow
from sankey import palette_custom

# Static presets
nature_colors  = palette_nature()          # → 6-hex list
cell_colors    = palette_cell()            # → 6-hex list
science_colors = palette_science()         # → 6-hex list
lancet_colors  = palette_lancet()          # → 6-hex list

# Parameterised
cb_colors = palette_colorbrewer("Set1", n=5)   # ColorBrewer subsets
viridis   = palette_viridis(n=12)              # Viridis (Matplotlib optional)
batlow    = palette_batlow(n=12)               # Batlow (built-in fallback)
custom    = palette_custom(["#FF0000", "#00FF00", "#0000FF"])

Gradient Utilities

from sankey._colors import gradient_sequential, gradient_diverging, layer_gradient

sequential = gradient_sequential("#7B1515", "#F5D5D5", n=6)
diverging  = gradient_diverging("#313695", "#FFFFBF", "#A50026", n=11)
layer      = layer_gradient("#4A6FAF", n=5, lighten=0.6)
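Assuming the gradients are linear interpolations in sRGB space (the package may use a different colour space), the three utilities and the `hex_to_rgba` helper exercised by the test suite could be approximated as below. All names are hypothetical stand-ins: `seq_gradient`, `div_gradient`, and `lighten_gradient` correspond loosely to gradient_sequential, gradient_diverging, and layer_gradient, and `rgba` produces the `rgba(r, g, b, a)` strings Plotly accepts for translucent link colours.

```python
def _rgb(hex_color: str) -> tuple[int, int, int]:
    h = hex_color.lstrip("#")
    return int(h[0:2], 16), int(h[2:4], 16), int(h[4:6], 16)


def _hex(rgb) -> str:
    return "#" + "".join(f"{round(v):02X}" for v in rgb)


def mix(a: str, b: str, t: float) -> str:
    """Linear interpolation in sRGB: t = 0 gives a, t = 1 gives b."""
    return _hex(x + (y - x) * t for x, y in zip(_rgb(a), _rgb(b)))


def seq_gradient(start: str, end: str, n: int) -> list[str]:
    if n == 1:
        return [mix(start, end, 0.0)]
    return [mix(start, end, i / (n - 1)) for i in range(n)]


def div_gradient(low: str, mid: str, high: str, n: int) -> list[str]:
    out = []
    for i in range(n):
        t = i / (n - 1) if n > 1 else 0.5
        # First half runs low -> mid, second half runs mid -> high
        out.append(mix(low, mid, 2 * t) if t <= 0.5 else mix(mid, high, 2 * t - 1))
    return out


def lighten_gradient(base: str, n: int, lighten: float = 0.6) -> list[str]:
    """n shades from base towards white; the last is `lighten` of the way to #FFFFFF."""
    if n == 1:
        return [mix(base, base, 0.0)]
    return [mix(base, "#FFFFFF", lighten * i / (n - 1)) for i in range(n)]


def rgba(hex_color: str, alpha: float) -> str:
    r, g, b = _rgb(hex_color)
    return f"rgba({r}, {g}, {b}, {alpha})"
```

With an odd `n`, `div_gradient` places the midpoint colour exactly at the centre of the list, which keeps diverging scales visually symmetric.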

Architecture

sankey/
├── __init__.py       # Public API: sankey() entry point
├── _typing.py        # Type aliases: NodeTable, LinkTable, Scheme, ColorRule
├── _data.py          # Data ingestion: from_wide, from_capacity, validators
├── _layout.py        # Layout engine: auto_x, compute_y, compute_layout
├── _colors.py        # Colour system: palettes, gradients, rule matching
├── _presets.py       # Journal presets registry: load, list, register
└── _render.py        # Plotly renderer: node/link colouring, annotations

The pipeline runs in a fixed order: data → layout → colours → render.

1. Data ingestion (_data.py): all input formats are normalised to (nodes: DataFrame, edges: DataFrame).
2. Layout computation (_layout.py): x-positions are assigned per layer; y-positions are computed per node according to the chosen method.
3. Colour assignment (_colors.py, _render.py): a rule engine matches nodes against conditions (is_residual, is_input, is_outcome, key-value equality) and applies colours, palettes, and link opacities.
4. Rendering (_render.py): a Plotly go.Sankey trace is constructed with layer annotations, hover templates, and layout configuration.
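The colour-assignment stage can be illustrated with a first-match rule engine. The sketch is hypothetical: rules are (predicate, colour) pairs tested in order, nodes matching no rule cycle through the main palette, and link opacity is picked from the `residual_link_alpha`, `outcome_link_alpha`, and `default_link_alpha` scheme keys listed under Built-in Schemes. The package's actual `ColorRule` type and precedence order may differ.

```python
from typing import Any, Callable

Node = dict[str, Any]
Rule = tuple[Callable[[Node], bool], str]


def make_rule(color: str, **conditions: Any) -> Rule:
    """A rule matches when every key in `conditions` equals the node's value for that key."""
    def matches(node: Node) -> bool:
        return all(node.get(k) == v for k, v in conditions.items())
    return matches, color


def assign_node_colors(nodes: list[Node], rules: list[Rule], palette: list[str]) -> list[str]:
    colors, cursor = [], 0
    for node in nodes:
        for matches, color in rules:
            if matches(node):
                colors.append(color)  # first matching rule wins
                break
        else:
            colors.append(palette[cursor % len(palette)])  # unmatched nodes cycle the palette
            cursor += 1
    return colors


def link_alpha(source: Node, target: Node, scheme: dict[str, float]) -> float:
    # Residual flows take precedence, then flows into the outcome layer
    if source.get("is_residual") or target.get("is_residual"):
        return scheme["residual_link_alpha"]
    if target.get("is_outcome"):
        return scheme["outcome_link_alpha"]
    return scheme["default_link_alpha"]
```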

Testing

The test suite covers all public modules and the end-to-end pipeline:

tests/
├── test_data.py         # from_wide, from_capacity, validators
├── test_layout.py       # auto_x, compute_y, compute_layout
├── test_colors.py       # palette functions, hex_to_rgba, gradients
├── test_presets.py      # load_preset, list_presets, register_preset
├── test_render.py       # render(), node/link colouring, annotations
├── test_integration.py  # sankey() full pipeline (all 3 input formats)
└── conftest.py          # Shared fixtures

Run the suite with:

pytest tests/ -v

Exporting Figures

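The returned object is a standard Plotly figure, so Plotly's own export methods apply. The three formats below cover most use cases:

fig = sankey(data, preset="nature")  # data: any of the input formats from Quick Start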

# Interactive HTML (browser-viewable, self-contained)
fig.write_html("figure.html")

# Raster image (specify scale for print resolution; scale=2 recommended)
fig.write_image("figure.png", width=1800, height=800, scale=2)

# Vector graphics (lossless, preferred for journal submission)
fig.write_image("figure.pdf", width=1800, height=800)

Note: PNG and PDF export require the kaleido package (pip install kaleido).

Dependencies

| Package | Minimum version | Required | Purpose |
|---|---|---|---|
| `numpy` | 1.24 | Yes | Numerical arrays, random sampling |
| `pandas` | 2.0 | Yes | Tabular data structures |
| `plotly` | 5.14 | Yes | Sankey trace construction and rendering |
| `matplotlib` | 3.7 | No | Extended colormap support |
| `pytest` | 7.0 | No | Test runner (dev only) |
| `kaleido` | not specified | No | PNG/PDF static image export |

License

This project is distributed under a proprietary license. See the LICENSE file for full terms.

Free for personal and academic research use.
Generated figures may be included in academic papers, reports, and presentations.
Commercial use of any kind is prohibited.
Redistribution, sublicensing, or public disclosure of source code is strictly forbidden.

Project details

This version: 0.1.0, released Jun 14, 2026

Download files

Source Distribution

sankeyplot-0.1.0.tar.gz (19.8 kB), uploaded Jun 14, 2026

Built Distribution

sankeyplot-0.1.0-py3-none-any.whl (19.0 kB), uploaded Jun 14, 2026, tagged Python 3

File details

Details for the file sankeyplot-0.1.0.tar.gz.

File metadata

Download URL: sankeyplot-0.1.0.tar.gz
Upload date: Jun 14, 2026
Size: 19.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.4

File hashes

Hashes for sankeyplot-0.1.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | `a0eee25e80fde535f7ed2afadd831b7fcd6463ebdb88d1d8e80b4ff26b173a98` |
| MD5 | `6932571e73bb0dcf8accefb03cd86a63` |
| BLAKE2b-256 | `6cf17f4afa9e0f5765694f9ed4e26b93128c7fda048d442acc234aa5940ada56` |

File details

Details for the file sankeyplot-0.1.0-py3-none-any.whl.

File metadata

Download URL: sankeyplot-0.1.0-py3-none-any.whl
Upload date: Jun 14, 2026
Size: 19.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.4

File hashes

Hashes for sankeyplot-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | `10ab83407ca04b1238970dcd9a939c65a0d57f1f899ef4aa862e8f67b67dcdf3` |
| MD5 | `32478c321c76abd716d4155624755aea` |
| BLAKE2b-256 | `d3620577fdf9aa734c1e1aaf15d8dd34111296153d2d457dc79036b6b2cfd0aa` |
