Publication-quality Sankey diagrams for scientific journals (Nature / Cell / Science style)
sankey — Create Flow Diagrams for Data Distribution Analysis
sankey is a Python package for producing publication-ready Sankey (alluvial)
diagrams styled to match the visual conventions of leading scientific journals
(Nature, Cell, Science). It provides a unified high-level API that accepts
three common data representations (long-format edge tables, wide-format
observational DataFrames, and capacity dictionaries), computes a flexible
layout, applies journal-specific color schemes, and renders an interactive
Plotly figure exportable to HTML, PNG, or PDF.
Highlights
- Three input formats — long-format (edges, nodes), wide-format DataFrame, or capacity dictionaries
- Journal-grade presets — ready-to-use schemes matching Nature, Cell, and Science aesthetics
- Sinkhorn–Knopp flow generation — automatic flow matrix from per-node capacities
- Flexible layout engine — fixed-gap, uniform, and proportional vertical spacing
- Multi-format export — interactive HTML, high-resolution PNG, and vector PDF
Table of Contents
- Motivation
- Installation
- Quick Start
- API Reference
- Journal Presets
- Layout Methods
- Color System
- Architecture
- Testing
- Exporting Figures
- Dependencies
- License
- Citing
Motivation
Sankey diagrams are widely used in systems biology, epidemiology, clinical cohort studies, and multi-omics integration to visualise the flow of observations across a sequence of categorical variables. Leading journals employ distinctive visual grammars—restrained colour palettes, subtle opacity layering, reserved treatment of residual flows, and clean sans-serif typography—that are tedious to reproduce manually in general-purpose plotting libraries.
sankey encapsulates these design rules in a set of versioned presets so
that researchers can focus on their data rather than on fine-tuning plot
aesthetics.
Installation
pip install sankey
For optional Matplotlib-backed colormap support and the development toolchain:
pip install sankey[colormaps]
pip install sankey[dev]
Requirements: Python ≥ 3.10, NumPy ≥ 1.24, Pandas ≥ 2.0, Plotly ≥ 5.14.
Quick Start
All three entry points below are normalised to the same internal representation, a nodes table and an edges table, and rendered through the same pipeline.
1. Long-format (edges + nodes)
Full control over node identities, layer membership, and individual edge weights:
import pandas as pd
from sankey import sankey
nodes = pd.DataFrame({
    "name": ["Amp", "Mut", "Del", "Path_A", "Path_B", "Path_C", "Cancer"],
    "layer": ["Inputs", "Inputs", "Inputs", "H1", "H1", "H1", "Outcome"],
    "is_residual": [False, False, False, False, False, False, False],
})
edges = pd.DataFrame({
    "source": [0, 0, 1, 1, 2, 2, 3, 4, 4, 5],
    "target": [3, 4, 4, 5, 5, 3, 6, 6, 6, 6],
    "value": [30, 25, 15, 12, 18, 0, 55, 28, 12, 18],
})
fig = sankey((edges, nodes), preset="nature", height=500, width=1400,
             title="Long-format example")
fig.write_html("sankey.html")
2. Wide-format DataFrame
Each row is an observation; columns encode the successive categorical layers:
import pandas as pd
from sankey import sankey
df = pd.DataFrame({
    "Input": ["Amp", "Amp", "Amp", "Mut", "Mut", "Mut", "Del", "Del"],
    "Process": ["Immune", "Metab", "Signal", "Immune", "Metab", "CellCycle", "Metab", "Signal"],
    "Outcome": ["Cancer"] * 8,
})
fig = sankey(df, layer_cols=["Input", "Process", "Outcome"], preset="cell")
fig.write_html("sankey_cell.html")
3. Capacity dictionary
Specify the total flow through each node; flows between layers are inferred via the Sinkhorn–Knopp algorithm:
from sankey import sankey
caps = {
    "Inputs": [550, 270, 180],
    "H1": [400, 350, 250],
    "Outcome": [1000],
}
fig = sankey(caps, preset="nature", seed=0)
fig.write_html("sankey_capacity.html")
API Reference
sankey()
def sankey(
    data=None,
    preset: str | dict | None = None,
    *,
    main_palette=None,
    input_colors=None,
    residual_color=None,
    outcome_color=None,
    x_method="auto",
    y_method="fixed_gap",
    gap=0.02,
    layer_cols=None,
    layer_pairs=None,
    seed=42,
    node_thickness=None,
    node_pad=None,
    font_family=None,
    font_size=None,
    height=700,
    width=2000,
    title=None,
    **layout_kwargs,
) -> go.Figure
| Parameter | Type | Default | Description |
|---|---|---|---|
| data | tuple, DataFrame, or dict | — | Input data (see Quick Start). |
| preset | str or dict | "nature" | Named preset ("nature", "cell", "science") or a custom Scheme dict. |
| main_palette | list[str] | from preset | Override the main node colour palette. |
| input_colors | list[str] | from preset | Colours assigned to the first-layer nodes. |
| residual_color | str | from preset | Fill colour for residual nodes. |
| outcome_color | str | from preset | Fill colour for outcome-layer nodes. |
| x_method | "auto" or dict | "auto" | X-positioning: "auto" spreads layers evenly; a dict maps layer → x. |
| y_method | str | "fixed_gap" | Vertical layout method ("fixed_gap", "uniform", "proportional"). |
| gap | float | 0.02 | Gap between nodes (used by "fixed_gap" only). |
| layer_cols | list[str] | None | Column order for wide-format DataFrames. |
| layer_pairs | list[tuple] | auto-derived | Adjacent-layer pairs for capacity input. |
| seed | int | 42 | Random seed for capacity-based flow generation. |
| node_thickness | int | from preset | Vertical thickness of node rectangles (pixels). |
| node_pad | int | from preset | Padding between nodes (pixels). |
| font_family | str | from preset | Font family for labels and annotations. |
| font_size | int | from preset | Base font size. |
| height | int | 700 | Figure height (pixels). |
| width | int | 2000 | Figure width (pixels). |
| title | str | None | Optional figure title. |
Returns: plotly.graph_objects.Figure
from_wide()
def from_wide(df: pd.DataFrame, layer_cols: list[str]) -> tuple[pd.DataFrame, pd.DataFrame]
Converts a wide-format observational DataFrame into (nodes, edges) tables.
Nodes named "Residual" (case-insensitive) are automatically flagged with
is_residual = True.
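As a rough illustration of what such a conversion involves, the sketch below counts transitions between adjacent layer columns with pandas. It is not the package's implementation: the helper name wide_to_tables and the output column names are assumptions chosen to mirror the long-format example in Quick Start.

```python
import pandas as pd


def wide_to_tables(df: pd.DataFrame, layer_cols: list[str]):
    """Hypothetical helper: build (nodes, edges) from a wide-format table."""
    # One node per distinct (layer, name) pair, in order of first appearance.
    node_rows = []
    for col in layer_cols:
        for name in df[col].unique():
            node_rows.append({
                "name": name,
                "layer": col,
                "is_residual": str(name).lower() == "residual",
            })
    nodes = pd.DataFrame(node_rows)
    index = {key: i for i, key in enumerate(zip(nodes["layer"], nodes["name"]))}

    # One edge per distinct transition between adjacent layers;
    # the edge value is the number of rows that make that transition.
    edge_rows = []
    for src_col, dst_col in zip(layer_cols, layer_cols[1:]):
        counts = df.groupby([src_col, dst_col], sort=False).size()
        for (src, dst), n in counts.items():
            edge_rows.append({
                "source": index[(src_col, src)],
                "target": index[(dst_col, dst)],
                "value": int(n),
            })
    edges = pd.DataFrame(edge_rows)
    return nodes, edges


# Same observations as the wide-format Quick Start example.
df = pd.DataFrame({
    "Input": ["Amp", "Amp", "Amp", "Mut", "Mut", "Mut", "Del", "Del"],
    "Process": ["Immune", "Metab", "Signal", "Immune", "Metab", "CellCycle", "Metab", "Signal"],
    "Outcome": ["Cancer"] * 8,
})
nodes, edges = wide_to_tables(df, ["Input", "Process", "Outcome"])
```

Keying nodes by (layer, name) rather than by name alone keeps a label that appears in two layers from collapsing into a single node.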
from_capacity()
def from_capacity(
    capacities: dict[str, list[float]],
    layer_pairs: list[tuple[str, str]],
    seed: int = 42,
) -> tuple[pd.DataFrame, pd.DataFrame]
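Generates (nodes, edges) from per-node total capacities. For each adjacent
layer pair, the Sinkhorn–Knopp algorithm iteratively rescales a flow matrix
until its row sums match the source-layer capacities and its column sums
match the target-layer capacities (the doubly-stochastic case generalised to
arbitrary marginals).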
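The scaling step can be sketched in a few lines of NumPy. This is a minimal illustration of Sinkhorn–Knopp scaling under stated assumptions, not the code used by from_capacity: the helper name sinkhorn_flows, the uniform random starting matrix, and the stopping rule are all hypothetical, and capacities are assumed to be strictly positive with equal totals on both sides.

```python
import numpy as np


def sinkhorn_flows(row_caps, col_caps, seed=42, n_iter=500, tol=1e-9):
    """Hypothetical helper: scale a random positive matrix to given marginals."""
    r = np.asarray(row_caps, dtype=float)
    c = np.asarray(col_caps, dtype=float)
    if not np.isclose(r.sum(), c.sum()):
        raise ValueError("source and target capacities must have equal totals")

    rng = np.random.default_rng(seed)
    # Strictly positive starting matrix, so the alternating scaling converges.
    flows = rng.uniform(0.5, 1.5, size=(r.size, c.size))

    for _ in range(n_iter):
        flows *= (r / flows.sum(axis=1))[:, None]  # rescale rows to match r
        flows *= (c / flows.sum(axis=0))[None, :]  # rescale columns to match c
        if np.abs(flows.sum(axis=1) - r).max() < tol:
            break
    return flows


# Inputs -> H1 capacities from the Quick Start capacity example.
flows = sinkhorn_flows([550, 270, 180], [400, 350, 250], seed=0)
max_row_err = np.abs(flows.sum(axis=1) - [550, 270, 180]).max()
max_col_err = np.abs(flows.sum(axis=0) - [400, 350, 250]).max()
```

The seed only affects the starting matrix, so different seeds give different flow patterns that all respect the same node totals.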
load_preset()
def load_preset(name: str, **overrides) -> Scheme
Returns a deep copy of the named preset, optionally merged with keyword
overrides. Raises ValueError for unknown preset names.
register_preset()
def register_preset(name: str, config: Scheme) -> None
Registers a new named preset globally. Useful for institutional or lab-specific style guides.
Validation Utilities
def validate_conservation(edges: pd.DataFrame, nodes: pd.DataFrame) -> dict[str, float]
def validate_no_cycles(edges: pd.DataFrame) -> None
- validate_conservation — checks that total flow is conserved across each adjacent layer pair; returns {"max_row_err": ..., "max_col_err": ...}.
- validate_no_cycles — raises ValueError if any self-loop (source == target) is present, enforcing the DAG constraint required by Sankey diagrams.
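For readers who want to run comparable checks on their own edge lists, here is a small pure-Python sketch. Both functions are hypothetical and are not the package's validators: flow_imbalance reports inflow minus outflow per interior node, and has_cycle uses Kahn's topological sort, which also catches cycles longer than a self-loop (a source == target test alone would miss a loop such as A → B → A).

```python
from collections import defaultdict, deque


def flow_imbalance(edges):
    """Return {node: inflow - outflow} for nodes with both in- and out-edges."""
    inflow, outflow = defaultdict(float), defaultdict(float)
    for source, target, value in edges:
        outflow[source] += value
        inflow[target] += value
    interior = inflow.keys() & outflow.keys()
    return {node: inflow[node] - outflow[node] for node in interior}


def has_cycle(edges):
    """Kahn's algorithm: True if the directed graph contains any cycle."""
    successors = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for source, target, _ in edges:
        successors[source].append(target)
        indegree[target] += 1
        nodes.update((source, target))
    # Repeatedly remove nodes with no remaining incoming edges.
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    # Any node never removed sits on (or behind) a cycle.
    return visited != len(nodes)


# (source, target, value) triples; node 2 receives 50 and emits 50.
balanced = [(0, 2, 30), (1, 2, 20), (2, 3, 50)]
imbalance = flow_imbalance(balanced)
```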
Journal Presets
Built-in Schemes
| Preset | Primary Hue | Font | Character |
|---|---|---|---|
| "nature" | Crimson-red | Arial | Warm, restrained, high contrast. |
| "cell" | Ocean-blue | Helvetica | Cool, clinical, minimalist. |
| "science" | Multichrome | Helvetica | Bold primaries, neutral grey background. |
Each preset defines a complete visual scheme:
{
    "main_palette": [...],            # 6-hex list for main nodes
    "gradient_method": "sequential",
    "gradient_lighten": 0.6,
    "input_colors": [...],            # 3-hex list for input layer
    "residual_color": "#...",         # fill for residual nodes
    "residual_link_alpha": 0.38,      # opacity for residual links
    "residual_link_color": "#...",    # stroke for residual links
    "outcome_color": "#...",          # fill for outcome nodes
    "outcome_link_alpha": 0.35,       # opacity for outcome links
    "default_link_alpha": 0.18,       # opacity for standard links
    "font_family": "Arial",
    "font_size": 18,
    "node_thickness": 25,
    "node_pad": 80,
}
Custom Schemes
Pass a dictionary conforming to the Scheme TypedDict directly as the preset
argument, or register it for reuse:
from sankey import sankey, register_preset
register_preset("my_lab", {
    "main_palette": ["#2C3E50", "#E74C3C", "#3498DB", "#2ECC71", "#F39C12", "#9B59B6"],
    "input_colors": ["#3498DB", "#2ECC71", "#F39C12"],
    "residual_color": "#F5F5F5",
    "outcome_color": "#2C3E50",
    "font_family": "Times New Roman",
    "font_size": 14,
    "node_thickness": 20,
    "node_pad": 60,
})
fig = sankey(data, preset="my_lab")
Layout Methods
Three vertical layout strategies are available:
| Method | Description |
|---|---|
| "fixed_gap" | Constant vertical gap between nodes; node height proportional to capacity. |
| "uniform" | All nodes have equal height regardless of capacity. |
| "proportional" | Nodes fill the available vertical space in proportion to capacity; no gaps. |
Horizontal (x) layout defaults to evenly-spaced layers ("auto") or accepts
a manual dict mapping each layer name to an x-coordinate in [0, 1].
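To make the three strategies concrete, the sketch below computes node tops and heights for a single layer on a unit-height canvas. It is an interpretation of the descriptions above, not the package's layout engine: the function names are hypothetical, and giving "uniform" nodes equal slots of 1/n with no gap is an assumption (the documentation states only that gap applies to "fixed_gap").

```python
import numpy as np


def compute_y_sketch(capacities, method="fixed_gap", gap=0.02):
    """Hypothetical helper: return (tops, heights) in [0, 1] for one layer."""
    caps = np.asarray(capacities, dtype=float)
    n = caps.size
    if method == "fixed_gap":
        spacing = gap
        # Heights share whatever is left once the n - 1 gaps are reserved.
        heights = caps / caps.sum() * (1.0 - gap * (n - 1))
    elif method == "uniform":
        spacing = 0.0  # assumption: equal slots, no gap
        heights = np.full(n, 1.0 / n)
    elif method == "proportional":
        spacing = 0.0
        heights = caps / caps.sum()
    else:
        raise ValueError(f"unknown y_method: {method!r}")
    # Top of node i = total height of the nodes above it + the gaps above it.
    tops = np.concatenate(([0.0], np.cumsum(heights)[:-1])) + spacing * np.arange(n)
    return tops, heights


def auto_x_sketch(n_layers):
    """Evenly spaced x-coordinates in [0, 1], one per layer."""
    return np.linspace(0.0, 1.0, n_layers)


tops, heights = compute_y_sketch([550, 270, 180], method="fixed_gap", gap=0.02)
xs = auto_x_sketch(3)
```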
Color System
Palettes
The package bundles several perceptually-informed colour palettes available as standalone functions:
from sankey import palette_nature, palette_cell, palette_science
from sankey import palette_lancet, palette_colorbrewer, palette_viridis, palette_batlow
from sankey import palette_custom
# Static presets
nature_colors = palette_nature() # → 6-hex list
cell_colors = palette_cell() # → 6-hex list
science_colors = palette_science() # → 6-hex list
lancet_colors = palette_lancet() # → 6-hex list
# Parameterised
cb_colors = palette_colorbrewer("Set1", n=5) # ColorBrewer subsets
viridis = palette_viridis(n=12) # Viridis (Matplotlib optional)
batlow = palette_batlow(n=12) # Batlow (built-in fallback)
custom = palette_custom(["#FF0000", "#00FF00", "#0000FF"])
Gradient Utilities
from sankey._colors import gradient_sequential, gradient_diverging, layer_gradient
sequential = gradient_sequential("#7B1515", "#F5D5D5", n=6)
diverging = gradient_diverging("#313695", "#FFFFBF", "#A50026", n=11)
layer = layer_gradient("#4A6FAF", n=5, lighten=0.6)
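For intuition about what these helpers produce, here is a standalone sketch of linear colour interpolation. It is not the code in sankey._colors: the function names are hypothetical, and interpolating in plain RGB space and lightening by blending towards white are assumptions about one reasonable approach.

```python
def hex_to_rgb(hex_color):
    """'#RRGGBB' -> (r, g, b) integers in 0..255."""
    h = hex_color.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """(r, g, b) integers -> '#RRGGBB' in upper case."""
    return "#{:02X}{:02X}{:02X}".format(*rgb)


def sequential_sketch(start, end, n):
    """n colours linearly interpolated in RGB space from start to end."""
    a, b = hex_to_rgb(start), hex_to_rgb(end)
    steps = [i / (n - 1) for i in range(n)] if n > 1 else [0.0]
    return [
        rgb_to_hex(tuple(round(x + (y - x) * t) for x, y in zip(a, b)))
        for t in steps
    ]


def lighten_sketch(base, n, lighten=0.6):
    """n tints of `base`, the last one blended towards white by `lighten`."""
    target = rgb_to_hex(
        tuple(round(c + (255 - c) * lighten) for c in hex_to_rgb(base))
    )
    return sequential_sketch(base, target, n)


ramp = sequential_sketch("#7B1515", "#F5D5D5", n=6)
tints = lighten_sketch("#4A6FAF", n=5, lighten=0.6)
```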
Architecture
sankey/
├── __init__.py # Public API: sankey() entry point
├── _typing.py # Type aliases: NodeTable, LinkTable, Scheme, ColorRule
├── _data.py # Data ingestion: from_wide, from_capacity, validators
├── _layout.py # Layout engine: auto_x, compute_y, compute_layout
├── _colors.py # Colour system: palettes, gradients, rule matching
├── _presets.py # Journal presets registry: load, list, register
└── _render.py # Plotly renderer: node/link colouring, annotations
The pipeline follows a strict data → layout → render sequence:
- Data ingestion (_data.py) — all input formats are normalised to (nodes: DataFrame, edges: DataFrame).
- Layout computation (_layout.py) — x-positions are assigned per layer; y-positions are computed per node according to the chosen method.
- Colour assignment (_colors.py, _render.py) — a rule engine matches nodes against conditions (is_residual, is_input, is_outcome, key-value equality) and applies colours, palettes, and link opacities.
- Rendering (_render.py) — a Plotly go.Sankey trace is constructed with layer annotations, hover templates, and layout configuration.
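The colour-assignment step can be pictured as a first-match rule list. The sketch below is a simplified illustration only: the rule format (a conditions dict paired with a colour), the function names, and the default grey are assumptions and do not reflect the package's ColorRule type.

```python
def matches(node, conditions):
    """True if every key in `conditions` equals the node's value for that key."""
    return all(node.get(key) == expected for key, expected in conditions.items())


def assign_colors(nodes, rules, default="#999999"):
    """Give each node the colour of the first rule whose conditions it meets."""
    colors = []
    for node in nodes:
        color = default
        for conditions, rule_color in rules:
            if matches(node, conditions):
                color = rule_color
                break  # earlier rules take priority
        colors.append(color)
    return colors


nodes = [
    {"name": "Amp", "is_input": True, "is_residual": False, "is_outcome": False},
    {"name": "Residual", "is_input": False, "is_residual": True, "is_outcome": False},
    {"name": "Cancer", "is_input": False, "is_residual": False, "is_outcome": True},
    {"name": "Path_A", "is_input": False, "is_residual": False, "is_outcome": False},
]
rules = [
    ({"is_residual": True}, "#F5F5F5"),
    ({"is_outcome": True}, "#2C3E50"),
    ({"is_input": True}, "#3498DB"),
]
colors = assign_colors(nodes, rules)
```

Nodes that match no rule fall through to the default, which is where a palette lookup would normally take over.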
Testing
The test suite covers all public modules and the end-to-end pipeline:
tests/
├── test_data.py # from_wide, from_capacity, validators
├── test_layout.py # auto_x, compute_y, compute_layout
├── test_colors.py # palette functions, hex_to_rgba, gradients
├── test_presets.py # load_preset, list_presets, register_preset
├── test_render.py # render(), node/link colouring, annotations
├── test_integration.py # sankey() full pipeline (all 3 input formats)
└── conftest.py # Shared fixtures
Run the suite with:
pytest tests/ -v
Exporting Figures
Plotly figures support three output formats:
fig = sankey(data, preset="nature")
# Interactive HTML (browser-viewable, self-contained)
fig.write_html("figure.html")
# Raster image (specify scale for print resolution; scale=2 recommended)
fig.write_image("figure.png", width=1800, height=800, scale=2)
# Vector graphics (lossless, preferred for journal submission)
fig.write_image("figure.pdf", width=1800, height=800)
Note: PNG and PDF export require the kaleido package (pip install kaleido).
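When a journal specifies a physical figure width, the width, height, and scale arguments can be derived with a little arithmetic, since Plotly multiplies the layout pixel size by scale for raster output. The helper below is a hypothetical convenience, and the 180 mm × 80 mm size and 300 dpi target are illustrative assumptions, not any journal's stated requirement.

```python
def layout_pixels(print_mm, dpi=300, scale=2):
    """Layout size in px such that size * scale reaches the target resolution."""
    target_px = print_mm / 25.4 * dpi  # 25.4 mm per inch
    return round(target_px / scale)


width = layout_pixels(180)   # 180 mm wide at 300 dpi with scale=2
height = layout_pixels(80)   # 80 mm tall at 300 dpi with scale=2

# The results would then be passed on as, for example:
# fig.write_image("figure.png", width=width, height=height, scale=2)
```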
Dependencies
| Package | Minimum Version | Required | Purpose |
|---|---|---|---|
| numpy | 1.24 | Yes | Numerical arrays, random sampling |
| pandas | 2.0 | Yes | Tabular data structures |
| plotly | 5.14 | Yes | Sankey trace construction & rendering |
| matplotlib | 3.7 | No | Extended colormap support |
| pytest | 7.0 | No | Test runner (dev only) |
| kaleido | — | No | PNG/PDF static image export |
License
This project is distributed under a proprietary license. See the LICENSE file for full terms.
- Free for personal and academic research use.
- Generated figures may be included in academic papers, reports, and presentations.
- Commercial use of any kind is prohibited.
- Redistribution, sublicensing, or public disclosure of source code is strictly forbidden.