
Creating Sankey flow diagrams in Matplotlib


SankeyFlow

SankeyFlow is a lightweight Python package that plots Sankey flow diagrams using Matplotlib.

sankey example

import matplotlib.pyplot as plt
from sankeyflow import Sankey

flows = [
    ('Product', 'Total revenue', 20779),
    ('Service and other', 'Total revenue', 30949),
    ('Total revenue', 'Gross margin', 34768),
    ('Total revenue', 'Cost of revenue', 16960),
    ...
]
s = Sankey(flows=flows)
s.draw()
plt.show()

See example/msft_FY22q2.py for the full example.

Description

Matplotlib has a built-in Sankey class (matplotlib.sankey.Sankey), but it is designed around single-node flows. SankeyFlow instead focuses on directional flows, and its output looks more like that of plotly and SankeyMATIC. It also treats nodes and flows separately, so a node's value, inflows, and outflows don't have to be equal.

cutflow example

SankeyFlow is also fully transparent with Matplotlib: the diagram needs only an Axes to draw on, passed as Sankey.draw(ax). All elements in the diagram are Matplotlib primitives (Patch and Text) and can be modified directly with the full suite of Matplotlib options.

Installation

Requires Matplotlib and numpy.

python3 -m pip install sankeyflow

You can then simply import the class:

from sankeyflow import Sankey

Usage

The core class is sankeyflow.Sankey, which builds and draws the diagram. Data is passed in the constructor or with Sankey.sankey(flows, nodes), and the diagram is drawn with Sankey.draw(ax).

The diagram defaults to a left-to-right flow pattern, and breaks the nodes into "levels," which correspond to the x position. The cutflow diagram above has 5 levels, for example.

  • nodes is a nested list of length nlevels, ordered from left to right. For each level, there is a list of nodes ordered from top to bottom. Each node is a (name, value) pair.
  • flows is a list of flows, each coded as (source, destination, value). source and destination should match the names in nodes. value may also be a tuple (start, end), giving the flow's value at its source end and at its destination end, respectively.
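To make the two accepted forms of a flow value concrete, here is a small pure-Python sketch. The helper flow_ends is hypothetical and not part of SankeyFlow; it only illustrates how a scalar and a (start, end) tuple relate, and the data is made up.

```python
def flow_ends(value):
    """Return (start, end) for a flow value given as a number or a 2-tuple."""
    if isinstance(value, tuple):
        start, end = value
        return start, end
    # A single number means the same value at both ends.
    return value, value


flows = [
    ('A', 'B1', 4),       # same value at both ends
    ('A', 'B2', (5, 3)),  # 5 at the source end, 3 at the destination end
]

for source, destination, value in flows:
    start, end = flow_ends(value)
    print(f'{source} -> {destination}: starts at {start}, ends at {end}')
```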

If nodes is None, the nodes will be automatically inferred and placed from the flows.

nodes = [
    [('A', 10)],
    [('B1', 4), ('B2', 5)],
    [('C', 3)]
]
flows = [
    ('A', 'B1', 4),
    ('A', 'B2', 5),
    ('B1', 'C', 1),
    ('B2', 'C', 2),
] 

plt.figure(figsize=(4, 3), dpi=144)
s = Sankey(flows=flows, nodes=nodes)
s.draw()

example 1
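Before drawing, it can help to check that every flow refers to a declared node and to see how each node's value compares with its total inflow and outflow. The helper below is a hypothetical pure-Python sketch (check_flows is not part of SankeyFlow); it assumes only the nodes and flows formats described above, with plain numeric flow values.

```python
from collections import defaultdict

nodes = [
    [('A', 10)],
    [('B1', 4), ('B2', 5)],
    [('C', 3)],
]
flows = [
    ('A', 'B1', 4),
    ('A', 'B2', 5),
    ('B1', 'C', 1),
    ('B2', 'C', 2),
]


def check_flows(nodes, flows):
    """Return {name: (value, inflow, outflow)}; raise if a flow names an unknown node."""
    # Flatten the per-level lists; `*_` skips an optional trailing options dict.
    values = {name: value for level in nodes for name, value, *_ in level}
    inflow = defaultdict(int)
    outflow = defaultdict(int)
    for source, destination, value, *_ in flows:
        for name in (source, destination):
            if name not in values:
                raise ValueError(f'flow references unknown node {name!r}')
        outflow[source] += value
        inflow[destination] += value
    return {name: (values[name], inflow[name], outflow[name]) for name in values}


summary = check_flows(nodes, flows)
for name, (value, total_in, total_out) in summary.items():
    print(f'{name}: value={value}, in={total_in}, out={total_out}')
```

In this data, node A has value 10 but only 9 flows out of it, and B1 receives 4 but passes on 1. As noted in the Description, SankeyFlow does not require these numbers to match.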

Configuration

Diagram and global configuration are set in the constructor. Individual nodes and flows can be further modified by adding a dictionary containing configuration arguments to the input tuples in Sankey.sankey(). See docstrings for complete argument lists.

For example, we can change the colormap to a pastel one, straighten all flows by setting their curvature to 0, and change the color of one flow.

flows = [
    ('A', 'B1', 4),
    ('A', 'B2', 5),
    ('B1', 'C', 1),
    ('B2', 'C', 2, {'color': 'red'}),
] 

s = Sankey(
    flows=flows,
    nodes=nodes,
    cmap=plt.cm.Pastel1,
    flow_opts=dict(curvature=0),
)
s.draw()

example 2
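When several flows need the same per-flow option, the trailing dictionary can be attached programmatically rather than by hand. The helper below is a hypothetical pure-Python sketch (with_flow_opts is not part of SankeyFlow); it relies only on the tuple format shown above and on the 'color' key used in the example.

```python
def with_flow_opts(flows, destination, **opts):
    """Return a copy of flows where every flow into `destination` carries `opts`."""
    styled = []
    for flow in flows:
        source, dest, value, *rest = flow
        if dest == destination:
            # Keep any options already present; new ones take precedence.
            merged = {**(rest[0] if rest else {}), **opts}
            styled.append((source, dest, value, merged))
        else:
            styled.append(flow)
    return styled


flows = [
    ('A', 'B1', 4),
    ('A', 'B2', 5),
    ('B1', 'C', 1),
    ('B2', 'C', 2),
]

# Mark every flow that ends in 'C' as red.
red_into_c = with_flow_opts(flows, 'C', color='red')
for flow in red_into_c:
    print(flow)
```

The resulting list can be passed as flows in the same way as the hand-written one above.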

By default, each flow takes the color of its destination node. This can be changed globally or per flow with flow_color_mode. In the example below, the global mode is set to 'source' while one flow is switched back to 'dest'.

flows = [
    ('A', 'B1', 4),
    ('A', 'B2', 5, {'flow_color_mode': 'dest'}),
    ('B1', 'C', 1),
    ('B2', 'C', 2),
] 

s = Sankey(
    flows=flows,
    nodes=nodes,
    flow_color_mode='source',
)
s.draw()

example 3
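The example above suggests that a per-flow flow_color_mode takes precedence over the global one. The sketch below spells out that reading in pure Python; effective_color_mode is a hypothetical helper, not a SankeyFlow function, and the precedence rule is an assumption based on the example rather than on the library's source.

```python
def effective_color_mode(flow, default_mode):
    """Return the per-flow 'flow_color_mode' if present, else the global default."""
    opts = flow[3] if len(flow) > 3 else {}
    return opts.get('flow_color_mode', default_mode)


flows = [
    ('A', 'B1', 4),
    ('A', 'B2', 5, {'flow_color_mode': 'dest'}),
    ('B1', 'C', 1),
    ('B2', 'C', 2),
]

modes = [effective_color_mode(flow, 'source') for flow in flows]
for flow, mode in zip(flows, modes):
    print(f'{flow[0]} -> {flow[1]}: {mode}')
```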

We can also easily adjust the label formatting and other node properties in the same way.

nodes = [
    [('A', 10)],
    [('B1', 4), ('B2', 5)],
    [('C', 3, {'label_pos':'top'})]
]
flows = [
    ('A', 'B1', 4),
    ('A', 'B2', 5),
    ('B1', 'C', 1),
    ('B2', 'C', 2),
] 

s = Sankey(
    flows=flows,
    nodes=nodes,
    node_opts=dict(label_format='{label} ${value:.2f}'),
)
s.draw()

example 4
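To preview what a label_format template produces, it can be rendered with Python's built-in str.format. This is a sketch that assumes the template is filled in with label and value fields, as its placeholders suggest; how SankeyFlow applies it internally is not shown here.

```python
label_format = '{label} ${value:.2f}'

# (name, value) pairs from the example above.
nodes = [('A', 10), ('B1', 4), ('B2', 5), ('C', 3)]

labels = [label_format.format(label=name, value=value) for name, value in nodes]
print(labels)  # ['A $10.00', 'B1 $4.00', 'B2 $5.00', 'C $3.00']
```

The dollar sign is a literal character in the template; only the parts in braces are substituted.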

Automatic Node Inference

Nodes can be automatically inferred from the flows by setting nodes=None in Sankey.sankey(). They are placed in the order they appear in the flows.

gross = [
    ('Gross margin', 'Operating\nincome', 200),
    ('Gross margin', 'MG&A', 100), 
    ('Gross margin', 'R&D', 100), 
]
income = [
    ('Operating\nincome', 'Income', 200),
    ('Other income', 'Income', 100, {'flow_color_mode': 'source'}),
]

plt.subplot(121)
s1 = Sankey(flows=gross + income)
s1.draw()

plt.subplot(122)
s2 = Sankey(flows=gross[:1] + income + gross[1:])
s2.draw()

plt.tight_layout()

example 5
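The two panels differ only in the order of the flows list. The pure-Python sketch below lists node names in order of first appearance for both orderings. It illustrates the ordering rule stated above and is not the library's inference code; first_appearance_order is a hypothetical helper, and level assignment is not modeled.

```python
def first_appearance_order(flows):
    """Return node names in the order they are first seen in flows."""
    seen = []
    for source, destination, *_ in flows:
        for name in (source, destination):
            if name not in seen:
                seen.append(name)
    return seen


gross = [
    ('Gross margin', 'Operating\nincome', 200),
    ('Gross margin', 'MG&A', 100),
    ('Gross margin', 'R&D', 100),
]
income = [
    ('Operating\nincome', 'Income', 200),
    ('Other income', 'Income', 100, {'flow_color_mode': 'source'}),
]

order_1 = first_appearance_order(gross + income)
order_2 = first_appearance_order(gross[:1] + income + gross[1:])
print(order_1)
print(order_2)
```

In the second ordering, 'Income' and 'Other income' are encountered before 'MG&A' and 'R&D', which is the kind of change that can shift where nodes end up.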

If you want to configure individual nodes while using the automatic inference, you can either access the nodes directly:

s = Sankey(flows=flows)
s.find_node('name')[0].label = 'My label'

or retrieve the inferred nodes and edit the list before passing to Sankey.sankey():

nodes = Sankey.infer_nodes(flows)
# edit nodes
s.sankey(flows, nodes)

The latter is the only way to edit the ordering or level of the inferred nodes.
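The "edit nodes" step can be sketched on a plain nested list. The example assumes that Sankey.infer_nodes() returns nodes in the same nested (name, value) format described under Usage. The move_node helper is hypothetical, and the list below is a hand-written stand-in with made-up values rather than real output from the library.

```python
def move_node(nodes, name, new_level, position=None):
    """Move the node called `name` to `new_level` (0-based), editing `nodes` in place."""
    for level in nodes:
        for entry in level:
            if entry[0] == name:
                level.remove(entry)
                target = nodes[new_level]
                # Append by default, or insert at the requested position.
                target.insert(len(target) if position is None else position, entry)
                return nodes
    raise KeyError(name)


# Hand-written stand-in for an inferred node list.
nodes = [
    [('Gross margin', 400), ('Other income', 100)],
    [('Operating\nincome', 200), ('MG&A', 100), ('R&D', 100)],
    [('Income', 300)],
]

# Place 'Other income' at the top of the second level.
move_node(nodes, 'Other income', new_level=1, position=0)
for level in nodes:
    print([name for name, _ in level])
```

The edited list would then be passed back with s.sankey(flows, nodes) as shown above.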
