A package for generating source-target pairs and a node map from Polars and Pandas DataFrames for Sankey flow diagrams.

IRENE-Sankey

IRENE-Sankey is a Python library that generates the source-target pairs needed to build customizable, informative Sankey diagrams. It is designed to be intuitive for both beginners and experts, with flexible options for styling, data input, and configuration, making it easy to represent complex flows visually.

Documentation

The full documentation is available on GitHub Pages.

Overview

IRENE-Sankey offers an easy-to-use interface for creating Sankey diagrams, which are well suited to visualizing how flows are distributed across categories or entities. The package works with Polars and Pandas DataFrames and provides customization options for colors, labels, and node arrangements. It is aimed at data analysts, data scientists, and anyone who needs to visualize complex flows.

Features

  • Simple Sankey Diagrams: Easily create source-target pairs and customized Sankey diagrams from Polars or Pandas DataFrames.
  • Customizable Styles: Customize colors, labels, node spacing, and flow thickness to suit your needs.
  • Support for Large Diagrams: Efficiently handles larger flows and multiple nodes.
  • Integrates with Polars and Pandas: Easily map columns and rows from Polars or Pandas DataFrames to nodes and links.

Installation

Install IRENE-Sankey using pip:

pip install irene-sankey

Note: Requires Python 3.10 or above.
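
To confirm that an environment meets this requirement, a small check like the one below can help. It is only a sketch: the helper names `python_is_supported` and `installed_version` are hypothetical and not part of IRENE-Sankey, and the distribution name is assumed to match the `pip install` command above.

```python
import sys
from importlib import metadata

MIN_PYTHON = (3, 10)  # minimum version stated in the installation note


def python_is_supported(version_info=sys.version_info):
    """Return True if the given interpreter version is 3.10 or newer."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def installed_version(dist_name="irene-sankey"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


print("Python OK:", python_is_supported())
print("irene-sankey version:", installed_version())
```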

Quick Start

The example below builds a small Polars DataFrame, derives the flow data from it, and plots a simple Sankey diagram with IRENE-Sankey.

import polars as pl

from irene_sankey.core.traverse import traverse_sankey_flow
from irene_sankey.plots.sankey import plot_irene_sankey_diagram

# Sample data to test the functionality
df = pl.DataFrame(
    {
        "country": ["NL","NL","NL","DE","DE","FR","FR","FR","US","US","US"],
        "industry": [
            "Technology","Finance","Healthcare",
            "Automotive","Engineering",
            "Technology","Agriculture","Healthcare",
            "Manufacturing","Technology","Finance"],
        "field": [
            "Software","Banking","Pharmaceuticals",
            "Car Manufacturing","Mechanical Engineering",
            "Software","Crop Science","Medical Devices",
            "Electronics","AI & Robotics","Investment Banking"],
    }
)

# Generate the source-target pairs, node map, and links for the Sankey diagram
flow_df, node_map, link = traverse_sankey_flow(df, ["", "country", "industry", "field"])

# Plot the Sankey diagram
fig = plot_irene_sankey_diagram(
    node_map,
    link,
    title="Irene-Sankey Demo",
    node_config={
        "pad": 10,
        "line": dict(color="black", width=1),
    },
)
fig.show()

Usage

The core function in the IRENE-Sankey package is traverse_sankey_flow. Pass it a Polars or Pandas DataFrame together with the string columns in the order of the flow, and it generates the source-target pairs, node map, and links needed to draw a Sankey diagram.
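
To make the idea of source-target pairs concrete, here is a minimal pure-Python sketch of the underlying concept. It is not the library's implementation: `build_pairs` is a hypothetical helper, and the shapes of its return values are assumptions chosen for illustration. It counts how often each value in one column flows into a value in the next column, then assigns every label an integer index.

```python
from collections import Counter


def build_pairs(rows, columns):
    """Count flows between consecutive columns and index the nodes.

    `rows` is a list of dicts; `columns` lists the keys in flow order.
    Hypothetical helper for illustration, not part of IRENE-Sankey.
    """
    pair_counts = Counter()
    for row in rows:
        # Each adjacent pair of columns contributes one source -> target step.
        for src_col, tgt_col in zip(columns, columns[1:]):
            pair_counts[(row[src_col], row[tgt_col])] += 1

    # Assign each distinct label an integer index in first-seen order.
    node_map = {}
    for source, target in pair_counts:
        node_map.setdefault(source, len(node_map))
        node_map.setdefault(target, len(node_map))

    # Index-encoded links: parallel lists of source, target, and flow size.
    link = {
        "source": [node_map[s] for s, _ in pair_counts],
        "target": [node_map[t] for _, t in pair_counts],
        "value": list(pair_counts.values()),
    }
    return pair_counts, node_map, link


rows = [
    {"country": "NL", "industry": "Technology"},
    {"country": "NL", "industry": "Finance"},
    {"country": "FR", "industry": "Technology"},
    {"country": "FR", "industry": "Technology"},
]
pair_counts, node_map, link = build_pairs(rows, ["country", "industry"])
print(node_map)
print(link)
```

In this sketch, identical labels are merged into one node even when they appear in different columns, so "Technology" is a single node shared by NL and FR. The library itself may treat such cases differently.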

Contribution

We welcome contributions! To contribute, visit our GitHub repository and follow these steps:

  1. Fork the repository.
  2. Create a branch (git checkout -b feature/NewFeature).
  3. Commit your changes (git commit -m 'Add NewFeature').
  4. Push to the branch (git push origin feature/NewFeature).
  5. Open a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

I would like to express my heartfelt gratitude to everyone who contributed their knowledge and support, making this project possible. Special thanks to Mike Bostock and Yan Holtz for their invaluable inspiration and insights, which profoundly influenced the direction and development of this project. Their expertise and knowledge were instrumental in shaping its final form. I am also grateful to the Plotly team for their incredible library, enabling the creation of beautiful, interactive visualizations that bring the data to life.

Special thanks to the contributors and the open-source community for their support and feedback. The Sankey visualization method is inspired by Matplotlib's Sankey capabilities, with enhancements for customization and usability.
