
A package to visualize decision trees as Sankey diagrams using Plotly.


DecisionTree-to-Sankey

DecisionTree-to-Sankey is a Python library that visualizes decision trees (from scikit-learn) as interactive Sankey diagrams using Plotly. Decision trees are known for their interpretability, but large trees can become hard to read when plotted traditionally. This library presents decision trees as Sankey diagrams, where nodes can be dragged to adjust for overlapping labels, and conditions can be inspected interactively by hovering over the branches.

Features

  • Interactive Sankey Diagram: Visualize decision trees with adjustable nodes and hover-over conditions.
  • Improved Readability: Handles overlapping labels by allowing users to drag nodes.
  • Easy Integration: Use this tool with any decision tree created by scikit-learn.
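
To make the tree-to-Sankey idea concrete, here is a minimal sketch of how parent-to-child edges of a tree could be turned into Sankey links. It is not this library's implementation: the function name `tree_to_sankey_links` is hypothetical, and the input arrays are hand-written stand-ins laid out like the arrays scikit-learn exposes on a fitted tree.

```python
# scikit-learn marks "no child" with -1 in its per-node child arrays.
TREE_LEAF = -1


def tree_to_sankey_links(children_left, children_right, n_node_samples):
    """Turn parallel per-node arrays into Sankey link lists (hypothetical helper).

    Each parent -> child edge becomes one link whose width is the number
    of training samples that reached the child.
    """
    source, target, value = [], [], []
    for parent, (left, right) in enumerate(zip(children_left, children_right)):
        for child in (left, right):
            if child != TREE_LEAF:
                source.append(parent)
                target.append(child)
                value.append(int(n_node_samples[child]))
    return {"source": source, "target": target, "value": value}


# Hand-written example arrays (an assumption, not real model output):
# node 0 splits into node 1 (leaf) and node 2; node 2 splits into leaves 3 and 4.
children_left = [1, -1, 3, -1, -1]
children_right = [2, -1, 4, -1, -1]
n_node_samples = [10, 4, 6, 1, 5]

links = tree_to_sankey_links(children_left, children_right, n_node_samples)
print(links)
```

With a fitted scikit-learn model, the same arrays are available as `clf.tree_.children_left`, `clf.tree_.children_right`, and `clf.tree_.n_node_samples`, and a dictionary of this shape can be passed as the `link` argument of `plotly.graph_objects.Sankey`.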

Installation

  • Option 1: Install via pip. The package is published on PyPI:
pip install decisiontree-to-sankey
  • Option 2: Manual installation. Clone the repository and set up the environment with conda:
  1. Clone the repository:
git clone https://github.com/LukeADay/DecisionTree-to-Sankey.git
  2. Create the conda environment:
conda env create -f environment.yml
  3. Activate the environment:
conda activate tree-sankey-visualizer
  4. Install the package from the repository root:
pip install .

Usage

Example Code

After installing the library, you can import and use it as follows:

from decisiontree_to_sankey import DecisionTree_to_Sankey
from sklearn.tree import DecisionTreeClassifier
import pandas as pd

# Sample data
data = pd.DataFrame({
    'Feature1': [1, 2, 3, 4],
    'Feature2': [5, 6, 7, 8]
})
target = [0, 1, 0, 1]

# Train a decision tree
clf = DecisionTreeClassifier()
clf.fit(data, target)

# Create and visualize the Sankey diagram
dt_sankey = DecisionTree_to_Sankey(clf, data)
dt_sankey.create_sankey()  # Displays the interactive Sankey diagram

Warning: check the complexity of the tree before plotting. A tree that is too complex will not work; deciding what counts as too complex requires judgement. To see the complexity of the trained model (here `clf` from the example above):

try:
    # Both calls raise on an unfitted model
    n_leaves = clf.get_n_leaves()
    depth = clf.get_depth()
    print(f"Model is trained with {n_leaves} leaves and depth {depth}.")
except AttributeError:
    print("The model is not trained (empty).")
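
If the reported size looks too large to plot, one option is to constrain the tree when training it. The sketch below uses scikit-learn's `max_depth` and `max_leaf_nodes` parameters on the bundled iris dataset; the specific limits (3 and 8) are arbitrary illustrative values, not recommendations from this library.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Unconstrained tree: grows until every leaf is pure.
full_tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Constrained tree: the limits are illustrative assumptions, tune them per dataset.
small_tree = DecisionTreeClassifier(
    max_depth=3, max_leaf_nodes=8, random_state=0
).fit(X, y)

# Compare the size of both trees before deciding which one to plot.
for name, model in [("full", full_tree), ("small", small_tree)]:
    print(f"{name}: {model.get_n_leaves()} leaves, depth {model.get_depth()}")
```

A constrained tree usually trades some accuracy for a diagram that stays readable, so it is worth comparing both models before choosing one.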

Output Example

The following is an example of the Sankey diagram output. Nodes overlap initially, but the interactive version allows you to drag nodes around for better readability:

Sankey Diagram

See sankey_diagram.html for an interactive version.

Repository Structure

├── LICENSE
├── README.md
├── conda_requirements.txt
├── environment.yml
├── examples
│   ├── __init__.py
│   ├── penguine_dataset_example.ipynb
│   ├── penguine_dataset_example.py
│   ├── sankey_diagram.html
│   └── sankey_diagram.png
├── requirements.txt
├── setup.py
├── src
│   ├── DecisionTree_To_Sankey.py
│   ├── __init__.py
│   └── __pycache__
│       ├── DecisionTree_To_Sankey.cpython-310.pyc
│       └── __init__.cpython-310.pyc
└── tests
    ├── __pycache__
    │   └── test_decisiontree_to_sankey.cpython-310.pyc
    └── test_decisiontree_to_sankey.py
  • The src/DecisionTree_To_Sankey.py module contains the core DecisionTree_to_Sankey class.
  • Examples of how to use the library are available in the examples/ folder.
  • Unit tests are provided in the tests/ folder.

License

This project is licensed under the MIT License - see the LICENSE file for details.
