Sample-conserving decision tree visualization for scikit-learn with Plotly and matplotlib renderers.
ifadw-treeviz
Sample-conserving decision tree visualization for scikit-learn. Every sample contributes a constant edge-width slice from the root to its leaf. Edge widths sum mathematically to the parent node's sample count at every branch — what you see is what the model splits.
Installation
Install directly from the public Codeberg repository:
pip install git+https://codeberg.org/IfaDW/ifadw-treeviz.git # Plotly only
pip install "ifadw-treeviz[mpl] @ git+https://codeberg.org/IfaDW/ifadw-treeviz.git" # plus matplotlib
PyPI and conda-forge distribution are planned; once available:
pip install ifadw-treeviz # Plotly only
pip install "ifadw-treeviz[mpl]" # plus matplotlib for PDF/PNG/SVG
conda install -c conda-forge ifadw-treeviz # conda-forge channel
Requires Python 3.12+.
Quick Start
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

import ifadw_treeviz as itv

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(
    iris.data, iris.target
)

# Interactive HTML
fig = itv.draw_tree(
    clf,
    feature_names=iris.feature_names,
    class_names=iris.target_names.tolist(),
)
fig.write_html("iris_tree.html")

# Static PDF (requires the [mpl] extra)
fig_mpl = itv.draw_tree_mpl(
    clf,
    feature_names=iris.feature_names,
    class_names=iris.target_names.tolist(),
)
fig_mpl.savefig("iris_tree.pdf", bbox_inches="tight", dpi=150)
Why?
Decision trees from scikit-learn are usually visualized via
sklearn.tree.plot_tree or export_graphviz. Both are functional but
visually flat: each node is a box, each edge is a line, sample
distributions live in text. They show the structure but not the flow.
pybaobabdt (Sengupta, van den Elzen, van Wijk; TU/e Eindhoven)
introduced a different paradigm: render the tree as a flow of sample
bands. Branch widths encode sample counts, class composition is visible
in stripes, the visual hierarchy emerges from the geometry itself.
Reading a pybaobabdt tree feels like reading a Sankey diagram — the
data flow is the picture.
ifadw-treeviz adopts that paradigm with a modern Python toolchain:
Plotly for interactivity, matplotlib for static export, type-checked
strict-mypy code, full pytest coverage, and a layered architecture
(layout, geometry, rendering as separate modules). Each band is
rendered as a polygon sampled along the centerline Bezier curve, with
perpendicular offsets at every sample point, so band widths stay
consistent regardless of edge orientation and taper smoothly between
parent and child sample counts.
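The band construction can be illustrated with a short, standard-library-only sketch. It is not the library's geometry module: the names cubic_bezier and band_polygon, the linear width taper, and the sample count of 16 are assumptions made for this example.

```python
import math

Point = tuple[float, float]


def cubic_bezier(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> tuple[Point, Point]:
    """Return (position, first derivative) of a cubic Bezier curve at t in [0, 1]."""
    u = 1.0 - t
    pos = (
        u**3 * p0[0] + 3 * u**2 * t * p1[0] + 3 * u * t**2 * p2[0] + t**3 * p3[0],
        u**3 * p0[1] + 3 * u**2 * t * p1[1] + 3 * u * t**2 * p2[1] + t**3 * p3[1],
    )
    der = (
        3 * u**2 * (p1[0] - p0[0]) + 6 * u * t * (p2[0] - p1[0]) + 3 * t**2 * (p3[0] - p2[0]),
        3 * u**2 * (p1[1] - p0[1]) + 6 * u * t * (p2[1] - p1[1]) + 3 * t**2 * (p3[1] - p2[1]),
    )
    return pos, der


def band_polygon(
    ctrl: tuple[Point, Point, Point, Point],
    width_start: float,
    width_end: float,
    n: int = 16,
) -> list[Point]:
    """Closed outline of a band around the centerline, sampled at n + 1 points.

    At each sample the centerline point is pushed half the local band width
    along the unit normal in both directions, so the width is measured
    perpendicular to the curve whatever its direction. The width is
    interpolated linearly from width_start to width_end (an assumption).
    """
    left: list[Point] = []
    right: list[Point] = []
    for i in range(n + 1):
        t = i / n
        (x, y), (dx, dy) = cubic_bezier(*ctrl, t)
        norm = math.hypot(dx, dy) or 1.0  # guard against a zero-length tangent
        nx, ny = -dy / norm, dx / norm    # unit normal: tangent rotated by 90 degrees
        half = 0.5 * (width_start + (width_end - width_start) * t)
        left.append((x + nx * half, y + ny * half))
        right.append((x - nx * half, y - ny * half))
    # Walk down one side and back up the other to obtain a polygon outline.
    return left + right[::-1]


# An S-shaped edge from a parent at (0, 0) to a child at (2, -2).
ctrl = ((0.0, 0.0), (0.0, -1.0), (2.0, -1.0), (2.0, -2.0))
poly = band_polygon(ctrl, width_start=0.4, width_end=0.2)
print(len(poly), "polygon vertices")
```

Offsetting along the normal rather than along a fixed axis is what keeps a diagonal band as wide as a vertical one.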
Features
- Multi-band Bezier flows: continuous bands from root to leaf, encoding class distribution and sample counts in geometry
- Two renderers: Plotly for interactive HTML with hover details, matplotlib for static PDF/PNG/SVG export
- Hover information: full feature names, sample counts, and class distributions visible on hover (Plotly)
- Phi-asymmetric layout: subtle deterministic asymmetry breaks the rigidity of strict Reingold-Tilford-Walker layouts
- Customizable branding: six font/label parameters, three built-in palettes (colorblind (default), ifadw, viridis)
- Word-wrap labels: long feature names are wrapped at word boundaries; truncated to 2 lines with ellipsis only when necessary
- Accessibility-first: colorblind-safe Okabe-Ito palette as default; high-contrast labels with semi-transparent backgrounds
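The word-wrap behavior from the feature list can be approximated with the standard library alone. This is a rough sketch, not the library's labeling code: wrap_label and the 20-character width are assumptions made for the example.

```python
import textwrap


def wrap_label(text: str, width: int = 20, max_lines: int = 2) -> str:
    """Wrap at word boundaries; mark the cut with an ellipsis only if text is dropped."""
    lines = textwrap.wrap(text, width=width, max_lines=max_lines, placeholder="…")
    return "\n".join(lines)


# A short name passes through unchanged; a long one is cut after two lines.
print(wrap_label("petal length (cm)"))
print(wrap_label("mean concentration of dissolved oxygen in surface water (mg/L)"))
```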
Performance Notes
For trees with many nodes, rendering performance and HTML size matter:
- Recommended max_depth: 4-6 for HTML output, up to 8-10 for PDF. Beyond that, the tree becomes hard to read regardless of rendering.
- max_depth=None is supported but produces extremely large outputs on real-world data. A depth-30 tree on 100k samples can have 50k+ nodes and several MB of HTML, which is unwieldy for interactive use.
- Many classes (>8): the library auto-switches from the colorblind palette to viridis for better distinguishability. Pass palette="tab20" for an alternative discrete palette with 20 colors, or pass palette=PALETTES["colorblind"] (the Palette instance) to keep the cycled colorblind palette.
- HTML-size warning: when the estimated output exceeds ~2 MB the renderer emits a UserWarning listing the standard mitigations (smaller max_depth, edge_curve='straight', or static PDF/PNG via draw_tree_mpl).
- For static export (PDF/PNG/SVG via draw_tree_mpl), tree size matters less since the renderer handles many nodes efficiently.
Sample Conservation (v0.9.3+)
ifadw-treeviz uses sample-conserving edge geometry: every sample
contributes a constant edge-width slice from the root to its leaf.
Edge widths sum mathematically to the parent node's sample count at
every branch. The root node renders as a horizontal line of fixed
width (default 0.7 layout units; configurable via
wurzel_linien_breite) on which all sample-bands originate at
proportional slots.
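The conservation property can be checked numerically. The sketch below uses plain lists laid out like scikit-learn's tree_.n_node_samples, tree_.children_left and tree_.children_right arrays (where -1 marks a leaf); the functions edge_widths and is_conserving are illustrative names, not part of the ifadw-treeviz API, and the example tree is made up.

```python
def edge_widths(n_node_samples: list[int], root_width: float = 0.7) -> list[float]:
    """Width of each node's incoming band: a constant slice per sample."""
    per_sample = root_width / n_node_samples[0]  # node 0 is the root
    return [n * per_sample for n in n_node_samples]


def is_conserving(
    widths: list[float],
    children_left: list[int],
    children_right: list[int],
    tol: float = 1e-9,
) -> bool:
    """True if the two child widths add up to the parent width at every split."""
    for node, (left, right) in enumerate(zip(children_left, children_right)):
        if left == -1:  # leaf: nothing to check
            continue
        if abs(widths[left] + widths[right] - widths[node]) > tol:
            return False
    return True


# A made-up five-node tree: 150 -> (50, 100), then 100 -> (54, 46).
n_node_samples = [150, 50, 100, 54, 46]
children_left = [1, -1, 3, -1, -1]
children_right = [2, -1, 4, -1, -1]

widths = edge_widths(n_node_samples)
print(is_conserving(widths, children_left, children_right))
```

Because every width is the same per-sample slice multiplied by a sample count, conservation of widths follows directly from conservation of samples at each split.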
For trees with very many samples (>10k) or many leaves, individual
bands may reach sub-pixel widths at the default scale. The library
emits a UserWarning in this case. Workarounds:
- Reduce max_depth to consolidate leaves.
- Use draw_tree_mpl for vector PDF export (sub-pixel bands remain precise in vector format).
- Use draw_tree_zoom_levels(...) to write a series of HTMLs at progressively higher pixel resolutions (see "Zoom and Detail Levels" below).
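Whether a band ends up below one pixel can be estimated before rendering. The following sketch is an assumption-laden illustration, not the library's warning logic: band_width_px is a hypothetical helper, and root_line_px (the on-screen length of the root line in pixels) has to be supplied by the caller.

```python
import warnings


def band_width_px(
    band_samples: int,
    root_samples: int,
    root_line_px: float,
    min_px: float = 1.0,
) -> float:
    """On-screen width of a band that carries band_samples samples."""
    # Each sample occupies the same slice of the root line.
    px = band_samples * root_line_px / root_samples
    if px < min_px:
        warnings.warn(
            f"band of {band_samples} samples is only {px:.3f} px wide; "
            "consider a smaller max_depth or vector export",
            UserWarning,
            stacklevel=2,
        )
    return px


# 50 of 150 samples on a 600 px root line: comfortably visible.
print(band_width_px(50, 150, 600.0))
```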
The edge-width legend rendered top-right (configurable via
legend_position) shows "1 Sample = N px" plus three demonstration
bands so users can visually calibrate band widths against sample
counts. At sub-pixel scaling the legend adapts its demo values (e.g.
200 / 2000 / 10000 instead of 1 / 10 / 50) so each band stays
visually distinct. Disable it with show_edge_width_legend=False.
Zoom and Detail Levels (v0.9.5+)
For interactive zoom in HTML output, use Plotly's built-in tools (no extra setup needed):
- Box zoom: click the "Zoom" button in the modebar and drag a rectangle, or hold Shift and drag.
- Scroll zoom: enable via config={"scrollZoom": True} in fig.write_html(..., config=...).
- Reset view: double-click anywhere in the plot.
For trees with very dense classification (many classes × many leaves)
where the standard resolution renders sub-pixel bands, use the
draw_tree_zoom_levels(...) helper to write multiple HTMLs at
progressively higher pixel resolutions. As of v0.9.6 the helper scales
width / height rather than the layout geometry, so layout proportions
stay invariant and only the rendered SVG resolution grows:
import ifadw_treeviz as itv

itv.draw_tree_zoom_levels(
    clf,
    output_dir="./tree_zooms",
    base_name="iris",
    zoom_factors=(1.0, 3.0, 7.0),
    feature_names=iris.feature_names,
    class_names=iris.target_names.tolist(),
)
# Writes ./tree_zooms/iris_zoom1.0.html, ./tree_zooms/iris_zoom3.0.html, ...
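What the zoom factors amount to can be sketched without the library. The function below only plans file names and pixel sizes; zoom_outputs and the 1200 x 800 base size are assumptions made for the example, while the file-name pattern follows the comment above.

```python
from pathlib import Path


def zoom_outputs(
    output_dir: str,
    base_name: str,
    zoom_factors: tuple[float, ...],
    base_width: int = 1200,   # assumed base figure size in pixels
    base_height: int = 800,
) -> list[tuple[Path, int, int]]:
    """Plan one (path, width, height) entry per zoom factor."""
    plan = []
    for factor in zoom_factors:
        path = Path(output_dir) / f"{base_name}_zoom{factor}.html"
        # Width and height grow by the same factor, so the aspect ratio and
        # therefore the layout proportions stay unchanged.
        plan.append((path, round(base_width * factor), round(base_height * factor)))
    return plan


for path, width, height in zoom_outputs("./tree_zooms", "iris", (1.0, 3.0, 7.0)):
    print(path, width, height)
```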
For static export, prefer draw_tree_mpl and save to PDF — sub-pixel
bands remain precise in vector format regardless of scale.
Note: v0.9.4 shipped an experimental browser-side lazy-render with a vertical depth slider, but the JSON-embedding approach proved too heavy in practice (slow HTML loads, Firefox slider freezes). v0.9.5 rolls the experiment back to static rendering; the helpers above cover the practical use cases without the JS-side complexity.
Examples
The examples/ directory contains nine runnable scripts:
- iris_basic.py: minimal Plotly classifier example
- iris_polish.py: branded example with custom palette and labels
- iris_branding.py: IfaDW corporate-design palette
- iris_long_labels.py: stress test with very long feature names
- iris_matplotlib.py: static PDF export
- diabetes.py: regression tree example
- synthetic_classification.py: 10-class synthetic stress test that demonstrates the palette auto-switch and many-class behavior
- iris_legende_demo.py: v0.9.3 demo of the sample-conserving root line (Wurzel-Linie) and edge-width legend (Edge-Dicke-Legende)
- iris_zoom_levels_demo.py: demo of draw_tree_zoom_levels for multi-resolution HTML export (v0.9.6: scales pixel resolution instead of layout geometry)
Examples Gallery
All screenshots below come from the matplotlib renderer; the Plotly
output is the same geometry plus interactive hover. Re-generate with
python docs/generate_screenshots.py.
Basic classification (Iris)
Polished branding
IfaDW corporate design
Long feature names with truncation
Regression tree (Diabetes)
Acknowledgments
This library is a reimplementation of the visualization paradigm introduced by pybaobabdt (Adrija Sengupta, Stef van den Elzen, Jarke van Wijk; TU/e Eindhoven), available at https://gitlab.tue.nl/20040367/pybaobab. Pybaobabdt is itself a Python library with Python dependencies (sklearn, numpy, pygraphviz, matplotlib, scipy, pandas); we acknowledge their foundational work.
ifadw-treeviz differs from pybaobabdt on technical grounds:
interactive Plotly output (in addition to static matplotlib),
strict-typed Python (mypy --strict), modern packaging
(pyproject.toml, pip install), and a layered architecture
separating layout, geometry, and rendering as distinct modules.
The implementation was developed by Daniel Daferner with significant assistance from Claude (Anthropic) for architecture, algorithm implementation, and test generation.
License
AGPL-3.0-or-later. See LICENSE.
Citation
If you use ifadw-treeviz in your research, please cite:
Daniel Daferner, IfaDW Institut für angewandte Datenwissenschaft GmbH.
ifadw-treeviz: Sample-conserving decision tree visualization.
Version 1.0.0. 2026. https://codeberg.org/IfaDW/ifadw-treeviz
See CITATION.cff for machine-readable citation data.