Exact Integrated Gradients for tree ensembles.

TreeIG

TreeIG computes exact Integrated Gradients for tree ensembles. It decomposes the change in a fitted tree model's scalar output between a baseline input $x_0$ and an observation $x$ into additive feature contributions.

For each observation, TreeIG returns feature attributions $\phi_j$ satisfying

$$\sum_j \phi_j = F(x) - F(x_0),$$

where $F$ is the scalar model output being explained. For regression models, $F$ is the prediction. For supported classifiers, $F$ is the raw margin/logit, not the predicted probability.

Integrated Gradients (Sundararajan, Taly, and Yan, 2017) defines feature attributions by integrating model gradients along a straight-line path from a baseline $x_0$ to the observation $x$.

At first glance, Integrated Gradients appears mismatched with piecewise-constant tree models: gradients vanish almost everywhere and are undefined at split boundaries. The path-integral formulation resolves this. Rather than introducing numerical approximation error through quadrature, the tree structure permits an exact finite decomposition in which the attribution reduces to the sum of prediction jumps at split boundaries crossed along the integration path. The result is exact — no Monte Carlo sampling, no numerical quadrature, no approximation parameters.

Because TreeIG replaces numerical quadrature and sampling with a finite sum over split crossings, it is fast in practice. For many real-world models (hundreds of trees, hundreds of features, thousands of observations), attribution completes in well under a second on a modern laptop once the Numba kernels are compiled. (See the example notebook for timings.) For many typical use cases TreeIG is faster than TreeSHAP, which is itself considered fast.

Installation

Requires Python ≥ 3.9, NumPy, and Numba.

pip install treeig

Using TreeIG

TreeIG follows a familiar explainer pattern:

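import treeig

ig = treeig.TreeIG(model, baseline=x0)  # model: fitted tree model; x0: baseline input
phi = ig.attribute(X)                   # one row of attributions per row of X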

Why TreeIG?

Standard Integrated Gradients defines feature contributions by integrating model gradients along a straight-line path from a baseline input to the observation. Tree models are piecewise constant, so ordinary gradients are zero almost everywhere and undefined at split boundaries.

TreeIG uses the tree structure directly. Along the interpolation path

$$ x(t) = x_0 + t \cdot (x - x_0),\qquad 0 \le t \le 1, $$

a tree prediction changes only when the path crosses a split threshold. TreeIG finds those crossings exactly and assigns each prediction jump to the feature responsible for the crossing. For ensembles, contributions are summed across trees. The result is an exact additive decomposition without numerical quadrature.
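The crossing-and-jump idea can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not TreeIG's implementation: the tuple-based tree format and the helper names (`predict`, `splits`, `path_attributions`) are invented for this example, routing uses `<=` as in scikit-learn, and coincident crossings are simply processed in sorted order.

```python
# Toy tree format (an assumption for this sketch): a leaf is a float, and an
# internal node is (feature, threshold, left, right), routing left when
# x[feature] <= threshold.

def predict(tree, x0, x, flipped):
    """Route through the tree, reading x for split tests already crossed
    along the path and x0 for all other split tests."""
    node = tree
    while isinstance(node, tuple):
        j, thr, left, right = node
        value = x[j] if (j, thr) in flipped else x0[j]
        node = left if value <= thr else right
    return node

def splits(tree):
    """Yield every (feature, threshold) pair used by the tree."""
    if isinstance(tree, tuple):
        j, thr, left, right = tree
        yield (j, thr)
        yield from splits(left)
        yield from splits(right)

def path_attributions(trees, x0, x):
    """Sum prediction jumps at split crossings, per feature, over all trees."""
    phi = [0.0] * len(x)
    for tree in trees:
        # A split is crossed when its test differs between x0 and x; the
        # crossing time t solves x0[j] + t * (x[j] - x0[j]) == threshold.
        events = sorted({
            ((thr - x0[j]) / (x[j] - x0[j]), j, thr)
            for j, thr in splits(tree)
            if (x0[j] <= thr) != (x[j] <= thr)
        })
        flipped = set()
        before = predict(tree, x0, x, flipped)
        for _, j, thr in events:
            flipped.add((j, thr))
            after = predict(tree, x0, x, flipped)
            phi[j] += after - before  # the jump belongs to the crossing feature
            before = after
    return phi

trees = [(0, 0.5, 1.0, (1, 0.5, 2.0, 4.0)), (1, 1.5, 0.0, 10.0)]
x0, x = [0.0, 0.0], [1.0, 2.0]
phi = path_attributions(trees, x0, x)
print(phi)  # [3.0, 10.0]
```

In this sketch the per-tree jumps telescope, so the attributions sum to the ensemble output at `x` minus the output at `x0` (here 14.0 - 1.0 = 13.0).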

The distributional-derivative perspective makes this precise. Along the interpolation path the prediction is piecewise constant, and its generalized derivative is a sum of localized impulses at split crossings. The path integral of each impulse is exactly the prediction jump at that crossing.
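Written out, and assuming for simplicity that the crossing times $t_k^\ast$ are distinct and lie strictly inside $(0, 1)$, the argument reads as follows. Here $\Delta_k$ denotes the prediction jump at the $k$-th crossing, $j_k$ the feature whose split is crossed there, and $H$ the Heaviside step function; these symbols are introduced only for this derivation.

$$ F(x(t)) = F(x_0) + \sum_k \Delta_k \, H(t - t_k^\ast), \qquad \frac{d}{dt} F(x(t)) = \sum_k \Delta_k \, \delta(t - t_k^\ast), $$

$$ \int_0^1 \frac{d}{dt} F(x(t)) \, dt = \sum_k \Delta_k = F(x) - F(x_0), \qquad \phi_j = \sum_{k \,:\, j_k = j} \Delta_k . $$

Summing $\phi_j$ over all features recovers every jump exactly once, which is the completeness identity $\sum_j \phi_j = F(x) - F(x_0)$.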

The top panel shows a step in the tree prediction along the interpolation path. The middle panel shows the corresponding distributional derivative: zero everywhere except at the split crossing. (Here, $\delta(t - t^\ast)$ is the Dirac delta distribution centered at $t^\ast$.) The bottom panel shows that the path integral localizes exactly at the crossing and recovers the prediction jump.

Standard numerical Integrated Gradients methods try to approximate these impulses using dense interpolation grids. TreeIG instead computes the split-crossing contributions analytically from the fitted tree structure.
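To see why grid-based approximation struggles on step functions, consider a small experiment. The code below is a generic midpoint-rule approximation with central finite-difference gradients, written only for this illustration; `step_model` and `riemann_ig` are hypothetical names and do not correspond to any library's API.

```python
def step_model(x):
    # A one-split "tree": the output jumps from 1.0 to 4.0 at x[0] = 0.5.
    return 1.0 if x[0] <= 0.5 else 4.0

def riemann_ig(f, x0, x, steps, eps=1e-6):
    """Midpoint-rule Integrated Gradients with finite-difference gradients."""
    phi = [0.0] * len(x)
    for k in range(steps):
        t = (k + 0.5) / steps
        z = [a + t * (b - a) for a, b in zip(x0, x)]
        for j in range(len(x)):
            hi, lo = z[:], z[:]
            hi[j] += eps
            lo[j] -= eps
            grad = (f(hi) - f(lo)) / (2 * eps)
            phi[j] += grad * (x[j] - x0[j]) / steps
    return phi

x0, x = [0.0], [1.0]
exact_jump = step_model(x) - step_model(x0)   # 3.0
missed = riemann_ig(step_model, x0, x, steps=10)  # no grid point sees the jump
hit = riemann_ig(step_model, x0, x, steps=11)     # one grid point lands on it
print(exact_jump, missed, hit)
```

With 10 steps every sampled gradient is zero and the estimate is 0.0; with 11 steps one midpoint lands on the threshold and the estimate is off by several orders of magnitude. Neither matches the true jump of 3.0, which the crossing-based sum recovers directly.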

Relation to SHAP and TreeSHAP

TreeIG and TreeSHAP answer different attribution questions and generally produce different decompositions. Neither dominates the other.

TreeIG answers: "How much does feature $j$ contribute to the change in prediction as we move continuously from baseline $x_0$ to observation $x$?"

  • Attribution is the integral of partial derivatives along the path from $x_0$ to $x$. (For piecewise-constant trees this integral reduces exactly to a sum of prediction jumps at split boundaries crossed along the path.)

TreeSHAP answers: "How much does feature $j$ shift the expected prediction, averaged over all possible subsets of the other features?"

  • Attribution is an average of discrete inclusion effects, where absent features are marginalized out over a background dataset. There is no path; the reference point is the expected prediction over the background distribution.

The methods differ in two fundamental ways.

First, TreeIG takes a specific baseline input $x_0$ as its reference, while TreeSHAP uses a background distribution.

Second, TreeIG measures contributions through calculus -- integrating how the prediction changes as features move continuously from their baseline values -- while TreeSHAP measures contributions through discrete feature inclusion, asking how much each feature changes the expected prediction when it enters a coalition.

SHAP's coalition construction is deliberately indifferent to the prediction surface between the background and the observation. A feature is either in the coalition or out — there is no interpolation, no path, no attention to what happens as the feature value moves from its background value to its observed value. The attribution is built entirely from discrete switches. This means SHAP explores a wide neighborhood of hybrid inputs, many of which may be far from any natural path between real observations, and measures how the model responds to that exploration.

IG by contrast follows a single specific path and pays close attention to everything that happens along it. The attribution accumulates exactly the prediction changes that occur as all features move continuously from their baseline values to their observed values, holding the model fixed throughout. No hybrid inputs are constructed: the model is only ever evaluated at convex combinations of the baseline and the observation.

The practical implication: SHAP's breadth gives it sensitivity to how the model behaves across a wide range of feature combinations, including combinations that sit away from the natural data distribution. IG's specificity gives it a precise account of what the model does on a particular trajectory through input space. SHAP explores a neighborhood; IG traces a path.

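For a linear model with $x_0$ equal to the background mean, Integrated Gradients and SHAP (with absent features marginalized independently over the background) produce identical attributions: both assign $w_j (x_j - x_{0,j})$ to feature $j$, where $w_j$ is the coefficient of feature $j$. As the model becomes more nonlinear or the baseline $x_0$ diverges from the background distribution, TreeIG and TreeSHAP increasingly disagree, reflecting genuine differences in the questions they answer rather than errors in either method.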
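The linear-model agreement can be checked by brute force on a tiny example. The sketch below assumes the marginal (interventional) value function, in which absent features are drawn independently from a background set; the weights, data, and helper names are made up for illustration, and Shapley values are computed by enumerating all feature orderings rather than by TreeSHAP.

```python
from itertools import permutations
from math import isclose

w, b = [2.0, -1.0, 0.5], 0.25          # hypothetical linear model

def f(z):
    return b + sum(wj * zj for wj, zj in zip(w, z))

background = [[0.0, 1.0, 2.0], [2.0, 3.0, 0.0], [4.0, -1.0, 4.0]]
x = [3.0, 0.0, 1.0]
n = len(x)

# Baseline = background mean, as in the statement above.
x0 = [sum(row[j] for row in background) / len(background) for j in range(n)]

def value(S):
    """Expected output with features in S fixed to x, others from background."""
    outputs = [f([x[j] if j in S else row[j] for j in range(n)])
               for row in background]
    return sum(outputs) / len(outputs)

def shapley():
    """Exact Shapley values by averaging marginal effects over all orderings."""
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        S = set()
        for j in order:
            before = value(S)
            S.add(j)
            phi[j] += (value(S) - before) / len(orders)
    return phi

# For a linear model the gradient is constant, so IG has a closed form.
ig = [w[j] * (x[j] - x0[j]) for j in range(n)]
shap_values = shapley()
print(ig, shap_values)
```

The two lists agree up to floating-point rounding. Replacing `f` with a model containing interaction terms, or moving `x0` away from the background mean, breaks the agreement.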

Supported models

TreeIG currently supports tree models with finite numeric feature inputs.

Regression

  • sklearn.tree.DecisionTreeRegressor
  • sklearn.ensemble.RandomForestRegressor
  • sklearn.ensemble.ExtraTreesRegressor
  • sklearn.ensemble.GradientBoostingRegressor
  • xgboost.XGBRegressor
  • xgboost.Booster
  • lightgbm.LGBMRegressor
  • lightgbm.Booster

Classification (raw margins/logits only)

  • sklearn.ensemble.GradientBoostingClassifier
  • xgboost.XGBClassifier
  • lightgbm.LGBMClassifier

For classification models, TreeIG attributes raw margins or logits. It does not attribute predicted probabilities because these are not additive across trees.
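A short numerical check makes the non-additivity concrete. The per-tree margins below are made-up numbers, and the logistic link is assumed as the margin-to-probability transform for a binary classifier.

```python
import math

def sigmoid(m):
    return 1.0 / (1.0 + math.exp(-m))

tree_margins = [1.5, 0.5, -0.25]       # hypothetical per-tree raw scores

# Margins add across trees, so the ensemble margin is a plain sum.
total_margin = sum(tree_margins)       # 1.75

# The probability is a nonlinear function of that sum ...
prob_of_sum = sigmoid(total_margin)

# ... so adding per-tree probability shifts (relative to 0.5) does not
# reproduce it.
sum_of_shifts = 0.5 + sum(sigmoid(m) - 0.5 for m in tree_margins)

print(total_margin, prob_of_sum, sum_of_shifts)
```

The two probability figures differ by roughly 0.03 in this example, whereas the margin decomposition is exact.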

TreeIG computes exact path decompositions directly from the fitted tree structure. Since tree representations differ substantially across implementations, each model family requires customized parsing and routing logic.

Not currently supported

TreeIG does not currently support:

  • CatBoost;
  • categorical splits;
  • missing-value routing (use feature augmentation for missingness);
  • probability-output attribution (because probability attribution is not additive);
  • probability-averaging or vote-share classifiers such as DecisionTreeClassifier, RandomForestClassifier, and ExtraTreesClassifier (because they produce probabilities, not scores).

Basic usage

import numpy as np
import treeig as tig

# model is a fitted supported tree model
x0 = X_train.mean(axis=0)
X_eval = X_test[:100]

ig = tig.TreeIG(model, baseline=x0)
phi = ig.attribute(X_eval)

phi has the same shape as X_eval. Row i, column j is the contribution of feature j to the model-output change from x0 to X_eval[i].

For regression models, the completeness property holds exactly:

np.testing.assert_allclose(
    phi.sum(axis=1),
    model.predict(X_eval) - model.predict(x0.reshape(1, -1))[0],
)

Diagnostics

Use explain when you want attributions together with completeness diagnostics.

ig = tig.TreeIG(model, baseline=x0)
phi, infos, summary = ig.explain(X_eval)

print(summary)

Each entry in infos contains diagnostics for one observation:

{
    "n_events":        ...,   # number of split-crossing events
    "endpoint_delta":  ...,   # F(x) - F(x0)
    "attribution_sum": ...,   # sum_j phi_j
    "residual":        ...,   # attribution_sum - endpoint_delta
    "abs_residual":    ...,
}

The summary dictionary reports aggregate residual and event-count statistics.
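The relationship between the per-observation fields can be sketched as follows. The helper `completeness_info` is hypothetical and only mirrors the field names listed above; it is not TreeIG's code, and the input numbers are invented.

```python
def completeness_info(phi_row, f_x, f_x0, n_events):
    """Build one diagnostics entry from an attribution row and two outputs."""
    endpoint_delta = f_x - f_x0
    attribution_sum = sum(phi_row)
    residual = attribution_sum - endpoint_delta
    return {
        "n_events": n_events,
        "endpoint_delta": endpoint_delta,
        "attribution_sum": attribution_sum,
        "residual": residual,
        "abs_residual": abs(residual),
    }

infos = [
    completeness_info([3.0, 10.0], f_x=14.0, f_x0=1.0, n_events=3),
    completeness_info([0.5, -0.25], f_x=1.25, f_x0=1.0, n_events=2),
]

# One plausible aggregate: the worst completeness violation in the batch.
max_abs_residual = max(info["abs_residual"] for info in infos)
print(max_abs_residual)
```

A residual near zero for every observation indicates that the attributions account for the full change in model output.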

Classification targets

For binary additive-score classifiers, target=None and target=1 both attribute the positive-class margin. target=0 attributes the negative-class margin, computed as the negation of the positive-class margin.

ig = tig.TreeIG(model, baseline=x0, target=1)
phi_pos = ig.attribute(X_eval)

ig = tig.TreeIG(model, baseline=x0, target=0)
phi_neg = ig.attribute(X_eval)

For multiclass classifiers, pass the class index explicitly.

ig = tig.TreeIG(model, baseline=x0, target=2)
phi_class_2 = ig.attribute(X_eval)

TreeIG attributes raw class margins. If probability-space explanations are needed, users should transform or interpret the margin-level contributions separately.

Functional interface

TreeIG also provides a direct functional interface.

phi, infos, summary = tig.compute(
    model,
    baseline=x0,
    X=X_eval,
)

Warmup

TreeIG uses Numba for fast parallel attribution kernels. The first call includes JIT compilation. You can compile in advance with warmup:

ig = tig.TreeIG(model, baseline=x0).warmup(X_eval[:3])
phi = ig.attribute(X_eval)

Subsequent calls on the same model are fast. Attribution for thousands of observations on a typical ensemble completes in well under a second after warmup.

Numerical conventions

TreeIG follows each backend's split-routing convention as closely as possible.

  • scikit-learn trees route left when x[j] <= threshold;
  • LightGBM numeric splits route left when x[j] <= threshold;
  • XGBoost numeric splits route left when x[j] < threshold using float32-style comparisons.
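The float32 detail matters because rounding can move a value onto a threshold. The sketch below uses only the standard library to emulate single-precision rounding; `to_float32` is a helper written for this example, and the threshold value is made up.

```python
import struct

def to_float32(v):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", v))[0]

threshold = to_float32(0.1)   # a threshold stored in single precision
x_value = 0.1                 # the same decimal value held in double precision

# Comparing in double precision: 0.1 is slightly below the float32 threshold.
goes_left_float64 = x_value < threshold

# Comparing after rounding the input to float32: the values are now equal,
# so a strict "<" test sends the point right instead.
goes_left_float32 = to_float32(x_value) < threshold

print(goes_left_float64, goes_left_float32)  # True False
```

An attribution method that locates split crossings has to use the same comparison as the backend, or points near a threshold can be routed to a different leaf than the one the model itself uses.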

Inputs must be finite numeric arrays. Missing-value routing is not currently implemented, so NaN and Inf values raise errors.

Baselines

The baseline $x_0$ defines the reference point for the decomposition. Common choices include the training-sample mean, a median or representative observation, a domain-specific neutral input, or a fixed benchmark case.

The attribution always explains the difference between the model output at the observation and the model output at the chosen baseline. Different baselines answer different questions.
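A few of these baseline choices can be computed as follows. This is a minimal standard-library sketch on made-up data; in particular, picking the "representative observation" as the training row closest to the mean in unscaled squared distance is one assumption among many possible definitions.

```python
from statistics import mean, median

X_train = [[1.0, 10.0], [2.0, 30.0], [9.0, 20.0]]   # toy training rows
columns = list(zip(*X_train))

# Feature-wise mean and median baselines.
mean_baseline = [mean(col) for col in columns]       # [4.0, 20.0]
median_baseline = [median(col) for col in columns]   # [2.0, 20.0]

def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

# A real training row used as baseline: the one nearest the mean.
representative = min(X_train, key=lambda row: squared_distance(row, mean_baseline))

print(mean_baseline, median_baseline, representative)
```

Unlike the mean or median vector, the representative row is an input the model could actually have seen, which can make the resulting explanation easier to communicate.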

Interpretation

For an observation $x$, TreeIG reports how much each feature contributes to moving the model output from $F(x_0)$ to $F(x)$ along the straight-line path from $x_0$ to $x$. Positive contributions increase the scalar output relative to the baseline; negative contributions decrease it. The contributions are additive by construction.

Example: XGBoost regression

import numpy as np
import xgboost as xgb
import treeig as tig

model = xgb.XGBRegressor(
    n_estimators=100,
    max_depth=3,
    learning_rate=0.05,
    objective="reg:squarederror",
    random_state=0,
)
model.fit(X_train, y_train)

x0 = X_train.mean(axis=0)
X_eval = X_test[:100]

ig = tig.TreeIG(model, baseline=x0).warmup(X_eval[:3])
phi, infos, summary = ig.explain(X_eval)

print(phi.shape)
print(summary["max_abs_residual"])

Example: multiclass classification margins

import lightgbm as lgb
import treeig as tig

model = lgb.LGBMClassifier(...)
model.fit(X_train, y_train)

x0 = X_train.mean(axis=0)
X_eval = X_test[:100]

# Attribute class-2 raw margin
ig = tig.TreeIG(model, baseline=x0, target=2)
phi = ig.attribute(X_eval)

Project status

TreeIG is production-ready for exact attribution of fitted tree models in raw-output space. The current release covers the dominant tree ensemble backends in the Python ecosystem.

Future extensions may include:

  • CatBoost support, which requires customized analysis of oblivious trees and categorical split structure;
  • alternative allocation rules for simultaneous multi-feature effects at coincident split crossings.

References

TreeIG:

Integrated Gradients:

  • Sundararajan, Mukund, Ankur Taly, and Qiqi Yan. 2017. "Axiomatic Attribution for Deep Networks." International Conference on Machine Learning (ICML).

SHAP and TreeSHAP:

  • Lundberg, Scott M., and Su-In Lee. 2017. "A Unified Approach to Interpreting Model Predictions." Advances in Neural Information Processing Systems (NeurIPS).

  • Lundberg, Scott M., Gabriel Erion, and Su-In Lee. 2020. "From Local Explanations to Global Understanding with Explainable AI for Trees." Nature Machine Intelligence.

Popular implementations of Integrated Gradients for smooth models:
