Influence Estimation for Gradient-Boosted Decision Trees

TreeInfluence: Influence Estimation for Gradient-Boosted Decision Trees

tree-influence is a Python library that implements influence estimation for gradient-boosted decision trees (GBDTs), adapting popular techniques such as TracIn and Influence Functions to GBDTs. The library is compatible with all major GBDT frameworks, including LightGBM, XGBoost, CatBoost, and scikit-learn.

Installation

pip install tree-influence

Usage

A simple example that uses BoostIn to identify the most influential training instances for a given test instance:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from lightgbm import LGBMClassifier
from tree_influence.explainers import BoostIn

# load iris data
data = load_iris()
X, y = data['data'], data['target']

# use two classes, then split into train and test
idxs = np.where(y != 2)[0]
X, y = X[idxs], y[idxs]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=1)

# train GBDT model
model = LGBMClassifier().fit(X_train, y_train)

# fit influence estimator
explainer = BoostIn().fit(model, X_train, y_train)

# estimate training influences on each test instance
influence = explainer.get_local_influence(X_test, y_test)  # shape=(no. train, no. test)

# extract influence values for the first test instance
values = influence[:, 0]  # shape=(no. train,)

# sort training examples from:
# - most positively influential (decreases loss of the test instance the most), to
# - most negatively influential (increases loss of the test instance the most)
training_idxs = np.argsort(values)[::-1]
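The sorting step above can be wrapped in a small helper that returns only the top-k training instances for a chosen test instance. This is a minimal sketch: `top_k_influential` is a hypothetical helper, not part of tree-influence, and the array below is synthetic data standing in for the output of `get_local_influence`, assumed to have shape (no. train, no. test) as in the example.

```python
import numpy as np


def top_k_influential(influence, test_idx, k=5):
    """Hypothetical helper (not part of tree-influence).

    Returns the indices and values of the k most positively influential
    training instances for one test instance, given an influence array
    of shape (no. train, no. test).
    """
    values = influence[:, test_idx]
    order = np.argsort(values)[::-1][:k]  # descending by influence value
    return order, values[order]


# Synthetic stand-in for an influence array: 6 training x 2 test instances.
influence = np.array([
    [0.2, -0.1],
    [-0.5, 0.3],
    [0.9, 0.0],
    [0.0, 0.7],
    [-0.1, -0.4],
    [0.4, 0.1],
])

helpful_idxs, helpful_vals = top_k_influential(influence, test_idx=0, k=3)
print(helpful_idxs)  # [2 5 0]
print(helpful_vals)  # [0.9 0.4 0.2]
```

Reversing the order (dropping `[::-1]`) would instead surface the most negatively influential instances, which can be useful when looking for mislabeled or harmful training data.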

Supported Estimators

tree-influence supports the following influence-estimation techniques for GBDTs:

| Method | Description |
| --- | --- |
| BoostIn | Traces the influence of a training instance throughout the training process (adaptation of TracIn). |
| TREX | Trains a surrogate kernel model that approximates the original model and decomposes any prediction into a weighted sum of the training examples (adaptation of representer-point methods). |
| LeafInfluence | Estimates the impact of a training example on the final GBDT model (adaptation of influence functions). |
| TreeSim | Computes influence via similarity in tree-kernel space. |
| LOO | Leave-one-out retraining: measures the influence of a training instance by removing it and retraining the model without it. |
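To make the leave-one-out idea concrete, the sketch below retrains a model once per training instance and records how the loss on a single test instance changes. It is an illustration under stated assumptions, not the library's LOO implementation: `log_loss_single` and `loo_influence` are hypothetical helpers, scikit-learn's `GradientBoostingClassifier` stands in for any GBDT, and the sign convention (positive means the instance decreases the test loss) is chosen to match the usage example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split


def log_loss_single(model, x, y):
    """Binary log loss of `model` on one labelled instance (hypothetical helper)."""
    p = model.predict_proba(x.reshape(1, -1))[0, 1]
    p = np.clip(p, 1e-12, 1 - 1e-12)  # avoid log(0)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))


def loo_influence(make_model, X_train, y_train, x_test, y_test):
    """Leave-one-out influence of each training instance on one test instance.

    influence[i] = loss(model trained without i) - loss(model trained on all),
    so a positive value means instance i lowers the test loss when present.
    """
    base_loss = log_loss_single(make_model().fit(X_train, y_train), x_test, y_test)
    influence = np.zeros(len(X_train))
    for i in range(len(X_train)):
        keep = np.arange(len(X_train)) != i  # mask that drops instance i
        model = make_model().fit(X_train[keep], y_train[keep])
        influence[i] = log_loss_single(model, x_test, y_test) - base_loss
    return influence


# Same two-class iris setup as the usage example.
data = load_iris()
X, y = data['data'], data['target']
idxs = np.where(y != 2)[0]
X, y = X[idxs], y[idxs]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=1)


def make_model():
    # A small, deterministic model keeps the repeated retraining cheap.
    return GradientBoostingClassifier(n_estimators=10, random_state=1)


loo_values = loo_influence(make_model, X_train, y_train, X_test[0], y_test[0])
print(loo_values.shape)  # one value per training instance
```

Because this requires one full retraining per training instance, it is only practical for small datasets, which is the motivation for the cheaper estimators in the table.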

License

Apache License 2.0.

Reference

Brophy, Hammoudeh, and Lowd. Adapting and Evaluating Influence-Estimation Methods for Gradient-Boosted Decision Trees. arXiv 2022.

@article{brophy2022treeinfluence,
  title={Adapting and Evaluating Influence-Estimation Methods for Gradient-Boosted Decision Trees},
  author={Brophy, Jonathan and Hammoudeh, Zayd and Lowd, Daniel},
  journal={arXiv preprint arXiv:2205.00359},
  year={2022},
}

