TreeInfluence: Influence Estimation for Gradient-Boosted Decision Trees

tree-influence is a Python library that implements influence estimation for gradient-boosted decision trees (GBDTs), adapting popular techniques such as TracIn and influence functions to GBDTs. It is compatible with all major GBDT frameworks, including LightGBM, XGBoost, CatBoost, and scikit-learn.

Installation

pip install tree-influence

Usage

A simple example using BoostIn to identify the training instances that are most influential for a given test instance:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from lightgbm import LGBMClassifier
from tree_influence.explainers import BoostIn

# load iris data
data = load_iris()
X, y = data['data'], data['target']

# use two classes, then split into train and test
idxs = np.where(y != 2)[0]
X, y = X[idxs], y[idxs]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=1)

# train GBDT model
model = LGBMClassifier().fit(X_train, y_train)

# fit influence estimator
explainer = BoostIn().fit(model, X_train, y_train)

# estimate training influences on each test instance
influence = explainer.get_local_influence(X_test, y_test)  # shape=(no. train, no. test)

# extract influence values for the first test instance
values = influence[:, 0]  # shape=(no. train,)

# sort training examples from:
# - most positively influential (decreases loss of the test instance the most), to
# - most negatively influential (increases loss of the test instance the most)
training_idxs = np.argsort(values)[::-1]
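The usage example ends with a full ranking. To look only at the extremes, a small helper such as the hypothetical `top_k_influencers` below (it is not part of tree-influence) returns the k most positively and k most negatively influential training indices for one test column of the `(no. train, no. test)` matrix produced by `get_local_influence`. A synthetic matrix stands in for real influence values so the snippet runs on its own.

```python
import numpy as np


def top_k_influencers(influence, test_idx, k=5):
    """Return (positive, negative) training indices for one test instance.

    Hypothetical helper, not part of tree-influence. `influence` is assumed
    to have shape (no. train, no. test), as returned by get_local_influence.
    """
    values = influence[:, test_idx]     # shape=(no. train,)
    order = np.argsort(values)[::-1]    # most positive first
    positive = order[:k]                # decrease the test loss the most
    negative = order[::-1][:k]          # increase the test loss the most
    return positive, negative


# synthetic influence matrix: 6 training instances, 2 test instances
influence = np.array([
    [0.5, -0.1],
    [-0.8, 0.2],
    [0.1, 0.0],
    [0.9, 0.3],
    [-0.2, -0.4],
    [0.0, 0.1],
])

pos, neg = top_k_influencers(influence, test_idx=0, k=2)
print(pos, neg)  # [3 0] [1 4]
```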

Supported Estimators

tree-influence supports the following influence-estimation techniques in GBDTs:

Method | Description
BoostIn | Traces the influence of a training instance throughout the training process (adaptation of TracIn).
TREX | Trains a surrogate kernel model that approximates the original model and decomposes any prediction into a weighted sum of the training examples (adaptation of representer-point methods).
LeafInfluence | Estimates the impact of a training example on the final GBDT model (adaptation of influence functions).
TreeSim | Computes influence via similarity in tree-kernel space.
LOO | Leave-one-out: measures the influence of a training instance by retraining the model without that instance.
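The LOO row is the most literal definition of influence. As a hedged illustration only, the sketch below computes it with a stand-in model, ordinary least squares via NumPy, rather than a GBDT, so it runs quickly and the result is easy to reason about; tree-influence's LOO estimator retrains the actual GBDT. The names `fit_least_squares`, `predict`, and `loo_influence`, and the squared-error loss, are assumptions for this sketch. The sign follows the convention in the usage example: a positive value means the training instance decreases the test loss.

```python
import numpy as np


def fit_least_squares(X, y):
    """Stand-in model (not a GBDT): least-squares weights with a bias column."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w


def predict(w, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w


def loo_influence(X_train, y_train, x_test, y_test):
    """Leave-one-out influence of each training instance on one test instance.

    influence[i] = loss(test | trained without i) - loss(test | trained on all),
    so a positive value means instance i decreases the test loss.
    """
    def test_loss(w):
        pred = predict(w, x_test.reshape(1, -1))[0]
        return (pred - y_test) ** 2  # squared error (assumed loss)

    base_loss = test_loss(fit_least_squares(X_train, y_train))
    n = X_train.shape[0]
    influence = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # drop instance i, then retrain
        w_i = fit_least_squares(X_train[keep], y_train[keep])
        influence[i] = test_loss(w_i) - base_loss
    return influence


# usage: noiseless linear targets plus one corrupted label
rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 2))
y_train = X_train @ np.array([1.0, -2.0])  # exact linear targets
y_train[5] += 10.0                         # corrupt one label
x_test = np.array([0.5, 0.5])
y_test = 0.5 * 1.0 + 0.5 * -2.0            # true target: -0.5

influence = loo_influence(X_train, y_train, x_test, y_test)
print(np.argmin(influence))  # 5
```

Because the targets are exactly linear apart from the corrupted label, dropping instance 5 lets the fit reproduce the test target, so that instance receives the most negative influence. LOO needs one retraining per training instance, which is why it becomes expensive on large training sets.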

License

Apache License 2.0.

Reference

Brophy, Hammoudeh, and Lowd. Adapting and Evaluating Influence-Estimation Methods for Gradient-Boosted Decision Trees. Journal of Machine Learning Research (JMLR), 2023.

@article{brophy2023treeinfluence,
  author  = {Jonathan Brophy and Zayd Hammoudeh and Daniel Lowd},
  title   = {Adapting and Evaluating Influence-Estimation Methods for Gradient-Boosted Decision Trees},
  journal = {Journal of Machine Learning Research},
  year    = {2023},
  volume  = {24},
  number  = {154},
  pages   = {1--48},
  url     = {http://jmlr.org/papers/v24/22-0449.html},
}

Download files

Download the file for your platform.

Source Distribution

tree_influence-0.1.7.tar.gz (386.9 kB)

Uploaded Source

Built Distributions

tree_influence-0.1.7-cp310-cp310-win_amd64.whl (613.2 kB)

Uploaded CPython 3.10 Windows x86-64

tree_influence-0.1.7-cp310-cp310-win32.whl (585.2 kB)

Uploaded CPython 3.10 Windows x86

tree_influence-0.1.7-cp310-cp310-musllinux_1_1_x86_64.whl (1.6 MB)

Uploaded CPython 3.10 musllinux: musl 1.1+ x86-64

tree_influence-0.1.7-cp310-cp310-musllinux_1_1_i686.whl (1.5 MB)

Uploaded CPython 3.10 musllinux: musl 1.1+ i686

tree_influence-0.1.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.5 MB)

Uploaded CPython 3.10 manylinux: glibc 2.17+ x86-64

tree_influence-0.1.7-cp310-cp310-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl (1.5 MB)

Uploaded CPython 3.10 manylinux: glibc 2.17+ i686 manylinux: glibc 2.5+ i686

tree_influence-0.1.7-cp310-cp310-macosx_10_9_x86_64.whl (644.2 kB)

Uploaded CPython 3.10 macOS 10.9+ x86-64

tree_influence-0.1.7-cp39-cp39-win_amd64.whl (614.4 kB)

Uploaded CPython 3.9 Windows x86-64

tree_influence-0.1.7-cp39-cp39-win32.whl (586.3 kB)

Uploaded CPython 3.9 Windows x86

tree_influence-0.1.7-cp39-cp39-musllinux_1_1_x86_64.whl (1.6 MB)

Uploaded CPython 3.9 musllinux: musl 1.1+ x86-64

tree_influence-0.1.7-cp39-cp39-musllinux_1_1_i686.whl (1.5 MB)

Uploaded CPython 3.9 musllinux: musl 1.1+ i686

tree_influence-0.1.7-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.6 MB)

Uploaded CPython 3.9 manylinux: glibc 2.17+ x86-64

tree_influence-0.1.7-cp39-cp39-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl (1.5 MB)

Uploaded CPython 3.9 manylinux: glibc 2.17+ i686 manylinux: glibc 2.5+ i686

tree_influence-0.1.7-cp39-cp39-macosx_10_9_x86_64.whl (645.4 kB)

Uploaded CPython 3.9 macOS 10.9+ x86-64

File details

Details for the file tree_influence-0.1.7.tar.gz.

File metadata

  • Download URL: tree_influence-0.1.7.tar.gz
  • Size: 386.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.9.19

File hashes

Hashes for tree_influence-0.1.7.tar.gz
Algorithm Hash digest
SHA256 c2f7c77d517dc39e6eea684f8c4e2aca22f2b1b4614ccbdde410b6a655791818
MD5 2561dc751446c5d3b61a057d6f9d7f4b
BLAKE2b-256 4c9727e941d8039c6ad2ec3c872283ff07cbcf59576c2968c8b197c0c6ea29e3
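To check a downloaded file against the SHA256 digest listed above, Python's standard `hashlib` module is enough. The sketch below streams the file in chunks so large archives are not read into memory at once; the function names `sha256_of` and `verify` and the local download path are assumptions. pip can also enforce digests itself in hash-checking mode (`--require-hashes`), with a requirements line such as `tree-influence==0.1.7 --hash=sha256:<digest>`.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for tree_influence-0.1.7.tar.gz (from the table above)
EXPECTED_SHA256 = "c2f7c77d517dc39e6eea684f8c4e2aca22f2b1b4614ccbdde410b6a655791818"


def sha256_of(path, chunk_size=1 << 16):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file at `path` matches the expected digest."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("tree_influence-0.1.7.tar.gz")  # assumed download location
    if archive.exists():
        print("OK" if verify(archive) else "MISMATCH")
```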

File details

Details for the file tree_influence-0.1.7-cp310-cp310-win_amd64.whl.

File hashes

Hashes for tree_influence-0.1.7-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 60655722337795128daca362ba1f5858ea4b1a8181efa2b7350a8930c94dd61b
MD5 c2899f151cabd72ffa1b41927be4a99c
BLAKE2b-256 e3fbc0050d210403d233740ff8e3e6bc0a9eff13f0d81e728a522d2d231bcccd

File details

Details for the file tree_influence-0.1.7-cp310-cp310-win32.whl.

File hashes

Hashes for tree_influence-0.1.7-cp310-cp310-win32.whl
Algorithm Hash digest
SHA256 bae20c0fff2d634327cbba72978ed888533be8f45a90ee6d62399b4eca75d413
MD5 6fa4dd06a5e53d38a69e6e448736c507
BLAKE2b-256 3a3ce3349cd8c27d45d8a99e39983dd69c596dd1a84f86cf0f60410c7d3effbb

File details

Details for the file tree_influence-0.1.7-cp310-cp310-musllinux_1_1_x86_64.whl.

File hashes

Hashes for tree_influence-0.1.7-cp310-cp310-musllinux_1_1_x86_64.whl
Algorithm Hash digest
SHA256 6a25ffbad6ba6cbcaad400221302fcf846b3fa56527dac563e138f150aeddec5
MD5 6b3b85bdd1a150bb46c7fbe5d5eb4803
BLAKE2b-256 c2fab56ccc3bef2ec88a751093495a25af26b6617a3ab4e41469735ccfc2d3fd

File details

Details for the file tree_influence-0.1.7-cp310-cp310-musllinux_1_1_i686.whl.

File hashes

Hashes for tree_influence-0.1.7-cp310-cp310-musllinux_1_1_i686.whl
Algorithm Hash digest
SHA256 02665b00258578828cc43d1aca64974892e7c7df8f65a7ac536a4cfa0d8243e0
MD5 c3eb9f17b75eb1842185539a23fe0ba9
BLAKE2b-256 d9808cb375c8fa6abbafec304113adc608a5924e31878a445d6f3595c7c821e4

File details

Details for the file tree_influence-0.1.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for tree_influence-0.1.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 388d9549897230671d4baa94e95306ae20342a2f59f4f538d48ad7afc1a9bfed
MD5 e097dd9412b4f36c412cd5af8344622e
BLAKE2b-256 308a063bd56ed621c3eeab78bf75cc4f9a599b87ebafae32923b9cb0967abd30

File details

Details for the file tree_influence-0.1.7-cp310-cp310-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl.

File hashes

Hashes for tree_influence-0.1.7-cp310-cp310-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl
Algorithm Hash digest
SHA256 dc57c922221f4f4d2c3241dd9f33626a95d97a9fd090b63cae1373cb7f73eae6
MD5 5051938b285388645fe165e9a5d58aae
BLAKE2b-256 f5d5f32d663e42d98fe3ee5e93f134cb8f20783f63821751abc492aa28421f06

File details

Details for the file tree_influence-0.1.7-cp310-cp310-macosx_10_9_x86_64.whl.

File hashes

Hashes for tree_influence-0.1.7-cp310-cp310-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 2c39d1e3dafb24fdccb38a396b1d33016af9aa324c5b9fb409aa040b13a53901
MD5 bf2e323f36e467b70e143cf6686104cc
BLAKE2b-256 8f6a486968a129e2df13471e9ba796d07670dfa0a595e96e2daa63027158e8f2

File details

Details for the file tree_influence-0.1.7-cp39-cp39-win_amd64.whl.

File hashes

Hashes for tree_influence-0.1.7-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 b49cdc93dd5fb915c8a4c231ab28a38a1df2a9a06053ab8dc95bf845eb361360
MD5 a4224716fdcf0ac7d9fd16628cd863e0
BLAKE2b-256 707747205f2c6a0e2efeb0e5ab9abfb5e2ad7e47e6ca3f6de1a495a315b60463

File details

Details for the file tree_influence-0.1.7-cp39-cp39-win32.whl.

File hashes

Hashes for tree_influence-0.1.7-cp39-cp39-win32.whl
Algorithm Hash digest
SHA256 332667dafc19e8f177edcec009742219bc9e4ecea24d54dacbb1b880050a2a6b
MD5 ee1d17d05e5a406692599cbc49aeba72
BLAKE2b-256 58bfe67ed01faca014b04e8ad1de535b1ad3054666a2eb4ae9bcaed26591ca2a

File details

Details for the file tree_influence-0.1.7-cp39-cp39-musllinux_1_1_x86_64.whl.

File hashes

Hashes for tree_influence-0.1.7-cp39-cp39-musllinux_1_1_x86_64.whl
Algorithm Hash digest
SHA256 b09dd135f8a9a1cb33e08dbbcdc64e1027c65f57aabca3f1e2df051585ddb116
MD5 9e5dfc008fc59fa24cc612f88f2eb73a
BLAKE2b-256 d25847e5180b272fc765fed762875b01d635944bd4c897c342a86037d5b80f1e

File details

Details for the file tree_influence-0.1.7-cp39-cp39-musllinux_1_1_i686.whl.

File hashes

Hashes for tree_influence-0.1.7-cp39-cp39-musllinux_1_1_i686.whl
Algorithm Hash digest
SHA256 d5261f477e86d8ac52415d26ac8f9ad44da3c01de6b57135049a211477b4938a
MD5 d03831047530dac924172395f54ff6cf
BLAKE2b-256 9ccd939b8ffa0f417eb0eb69bab7608490412a733232c8db20aa3fcdb8c61d55

File details

Details for the file tree_influence-0.1.7-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for tree_influence-0.1.7-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 d80eedf3835b3932506f84869ec01d600714631d5a18579a00c9400df5e4ee9f
MD5 5b3140d01f678b3c0623b99bc12847b8
BLAKE2b-256 e630d0ff5f5c4df7165bb4d928cfc7664b144518ef2757905f756f6db9d2ad44

File details

Details for the file tree_influence-0.1.7-cp39-cp39-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl.

File hashes

Hashes for tree_influence-0.1.7-cp39-cp39-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl
Algorithm Hash digest
SHA256 b5d096a690db5ca431b8097928740a3a804cc8d747aeb02109ec2005af5ffa61
MD5 ad53c1f1635ac7705a705ca710003dd7
BLAKE2b-256 fb262c8b012a50f629d24f34d4dbee69f840d0ad6c20a9eed686c0edf8cb424f

File details

Details for the file tree_influence-0.1.7-cp39-cp39-macosx_10_9_x86_64.whl.

File hashes

Hashes for tree_influence-0.1.7-cp39-cp39-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 1cba28f2edd9983c82e9f657590415e1b067786bfd9a9e5e6c229ffe4b45ed34
MD5 378e452a607bcfd0a0beb657d185408c
BLAKE2b-256 5ffe3bb438031245e5a3cd0dcd2608f3bff615a8227f95cbc07c0a711f1e242b
