Post hoc explanations for ML models through measures of statistical dependence

Supported Python versions: 3.8, 3.9, 3.10, 3.11

Xi - Post hoc explanations

Xi is a Python package that implements the methods described in the papers "Explaining classifiers with measures of statistical association" [1] and "The Xi method: unlocking the mysteries of regression with Statistics" [2].

The growing size and complexity of data, as well as the need for accurate predictions, force analysts to use black-box models. While the success of those models extends statistical application, it also increases the need for interpretability and, when possible, explainability.

The paper proposes an approach to the problem based on measures of statistical association.
Measures of statistical association deliver information regarding the strength of the statistical dependence between the target and the feature(s) of interest, inferring this insight from the data in a model-agnostic fashion.
In this respect, we note that an important class of measures of statistical association is represented by probabilistic sensitivity measures.

We use these probabilistic sensitivity measures as part of the broad discourse of interpretability in statistical machine learning. For brevity, we call this part the Xi-method.
Briefly, the method evaluates ML model predictions by comparing the values of probabilistic sensitivity measures obtained in a model-agnostic fashion, i.e., directly from the data, with the same indices computed after replacing the true targets with the ML model forecasts.
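The comparison described above can be illustrated with a small, self-contained sketch. It is not the package's implementation: the function name `l1_index`, the equal-frequency partitioning, and the toy data are assumptions made for illustration only. The index is estimated by partitioning one feature, measuring the L1 distance between the marginal and the within-partition class distributions, and averaging over partitions. The same index is then computed twice, once with the true targets and once with hypothetical model forecasts.

```python
from collections import Counter


def l1_index(x, y, m):
    """Partition-based estimate of an L1 sensitivity index for one feature.

    x: feature values, y: class labels (true or predicted), m: number of partitions.
    Illustrative sketch only, not the xi-method implementation.
    """
    n = len(x)
    classes = sorted(set(y))
    # Marginal class distribution of y.
    marginal = Counter(y)
    p_y = [marginal[c] / n for c in classes]
    # Equal-frequency partitions: sort by x, then cut into m contiguous chunks.
    order = sorted(range(n), key=lambda i: x[i])
    index = 0.0
    for k in range(m):
        chunk = order[k * n // m:(k + 1) * n // m]
        if not chunk:
            continue
        # Class distribution of y restricted to this partition of x.
        cond = Counter(y[i] for i in chunk)
        p_cond = [cond[c] / len(chunk) for c in classes]
        dist = sum(abs(a - b) for a, b in zip(p_y, p_cond))
        # Weight each partition by its share of the observations.
        index += len(chunk) / n * dist
    return index


# Toy data (made up): the true label depends on x_informative only.
features = {
    "x_informative": [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9],
    "x_noise": [5, 1, 7, 3, 6, 2, 8, 4],
}
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1]  # hypothetical model forecasts

from_data = {name: l1_index(x, y_true, m=2) for name, x in features.items()}
from_model = {name: l1_index(x, y_pred, m=2) for name, x in features.items()}
print(from_data)   # → {'x_informative': 1.0, 'x_noise': 0.0}
print(from_model)  # → {'x_informative': 0.75, 'x_noise': 0.25}
```

In this toy case the forecasts preserve the ranking of the two features but assign some importance to the noise feature, which is the kind of discrepancy the comparison is meant to surface.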

To sum up, Xi has three main advantages:

  • Model agnostic: Xi works with any model, as long as it outputs predictions.
  • Data agnostic: Xi works with structured (tabular) and unstructured (text, image) data.
  • Computationally reasonable

Installation

Install from PyPI:

pip install xi-method

Usage

The package is simple and designed to give you post hoc explanations for your dataset and machine learning model with minimal effort.
Import your data as a pandas DataFrame, then separate the covariates from the dependent (target) variable.

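from xi_method.utils import load_wine_quality_red_dataset
from xi_method.ximp import *

# Load the red wine quality dataset as a pandas DataFrame
df = load_wine_quality_red_dataset()

# Separate the target (quality) from the covariates
Y = df.quality
df.drop(columns='quality', inplace=True)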

Create an instance of XIClassifier or XIRegressor depending on the type of problem you are working with:

xi = XIClassifier(m=20)

For classification tasks, you can specify the number of partitions in three different ways:

  • m: the number of partitions, given as an integer or a dictionary. A dictionary should map each covariate name to its desired number of partitions. An integer is applied to all covariates.
  • discrete: a list of names of the covariates you want to treat as categorical.
  • obs: a dictionary mapping each covariate name to the desired number of observations in each partition.

For regression tasks, you can only specify m as an integer.

A default m value will be computed if nothing is provided by the user, as indicated in the paper.
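To make the three options concrete, here is a hedged sketch of how such a specification could be resolved into a number of partitions per covariate. The helper `resolve_partitions` is hypothetical and not part of xi-method. The precedence of obs over m, and the conversion of observations per partition into a count by integer division, are assumptions. Covariates with no specification are left as None, standing in for the default from the paper, which is not reproduced here.

```python
def resolve_partitions(columns, n_rows, m=None, obs=None):
    """Return a {covariate: number of partitions} mapping.

    Hypothetical helper for illustration; not the xi-method API.
    m:   int (applied to every covariate) or dict keyed by covariate name.
    obs: dict mapping a covariate name to observations per partition.
    """
    resolved = {}
    for col in columns:
        if obs is not None and col in obs:
            # Assumption: n_rows // obs gives the number of partitions.
            resolved[col] = max(1, n_rows // obs[col])
        elif isinstance(m, dict) and col in m:
            resolved[col] = m[col]
        elif isinstance(m, int):
            resolved[col] = m
        else:
            # No user input: the package would compute its own default here.
            resolved[col] = None
    return resolved


# Illustrative column names and row count.
columns = ["alcohol", "pH", "sulphates"]

same_for_all = resolve_partitions(columns, n_rows=480, m=20)
mixed = resolve_partitions(columns, n_rows=480, m={"alcohol": 10}, obs={"pH": 40})
print(same_for_all)  # → {'alcohol': 20, 'pH': 20, 'sulphates': 20}
print(mixed)         # → {'alcohol': 10, 'pH': 12, 'sulphates': None}
```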

To obtain post hoc explanations, run your favorite ML model, save the predictions as a NumPy array, and pass the covariates (test set) and the predictions to the explain method:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(df.values, Y, test_size=0.3, random_state=42)
lr = LogisticRegression(multi_class='multinomial',max_iter=100)
lr.fit(x_train, y_train)
y_pred = lr.predict(x_test)

xi = XIClassifier(m=20)
p = xi.explain(X=x_test, y=y_pred, replicates=10, separation_measurement='L1')

The object p is a Python dictionary mapping each separation measurement to its explanation.
You can access an explanation as follows:

p.get('L1').explanation

You can choose among different separation measurements, as specified in the paper. Pass a single separation measurement as a string, or several as a list.

p = xi.explain(X=x_test, y=y_pred, separation_measurement=['L1','Kuiper'])

Implemented separation measurements:

  • Kullback-Leibler
  • Kuiper
  • L1
  • L2
  • Hellinger
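For readers unfamiliar with these quantities, the sketch below gives textbook forms of the five separation measurements for two discrete distributions over the same ordered categories. It is illustrative only: the function names are made up, and the exact definitions and normalizations used inside the package may differ (for example, the 1/2 factor in the Hellinger distance or the square root in L2).

```python
import math


def l1(p, q):
    # Sum of absolute differences between the probabilities.
    return sum(abs(a - b) for a, b in zip(p, q))


def l2(p, q):
    # Euclidean distance between the probability vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kullback_leibler(p, q):
    # Terms with p_i = 0 contribute 0; assumes q_i > 0 wherever p_i > 0.
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def hellinger(p, q):
    # Normalized so that the distance lies in [0, 1].
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def kuiper(p, q):
    # Largest positive gap plus largest negative gap between the two CDFs.
    cp = cq = 0.0
    d_plus = d_minus = 0.0
    for a, b in zip(p, q):
        cp += a
        cq += b
        d_plus = max(d_plus, cp - cq)
        d_minus = max(d_minus, cq - cp)
    return d_plus + d_minus


# Two made-up class distributions over three ordered categories.
p = [0.5, 0.5, 0.0]
q = [0.25, 0.25, 0.5]

for name, fn in [("Kullback-Leibler", kullback_leibler), ("Kuiper", kuiper),
                 ("L1", l1), ("L2", l2), ("Hellinger", hellinger)]:
    print(f"{name}: {fn(p, q):.4f}")
```

All five are zero when the two distributions coincide, and the Kuiper measure depends on the ordering of the categories because it is based on cumulative distributions.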

You can get a list of the implemented separation measurements by running:

from xi_method.utils import *
get_separation_measurement()

Plot your results, passing the dictionary p returned by explain:

plot(separation_measurement='L1', type='tabular', explain=p, k=10)

References

[1] E. Borgonovo, V. Ghidini, R. Hahn, E. Plischke (2023). Explaining classifiers with measures of statistical association. Computational Statistics and Data Analysis, Volume 182, June 2023, 107701.

[2] V. Ghidini (2023). The Xi method: unlocking the mysteries of regression with Statistics.
