
CIU-Py

Explainable Machine Learning through Contextual Importance and Utility

NOTE: This Python implementation is a work in progress. Some of the functionality present in the original R version is not yet available.

The CIU-Python library provides methods to generate post-hoc explanations for machine learning-based classifiers.

What is CIU?

Remark: Some Markdown renderers, GitHub among them, do not display the “{” and “}” characters in LaTeX equations, whereas RStudio displays them correctly. Where a rendered equation shows a bare $i$ or $I$ in a position that denotes a set of features, read it as the set $\{i\}$ or $\{I\}$.

CIU is a model-agnostic method for producing outcome explanations of results of any “black-box” model y=f(x). CIU directly estimates two elements of explanation by observing the behaviour of the black-box model (without creating any “surrogate” model g of f(x)).

Contextual Importance (CI) answers the question: how much can the result (or its utility) change as a function of feature $i$ or a set of features $\{i\}$ jointly, in the context $x$?

Contextual Utility (CU) answers the question: how favorable is the value of feature $i$ (or a set of features $\{i\}$ jointly) for a good (high-utility) result, in the context $x$?

CI of one feature or a set of features (jointly) $\{i\}$ compared to a superset of features $\{I\}$ is defined as

$$ \omega_{j,\{i\},\{I\}}(x)=\frac{umax_{j}(x,\{i\})-umin_{j}(x,\{i\})}{umax_{j}(x,\{I\})-umin_{j}(x,\{I\})}, $$

where $\{i\} \subseteq \{I\}$ and $\{I\} \subseteq \{1,\dots,n\}$. $x$ is the instance/context to be explained and defines the values of input features that do not belong to $\{i\}$ or $\{I\}$. In practice, CI is calculated as:

$$ \omega_{j,{i},{I}}(x)= \frac{ymax_{j,{i}}(x)-ymin_{j,{i}}(x)}{ ymax_{j,{I}}(x)-ymin_{j,{I}}(x)}, $$

where $ymin_{j}()$ and $ymax_{j}()$ are the minimal and maximal $y_{j}$ values observed for output $j$.
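The practical CI formula can be illustrated with a small sampling sketch. This is not the library's implementation: `estimate_ci` and `toy_model` are hypothetical names, the model is a made-up linear function of two features in [0, 1], and the denominator (the output range over all features) is assumed to be known in advance as [0, 1].

```python
import numpy as np

def estimate_ci(predict, x, feature_idx, feature_range,
                output_range=(0.0, 1.0), samples=1000, seed=0):
    """Monte Carlo estimate of CI for one feature of a scalar-output model."""
    rng = np.random.default_rng(seed)
    low, high = feature_range
    # Repeat the context x and vary only the feature under study.
    perturbed = np.tile(np.asarray(x, dtype=float), (samples, 1))
    perturbed[:, feature_idx] = rng.uniform(low, high, size=samples)
    # Force both interval endpoints into the sample so that a monotone
    # model reaches its exact extremes.
    perturbed[0, feature_idx] = low
    perturbed[1, feature_idx] = high
    y = np.asarray(predict(perturbed), dtype=float)
    ymin, ymax = float(y.min()), float(y.max())
    # Numerator: observed output range when only this feature changes.
    # Denominator: assumed output range over all features.
    ci = (ymax - ymin) / (output_range[1] - output_range[0])
    return ci, ymin, ymax

def toy_model(X):
    # Hypothetical black box: output stays in [0, 1] for inputs in [0, 1].
    return 0.3 * X[:, 0] + 0.7 * X[:, 1]

context = [0.5, 0.2]
ci_0, ymin_0, ymax_0 = estimate_ci(toy_model, context, 0, (0.0, 1.0))
ci_1, ymin_1, ymax_1 = estimate_ci(toy_model, context, 1, (0.0, 1.0))
print(ci_0, ci_1)
```

For this linear toy model the estimated CI values match the feature weights (about 0.3 and 0.7), because each feature alone can move the output by exactly its weight.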

CU is defined as

$$ CU_{j,\{i\}}(x)=\frac{u_{j}(x)-umin_{j,\{i\}}(x)}{umax_{j,\{i\}}(x)-umin_{j,\{i\}}(x)}. $$

When $u_{j}(y_{j})=Ay_{j}+b$, this can be written as:

$$ CU_{j,\{i\}}(x)=\left|\frac{y_{j}(x)-yumin_{j,\{i\}}(x)}{ymax_{j,\{i\}}(x)-ymin_{j,\{i\}}(x)}\right|, $$

where $yumin=ymin$ if $A$ is positive and $yumin=ymax$ if $A$ is negative.
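As a rough illustration of the linear-utility case, the CU formula reduces to a few lines of arithmetic. The function name `contextual_utility` and the flag `a_positive` are hypothetical and not part of the library; the sketch assumes `ymax > ymin`, since CU is undefined when the feature cannot change the output.

```python
def contextual_utility(y, ymin, ymax, a_positive=True):
    """CU for a linear utility u(y) = A*y + b; a_positive is the sign of A."""
    # yumin is ymin when higher outputs are better (A > 0),
    # and ymax when lower outputs are better (A < 0).
    yumin = ymin if a_positive else ymax
    return abs((y - yumin) / (ymax - ymin))

# Output 0.2 within an observed range of [0.1, 0.5] for the feature.
cu_high_is_good = contextual_utility(0.2, 0.1, 0.5, a_positive=True)
cu_low_is_good = contextual_utility(0.2, 0.1, 0.5, a_positive=False)
print(cu_high_is_good, cu_low_is_good)
```

The two results sum to 1: a feature value that is unfavorable when high outputs are preferred is favorable to the same degree when low outputs are preferred.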

Usage

First, install the package and its dependencies. NOTE: run this in your environment's terminal; some environments, such as Google Colab, might require an exclamation mark before the command (!pip install).

pip install CIU_Py

Import the library:

from ciu import determine_ciu

Now, we can call the determine_ciu function which takes the following parameters:

  • case: A dictionary that contains the data of the case.

  • predictor: The prediction function of the black-box model py-ciu should call.

  • dataset (optional): Dataset to deduce min_maxs from (dictionary). Defaults to None.

  • min_maxs (optional): dictionary ('feature_name': [min, max, is_int] for each feature), or inferred from dataset. Defaults to None.

  • samples (optional): The number of samples py-ciu will generate. Defaults to 1000.

  • prediction_index (optional): In case the model returns several predictions, it is possible to provide the index of the relevant prediction. Defaults to None.

  • category_mapping (optional): A mapping of one-hot encoded categorical variables to lists of categories and category names. Defaults to None.

  • feature_interactions (optional): A list of {key: list} dictionaries naming the features whose interactions should be evaluated. Defaults to [].
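The min_maxs format described above can be built by hand from a pandas DataFrame. The helper below is a minimal sketch: `build_min_maxs` is a hypothetical name, not a function of the library, and the feature names and values are made up for illustration. It only assumes the 'feature_name': [min, max, is_int] layout given in the parameter list.

```python
import pandas as pd

def build_min_maxs(df):
    """Return {'feature_name': [min, max, is_int]} for each column of df."""
    min_maxs = {}
    for name in df.columns:
        column = df[name]
        # Treat a column as integer-valued only if its dtype is an integer type.
        is_int = pd.api.types.is_integer_dtype(column)
        cast = int if is_int else float
        min_maxs[name] = [cast(column.min()), cast(column.max()), is_int]
    return min_maxs

# Made-up data with one integer and one floating-point feature.
example_df = pd.DataFrame({"age": [20, 35, 50], "income": [1.5, 3.0, 2.25]})
example_min_maxs = build_min_maxs(example_df)
print(example_min_maxs)
```

Passing an explicit dictionary of this shape as min_maxs is an alternative to letting the ranges be inferred from dataset.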

Here we can use a simple example with the well-known Iris flower dataset:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import datasets
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


iris=datasets.load_iris()

df = pd.DataFrame(data = np.c_[iris['data'], iris['target']],
              columns = iris['feature_names'] + ['target'])
df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)
df.columns = ['s_length', 's_width', 'p_length', 'p_width', 'target', 'species']

X = df[['s_length', 's_width', 'p_length', 'p_width']]
y = df['species']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=123)

Then create and train a model, in this case a linear discriminant analysis (LDA) model:

model = LinearDiscriminantAnalysis()
model.fit(X_train, y_train)

Now pass the Iris flower data and the model to determine_ciu, following the parameter descriptions above:

iris_df = df.apply(pd.to_numeric, errors='ignore')

iris_ciu = determine_ciu(
    X_test.iloc[[42]],
    model.predict_proba,
    iris_df.to_dict('list'),
    samples = 1000,
    prediction_index = 2
)

Example Output

Let's import a test from the ciu_tests module:

from ciu_tests.ciu_tests import get_boston_gbm_test

The get_boston_gbm_test function returns a CIU object that we can store and use directly:

boston_ciu = get_boston_gbm_test()
boston_ciu.explain_tabular()

We can also plot the CI/CU values using the CIU object's plot_ciu function:

boston_ciu.plot_ciu()

The plot_ciu function accepts several options through the following parameters:

  • plot_mode: defines the type of plot to use: 'default', 'overlap' or 'combined'.
  • include: defines whether to include interactions or not.
  • sort: orders the plot bars by 'ci' (default) or 'cu' values, or leaves them unsorted if None.
  • color_blind: defines accessible color maps to use for the plots, such as 'protanopia',
    'deuteranopia' and 'tritanopia'.
  • color_edge_cu: defines the hex or named color for the CU edge in the overlap plot mode.
  • color_fill_cu: defines the hex or named color for the CU fill in the overlap plot mode.
  • color_edge_ci: defines the hex or named color for the CI edge in the overlap plot mode.
  • color_fill_ci: defines the hex or named color for the CI fill in the overlap plot mode.

Here's a quick example using some of these parameters to create a modified version of the above plot:

boston_ciu.plot_ciu(plot_mode="combined", color_blind='tritanopia', sort='cu')

Authors

This implementation replaces an older one, available at https://github.com/TimKam/py-ciu

This version

1.0
