Python documentation generator

# CIU-Py

Explainable Machine Learning through Contextual Importance and Utility

NOTE: This python implementation is currently a work in progress. As such some of the functionality present in the original R version is not quite yet available.

The CIU-Python library provides methods to generate post-hoc explanations for machine learning-based classifiers.

# What is CIU?

Remark: It seems like Github Markdown doesn’t show correctly the “{” and “}” characters in Latex equations, whereas they are shown correctly in Rstudio. Therefore, in most cases where there is an $i$ shown in Github, it actually signifies {i} and where there is an $I$ it signifies {I}.

CIU is a model-agnostic method for producing outcome explanations of results of any “black-box” model y=f(x). CIU directly estimates two elements of explanation by observing the behaviour of the black-box model (without creating any “surrogate” model g of f(x)).

Contextual Importance (CI) answers the question: how much can the result (or the utility of it) change as a function of feature $i$ or a set of features ${i}$ jointly, in the context $x$?

Contextual Utility (CU) answers the question: how favorable is the value of feature $i$ (or a set of features ${i}$ jointly) for a good (high-utility) result, in the context $x$?

CI of one feature or a set of features (jointly) ${i}$ compared to a superset of features ${I}$ is defined as

$$\omega_{j,{i},{I}}(x)=\frac{umax_{j}(x,{i})-umin_{j}(x,{i})}{umax_{j}(x,{I})-umin_{j}(x,{I})},$$

where ${i} \subseteq {I}$ and ${I} \subseteq {1,\dots,n}$. $x$ is the instance/context to be explained and defines the values of input features that do not belong to ${i}$ or ${I}$. In practice, CI is calculated as:

$$\omega_{j,{i},{I}}(x)= \frac{ymax_{j,{i}}(x)-ymin_{j,{i}}(x)}{ ymax_{j,{I}}(x)-ymin_{j,{I}}(x)},$$

where $ymin_{j}()$ and $ymax_{j}()$ are the minimal and maximal $y_{j}$ values observed for output $j$.

CU is defined as

$$CU_{j,{i}}(x)=\frac{u_{j}(x)-umin_{j,{i}}(x)}{umax_{j,{i}}(x)-umin_{j,{i}}(x)}.$$

When $u_{j}(y_{j})=Ay_{j}+b$, this can be written as:

$$CU_{j,{i}}(x)=\left|\frac{ y_{j}(x)-yumin_{j,{i}}(x)}{ymax_{j,{i}}(x)-ymin_{j,{i}}(x)}\right|,$$

where $yumin=ymin$ if $A$ is positive and $yumin=ymax$ if $A$ is negative.

## Usage

First, install the required dependencies. NOTE: this is to be run in your environment's terminal; some environments such as Google Colab might require an exclamation mark before the command, such as !pip install.

pip install CIU_Py


Import the library:

from ciu import determine_ciu


Now, we can call the determine_ciu function which takes the following parameters:

• case: A dictionary that contains the data of the case.

• predictor: The prediction function of the black-box model py-ciu should call.

• dataset: Dataset to deduct min_maxs from (dictionary). Defaults to None.

• min_maxs (optional): dictionary ('feature_name': [min, max, is_int] for each feature), or infered from dataset. Defaults to None

• samples (optional): The number of samples py-ciu will generate. Defaults to 1000.

• prediction_index (optional): In case the model returns several predictions, it is possible to provide the index of the relevant prediction. Defaults to None.

• category_mapping (optional): A mapping of one-hot encoded categorical variables to lists of categories and category names. Defaults to None.

• feature_interactions (optional): A list of {key: list} tuples of features whose interactions should be evaluated. Defaults to [].

Here we can use a simple example with the well known Iris flower dataset

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import datasets
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

df = pd.DataFrame(data = np.c_[iris['data'], iris['target']],
columns = iris['feature_names'] + ['target'])
df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)
df.columns = ['s_length', 's_width', 'p_length', 'p_width', 'target', 'species']

X = df[['s_length', 's_width', 'p_length', 'p_width']]
y = df['species']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=123)


Then create and train a model, in this case an LDA model

model = LinearDiscriminantAnalysis()
model.fit(X_train, y_train)


Now simply use our Iris flower data and the model, following the parameter descriptions above

iris_df = df.apply(pd.to_numeric, errors='ignore')

iris_ciu = determine_ciu(
X_test.iloc[[42]],
model.predict_proba,
iris_df.to_dict('list'),
samples = 1000,
prediction_index = 2
)


## Example Output

Let's import a test from the ciu_tests file

from ciu_tests.ciu_tests import get_boston_gbm_test


The get_boston_gbm_test function returns a CIU Object that we can simply store and use as such

boston_ciu = get_boston_gbm_test()
boston_ciu.explain_tabular()


Now we can also plot the CI/CU values using the CIU Object's plot_ciu function

boston_ciu.plot_ciu()


Likewise there are also several options available using the following parameters:

• plot_mode: defines the type plot to use between 'default', 'overlap' and 'combined'.
• include: defines whether to include interactions or not.
• sort: defines the order of the plot bars by the 'ci' (default), 'cu' values or unsorted if None.
• color_blind: defines accessible color maps to use for the plots, such as 'protanopia',
'deuteranopia' and 'tritanopia'.
• color_edge_cu: defines the hex or named color for the CU edge in the overlap plot mode.
• color_fill_cu: defines the hex or named color for the CU fill in the overlap plot mode.
• color_edge_ci: defines the hex or named color for the CI edge in the overlap plot mode.
• color_fill_ci: defines the hex or named color for the CI fill in the overlap plot mode.

Here's quick example using some of these parameters to create a modified version of the above plot

boston_ciu.plot_ciu(plot_mode="combined", color_blind='tritanopia', sort='cu')


## Authors

This implementation replaces an older one, available at https://github.com/TimKam/py-ciu

## Project details

Uploaded source