CIU-Py
Explainable Machine Learning through Contextual Importance and Utility
NOTE: This Python implementation is currently a work in progress. As such, some of the functionality present in the original R version is not yet available.
The CIU-Python library provides methods to generate post-hoc explanations for machine learning-based classifiers.
What is CIU?
Remark: GitHub Markdown does not seem to render the “{” and “}” characters in LaTeX equations correctly, whereas RStudio shows them correctly. Therefore, in most cases where GitHub shows $i$, it actually signifies $\{i\}$, and where it shows $I$, it signifies $\{I\}$.
CIU is a model-agnostic method for producing outcome explanations of results of any “black-box” model $y=f(x)$. CIU directly estimates two elements of explanation by observing the behaviour of the black-box model (without creating any “surrogate” model $g$ of $f(x)$).
Contextual Importance (CI) answers the question: how much can the result (or the utility of it) change as a function of feature $i$ or a set of features $\{i\}$ jointly, in the context $x$?
Contextual Utility (CU) answers the question: how favorable is the value of feature $i$ (or a set of features $\{i\}$ jointly) for a good (high-utility) result, in the context $x$?
CI of one feature or a set of features (jointly) $\{i\}$ compared to a superset of features $\{I\}$ is defined as
$$
\omega_{j,\{i\},\{I\}}(x)=\frac{umax_{j}(x,\{i\})-umin_{j}(x,\{i\})}{umax_{j}(x,\{I\})-umin_{j}(x,\{I\})},
$$
where $\{i\} \subseteq \{I\}$ and $\{I\} \subseteq \{1,\dots,n\}$. $x$ is the instance/context to be explained and defines the values of input features that do not belong to $\{i\}$ or $\{I\}$. In practice, CI is calculated as:
$$ \omega_{j,\{i\},\{I\}}(x)= \frac{ymax_{j,\{i\}}(x)-ymin_{j,\{i\}}(x)}{ ymax_{j,\{I\}}(x)-ymin_{j,\{I\}}(x)}, $$
where $ymin_{j}()$ and $ymax_{j}()$ are the minimal and maximal $y_{j}$ values observed for output $j$.
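The practical CI formula can be illustrated with a small Monte Carlo sketch: vary only the features in $\{i\}$ (and then those in $\{I\}$) uniformly within assumed [min, max] bounds, keep all other features at their values in the context $x$, and record the smallest and largest outputs observed. The function name contextual_importance, its parameters and the uniform sampling scheme are assumptions for illustration, not CIU-Py's implementation.

```python
import numpy as np


def contextual_importance(predict, x, feature_idx, bounds, all_idx=None,
                          n_samples=1000, output=0, seed=0):
    """Monte Carlo estimate of CI for the feature set {i} relative to {I}.

    Hypothetical sketch (not CIU-Py's code):
    predict     -- black-box function mapping an (m, n) array to an (m, k) array
    x           -- 1-D array: the instance/context to explain
    feature_idx -- indices of the feature set {i}
    bounds      -- (n, 2) array of [min, max] values per feature
    all_idx     -- indices of the superset {I}; defaults to all features
    output      -- index j of the output to explain
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    bounds = np.asarray(bounds, dtype=float)
    if all_idx is None:
        all_idx = range(x.size)

    def observed_range(idx):
        idx = list(idx)
        # Copies of the context x in which only the features in idx vary.
        samples = np.tile(x, (n_samples, 1))
        samples[:, idx] = rng.uniform(bounds[idx, 0], bounds[idx, 1],
                                      size=(n_samples, len(idx)))
        # Include the context itself among the observed outputs.
        y = predict(np.vstack([x, samples]))[:, output]
        return y.min(), y.max()

    ymin_i, ymax_i = observed_range(feature_idx)
    ymin_I, ymax_I = observed_range(all_idx)
    if ymax_I == ymin_I:
        return 0.0  # the output does not vary at all over {I}
    return (ymax_i - ymin_i) / (ymax_I - ymin_I)


# Usage with a toy linear "black box" y = 2*x0 + x1 on [0, 1] x [0, 1].
def toy_predict(X):
    return (X @ np.array([2.0, 1.0]))[:, None]


ci_0 = contextual_importance(toy_predict, [0.5, 0.5], [0], [[0, 1], [0, 1]],
                             n_samples=10000)
ci_1 = contextual_importance(toy_predict, [0.5, 0.5], [1], [[0, 1], [0, 1]],
                             n_samples=10000)
print(ci_0, ci_1)  # close to 2/3 and 1/3
```

Because the minima and maxima are only observed rather than searched for, the estimate depends on the number of samples; the joint range over $\{I\}$ in particular may be underestimated when few samples are drawn.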
CU is defined as
$$ CU_{j,\{i\}}(x)=\frac{u_{j}(x)-umin_{j,\{i\}}(x)}{umax_{j,\{i\}}(x)-umin_{j,\{i\}}(x)}. $$
When $u_{j}(y_{j})=Ay_{j}+b$, this can be written as:
$$ CU_{j,\{i\}}(x)=\left|\frac{ y_{j}(x)-yumin_{j,\{i\}}(x)}{ymax_{j,\{i\}}(x)-ymin_{j,\{i\}}(x)}\right|, $$
where $yumin=ymin$ if $A$ is positive and $yumin=ymax$ if $A$ is negative.
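For a linear utility, CU is simply the position of $y_j(x)$ within the observed output range, measured from the low-utility end. A minimal sketch, assuming ymin and ymax have already been observed for the feature set $\{i\}$ (for example by sampling as in the CI definition); the function name contextual_utility and the utility_increasing flag are hypothetical.

```python
def contextual_utility(y, ymin, ymax, utility_increasing=True):
    """CU of output value y, assuming a linear utility u(y) = A*y + b.

    Hypothetical sketch: utility_increasing=True means A > 0, so yumin = ymin;
    utility_increasing=False means A < 0, so yumin = ymax.
    ymin and ymax are the observed output range for the feature set {i}.
    """
    if ymax == ymin:
        raise ValueError("ymax and ymin must differ")
    yumin = ymin if utility_increasing else ymax
    return abs((y - yumin) / (ymax - ymin))


# An output of 0.75 in an observed range [0.0, 1.0]:
print(contextual_utility(0.75, 0.0, 1.0))                            # 0.75
print(contextual_utility(0.75, 0.0, 1.0, utility_increasing=False))  # 0.25
```

With a negative $A$ (for example $u(y)=-2y+1$), the general definition gives $(u(0.75)-u(1))/(u(0)-u(1)) = 0.5/2 = 0.25$, which matches the second call.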
Usage
First, install the library and its dependencies. NOTE: this command is to be run in your environment's terminal; some environments, such as Google Colab, require an exclamation mark before the command (for example, !pip install).
```shell
pip install CIU_Py
```
Import the library:
```python
from ciu import determine_ciu
```
Now we can call the determine_ciu function, which takes the following parameters:
- case: A dictionary that contains the data of the case.
- predictor: The prediction function of the black-box model that py-ciu should call.
- dataset: Dataset (dictionary) from which to infer min_maxs. Defaults to None.
- min_maxs (optional): Dictionary ('feature_name': [min, max, is_int] for each feature), or inferred from dataset. Defaults to None.
- samples (optional): The number of samples py-ciu will generate. Defaults to 1000.
- prediction_index (optional): If the model returns several predictions, the index of the relevant prediction can be provided. Defaults to None.
- category_mapping (optional): A mapping of one-hot encoded categorical variables to lists of categories and category names. Defaults to None.
- feature_interactions (optional): A list of {key: list} tuples of features whose interactions should be evaluated. Defaults to [].
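The min_maxs format described above can be illustrated with a small sketch that builds such a dictionary from a pandas DataFrame. The helper infer_min_maxs and the example columns are hypothetical; CIU-Py infers min_maxs from dataset on its own, possibly in a different way.

```python
import pandas as pd


def infer_min_maxs(df):
    """Build a {'feature_name': [min, max, is_int]} dictionary from a DataFrame.

    Hypothetical helper: it follows the min_maxs format described above,
    but it is not CIU-Py's own inference code.
    """
    min_maxs = {}
    for name in df.columns:
        col = df[name]
        if not pd.api.types.is_numeric_dtype(col):
            continue  # skip non-numeric columns such as class labels
        min_maxs[name] = [col.min(), col.max(),
                          pd.api.types.is_integer_dtype(col)]
    return min_maxs


example = pd.DataFrame({
    's_length': [4.3, 5.8, 7.9],
    'n_petals': [3, 5, 8],  # hypothetical integer-valued feature
    'species': ['setosa', 'versicolor', 'virginica'],
})
min_maxs = infer_min_maxs(example)
print(min_maxs)
```

Non-numeric columns such as species are skipped here, since a [min, max] range is only meaningful for numeric features.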
Here is a simple example using the well-known Iris flower dataset:
```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import datasets
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

iris = datasets.load_iris()
df = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                  columns=iris['feature_names'] + ['target'])
df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)
df.columns = ['s_length', 's_width', 'p_length', 'p_width', 'target', 'species']

X = df[['s_length', 's_width', 'p_length', 'p_width']]
y = df['species']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=123)
```
Then create and train a model, in this case a linear discriminant analysis (LDA) model:
```python
model = LinearDiscriminantAnalysis()
model.fit(X_train, y_train)
```
Now call determine_ciu with the Iris data and the model, following the parameter descriptions above:
```python
iris_df = df.apply(pd.to_numeric, errors='ignore')
iris_ciu = determine_ciu(
    X_test.iloc[[42]],
    model.predict_proba,
    iris_df.to_dict('list'),
    samples=1000,
    prediction_index=2
)
```
Example Output
Let's import a test from the ciu_tests file:
```python
from ciu_tests.ciu_tests import get_boston_gbm_test
```
The get_boston_gbm_test function returns a CIU object that we can store and use as follows:
```python
boston_ciu = get_boston_gbm_test()
boston_ciu.explain_tabular()
```
We can also plot the CI/CU values using the CIU object's plot_ciu function:
```python
boston_ciu.plot_ciu()
```
Likewise, several options are available through the following parameters:
- plot_mode: defines the type of plot to use: 'default', 'overlap' or 'combined'.
- include: defines whether to include interactions or not.
- sort: defines the order of the plot bars: by the 'ci' (default) or 'cu' values, or unsorted if None.
- color_blind: defines accessible color maps to use for the plots, such as 'protanopia', 'deuteranopia' and 'tritanopia'.
- color_edge_cu: defines the hex or named color for the CU edge in the overlap plot mode.
- color_fill_cu: defines the hex or named color for the CU fill in the overlap plot mode.
- color_edge_ci: defines the hex or named color for the CI edge in the overlap plot mode.
- color_fill_ci: defines the hex or named color for the CI fill in the overlap plot mode.

Here's a quick example using some of these parameters to create a modified version of the above plot:
```python
boston_ciu.plot_ciu(plot_mode="combined", color_blind='tritanopia', sort='cu')
```
Authors
This implementation replaces an older one, available at https://github.com/TimKam/py-ciu