Explore, Explain and Examine Predictive Models

# dalex

## Overview

An unverified black box model is the path to failure. Opaqueness leads to distrust. Distrust leads to ignoration. Ignoration leads to rejection.

The `dalex` package x-rays any model and helps to explore and explain its behaviour and to understand how complex models work.
The main `Explainer` object creates a wrapper around a predictive model. Wrapped models may then be explored and compared with a collection of local and global explainers, which draw on recent developments in Interpretable Machine Learning and eXplainable Artificial Intelligence.
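
To make the wrapper idea concrete, here is a minimal sketch in plain Python. `MiniExplainer` and its `rmse` method are hypothetical names invented for this illustration; they are not the `dalex` API. The sketch only shows why a common wrapper is useful: once every model sits behind the same interface, different models can be explored and compared with the same code.

```python
from statistics import mean


class MiniExplainer:
    """Hypothetical stand-in for the wrapper idea (NOT the dalex Explainer)."""

    def __init__(self, predict_function, data, y, label):
        self.predict_function = predict_function  # any callable: row -> number
        self.data = data                          # list of rows (dicts)
        self.y = y                                # true target values
        self.label = label                        # name used when comparing models

    def predict(self, rows):
        return [self.predict_function(row) for row in rows]

    def rmse(self):
        # Root mean squared error of the wrapped model on the stored data.
        predictions = self.predict(self.data)
        return mean((p - t) ** 2 for p, t in zip(predictions, self.y)) ** 0.5


data = [{"x": 1.0}, {"x": 2.0}, {"x": 3.0}, {"x": 4.0}]
y = [2.0, 4.0, 6.0, 8.0]

# Two very different "models" behind the same interface.
linear = MiniExplainer(lambda row: 2.0 * row["x"], data, y, label="linear")
constant = MiniExplainer(lambda row: 5.0, data, y, label="constant")

scores = {explainer.label: explainer.rmse() for explainer in (linear, constant)}
print(scores)
```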

The philosophy behind `dalex` explanations is described in the Explanatory Model Analysis e-book.

The `dalex` package is a part of the DrWhy.AI universe.

## Installation

The `dalex` package is available on PyPI:

```
pip install dalex -U
```

## Resources

- Introduction to the `dalex` package: Titanic: tutorial and examples
- Important features explained: FIFA20: explain default vs tuned model with dalex
- How to use dalex with xgboost
- How to use dalex with tensorflow
- Interesting features in v0.2.1
- Fairness module
- Code in the form of jupyter notebook
- YouTube video showing how to do Break Down analysis
- Changelog: NEWS
- Theoretical introduction to the plots: Explanatory Model Analysis: Explore, Explain and Examine Predictive Models

## Plots

This package uses plotly to render the plots:

- Install extensions to use `plotly` in **JupyterLab**: Getting Started, Troubleshooting
- Use `show=False` parameter in `plot` method to return `plotly Figure` object
- It is possible to edit the figures and save them

## Learn more

Machine Learning models are widely used and have various applications in classification or regression tasks. Due to increasing computational power, the availability of new data sources and new methods, ML models are becoming more and more complex. Models created with techniques like boosting, bagging or neural networks are true black boxes: it is hard to trace the link between input variables and model outcomes. They are used because of their high performance, but lack of interpretability is one of their weakest sides.

In many applications we need to know, understand or prove how input variables are used in the model and what impact they have on the final model prediction.
`dalex` is a set of tools that helps to understand how complex models work.
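
One common way to measure the impact of an input variable is permutation-based importance: shuffle the values of one variable and check how much the model's loss grows. The sketch below is a simplified, plain-Python illustration of that idea and not the `dalex` implementation; the function names, the toy model and the data are assumptions made for the example.

```python
import random
from statistics import mean


def rmse(model, rows, targets):
    """Root mean squared error of `model` on the given rows."""
    return mean((model(r) - t) ** 2 for r, t in zip(rows, targets)) ** 0.5


def permutation_importance(model, rows, targets, variable, n_repeats=10, seed=0):
    """Average increase in RMSE after shuffling one variable."""
    rng = random.Random(seed)            # fixed seed keeps the result reproducible
    baseline = rmse(model, rows, targets)
    increases = []
    for _ in range(n_repeats):
        shuffled = [r[variable] for r in rows]
        rng.shuffle(shuffled)
        # Copy every row, replacing only the permuted variable.
        permuted = [{**r, variable: v} for r, v in zip(rows, shuffled)]
        increases.append(rmse(model, permuted, targets) - baseline)
    return mean(increases)


def model(row):
    return 3.0 * row["area"]             # toy model that ignores "noise"


rng = random.Random(1)
rows = [{"area": float(i), "noise": rng.random()} for i in range(20)]
targets = [model(r) for r in rows]

importance = {v: permutation_importance(model, rows, targets, v) for v in ("area", "noise")}
print(importance)
```

A variable the model does not use gets an importance of zero, while shuffling a variable the model relies on increases the loss.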

Talk with your model! at useR! 2020

## Authors

Main authors of the `dalex` package are:

Under the supervision of Przemyslaw Biecek.

Other contributors:

- Piotr Piatyszek maintains the `arena` module
- Jakub Wisnewski maintains the `fairness` module

# Changelog

## v0.4.0 (17/11/2020)

- added new `arena` module, which adds the backend for Arena dashboard @piotrpiatyszek

#### features

- added new aliases to `dx.Explainer` methods (#350): in `model_parts` it is `{'permutational': 'variable_importance', 'feature_importance': 'variable_importance'}`, in `model_profile` it is `{'pdp': 'partial', 'ale': 'accumulated'}`
- added `Arena` object for dashboard backend. See https://github.com/ModelOriented/Arena
- new `fairness` plot types: `stacked`, `radar`, `performance_and_fairness`, `heatmap`, `ceteris_paribus_cutoff`
- upgraded `fairness_check()`
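
The alias dictionaries listed above map an alternative `type` name to its canonical one, so, for example, `type='pdp'` is treated as `type='partial'` in `model_profile`. A minimal sketch of how such a lookup can work is shown below. The helper `resolve_type` and the sets of allowed types are hypothetical and only illustrate the mechanism; they are not taken from the `dalex` source code.

```python
# Alias dictionaries copied from the changelog entry above.
MODEL_PARTS_ALIASES = {
    "permutational": "variable_importance",
    "feature_importance": "variable_importance",
}
MODEL_PROFILE_ALIASES = {"pdp": "partial", "ale": "accumulated"}

# Illustrative sets of canonical names (an assumption, possibly incomplete).
MODEL_PARTS_TYPES = {"variable_importance", "shap_wrapper"}
MODEL_PROFILE_TYPES = {"partial", "accumulated", "conditional"}


def resolve_type(requested, aliases, allowed):
    """Translate an alias to its canonical name and validate the result."""
    canonical = aliases.get(requested, requested)  # non-aliases pass through
    if canonical not in allowed:
        raise ValueError(f"unknown type {requested!r}; expected one of {sorted(allowed)}")
    return canonical


resolved = [
    resolve_type(t, MODEL_PROFILE_ALIASES, MODEL_PROFILE_TYPES)
    for t in ("pdp", "ale", "conditional")
]
print(resolved)
```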

## v0.3.0 (26/10/2020)

- added new `fairness` module, which will focus on bias detection, visualization and mitigation @jakwisn

#### bug fixes

- removed unnecessary warning when `precalculate=False and verbose=False` (#340)

#### features

- added `model_fairness` method to the `Explainer`, which performs fairness explanation
- added `GroupFairnessClassification` object, with the `plot` method having two types: `fairness_check` and `metric_scores`

#### defaults

- added the `N=50000` argument to `ResidualDiagnostics.plot`, which samples observations from the `result` parameter to omit performance issues when `smooth=True` (#341)

## dalex 0.2.2

- added support for `tensorflow.python.keras.engine.sequential.Sequential` and `tensorflow.python.keras.engine.training.Model` (#326)
- updated the `tqdm` dependency to `>=4.48.2`, `pandas` dependency to `>=1.1.2` and `numpy` dependency to `>=1.18.4`

#### bug fixes

- fixed the wrong order of `Explainer` verbose messages
- fixed a bug that caused `model_info` parameter to be overwritten by the default values
- fixed a bug occurring when the variable from `groups` was not of `str` type (#327)
- fixed `model_profile`: `variable_type='categorical'` not working when user passed `variables` parameter (#329) + the reverse order of bars in `'categorical'` plots + (again) added `variable_splits_type` parameter to `model_profile` to specify how grid points shall be calculated (#266) + allow for both `'quantile'` and `'quantiles'` types (alias)

#### features

- added informative error messages when importing optional dependencies (#316)
- allow for `data` and `y` to be `None`; added checks in `Explainer` methods

#### defaults

- wrong parameter name `title_x` changed to `y_title` in `CeterisParibus.plot` and `AggregatedProfiles.plot` (#317)
- now warning the user in `Explainer` when `predict_function` returns an error or doesn't return `numpy.ndarray (1d)` (#325)

## dalex 0.2.1

- updated the `pandas` dependency to `>=1.1.0`

#### bug fixes

- `ModelPerformance.plot` now uses a drwhy color palette
- use `unique` method instead of `np.unique` in `variable_splits` (#293)
- `v0.2.0` didn't export new datasets
- fixed a bug where `predict_parts(type='shap')` calculated wrong `contributions` (#300)
- `model_profile` uses observation mean instead of profile mean in `_yhat_` centering
- fixed barplot baseline in categorical `model_profile` and `predict_profile` plots (#297)
- fixed `model_profile(type='accumulated')` giving wrong results (#302)
- vertical/horizontal lines in plots now end on the plot edges

#### features

- added new `type='shap_wrapper'` to `predict_parts` and `model_parts` methods, which returns a new `ShapWrapper` object. It contains the main result attribute (`shapley_values`) and the plot method (`force_plot` and `summary_plot` respectively). These come from the shap package
- `Explainer.predict` method now accepts `numpy.ndarray`
- added the `ResidualDiagnostics` object with a `plot` method
- added `model_diagnostics` method to the `Explainer`, which performs residual diagnostics
- added `predict_surrogate` method to the `Explainer`, which is a wrapper for the `lime` tabular explanation from the lime package
- added `model_surrogate` method to the `Explainer`, which creates a basic surrogate decision tree or linear model from the black-box model using the scikit-learn package
- added a `_repr_html_` method to all of the explanation objects (it prints the `result` attribute)
- added `dalex.__version__`
- added informative error messages in `Explainer` methods when `y` is of wrong type (#294)
- `CeterisParibus.plot(variable_type='categorical')` now allows for multiple observations
- new verbose checks for `model_type`
- add `type` to `model_info` in `dump` and `dumps` for R compatibility (#303)
- `ModelPerformance.result` now has `label` as index

#### defaults

- removed `_grid_` column in `AggregatedProfiles.result` and `center` only works with `type=accumulated`
- use `Pipeline._final_estimator` to extract `model_class` of the actual model
- use `model._estimator_type` to extract `model_type` if possible

## dalex 0.2.0

- major documentation update (#270)
- unified the order of function parameters

#### bug fixes

- `v0.1.9` had wrong `_original_` column in `predict_profile`
- `vertical_spacing` acts as intended in `VariableImportance.plot` when `split='variable'`
- `loss_function='auc'` now uses `loss_one_minus_auc` as this should be a descending measure
- plots are now saved with the original height and width
- `model_profile` now properly passes the `variables` parameter to `CeterisParibus`
- `variables` parameter in `predict_profile` now can also be a string
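
The `loss_one_minus_auc` entry above reflects a general point: a loss used for variable importance should get smaller as the model gets better, while AUC gets larger. Using 1 - AUC restores that direction. The sketch below computes AUC from pairwise comparisons in plain Python; it is an illustration under that definition, not the `dalex` implementation, and the function names `auc` and `one_minus_auc` are chosen for this example.

```python
def auc(y_true, y_score):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for t, s in zip(y_true, y_score) if t == 1]
    negatives = [s for t, s in zip(y_true, y_score) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else (0.5 if p == n else 0.0)
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def one_minus_auc(y_true, y_score):
    """Loss version of AUC: 0 is a perfect ranking, 1 is a fully reversed one."""
    return 1.0 - auc(y_true, y_score)


y_true = [0, 0, 1, 1]
loss_mixed = one_minus_auc(y_true, [0.1, 0.4, 0.35, 0.8])    # one pair misordered
loss_perfect = one_minus_auc(y_true, [0.1, 0.2, 0.8, 0.9])   # perfect ranking
loss_reversed = one_minus_auc(y_true, [0.9, 0.8, 0.2, 0.1])  # fully reversed
print(loss_mixed, loss_perfect, loss_reversed)
```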

#### features

- use `px.express` instead of core `plotly` to make `model_profile` and `predict_profile` plots; thus, enhance performance and scalability
- added `verbose` parameter where `tqdm` is used to verbose progress bar
- added `loss_one_minus_auc` function that can be used with `loss_function='1-auc'` in `model_parts`
- added new example data sets: `apartments`, `dragons` and `hr`
- added `color`, `opacity`, `title_x` parameters to `model_profile` and `predict_profile` plots (#236), changed tooltips and legends (#262)
- added `geom='profiles'` parameter to `model_profile` plot and `raw_profiles` attribute to `AggregatedProfiles`
- added `variable_splits_type` parameter to `predict_profile` to specify how grid points shall be calculated (#266)
- added `variable_splits_with_obs` parameter to `predict_profile` function to extend split points with observation variable values (#269)
- added `variable_splits` parameter to `model_profile`

#### defaults

- use different `loss_function` for classification and regression (#248)
- models that use `proba` yhats now get `model_type='classification'` if it's not specified
- use uniform way of grid points calculation in `predict_profile` and `model_profile` (see `variable_splits_type` parameter)
- add the variable values of `new_observation` to `variable_splits` in `predict_profile` (see `variable_splits_with_obs` parameter)
- use `N=1000` in `model_parts` and `N=300` in `model_profile` to comply with the R version
- `keep_raw_permutation` is now set to `False` instead of `None` in `model_parts`
- `intercept` parameter in `model_profile` is now named `center`
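
Several entries in this release concern how grid points (splits) for profiles are calculated. The sketch below illustrates, in plain Python, the difference between an evenly spaced grid and a quantile-based grid, and what extending the grid with the observation's own value means. All function names here are hypothetical and the logic is a simplified assumption about the idea, not the `dalex` implementation.

```python
from statistics import quantiles


def uniform_splits(values, grid_points):
    """Evenly spaced grid between the minimum and maximum of a variable."""
    low, high = min(values), max(values)
    step = (high - low) / (grid_points - 1)
    return [low + i * step for i in range(grid_points)]


def quantile_splits(values, grid_points):
    """Grid placed at empirical quantiles, so it follows the data density."""
    inner = quantiles(values, n=grid_points - 1, method="inclusive")
    return [min(values)] + inner + [max(values)]


def with_observation(splits, observed_value):
    """Extend the grid with the value of the explained observation."""
    return sorted(set(splits) | {observed_value})


def ceteris_paribus(predict, observation, variable, splits):
    """Predictions for copies of the observation with one variable changed."""
    return [(x, predict({**observation, variable: x})) for x in splits]


surface = [1.0, 2.0, 3.0, 4.0, 100.0]   # one outlier stretches the range
uniform = uniform_splits(surface, 5)
by_quantile = quantile_splits(surface, 5)
extended = with_observation(by_quantile, 3.5)

profile = ceteris_paribus(
    lambda row: 10.0 * row["surface"] + row["rooms"],   # toy model
    {"surface": 3.5, "rooms": 2.0},
    "surface",
    extended,
)
print(uniform, by_quantile, extended, profile, sep="\n")
```

With an outlier in the data, the evenly spaced grid puts most points where there are no observations, while the quantile grid stays close to the bulk of the data.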

## dalex 0.1.9

- *feature:* added `random_state` parameter for `predict_parts(type='shap')` and `model_profile` for reproducible calculations
- *fix:* fixed `random_state` parameter in `model_parts`
- *feature:* multiprocessing added for: `model_profile`, `model_parts`, `predict_profile` and `predict_parts(type='shap')`, through the `processes` parameter
- *fix:* significantly improved the speed of `accumulated` and `conditional` types in `model_profile`
- *bugfix:* use `pd.api.types.is_numeric_dtype()` instead of `np.issubdtype()` to cover more types; e.g. it caused errors with `string` type
- *defaults:* use `pd.convert_dtypes()` on the result of `CeterisParibus` to fix variable dtypes and later allow for a concatenation without the dtype conversion
- *fix:* `variables` parameter now can be a single `str` value
- *fix:* number rounding in `predict_parts`, `model_parts` (#245)
- *fix:* CP calculations for models that take only variables as an input

## dalex 0.1.8

- *bugfix:* `variable_splits` parameter now works correctly in `predict_profile`
- *bugfix:* fix baseline for 3+ models in `AggregatedProfiles.plot` (#234)
- *printing:* now rounding numbers in `Explainer` messages
- *fix:* minor checks fixes in `instance_level`
- *bugfix:* `AggregatedProfiles.plot` now works with `groups`

## dalex 0.1.7

- *feature:* parameter `N` in `model_profile` can be set to `None`, to select all observations
- *input:* `groups` and `variable` parameters in `model_profile` can be: `str`, `list`, `numpy.ndarray`, `pandas.Series`
- *fix:* `check_label` returned only a first letter
- *bugfix:* removed the conversion of `all_variables` to `str` in `prepare_all_variables`, which caused an error in `model_profile` (#214)
- *defaults:* change numpy data variable names from numbers to strings

## dalex 0.1.6

- *fix:* change `short_name` encoding in `fifa` dataset (utf8->ascii)
- *fix:* remove `scipy` dependency
- *defaults:* default `loss_root_mean_square` in model parts changed to `rmse`
- *bugfix:* checks related to `new_observation` in `BreakDown, Shap, CeterisParibus` now work for multiple inputs (#207)
- *bugfix:* `CeterisParibus.fit` and `CeterisParibus.plot` now work for more types of `new_observation.index`, but won't work for a `boolean` type (#211)

## dalex 0.1.5

- *feature:* add `xgboost` package compatibility (#188)
- *feature:* added `model_class` parameter to `Explainer` to handle wrapped models
- *feature:* `Explainer` attribute `model_info` remembers if parameters are default
- *bugfix:* `variable_groups` parameter now works correctly in `model_parts`
- *fix:* changed parameter order in `Explainer`: `model_type`, `model_info`, `colorize`
- *documentation:* `model_parts` documentation is updated
- *feature:* new `show` parameter in `plot` methods that (`if False`) returns `plotly Figure` (#190)
- *feature:* `load_fifa()` function which loads the preprocessed players_20 dataset
- *fix:* `CeterisParibus.plot` tooltip

## dalex 0.1.4

- *feature:* new `Explainer.residual` method which uses `residual_function` to calculate `residuals`
- *feature:* new `dump` and `dumps` methods for saving `Explainer` in a binary form; `load` and `loads` methods for loading `Explainer` from binary form
- *fix:* `Explainer` constructor verbose text
- *bugfix:* `B:=B+1` - `Shap` now stores average results as `B=0` and path results as `B=1,2,...`
- *bugfix:* `Explainer.model_performance` method uses `self.model_type` when `model_type` is `None`
- *bugfix:* values in `BreakDown` and `Shap` are now rounded to 4 significant places (#180)
- *bugfix:* `Shap` by default uses `path='average'`, `sign` column is properly updated and bars in `plot` are sorted by `abs(contribution)`
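
The `B=0` / `B=1,2,...` convention mentioned above can be illustrated with a small sketch: contributions are computed along several random orderings of variables (paths), and the average over those paths is stored under `B=0`. The code below is a plain-Python approximation of that idea under simplifying assumptions. The function names, the `n_paths` parameter and the toy model are hypothetical and not taken from `dalex`.

```python
import random
from statistics import mean


def break_down_path(predict, data, observation, order):
    """Attribute the prediction to variables along one ordering (one path)."""
    rows = [dict(row) for row in data]           # work on a copy of the data
    previous = mean(predict(r) for r in rows)    # average prediction = baseline
    contributions = {}
    for variable in order:
        for row in rows:                         # fix the variable at the observed value
            row[variable] = observation[variable]
        current = mean(predict(r) for r in rows)
        contributions[variable] = current - previous
        previous = current
    return contributions


def shap_paths(predict, data, observation, n_paths=10, seed=0):
    """Return {0: averaged contributions, 1..n_paths: per-path contributions}."""
    rng = random.Random(seed)
    variables = list(observation)
    results = {}
    for b in range(1, n_paths + 1):
        order = variables[:]
        rng.shuffle(order)                       # a random ordering defines a path
        results[b] = break_down_path(predict, data, observation, order)
    results[0] = {
        v: mean(results[b][v] for b in range(1, n_paths + 1)) for v in variables
    }
    return results


def predict(row):
    return row["a"] * row["b"] + row["c"]        # toy model with an interaction


data = [
    {"a": 1.0, "b": 2.0, "c": 0.0},
    {"a": 3.0, "b": 1.0, "c": 1.0},
    {"a": 2.0, "b": 4.0, "c": 2.0},
]
observation = {"a": 4.0, "b": 3.0, "c": 5.0}

results = shap_paths(predict, data, observation)
baseline = mean(predict(r) for r in data)
print(results[0])
```

Along every path, and therefore also in the average, the contributions add up to the difference between the prediction for the observation and the average prediction.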

## dalex 0.1.3

- release of the `dalex` package
- `Explainer` object with `predict`, `predict_parts`, `predict_profile`, `model_performance`, `model_parts` and `model_profile` methods
- `BreakDown`, `Shap`, `CeterisParibus`, `ModelPerformance`, `VariableImportance` and `AggregatedProfiles` objects with a `plot` method
- `load_titanic()` function which loads the `titanic_imputed` dataset
