dalex: moDel Agnostic Language for Exploration and eXplanation

Explore, Explain and Examine Predictive Models

Overview
An unverified black-box model is the path to failure. Opaqueness leads to distrust. Distrust leads to neglect. Neglect leads to rejection.
The `dalex` package x-rays any model and helps to explore and explain its behaviour, and to understand how complex models work.
The main `Explainer` object creates a wrapper around a predictive model. Wrapped models may then be explored and compared with a collection of local and global explainers implementing recent developments from the area of Interpretable Machine Learning / eXplainable Artificial Intelligence.
The philosophy behind `dalex` explanations is described in the Explanatory Model Analysis e-book.
The `dalex` package is a part of the DrWhy.AI universe.
Installation

```
pip install dalex -U
```
Resources
- Introduction to the `dalex` package: Titanic: tutorial and examples
- Important features explained: FIFA20: explain default vs tuned model with dalex
- How to use dalex with xgboost
- How to use dalex with tensorflow
- Interesting features in v0.2.1
- New fairness module
- Code in the form of jupyter notebook
- YouTube video showing how to do Break Down analysis
- Changelog: NEWS
- Theoretical introduction to the plots: Explanatory Model Analysis: Explore, Explain and Examine Predictive Models
Plots
This package uses `plotly` to render the plots:
- Install extensions to use `plotly` in JupyterLab: Getting Started, Troubleshooting
- Use the `show=False` parameter in the `plot` method to return a `plotly Figure` object
- It is possible to edit the figures and save them
Learn more
Machine Learning models are widely used and have various applications in classification and regression tasks. Due to increasing computational power, the availability of new data sources and new methods, ML models are becoming more and more complex. Models created with techniques like boosting, bagging or neural networks are true black boxes: it is hard to trace the link between the input variables and the model outcome. They are used because of their high performance, but their lack of interpretability is one of their weakest points.

In many applications we need to know, understand or prove how the input variables are used in the model and what impact they have on the final model prediction. `dalex` is a set of tools that helps to understand how complex models are working.
"Talk with your model!" at useR! 2020
Authors
Main authors of the `dalex` package are:
Under the supervision of Przemyslaw Biecek.
Other contributors:
- Jakub Wiśniewski maintains the `fairness` module
dalex 0.3.0

- added the new `fairness` module, which will focus on bias detection, visualization and mitigation (@jakwisn)

bug fixes

- removed an unnecessary warning when `precalculate=False` and `verbose=False` (#340)

features

- added the `model_fairness` method to the `Explainer`, which performs fairness explanation
- added the `GroupFairnessClassification` object, with the `plot` method having two types: `fairness_check` and `metric_scores`

defaults

- added the `N=50000` argument to `ResidualDiagnostics.plot`, which samples observations from the `result` parameter to omit performance issues when `smooth=True` (#341)
dalex 0.2.2

- added support for `tensorflow.python.keras.engine.sequential.Sequential` and `tensorflow.python.keras.engine.training.Model` (#326)
- updated the `tqdm` dependency to `>=4.48.2`, the `pandas` dependency to `>=1.1.2` and the `numpy` dependency to `>=1.18.4`

bug fixes

- fixed the wrong order of `Explainer` verbose messages
- fixed a bug that caused the `model_info` parameter to be overwritten by the default values
- fixed a bug occurring when the variable from `groups` was not of `str` type (#327)
- fixed `model_profile`: `variable_type='categorical'` not working when the user passed the `variables` parameter (#329) + the reverse order of bars in `'categorical'` plots + (again) added the `variable_splits_type` parameter to `model_profile` to specify how grid points shall be calculated (#266) + allow for both `'quantile'` and `'quantiles'` types (alias)

features

- added informative error messages when importing optional dependencies (#316)
- allow for `data` and `y` to be `None`; added checks in `Explainer` methods

defaults

- wrong parameter name `title_x` changed to `y_title` in `CeterisParibus.plot` and `AggregatedProfiles.plot` (#317)
- now warning the user in `Explainer` when `predict_function` returns an error or doesn't return `numpy.ndarray (1d)` (#325)
dalex 0.2.1

- updated the `pandas` dependency to `>=1.1.0`

bug fixes

- `ModelPerformance.plot` now uses a drwhy color palette
- use the `unique` method instead of `np.unique` in `variable_splits` (#293)
- `v0.2.0` didn't export new datasets
- fixed a bug where `predict_parts(type='shap')` calculated wrong `contributions` (#300)
- `model_profile` uses the observation mean instead of the profile mean in `_yhat_` centering
- fixed the barplot baseline in categorical `model_profile` and `predict_profile` plots (#297)
- fixed `model_profile(type='accumulated')` giving wrong results (#302)
- vertical/horizontal lines in plots now end on the plot edges

features

- added a new `type='shap_wrapper'` to the `predict_parts` and `model_parts` methods, which returns a new `ShapWrapper` object. It contains the main result attribute (`shapley_values`) and the plot method (`force_plot` and `summary_plot` respectively). These come from the shap package
- the `Explainer.predict` method now accepts `numpy.ndarray`
- added the `ResidualDiagnostics` object with a `plot` method
- added the `model_diagnostics` method to the `Explainer`, which performs residual diagnostics
- added the `predict_surrogate` method to the `Explainer`, which is a wrapper for the `lime` tabular explanation from the lime package
- added the `model_surrogate` method to the `Explainer`, which creates a basic surrogate decision tree or linear model from the black-box model using the scikit-learn package
- added a `_repr_html_` method to all of the explanation objects (it prints the `result` attribute)
- added `dalex.__version__`
- added informative error messages in `Explainer` methods when `y` is of a wrong type (#294)
- `CeterisParibus.plot(variable_type='categorical')` now allows for multiple observations
- new verbose checks for `model_type`
- add `type` to `model_info` in `dump` and `dumps` for R compatibility (#303)
- `ModelPerformance.result` now has `label` as index

defaults

- removed the `_grid_` column in `AggregatedProfiles.result`; `center` only works with `type='accumulated'`
- use `Pipeline._final_estimator` to extract the `model_class` of the actual model
- use `model._estimator_type` to extract `model_type` if possible
dalex 0.2.0

- major documentation update (#270)
- unified the order of function parameters

bug fixes

- `v0.1.9` had a wrong `_original_` column in `predict_profile`
- `vertical_spacing` acts as intended in `VariableImportance.plot` when `split='variable'`
- `loss_function='auc'` now uses `loss_one_minus_auc`, as this should be a descending measure
- plots are now saved with the original height and width
- `model_profile` now properly passes the `variables` parameter to `CeterisParibus`
- the `variables` parameter in `predict_profile` now can also be a string

features

- use `px.express` instead of core `plotly` to make `model_profile` and `predict_profile` plots; thus, enhance performance and scalability
- added a `verbose` parameter where `tqdm` is used to verbose progress bar
- added the `loss_one_minus_auc` function that can be used with `loss_function='1-auc'` in `model_parts`
- added new example data sets: `apartments`, `dragons` and `hr`
- added `color`, `opacity`, `title_x` parameters to `model_profile` and `predict_profile` plots (#236), changed tooltips and legends (#262)
- added the `geom='profiles'` parameter to the `model_profile` plot and the `raw_profiles` attribute to `AggregatedProfiles`
- added the `variable_splits_type` parameter to `predict_profile` to specify how grid points shall be calculated (#266)
- added the `variable_splits_with_obs` parameter to the `predict_profile` function to extend split points with observation variable values (#269)
- added the `variable_splits` parameter to `model_profile`

defaults

- use a different `loss_function` for classification and regression (#248)
- models that use `proba` yhats now get `model_type='classification'` if it's not specified
- use a uniform way of grid points calculation in `predict_profile` and `model_profile` (see the `variable_splits_type` parameter)
- add the variable values of `new_observation` to `variable_splits` in `predict_profile` (see the `variable_splits_with_obs` parameter)
- use `N=1000` in `model_parts` and `N=300` in `model_profile` to comply with the R version
- `keep_raw_permutation` is now set to `False` instead of `None` in `model_parts`
- the `intercept` parameter in `model_profile` is now named `center`
dalex 0.1.9

- feature: added the `random_state` parameter for `predict_parts(type='shap')` and `model_profile` for reproducible calculations
- fix: fixed the `random_state` parameter in `model_parts`
- feature: multiprocessing added for `model_profile`, `model_parts`, `predict_profile` and `predict_parts(type='shap')`, through the `processes` parameter
- fix: significantly improved the speed of the `accumulated` and `conditional` types in `model_profile`
- bugfix: use `pd.api.types.is_numeric_dtype()` instead of `np.issubdtype()` to cover more types; e.g. it caused errors with the `string` type
- defaults: use `pd.convert_dtypes()` on the result of `CeterisParibus` to fix variable dtypes and later allow for a concatenation without the dtype conversion
- fix: the `variables` parameter now can be a single `str` value
- fix: number rounding in `predict_parts` and `model_parts` (#245)
- fix: CP calculations for models that take only variables as an input
dalex 0.1.8

- bugfix: the `variable_splits` parameter now works correctly in `predict_profile`
- bugfix: fixed the baseline for 3+ models in `AggregatedProfiles.plot` (#234)
- printing: now rounding numbers in `Explainer` messages
- fix: minor checks fixes in `instance_level`
- bugfix: `AggregatedProfiles.plot` now works with `groups`
dalex 0.1.7

- feature: the parameter `N` in `model_profile` can be set to `None` to select all observations
- input: the `groups` and `variable` parameters in `model_profile` can be: `str`, `list`, `numpy.ndarray`, `pandas.Series`
- fix: `check_label` returned only a first letter
- bugfix: removed the conversion of `all_variables` to `str` in `prepare_all_variables`, which caused an error in `model_profile` (#214)
- defaults: change numpy data variable names from numbers to strings
dalex 0.1.6

- fix: change the `short_name` encoding in the `fifa` dataset (utf8 -> ascii)
- fix: remove the `scipy` dependency
- defaults: the default `loss_root_mean_square` in model parts changed to `rmse`
- bugfix: checks related to `new_observation` in `BreakDown`, `Shap` and `CeterisParibus` now work for multiple inputs (#207)
- bugfix: `CeterisParibus.fit` and `CeterisParibus.plot` now work for more types of `new_observation.index`, but won't work for a `boolean` type (#211)
dalex 0.1.5

- feature: add `xgboost` package compatibility (#188)
- feature: added the `model_class` parameter to `Explainer` to handle wrapped models
- feature: the `Explainer` attribute `model_info` remembers if parameters are default
- bugfix: the `variable_groups` parameter now works correctly in `model_parts`
- fix: changed the parameter order in `Explainer`: `model_type`, `model_info`, `colorize`
- documentation: `model_parts` documentation is updated
- feature: new `show` parameter in `plot` methods that (if `False`) returns a `plotly Figure` (#190)
- feature: `load_fifa()` function which loads the preprocessed `players_20` dataset
- fix: `CeterisParibus.plot` tooltip
dalex 0.1.4

- feature: new `Explainer.residual` method which uses `residual_function` to calculate `residuals`
- feature: new `dump` and `dumps` methods for saving an `Explainer` in a binary form; `load` and `loads` methods for loading an `Explainer` from the binary form
- fix: `Explainer` constructor verbose text
- bugfix: `B:=B+1` - `Shap` now stores average results as `B=0` and path results as `B=1,2,...`
- bugfix: the `Explainer.model_performance` method uses `self.model_type` when `model_type` is `None`
- bugfix: values in `BreakDown` and `Shap` are now rounded to 4 significant places (#180)
- bugfix: `Shap` by default uses `path='average'`; the `sign` column is properly updated and bars in `plot` are sorted by `abs(contribution)`
dalex 0.1.3

- release of the `dalex` package
- `Explainer` object with `predict`, `predict_parts`, `predict_profile`, `model_performance`, `model_parts` and `model_profile` methods
- `BreakDown`, `Shap`, `CeterisParibus`, `ModelPerformance`, `VariableImportance` and `AggregatedProfiles` objects with a `plot` method
- `load_titanic()` function which loads the `titanic_imputed` dataset