dalex: Responsible Machine Learning in Python
Overview
An unverified black-box model is the path to failure. Opaqueness leads to distrust. Distrust leads to disregard. Disregard leads to rejection.
The dalex package x-rays any model and helps to explore and explain its behaviour, making it easier to understand how complex models work.
The main Explainer object creates a wrapper around a predictive model. Wrapped models may then be explored and compared with a collection of model-level and predict-level explanations. Moreover, fairness methods and interactive exploration dashboards are available to the user.
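To make the wrapper idea concrete, here is a minimal sketch in plain Python. The class MiniExplainer and its methods are hypothetical illustrations, not the dalex API (dalex's own wrapper is created with dx.Explainer(model, data, y) and offers far more); the point is that explanations only need a uniform predict and residual interface around an arbitrary model.

```python
from typing import Callable, List, Optional, Sequence

Row = Sequence[float]
PredictFn = Callable[[object, List[Row]], List[float]]
ResidualFn = Callable[[object, List[Row], List[float]], List[float]]


class MiniExplainer:
    """Hypothetical, simplified model wrapper (not dalex's Explainer)."""

    def __init__(self, model: object, data: List[Row], y: List[float],
                 predict_function: PredictFn,
                 residual_function: Optional[ResidualFn] = None,
                 label: str = "model") -> None:
        self.model = model
        self.data = data
        self.y = y
        self.predict_function = predict_function
        # Default residuals: observed minus predicted values.
        self.residual_function = residual_function or (
            lambda m, X, obs: [o - p for o, p in zip(obs, predict_function(m, X))]
        )
        self.label = label

    def predict(self, X: List[Row]) -> List[float]:
        # Explanations call only this method, so any framework's model
        # can be plugged in by supplying a suitable predict_function.
        return self.predict_function(self.model, X)

    def residual(self, X: List[Row], y: List[float]) -> List[float]:
        return self.residual_function(self.model, X, y)


# A toy "model": y_hat = 2 * x0 + 1 * x1, stored as a list of coefficients.
coefs = [2.0, 1.0]
linear_predict = lambda m, X: [sum(c * v for c, v in zip(m, row)) for row in X]

exp = MiniExplainer(coefs, data=[[1.0, 0.0], [0.0, 3.0]], y=[2.5, 2.0],
                    predict_function=linear_predict, label="toy")
print(exp.predict(exp.data))          # [2.0, 3.0]
print(exp.residual(exp.data, exp.y))  # [0.5, -1.0]
```

Keeping predict_function separate from the model object is what lets one wrapper serve scikit-learn, keras or any custom model alike.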
The philosophy behind dalex explanations is described in the Explanatory Model Analysis book.
Installation
The dalex package is available on PyPI and conda-forge:
pip install dalex -U
conda install -c conda-forge dalex
One can install optional dependencies for all additional features using pip install dalex[full].
Resources: https://dalex.drwhy.ai/python
API reference: https://dalex.drwhy.ai/python/api
Authors
The authors of the dalex package are:
- Hubert Baniecki
- Wojciech Kretowicz
- Piotr Piatyszek maintains the arena module
- Jakub Wisniewski maintains the fairness module
- Mateusz Krzyzinski maintains the aspect module
- Artur Zolkowski maintains the aspect module
- Przemyslaw Biecek
We welcome contributions: start by opening an issue on GitHub.
Citation
If you use dalex, please cite our JMLR paper:
@article{JMLR:v22:20-1473,
author = {Hubert Baniecki and
Wojciech Kretowicz and
Piotr Piatyszek and
Jakub Wisniewski and
Przemyslaw Biecek},
title = {dalex: Responsible Machine Learning
with Interactive Explainability and Fairness in Python},
journal = {Journal of Machine Learning Research},
year = {2021},
volume = {22},
number = {214},
pages = {1-7},
url = {http://jmlr.org/papers/v22/20-1473.html}
}
Changelog
v1.5.0 (2022-09-07)
This release consists mostly of maintenance updates and, after a year, marks the transition from Beta to Stable.
- increase the dependency from python>=3.6 to python>=3.7 (at this moment, both numpy and pandas depend on python>=3.8), and add python>=3.10 to CI
- increase the dependencies to pandas>=1.2.5, numpy>=1.20.3 (#526), scipy>=1.6.3, plotly>=5.1.0, and tqdm>=4.61.2 due to errors with pandas (see tqdm/#1199)
- remove the use of pd.Series.append() (#489)
- remove the use of np.isnan causing an error in dalex.fairness (#491)
- fix iBreakDown plot y-axis labels (#493)
- stop the Arena's werkzeug server using a cleaner and still supported API (#518)
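Two of the maintenance items above follow general pandas and numpy behaviour. The sketch below is plain pandas/numpy, not dalex code: it shows pd.concat as the usual replacement for the removed pd.Series.append(), and how np.isnan rejects object-dtype arrays that pd.isna handles. Whether #491 was triggered by exactly this input is an assumption.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([0.1, 0.2], index=["a", "b"])
s2 = pd.Series([0.3], index=["c"])

# pd.Series.append() was deprecated in pandas 1.4 and removed in 2.0;
# pd.concat() is the supported way to stack Series.
combined = pd.concat([s1, s2])
print(combined.tolist())  # [0.1, 0.2, 0.3]

# np.isnan only supports numeric dtypes and raises TypeError on object arrays.
values = np.array([1.0, None, 0.5], dtype=object)
try:
    np.isnan(values)
    isnan_failed = False
except TypeError:
    isnan_failed = True
print("np.isnan raised TypeError:", isnan_failed)  # True

# pd.isna treats None and NaN as missing, whatever the dtype.
print(pd.isna(values).tolist())  # [False, True, False]
```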
v1.4.1 (2021-11-08)
features
- added fairness plot for regression models to Arena (dalex/#408)
- added new facet_scales parameter to AP.plot and CP.plot, which allows freeing the y-axis with facet_scales="free" (dalex/#469); consistent with R (DALEX/#468, ingredients/#140)
fixes
- fixed AP and CP progress bars
v1.4.0 (2021-09-09)
- added new aspect module, which will focus on groups of dependent variables @krzyzinskim & @arturzolkowski
- added new scipy>=1.5.4 dependency
breaking changes
- improved the calculation of AUC and the ROC plot (#459)
fixes
- wrong y-axis labels in VariableImportance.plot(split="variable") (#451)
- repr_html() didn't work for explanation objects before using the fit method (#449)
features
- added new Aspect object with the predict_triplot, model_triplot, predict_parts, model_parts, get_aspects methods
- added new PredictTriplot, ModelTriplot, PredictAspectImportance, ModelAspectImportance objects with the plot method
v1.3.0 (2021-07-17)
features
- added bias mitigation techniques (resample, reweight, roc_pivot) into the fairness module (#432)
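As an illustration of what a reweighting step does, here is a plain-Python sketch of the classic reweighing scheme of Kamiran and Calders: each observation in protected subgroup s with label y gets the weight P(s) * P(y) / P(s, y), so that subgroup membership and label become independent in the weighted data. That dalex's reweight follows this formula is an assumption, and the helper reweigh_weights is hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def reweigh_weights(protected: Sequence[str], y: Sequence[int]) -> List[float]:
    """Hypothetical helper: w(s, y) = P(s) * P(y) / P(s, y)."""
    n = len(y)
    count_s = Counter(protected)
    count_y = Counter(y)
    count_sy = Counter(zip(protected, y))
    # In counts: P(s) * P(y) / P(s, y) = (n_s * n_y) / (n * n_sy)
    return [count_s[s] * count_y[label] / (n * count_sy[(s, label)])
            for s, label in zip(protected, y)]


protected = ["a", "a", "a", "b", "b", "b"]
y = [1, 1, 0, 0, 0, 1]
weights = reweigh_weights(protected, y)
print([round(w, 3) for w in weights])  # [0.75, 0.75, 1.5, 0.75, 0.75, 1.5]
```

Under-represented (subgroup, label) pairs are up-weighted; here the weighted positive rate becomes 0.5 in both subgroups, and the weights still sum to the number of observations.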
v1.2.0 (2021-05-31)
breaking changes
- method set_options in Arena now takes option_category instead of plot_type (SHAPValues => ShapleyValues, FeatureImportance => VariableImportance) (#420)
- methods using the N parameter now properly sample rows from data
fixes
- fixed wrong error value when no predict_function is found in Explainer (77ca90d)
- set multiprocessing context to 'spawn' (#412)
- fixed bug in metric_scores plot that made only one subgroup appear on y-axis (#416)
- added support for older keras models (#415)
features
- added a resource mechanism to Arena (#419)
- added ShapleyValuesImportance and ShapleyValuesDependence plots to Arena (#420)
- return error instead of NaN when AUC is calculated on observations from one class only (#415)
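Why AUC needs both classes is easiest to see from its rank-based definition: the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counting one half. A minimal sketch under that (Mann-Whitney) formulation; the function auc is hypothetical, quadratic in time, and not dalex's implementation.

```python
from typing import Sequence


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        # There are no (positive, negative) pairs, so the ratio would be 0/0.
        raise ValueError("AUC is undefined when y_true contains only one class")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Raising an error, rather than returning NaN, makes a one-class subgroup visible immediately instead of silently propagating NaN into plots and ratios.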
v1.1.0 (2021-04-18)
breaking changes
- fixed concurrent random seeds when processes > 1 (#392), which means that the results of parallel computation will vary between v1.1.0 and previous versions
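Getting independent yet reproducible randomness across parallel workers is a general numpy concern. One common pattern, sketched below with numpy's SeedSequence, spawns one child seed per process from a single user-provided seed. This is an illustration of good practice, not necessarily how dalex fixed #392, and worker_streams is a hypothetical helper.

```python
import numpy as np


def worker_streams(seed: int, processes: int) -> list:
    """One independent, reproducible random generator per worker process."""
    children = np.random.SeedSequence(seed).spawn(processes)
    return [np.random.default_rng(child) for child in children]


draws = [rng.integers(0, 1_000_000, size=5).tolist() for rng in worker_streams(42, 3)]

# Re-creating the streams from the same seed reproduces the same draws...
again = [rng.integers(0, 1_000_000, size=5).tolist() for rng in worker_streams(42, 3)]
print(draws == again)  # True
# ...while each worker gets its own stream instead of all reusing one seed.
print(len({tuple(d) for d in draws}))  # 3
```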
fixes
- GroupFairnessX.plot(type='fairness_check') generates ticks according to the x-axis range (#409)
- GroupFairnessRegression.plot(type='density') has a more readable hover, only for outliers (#409)
- BreakDown.plot() wrongly displayed the "+all factors" bar when max_vars < p (#401)
- GroupFairnessClassification.plot(type='metric_scores') did not handle NaNs (#399)
features
- experimental support for regression models in the fairness module. Added GroupFairnessRegression object, with the plot method having two types: fairness_check and density. Explainer.model_fairness method now depends on the model_type attribute. (#391)
- added N parameter to the predict_parts method, which is None by default (#402)
- epsilon is now an argument of the GroupFairnessClassification object (#397)
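The epsilon argument sets the acceptable range for fairness metric ratios. A simplified sketch, assuming the four-fifths-style rule where the ratio of a subgroup's metric to the privileged subgroup's metric should lie in (epsilon, 1/epsilon) with epsilon = 0.8 as default; the helper ratios_outside_range and the example rates are hypothetical.

```python
from typing import Dict


def ratios_outside_range(metric_by_group: Dict[str, float], privileged: str,
                         epsilon: float = 0.8) -> Dict[str, float]:
    """Hypothetical helper: subgroup/privileged ratios outside (epsilon, 1/epsilon)."""
    base = metric_by_group[privileged]
    flagged = {}
    for group, value in metric_by_group.items():
        if group == privileged:
            continue
        ratio = value / base
        if not (epsilon < ratio < 1 / epsilon):
            flagged[group] = round(ratio, 3)
    return flagged


# e.g. true positive rate per subgroup
tpr = {"male": 0.80, "female": 0.60, "other": 0.76}
print(ratios_outside_range(tpr, privileged="male"))               # {'female': 0.75}
print(ratios_outside_range(tpr, privileged="male", epsilon=0.7))  # {}
```

Loosening epsilon widens the accepted band, which is why exposing it as an argument changes which metrics a check reports.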
v1.0.1 (2021-02-19)
fixes
- fixed broken range on the y-axis in fairness_check plot (#376)
- fixed warnings caused by np.float being deprecated since numpy v1.20 (#384)
other
- added ipython to test dependencies
v1.0.0 (2020-12-29)
breaking changes
These are summed up in (#368):
- rename modules: dataset_level into model_explanations, instance_level into predict_explanations, _arena module into arena
- use the __dir__ method to define autocompletion in the IPython environment, showing only ['Explainer', 'Arena', 'fairness', 'datasets']
- add plot method and result attribute to LimeExplanation (use lime.explanation.Explanation.as_pyplot_figure() and lime.explanation.Explanation.as_list())
- CeterisParibus.plot(variable_type='categorical') now has horizontal barplots; horizontal_spacing=None by default (varies on variable_type). Also, once again added the "dot" for observation value.
- predict_fn in predict_surrogate now uses predict_function (trying to make it work for more frameworks)
fixes
- fixed wrong verbose output when any value in y_hat/residuals was an int, not a float
- added proper "-" sign to negative dropout losses in VariableImportance.plot
features
- added geom='bars' to AggregatedProfiles.plot to force the categorical plot
- added geom='roc' and geom='lift' to ModelPerformance.plot
- added Fairness plot to Arena
other
- remove colorize from Explainer
- updated the documentation, refactored code (import modules not functions, unify variable names in object.py, move utils functions from checks.py to utils.py, etc.)
- added license notice next to data
v0.4.1 (2020-12-03)
- added support for h2o.estimators.* (#332)
- added tensorflow.python.keras.engine.functional.Functional to the tensorflow list
- updated the plotly dependency to >=4.12.0
- code maintenance: yhat, check_data
fixes
- fixed check_if_empty_fields() used in loading the Explainer from a pickle file, since several checks were changed
- fixed plot() method in GroupFairnessClassification as it omitted plotting a metric when NaN was present in metric ratios (result)
- fixed dragons and HR datasets using "," as the decimal separator instead of ".", which transformed numerical columns into categorical
- fixed representation of the ShapWrapper class (removed _repr_html_ method)
features
- allow for y to be a pandas.DataFrame (converted)
- allow for data, y to be an H2OFrame (converted)
- added label parameter to all the relevant dx.Explainer methods, which overrides the default label in explanation's result
- now using GradientExplainer for tf.keras.engine.sequential.Sequential, added proper warning when shap_explainer_type is None (#366)
defaults
- unify verbose output of Explainer
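The dragons/HR fix concerns numbers written with a comma as the decimal separator. The underlying pandas behaviour is easy to reproduce; this sketch uses made-up CSV content, not the actual dalex data files.

```python
import io

import pandas as pd

raw = "height;weight\n1,75;70,5\n1,82;81,0\n"

# Without decimal=",", "1,75" is not parsed as a number and stays text.
wrong = pd.read_csv(io.StringIO(raw), sep=";")
print(pd.api.types.is_numeric_dtype(wrong["height"]))  # False

# Declaring the decimal separator yields proper float columns.
right = pd.read_csv(io.StringIO(raw), sep=";", decimal=",")
print(right["height"].tolist())  # [1.75, 1.82]
```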
v0.4.0 (2020-11-17)
- added new arena module, which adds the backend for Arena dashboard @piotrpiatyszek
features
- added new aliases to dx.Explainer methods (#350): in model_parts it is {'permutational': 'variable_importance', 'feature_importance': 'variable_importance'}, in model_profile it is {'pdp': 'partial', 'ale': 'accumulated'}
- added Arena object for dashboard backend. See https://github.com/ModelOriented/Arena
- new fairness plot types: stacked, radar, performance_and_fairness, heatmap, ceteris_paribus_cutoff
- upgraded fairness_check()
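The aliases map alternative names onto canonical type values. A minimal sketch of that lookup: the two dictionaries are copied from the entry above, while resolve_type is a hypothetical helper and the tuples of allowed types are illustrative assumptions, not dalex's internal code.

```python
MODEL_PARTS_ALIASES = {'permutational': 'variable_importance',
                       'feature_importance': 'variable_importance'}
MODEL_PROFILE_ALIASES = {'pdp': 'partial', 'ale': 'accumulated'}


def resolve_type(requested: str, aliases: dict, allowed: tuple) -> str:
    """Map an alias to its canonical type, then validate the result."""
    canonical = aliases.get(requested, requested)
    if canonical not in allowed:
        raise ValueError(f"type must be one of {allowed}, got {requested!r}")
    return canonical


profile_types = ('partial', 'accumulated', 'conditional')  # illustrative
print(resolve_type('ale', MODEL_PROFILE_ALIASES, profile_types))      # accumulated
print(resolve_type('partial', MODEL_PROFILE_ALIASES, profile_types))  # partial
```

Resolving aliases before validation keeps a single canonical name downstream, so results and plots never need to know which spelling the user chose.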
v0.3.0 (2020-10-26)
- added new fairness module, which will focus on bias detection, visualization and mitigation @jakwisn
fixes
- removed unnecessary warning when precalculate=False and verbose=False (#340)
features
- added model_fairness method to the Explainer, which performs fairness explanation
- added GroupFairnessClassification object, with the plot method having two types: fairness_check and metric_scores
defaults
- added the N=50000 argument to ResidualDiagnostics.plot, which samples observations from the result parameter to avoid performance issues when smooth=True (#341)
v0.2.2 (2020-09-21)
- added support for tensorflow.python.keras.engine.sequential.Sequential and tensorflow.python.keras.engine.training.Model (#326)
- updated the tqdm dependency to >=4.48.2, pandas dependency to >=1.1.2 and numpy dependency to >=1.18.4
fixes
- fixed the wrong order of Explainer verbose messages
- fixed a bug that caused the model_info parameter to be overwritten by the default values
- fixed a bug occurring when the variable from groups was not of str type (#327)
- fixed model_profile: variable_type='categorical' not working when the user passed the variables parameter (#329) + the reverse order of bars in 'categorical' plots + (again) added variable_splits_type parameter to model_profile to specify how grid points shall be calculated (#266) + allow for both 'quantile' and 'quantiles' types (alias)
features
- added informative error messages when importing optional dependencies (#316)
- allow for data and y to be None (added checks in Explainer methods)
defaults
- wrong parameter name title_x changed to y_title in CeterisParibus.plot and AggregatedProfiles.plot (#317)
- now warning the user in Explainer when predict_function returns an error or doesn't return numpy.ndarray (1d) (#325)
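variable_splits_type controls where a profile is evaluated. A numpy sketch of the two strategies, assuming 'uniform' means evenly spaced points between the minimum and maximum, and 'quantiles' means empirical quantiles at evenly spaced probabilities. The helper grid_points is hypothetical and dalex's exact grid construction may differ (for example in how duplicates are dropped).

```python
import numpy as np


def grid_points(x: np.ndarray, n_points: int = 5, splits_type: str = "uniform") -> np.ndarray:
    """Hypothetical helper computing candidate split points for one numeric variable."""
    if splits_type == "uniform":
        # Evenly spaced over the observed range, regardless of data density.
        return np.linspace(x.min(), x.max(), n_points)
    if splits_type in ("quantiles", "quantile"):  # both spellings, as in the changelog
        # Evenly spaced probabilities: more points where the data are dense.
        # Duplicated values could be removed with np.unique.
        return np.quantile(x, np.linspace(0, 1, n_points))
    raise ValueError("splits_type must be 'uniform' or 'quantiles'")


x = np.array([1.0, 2.0, 2.0, 3.0, 100.0])
print(grid_points(x, 5, "uniform").tolist())    # [1.0, 25.75, 50.5, 75.25, 100.0]
print(grid_points(x, 5, "quantiles").tolist())  # [1.0, 2.0, 2.0, 3.0, 100.0]
```

With a skewed variable like this one, the uniform grid spends most points where there is no data, while the quantile grid follows the observations.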
v0.2.1 (2020-08-31)
- updated the pandas dependency to >=1.1.0
fixes
- ModelPerformance.plot now uses a drwhy color palette
- use unique method instead of np.unique in variable_splits (#293)
- v0.2.0 didn't export new datasets
- fixed a bug where predict_parts(type='shap') calculated wrong contributions (#300)
- model_profile uses observation mean instead of profile mean in _yhat_ centering
- fixed barplot baseline in categorical model_profile and predict_profile plots (#297)
- fixed model_profile(type='accumulated') giving wrong results (#302)
- vertical/horizontal lines in plots now end on the plot edges
features
- added new type='shap_wrapper' to predict_parts and model_parts methods, which returns a new ShapWrapper object. It contains the main result attribute (shapley_values) and the plot method (force_plot and summary_plot respectively). These come from the shap package
- Explainer.predict method now accepts numpy.ndarray
- added the ResidualDiagnostics object with a plot method
- added model_diagnostics method to the Explainer, which performs residual diagnostics
- added predict_surrogate method to the Explainer, which is a wrapper for the lime tabular explanation from the lime package
- added model_surrogate method to the Explainer, which creates a basic surrogate decision tree or linear model from the black-box model using the scikit-learn package
- added a _repr_html_ method to all of the explanation objects (it prints the result attribute)
- added dalex.__version__
- added informative error messages in Explainer methods when y is of wrong type (#294)
- CeterisParibus.plot(variable_type='categorical') now allows for multiple observations
- new verbose checks for model_type
- add type to model_info in dump and dumps for R compatibility (#303)
- ModelPerformance.result now has label as index
defaults
- removed _grid_ column in AggregatedProfiles.result and center only works with type=accumulated
- use Pipeline._final_estimator to extract model_class of the actual model
- use model._estimator_type to extract model_type if possible
v0.2.0 (2020-08-07)
- major documentation update (#270)
- unified the order of function parameters
fixes
- v0.1.9 had wrong _original_ column in predict_profile
- vertical_spacing acts as intended in VariableImportance.plot when split='variable'
- loss_function='auc' now uses loss_one_minus_auc as this should be a descending measure
- plots are now saved with the original height and width
- model_profile now properly passes the variables parameter to CeterisParibus
- variables parameter in predict_profile now can also be a string
features
- use plotly.express instead of core plotly to make model_profile and predict_profile plots, which enhances performance and scalability
- added verbose parameter where tqdm is used to show a progress bar
- added loss_one_minus_auc function that can be used with loss_function='1-auc' in model_parts
- added new example data sets: apartments, dragons and hr
- added color, opacity, title_x parameters to model_profile and predict_profile plots (#236), changed tooltips and legends (#262)
- added geom='profiles' parameter to model_profile plot and raw_profiles attribute to AggregatedProfiles
- added variable_splits_type parameter to predict_profile to specify how grid points shall be calculated (#266)
- added variable_splits_with_obs parameter to predict_profile function to extend split points with observation variable values (#269)
- added variable_splits parameter to model_profile
defaults
- use different loss_function for classification and regression (#248)
- models that use proba yhats now get model_type='classification' if it's not specified
- use a uniform way of grid points calculation in predict_profile and model_profile (see variable_splits_type parameter)
- add the variable values of new_observation to variable_splits in predict_profile (see variable_splits_with_obs parameter)
- use N=1000 in model_parts and N=300 in model_profile to comply with the R version
- keep_raw_permutation is now set to False instead of None in model_parts
- intercept parameter in model_profile is now named center
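model_parts measures variable importance by permutation: shuffle one column, recompute the loss, and compare it with the loss of the intact model. A plain-Python sketch with a one-minus-AUC loss (turning the descending AUC into an ascending loss, so a larger increase means a more important variable). The toy model, data and function names are hypothetical; this uses a single shuffle per column, whereas dalex additionally averages over several permutation rounds and samples N rows.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def one_minus_auc(y: Sequence[int], scores: Sequence[float]) -> float:
    """1 - AUC, so that a better model has a lower loss."""
    pos = [s for t, s in zip(y, scores) if t == 1]
    neg = [s for t, s in zip(y, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return 1.0 - wins / (len(pos) * len(neg))


def permutation_importance(predict: Callable[[List[Row]], List[float]], X: List[Row],
                           y: Sequence[int], seed: int = 0) -> List[float]:
    """Increase in loss after shuffling each column once (a 'difference' style result)."""
    rng = random.Random(seed)
    baseline = one_minus_auc(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        importances.append(one_minus_auc(y, predict(X_perm)) - baseline)
    return importances


# Toy classifier that only looks at column 0; column 1 is pure noise.
predict = lambda X: [row[0] for row in X]
X = [[0.1, 5.0], [0.2, 1.0], [0.3, 4.0], [0.7, 2.0], [0.8, 3.0], [0.9, 0.0]]
y = [0, 0, 0, 1, 1, 1]
print(permutation_importance(predict, X, y))
```

Shuffling column 1 never changes the predictions, so its importance is exactly zero; shuffling column 0 can only keep or worsen the perfect baseline.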
v0.1.9 (2020-07-01)
- feature: added random_state parameter for predict_parts(type='shap') and model_profile for reproducible calculations
- fix: fixed random_state parameter in model_parts
- feature: multiprocessing added for model_profile, model_parts, predict_profile and predict_parts(type='shap'), through the processes parameter
- fix: significantly improved the speed of accumulated and conditional types in model_profile
- bugfix: use pd.api.types.is_numeric_dtype() instead of np.issubdtype() to cover more types; e.g. the latter caused errors with the string type
- defaults: use DataFrame.convert_dtypes() on the result of CeterisParibus to fix variable dtypes and later allow for a concatenation without the dtype conversion
- fix: variables parameter now can be a single str value
- fix: number rounding in predict_parts, model_parts (#245)
- fix: CP calculations for models that take only variables as an input
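np.issubdtype is designed for numpy dtypes, whereas pandas columns may carry extension dtypes such as the nullable Int64 and string types produced by convert_dtypes(). A short plain-pandas sketch (not dalex code) of the more general check, with made-up data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [22, 38, 26], "name": ["Ann", "Bob", "Cid"]}).convert_dtypes()

# is_numeric_dtype understands both pandas extension dtypes and numpy dtypes.
numeric = [col for col in df.columns if pd.api.types.is_numeric_dtype(df[col])]
print(numeric)  # ['age']
print(pd.api.types.is_numeric_dtype(np.array([1.5, 2.5]).dtype))  # True
```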
v0.1.8 (2020-05-28)
- bugfix: variable_splits parameter now works correctly in predict_profile
- bugfix: fix baseline for 3+ models in AggregatedProfiles.plot (#234)
- printing: now rounding numbers in Explainer messages
- fix: minor checks fixes in instance_level
- bugfix: AggregatedProfiles.plot now works with groups
v0.1.7 (2020-05-10)
- feature: parameter N in model_profile can be set to None, to select all observations
- input: groups and variable parameters in model_profile can be: str, list, numpy.ndarray, pandas.Series
- fix: check_label returned only the first letter
- bugfix: removed the conversion of all_variables to str in prepare_all_variables, which caused an error in model_profile (#214)
- defaults: change numpy data variable names from numbers to strings
v0.1.6 (2020-04-30)
- fix: change short_name encoding in fifa dataset (utf8->ascii)
- fix: remove scipy dependency
- defaults: default loss_root_mean_square in model parts changed to rmse
- bugfix: checks related to new_observation in BreakDown, Shap, CeterisParibus now work for multiple inputs (#207)
- bugfix: CeterisParibus.fit and CeterisParibus.plot now work for more types of new_observation.index, but won't work for a boolean type (#211)
v0.1.5 (2020-04-21)
- feature: add xgboost package compatibility (#188)
- feature: added model_class parameter to Explainer to handle wrapped models
- feature: Explainer attribute model_info remembers if parameters are default
- bugfix: variable_groups parameter now works correctly in model_parts
- fix: changed parameter order in Explainer: model_type, model_info, colorize
- documentation: model_parts documentation is updated
- feature: new show parameter in plot methods that (if False) returns plotly Figure (#190)
- feature: load_fifa() function which loads the preprocessed players_20 dataset
- fix: CeterisParibus.plot tooltip
v0.1.4 (2020-04-14)
- feature: new Explainer.residual method which uses residual_function to calculate residuals
- feature: new dump and dumps methods for saving Explainer in a binary form; load and loads methods for loading Explainer from binary form
- fix: Explainer constructor verbose text
- bugfix: B:=B+1 - Shap now stores average results as B=0 and path results as B=1,2,...
- bugfix: Explainer.model_performance method uses self.model_type when model_type is None
- bugfix: values in BreakDown and Shap are now rounded to 4 significant places (#180)
- bugfix: Shap by default uses path='average', sign column is properly updated and bars in plot are sorted by abs(contribution)
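The Shap entries above mention average results (B=0) versus per-path results (B=1,2,...). A plain-Python sketch of that idea, assuming a variable's contribution on one path is the change in the mean prediction over background data when that variable is fixed at the observation's value, in a random order, and that the final result averages over B such paths. All names are hypothetical; this is not dalex's implementation.

```python
import random
from typing import Callable, Dict, List

Row = List[float]
Predict = Callable[[List[Row]], List[float]]


def mean_prediction(predict: Predict, data: List[Row], fixed: Dict[int, float]) -> float:
    """Average prediction over data with the columns in `fixed` set to given values."""
    rows = [[fixed.get(j, v) for j, v in enumerate(row)] for row in data]
    preds = predict(rows)
    return sum(preds) / len(preds)


def shap_by_paths(predict: Predict, data: List[Row], observation: Row,
                  B: int = 25, seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    p = len(observation)
    totals = [0.0] * p
    for _ in range(B):                      # B random paths (orderings)
        order = list(range(p))
        rng.shuffle(order)
        fixed: Dict[int, float] = {}
        previous = mean_prediction(predict, data, fixed)
        for j in order:                     # break-down along this path
            fixed[j] = observation[j]
            current = mean_prediction(predict, data, fixed)
            totals[j] += current - previous
            previous = current
    return [t / B for t in totals]          # average over paths


# Additive toy model: every ordering gives the same contributions.
predict = lambda X: [3.0 * r[0] + r[1] for r in X]
data = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
print(shap_by_paths(predict, data, observation=[2.0, 0.0]))  # [3.0, -2.0]
```

On every path the contributions telescope, so they always sum to the observation's prediction minus the mean prediction; averaging over paths matters only when variables interact.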
v0.1.3 (2020-04-10)
- release of the dalex package
- Explainer object with predict, predict_parts, predict_profile, model_performance, model_parts and model_profile methods
- BreakDown, Shap, CeterisParibus, ModelPerformance, VariableImportance and AggregatedProfiles objects with a plot method
- load_titanic() function which loads the titanic_imputed dataset
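CeterisParibus and AggregatedProfiles correspond to what-if profiles for single observations and their average over many observations (a partial dependence profile). A minimal plain-Python sketch of both, with a hypothetical toy model and helper names, not dalex's implementation:

```python
from typing import Callable, List, Sequence

Row = List[float]
Predict = Callable[[List[Row]], List[float]]


def ceteris_paribus(predict: Predict, observation: Row, variable: int,
                    grid: Sequence[float]) -> List[float]:
    """Predictions for one observation while only `variable` moves along `grid`."""
    rows = [observation[:variable] + [g] + observation[variable + 1:] for g in grid]
    return predict(rows)


def partial_dependence(predict: Predict, data: List[Row], variable: int,
                       grid: Sequence[float]) -> List[float]:
    """Pointwise average of the ceteris paribus profiles of all observations."""
    profiles = [ceteris_paribus(predict, obs, variable, grid) for obs in data]
    return [sum(values) / len(values) for values in zip(*profiles)]


predict = lambda X: [r[0] * r[1] for r in X]   # toy model with an interaction
data = [[1.0, 1.0], [2.0, 3.0]]
grid = [0.0, 1.0, 2.0]
print(ceteris_paribus(predict, data[1], variable=0, grid=grid))  # [0.0, 3.0, 6.0]
print(partial_dependence(predict, data, variable=0, grid=grid))  # [0.0, 2.0, 4.0]
```

Because of the interaction, the individual profiles have different slopes (1 and 3); the aggregated profile shows only their average slope of 2.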