Responsible Machine Learning in Python

dalex: Responsible Machine Learning in Python

Overview

An unverified black box model is the path to failure. Opaqueness leads to distrust. Distrust leads to ignoration. Ignoration leads to rejection.

The dalex package x-rays any model: it helps to explore and explain the model's behaviour and to understand how complex models work. The main Explainer object creates a wrapper around a predictive model. Wrapped models can then be explored and compared with a collection of model-level and predict-level explanations. Fairness methods and interactive exploration dashboards are also available.
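
The wrapper idea can be illustrated without dalex itself. The sketch below is not the dalex implementation: `MiniExplainer`, `toy_model` and the data are hypothetical names invented for this example. It only shows the pattern described above, where one object holds a model, its data and a prediction function so that explanation methods can query the model in a uniform way. A crude permutation-based importance stands in for a model-level explanation.

```python
import random
from math import sqrt


class MiniExplainer:
    """Hypothetical, simplified stand-in for an explainer-style wrapper."""

    def __init__(self, model, data, y, predict_function=None, label="model"):
        self.model = model
        self.data = data          # list of dict rows
        self.y = y                # list of numeric targets
        self.label = label
        # Fall back to calling the model on each row.
        self.predict_function = predict_function or (
            lambda m, rows: [m(r) for r in rows]
        )

    def predict(self, rows):
        return self.predict_function(self.model, rows)

    def rmse(self, rows=None):
        rows = self.data if rows is None else rows
        preds = self.predict(rows)
        return sqrt(sum((p - t) ** 2 for p, t in zip(preds, self.y)) / len(self.y))

    def permutation_importance(self, variable, seed=0):
        """Increase in RMSE after shuffling one column (a model-level view)."""
        rng = random.Random(seed)
        shuffled = [row[variable] for row in self.data]
        rng.shuffle(shuffled)
        permuted = [{**row, variable: v} for row, v in zip(self.data, shuffled)]
        return self.rmse(permuted) - self.rmse()


def toy_model(row):
    # Depends on "area" only; "noise" is ignored by construction.
    return 3.0 * row["area"]


rows = [{"area": float(a), "noise": float(a % 3)} for a in range(1, 21)]
target = [3.0 * r["area"] for r in rows]

explainer = MiniExplainer(toy_model, rows, target, label="toy")
importance_area = explainer.permutation_importance("area")
importance_noise = explainer.permutation_importance("noise")
```

Because the toy model ignores `noise`, shuffling that column leaves the error unchanged, while shuffling `area` increases it.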

The philosophy behind dalex explanations is described in the Explanatory Model Analysis e-book.

Installation

The dalex package is available on PyPI:

pip install dalex -U

Resources: https://dalex.drwhy.ai/python

API reference: https://dalex.drwhy.ai/python/api

Authors

Main authors of the dalex package are:

Hubert Baniecki
Wojciech Kretowicz

Under the supervision of Przemyslaw Biecek.

Other contributors:

Piotr Piatyszek maintains the arena module
Jakub Wisniewski maintains the fairness module

Citation

If you use dalex, please cite our paper:

@article{dalex,
  title={{dalex: Responsible Machine Learning with Interactive
          Explainability and Fairness in Python}},
  author={Hubert Baniecki and Wojciech Kretowicz and Piotr Piatyszek
          and Jakub Wisniewski and Przemyslaw Biecek},
  year={2020},
  journal={arXiv:2012.14406},
  url={https://arxiv.org/abs/2012.14406}
}

Changelog

v1.1.0 (18/04/2021)

breaking changes

fixed concurrent random seeds when processes > 1 (#392), which means that the results of parallel computation will vary between v1.1.0 and previous versions
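
Why a seeding fix changes results can be shown with a small sketch. The changelog does not say how the fix was implemented, so the code below is only an illustration of a general technique: `child_seed` is a hypothetical helper, and the workers are simulated in a loop rather than run as processes. If every worker reuses the same seed, all of them repeat the same random draws; deriving a distinct seed per worker removes the repetition and therefore changes the numbers.

```python
import hashlib
import random


def child_seed(base_seed, worker_index):
    """Derive a distinct, reproducible seed per worker (illustrative helper)."""
    digest = hashlib.sha256(f"{base_seed}:{worker_index}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def draw(seed, n=3):
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


base_seed = 42
workers = 4

# Naive seeding: every worker reuses the same seed and repeats the same draws.
naive = [draw(base_seed) for _ in range(workers)]

# Derived seeding: each worker gets its own stream, still reproducible.
derived = [draw(child_seed(base_seed, i)) for i in range(workers)]
```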

fixes

GroupFairnessX.plot(type='fairness_check') generates ticks according to the x-axis range (#409)
GroupFairnessRegression.plot(type='density') has a more readable hover - only for outliers (#409)
BreakDown.plot() wrongly displayed the "+all factors" bar when max_vars < p (#401)
GroupFairnessClassification.plot(type='metric_scores') did not handle NaN's (#399)

features

Experimental support for regression models in the fairness module. Added GroupFairnessRegression object, with the plot method having two types: fairness_check and density. Explainer.model_fairness method now depends on the model_type attribute. (#391)
added N parameter to the predict_parts method which is None by default (#402)
epsilon is now an argument of the GroupFairnessClassification object (#397)
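
The role of an epsilon threshold in a group fairness check can be sketched as follows. This is an assumption-laden illustration, not the dalex code: it assumes the check compares each subgroup's metric to the privileged group's metric and accepts ratios inside (epsilon, 1/epsilon), in the spirit of the four-fifths rule. `ratio_check` and the numbers are made up.

```python
import math


def ratio_check(metric_by_group, privileged, epsilon=0.8):
    """Flag subgroups whose ratio to the privileged group leaves (epsilon, 1/epsilon)."""
    if not 0 < epsilon < 1:
        raise ValueError("epsilon must be in (0, 1)")
    base = metric_by_group[privileged]
    report = {}
    for group, value in metric_by_group.items():
        if group == privileged:
            continue
        ratio = value / base if base else math.nan
        if math.isnan(ratio):
            report[group] = "undefined"       # NaN ratios are reported, not dropped
        elif epsilon < ratio < 1 / epsilon:
            report[group] = "ok"
        else:
            report[group] = "bias suspected"
    return report


# Made-up true positive rates per subgroup.
tpr = {"a": 0.80, "b": 0.75, "c": 0.50, "d": math.nan}
result = ratio_check(tpr, privileged="a")
```

A stricter check uses an epsilon closer to 1, which narrows the accepted band of ratios.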

v1.0.1 (19/02/2021)

fixes

fixed broken range on yaxis in fairness_check plot (#376)
fixed warnings caused by np.float being deprecated since numpy v1.20 (#384)

other

added ipython to test dependencies

v1.0.0 (29/12/2020)

breaking changes

These are summed up in (#368):

rename modules: dataset_level into model_explanations, instance_level into predict_explanations, _arena module into arena
use __dir__ method to define autocompletion in IPython environment - show only ['Explainer', 'Arena', 'fairness', 'datasets']
add plot method and result attribute to LimeExplanation (use lime.explanation.Explanation.as_pyplot_figure() and lime.explanation.Explanation.as_list())
CeterisParibus.plot(variable_type='categorical') now has horizontal barplots - horizontal_spacing=None by default (varies on variable_type). Also, once again added the "dot" for observation value.
predict_fn in predict_surrogate now uses predict_function (trying to make it work for more frameworks)
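
The `__dir__` entry above relies on a standard Python feature: a module-level `__dir__` function (PEP 562) decides what `dir()` reports for that module. A minimal sketch, using a throwaway module object with made-up contents rather than the real package:

```python
import types

# Build a throwaway module object; "demo_pkg" and its contents are made up.
demo_pkg = types.ModuleType("demo_pkg")
demo_pkg.Explainer = object
demo_pkg.Arena = object
demo_pkg._internal_helper = lambda: None

# A module-level __dir__ controls what dir() reports for the module.
demo_pkg.__dir__ = lambda: ["Explainer", "Arena"]

visible = dir(demo_pkg)
```

Names left out of the list stay importable and usable; they are only hidden from `dir()` and from tools that build their completions on it.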

fixes

fixed wrong verbose output when any value in y_hat/residuals was an int not float
added proper "-" sign to negative dropout losses in VariableImportance.plot

features

added geom='bars' to AggregatedProfiles.plot to force the categorical plot
added geom='roc' and geom='lift' to ModelPerformance.plot
added Fairness plot to Arena

other

remove colorize from Explainer
updated the documentation, refactored code (import modules not functions, unify variable names in object.py, move utils functions from checks.py to utils.py, etc.)
added license notice next to data

v0.4.1 (02/12/2020)

added support for h2o.estimators.* (#332)
added tensorflow.python.keras.engine.functional.Functional to the tensorflow list
updated the plotly dependency to >=4.12.0
code maintenance: yhat, check_data

fixes

fixed check_if_empty_fields() used in loading the Explainer from a pickle file, since several checks were changed
fixed plot() method in GroupFairnessClassification as it omitted plotting a metric when NaN was present in metric ratios (result)
fixed the dragons and HR datasets using "," instead of "." as the decimal delimiter, which turned numerical columns into categorical ones
fixed representation of the ShapWrapper class (removed _repr_html_ method)

features

allow for y to be a pandas.DataFrame (converted)
allow for data, y to be a H2OFrame (converted)
added label parameter to all the relevant dx.Explainer methods, which overrides the default label in explanation's result
now using GradientExplainer for tf.keras.engine.sequential.Sequential, added proper warning when shap_explainer_type is None (#366)

defaults

unify verbose output of Explainer

v0.4.0 (17/11/2020)

added new arena module, which adds the backend for Arena dashboard @piotrpiatyszek

features

added new aliases to dx.Explainer methods (#350) in model_parts it is {'permutational': 'variable_importance', 'feature_importance': 'variable_importance'}, in model_profile it is {'pdp': 'partial', 'ale': 'accumulated'}
added Arena object for dashboard backend. See https://github.com/ModelOriented/Arena
new fairness plot types: stacked, radar, performance_and_fairness, heatmap, ceteris_paribus_cutoff
upgraded fairness_check()
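
Alias handling of this kind usually amounts to a dictionary lookup before validation. The sketch below copies the two alias tables from the entry above; `resolve_type` and the sets of allowed names are hypothetical and serve only to show the lookup.

```python
# Alias tables copied from the changelog entry above.
MODEL_PARTS_ALIASES = {
    "permutational": "variable_importance",
    "feature_importance": "variable_importance",
}
MODEL_PROFILE_ALIASES = {"pdp": "partial", "ale": "accumulated"}


def resolve_type(requested, aliases, allowed):
    """Map an alias to its canonical name, then validate it (hypothetical helper)."""
    canonical = aliases.get(requested, requested)
    if canonical not in allowed:
        raise ValueError(
            f"unknown type {requested!r}; expected one of {sorted(allowed)}"
        )
    return canonical


# The allowed sets below are assumptions made for the example.
profile_type = resolve_type(
    "pdp", MODEL_PROFILE_ALIASES, {"partial", "accumulated", "conditional"}
)
parts_type = resolve_type(
    "feature_importance", MODEL_PARTS_ALIASES, {"variable_importance", "shap_wrapper"}
)
```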

v0.3.0 (26/10/2020)

added new fairness module, which will focus on bias detection, visualization and mitigation @jakwisn

fixes

removed unnecessary warning when precalculate=False and verbose=False (#340)

features

added model_fairness method to the Explainer, which performs fairness explanation
added GroupFairnessClassification object, with the plot method having two types: fairness_check and metric_scores

defaults

added the N=50000 argument to ResidualDiagnostics.plot, which samples observations from the result parameter to avoid performance issues when smooth=True (#341)
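
Capping the number of plotted observations can be sketched in a few lines. This is not the dalex code; `sample_rows` and the residual data are made up to show the idea of sampling at most N rows before an expensive smoothing step.

```python
import random


def sample_rows(rows, n=50000, seed=None):
    """Return at most n rows, sampled without replacement (illustrative helper)."""
    if n is None or len(rows) <= n:
        return list(rows)
    return random.Random(seed).sample(rows, n)


# Made-up (y_hat, residual) pairs.
residuals = [(i, i % 7 - 3) for i in range(200_000)]
to_plot = sample_rows(residuals, n=50000, seed=1)
```

Passing `n=None` keeps every row, which mirrors how a "use all observations" setting is commonly expressed.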

dalex 0.2.2

added support for tensorflow.python.keras.engine.sequential.Sequential and tensorflow.python.keras.engine.training.Model (#326)
updated the tqdm dependency to >=4.48.2, pandas dependency to >=1.1.2 and numpy dependency to >=1.18.4

fixes

fixed the wrong order of Explainer verbose messages
fixed a bug that caused model_info parameter to be overwritten by the default values
fixed a bug occurring when the variable from groups was not of str type (#327)
fixed model_profile: variable_type='categorical' not working when the user passed the variables parameter (#329)
fixed the reverse order of bars in 'categorical' plots
(again) added variable_splits_type parameter to model_profile to specify how grid points shall be calculated (#266)
allow for both 'quantile' and 'quantiles' types (alias)

features

added informative error messages when importing optional dependencies (#316)
allow for data and y to be None - added checks in Explainer methods
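
An informative error for a missing optional dependency can be produced by wrapping the import. The helper below is a hypothetical sketch of that pattern, not the function dalex uses, and the suggested install command assumes the package name matches the module name.

```python
import importlib


def import_optional(module_name, purpose):
    """Import an optional dependency or fail with an actionable message."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"Missing optional dependency '{module_name}', needed for {purpose}. "
            f"Install it with: pip install {module_name}"
        ) from err


# A stdlib module is always present, so this import succeeds.
json_module = import_optional("json", "the demo")

# A module that does not exist triggers the informative message.
try:
    import_optional("surely_not_installed_pkg_xyz", "the demo")
except ImportError as exc:
    message = str(exc)
```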

defaults

wrong parameter name title_x changed to y_title in CeterisParibus.plot and AggregatedProfiles.plot (#317)
now warning the user in Explainer when predict_function returns an error or doesn't return numpy.ndarray (1d) (#325)

dalex 0.2.1

updated the pandas dependency to >=1.1.0

fixes

ModelPerformance.plot now uses a drwhy color palette
use unique method instead of np.unique in variable_splits (#293)
v0.2.0 didn't export new datasets
fixed a bug where predict_parts(type='shap') calculated wrong contributions (#300)
model_profile uses observation mean instead of profile mean in _yhat_ centering
fixed barplot baseline in categorical model_profile and predict_profile plots (#297)
fixed model_profile(type='accumulated') giving wrong results (#302)
vertical/horizontal lines in plots now end on the plot edges

features

added new type='shap_wrapper' to predict_parts and model_parts methods, which returns a new ShapWrapper object. It contains the main result attribute (shapley_values) and the plot method (force_plot and summary_plot respectively). These come from the shap package
Explainer.predict method now accepts numpy.ndarray
added the ResidualDiagnostics object with a plot method
added model_diagnostics method to the Explainer, which performs residual diagnostics
added predict_surrogate method to the Explainer, which is a wrapper for the lime tabular explanation from the lime package
added model_surrogate method to the Explainer, which creates a basic surrogate decision tree or linear model from the black-box model using the scikit-learn package
added a _repr_html_ method to all of the explanation objects (it prints the result attribute)
added dalex.__version__
added informative error messages in Explainer methods when y is of wrong type (#294)
CeterisParibus.plot(variable_type='categorical') now allows for multiple observations
new verbose checks for model_type
add type to model_info in dump and dumps for R compatibility (#303)
ModelPerformance.result now has label as index

defaults

removed _grid_ column in AggregatedProfiles.result and center only works with type=accumulated
use Pipeline._final_estimator to extract model_class of the actual model
use model._estimator_type to extract model_type if possible

dalex 0.2.0

major documentation update (#270)
unified the order of function parameters

fixes

v0.1.9 had wrong _original_ column in predict_profile
vertical_spacing acts as intended in VariableImportance.plot when split='variable'
loss_function='auc' now uses loss_one_minus_auc as this should be a descending measure
plots are now saved with the original height and width
model_profile now properly passes the variables parameter to CeterisParibus
variables parameter in predict_profile now can also be a string

features

use plotly.express instead of core plotly to make model_profile and predict_profile plots, which improves performance and scalability
added verbose parameter to toggle the progress bar wherever tqdm is used
added loss_one_minus_auc function that can be used with loss_function='1-auc' in model_parts
added new example data sets: apartments, dragons and hr
added color, opacity, title_x parameters to model_profile and predict_profile plots (#236), changed tooltips and legends (#262)
added geom='profiles' parameter to model_profile plot and raw_profiles attribute to AggregatedProfiles
added variable_splits_type parameter to predict_profile to specify how grid points shall be calculated (#266)
added variable_splits_with_obs parameter to predict_profile function to extend split points with observation variable values (#269)
added variable_splits parameter to model_profile
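
The switch from AUC to 1-AUC follows from how a loss is used: permutation-based importance looks for an increase in loss, so the measure must get smaller as the model gets better. The sketch below is an independent, standard-library illustration and not the dalex `loss_one_minus_auc` function; it computes AUC from pairwise comparisons of positive and negative scores.

```python
def auc(y_true, y_score):
    """AUC via pairwise comparisons (Mann-Whitney form); ties count as half."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_minus_auc(y_true, y_score):
    """Loss form of AUC: lower is better."""
    return 1.0 - auc(y_true, y_score)


y = [0, 0, 1, 1]
perfect = one_minus_auc(y, [0.1, 0.2, 0.8, 0.9])    # perfect ranking
reversed_ = one_minus_auc(y, [0.9, 0.8, 0.2, 0.1])  # fully inverted ranking
mixed = one_minus_auc(y, [0.1, 0.4, 0.35, 0.8])     # one misordered pair out of four
```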

defaults

use different loss_function for classification and regression (#248)
models that use proba yhats now get model_type='classification' if it's not specified
use uniform way of grid points calculation in predict_profile and model_profile (see variable_splits_type parameter)
add the variable values of new_observation to variable_splits in predict_profile (see variable_splits_with_obs parameter)
use N=1000 in model_parts and N=300 in model_profile to comply with the R version
keep_raw_permutation is now set to False instead of None in model_parts
intercept parameter in model_profile is now named center
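
The difference between uniform and quantile grid points, and the effect of adding the observation's own value, can be sketched as below. `grid_points` is a hypothetical helper, not the dalex implementation; its nearest-rank quantile rule and parameter names are assumptions chosen to keep the example short.

```python
def grid_points(values, n_points=5, split_type="uniform", observation_value=None):
    """Grid for profile calculation (hypothetical helper)."""
    ordered = sorted(values)
    if split_type == "uniform":
        lo, hi = ordered[0], ordered[-1]
        step = (hi - lo) / (n_points - 1)
        grid = [lo + i * step for i in range(n_points)]
    elif split_type in ("quantile", "quantiles"):
        # Nearest-rank quantiles at evenly spaced probabilities.
        last = len(ordered) - 1
        grid = [ordered[round(i * last / (n_points - 1))] for i in range(n_points)]
    else:
        raise ValueError(f"unknown split_type: {split_type!r}")
    if observation_value is not None:
        grid.append(observation_value)   # mimic the "with_obs" idea
    return sorted(set(grid))


skewed = [1, 2, 2, 3, 3, 3, 4, 5, 40, 100]
uniform = grid_points(skewed, split_type="uniform")
quantile = grid_points(skewed, split_type="quantile")
with_obs = grid_points(skewed, split_type="uniform", observation_value=7)
```

On skewed data the uniform grid spends most points where there are few observations, while the quantile grid follows the data density; adding the observation value guarantees that the profile passes through the explained point.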

dalex 0.1.9

feature: added random_state parameter for predict_parts(type='shap') and model_profile for reproducible calculations
fix: fixed random_state parameter in model_parts
feature: multiprocessing added for: model_profile, model_parts, predict_profile and predict_parts(type='shap'), through the processes parameter
fix: significantly improved the speed of accumulated and conditional types in model_profile
bugfix: use pd.api.types.is_numeric_dtype() instead of np.issubdtype() to cover more types; e.g. it caused errors with string type
defaults: use pd.convert_dtypes() on the result of CeterisParibus to fix variable dtypes and later allow for a concatenation without the dtype conversion
fix: variables parameter now can be a single str value
fix: number rounding in predict_parts, model_parts (#245)
fix: CP calculations for models that take only variables as an input

dalex 0.1.8

bugfix: variable_splits parameter now works correctly in predict_profile
bugfix: fix baseline for 3+ models in AggregatedProfiles.plot (#234)
printing: now rounding numbers in Explainer messages
fix: minor checks fixes in instance_level
bugfix: AggregatedProfiles.plot now works with groups

dalex 0.1.7

feature: parameter N in model_profile can be set to None, to select all observations
input: groups and variable parameters in model_profile can be: str, list, numpy.ndarray, pandas.Series
fix: check_label returned only a first letter
bugfix: removed the conversion of all_variables to str in prepare_all_variables, which caused an error in model_profile (#214)
defaults: change numpy data variable names from numbers to strings

dalex 0.1.6

fix: change short_name encoding in fifa dataset (utf8->ascii)
fix: remove scipy dependency
defaults: default loss_root_mean_square in model parts changed to rmse
bugfix: checks related to new_observation in BreakDown, Shap, CeterisParibus now work for multiple inputs (#207)
bugfix: CeterisParibus.fit and CeterisParibus.plot now work for more types of new_observation.index, but won't work for a boolean type (#211)

dalex 0.1.5

feature: add xgboost package compatibility (#188)
feature: added model_class parameter to Explainer to handle wrapped models
feature: Explainer attribute model_info remembers if parameters are default
bugfix: variable_groups parameter now works correctly in model_parts
fix: changed parameter order in Explainer: model_type, model_info, colorize
documentation: model_parts documentation is updated
feature: new show parameter in plot methods that (if False) returns plotly Figure (#190)
feature: load_fifa() function which loads the preprocessed players_20 dataset
fix: CeterisParibus.plot tooltip

dalex 0.1.4

feature: new Explainer.residual method which uses residual_function to calculate residuals
feature: new dump and dumps methods for saving Explainer in a binary form; load and loads methods for loading Explainer from binary form
fix: Explainer constructor verbose text
bugfix: B:=B+1 - Shap now stores average results as B=0 and path results as B=1,2,...
bugfix: Explainer.model_performance method uses self.model_type when model_type is None
bugfix: values in BreakDown and Shap are now rounded to 4 significant places (#180)
bugfix: Shap by default uses path='average', sign column is properly updated and bars in plot are sorted by abs(contribution)
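
The dump/dumps and load/loads naming mirrors Python's `pickle` module, and a later entry (0.4.1) mentions loading the Explainer from a pickle file. The sketch below shows only that underlying round trip on a plain dictionary; the `state` contents are made up and no dalex object is involved.

```python
import io
import pickle

# Made-up state standing in for what a wrapper might store.
state = {"label": "tiny", "coefficients": [2.0, -1.0], "model_type": "regression"}

# dumps/loads: round trip through bytes.
blob = pickle.dumps(state)
restored_from_bytes = pickle.loads(blob)

# dump/load: round trip through a binary file-like object.
buffer = io.BytesIO()
pickle.dump(state, buffer)
buffer.seek(0)
restored_from_file = pickle.load(buffer)
```

Unpickling executes code chosen by whoever produced the data, so binary dumps should only be loaded from trusted sources.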

dalex 0.1.3

release of the dalex package
Explainer object with predict, predict_parts, predict_profile, model_performance, model_parts and model_profile methods
BreakDown, Shap, CeterisParibus, ModelPerformance, VariableImportance and AggregatedProfiles objects with a plot method
load_titanic() function which loads the titanic_imputed dataset

Release history

1.7.2: Feb 12, 2025
1.7.1: Oct 2, 2024
1.7.0: Feb 28, 2024
1.6.0: Feb 16, 2023
1.5.0: Sep 7, 2022
1.4.1: Nov 8, 2021
1.4.0: Sep 9, 2021
1.3.0: Jul 17, 2021
1.2.0: May 31, 2021
1.1.0: Apr 18, 2021 (this version)
1.0.1: Feb 19, 2021
1.0.0: Dec 28, 2020
0.4.1: Dec 2, 2020
0.4.0: Nov 17, 2020
0.3.0: Oct 26, 2020
0.2.2: Sep 21, 2020
0.2.1: Aug 31, 2020
0.2.0: Aug 7, 2020
0.1.9: Jul 1, 2020
0.1.8: May 28, 2020
0.1.7: May 10, 2020
0.1.6: Apr 30, 2020
0.1.5: Apr 21, 2020
0.1.4: Apr 14, 2020
0.1.3: Apr 9, 2020
0.1.2: Mar 27, 2020
0.1.1: Mar 13, 2020
0.1.0: Mar 13, 2020

Download files

Source Distribution

dalex-1.1.0.tar.gz (966.4 kB view details)

Uploaded Apr 18, 2021 Source

File details

Details for the file dalex-1.1.0.tar.gz.

File metadata

Download URL: dalex-1.1.0.tar.gz
Upload date: Apr 18, 2021
Size: 966.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.1 importlib_metadata/3.10.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.58.0 CPython/3.9.2

File hashes

Hashes for dalex-1.1.0.tar.gz
Algorithm	Hash digest
SHA256	`fbcceab03ea3c9a6a34e0b16449af75eb7933f7f2113548fcaba6d618a607499`
MD5	`4d3b959644bafe2458665bf66bbcede6`
BLAKE2b-256	`f3505ad59eccfe1d4fd86a9518929ad0c16c7ddb5575cdbf93fcc539e77e177c`
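
A downloaded archive can be checked against the SHA256 digest listed above with the standard library. The sketch below is a generic illustration: `sha256_of` and `verify` are hypothetical helper names, and the self-check runs on a temporary file so that it works without the real archive.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Digest published above for dalex-1.1.0.tar.gz.
EXPECTED = "fbcceab03ea3c9a6a34e0b16449af75eb7933f7f2113548fcaba6d618a607499"


def verify(path, expected=EXPECTED):
    """Return True when the file's digest equals the expected one."""
    return sha256_of(path) == expected.lower()


# Self-check on a temporary file so the sketch runs without the real archive.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"not the real archive")
    tmp_path = tmp.name
try:
    demo_digest = sha256_of(tmp_path)
    demo_matches = verify(tmp_path)
finally:
    os.remove(tmp_path)
```

With the real file in place, `verify("dalex-1.1.0.tar.gz")` would be the call to make; a `False` result means the download should not be trusted.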
