dalex: Responsible Machine Learning in Python

Overview

An unverified black-box model is the path to failure. Opaqueness leads to distrust. Distrust leads to ignoration. Ignoration leads to rejection.

The dalex package x-rays any model and helps to explore and explain its behaviour and to understand how complex models work. The main Explainer object creates a wrapper around a predictive model. Wrapped models may then be explored and compared with a collection of model-level and predict-level explanations. Fairness methods and interactive exploration dashboards are also available.

The philosophy behind dalex explanations is described in the Explanatory Model Analysis book.

Installation

The dalex package is available on PyPI and conda-forge.

pip install dalex -U

conda install -c conda-forge dalex

One can install optional dependencies for all additional features using pip install dalex[full].
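
To confirm which version ended up in the environment, something like the following sketch may help. It relies only on the standard library's importlib.metadata; the helper name installed_version is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package="dalex"):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

found = installed_version()
print("dalex is not installed" if found is None else f"dalex {found}")
```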

Resources: https://dalex.drwhy.ai/python

API reference: https://dalex.drwhy.ai/python/api

http://python.drwhy.ai/

Authors

The authors of the dalex package are:

We welcome contributions: start by opening an issue on GitHub.

Citation

If you use dalex, please cite our JMLR paper:

@article{JMLR:v22:20-1473,
  author  = {Hubert Baniecki and
             Wojciech Kretowicz and
             Piotr Piatyszek and 
             Jakub Wisniewski and 
             Przemyslaw Biecek},
  title   = {dalex: Responsible Machine Learning 
             with Interactive Explainability and Fairness in Python},
  journal = {Journal of Machine Learning Research},
  year    = {2021},
  volume  = {22},
  number  = {214},
  pages   = {1-7},
  url     = {http://jmlr.org/papers/v22/20-1473.html}
}

Changelog

v1.7.1 (2024-10-02)

  • numpy>=2.0.0 compatibility: replace instances of x.ptp() with np.ptp(x) and np.Inf with np.inf (#571)
  • added a way to pass sample_weight to loss functions in model_parts() (variable importance) using weights from dx.Explainer (#563)
  • fixed the visualization of shap_wrapper for shap==0.45.0
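
For user code that has to run on both numpy 1.x and 2.x, the two replacements named in the first item look roughly like this. The array values are arbitrary.

```python
import numpy as np

x = np.array([3.0, -1.0, 7.5])

# The ndarray.ptp() method is gone in numpy 2.0; the function form works on both.
value_range = np.ptp(x)

# The np.Inf alias is gone in numpy 2.0; the lowercase name works on both.
upper_bound = np.inf

print(value_range, upper_bound)
```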

v1.7.0 (2024-02-28)

  • increase the dependencies to python>=3.8, pandas>=1.5.0, numpy>=1.23.3 and add python==3.11 to CI
  • added keras.src.models.sequential.Sequential to classes with a known predict_function; it should fix changes in keras==3.0.0 and tensorflow==2.16.0
  • turn off verbose in the predict method of tensorflow/keras models that changed in tensorflow>=2.9.0
  • update the warning occurring when specifying variable_splits (#558)
  • fix an error occurring in predict_profile() when a DataFrame has MultiIndex in pandas>=1.3.0 (#550)
  • fix gaussian norm() calculation in model_profile() from pi*sqrt(2) to sqrt(2*pi)
  • fix a warning (future error) between prepare_numerical_categorical() and prepare_x() with pandas==2.1.0
  • fix a warning (future error) concerning the default value of numeric_only in pandas.DataFrame.corr() in dalex.aspect.calculate_assoc_matrix()
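
The difference between the two constants in the gaussian norm() item can be checked numerically. The sketch below is plain mathematics rather than dalex code: it integrates a standard normal curve scaled by each constant, and only sqrt(2*pi) yields a density that integrates to 1. The helper names are assumptions for this example.

```python
import math

def gaussian_density(x, sd=1.0):
    """Density of N(0, sd^2), normalised with sqrt(2*pi)."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def integrate(f, lo=-10.0, hi=10.0, steps=20000):
    """Trapezoidal rule on an evenly spaced grid."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h

area_correct = integrate(gaussian_density)

# Same bell curve, but divided by pi*sqrt(2) instead of sqrt(2*pi).
area_wrong = integrate(lambda x: math.exp(-0.5 * x * x) / (math.pi * math.sqrt(2)))

print(round(area_correct, 6), round(area_wrong, 6))
```

With the wrong constant the area comes out as 1/sqrt(pi), so the curve is no longer a probability density.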

v1.6.0 (2023-02-16)

  • add ZeroDivisionError to precision and recall functions (#532)
  • add a warning to calculate_depend_matrix() when there is a variable with only one value (#537)
  • fix missing EDA plots in (Python) Arena (#544)
  • fix baseline positions in the subplots of the predict parts explanations: BreakDown, Shap (#545)
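
One possible reading of the ZeroDivisionError item is that precision and recall now guard against an empty denominator. The sketch below is not the dalex implementation; the function signatures and the NaN fallback are assumptions chosen for illustration.

```python
import math

def precision(tp, fp):
    """Share of predicted positives that are true positives."""
    try:
        return tp / (tp + fp)
    except ZeroDivisionError:
        # Assumed fallback: with no positive predictions, precision is undefined.
        return math.nan

def recall(tp, fn):
    """Share of actual positives that were predicted as positive."""
    try:
        return tp / (tp + fn)
    except ZeroDivisionError:
        # Assumed fallback: with no actual positives, recall is undefined.
        return math.nan

print(precision(3, 1), recall(3, 1), precision(0, 0))
```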

v1.5.0 (2022-09-07)

This release consists of mostly maintenance updates and, after a year, marks the Beta -> Stable release.

  • increase the dependency from python>=3.6 to python>=3.7 (at this moment, both numpy and pandas depend on python>=3.8), and add python==3.10 to CI
  • increase the dependencies to pandas>=1.2.5, numpy>=1.20.3 (#526), scipy>=1.6.3, plotly>=5.1.0, and tqdm>=4.61.2 due to errors with pandas (see tqdm/#1199)
  • remove the use of pd.Series.append() (#489)
  • remove the use of np.isnan causing error in dalex.fairness (#491)
  • fix iBreakDown plot y-axis labels (#493)
  • stop the Arena's werkzeug server using a cleaner and still supported API (#518)

v1.4.1 (2021-11-08)

features

  • added fairness plot for regression models to Arena (dalex/#408)
  • added new facet_scales parameter to AP.plot and CP.plot, which frees the y-axis with facet_scales="free" (dalex/#469); consistent with R (DALEX/#468, ingredients/#140)

fixes

  • fixed AP and CP progress bars

v1.4.0 (2021-09-09)

  • added new aspect module, which will focus on groups of dependent variables @krzyzinskim & @arturzolkowski
  • added new scipy>=1.5.4 dependency

breaking changes

  • improved the calculation of AUC, ROC plot (#459)

fixes

  • wrong yaxis labels in VariableImportance.plot(split="variable") (#451)
  • _repr_html_() didn't work for explanation objects before using the fit method (#449)

features

  • added new Aspect object with the predict_triplot, model_triplot, predict_parts, model_parts, get_aspects methods
  • added new PredictTriplot, ModelTriplot, PredictAspectImportance, ModelAspectImportance objects with the plot method

v1.3.0 (2021-07-17)

features

  • added bias mitigation techniques (resample, reweight, roc_pivot) into the fairness module (#432)
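
To give a feel for what reweighting does, here is a minimal sketch of the classic reweighing idea: each observation gets the weight P(group) * P(label) / P(group, label), so that group membership and label look independent in the weighted data. Whether dalex's reweight computes exactly this is an assumption; the helper name reweight_sketch and the toy data are made up.

```python
from collections import Counter

def reweight_sketch(protected, y):
    """Return one weight per observation: P(group) * P(label) / P(group, label)."""
    n = len(y)
    group_counts = Counter(protected)
    label_counts = Counter(y)
    joint_counts = Counter(zip(protected, y))
    return [
        group_counts[g] * label_counts[c] / (n * joint_counts[(g, c)])
        for g, c in zip(protected, y)
    ]

protected = ["a", "a", "a", "b"]
y = [1, 1, 0, 0]
weights = reweight_sketch(protected, y)
print(weights)
```

Such weights would typically be passed as sample_weight when refitting the model.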

v1.2.0 (2021-05-31)

breaking changes

  • method set_options in Arena now takes option_category instead of plot_type (SHAPValues => ShapleyValues, FeatureImportance => VariableImportance) (#420)
  • methods using the N parameter now properly sample rows from data

fixes

  • fixed wrong error value when no predict_function is found in Explainer (77ca90d)
  • set multiprocessing context to 'spawn' (#412)
  • fixed bug in metric_scores plot that made only one subgroup appear on y-axis (#416)
  • added support for older keras models (#415)

features

  • added a resource mechanism to Arena (#419)
  • added ShapleyValuesImportance and ShapleyValuesDependence plots to Arena (#420)
  • return error instead of NaN when AUC is calculated on observations from one class only (#415)

v1.1.0 (2021-04-18)

breaking changes

  • fixed concurrent random seeds when processes > 1 (#392), which means that the results of parallel computation will vary between v1.1.0 and previous versions
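
The underlying issue can be pictured as follows: if every worker process starts from the same seed, all of them draw the same random permutations. A common remedy, sketched here with the standard library only, is to derive a separate seed per worker from one base seed. This is not the dalex implementation; worker_seeds and permutation_for are hypothetical helpers.

```python
import random

def worker_seeds(base_seed, processes):
    """Derive one seed per worker from a single base seed."""
    seed_rng = random.Random(base_seed)
    return [seed_rng.randrange(2**32) for _ in range(processes)]

def permutation_for(seed, n=5):
    """Shuffle 0..n-1 with a generator private to one worker."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return order

seeds = worker_seeds(base_seed=0, processes=4)
permutations = [permutation_for(s) for s in seeds]
print(seeds)
print(permutations)
```

The result stays reproducible for a fixed base seed, but it differs from what a single shared seed would produce, which is why such a fix changes parallel results between versions.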

fixes

  • GroupFairnessX.plot(type='fairness_check') generates ticks according to the x-axis range (#409)
  • GroupFairnessRegression.plot(type='density') has a more readable hover - only for outliers (#409)
  • BreakDown.plot() wrongly displayed the "+all factors" bar when max_vars < p (#401)
  • GroupFairnessClassification.plot(type='metric_scores') did not handle NaN's (#399)

features

  • Experimental support for regression models in the fairness module. Added GroupFairnessRegression object, with the plot method having two types: fairness_check and density. Explainer.model_fairness method now depends on the model_type attribute. (#391)
  • added N parameter to the predict_parts method which is None by default (#402)
  • epsilon is now an argument of the GroupFairnessClassification object (#397)

v1.0.1 (2021-02-19)

fixes

  • fixed broken range on yaxis in fairness_check plot (#376)
  • warnings because np.float is deprecated since numpy v1.20 (#384)

other

  • added ipython to test dependencies

v1.0.0 (2020-12-29)

breaking changes

These are summed up in (#368):

  • rename modules: dataset_level into model_explanations, instance_level into predict_explanations, _arena module into arena
  • use __dir__ method to define autocompletion in IPython environment - show only ['Explainer', 'Arena', 'fairness', 'datasets']
  • add plot method and result attribute to LimeExplanation (use lime.explanation.Explanation.as_pyplot_figure() and lime.explanation.Explanation.as_list())
  • CeterisParibus.plot(variable_type='categorical') now has horizontal barplots - horizontal_spacing=None by default (varies on variable_type). Also, once again added the "dot" for observation value.
  • predict_fn in predict_surrogate now uses predict_function (trying to make it work for more frameworks)

fixes

  • fixed wrong verbose output when any value in y_hat/residuals was an int not float
  • added proper "-" sign to negative dropout losses in VariableImportance.plot

features

  • added geom='bars' to AggregatedProfiles.plot to force the categorical plot
  • added geom='roc' and geom='lift' to ModelPerformance.plot
  • added Fairness plot to Arena

other

  • remove colorize from Explainer
  • updated the documentation, refactored code (import modules not functions, unify variable names in object.py, move utils functions from checks.py to utils.py, etc.)
  • added license notice next to data

v0.4.1 (2020-12-03)

  • added support for h2o.estimators.* (#332)
  • added tensorflow.python.keras.engine.functional.Functional to the tensorflow list
  • updated the plotly dependency to >=4.12.0
  • code maintenance: yhat, check_data

fixes

  • fixed check_if_empty_fields() used in loading the Explainer from a pickle file, since several checks were changed
  • fixed plot() method in GroupFairnessClassification as it omitted plotting a metric when NaN was present in metric ratios (result)
  • fixed dragons and HR datasets using "," instead of "." as the decimal separator, which turned numerical columns into categorical ones
  • fixed representation of the ShapWrapper class (removed _repr_html_ method)

features

  • allow for y to be a pandas.DataFrame (converted)
  • allow for data, y to be a H2OFrame (converted)
  • added label parameter to all the relevant dx.Explainer methods, which overrides the default label in explanation's result
  • now using GradientExplainer for tf.keras.engine.sequential.Sequential, added proper warning when shap_explainer_type is None (#366)

defaults

  • unify verbose output of Explainer

v0.4.0 (2020-11-17)

  • added new arena module, which adds the backend for Arena dashboard @piotrpiatyszek

features

  • added new aliases to dx.Explainer methods (#350) in model_parts it is {'permutational': 'variable_importance', 'feature_importance': 'variable_importance'}, in model_profile it is {'pdp': 'partial', 'ale': 'accumulated'}
  • added Arena object for dashboard backend. See https://github.com/ModelOriented/Arena
  • new fairness plot types: stacked, radar, performance_and_fairness, heatmap, ceteris_paribus_cutoff
  • upgraded fairness_check()
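
The alias item amounts to a lookup from alternative names to canonical type names. The dictionaries below are copied from the entry above; the resolve_type helper is a hypothetical illustration of how such a lookup can work, not dalex's internal code.

```python
# Alias tables as listed in the changelog entry.
MODEL_PARTS_ALIASES = {
    "permutational": "variable_importance",
    "feature_importance": "variable_importance",
}
MODEL_PROFILE_ALIASES = {"pdp": "partial", "ale": "accumulated"}

def resolve_type(type_name, aliases):
    """Map an alias to its canonical name; canonical names pass through unchanged."""
    return aliases.get(type_name, type_name)

print(resolve_type("pdp", MODEL_PROFILE_ALIASES))
print(resolve_type("feature_importance", MODEL_PARTS_ALIASES))
```

In practice this means a call such as model_profile(type='pdp') is treated the same as model_profile(type='partial').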

v0.3.0 (2020-10-26)

  • added new fairness module, which will focus on bias detection, visualization and mitigation @jakwisn

fixes

  • removed unnecessary warning when precalculate=False and verbose=False (#340)

features

  • added model_fairness method to the Explainer, which performs fairness explanation
  • added GroupFairnessClassification object, with the plot method having two types: fairness_check and metric_scores

defaults

  • added the N=50000 argument to ResidualDiagnostics.plot, which samples observations from the result parameter to avoid performance issues when smooth=True (#341)

v0.2.2 (2020-09-21)

  • added support for tensorflow.python.keras.engine.sequential.Sequential and tensorflow.python.keras.engine.training.Model (#326)
  • updated the tqdm dependency to >=4.48.2, pandas dependency to >=1.1.2 and numpy dependency to >=1.18.4

fixes

  • fixed the wrong order of Explainer verbose messages
  • fixed a bug that caused model_info parameter to be overwritten by the default values
  • fixed a bug occurring when the variable from groups was not of str type (#327)
  • fixed model_profile: variable_type='categorical' not working when user passed variables parameter (#329) + the reverse order of bars in 'categorical' plots + (again) added variable_splits_type parameter to model_profile to specify how grid points shall be calculated (#266) + allow for both 'quantile' and 'quantiles' types (alias)

features

  • added informative error messages when importing optional dependencies (#316)
  • allow for data and y to be None - added checks in Explainer methods

defaults

  • wrong parameter name title_x changed to y_title in CeterisParibus.plot and AggregatedProfiles.plot (#317)
  • now warning the user in Explainer when predict_function returns an error or doesn't return numpy.ndarray (1d) (#325)

v0.2.1 (2020-08-31)

  • updated the pandas dependency to >=1.1.0

fixes

  • ModelPerformance.plot now uses a drwhy color palette
  • use unique method instead of np.unique in variable_splits (#293)
  • v0.2.0 didn't export new datasets
  • fixed a bug where predict_parts(type='shap') calculated wrong contributions (#300)
  • model_profile uses observation mean instead of profile mean in _yhat_ centering
  • fixed barplot baseline in categorical model_profile and predict_profile plots (#297)
  • fixed model_profile(type='accumulated') giving wrong results (#302)
  • vertical/horizontal lines in plots now end on the plot edges

features

  • added new type='shap_wrapper' to predict_parts and model_parts methods, which returns a new ShapWrapper object. It contains the main result attribute (shapley_values) and the plot method (force_plot and summary_plot respectively). These come from the shap package
  • Explainer.predict method now accepts numpy.ndarray
  • added the ResidualDiagnostics object with a plot method
  • added model_diagnostics method to the Explainer, which performs residual diagnostics
  • added predict_surrogate method to the Explainer, which is a wrapper for the lime tabular explanation from the lime package
  • added model_surrogate method to the Explainer, which creates a basic surrogate decision tree or linear model from the black-box model using the scikit-learn package
  • added a _repr_html_ method to all of the explanation objects (it prints the result attribute)
  • added dalex.__version__
  • added informative error messages in Explainer methods when y is of wrong type (#294)
  • CeterisParibus.plot(variable_type='categorical') now allows for multiple observations
  • new verbose checks for model_type
  • add type to model_info in dump and dumps for R compatibility (#303)
  • ModelPerformance.result now has label as index

defaults

  • removed _grid_ column in AggregatedProfiles.result and center only works with type=accumulated
  • use Pipeline._final_estimator to extract model_class of the actual model
  • use model._estimator_type to extract model_type if possible

v0.2.0 (2020-08-07)

  • major documentation update (#270)
  • unified the order of function parameters

fixes

  • v0.1.9 had wrong _original_ column in predict_profile
  • vertical_spacing acts as intended in VariableImportance.plot when split='variable'
  • loss_function='auc' now uses loss_one_minus_auc as this should be a descending measure
  • plots are now saved with the original height and width
  • model_profile now properly passes the variables parameter to CeterisParibus
  • variables parameter in predict_profile now can also be a string

features

  • use plotly.express (px) instead of core plotly to make model_profile and predict_profile plots, which improves performance and scalability
  • added verbose parameter, which toggles the progress bar wherever tqdm is used
  • added loss_one_minus_auc function that can be used with loss_function='1-auc' in model_parts
  • added new example data sets: apartments, dragons and hr
  • added color, opacity, title_x parameters to model_profile and predict_profile plots (#236), changed tooltips and legends (#262)
  • added geom='profiles' parameter to model_profile plot and raw_profiles attribute to AggregatedProfiles
  • added variable_splits_type parameter to predict_profile to specify how grid points shall be calculated (#266)
  • added variable_splits_with_obs parameter to predict_profile function to extend split points with observation variable values (#269)
  • added variable_splits parameter to model_profile
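
Why 1-AUC rather than AUC serves as a loss: permutation importance expects a measure where lower is better, and AUC grows with model quality. The sketch below computes AUC as the share of positive/negative pairs ranked correctly and then flips it. The name loss_one_minus_auc comes from the entry above, but this body is a stand-in written for illustration, not the package's code.

```python
def auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs observations from both classes")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def loss_one_minus_auc(y_true, y_score):
    """Loss version of AUC: 0 for a perfect ranking, 1 for a fully reversed one."""
    return 1.0 - auc(y_true, y_score)

y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(auc(y_true, y_score), loss_one_minus_auc(y_true, y_score))
```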

defaults

  • use different loss_function for classification and regression (#248)
  • models that use proba yhats now get model_type='classification' if it's not specified
  • use uniform way of grid points calculation in predict_profile and model_profile (see variable_splits_type parameter)
  • add the variable values of new_observation to variable_splits in predict_profile (see variable_splits_with_obs parameter)
  • use N=1000 in model_parts and N=300 in model_profile to comply with the R version
  • keep_raw_permutation is now set to False instead of None in model_parts
  • intercept parameter in model_profile is now named center
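
The two ways of placing grid points mentioned for variable_splits_type can be sketched as follows: a uniform grid spaces points evenly between the minimum and maximum, while a quantile grid follows the empirical distribution of the variable. The grid_points helper and its arguments are assumptions for this example; the exact computation inside dalex may differ.

```python
import numpy as np

def grid_points(values, n_points, splits_type="uniform"):
    """Return n_points split points for a numeric variable."""
    values = np.asarray(values, dtype=float)
    if splits_type == "uniform":
        # Evenly spaced between the observed minimum and maximum.
        return np.linspace(values.min(), values.max(), n_points)
    if splits_type in ("quantile", "quantiles"):
        # Evenly spaced in probability, so dense regions get more points.
        return np.quantile(values, np.linspace(0.0, 1.0, n_points))
    raise ValueError(f"unknown splits_type: {splits_type!r}")

x = [0, 1, 2, 10]
uniform_grid = grid_points(x, 3, "uniform")
quantile_grid = grid_points(x, 3, "quantile")
print(uniform_grid, quantile_grid)
```

With a skewed variable like this one, the quantile grid puts its middle point near the bulk of the data instead of halfway to the outlier.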

v0.1.9 (2020-07-01)

  • feature: added random_state parameter for predict_parts(type='shap') and model_profile for reproducible calculations
  • fix: fixed random_state parameter in model_parts
  • feature: multiprocessing added for: model_profile, model_parts, predict_profile and predict_parts(type='shap'), through the processes parameter
  • fix: significantly improved the speed of accumulated and conditional types in model_profile
  • bugfix: use pd.api.types.is_numeric_dtype() instead of np.issubdtype() to cover more types; e.g. it caused errors with string type
  • defaults: use pd.convert_dtypes() on the result of CeterisParibus to fix variable dtypes and later allow for a concatenation without the dtype conversion
  • fix: variables parameter now can be a single str value
  • fix: number rounding in predict_parts, model_parts (#245)
  • fix: CP calculations for models that take only variables as an input

v0.1.8 (2020-05-28)

  • bugfix: variable_splits parameter now works correctly in predict_profile
  • bugfix: fix baseline for 3+ models in AggregatedProfiles.plot (#234)
  • printing: now rounding numbers in Explainer messages
  • fix: minor checks fixes in instance_level
  • bugfix: AggregatedProfiles.plot now works with groups

v0.1.7 (2020-05-10)

  • feature: parameter N in model_profile can be set to None, to select all observations
  • input: groups and variable parameters in model_profile can be: str, list, numpy.ndarray, pandas.Series
  • fix: check_label returned only the first letter
  • bugfix: removed the conversion of all_variables to str in prepare_all_variables, which caused an error in model_profile (#214)
  • defaults: change numpy data variable names from numbers to strings

v0.1.6 (2020-04-30)

  • fix: change short_name encoding in fifa dataset (utf8->ascii)
  • fix: remove scipy dependency
  • defaults: default loss_root_mean_square in model parts changed to rmse
  • bugfix: checks related to new_observation in BreakDown, Shap, CeterisParibus now work for multiple inputs (#207)
  • bugfix: CeterisParibus.fit and CeterisParibus.plot now work for more types of new_observation.index, but won't work for a boolean type (#211)

v0.1.5 (2020-04-21)

  • feature: add xgboost package compatibility (#188)
  • feature: added model_class parameter to Explainer to handle wrapped models
  • feature: Explainer attribute model_info remembers if parameters are default
  • bugfix: variable_groups parameter now works correctly in model_parts
  • fix: changed parameter order in Explainer: model_type, model_info, colorize
  • documentation: model_parts documentation is updated
  • feature: new show parameter in plot methods that (if False) returns plotly Figure (#190)
  • feature: load_fifa() function which loads the preprocessed players_20 dataset
  • fix: CeterisParibus.plot tooltip

v0.1.4 (2020-04-14)

  • feature: new Explainer.residual method which uses residual_function to calculate residuals
  • feature: new dump and dumps methods for saving Explainer in a binary form; load and loads methods for loading Explainer from binary form
  • fix: Explainer constructor verbose text
  • bugfix: B:=B+1 - Shap now stores average results as B=0 and path results as B=1,2,...
  • bugfix: Explainer.model_performance method uses self.model_type when model_type is None
  • bugfix: values in BreakDown and Shap are now rounded to 4 significant places (#180)
  • bugfix: Shap by default uses path='average', sign column is properly updated and bars in plot are sorted by abs(contribution)

v0.1.3 (2020-04-10)

  • release of the dalex package
  • Explainer object with predict, predict_parts, predict_profile, model_performance, model_parts and model_profile methods
  • BreakDown, Shap, CeterisParibus, ModelPerformance, VariableImportance and AggregatedProfiles objects with a plot method
  • load_titanic() function which loads the titanic_imputed dataset
