Explore, Explain and Examine Predictive Models

dalex

http://dalex.drwhy.ai/

Overview

An unverified black-box model is the path to failure. Opaqueness leads to distrust. Distrust leads to ignoration. Ignoration leads to rejection.

The dalex package x-rays any model and helps to explore and explain its behaviour and to understand how complex models work. The main Explainer object creates a wrapper around a predictive model. Wrapped models may then be explored and compared with a collection of local and global explainers, which draw on recent developments in Interpretable Machine Learning and eXplainable Artificial Intelligence.
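
As a rough illustration of the wrapper idea, the sketch below bundles a model, its data and a prediction function behind one small interface. TinyExplainer and its fields are hypothetical names chosen for this example; it is not the dalex Explainer, only a standard-library sketch of the concept.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence

Row = Sequence[float]


@dataclass
class TinyExplainer:
    """Hypothetical stand-in for the wrapper idea; not the dalex API."""
    model: Any
    data: List[Row]
    y: List[float]
    predict_function: Callable[[Any, List[Row]], List[float]]
    label: str = "model"

    def predict(self, rows: List[Row]) -> List[float]:
        # Every wrapped model is queried through the same entry point.
        return self.predict_function(self.model, rows)

    def residual(self) -> List[float]:
        # Residuals on the stored data: observed minus predicted.
        return [obs - pred for obs, pred in zip(self.y, self.predict(self.data))]


# A toy "model": the coefficients of a linear function (assumed for the demo).
coefficients = [2.0, -1.0]


def linear_predict(model: List[float], rows: List[Row]) -> List[float]:
    return [sum(c * v for c, v in zip(model, row)) for row in rows]


explainer = TinyExplainer(coefficients, [[1.0, 1.0], [2.0, 0.0]], [1.5, 4.0], linear_predict)
print(explainer.predict([[3.0, 1.0]]))
print(explainer.residual())
```

Because the prediction function is passed in, two very different models can be compared through the same predict and residual calls.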

The philosophy behind dalex explanations is described in the Explanatory Model Analysis e-book.

The dalex package is a part of DrWhy.AI universe.

Installation

The dalex package is available on PyPI

pip install dalex -U

Resources

Plots

This package uses plotly to render the plots.

Learn more

Machine learning models are widely used and have various applications in classification and regression tasks. Due to increasing computational power and the availability of new data sources and new methods, ML models are becoming more and more complex. Models created with techniques like boosting, bagging or neural networks are true black boxes: it is hard to trace the link between input variables and model outcomes. They are used because of their high performance, but lack of interpretability is one of their weakest sides.

In many applications we need to know, understand or prove how input variables are used in the model and what impact they have on the final model prediction. dalex is a set of tools that helps to understand how complex models work.

Talk with your model! at USeR 2020

Authors

Main authors of the dalex package are:

Under the supervision of Przemyslaw Biecek.

Other contributors:


Changelog

v0.4.0 (17/11/2020)

  • added new arena module, which adds the backend for Arena dashboard @piotrpiatyszek

features

  • added new aliases to dx.Explainer methods (#350): in model_parts, {'permutational': 'variable_importance', 'feature_importance': 'variable_importance'}; in model_profile, {'pdp': 'partial', 'ale': 'accumulated'}
  • added Arena object for dashboard backend. See https://github.com/ModelOriented/Arena
  • new fairness plot types: stacked, radar, performance_and_fairness, heatmap, ceteris_paribus_cutoff
  • upgraded fairness_check()
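
The alias tables listed above can be read as simple lookups from an alternative name to a canonical one. The sketch below shows that reading. The dictionaries are copied from the changelog entry, while resolve_type is a hypothetical helper, and the assumption that aliases apply to a type-like argument is ours rather than something stated here.

```python
# Alias tables copied from the changelog entry above.
MODEL_PARTS_ALIASES = {
    "permutational": "variable_importance",
    "feature_importance": "variable_importance",
}
MODEL_PROFILE_ALIASES = {"pdp": "partial", "ale": "accumulated"}


def resolve_type(requested: str, aliases: dict) -> str:
    """Hypothetical helper: return the canonical name; other names pass through."""
    return aliases.get(requested, requested)


print(resolve_type("pdp", MODEL_PROFILE_ALIASES))
print(resolve_type("accumulated", MODEL_PROFILE_ALIASES))
print(resolve_type("permutational", MODEL_PARTS_ALIASES))
```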

v0.3.0 (26/10/2020)

  • added new fairness module, which will focus on bias detection, visualization and mitigation @jakwisn

bug fixes

  • removed unnecessary warning when precalculate=False and verbose=False (#340)

features

  • added model_fairness method to the Explainer, which performs fairness explanation
  • added GroupFairnessClassification object, with the plot method having two types fairness_check and metric_scores
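
To give a feel for what a group fairness check can look like, here is a minimal standard-library sketch. It compares the share of positive predictions in each subgroup with that of a privileged group and flags ratios outside [epsilon, 1/epsilon]. The choice of metric, the threshold epsilon=0.8 (the "four-fifths rule") and all function names are assumptions made for this example; the changelog does not describe how fairness_check works internally.

```python
from collections import defaultdict
from typing import Dict, List


def positive_rates(y_pred: List[int], groups: List[str]) -> Dict[str, float]:
    """Share of positive predictions within each subgroup."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for pred, group in zip(y_pred, groups):
        counts[group][0] += pred
        counts[group][1] += 1
    return {g: pos / total for g, (pos, total) in counts.items()}


def parity_ratios(y_pred: List[int], groups: List[str], privileged: str) -> Dict[str, float]:
    """Each subgroup's positive rate divided by the privileged group's rate."""
    rates = positive_rates(y_pred, groups)
    return {g: rate / rates[privileged] for g, rate in rates.items()}


def flagged(ratios: Dict[str, float], epsilon: float = 0.8) -> List[str]:
    """Subgroups whose ratio falls outside [epsilon, 1 / epsilon] (assumed rule)."""
    return sorted(g for g, r in ratios.items() if not epsilon <= r <= 1 / epsilon)


y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
ratios = parity_ratios(y_pred, groups, privileged="a")
print(ratios)
print(flagged(ratios))
```

The sketch assumes the privileged group receives at least one positive prediction; otherwise the ratio is undefined.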

defaults

  • added the N=50000 argument to ResidualDiagnostics.plot, which samples observations from the result parameter to avoid performance issues when smooth=True (#341)

dalex 0.2.2

  • added support for tensorflow.python.keras.engine.sequential.Sequential and tensorflow.python.keras.engine.training.Model (#326)
  • updated the tqdm dependency to >=4.48.2, pandas dependency to >=1.1.2 and numpy dependency to >=1.18.4

bug fixes

  • fixed the wrong order of Explainer verbose messages
  • fixed a bug that caused model_info parameter to be overwritten by the default values
  • fixed a bug occurring when the variable from groups was not of str type (#327)
  • fixed model_profile: variable_type='categorical' not working when the user passed the variables parameter (#329)
  • fixed the reverse order of bars in 'categorical' plots
  • (again) added variable_splits_type parameter to model_profile to specify how grid points shall be calculated (#266)
  • allow for both 'quantile' and 'quantiles' types (alias)

features

  • added informative error messages when importing optional dependencies (#316)
  • allow for data and y to be None - added checks in Explainer methods

defaults

  • wrong parameter name title_x changed to y_title in CeterisParibus.plot and AggregatedProfiles.plot (#317)
  • now warning the user in Explainer when predict_function returns an error or doesn't return numpy.ndarray (1d) (#325)

dalex 0.2.1

  • updated the pandas dependency to >=1.1.0

bug fixes

  • ModelPerformance.plot now uses a drwhy color palette
  • use unique method instead of np.unique in variable_splits (#293)
  • v0.2.0 didn't export new datasets
  • fixed a bug where predict_parts(type='shap') calculated wrong contributions (#300)
  • model_profile uses observation mean instead of profile mean in _yhat_ centering
  • fixed barplot baseline in categorical model_profile and predict_profile plots (#297)
  • fixed model_profile(type='accumulated') giving wrong results (#302)
  • vertical/horizontal lines in plots now end on the plot edges

features

  • added new type='shap_wrapper' to predict_parts and model_parts methods, which returns a new ShapWrapper object. It contains the main result attribute (shapley_values) and the plot method (force_plot and summary_plot respectively). These come from the shap package
  • Explainer.predict method now accepts numpy.ndarray
  • added the ResidualDiagnostics object with a plot method
  • added model_diagnostics method to the Explainer, which performs residual diagnostics
  • added predict_surrogate method to the Explainer, which is a wrapper for the lime tabular explanation from the lime package
  • added model_surrogate method to the Explainer, which creates a basic surrogate decision tree or linear model from the black-box model using the scikit-learn package
  • added a _repr_html_ method to all of the explanation objects (it prints the result attribute)
  • added dalex.__version__
  • added informative error messages in Explainer methods when y is of wrong type (#294)
  • CeterisParibus.plot(variable_type='categorical') now allows for multiple observations
  • new verbose checks for model_type
  • add type to model_info in dump and dumps for R compatibility (#303)
  • ModelPerformance.result now has label as index
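
The surrogate idea mentioned above (model_surrogate) is to fit a simple, readable model to the predictions of the black box rather than to the original labels. The sketch below shows this for a single variable using a least-squares line and only the standard library. fit_linear_surrogate and the toy black_box are hypothetical names; dalex itself relies on scikit-learn for this, as the entry states.

```python
from statistics import mean
from typing import Callable, List, Tuple


def fit_linear_surrogate(
    black_box: Callable[[float], float], xs: List[float]
) -> Tuple[float, float]:
    """Least-squares line fitted to the black box's predictions, not to the labels."""
    preds = [black_box(x) for x in xs]
    x_bar, p_bar = mean(xs), mean(preds)
    slope = sum((x - x_bar) * (p - p_bar) for x, p in zip(xs, preds)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = p_bar - slope * x_bar
    return intercept, slope


def black_box(x: float) -> float:
    # Stand-in for an opaque model (assumed for the demo).
    return x ** 2


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
intercept, slope = fit_linear_surrogate(black_box, xs)
print(intercept, slope)
```

The fitted slope summarises the average direction of the black box over the sampled range; how well the line tracks the black box should be checked before trusting it.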

defaults

  • removed _grid_ column in AggregatedProfiles.result and center only works with type=accumulated
  • use Pipeline._final_estimator to extract model_class of the actual model
  • use model._estimator_type to extract model_type if possible

dalex 0.2.0

  • major documentation update (#270)
  • unified the order of function parameters

bug fixes

  • v0.1.9 had wrong _original_ column in predict_profile
  • vertical_spacing acts as intended in VariableImportance.plot when split='variable'
  • loss_function='auc' now uses loss_one_minus_auc as this should be a descending measure
  • plots are now saved with the original height and width
  • model_profile now properly passes the variables parameter to CeterisParibus
  • variables parameter in predict_profile now can also be a string

features

  • use plotly.express (px) instead of core plotly to make model_profile and predict_profile plots; this improves performance and scalability
  • added verbose parameter to control the tqdm progress bar
  • added loss_one_minus_auc function that can be used with loss_function='1-auc' in model_parts
  • added new example data sets: apartments, dragons and hr
  • added color, opacity, title_x parameters to model_profile and predict_profile plots (#236), changed tooltips and legends (#262)
  • added geom='profiles' parameter to model_profile plot and raw_profiles attribute to AggregatedProfiles
  • added variable_splits_type parameter to predict_profile to specify how grid points shall be calculated (#266)
  • added variable_splits_with_obs parameter to predict_profile function to extend split points with observation variable values (#269)
  • added variable_splits parameter to model_profile
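
The 1-AUC loss mentioned above turns AUC, where higher is better, into a loss, where lower is better. A minimal sketch follows, computing AUC as the share of correctly ordered (positive, negative) pairs. The function names are ours, and dalex's own loss_one_minus_auc may be implemented differently.

```python
from typing import List


def auc(y_true: List[int], y_score: List[float]) -> float:
    """Share of (positive, negative) pairs ranked correctly; ties count as half."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def one_minus_auc(y_true: List[int], y_score: List[float]) -> float:
    """Loss version of AUC: lower is better, as with RMSE."""
    return 1.0 - auc(y_true, y_score)


y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(one_minus_auc(y_true, y_score))
```

With a loss of this form, permuting an important variable makes the value go up, which is what a permutation-based importance measure expects.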

defaults

  • use different loss_function for classification and regression (#248)
  • models that use proba yhats now get model_type='classification' if it's not specified
  • use uniform way of grid points calculation in predict_profile and model_profile (see variable_splits_type parameter)
  • add the variable values of new_observation to variable_splits in predict_profile (see variable_splits_with_obs parameter)
  • use N=1000 in model_parts and N=300 in model_profile to comply with the R version
  • keep_raw_permutation is now set to False instead of None in model_parts
  • intercept parameter in model_profile is now named center
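
The grid-related entries above (variable_splits_type, variable_splits_with_obs) concern where a profile is evaluated. The sketch below contrasts an evenly spaced grid with a quantile-based one and shows how the observation's own value can be added to the grid. All function names are hypothetical and the interpolation rule is an assumption; it requires at least two grid points.

```python
from typing import List, Optional


def uniform_grid(values: List[float], grid_points: int) -> List[float]:
    """Evenly spaced points between the minimum and the maximum."""
    low, high = min(values), max(values)
    step = (high - low) / (grid_points - 1)
    return [low + i * step for i in range(grid_points)]


def quantile_grid(values: List[float], grid_points: int) -> List[float]:
    """Points at evenly spaced quantiles, using linear interpolation."""
    ordered = sorted(values)
    grid = []
    for i in range(grid_points):
        position = i * (len(ordered) - 1) / (grid_points - 1)
        lower = int(position)
        upper = min(lower + 1, len(ordered) - 1)
        weight = position - lower
        grid.append(ordered[lower] * (1 - weight) + ordered[upper] * weight)
    return grid


def with_observation(grid: List[float], observed: Optional[float]) -> List[float]:
    """Extend the grid with the observation's own value; keep it sorted and unique."""
    if observed is None:
        return grid
    return sorted(set(grid) | {observed})


values = [1.0, 2.0, 3.0, 4.0, 100.0]
print(uniform_grid(values, 3))
print(quantile_grid(values, 3))
print(with_observation(uniform_grid(values, 3), 2.0))
```

With a skewed variable such as this one, the uniform grid spends most of its points where there is little data, while the quantile grid follows the data's density.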

dalex 0.1.9

  • feature: added random_state parameter for predict_parts(type='shap') and model_profile for reproducible calculations
  • fix: fixed random_state parameter in model_parts
  • feature: multiprocessing added for: model_profile, model_parts, predict_profile and predict_parts(type='shap'), through the processes parameter
  • fix: significantly improved the speed of accumulated and conditional types in model_profile
  • bugfix: use pd.api.types.is_numeric_dtype() instead of np.issubdtype() to cover more types; e.g. it caused errors with string type
  • defaults: use DataFrame.convert_dtypes() on the result of CeterisParibus to fix variable dtypes and later allow for a concatenation without the dtype conversion
  • fix: variables parameter now can be a single str value
  • fix: number rounding in predict_parts, model_parts (#245)
  • fix: CP calculations for models that take only variables as an input
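
Several entries above refer to ceteris paribus (CP) profiles, to model_profile and to random_state. As a minimal sketch of the underlying idea, a CP profile varies one variable of a single observation while keeping the others fixed, and a partial-dependence profile averages CP profiles over a sample of observations, optionally seeded. The names and signatures below are hypothetical and use only the standard library; they are not the dalex implementation.

```python
import random
from typing import Callable, List, Optional, Sequence

Row = Sequence[float]
Model = Callable[[Row], float]


def ceteris_paribus(model: Model, row: Row, variable: int, grid: List[float]) -> List[float]:
    """Predictions for one observation as a single variable moves over the grid."""
    profile = []
    for value in grid:
        changed = list(row)
        changed[variable] = value  # all other variables stay as observed
        profile.append(model(changed))
    return profile


def partial_dependence(
    model: Model,
    data: List[Row],
    variable: int,
    grid: List[float],
    n: Optional[int] = None,
    random_state: Optional[int] = None,
) -> List[float]:
    """Average of ceteris paribus profiles over (a sample of) the data."""
    rows = data
    if n is not None and n < len(data):
        rows = random.Random(random_state).sample(data, n)  # seeded, so repeatable
    profiles = [ceteris_paribus(model, row, variable, grid) for row in rows]
    return [sum(column) / len(profiles) for column in zip(*profiles)]


def model(row: Row) -> float:
    # Toy model with an interaction (assumed for the demo).
    return row[0] * row[1]


data = [[1.0, 1.0], [2.0, 3.0], [3.0, 5.0], [4.0, 7.0]]
grid = [0.0, 1.0, 2.0]
print(ceteris_paribus(model, data[1], 0, grid))
print(partial_dependence(model, data, 0, grid))
first = partial_dependence(model, data, 0, grid, n=2, random_state=0)
second = partial_dependence(model, data, 0, grid, n=2, random_state=0)
print(first == second)
```

Passing n=None uses every observation, and fixing random_state makes the sampled profile repeatable between runs.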

dalex 0.1.8

  • bugfix: variable_splits parameter now works correctly in predict_profile
  • bugfix: fix baseline for 3+ models in AggregatedProfiles.plot (#234)
  • printing: now rounding numbers in Explainer messages
  • fix: minor checks fixes in instance_level
  • bugfix: AggregatedProfiles.plot now works with groups

dalex 0.1.7

  • feature: parameter N in model_profile can be set to None, to select all observations
  • input: groups and variable parameters in model_profile can be: str, list, numpy.ndarray, pandas.Series
  • fix: check_label returned only a first letter
  • bugfix: removed the conversion of all_variables to str in prepare_all_variables, which caused an error in model_profile (#214)
  • defaults: change numpy data variable names from numbers to strings

dalex 0.1.6

  • fix: change short_name encoding in fifa dataset (utf8->ascii)
  • fix: remove scipy dependency
  • defaults: default loss_root_mean_square in model_parts changed to rmse
  • bugfix: checks related to new_observation in BreakDown, Shap, CeterisParibus now work for multiple inputs (#207)
  • bugfix: CeterisParibus.fit and CeterisParibus.plot now work for more types of new_observation.index, but won't work for a boolean type (#211)
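
The rmse loss mentioned above is used for permutation-based variable importance: shuffle one variable, measure how much the loss grows, and repeat. The sketch below is a minimal standard-library version of that idea. The function names, the number of rounds b and the "mean loss minus baseline" summary are assumptions for illustration, not the dalex implementation.

```python
import math
import random
from typing import Callable, List, Sequence

Row = Sequence[float]


def rmse(y_true: List[float], y_pred: List[float]) -> float:
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def permutation_importance(
    model: Callable[[Row], float],
    data: List[Row],
    y: List[float],
    n_variables: int,
    b: int = 10,
    random_state: int = 0,
) -> List[float]:
    """Mean increase in RMSE after shuffling each variable, over b rounds."""
    rng = random.Random(random_state)
    baseline = rmse(y, [model(row) for row in data])
    importances = []
    for j in range(n_variables):
        losses = []
        for _ in range(b):
            column = [row[j] for row in data]
            rng.shuffle(column)  # break the link between variable j and the target
            shuffled = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(data, column)]
            losses.append(rmse(y, [model(row) for row in shuffled]))
        importances.append(sum(losses) / b - baseline)
    return importances


def model(row: Row) -> float:
    # Toy model that ignores the second variable (assumed for the demo).
    return 3.0 * row[0]


data = [[float(i), float(i % 3)] for i in range(20)]
y = [3.0 * row[0] for row in data]
importance = permutation_importance(model, data, y, n_variables=2)
print(importance)
```

A variable the model never uses gets an importance of zero, because shuffling it leaves every prediction unchanged.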

dalex 0.1.5

  • feature: add xgboost package compatibility (#188)
  • feature: added model_class parameter to Explainer to handle wrapped models
  • feature: Explainer attribute model_info remembers if parameters are default
  • bugfix: variable_groups parameter now works correctly in model_parts
  • fix: changed parameter order in Explainer: model_type, model_info, colorize
  • documentation: model_parts documentation is updated
  • feature: new show parameter in plot methods that (if False) returns plotly Figure (#190)
  • feature: load_fifa() function which loads the preprocessed players_20 dataset
  • fix: CeterisParibus.plot tooltip

dalex 0.1.4

  • feature: new Explainer.residual method which uses residual_function to calculate residuals
  • feature: new dump and dumps methods for saving Explainer in a binary form; load and loads methods for loading Explainer from binary form
  • fix: Explainer constructor verbose text
  • bugfix: B:=B+1 - Shap now stores average results as B=0 and path results as B=1,2,...
  • bugfix: Explainer.model_performance method uses self.model_type when model_type is None
  • bugfix: values in BreakDown and Shap are now rounded to 4 significant places (#180)
  • bugfix: Shap by default uses path='average', sign column is properly updated and bars in plot are sorted by abs(contribution)
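
The Shap entries above mention per-path results (B=1,2,...) and their average (B=0). One way to read this is that contributions are computed along several random variable orderings and then averaged. The sketch below follows that reading with the standard library only. The function names, the background data set and the default number of paths are assumptions, not the dalex code.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence

Row = Sequence[float]
Model = Callable[[Row], float]


def path_contributions(model: Model, row: Row, background: List[Row], path: List[int]) -> List[float]:
    """Break-down along one ordering: fix variables one by one to the observation's values."""
    current = [list(r) for r in background]
    previous = mean(model(r) for r in current)
    contributions = [0.0] * len(row)
    for j in path:
        for r in current:
            r[j] = row[j]
        value = mean(model(r) for r in current)
        contributions[j] = value - previous  # change in the mean prediction
        previous = value
    return contributions


def shap_by_paths(
    model: Model, row: Row, background: List[Row], b: int = 25, random_state: int = 0
) -> List[float]:
    """Average the per-path contributions over b random orderings."""
    rng = random.Random(random_state)
    variables = list(range(len(row)))
    results = []
    for _ in range(b):
        path = variables[:]
        rng.shuffle(path)
        results.append(path_contributions(model, row, background, path))
    return [mean(column) for column in zip(*results)]


def model(row: Row) -> float:
    # Toy model with an interaction term (assumed for the demo).
    return 2.0 * row[0] + row[1] * row[2]


background = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 2.0, 0.0], [1.0, 1.0, 1.0]]
row = [3.0, 2.0, 1.0]
contributions = shap_by_paths(model, row, background)
print(contributions)
```

Along every path the contributions add up to the prediction for the observation minus the mean prediction on the background data, so the averaged contributions do as well.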

dalex 0.1.3

  • release of the dalex package
  • Explainer object with predict, predict_parts, predict_profile, model_performance, model_parts and model_profile methods
  • BreakDown, Shap, CeterisParibus, ModelPerformance, VariableImportance and AggregatedProfiles objects with a plot method
  • load_titanic() function which loads the titanic_imputed dataset
