PyMint (Python-based Model INTerpretations) is a user-friendly Python package for computing and plotting machine learning interpretation output.

Project description

PyMint (Python-based Model INTerpretations) is designed to be a user-friendly package for computing and plotting machine learning interpretability/explainability output in Python. Current computation includes

All of these methods are discussed at length in Christoph Molnar's interpretable ML book. Most calculations can be performed in parallel when multi-core processing is available. The primary feature of this package is the accompanying built-in plotting methods, which are designed to be easy to use while producing publication-quality figures. Documentation for PyMint can be found at https://py-mint.readthedocs.io/en/master/.

The package is under active development and will likely contain bugs or errors. Feel free to raise issues!

This package is largely original code, but it also includes snippets or chunks of code from preexisting packages. Our goal is not to take credit from other code authors, but to provide a single source for computing several machine learning interpretation methods. The following packages are used in PyMint: PyALE, PermutationImportance, ALEPython, SHAP, and Scikit-Learn.

If you employ PyMint in your research, please cite this GitHub repository and the relevant packages listed above.

If you are experiencing issues with loading the tutorial jupyter notebooks, you can enter the URL/location of the notebooks into the following address: https://nbviewer.jupyter.org/.

Install

PyMint can be installed through pip. We are also working on uploading it to conda-forge.

pip install py-mint

Dependencies

PyMint is compatible with Python 3.6 or newer. PyMint requires the following packages:

numpy 
pandas
scikit-learn
matplotlib
shap>=0.30.0
xarray>=0.16.0
tqdm
statsmodels
seaborn>=0.11.0
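
To see which of the packages above are already present in an environment, a short check like the following can help. This is only a sketch: the helper name `installed_versions` is hypothetical (it is not part of PyMint), and it relies on `importlib.metadata`, which is in the standard library from Python 3.8 onward (on 3.6 or 3.7 the `importlib_metadata` backport would be needed instead). It reports installed versions but does not compare them against the minimum versions listed above.

```python
from importlib import metadata  # standard library in Python 3.8+

# Distribution names copied from the dependency list above.
REQUIRED = ["numpy", "pandas", "scikit-learn", "matplotlib", "shap",
            "xarray", "tqdm", "statsmodels", "seaborn"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```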

Initializing PyMint

The interface of PyMint is the InterpretToolkit, which houses the computation and plotting methods for all of the package's interpretability methods. Once initialized, InterpretToolkit can compute a variety of interpretability methods and plot them. See the tutorial notebooks for examples.

import pymint

# Loads three ML models (random forest, gradient-boosted tree, and logistic regression)
# trained on a subset of the road surface temperature data from Handler et al. (2020).
estimators = pymint.load_models()
X, y = pymint.load_data()

explainer = pymint.InterpretToolkit(estimators=estimators, X=X, y=y)

Permutation Importance

For predictor ranking, PyMint uses both single-pass and multiple-pass permutation importance methods (Breiman 2001; Lakshmanan et al. 2015; McGovern et al. 2019). We can calculate the permutation importance and then plot the results. The tutorial discusses options for making the figure publication-quality, such as passing the plotting method additional arguments to convert the feature names to a more readable format or to color-code them by feature type.

perm_results = explainer.permutation_importance(n_vars=10, evaluation_fn='auc')
explainer.plot_importance(data=perm_results)
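
For readers unfamiliar with the technique, the single-pass idea can be sketched in plain Python: shuffle one feature column at a time and record how much the score drops. This is an illustrative sketch, not PyMint's implementation; the function names, the toy model, and the toy data are all assumptions made for the example. The multiple-pass variant, which keeps the top-ranked feature permuted and repeats the search over the remaining features, is not shown.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def single_pass_permutation_importance(predict, X, y, score_fn, seed=0):
    """Score drop when each column is shuffled on its own.

    predict  : callable mapping a list of rows to a list of predictions
    X        : list of rows (each row a list of feature values)
    score_fn : callable(y_true, y_pred) where larger is better
    """
    rng = random.Random(seed)
    baseline = score_fn(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        # Shuffle only column j, leaving every other column untouched.
        column = [row[j] for row in X]
        rng.shuffle(column)
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        importances.append(baseline - score_fn(y, predict(X_perm)))
    return importances


# Hypothetical toy model that only looks at feature 0; feature 1 is ignored.
def toy_predict(rows):
    return [int(row[0] > 0.5) for row in rows]


data_rng = random.Random(42)
X_toy = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y_toy = toy_predict(X_toy)  # labels the toy model predicts perfectly

scores = single_pass_permutation_importance(toy_predict, X_toy, y_toy, accuracy)
```

In this toy setup, shuffling the ignored feature leaves the predictions unchanged, so its importance is zero, while shuffling feature 0 lowers the accuracy.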

Sample notebook can be found here: Permutation Importance

Partial dependence and Accumulated Local Effects

To compute the expected functional relationship between a feature and an ML model's prediction, you can use partial dependence or accumulated local effects. There is also an option for second-order interaction effects. You can select the features manually, or run the permutation importance and let a built-in method retrieve the most important features. It is also possible to configure the plot for readable feature names.

# Assumes .permutation_importance has already been run and its output
# stored in perm_results (see the previous section).
important_vars = explainer.get_important_vars(perm_results, multipass=True, nvars=7)

ale = explainer.ale(features=important_vars, n_bins=20)
explainer.plot_ale(ale)

Additionally, you can use the same code snippet to compute the second-order ALE (see the notebook for more details).
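
As a rough illustration of what a partial dependence curve measures, the sketch below forces one feature to each value on a grid and averages the model's predictions over the dataset. It is a minimal plain-Python sketch under assumed names (`partial_dependence`, `linear_predict`, and the toy data are hypothetical), not PyMint's code. Accumulated local effects differ in that they accumulate local prediction differences within feature bins; that calculation is not shown here.

```python
def partial_dependence(predict, X, feature, grid):
    """Average prediction when `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        # Overwrite the chosen feature in every row, keep the rest as observed.
        X_mod = [row[:feature] + [value] + row[feature + 1:] for row in X]
        preds = predict(X_mod)
        curve.append(sum(preds) / len(preds))
    return curve


# Hypothetical toy model: f(x) = 2*x0 + x1
def linear_predict(rows):
    return [2.0 * row[0] + row[1] for row in rows]


X_pd = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
pd_curve = partial_dependence(linear_predict, X_pd, feature=0, grid=[0.0, 1.0, 2.0])
print(pd_curve)  # [2.0, 4.0, 6.0]
```

For this linear toy model the curve is a straight line with slope 2, offset by the mean of the second feature.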

Sample notebook can be found here:

Feature Contributions

To explain specific examples, you can use SHAP values. PyMint employs both KernelSHAP for any model and TreeSHAP for tree-based methods. In future work, PyMint will also include DeepSHAP for convolutional neural network-based models. PyMint can create the summary and dependence plots from the shap Python package, but adapts them for multiple predictors and an easier user interface. It is also possible to plot contributions for a single example or summarized by model performance.

import shap

# X and y are the features and targets returned by pymint.load_data() above.

# Contributions for a single example.
single_example = X.iloc[[0]]
explainer = pymint.InterpretToolkit(estimators=estimators[0], X=single_example)

background_dataset = shap.sample(X, 100)
results = explainer.local_contributions(method='shap', background_dataset=background_dataset)
fig = explainer.plot_contributions(results)

# Contributions summarized by model performance.
explainer = pymint.InterpretToolkit(estimators=estimators[0], X=X, y=y)

background_dataset = shap.sample(X, 100)
results = explainer.local_contributions(method='shap', background_dataset=background_dataset, performance_based=True)
fig = explainer.plot_contributions(results)

# SHAP summary plot.
explainer = pymint.InterpretToolkit(estimators=estimators[0], X=X, y=y)

background_dataset = shap.sample(X, 100)
results = explainer.shap(background_dataset=background_dataset)
shap_values, bias = results['Random Forest']
explainer.plot_shap(plot_type='summary', shap_values=shap_values)

from pymint.common import plotting_config

features = ['tmp2m_hrs_bl_frez', 'sat_irbt', 'sfcT_hrs_ab_frez', 'tmp2m_hrs_ab_frez', 'd_rad_d']
explainer.plot_shap(features=features,
                    plot_type='dependence',
                    shap_values=shap_values,
                    display_feature_names=plotting_config.display_feature_names,
                    display_units=plotting_config.display_units,
                    to_probability=True)
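
To give a sense of what a SHAP value represents, the sketch below computes exact Shapley values for one example by enumerating every feature coalition. This is not how PyMint or the shap package computes them: KernelSHAP approximates these values and uses a background dataset, whereas this sketch assumes a single baseline row and a tiny hypothetical model so that brute force is feasible. All names here are illustrative.

```python
from itertools import combinations
from math import factorial


def exact_shapley_values(predict, x, baseline):
    """Exact Shapley values for one example by enumerating all coalitions.

    Features outside a coalition are replaced by the baseline value.
    Cost grows as 2**n_features, so this only suits a handful of features.
    """
    n = len(x)

    def value(coalition):
        row = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict([row])[0]

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy model: f(x) = 2*x0 + 3*x1
def toy_model(rows):
    return [2.0 * row[0] + 3.0 * row[1] for row in rows]


phi = exact_shapley_values(toy_model, x=[1.0, 2.0], baseline=[0.0, 0.0])
print(phi)  # [2.0, 6.0]
```

The contributions sum to the difference between the prediction for the example and the prediction for the baseline, which is the additivity property that SHAP plots rely on.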

Sample notebook can be found here:

Tutorial notebooks

The notebooks provide the package documentation and demonstrate the PyMint API, which was used to create the above figures.
