Dynamask - Explaining Time Series Predictions with Dynamic Masks

License: MIT

Code Author: Jonathan Crabbé (jc2133@cam.ac.uk)

This repository contains the implementation of Dynamask, a method that identifies which features are salient for a model's prediction when the input data is a time series. For the theory behind the method, please read our ICML 2021 paper: 'Explaining Time Series Predictions with Dynamic Masks'.

Some of the experiments in our paper rely on FIT, a separate repository associated with the NeurIPS 2020 paper: 'What went wrong and when? Instance-wise feature importance for time-series black-box models'. We have included all the relevant files in the folder fit.

:rocket: Installation

The library requires libpq-dev.

The library can be installed from PyPI using

$ pip install dynamask

or from source, using

$ pip install .

Toy example

Fitting a mask to a time series model takes only a few lines. Below is a toy demonstration in which we fit a mask to an input time series. In this case, the mask area is fixed to 0.1, so the mask highlights the 10% most important features. All the relevant code can be found in the file mask.

import torch
from dynamask.attribution.mask import Mask
from dynamask.attribution.perturbation import GaussianBlur
from dynamask.utils.losses import mse

torch.manual_seed(42)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Define a pseudo-black box:
def black_box(input):
    output = input[-1, :]  # The black-box returns the features of the last time step
    return output

# Define a random input:
X = torch.randn(10, 3).to(device)  # The shape of the input has to be (T, N_features)

# Fit a mask to the input with a Gaussian Blur perturbation:
pert = GaussianBlur(device)
mask = Mask(pert, device)
mask.fit(X, black_box, loss_function=mse, keep_ratio=0.1, size_reg_factor_init=0.01) # Select the 10% most important features

# Plot the resulting saliency map:
mask.plot_mask()
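
To see why an area of 0.1 suits this toy problem, note that the input has 10 x 3 = 30 entries and the black box only reads the 3 entries of the last time step. The sketch below builds that mask by hand. It is an illustration only: it does not call Dynamask, it uses plain Python lists instead of tensors so that it runs without PyTorch, and it assumes a simple zeroing perturbation in place of GaussianBlur.

```python
import random

random.seed(42)
T, N_FEATURES = 10, 3

# Random input of shape (T, N_features), stored as a list of rows.
X = [[random.gauss(0.0, 1.0) for _ in range(N_FEATURES)] for _ in range(T)]


def black_box(inp):
    # Same toy black box as above: it returns the features of the last time step.
    return inp[-1]


# Hand-written binary mask: 1 on the last time step, 0 everywhere else.
ideal_mask = [[1.0 if t == T - 1 else 0.0 for _ in range(N_FEATURES)] for t in range(T)]

# Mask area = fraction of highlighted entries: 3 out of 30.
area = sum(sum(row) for row in ideal_mask) / (T * N_FEATURES)

# Simplified perturbation (an assumption, not GaussianBlur): zero the masked-off entries.
X_pert = [[m * x for m, x in zip(m_row, x_row)] for m_row, x_row in zip(ideal_mask, X)]

print(area)  # 0.1
print(black_box(X_pert) == black_box(X))  # True
```

Because perturbing everything outside the last time step leaves the black-box output unchanged, one would expect a mask fitted with keep_ratio=0.1 to concentrate on that last row.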

If the proportion of features to select is unknown, a good approach is to fit a group of masks with different areas. Then, the extremal mask can be extracted from the group. The relevant code can be found in the file mask_group.

import torch
from dynamask.attribution.mask_group import MaskGroup
from dynamask.attribution.perturbation import GaussianBlur
from dynamask.utils.losses import mse

torch.manual_seed(42)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Define a pseudo-black box:
def black_box(input):
    output = input[-1, :]  # The black-box returns the features of the last time step
    return output

# Define a random input:
X = torch.randn(10, 3).to(device) # The shape of the input has to be (T, N_features)

# Fit a group of masks to the input with a Gaussian Blur perturbation:
areas = [.1, .15, .2, .25] # These are the areas of the different masks
pert = GaussianBlur(device)
masks = MaskGroup(pert, device)
masks.fit(X, black_box, loss_function=mse, area_list=areas, size_reg_factor_init=0.01)

# Extract the extremal mask:
epsilon = 0.01
mask = masks.get_extremal_mask(threshold=epsilon)

# Plot the resulting saliency map:
mask.plot_mask()
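
As a rough illustration of the idea behind the extremal mask, the sketch below picks, among masks of different areas, the smallest area whose error falls below the threshold. The function pick_extremal_area, its fallback rule, and the error values are hypothetical; they are not the code of get_extremal_mask.

```python
def pick_extremal_area(areas, errors, threshold):
    """Return the smallest area whose error is below `threshold`.

    If no mask reaches the threshold, fall back to the area with the
    lowest error (an assumption made for this sketch).
    """
    candidates = [a for a, e in zip(areas, errors) if e < threshold]
    if candidates:
        return min(candidates)
    return min(zip(errors, areas))[1]


areas = [0.1, 0.15, 0.2, 0.25]
errors = [0.08, 0.03, 0.008, 0.002]  # hypothetical errors, one per mask

print(pick_extremal_area(areas, errors, threshold=0.01))  # 0.2
```

A larger threshold accepts a smaller mask, so epsilon trades off how sparse the explanation is against how faithfully the masked input reproduces the prediction.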

Replicate experiments

All experiments in the ICML paper can be replicated. The necessary code is in experiments. Below, we detail the procedure for each experiment.

Scripts to run the experiments are also provided: experiments/run_<EXPERIMENT>.sh.

Replicate the Rare experiments

  1. Run the following commands from the repository folder:
    python -m dynamask.experiments.rare_feature # Runs the Rare Feature experiment
    python -m dynamask.experiments.rare_time # Runs the Rare Time experiment
    
    To repeat the experiment with different seeds, add the following option to these commands:
    Options:
    --cv # An integer that sets the random seed (first run cv=0, second run cv=1, ...)
    
  2. The results of these experiments are saved in the two following folders: Rare Feature and Rare Time. To process the results and compute the associated metrics run:
    python -m dynamask.experiments.results.rare_feature.get_results
    python -m dynamask.experiments.results.rare_time.get_results
    
    The following options need to be specified:
    Options:
    --CV # The number of runs you have done for the experiment
    --explainers # The baselines you have used among: dynamask, fo, fp, ig, shap (separated by a space)
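
The two steps above can be chained for several seeds. The sketch below only builds and prints the command lines; the helper rare_commands is hypothetical, and it assumes that the value passed to --CV equals the number of --cv runs (0, 1, ..., n_runs - 1), as the option descriptions suggest.

```python
def rare_commands(experiment, n_runs, explainers):
    """Build the commands for one Rare experiment ('rare_feature' or 'rare_time')."""
    module = f"dynamask.experiments.{experiment}"
    # One run per seed: --cv 0, --cv 1, ...
    runs = [["python", "-m", module, "--cv", str(cv)] for cv in range(n_runs)]
    # Post-processing over all runs and the chosen explainers.
    results = [
        "python", "-m", f"dynamask.experiments.results.{experiment}.get_results",
        "--CV", str(n_runs),
        "--explainers", *explainers,
    ]
    return runs + [results]


for cmd in rare_commands("rare_feature", 3, ["dynamask", "fo", "fp"]):
    # Swap print for subprocess.run(cmd, check=True) to actually execute.
    print(" ".join(cmd))
```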
    

Replicate the State experiment

  1. Run this command to generate the synthetic data and store it in data/state:

    python -m dynamask.fit.data_generator.state_data --signal_len 200 --signal_num 1000
    
  2. Run the following command to fit a model together with a baseline saliency method:

    python -m dynamask.fit.evaluation.baselines --explainer fit --train
    

    To run the experiment with other baselines, change the explainer:

    Options:
    --explainer # The baselines can be: fit, lime, retain, integrated_gradient, deep_lift, fo, afo, gradient_shap
    --train # Use this option only when fitting the FIRST baseline (it avoids retraining the model for every baseline); the retain baseline always requires it
    --cv # An integer that sets the random seed (first run cv=0, second run cv=1, ...)
    
  3. The models and the baseline saliency maps are all saved in this folder. Now fit a mask for each of these time series by running:

    python -m dynamask.experiments.state
    

    Please use the same --cv option as for the previous command.

  4. The masks are all saved in this folder. To process the results and compute the associated metrics run:

    python -m dynamask.experiments.results.state.get_results
    

    The following options need to be specified:

    Options:
    --CV # The number of runs you have done for the experiment
    --explainers # The baselines you have used among: dynamask, fo, afo, deep_lift, fit, gradient_shap, integrated_gradient, lime, retain (separated by a space)
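
The --train rule from step 2 is easy to get wrong when looping over baselines. The sketch below encodes it as described above: --train for the first baseline, and again for retain. The helper baseline_commands is hypothetical and only builds the command lines without running them.

```python
def baseline_commands(explainers, cv=0):
    """Build one baselines command per explainer for a given seed."""
    commands = []
    for position, explainer in enumerate(explainers):
        cmd = [
            "python", "-m", "dynamask.fit.evaluation.baselines",
            "--explainer", explainer,
            "--cv", str(cv),
        ]
        # Train the model once (first baseline); retain always needs --train.
        if position == 0 or explainer == "retain":
            cmd.append("--train")
        commands.append(cmd)
    return commands


for cmd in baseline_commands(["fit", "lime", "retain", "fo"], cv=0):
    print(" ".join(cmd))
```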
    

Replicate the MIMIC experiment

  1. MIMIC-III is a private dataset. For the following, you need to have the MIMIC-III database running on a local server. For more information, please refer to the official MIMIC-III documentation.

  2. Run this command to acquire the data and store it:

    python dynamask/fit/data_generator/icu_mortality.py --sqluser YOUR_USER --sqlpass YOUR_PASSWORD
    

    If everything runs correctly, two files named adult_icu_vital.gz and adult_icu_lab.gz are stored in data/mimic.

  3. Run this command to preprocess the data:

    python dynamask/fit/data_generator/data_preprocess.py
    

    If everything runs correctly, a file patient_vital_preprocessed.pkl is stored in data/mimic.

  4. Run the following command to fit a model together with a baseline saliency method:

    python -m dynamask.fit.evaluation.baselines --data mimic --explainer fit --train
    

    To run the experiment with other baselines, change the explainer:

    Options:
    --explainer # The baselines can be: fit, lime, retain, integrated_gradient, deep_lift, fo, afo, gradient_shap
    --train # Use this option only when fitting the FIRST baseline (it avoids retraining the model for every baseline); the retain baseline always requires it
    --cv # An integer that sets the random seed (first run cv=0, second run cv=1, ...)
    
  5. The models and the baseline saliency maps are all saved in this folder. Now fit a mask for each of these time series by running:

    python -m dynamask.experiments.mimic
    

    Please use the same --cv option as for the previous command.

    Options:
    --cv # Same as in the previous command
    --area # The area of the mask to fit (a number between 0 and 1)
    
  6. The masks are all saved in this folder. To process the results and compute the associated metrics run:

    python -m dynamask.experiments.results.mimic.plot_benchmarks
    

    The following options need to be specified:

    Options:
    --CV # The number of runs you have done for the experiment
    --explainers # The baselines you have used among: dynamask, fo, afo, deep_lift, fit, gradient_shap, integrated_gradient, lime, retain (separated by a space)
    --areas # The mask areas that you have computed (separated by a space)
    

    The resulting plots are saved in this folder.
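
Steps 2 and 3 above each leave specific files in data/mimic, so a quick check before training can save a failed run. The sketch below reports which of those files are missing; the helper missing_outputs is hypothetical, and the default path assumes the script is run from the repository folder.

```python
from pathlib import Path

# Files each step is expected to leave in data/mimic, as listed above.
EXPECTED_OUTPUTS = {
    "icu_mortality.py": ["adult_icu_vital.gz", "adult_icu_lab.gz"],
    "data_preprocess.py": ["patient_vital_preprocessed.pkl"],
}


def missing_outputs(data_dir="data/mimic"):
    """Map each script to the expected output files that are not in data_dir."""
    data_dir = Path(data_dir)
    report = {}
    for script, files in EXPECTED_OUTPUTS.items():
        missing = [name for name in files if not (data_dir / name).exists()]
        if missing:
            report[script] = missing
    return report


for script, files in missing_outputs().items():
    print(f"{script}: missing {', '.join(files)}")
```

An empty report means both steps produced their files and the baselines command of step 4 can be started.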

:hammer: Tests

Install the testing dependencies using

pip install .[testing]

The tests can be executed using

pytest -vsx

Citing

If you use this code, please cite the associated paper:

@InProceedings{pmlr-v139-crabbe21a,
  title = 	 {Explaining Time Series Predictions with Dynamic Masks},
  author =       {Crabb{\'e}, Jonathan and Van Der Schaar, Mihaela},
  booktitle = 	 {Proceedings of the 38th International Conference on Machine Learning},
  pages = 	 {2166--2177},
  year = 	 {2021},
  editor = 	 {Meila, Marina and Zhang, Tong},
  volume = 	 {139},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {18--24 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v139/crabbe21a/crabbe21a.pdf},
  url = 	 {https://proceedings.mlr.press/v139/crabbe21a.html},
  abstract = 	 {How can we explain the predictions of a machine learning model? When the data is structured as a multivariate time series, this question induces additional difficulties such as the necessity for the explanation to embody the time dependency and the large number of inputs. To address these challenges, we propose dynamic masks (Dynamask). This method produces instance-wise importance scores for each feature at each time step by fitting a perturbation mask to the input sequence. In order to incorporate the time dependency of the data, Dynamask studies the effects of dynamic perturbation operators. In order to tackle the large number of inputs, we propose a scheme to make the feature selection parsimonious (to select no more feature than necessary) and legible (a notion that we detail by making a parallel with information theory). With synthetic and real-world data, we demonstrate that the dynamic underpinning of Dynamask, together with its parsimony, offer a neat improvement in the identification of feature importance over time. The modularity of Dynamask makes it ideal as a plug-in to increase the transparency of a wide range of machine learning models in areas such as medicine and finance, where time series are abundant.}
}
