BIM - Benchmark Interpretability Methods
This repository contains the dataset, models, and metrics for benchmarking interpretability methods (BIM) described in the paper:

* Title: "BIM: Towards Quantitative Evaluation of Interpretability Methods with Ground Truth"
* Authors: Sherry (Mengjiao) Yang, Been Kim
Upon using this library, please cite:

```
@Article{BIM2019,
  title  = {{BIM: Towards Quantitative Evaluation of Interpretability Methods with Ground Truth}},
  author = {Yang, Mengjiao and Kim, Been},
  year   = {2019}
}
```
The BIM datasets and models will be fully released by the end of June 2019.
Dataset
The core of the BIM dataset, obj and scene, is constructed by pasting object pixels from MSCOCO onto scene images from MiniPlaces. The obj set and the scene set carry object labels and scene labels, respectively. In each set, val_loc.txt contains the x_min, y_min, x_max, y_max coordinates of the objects, and val_mask contains the objects' binary masks.
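For orientation, here is a minimal sketch of how these annotations could be read. The exact column layout of val_loc.txt and the file paths are assumptions, so adjust the parsing to match the released files.

```python
from PIL import Image
import numpy as np

def load_locations(loc_path="data/obj/val_loc.txt"):
    """Parse bounding boxes, assuming one 'filename x_min y_min x_max y_max' entry per line."""
    boxes = {}
    with open(loc_path) as f:
        for line in f:
            name, x_min, y_min, x_max, y_max = line.split()
            boxes[name] = tuple(int(float(v)) for v in (x_min, y_min, x_max, y_max))
    return boxes

def load_mask(mask_path):
    """Load a binary object mask as a 0/1 numpy array."""
    return (np.array(Image.open(mask_path)) > 0).astype(np.uint8)
```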
To compute the BIM metrics, we provide additional image sets described in the table below.
Download | Training | Validation | Usage | Description
---|---|---|---|---
obj | 90,000 | 10,000 | Model contrast | Objects and scenes with object labels
scene | 90,000 | 10,000 | Model contrast, Input dependence | Objects and scenes with scene labels
scene_only | 90,000 | 10,000 | Input dependence | Images in scene with objects removed
dog_bedroom | - | 200 | Relative model contrast | Dog in bedroom labeled as bedroom
bamboo_forest | - | 100 | Input independence | Scene-only bamboo forest
bamboo_forest_patch | - | 100 | Input independence | Bamboo forest with a functionally insignificant dog patch
Models
The obj model is trained on object labels and the scene model is trained on scene labels. We also provide a model trained on scene-only images and a set of models where the object appears in a varying number of scene classes. All models are in TensorFlow's SavedModel format.
Download | obj | scene | scene_only | scene1 | scene2 | scene3 | scene4 | scene5 | scene6 | scene7 | scene8 | scene9 | scene10
--|--|--|--|--|--|--|--|--|--|--|--|--|--
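As a quick check that a downloaded model loads, the sketch below uses TensorFlow 2's SavedModel loader. The path models/obj is an assumed example based on the layout above, and since the 2019 release predates TF2 you may need the tf.compat.v1 loader instead.

```python
import tensorflow as tf

# Load one of the released SavedModels (path is an assumed example).
model = tf.saved_model.load("models/obj")

# SavedModels expose their entry points through named signatures;
# "serving_default" is the conventional default key.
infer = model.signatures["serving_default"]
print(infer.structured_input_signature)
print(infer.structured_outputs)
```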
Metrics
BIM metrics compare how interpretability methods perform across models (model contrast), across inputs to the same model (input dependence), and across functionally equivalent inputs to the same model (input independence).
Model contrast scores
Given images that contain both objects and scenes, model contrast measures the difference in attributions between the model trained on object labels and the model trained on scene labels.
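One way to make this concrete is to compare the fraction of attribution mass each model places inside the object mask, as in the sketch below. This normalization is an illustrative assumption; metrics.py defines the exact score.

```python
import numpy as np

def object_attribution_fraction(saliency, mask):
    """Fraction of total absolute attribution that falls inside the object mask."""
    saliency = np.abs(saliency)
    return saliency[mask > 0].sum() / (saliency.sum() + 1e-8)

def model_contrast_score(saliency_obj_model, saliency_scene_model, mask):
    """Positive when the obj model attributes more to the object than the scene model does."""
    return (object_attribution_fraction(saliency_obj_model, mask)
            - object_attribution_fraction(saliency_scene_model, mask))
```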
Input dependence rate
Given a model trained on scene labels, the input dependence rate measures the fraction of images in which the object region is attributed as less important when the object is present than when it is absent.
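A minimal sketch of that comparison, assuming per-image saliency maps for the scene and scene_only versions of each input and using mean absolute attribution as the importance measure (an assumption; metrics.py is authoritative):

```python
import numpy as np

def region_importance(saliency, mask):
    """Mean absolute attribution inside the object region."""
    return np.abs(saliency)[mask > 0].mean()

def input_dependence_rate(saliency_with_obj, saliency_without_obj, masks):
    """Fraction of image pairs where the object region matters less with the object present."""
    hits = [region_importance(s_with, m) < region_importance(s_without, m)
            for s_with, s_without, m in zip(saliency_with_obj, saliency_without_obj, masks)]
    return float(np.mean(hits))
```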
Input independence rate
Given a model trained on scene-only images, the input independence rate measures the fraction of inputs for which a functionally insignificant patch (e.g., a dog) does not significantly change the explanation.
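A hedged sketch, under the assumption that "does not significantly change" is tested by thresholding the normalized difference between the saliency maps of the clean and patched images (e.g., bamboo_forest vs. bamboo_forest_patch); the actual similarity measure and threshold live in metrics.py.

```python
import numpy as np

def input_independence_rate(saliency_clean, saliency_patched, threshold=0.1):
    """Fraction of image pairs whose saliency maps barely change when the patch is added."""
    hits = []
    for s_clean, s_patched in zip(saliency_clean, saliency_patched):
        # Normalized L1 difference between the two attribution maps.
        diff = np.abs(s_clean - s_patched).sum() / (np.abs(s_clean).sum() + 1e-8)
        hits.append(diff < threshold)
    return float(np.mean(hits))
```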
Examples
Run

```
pip install bim
```

to install the Python dependencies. You can either run

```
download.sh
```

to download the entire dataset and all models listed above, or follow the download link for a particular dataset or model and extract the tar.gz into the corresponding data or models directory. Then run

```
python3 metrics.py --metrics=MCS --num_imgs=10
```

to compute the model contrast scores (MCS) over 10 randomly sampled images. Since computing saliency maps for a large number of input images can take a while, we also provide precomputed attributions. To compute BIM metrics using the precomputed attributions, run

```
python3 metrics.py --metrics=MCS --scratch=0
```