Benchmark interpretability methods.

BIM - Benchmark Interpretability Method

This repository contains the dataset, models, and metrics for benchmarking interpretability methods (BIM) described in the paper:

* Title: "BIM: Towards Quantitative Evaluation of Interpretability Methods with Ground Truth"
* Authors: Sherry (Mengjiao) Yang, Been Kim

If you use this library, please cite:

    @Article{BIM2019,
      title = {{BIM: Towards Quantitative Evaluation of Interpretability Methods with Ground Truth}},
      author = {Yang, Mengjiao and Kim, Been},
      year = {2019}
    }

BIM datasets and models will be fully released by the end of June 2019.


The core of the BIM dataset, the obj and scene sets, is constructed by pasting object pixels from MSCOCO onto scene images from MiniPlaces. The obj set and the scene set have object labels and scene labels, respectively. In each set, val_loc.txt contains the x_min, y_min, x_max, y_max coordinates of the objects, and val_mask contains the objects' binary masks.
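To illustrate how the bounding boxes relate to attribution maps, here is a minimal sketch in plain Python. It assumes each line of val_loc.txt holds only the four numbers x_min, y_min, x_max, y_max separated by commas or spaces, and that the max edges are exclusive; the actual file layout may differ. The helper names `parse_box`, `box_to_mask`, and `mean_in_mask` are hypothetical and not part of the bim package.

```python
def parse_box(line):
    """Parse one line holding x_min, y_min, x_max, y_max.

    Assumption: the line contains only these four numbers,
    separated by commas and/or whitespace.
    """
    x_min, y_min, x_max, y_max = (
        int(float(v)) for v in line.replace(",", " ").split()
    )
    return x_min, y_min, x_max, y_max


def box_to_mask(box, height, width):
    """Build a binary mask (list of rows) that is 1 inside the box.

    Assumption: x_max and y_max are treated as exclusive edges.
    """
    x_min, y_min, x_max, y_max = box
    return [
        [1 if (x_min <= x < x_max and y_min <= y < y_max) else 0
         for x in range(width)]
        for y in range(height)
    ]


def mean_in_mask(attribution, mask):
    """Average the attribution values over pixels where the mask is 1."""
    values = [
        a
        for a_row, m_row in zip(attribution, mask)
        for a, m in zip(a_row, m_row)
        if m
    ]
    return sum(values) / len(values) if values else 0.0
```

The mean attribution inside the object region is one plausible per-image score that the metrics described later could be computed from.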

To compute the BIM metrics, we provide additional image sets described in the table below.

Download | Training | Validation | Usage | Description
-- | -- | -- | -- | --
obj | 90,000 | 10,000 | Model contrast | Objects and scenes with object labels
scene | 90,000 | 10,000 | Model contrast, Input dependence | Objects and scenes with scene labels
scene_only | 90,000 | 10,000 | Input dependence | Images in scene with objects removed
dog_bedroom | - | 200 | Relative model contrast | Dog in bedroom labeled as bedroom
bamboo_forest | - | 100 | Input independence | Scene-only bamboo forest
bamboo_forest_patch | - | 100 | Input independence | Bamboo forest with functionally insignificant dog patch


As shown in the figure above, the obj model is trained on object labels and the scene model is trained on scene labels. We also provide the model trained on scene-only images and a set of models where the object occurs in a different number of classes. All models are in TensorFlow's SavedModel format.

Download | obj | scene | scene_only | scene1 | scene2 | scene3 | scene4 | scene5 | scene6 | scene7 | scene8 | scene9 | scene10
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --


BIM metrics compare how interpretability methods perform across models (model contrast), across inputs to the same model (input dependence), and across functionally equivalent inputs to the same model (input independence).

Model contrast scores

Given images that contain both objects and scenes, model contrast measures the difference in attributions between the model trained on object labels and the model trained on scene labels.
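As a rough sketch of this idea, the function below takes one score per image for each model (for example, the mean attribution inside the object region) and returns the difference of the two averages. The function name, the choice of per-image score, and the sign convention (object model minus scene model) are assumptions for illustration, not the package's implementation.

```python
def model_contrast_score(obj_model_scores, scene_model_scores):
    """Difference in average object attribution between two models.

    Each list holds one score per image, computed on the same images:
    obj_model_scores from the model trained on object labels,
    scene_model_scores from the model trained on scene labels.
    """
    if not obj_model_scores or len(obj_model_scores) != len(scene_model_scores):
        raise ValueError("score lists must be non-empty and of equal length")
    mean_obj = sum(obj_model_scores) / len(obj_model_scores)
    mean_scene = sum(scene_model_scores) / len(scene_model_scores)
    # Assumed sign convention: positive when the object model attributes
    # more importance to the object than the scene model does.
    return mean_obj - mean_scene
```

Under this convention, a larger positive value would suggest that the interpretability method separates the two models more clearly.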

Input dependence rate

Given a model trained on scene labels, input dependence measures the fraction of images in which the object region is attributed as less important than the same region when the object is absent.
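One way to read this definition is as a simple count over paired images. The sketch below assumes one score per image for the object region, computed once with the object present and once on the matching image with the object removed; the function name and the pairing of images are assumptions, not the package's API.

```python
def input_dependence_rate(with_object_scores, without_object_scores):
    """Fraction of image pairs where the region scores lower with the object.

    with_object_scores[i] and without_object_scores[i] are attribution
    scores for the same region of the same scene, with and without the
    pasted object.
    """
    if not with_object_scores or len(with_object_scores) != len(without_object_scores):
        raise ValueError("score lists must be non-empty and of equal length")
    hits = sum(
        1
        for with_obj, without_obj in zip(with_object_scores, without_object_scores)
        if with_obj < without_obj
    )
    return hits / len(with_object_scores)
```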

Input independence rate

Given a model trained on scene-only images, input independence measures the fraction of images for which adding a functionally insignificant patch (e.g., a dog) does not significantly change the explanation.
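A minimal sketch of this rate follows. It assumes "significantly" is decided by a fixed threshold on the absolute change in a per-image score of the patch region; the threshold parameter and the function name are hypothetical choices for illustration and may not match how the package decides significance.

```python
def input_independence_rate(clean_scores, patched_scores, threshold):
    """Fraction of image pairs whose score changes by less than threshold.

    clean_scores[i] is the attribution score for the patch region on the
    original image; patched_scores[i] is the score for the same region
    after the functionally insignificant patch is added.
    """
    if not clean_scores or len(clean_scores) != len(patched_scores):
        raise ValueError("score lists must be non-empty and of equal length")
    hits = sum(
        1
        for clean, patched in zip(clean_scores, patched_scores)
        if abs(patched - clean) < threshold
    )
    return hits / len(clean_scores)
```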


Run pip install bim to install the Python dependencies. You can choose to run to download the entire dataset and all models specified above, or follow the download link for a particular dataset or model and extract the tar.gz to the corresponding data or models directory. Then you can run python3 --metrics=MCS --num_imgs=10 to compute the model contrast scores (MCS) over 10 randomly sampled images. Since computing saliency maps for a large number of input images can take a while, we also provide precomputed attributions. To compute BIM metrics using precomputed attributions, run python3 --metrics=MCS --scratch=0
