metaquantus

MetaQuantus is an XAI performance tool for identifying reliable metrics.

Project description

An XAI performance tool for the identification of reliable metrics

This repository contains the code and experimental results for the paper The Meta-Evaluation Problem in Explainable AI: Identifying Reliable Estimators with MetaQuantus.

MetaQuantus is currently under active development. Carefully note the release version to ensure reproducibility of your work.

Motivation
Library
Installation
Getting started
MetaQuantus methodology
Reproduce the paper experiments

Motivation

In Explainable AI (XAI), the problem of meta-evaluation (i.e., the process of evaluating the evaluation method itself) arises as we select and quantitatively compare explanation methods for a given model, dataset and task, where the use of multiple metrics or evaluation techniques often leads to conflicting results. For example, scores from different metrics vary in both range and direction: depending on the metric, either a lower or a higher score indicates a higher-quality explanation. This makes it difficult for practitioners to interpret the scores and select the best explanation method.

As illustrated in the figure below, two metrics, Faithfulness Correlation (FC) (Bhatt et al., 2020) and Pixel-Flipping (PF) (Bach et al., 2015), rank the same explanation methods differently. For example, the Gradient method (Mørch et al., 1995; Baehrens et al., 2010) is ranked both the highest (R=1) and the lowest (R=3), depending on the metric used. From a practitioner's perspective, this causes confusion.
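
To make the ranking disagreement concrete, here is a minimal sketch with two hypothetical metrics that point in opposite directions. The scores, the names `metric_a` and `metric_b`, and the helper `rank_methods` are invented for illustration; they are not results from the paper or part of the MetaQuantus API.

```python
from typing import Dict


def rank_methods(scores: Dict[str, float], higher_is_better: bool) -> Dict[str, int]:
    """Assign rank 1 (best) to n (worst) to each explanation method."""
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    return {method: rank for rank, method in enumerate(ordered, start=1)}


# Made-up scores, for illustration only.
metric_a = {"Gradient": 0.62, "Saliency": 0.48, "IntegratedGradients": 0.35}  # higher is better
metric_b = {"Gradient": 0.91, "Saliency": 0.40, "IntegratedGradients": 0.55}  # lower is better

ranks_a = rank_methods(metric_a, higher_is_better=True)
ranks_b = rank_methods(metric_b, higher_is_better=False)

for method in metric_a:
    print(f"{method}: R={ranks_a[method]} under metric A, R={ranks_b[method]} under metric B")
```

With these toy numbers, the same method lands at the top under one metric and at the bottom under the other, which is the situation described above.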

With MetaQuantus, we address the problem of meta-evaluation by providing a simple yet comprehensive framework that evaluates metrics against two failure modes: resilience to noise (NR) and reactivity to adversaries (AR). In the same way that software systems undergo vulnerability and penetration tests before deployment, this open-source tool is designed to stress-test evaluation methods (e.g., as provided by Quantus).

Library

MetaQuantus is an open-source development tool for XAI researchers and Machine Learning (ML) practitioners to verify and benchmark newly constructed metrics (i.e., "quality estimators"). It offers an easy-to-use API that simplifies metric selection, so that explanation methods in XAI can be selected more reliably and with minimal code. MetaQuantus includes:

A series of pre-built tests, such as ModelPerturbationTest and InputPerturbationTest, that can be applied to various metrics
Supporting source code, such as for plotting and analysis
Various tutorials, e.g., Getting-Started-with-MetaQuantus and Reproduce-Paper-Experiments

Installation

If you already have PyTorch installed on your machine, the most lightweight version of MetaQuantus can be obtained from PyPI:

pip install metaquantus

Alternatively, you can clone a local copy and change into its folder:

git clone https://github.com/annahedstroem/MetaQuantus.git
cd MetaQuantus

And then install it locally:

pip install -e .

Alternatively, you can install the dependencies listed in requirements.txt:

pip install -r requirements.txt

Note that these installation options require that PyTorch is already installed on your machine.

Package requirements

The package requirements are as follows:

python>=3.7.0
pytorch>=1.10.1
quantus>=0.3.2
captum>=0.4.1
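
One way to check an existing environment against these minimums is sketched below, using only the standard library (`importlib.metadata`, Python 3.8+). It assumes that the "pytorch" requirement corresponds to the PyPI distribution named `torch`; the helpers `parse_version` and `check_requirements` are illustrative and not part of MetaQuantus. The version parsing is deliberately simple and ignores pre-release ordering.

```python
import itertools
from importlib import metadata
from typing import Dict, Optional, Tuple

# Minimum versions copied from the list above.
# Assumption: "pytorch" is installed under the distribution name "torch".
MINIMUM_VERSIONS = {"torch": "1.10.1", "quantus": "0.3.2", "captum": "0.4.1"}


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.10.1+cu113' into (1, 10, 1), dropping local and pre-release suffixes."""
    parts = []
    for piece in version.split("+")[0].split("."):
        digits = "".join(itertools.takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_requirements(minimums: Dict[str, str]) -> Dict[str, Optional[bool]]:
    """Map each package to True/False (meets the minimum) or None (not installed)."""
    results: Dict[str, Optional[bool]] = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            results[name] = None
            continue
        results[name] = parse_version(installed) >= parse_version(minimum)
    return results


if __name__ == "__main__":
    for package, ok in check_requirements(MINIMUM_VERSIONS).items():
        status = {True: "ok", False: "too old", None: "not installed"}[ok]
        print(f"{package}: {status}")
```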

Getting started

Please see Tutorial-Getting-Started-with-MetaQuantus.ipynb under the tutorials/ folder to get started. Note that the PyTorch framework and the XAI evaluation library Quantus are needed to run MetaQuantus.

Reproduce the paper experiments

To reproduce the results of this paper, you will need to follow these three steps:

1. Generate the dataset. Run the notebook Tutorial-Data-Generation-Experiments.ipynb to generate the necessary data for the experiments. This notebook will guide you through downloading and preprocessing the data and saving it as test sets. Please store the models in a folder called assets/models/ and the test sets under assets/test_sets/.
2. Run the experiments. To obtain the results for each experiment, run the corresponding Python script, as detailed below. All of these Python files are located in the scripts/ folder. If you want to run the experiments on other explanation methods, datasets or models, feel free to change the hyperparameters.
3. Analyse the results. Once the results are obtained for your chosen experiments, run Tutorial-Reproduce-Paper-Experiments.ipynb to analyse them. (The notebook itself also lists which specific Python scripts need to be run to obtain the results for this analysis step.)

Note. For all steps, make sure to adjust any local paths so that the appropriate files can be retrieved. Make sure that all the necessary packages are installed and that GPUs are enabled throughout, as this will speed up the experiments considerably. Please also note that the results may vary slightly depending on the random seed and other hyperparameters of the experiments, but the overall trends and conclusions should remain the same.
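
As a small convenience, the folder layout mentioned in the first step can be created up front. This is a sketch only: the helper name `prepare_asset_dirs` is hypothetical, and it assumes the folders live under the repository root, which may need adjusting to your local paths.

```python
from pathlib import Path
from typing import Dict


def prepare_asset_dirs(root: Path) -> Dict[str, Path]:
    """Create assets/models/ and assets/test_sets/ under root if they are missing."""
    dirs = {
        "models": root / "assets" / "models",
        "test_sets": root / "assets" / "test_sets",
    }
    for path in dirs.values():
        path.mkdir(parents=True, exist_ok=True)
    return dirs


if __name__ == "__main__":
    # Assumption: the script is started from the repository root.
    for name, path in prepare_asset_dirs(Path.cwd()).items():
        print(f"{name}: {path}")
```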

More details on step 2.

Test: Go to the root folder and run a simple test to check that meta-evaluation works.

python3 scripts/run_test.py --K=5 --iters=10 --dataset=MNIST

Application: Run the benchmarking experiments (also used for category convergence analysis).

python3 scripts/run_benchmarking.py --dataset=MNIST --fname=f --K=5 --iters=3
python3 scripts/run_benchmarking.py --dataset=fMNIST --fname=f --K=5 --iters=3
python3 scripts/run_benchmarking.py --dataset=cMNIST --fname=f --K=5 --iters=3
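
The three benchmarking calls above can also be launched from a single Python driver. The following is a minimal sketch: it uses only the script path and flags shown above, while the helper names `benchmarking_command` and `run_all` are hypothetical. By default it only prints the commands (dry run), since each real run may take a long time.

```python
import subprocess
from typing import List, Sequence


def benchmarking_command(dataset: str, fname: str = "f", K: int = 5, iters: int = 3) -> List[str]:
    """Build the argument list for one benchmarking run (flags as shown above)."""
    return [
        "python3",
        "scripts/run_benchmarking.py",
        f"--dataset={dataset}",
        f"--fname={fname}",
        f"--K={K}",
        f"--iters={iters}",
    ]


def run_all(datasets: Sequence[str], dry_run: bool = True) -> List[List[str]]:
    """Print (dry run) or execute the benchmarking command for each dataset."""
    commands = [benchmarking_command(dataset) for dataset in datasets]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops the loop if a run exits with a non-zero status.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    run_all(["MNIST", "fMNIST", "cMNIST"], dry_run=True)
```

Call `run_all(..., dry_run=False)` from the repository root to actually execute the scripts.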

Application: Run hyperparameter optimisation experiment.

python3 scripts/run_hp.py --dataset=MNIST --K=3 --iters=2
python3 scripts/run_hp.py --dataset=ImageNet --K=3 --iters=2

Experiment: Run the faithfulness ranking disagreement exercise.

python3 scripts/run_ranking.py --dataset=cMNIST --fname=f --K=5 --iters=3 --category=Faithfulness

Sanity-Check: Run sanity-checking exercise: adversarial estimators.

python3 scripts/run_sanity_checks.py --dataset=ImageNet --K=3 --iters=2

Sanity-Check: Run sanity-checking exercise: L dependency.

python3 scripts/run_l_dependency.py --dataset=MNIST --K=5 --iters=3
python3 scripts/run_l_dependency.py --dataset=fMNIST --K=5 --iters=3
python3 scripts/run_l_dependency.py --dataset=cMNIST --K=5 --iters=3

Release history

0.0.5 (Sep 13, 2023)
0.0.4 (May 10, 2023)
0.0.3 (Apr 1, 2023)
0.0.2 (Feb 28, 2023), this version
0.0.1 (Feb 14, 2023)
