
Benchmarking tool for Counterfactual Explanations

Project description

Universal Counterfactual Benchmark Framework

The fastest way to test your tabular counterfactual explanations, evaluating them on 22 different datasets/models. All models are Keras/TensorFlow neural networks.

Ranking

You can see the ranking of the best counterfactual explanation generation algorithms in this repository: https://github.com/rmazzine/Ranking-Tabular-CF

Installation

pip install cfbench

Usage

This code simply runs the counterfactual generator my_cf_generator on all factual instances and models, without creating any data, performing any analysis, or submitting results to the benchmark. If you want to do that, see the examples under Further information below.

import numpy as np
from cfbench.cfbench import BenchmarkCF

# A simple CF generator: if the factual class is 1,
# return an all-zeros array; otherwise, return an all-ones array
def my_cf_generator(factual_array, model):
    if model.predict(np.array([factual_array]))[0][0] > 0.5:
        return [0]*len(factual_array)
    else:
        return [1]*len(factual_array)

# Create Benchmark Generator
benchmark_generator = BenchmarkCF().create_generator()

# The Benchmark loop
for benchmark_data in benchmark_generator:
    # Get factual array
    factual_array = benchmark_data['factual_oh']
    # Get Keras TensorFlow model
    model = benchmark_data['model']

    # Create CF
    cf = my_cf_generator(factual_array, model)

    # Get Evaluator
    evaluator = benchmark_data['cf_evaluator']
    # Evaluate CF
    evaluator(cf, verbose=True, algorithm_name="my_cf_generator")

Further information

We understand that different counterfactual generators need different data, so our generator provides the data described in the following table:


The BenchmarkCF().create_generator() method returns a generator that provides the following data:

| Key | Type | Description |
| --- | --- | --- |
| factual_oh | list | Factual data, one-hot encoded (if there are categorical features) |
| model | tf.Keras.Model | Model to be explained |
| factual | list | Factual data (WITHOUT one-hot encoding) |
| num_feats | list | Indexes of the numerical (continuous) features |
| cat_feats | list | Indexes of the categorical features |
| cf_evaluator | BenchmarkGenerator.cf_evaluator | Evaluates whether the CF is indeed a CF; returns [True, cf_array] if it is a CF and [False, nan_array] otherwise |
| oh_converter | cfbench.cfg.OHConverter.Converter | Converts to one-hot encoding (.convert_to_oh) or from one-hot encoding (.convert) |
| df_train | pandas.DataFrame | Dataframe of the model's training data (WITHOUT one-hot encoding) |
| df_oh_train | pandas.DataFrame | Dataframe of the model's training data (WITH one-hot encoding) |
| df_test | pandas.DataFrame | Dataframe of the model's test data (WITHOUT one-hot encoding) |
| df_oh_test | pandas.DataFrame | Dataframe of the model's test data (WITH one-hot encoding) |
| df_factual | pandas.DataFrame | Dataframe of the factual data (WITHOUT one-hot encoding) |
| tf_session | tf.Session | TensorFlow session |
| factual_idx | int | Index of the factual data in the factual dataset |
| factual_class | int | Model's prediction (0 or 1) for the factual data |
| dsname | str | Name of the dataset |
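
For example, a generator that needs both encodings and the model's training data can read them directly from the generator output. The sketch below only uses keys and methods documented in the table above; the assumption that oh_converter's .convert_to_oh and .convert accept a single instance (a plain list) is ours, so treat this as an illustration rather than a reference implementation.

from cfbench.cfbench import BenchmarkCF

benchmark_generator = BenchmarkCF().create_generator()

for benchmark_data in benchmark_generator:
    # The same factual instance in both encodings
    factual = benchmark_data['factual']        # without one-hot encoding
    factual_oh = benchmark_data['factual_oh']  # with one-hot encoding

    # Indexes of numerical (continuous) and categorical features
    num_feats = benchmark_data['num_feats']
    cat_feats = benchmark_data['cat_feats']

    # Training data without one-hot encoding, e.g. to derive feature ranges
    df_train = benchmark_data['df_train']

    # Converter between the two encodings (assumed here to work on single instances)
    oh_converter = benchmark_data['oh_converter']
    factual_oh_again = oh_converter.convert_to_oh(factual)
    factual_back = oh_converter.convert(factual_oh)

    # ... generate and evaluate the counterfactual here, as in the Usage example ...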

I want to get general metrics of my counterfactual generator

If you want to get general metrics (coverage, sparsity, L2 distance, mean absolute deviation, Mahalanobis distance, and generation time), you can use the sample code below.


To generate a global analysis, you must create the experiment data with the evaluator (benchmark_data['cf_evaluator']), passing cf_generation_time (the time it took to generate the CF) and setting save_results to True. This creates the data to be analyzed in the folder ./cfbench_results/.

Then, the analyze_results method performs the global analysis, writes a dataframe with all processed results to the folder ./cfbench_results_processed/, and prints the global metrics to the console.

import time
import numpy as np
from cfbench.cfbench import BenchmarkCF, analyze_results

# A simple CF generator: if the factual class is 1,
# return an all-zeros array; otherwise, return an all-ones array
def my_cf_generator(factual_array, model):
    if model.predict(np.array([factual_array]))[0][0] > 0.5:
        return [0]*len(factual_array)
    else:
        return [1]*len(factual_array)

# Create Benchmark Generator
benchmark_generator = BenchmarkCF().create_generator()

# The Benchmark loop
for benchmark_data in benchmark_generator:
    # Get factual array
    factual_array = benchmark_data['factual_oh']
    # Get Keras TensorFlow model
    model = benchmark_data['model']

    # Create the CF, measuring how long generation takes
    start_time = time.time()
    cf = my_cf_generator(factual_array, model)
    cf_generation_time = time.time() - start_time

    # Get Evaluator
    evaluator = benchmark_data['cf_evaluator']
    # Evaluate CF
    evaluator(
        cf_out=cf,
        algorithm_name='my_cf_generator',
        cf_generation_time=cf_generation_time,
        save_results=True)

analyze_results('my_cf_generator')

I want to rank my algorithm

If you want to compare your algorithm with others, you can use the code below.


To correctly send the results, you must create the experiment data with the evaluator (benchmark_data['cf_evaluator']), passing cf_generation_time (the time it took to generate the CF) and setting save_results to True. This creates the data to be sent in the folder ./cfbench_results/.

After the experiment loop, you must call the send_results function to send the results to the server.

This function will also create a file with the processed results of your algorithm in the folder ./cfbench_results_processed/.

import time
import numpy as np
from cfbench.cfbench import BenchmarkCF, send_results

# A simple CF generator: if the factual class is 1,
# return an all-zeros array; otherwise, return an all-ones array
def my_cf_generator(factual_array, model):
    if model.predict(np.array([factual_array]))[0][0] > 0.5:
        return [0]*len(factual_array)
    else:
        return [1]*len(factual_array)

# Create Benchmark Generator
benchmark_generator = BenchmarkCF().create_generator()

# The Benchmark loop
for benchmark_data in benchmark_generator:
    # Get factual array
    factual_array = benchmark_data['factual_oh']
    # Get Keras TensorFlow model
    model = benchmark_data['model']

    # Create the CF, measuring how long generation takes
    start_time = time.time()
    cf = my_cf_generator(factual_array, model)
    cf_generation_time = time.time() - start_time

    # Get Evaluator
    evaluator = benchmark_data['cf_evaluator']
    # Evaluate CF
    evaluator(
        cf_out=cf,
        algorithm_name='my_cf_generator',
        cf_generation_time=cf_generation_time,
        save_results=True)

send_results('my_cf_generator')

After running the experiments and generating the analysis, you must fork this repository.

Then, you must provide the SSH path to your forked repo and, finally, make a pull request to the main repository.

All of these details are covered, step by step, during the submission process.

TensorFlow Version compatibility

This framework is intended to be compatible with both TensorFlow 1 and TensorFlow 2; however, problems can still arise. If you encounter any, please open an issue.
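
If you run into a version-related error, a quick first check (using only standard TensorFlow APIs, independent of cfbench) is to confirm which major version is installed and, if necessary, fall back to TF1-style behavior through the compatibility layer:

import tensorflow as tf

# Show the installed TensorFlow version
print(tf.__version__)

# Under TensorFlow 2, the v1 compatibility layer can emulate TF1 behavior.
# Note: this is a generic TensorFlow workaround, not a cfbench-specific switch.
if tf.__version__.startswith('2'):
    tf.compat.v1.disable_v2_behavior()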

Reference

If you used this package in your experiments, here is the reference paper:

@Article{app11167274,
AUTHOR = {de Oliveira, Raphael Mazzine Barbosa and Martens, David},
TITLE = {A Framework and Benchmarking Study for Counterfactual Generating Methods on Tabular Data},
JOURNAL = {Applied Sciences},
VOLUME = {11},
YEAR = {2021},
NUMBER = {16},
ARTICLE-NUMBER = {7274},
URL = {https://www.mdpi.com/2076-3417/11/16/7274},
ISSN = {2076-3417},
DOI = {10.3390/app11167274}
}

0.0.9 / 2022-09-09

  • Allow disabling/enabling (enabled by default) the TF session request

0.0.7 / 2022-09-03

  • Allow enabling/disabling TensorFlow 2 behavior

0.0.7 / 2022-09-02

  • [BUGFIX] Remove binary features from one-hot encoding evaluation
  • [BUGFIX] Fix validity calculation

0.0.6 / 2022-09-01

  • Allow adding initial and final indexes

0.0.5 / 2022-09-01

  • Fix metric reporting
  • Add experiment progress information

0.0.4 / 2022-08-31

  • Implement method to send results to ranking repository
  • Implement global metrics method
  • Update README.md

0.0.3 / 2022-08-29

  • Fix data files

0.0.2 / 2022-08-29

  • Simplified interface to run the benchmark with module cfbench
  • Updated README.md


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cfbench-0.0.9.tar.gz (21.1 kB)

Uploaded Source

Built Distribution

cfbench-0.0.9-py3-none-any.whl (21.3 kB)

Uploaded Python 3

File details

Details for the file cfbench-0.0.9.tar.gz.

File metadata

  • Download URL: cfbench-0.0.9.tar.gz
  • Upload date:
  • Size: 21.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.4

File hashes

Hashes for cfbench-0.0.9.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 3637dc01846b08ecdfbda11aaabd99f54aa76654e6681b29ca040b6661c280fc |
| MD5 | f73bcf07f28dff7b067ac8459b3b59a0 |
| BLAKE2b-256 | 273858d67808de157323c895555c8cd56005b171d9cb107312f1657007b0b70b |

See more details on using hashes here.

File details

Details for the file cfbench-0.0.9-py3-none-any.whl.

File metadata

  • Download URL: cfbench-0.0.9-py3-none-any.whl
  • Upload date:
  • Size: 21.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.4

File hashes

Hashes for cfbench-0.0.9-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 6cbf34f06c0e7cdddc2bfada266c525b49aaac0eef4ddb912c2906aa22f03098 |
| MD5 | 61090a9438f64e07de9de9c05bdaa155 |
| BLAKE2b-256 | 54f3770f101c03765b5939bb62a03d62a7097b6afd9ac1f39caf9f2acc550069 |

See more details on using hashes here.
