
A framework for evaluating counterfactual explanations.


CEval — Counterfactual Explanation Evaluator


CEval is a lightweight Python package for evaluating the quality of counterfactual explanations produced by post-hoc XAI (Explainable AI) methods. It computes 14 established metrics with a single call and works with models from many frameworks, including scikit-learn, XGBoost, PyTorch, and Keras.

Paper: Bayrak, B., & Bach, K. (2024). Evaluation of Instance-Based Explanations: An In-Depth Analysis of Counterfactual Evaluation Metrics, Challenges, and the CEval Toolkit. IEEE Access. doi:10.1109/ACCESS.2024.3410540


Why CEval?

When you build or compare counterfactual explainers, you need more than one number to judge quality. CEval lets you measure all key dimensions (validity, proximity, sparsity, diversity, feasibility, and more) in a single unified framework, across different explainers and datasets.

from ceval import CEval

evaluator = CEval(samples=test_df, label="income", data=train_df, model=clf)
evaluator.add_explainer("DiCE",  dice_cfs,  "generated-cf")
evaluator.add_explainer("DICE+", dicep_cfs, "generated-cf")

print(evaluator.comparison_table)

Installation

pip install CEval

Requirements: Python ≥ 3.9, pandas, numpy, scikit-learn, scipy, gower, category-encoders


Metrics

| Metric | Description |
|---|---|
| `validity` | Fraction of CFs that actually flip the classifier's prediction |
| `proximity` | Average feature-space distance between an instance and its CF |
| `proximity_gower` | Proximity using the Gower mixed-type distance |
| `sparsity` | Average fraction of features changed |
| `count` | Average number of CFs per instance |
| `diversity` | Determinant-based spread of the CF set |
| `diversity_lcc` | Diversity weighted by label-class coverage |
| `yNN` | Label consistency of the CF's k nearest neighbours |
| `feasibility` | Average kNN distance of CFs to the training set |
| `kNLN_dist` | Distance of a CF to its nearest same-class neighbour |
| `relative_dist` | dist(x, CF) / dist(x, NUN) |
| `redundancy` | Average number of unnecessary feature changes |
| `plausibility` | dist(CF, NLN) / dist(NLN, NUN(NLN)) |
| `constraint_violation` | Fraction of CFs that break user constraints |

Here NUN denotes the nearest unlike neighbour (the closest training instance with a different label) and NLN the nearest like neighbour. Metrics that depend on the classifier's predictions or on the training distribution require the `model` and `data` arguments, respectively (see the API reference below).

Not every metric applies to every explanation type; CEval handles this automatically and fills non-applicable cells with "-".
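To make the first two metrics concrete, here is a minimal sketch of how validity and sparsity are commonly defined. This is illustrative only, not CEval's internal implementation, and the function names are made up:

```python
def validity(cf_preds, target_labels):
    """Fraction of counterfactuals whose predicted class equals the target class."""
    return sum(p == t for p, t in zip(cf_preds, target_labels)) / len(cf_preds)

def sparsity_one(instance, cf):
    """Fraction of features changed between an instance and its counterfactual."""
    changed = sum(a != b for a, b in zip(instance, cf))
    return changed / len(instance)

validity([1, 1, 0], [1, 1, 1])              # → 2/3: one CF failed to flip the label
sparsity_one([30, "US", 5], [30, "UK", 7])  # → 2/3: two of three features changed
```

CEval averages such per-instance values across the whole explanation set.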


Quick Start

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from ceval import CEval

# 1. Prepare your data
train_df = ...   # pd.DataFrame with features + label column
test_df  = ...   # pd.DataFrame with features + label column
clf      = RandomForestClassifier().fit(train_df.drop("label", axis=1),
                                        train_df["label"])

# 2. Generate counterfactuals with your favourite explainer
#    (DiCE, PertCF, NICE, etc.)
counterfactuals = ...   # pd.DataFrame, same columns as test_df

# 3. Evaluate
evaluator = CEval(
    samples    = test_df,        # instances to explain
    label      = "label",        # target column name
    data       = train_df,       # background dataset (unlocks more metrics)
    model      = clf,            # fitted classifier (unlocks more metrics)
    k_nn       = 5,              # neighbours for kNN-based metrics
    constraints= ["age"],        # features that must not change (optional)
)

evaluator.add_explainer(
    name            = "MyExplainer",
    explanations    = counterfactuals,
    exp_type        = "generated-cf",   # "generated-cf" | "existed-cf" |
                                        # "generated-factual" | "existed-factual"
    mode            = "1to1",           # "1to1" | "1toN"
)

print(evaluator.comparison_table)

Explanation types

| `exp_type` | When to use |
|---|---|
| `"generated-cf"` | Counterfactuals synthesised by an algorithm (e.g. DiCE, PertCF) |
| `"existed-cf"` | Counterfactuals retrieved from the training set |
| `"generated-factual"` | Factual explanations generated by an algorithm |
| `"existed-factual"` | Factual explanations retrieved from the training set |

Explanation modes

| `mode` | DataFrame shape | When to use |
|---|---|---|
| `"1to1"` | Same number of rows as `samples` | One explanation per instance |
| `"1toN"` | Any number of rows plus an `"instance"` column | Multiple explanations per instance |

Model Compatibility

CEval works with any classifier, not just scikit-learn.
Use the built-in wrappers from ceval.wrappers to adapt your model:

| Framework | Wrapper class | Import |
|---|---|---|
| scikit-learn | (none needed) | pass the model directly |
| XGBoost | `XGBoostWrapper` | `from ceval.wrappers import XGBoostWrapper` |
| LightGBM | `LightGBMWrapper` | `from ceval.wrappers import LightGBMWrapper` |
| CatBoost | `CatBoostWrapper` | `from ceval.wrappers import CatBoostWrapper` |
| PyTorch | `TorchWrapper` | `from ceval.wrappers import TorchWrapper` |
| Keras / TensorFlow | `KerasWrapper` | `from ceval.wrappers import KerasWrapper` |
| Anything else | `GenericWrapper` | `from ceval.wrappers import GenericWrapper` |

# PyTorch
from ceval.wrappers import TorchWrapper
model = TorchWrapper(my_net, num_classes=2, device="cuda")

# XGBoost  (works with XGBClassifier and native Booster)
from ceval.wrappers import XGBoostWrapper
model = XGBoostWrapper(xgb_clf)

# Keras / TensorFlow
from ceval.wrappers import KerasWrapper
model = KerasWrapper(keras_model, num_classes=3)

# Anything else — supply two callables
from ceval.wrappers import GenericWrapper
model = GenericWrapper(
    predict_fn       = lambda X: my_model.infer(X).argmax(axis=1),
    predict_proba_fn = lambda X: my_model.infer(X),
)

# Then use as normal
evaluator = CEval(samples=test_df, label="income", data=train_df, model=model)

If you pass an incompatible model without a wrapper, CEval raises a clear TypeError that tells you exactly which wrapper to use.
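To make the wrapper contract concrete, the sketch below shows a toy model exposing the sklearn-style `predict` / `predict_proba` interface that the wrappers adapt other frameworks to. The class and its threshold rule are purely illustrative:

```python
import numpy as np

class ThresholdModel:
    """Toy binary classifier: predicts class 1 when the first feature is positive."""

    def predict_proba(self, X):
        X = np.asarray(X, dtype=float)
        p1 = (X[:, 0] > 0).astype(float)       # probability of class 1 (hard 0/1 here)
        return np.column_stack([1 - p1, p1])   # shape (n_samples, n_classes)

    def predict(self, X):
        return self.predict_proba(X).argmax(axis=1)

model = ThresholdModel()
preds = model.predict([[1.0, 2.0], [-3.0, 0.5]])   # → array([1, 0])
```

`GenericWrapper`'s `predict_fn` and `predict_proba_fn` should return arrays of the same shapes as these two methods.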


See examples/demo_adult_income.py for a complete working demo that:

  • Loads the Adult Income dataset
  • Trains a Random Forest classifier
  • Generates counterfactuals with DiCE
  • Evaluates them in both 1-to-1 and 1-to-N mode
  • Prints a full comparison table
python examples/demo_adult_income.py

Expected output:

                     DiCE (1-to-1)  DiCE (1-to-N)
validity                      0.90          0.867
proximity_gower               0.11          0.152
sparsity                      0.32          0.347
yNN                           0.68          0.713
feasibility                  48.21         183.44
redundancy                    0.80          0.733
constraint_violation          0.50          0.233
...

Comparing Multiple Explainers

evaluator = CEval(samples=test_df, label="label", data=train_df, model=clf)

evaluator.add_explainer("DiCE",   dice_cfs,   "generated-cf", mode="1toN")
evaluator.add_explainer("PertCF", pertcf_cfs, "generated-cf", mode="1toN")
evaluator.add_explainer("NICE",   nice_cfs,   "existed-cf",   mode="1to1")

# Side-by-side comparison
print(evaluator.comparison_table.T)
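Because `comparison_table` is a plain `pd.DataFrame`, ordinary pandas operations can rank the registered explainers. The sketch below uses invented values in the same one-row-per-explainer, one-column-per-metric shape; real numbers come from CEval:

```python
import pandas as pd

# Hypothetical comparison table (values are made up for illustration).
table = pd.DataFrame(
    {"validity": [0.90, 0.85, 0.95], "proximity_gower": [0.11, 0.09, 0.20]},
    index=["DiCE", "PertCF", "NICE"],
)

# Rank explainers by validity, where higher is better.
ranked = table.sort_values("validity", ascending=False)
best = ranked.index[0]   # → "NICE" in this toy table
```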

API Reference

CEval:

CEval(samples, label, ...)

| Parameter | Type | Default | Description |
|---|---|---|---|
| `samples` | `pd.DataFrame` | required | Instances to be explained (including the label column) |
| `label` | `str` | required | Name of the target column |
| `data` | `pd.DataFrame` | `None` | Full background dataset; unlocks distribution-based metrics |
| `model` | sklearn-compatible estimator | `None` | Fitted classifier; unlocks prediction-based metrics |
| `k_nn` | `int` | `5` | Number of neighbours for kNN-based metrics |
| `encoder` | `str` | `None` | category-encoders encoder name for categoricals; `None` falls back to `OrdinalEncoder` |
| `distance` | `str` | `None` | SciPy distance metric for proximity; `None` uses the built-in mixed-type metric |
| `constraints` | `list[str]` | `None` | Feature names that must not change in valid CFs |

evaluator.add_explainer:

evaluator.add_explainer(name, explanations, exp_type, mode="1to1")

Registers an explainer and computes all applicable metrics. Results are appended to evaluator.comparison_table.

evaluator.comparison_table:

A pd.DataFrame with one row per explainer and one column per metric. Non-applicable metrics show "-".


Citation

If you use CEval in your research, please cite:

@article{bayrak2024ceval,
  title   = {Evaluation of Instance-Based Explanations: An In-Depth Analysis of Counterfactual Evaluation Metrics, Challenges, and the CEval Toolkit},
  author  = {Bayrak, Bet{\"u}l and Bach, Kerstin},
  journal = {IEEE Access},
  year    = {2024},
  doi     = {10.1109/ACCESS.2024.3410540}
}

Related Work

This package is part of a broader research effort on counterfactual explanation methods:

  • PerCE — Personalised Counterfactual Explanations (IEEE)
  • PertCF — Perturbation-based Counterfactual Explainer (Paper | Code)

License

MIT © Betül Bayrak
