A model-agnostic and gradient-free optimization method for generating counterfactuals

Genfact

A fast counterfactual generator for Causal analysis

Background

Counterfactual examples are samples that are minimally modified with respect to an original sample so that a model's predicted value changes. Counterfactual explanations therefore state the smallest changes required to alter a given prediction or decision, which is useful for finding causal relationships in data. Genfact is a model-agnostic and gradient-free optimization method for generating counterfactuals. It can generate multiple counterfactuals at once and can perform amortized inference, which makes the process fast. Given a dataset, it can find the fact and counterfactual pairs closest to each other, and these pairs need not exist in the original dataset. This matters because the dataset used for generating counterfactuals may not contain enough samples around the classification boundary, whereas Genfact can generate samples around the boundary. The reference paper can be found here.
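To make the idea of a fact and counterfact pair concrete, here is a minimal sketch that is not Genfact's algorithm: a brute-force search over existing rows for the nearest sample whose predicted class differs. The `predict` function is a hypothetical toy model and `nearest_counterfactual` is an illustrative helper, neither is part of the genfact package.

```python
import numpy as np

def predict(X):
    """Toy stand-in model: class 1 when the two features sum to more than 1.0."""
    return (X.sum(axis=1) > 1.0).astype(int)

def nearest_counterfactual(fact, candidates, predict_fn):
    """Return the candidate closest to `fact` whose predicted class differs."""
    fact_class = predict_fn(fact[None, :])[0]
    cand_class = predict_fn(candidates)
    differing = candidates[cand_class != fact_class]
    if len(differing) == 0:
        return None  # no row with a different predicted class
    distances = np.linalg.norm(differing - fact, axis=1)
    return differing[np.argmin(distances)]

X = np.array([[0.2, 0.3], [0.4, 0.5], [0.6, 0.7], [0.9, 0.8]])
fact = X[0]
cf = nearest_counterfactual(fact, X, predict)
print("fact:", fact, "counterfact:", cf)
```

A search like this can only return rows that already exist, so it fails when few samples lie near the decision boundary. That is the gap the description above says Genfact addresses by generating new samples around the boundary.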

Features

  • Generates fact and counterfact pairs on an arbitrary relational dataset
  • Fast processing
  • Evaluates generated counterfactuals based on entropy and fitness
  • Preloaded with encoded Facebook test data
  • Built-in data preparation

Usage

Follow this step-by-step guide to get started.

Installation

Either download the package from this GitHub repo or install it through pip:

$ pip install genfact

Using it

Import the library.

import genfact as gf
  • Load the data into a pandas dataframe. Each attribute can be categorical or continuous. Assign each categorical value a numeric code. Let the dataframe variable be data_df.
  • Prepare a list containing the datatype of each attribute in the data. A categorical attribute should be represented as 'cat' and a continuous one as 'con'. Let the list be stored in a variable dtype.
  • Identify the index of the target variable and assign it to targetclass_idx.
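The three preparation steps above can be sketched as follows. The toy data and column names are hypothetical, and the sketch assumes that dtype follows the dataframe's column order and that targetclass_idx is a positional column index. The text above does not spell out either point.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
data_df = pd.DataFrame({
    "age": [25.0, 32.0, 47.0, 51.0],
    "city": ["paris", "rome", "paris", "oslo"],
    "income": [31000.0, 42000.0, 58000.0, 61000.0],
})

# Assign each categorical value a numeric code.
data_df["city"] = data_df["city"].astype("category").cat.codes

# One entry per column: 'cat' for categorical, 'con' for continuous
# (assumed to follow the column order of data_df).
dtype = ["con", "cat", "con"]

# Index of the target column (assumed to be positional).
targetclass_idx = data_df.columns.get_loc("income")
```

With these three variables in place, the call to gf.generate_counterfactuals takes them as its first three arguments.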

To generate counterfactuals, run the following function:

factuals,counterfactuals,factclass,cfactclass,classdistribution = gf.generate_counterfactuals(data_df,dtype,targetclass_idx, model=None, C=15, clustsize = 20, datafraction = 0.4, maxiterations = 10)

Hyperparameters

  • model is the predictive model for the feature data and class data. If None is supplied, a random forest model is trained.
  • C is the number of classes into which the target variable is divided if it is continuous. Note that if duplicates are present, the actual number of buckets formed will be less than C.
  • clustsize is the number of clusters to be generated from the feature data.
  • datafraction is the fraction of the data that will be processed to generate counterfactuals.
  • maxiterations is the number of iterations the genetic algorithm will run.

The values shown in the call above are the default values of the hyperparameters.
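The note on C says that duplicates can reduce the number of buckets. One way this can happen is quantile binning, where repeated values produce identical bin edges that are then merged. The sketch below uses pandas qcut to show the effect. That Genfact buckets the target this way is an assumption made for illustration, not something this page confirms.

```python
import pandas as pd

# A continuous target with many repeated values (hypothetical data).
target = pd.Series([1.0, 1.0, 1.0, 1.0, 2.0, 3.0, 4.0, 5.0])
C = 4  # requested number of classes

# duplicates="drop" merges identical quantile edges instead of raising an error.
buckets = pd.qcut(target, q=C, duplicates="drop")
n_buckets = buckets.cat.categories.size

print("requested:", C, "formed:", n_buckets)
```

If the number of classes matters for your analysis, check the returned classdistribution rather than relying on the requested value of C.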

Output

  • factuals contains an array of facts from the feature data.
  • counterfactuals contains an array of counterfacts, one for each fact.
  • factclass contains an array with the predicted class of each fact.
  • cfactclass contains an array with the predicted class of each counterfact.
  • classdistribution contains a dataframe representing the boundaries of each class if the target attribute in the input dataset is continuous. Otherwise it is None.
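One way to inspect these outputs is to compare each fact with its counterfact. The sketch below uses small hypothetical arrays in place of real Genfact output, and assumes the arrays are row-aligned, so that row i of counterfactuals belongs to row i of factuals.

```python
import numpy as np

# Hypothetical stand-ins for the arrays returned by generate_counterfactuals.
factuals = np.array([[1.0, 0.0, 3.5], [2.0, 1.0, 4.0]])
counterfactuals = np.array([[1.0, 1.0, 3.5], [2.0, 1.0, 6.0]])
factclass = np.array([0, 1])
cfactclass = np.array([1, 0])

# Boolean mask of the features that differ within each pair.
changed = factuals != counterfactuals
n_changed = changed.sum(axis=1)

# A valid counterfact should be predicted as a different class than its fact.
class_flipped = factclass != cfactclass

for i in range(len(factuals)):
    cols = np.flatnonzero(changed[i]).tolist()
    print(f"pair {i}: changed feature indices {cols}, class flipped: {class_flipped[i]}")
```

Pairs with few changed features are the easiest to read as explanations, since they isolate which attributes moved the prediction.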

Example

The following example shows how to run the counterfactual generator using the test data and evaluate them.

import genfact as gf

# Load the bundled test data
data_df, dtype, targetclass_idx = gf.load_data()

# Generate fact and counterfact pairs with the default hyperparameters
factuals, counterfactuals, factclass, cfactclass, classdistribution = gf.generate_counterfactuals(data_df, dtype, targetclass_idx)

# Score the generated counterfactuals
entropy, fitness = gf.evaluate_counterfactuals(factuals, counterfactuals, factclass, cfactclass)
