supervised-discretization

This repository contains the code for the paper "Supervised Feature Compression based on Counterfactual Analysis" (FCCA).

Installation

  • The MILP problem that computes the Counterfactual Explanation of a point is implemented in Gurobi, so an active Gurobi license is required to run the code.

  • The package can be installed with the command:

pip install SupervisedDiscretization
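
Before running the procedure, it can be useful to confirm that the Gurobi Python bindings are importable and that a model can be created. The following is a minimal sketch of such a check, not part of this package; gurobi_available is a hypothetical helper name. A successful check only shows that some license was found: it does not guarantee that the license is large enough for a given dataset.

```python
def gurobi_available() -> bool:
    """Return True if gurobipy imports and an empty model can be built."""
    try:
        import gurobipy as gp
    except ImportError:
        # The bindings are not installed in this environment.
        return False
    try:
        # Creating a model starts a Gurobi environment, which needs a license.
        model = gp.Model("license_check")
        model.dispose()
        return True
    except gp.GurobiError:
        # Raised, for example, when no valid license can be found.
        return False


if __name__ == "__main__":
    print("Gurobi usable:", gurobi_available())
```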

Hyperparameters

The FCCA procedure is implemented in the file discretize.py, which defines the Python class FCCA. The class takes the following parameters:

  • estimator: an unfitted binary classifier from the sklearn package. It can be one of the following: RandomForestClassifier, GradientBoostingClassifier, LinearSVC, SVC(kernel='linear'). A GridSearchCV object can also be passed, in which case the parameters of the estimator are chosen by cross-validation;
  • p0, p1: lower and upper bounds on the classification probability of the points for which the Counterfactual Explanation is computed;
  • lambda0, lambda1, lambda2: hyperparameters of the Counterfactual Explanation problem, representing the weights of the l0-, l1- and l2-norm, respectively;
  • compress: boolean; when set to True, thresholds whose absolute difference is smaller than 0.01 are merged;
  • timelimit: time limit, in seconds, for solving the Counterfactual Explanation problem.
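
To make the role of these hyperparameters more concrete, here is a rough sketch in plain Python. It is an illustration under stated assumptions, not the package's code: the three function names are hypothetical, the l2 term is taken as the plain (unsquared) Euclidean norm, the probability window is assumed to be closed on both ends, and merging is assumed to keep the smaller of two close thresholds. The actual implementation may differ on each of these points.

```python
import math


def counterfactual_cost(x, x_cf, lambda0=0.1, lambda1=1.0, lambda2=0.0, tol=1e-9):
    """Weighted sum of the l0, l1 and l2 norms of (x_cf - x)."""
    diff = [b - a for a, b in zip(x, x_cf)]
    l0 = sum(1 for d in diff if abs(d) > tol)   # number of changed features
    l1 = sum(abs(d) for d in diff)              # total absolute change
    l2 = math.sqrt(sum(d * d for d in diff))    # Euclidean length of the change
    return lambda0 * l0 + lambda1 * l1 + lambda2 * l2


def in_probability_window(prob, p0=0.5, p1=1.0):
    """True if a predicted class probability lies in [p0, p1]."""
    return p0 <= prob <= p1


def merge_close_thresholds(thresholds, min_gap=0.01):
    """Keep a threshold only if it is at least min_gap above the last kept one."""
    merged = []
    for t in sorted(thresholds):
        if not merged or t - merged[-1] >= min_gap:
            merged.append(t)
    return merged


# One feature changes by 0.2: cost = 0.1 * 1 + 1.0 * 0.2 + 0.0 * 0.2
print(counterfactual_cost([0.2, 0.5, 0.9], [0.2, 0.7, 0.9]))
print(in_probability_window(0.8), in_probability_window(0.3))
print(merge_close_thresholds([0.30, 0.305, 0.32, 0.50]))
```

With the default weights used in the example script (lambda0=0.1, lambda1=1, lambda2=0), a counterfactual is penalized both for how many features it changes and for how far it moves them.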

The FCCA class offers the following methods:

  • fit: method for fitting the FCCA procedure;
  • transform: method for discretizing a dataset by using the set of thresholds previously computed via the fit method;
  • fit_transform: method for applying in sequence the fit and transform methods;
  • selectThresholds: method for setting a different value of Q after fit has been called; it subsamples the set of thresholds quickly, without rerunning the FCCA procedure.
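
As an illustration of what discretizing a dataset with a set of thresholds means, the sketch below maps each feature value to the index of the interval it falls into. ToyThresholdDiscretizer is a hypothetical stand-in written for this example; it takes the thresholds as input instead of computing them in fit, and it does not reproduce the output format of the FCCA class.

```python
from bisect import bisect_right


class ToyThresholdDiscretizer:
    """Hypothetical stand-in: maps each feature value to the index of its bin."""

    def __init__(self, thresholds):
        # thresholds: {feature_index: list of cut points}
        self.thresholds = {j: sorted(ts) for j, ts in thresholds.items()}

    def transform(self, rows):
        out = []
        for row in rows:
            # Features without thresholds collapse into a single bin (index 0).
            out.append([bisect_right(self.thresholds.get(j, []), v)
                        for j, v in enumerate(row)])
        return out


disc = ToyThresholdDiscretizer({0: [0.3, 0.6], 1: [0.5]})
print(disc.transform([[0.1, 0.4], [0.45, 0.9], [0.8, 0.5]]))
# → [[0, 0], [1, 1], [2, 1]]
```

Using fewer thresholds produces coarser bins, which is why a subsampled threshold set can be applied without repeating the expensive fitting step.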

Execution

The following example shows how to use the FCCA procedure on new data. It can also be found in the file example.py.

import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.ensemble import GradientBoostingClassifier
from SupervisedDiscretization.discretize import FCCA

# Reading the dataset
data = pd.read_csv('datasets/boston.csv')
label_column = data.columns[-1]
feature_columns = data.columns[:-1]

# Scaling the features between 0 and 1
scaler = MinMaxScaler()
data[feature_columns] = scaler.fit_transform(data[feature_columns])

# Train - test split
data_ts = data.sample(n=int(0.3*len(data)))
data_tr = data.drop(index=data_ts.index)

x_tr, y_tr = data_tr[feature_columns], data_tr[label_column]
x_ts, y_ts = data_ts[feature_columns], data_ts[label_column]

# Target model
target = GradientBoostingClassifier(max_depth=1, n_estimators=100, learning_rate=0.1)

# Hyperparameters for the discretization - default values
discretizer = FCCA(target, p0=0.5, p1=1, lambda0=0.1, lambda1=1, lambda2=0)

# Discretization
x_tr_discr, y_tr_discr = discretizer.fit_transform(x_tr, y_tr)
x_ts_discr, y_ts_discr = discretizer.transform(x_ts, y_ts)

# Compression - inconsistency rate
print(f'Compression rate: {discretizer.compression_rate(x_ts, y_ts)}')
print(f'Inconsistency rate: {discretizer.inconsistency_rate(x_ts, y_ts)}')

print('Setting Q to 0.7')
# Increasing the value of Q
tao_q = discretizer.selectThresholds(0.7)

# Discretization
x_tr_discr, y_tr_discr = discretizer.transform(x_tr, y_tr, tao_q)
x_ts_discr, y_ts_discr = discretizer.transform(x_ts, y_ts, tao_q)

# Compression - inconsistency rate
print(f'Compression rate: {discretizer.compression_rate(x_ts, y_ts, tao_q)}')
print(f'Inconsistency rate: {discretizer.inconsistency_rate(x_ts, y_ts, tao_q)}')
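
The example prints a compression rate and an inconsistency rate, but this description does not define them. The sketch below shows one common way to compute such quantities on already discretized data. It assumes that compression is one minus the fraction of distinct discretized rows, and that a point is inconsistent when it shares its discretized row with other points but not their majority label. The functions toy_compression_rate and toy_inconsistency_rate are hypothetical, and the package's own methods may compute these values differently.

```python
from collections import Counter, defaultdict


def toy_compression_rate(rows):
    """1 - (distinct discretized rows / total rows)."""
    return 1 - len({tuple(r) for r in rows}) / len(rows)


def toy_inconsistency_rate(rows, labels):
    """Share of points that disagree with the majority label of their group."""
    groups = defaultdict(Counter)
    for r, y in zip(rows, labels):
        groups[tuple(r)][y] += 1
    # In each group, every point outside the majority class is inconsistent.
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return inconsistent / len(rows)


rows = [[0, 1], [0, 1], [0, 1], [2, 0]]
labels = [1, 1, 0, 0]
print(toy_compression_rate(rows))            # → 0.5
print(toy_inconsistency_rate(rows, labels))  # → 0.25
```

Under these assumptions the two quantities pull in opposite directions: fewer thresholds raise compression, but they also make it more likely that points with different labels end up in the same cell.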
