BOExplain

BOExplain, Explaining Inference Queries with Bayesian Optimization

BOExplain is a library for explaining inference queries with Bayesian optimization. The corresponding paper can be found at https://arxiv.org/abs/2102.05308.

Installation

pip install boexplain

Documentation

The documentation is available at https://sfu-db.github.io/BOExplain/ and includes reference pages for the two main functions, fmin and fmax.

Getting Started

Derive an explanation for why the predicted rate of having an income over $50K is higher for men compared to women in the UCI ML Adult dataset.

  1. Load the data and prepare it for ML.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("adult.data",
                 names=[
                     "Age", "Workclass", "fnlwgt", "Education",
                     "Education-Num", "Marital Status", "Occupation",
                     "Relationship", "Race", "Gender", "Capital Gain",
                     "Capital Loss", "Hours per week", "Country", "Income"
                 ],
                 na_values=" ?")

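# Encode the labels as integers. The raw values keep the leading space from the CSV.
# Assigning the result back avoids chained in-place replace, which newer pandas deprecates.
df['Income'] = df['Income'].map({" <=50K": 0, " >50K": 1})
df['Gender'] = df['Gender'].map({" Male": 0, " Female": 1})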
df = pd.get_dummies(df)

train, test = train_test_split(df, test_size=0.2)
test = test.drop(columns='Income')
  2. Define the objective function that trains a random forest classifier and queries the ratio of predicted rates of having an income over $50K between men and women.
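def obj(train_filtered):
    # Retrain the model on the candidate subset of the training data.
    rf = RandomForestClassifier(n_estimators=13, random_state=0)
    rf.fit(train_filtered.drop(columns='Income'), train_filtered['Income'])
    # Predict on the held-out test set and compute the positive rate per gender.
    test["prediction"] = rf.predict(test)
    rates = test.groupby("Gender")["prediction"].mean()
    # Remove the temporary column so the next call sees the original features.
    test.drop(columns='prediction', inplace=True)
    # Gender 0 is male and 1 is female, so this is the men-to-women ratio.
    return rates[0] / rates[1]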
  3. Use the function fmin to minimize the objective function.
from boexplain import fmin

train_filtered = fmin(
    data=train,
    f=obj,
    columns=["Age", "Education-Num"],
    runtime=30,
)
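
To build intuition for what the three steps above search for, here is a minimal, self-contained sketch. It assumes that the result of fmin is the training data with the rows matching some predicate over the listed columns removed; that is our reading of the call above, not a description of the library's internals. BOExplain uses Bayesian optimization, whereas this sketch swaps in an exhaustive search over age ranges on a toy dataset. The names brute_force_fmin and rate_ratio and the toy rows are hypothetical, and the objective reads labels directly instead of retraining a model.

```python
from itertools import combinations_with_replacement


def rate_ratio(rows):
    """Positive-label rate of gender 0 divided by that of gender 1."""
    rates = []
    for gender in (0, 1):
        labels = [row["label"] for row in rows if row["gender"] == gender]
        if not labels:
            return float("inf")  # ratio is undefined without both groups
        rates.append(sum(labels) / len(labels))
    return rates[0] / rates[1] if rates[1] > 0 else float("inf")


def brute_force_fmin(rows, objective, column):
    """Try every range predicate lo <= row[column] <= hi, remove the matching
    rows, and keep the filtered data with the lowest objective value.
    Ties are broken in favour of removing fewer rows."""
    values = sorted({row[column] for row in rows})
    best_key = (objective(rows), 0)  # baseline: remove nothing
    best_predicate, best_rows = None, rows
    for lo, hi in combinations_with_replacement(values, 2):
        kept = [row for row in rows if not (lo <= row[column] <= hi)]
        if not kept:
            continue  # a predicate that deletes everything explains nothing
        key = (objective(kept), len(rows) - len(kept))
        if key < best_key:
            best_key, best_predicate, best_rows = key, (lo, hi), kept
    return best_key[0], best_predicate, best_rows


# Toy data: the positive labels among gender 0 are concentrated at ages 40 to 50.
toy = [
    {"gender": 0, "age": 20, "label": 0},
    {"gender": 0, "age": 40, "label": 1},
    {"gender": 0, "age": 50, "label": 1},
    {"gender": 0, "age": 60, "label": 0},
    {"gender": 1, "age": 20, "label": 0},
    {"gender": 1, "age": 30, "label": 1},
    {"gender": 1, "age": 40, "label": 0},
    {"gender": 1, "age": 60, "label": 1},
]

score, predicate, filtered = brute_force_fmin(toy, rate_ratio, "age")
print(score, predicate)  # 0.0 (40, 50)
```

On the toy data the ratio starts at 1.0 and drops to 0.0 once the rows with ages 40 to 50 are removed, so that age range is the explanation the search returns. Exhaustive search is only feasible here because the toy objective is cheap; when every evaluation retrains a model, as obj does above, a sample-efficient search becomes important.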

Reproduce the Experiments

To reproduce the experiments, clone the repo and create a Poetry environment (this requires Poetry to be installed). Then run

poetry install

To set up the Poetry environment for a Jupyter notebook, run

poetry run ipython kernel install --name=boexplain

This creates an IPython kernel named boexplain for the environment.

Adult Experiment

To reproduce the results of the Adult experiment and recreate Figure 6, follow the instructions in adult.ipynb.

Credit Experiment

To reproduce the results of the Credit experiment and recreate Figure 8, follow the instructions in credit.ipynb.

House Experiment

To reproduce the results of the House experiment and recreate Figure 7, follow the instructions in house.ipynb.

Scorpion Synthetic Data Experiment

To reproduce the results of the experiment with Scorpion's synthetic data and corresponding query, and recreate Figure 4, follow the instructions in scorpion.ipynb.

Download files

Source Distribution

boexplain-0.1.1.tar.gz (255.1 kB view details)

Uploaded Source

Built Distribution

boexplain-0.1.1-py3-none-any.whl (126.8 kB view details)

Uploaded Python 3

File details

Details for the file boexplain-0.1.1.tar.gz.

File metadata

  • Download URL: boexplain-0.1.1.tar.gz
  • Upload date:
  • Size: 255.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.0 CPython/3.9.1 Darwin/20.2.0

File hashes

Hashes for boexplain-0.1.1.tar.gz
Algorithm    Hash digest
SHA256       caedfb118b6ec7376cc7700e81dbb8d4ee48701c8a112642efb89550090ebe9d
MD5          a7e9ee984d23d7b82158f19ce1e20d3e
BLAKE2b-256  f94795bf6d83dd9cad2f0eab9d83e94f5babc784a87455b18aa06d37e25d3f80

File details

Details for the file boexplain-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: boexplain-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 126.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.0 CPython/3.9.1 Darwin/20.2.0

File hashes

Hashes for boexplain-0.1.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       ebd16e7c7002303c348ebee628bd08e483711712ce461405ada10c4c279ed796
MD5          46731c2a6d6c7dfc925300a4ac430d4d
BLAKE2b-256  cfaba7587af8a5ca7c89da3f7781f146a5a88b6fbb3cb09e5f03cf44bd689f6b
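
The published digests can be used to check a downloaded file before installing it. The following is a minimal sketch using only Python's standard hashlib module; the helper names sha256_of_file and matches_published_hash are hypothetical, and the path in the commented usage line assumes the file was saved in the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Example usage (assumes the wheel was downloaded to the current directory):
# matches_published_hash(
#     "boexplain-0.1.1-py3-none-any.whl",
#     "ebd16e7c7002303c348ebee628bd08e483711712ce461405ada10c4c279ed796",
# )
```

The same check works for the source distribution by passing its file name and the SHA256 value listed for boexplain-0.1.1.tar.gz.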
