Skip to main content

A Python package for visually auditing machine learning models for fairness through the use of interpretable visualizations summarizing intersectional bias.

Project description

Visual Auditor

License: MIT

An interactive visualization system for identifying and understanding biases in machine learning models.

Working Demo

A live demo of the Visual Auditor web application is available at the following link: https://visual-auditor.surge.sh/

Installation

import visual_auditor

Demo Usage

# Import additional packages
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier
# Helper function for binning numerical features
def bin_feature(feature):
    bins = np.histogram_bin_edges(adult_data[feature], bins=10, range=None, weights=None)
    adult_data[feature] = pd.cut(adult_data[feature], bins, labels=[x for x in range(len(bins) - 1)], right=True, include_lowest=True, duplicates='drop')
    intervals = []
    for i in range(len(bins) - 1):
        intervals.append(f' {int(bins[i])} - {int(bins[i+1])}')
    return intervals
# Load Adult dataset
adult_data = pd.read_csv(
    "data/adult.data",
    names=[
        "Age", "Workclass", "Final Weight", "Education", "Education-Num", "Marital Status",
        "Occupation", "Relationship", "Race", "Sex", "Capital Gain", "Capital Loss",
        "Hours Per Week", "Country", "Target"],
        sep=r'\s*,\s*',
        engine='python',
        na_values="?")

# Drop NA values
adult_data = adult_data.dropna()

# Drop irrelevant fields
adult_data = adult_data.drop(columns=['Final Weight', 'Education-Num'])

# Bin numerical features
encoders = {}
encodings = {}
numerical_features = ["Age", "Capital Gain", "Capital Loss", "Hours Per Week"]
for feature in numerical_features:
    encodings[feature] = bin_feature(feature)

# Encode categorical features
for column in adult_data.columns.difference(numerical_features):
    if adult_data.dtypes[column] == np.object:
        le = LabelEncoder()
        adult_data[column] = le.fit_transform(adult_data[column])
        encoders[column] = le
        encodings[column] = le.classes_
        print(column, le.classes_, le.transform(le.classes_))

# Separate Target values
X, y = adult_data[adult_data.columns.difference(["Target"])], adult_data["Target"]

# Train a classifier
classifier = RandomForestClassifier(max_depth=5, n_estimators=10)
classifier.fit(X, y)
# Interact with the Visual Auditor
visual_auditor.find_slices_and_visualize(classifier, (X, y))

Credits

The Visual Auditor was developed and maintained by David Munechika, Jay Wang, and Polo Chau from the Polo Club of Data Science at Georgia Tech.

License

The Visual Auditor is available under the MIT License. The Visual Auditor uses the D3.js which is licensed under the ISC License and React.js which is licensed under the MIT License.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

visual_auditor-0.1.3.tar.gz (15.0 MB view hashes)

Uploaded Source

Built Distribution

visual_auditor-0.1.3-py2.py3-none-any.whl (15.3 MB view hashes)

Uploaded Python 2 Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page