
Project description

Eticas: Bias & Audit Framework

An open-source Python library designed for developers to calculate fairness metrics and assess bias in machine learning models. This library provides a comprehensive set of tools to ensure transparency, accountability, and ethical AI development.


Why Use This Library?

AI systems can inherit biases from their data or amplify them during decision-making. This library helps ensure transparency and accountability by providing actionable insights to improve fairness in AI systems.

🚀 Key Features

This framework is designed to audit AI systems comprehensively across all stages of their lifecycle. At its core, it focuses on comparing privileged and underprivileged groups, ensuring a fair evaluation of model behavior.

With a wide range of metrics, this framework is a game-changer in bias monitoring: it offers a deep perspective on fairness and allows comprehensive reporting even without relying on true labels. The only restriction when measuring bias in production concerns performance metrics, since they are directly tied to true labels.

The stages considered are the following:

  1. The dataset used to train the model.
  2. The dataset used in production.
  3. A dataset containing the system's final decisions, which may include human intervention or another model.

(Diagram: bias-calculation flow across the three stages)

  • Demographic Benchmarking Monitoring: Perform in-depth analysis of population distribution.
  • Model Fairness Monitoring: Ensure equality and detect equity issues in decision-making.
  • Features Distribution Evaluation: Analyze correlations, causality, and variable importance.
  • Performance Analysis: Metrics to assess model performance, accuracy, and recall.
  • Model Drift Monitoring: Detect and measure changes in data distributions and model behavior over time.

🌟 ITACA: Monitoring & Auditing Platform 🌟

🟡 Unlock the full potential of Eticas by upgrading to our subscription model! With ITACA, our powerful SaaS platform, you can monitor every stage of your model's lifecycle seamlessly. Easily integrate ITACA into your workflows with our library and API, and start optimizing your models today!

  • Audit Subscription 🔎: Stay compliant with major regulations and laws on bias and fairness.

Learn more about our platform at 🔗 ITACA – Monitoring & Auditing Platform.


COMING SOON 🎉

  • Developer Subscription 🛠️: Connect to ITACA to monitor your models.

⚖️ Metrics

| Group | Metric | Label needed? | Description |
| --- | --- | --- | --- |
| fairness | d_equality | no | Analyze whether the system's disparities occur because the model does not treat all groups equally. |
| fairness | d_equity | no | Analyze whether the system's disparities arise because some groups have unique characteristics and may need a boost. |
| fairness | d_parity | no | Calculate the ratio of selection rates (Disparate Impact, DI). It represents the chance of success. |
| fairness | d_statisticalparity | no | Calculate the difference in selection rates (Statistical Parity Difference, SPD). It measures the gap in success rates. |
| fairness | d_calibrated_false | yes | Evaluate the calibration for negative outcomes across groups. |
| fairness | d_calibrated_true | yes | Evaluate the calibration for positive outcomes across groups. |
| fairness | d_equalodds_false | yes | Check whether false outcomes are distributed equally among groups. |
| fairness | d_equalodds_true | yes | Check whether true outcomes are distributed equally among groups. |
| Demographic Benchmarking | da_inconsistency | no | Calculate the percentage of samples that belong to an underprivileged group. |
| Demographic Benchmarking | da_positive | no | Calculate the percentage of samples that receive a positive outcome and belong to an underprivileged group. |
| Features Distribution | da_informative | no | Determine if there is a proxy feature in the dataset, meaning some features act as a protected attribute. |
| Features Distribution | dxa_inconsistency | no | Check if the protected attributes are highly related to the output. |
| Performance | accuracy | yes | Calculate the proportion of correct predictions among all predictions. |
| Performance | F1 | yes | Compute the harmonic mean of precision and recall. |
| Performance | precision | yes | Compute the ratio of true positives to all predicted positives. |
| Performance | recall | yes | Compute the ratio of true positives to all actual positives. |
| Performance | poor_performance | yes | Compare the accuracy against the representation of the largest class. |
| Drift | Drift Train-Operational | no | Evaluate changes in data or model performance between training and operational phases. |
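For intuition, the ratio and difference metrics above (d_parity and d_statisticalparity) can be sketched from raw model outputs in a few lines of NumPy. This is an illustrative re-implementation under the standard definitions of Disparate Impact and Statistical Parity Difference, not the library's internal code; the function names are hypothetical.

```python
import numpy as np

def selection_rates(outputs, underprivileged_mask, positive=1):
    """Share of positive outcomes for the underprivileged and privileged groups."""
    under = (outputs[underprivileged_mask] == positive).mean()
    priv = (outputs[~underprivileged_mask] == positive).mean()
    return under, priv

def disparate_impact(outputs, underprivileged_mask, positive=1):
    """d_parity-style ratio of selection rates (1.0 means parity)."""
    under, priv = selection_rates(outputs, underprivileged_mask, positive)
    return under / priv

def statistical_parity_difference(outputs, underprivileged_mask, positive=1):
    """d_statisticalparity-style gap in selection rates (0.0 means parity)."""
    under, priv = selection_rates(outputs, underprivileged_mask, positive)
    return under - priv

# Toy example: the first four samples belong to the underprivileged group.
outputs = np.array([1, 0, 0, 0, 1, 1, 0, 0])
mask = np.array([True, True, True, True, False, False, False, False])
print(disparate_impact(outputs, mask))               # 0.25 / 0.50 = 0.5
print(statistical_parity_difference(outputs, mask))  # 0.25 - 0.50 = -0.25
```

A DI of 1.0 (or an SPD of 0.0) indicates parity between groups; common practice flags values outside a band such as 0.8–1.25 for review.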

Quick Install

  1. Clone this repository.
  2. In the root folder, run:
pip install .

Example Notebooks

| Notebook | Description |
| --- | --- |
| Audit AI System | Shows how to use this library to audit an AI system. |

QuickStart Bias Auditing

Execute Audit

Define Sensitive Attributes

Use a JSON object to define the sensitive attributes. For each attribute, specify the columns where it is stored and the values that make up its underprivileged or privileged groups. A definition can include a list to accommodate more than one group or value.

Sensitive attributes can be simple, for example sex or race. They can also be complex, for instance the intersection of sex and race.

{
    "sensitive_attributes": {
        "sex": {
            "columns": [
                {
                    "name": "sex",
                    "underprivileged": [2]
                }
            ],
            "type": "simple"
        },
        "ethnicity": {
            "columns": [
                {
                    "name": "ethnicity",
                    "privileged": [1]
                }
            ],
            "type": "simple"
        },
        "age": {
            "columns": [
                {
                    "name": "age",
                    "privileged": [3, 4]
                }
            ],
            "type": "simple"
        },
        "sex_ethnicity": {
            "groups": ["sex", "ethnicity"],
            "type": "complex"
        }
    }
}
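Because the definition is plain JSON, it can live in a separate file and be sanity-checked before being handed to the model. The sketch below uses a hypothetical helper (not part of the library) to parse a shortened version of the structure above and verify the fields each type requires.

```python
import json

# A trimmed version of the sensitive-attribute definition shown above.
config_text = """
{
    "sensitive_attributes": {
        "sex": {"columns": [{"name": "sex", "underprivileged": [2]}], "type": "simple"},
        "ethnicity": {"columns": [{"name": "ethnicity", "privileged": [1]}], "type": "simple"},
        "sex_ethnicity": {"groups": ["sex", "ethnicity"], "type": "complex"}
    }
}
"""

def load_sensitive_attributes(text):
    """Parse the JSON definition and check its minimal structure."""
    attributes = json.loads(text)["sensitive_attributes"]
    for name, spec in attributes.items():
        if spec["type"] == "simple":
            # every simple attribute needs named columns with a group definition
            assert all("name" in col for col in spec["columns"]), name
        elif spec["type"] == "complex":
            # complex attributes intersect previously defined simple groups
            assert all(group in attributes for group in spec["groups"]), name
        else:
            raise ValueError(f"unknown type for {name}: {spec['type']}")
    return attributes

sensitive_attributes = load_sensitive_attributes(config_text)
print(sorted(sensitive_attributes))  # ['ethnicity', 'sex', 'sex_ethnicity']
```

The resulting dictionary is what gets passed as `sensitive_attributes` when the model is created in the next step.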

Create Model

Initialize the model object that will be the focus of the audit. As important inputs, you need to define the sensitive attributes and specify which input features you want to analyze. It's not necessary to include all featuresโ€”only the most important or relevant ones.

import logging
logging.basicConfig(
    level=logging.INFO,
    format='[%(levelname)s] %(name)s - %(message)s'
)

from eticas.model.ml_model import MLModel
model = MLModel(
    model_name="ML Testing Regression",
    description="A logistic regression model to illustrate audits",
    country="USA",
    state="CA",
    sensitive_attributes=sensitive_attributes,
    features=["feature_0", "feature_1", "feature_2"]
)

Audit Labeled

This is how to define the audit for a labeled dataset, typically the dataset used to train the model. The required inputs are:

  1. dataset_path – path to the data,
  2. label_column – the column with the true label,
  3. output_column – the column with the model's output,
  4. positive_output – a list of outputs considered positive.

You can also upload a label or output column with scoring, ranking, or recommendation values (continuous values). If the regression ordering is ascending, the positive output is interpreted as 1; if it is descending, it is interpreted as 0.
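As a sketch of that convention, a continuous score column can be turned into positive/negative outcomes according to its ordering, where `positive_output = [1]` means higher scores are favourable and `[0]` means lower scores are. This is illustrative only; the median cutoff is an assumption of this sketch, not necessarily the library's internal thresholding.

```python
import numpy as np

def positives_from_scores(scores, positive_output):
    """Mark the favourable half of a continuous score column as positive.
    positive_output == [1]: ascending ordering, higher scores favourable;
    positive_output == [0]: descending ordering, lower scores favourable.
    The median cutoff is an assumption made for this sketch."""
    cutoff = np.median(scores)
    if positive_output == [1]:
        return scores >= cutoff
    return scores <= cutoff

scores = np.array([0.9, 0.2, 0.7, 0.4])
print(positives_from_scores(scores, [1]))  # [ True False  True False]
print(positives_from_scores(scores, [0]))  # [False  True False  True]
```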

model.run_labeled_audit(dataset_path='files/example_training_binary_2.csv',
                        label_column='outcome',
                        output_column='predicted_outcome',
                        positive_output=[1])

# JSON labeled results
model.labeled_results

Audit Production

This is how to define the audit for a production dataset. The required inputs are:

  1. dataset_path – path to the data,
  2. output_column – the column with the model's output,
  3. positive_output – a list of outputs considered positive.

You can also upload a label or output column with scoring, ranking, or recommendation values (continuous values). If the regression ordering is ascending, the positive output is interpreted as 1; if it is descending, it is interpreted as 0.

model.run_production_audit(dataset_path='files/example_operational_binary_2.csv',
                           output_column='predicted_outcome',
                           positive_output=[1])

# JSON production results
model.production_results

Audit Impacted

This is how to define the audit for an impacted dataset, i.e. the system's final decisions. The required inputs are:

  1. dataset_path – path to the data,
  2. output_column – the column with the recorded outcome,
  3. positive_output – a list of outputs considered positive.

You can also upload a label or output column with scoring, ranking, or recommendation values (continuous values). If the regression ordering is ascending, the positive output is interpreted as 1; if it is descending, it is interpreted as 0.

model.run_impacted_audit(dataset_path='files/example_impact_binary_2.csv',
                         output_column='recorded_outcome',
                         positive_output=[1])

# JSON impacted results
model.impacted_results

Audit Drift

model.run_drift_audit(dataset_path_dev='files/example_training_binary_2.csv',
                      output_column_dev='outcome',
                      positive_output_dev=[1],
                      dataset_path_prod='files/example_operational_binary_2.csv',
                      output_column_prod='predicted_outcome',
                      positive_output_prod=[1])

# JSON drift results
model.drift_results

Explore Results

The results can be exported in JSON or DataFrame format. Both options allow you to extract the information with or without normalization. Normalized values range from 0 to 100, where 0 represents a poor result and 100 represents a perfect value.

audit_result = model.df_results(norm_values=True)   # pandas DataFrame
audit_json = model.json_results(norm_values=True)   # JSON
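To make the 0-100 scale concrete, here is one plausible normalisation for a ratio metric such as disparate impact, where 1.0 (parity) maps to 100 and deviations in either direction score lower. This convention is an assumption for illustration, not necessarily the exact formula the library applies internally.

```python
def normalize_ratio(ratio):
    """Map a parity ratio (ideal value 1.0) onto the 0-100 audit scale."""
    if ratio <= 0:
        return 0.0
    folded = min(ratio, 1.0 / ratio)  # treat over- and under-selection symmetrically
    return round(100 * folded, 1)

print(normalize_ratio(1.0))   # 100.0 -> perfect parity
print(normalize_ratio(0.8))   # 80.0
print(normalize_ratio(1.25))  # 80.0 -> same distance from parity
```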

Metrics WITHOUT TRUE LABEL

Fairness

result = audit_result.xs(('fairness',), level=(0,))
result = result.reset_index()
result = result.pivot(
    index=['metric', 'attribute'],  
    columns='stage',             
    values='value'   
)

result
| Metric | Attribute | 01-labeled | 02-production | 03-impact |
| --- | --- | --- | --- | --- |
| d_equality | age | 99.0 | 99.0 | 100.0 |
| d_equality | ethnicity | 100.0 | 99.0 | 44.0 |
| d_equality | sex | 100.0 | 71.0 | 47.0 |
| d_equality | sex_ethnicity | 99.0 | 62.0 | 44.0 |
| d_equity | age | 100.0 | 99.0 | 100.0 |
| d_equity | ethnicity | 100.0 | 70.0 | 44.0 |
| d_equity | sex | 99.0 | 48.0 | 41.0 |
| d_equity | sex_ethnicity | 100.0 | 57.0 | 44.0 |
| d_parity | age | 98.0 | 98.0 | 99.0 |
| d_parity | ethnicity | 99.0 | 76.0 | 42.0 |
| d_parity | sex | 99.0 | 64.0 | 40.0 |
| d_parity | sex_ethnicity | 99.0 | 65.0 | 40.0 |
| d_statisticalparity | age | 98.0 | 98.0 | 98.0 |
| d_statisticalparity | ethnicity | 100.0 | 74.0 | 43.0 |
| d_statisticalparity | sex | 100.0 | 57.0 | 42.0 |
| d_statisticalparity | sex_ethnicity | 98.0 | 57.0 | 42.0 |

Benchmarking

result = audit_result.xs(('benchmarking',), level=(0,))
result = result.reset_index()
result = result.pivot(
    index=['metric', 'attribute'],  
    columns='stage',             
    values='value'   
)

result
| Metric | Attribute | 01-labeled | 02-production | 03-impact |
| --- | --- | --- | --- | --- |
| da_inconsistency | age | 44.8 | 44.5 | 45.0 |
| da_inconsistency | ethnicity | 40.0 | 20.0 | 10.0 |
| da_inconsistency | sex | 60.0 | 30.0 | 15.0 |
| da_inconsistency | sex_ethnicity | 25.0 | 15.0 | 10.0 |
| da_positive | age | 45.2 | 44.9 | 45.3 |
| da_positive | ethnicity | 39.8 | 15.8 | 3.7 |
| da_positive | sex | 59.8 | 19.7 | 5.7 |
| da_positive | sex_ethnicity | 24.7 | 10.0 | 3.7 |

Distribution

result = audit_result.xs(('distribution',), level=(0,))
result = result.reset_index()
result = result.pivot(
    index=['metric', 'attribute'],  
    columns='stage',             
    values='value'   
)

result
| Metric | Attribute | 01-labeled | 02-production | 03-impact |
| --- | --- | --- | --- | --- |
| d_equality | age | 100.0 | 100.0 | 100.0 |
| d_equality | ethnicity | 100.0 | 100.0 | 72.0 |
| d_equality | sex | 99.0 | 90.0 | 60.0 |
| d_equality | sex_ethnicity | 100.0 | 90.0 | 60.0 |
| d_equity | age | 99.0 | 98.0 | 98.0 |
| d_equity | ethnicity | 99.0 | 89.0 | 77.0 |
| d_equity | sex | 99.0 | 77.0 | 70.0 |
| d_equity | sex_ethnicity | 98.0 | 86.0 | 77.0 |

Drift

result = audit_result.xs(('drift',), level=(0,))
result = result.reset_index()
result = result.pivot(
    index=['metric', 'attribute'],  
    columns='stage',             
    values='value'   
)

result
| Metric | Attribute | 02-production |
| --- | --- | --- |
| drift | age | 1.17 |
| drift | ethnicity | 0.0 |
| drift | overall | 0.87 |
| drift | sex | 0.0 |
| drift | sex_ethnicity | 1.62 |

Metrics WITH TRUE LABEL

Fairness

result = audit_result.xs(('fairness_label',), level=(0,))
result = result.reset_index()
result = result.pivot(
    index=['metric', 'attribute'],  
    columns='stage',             
    values='value'   
)

result
| Metric | Attribute | 01-labeled |
| --- | --- | --- |
| d_calibrated_false | age | 99.0 |
| d_calibrated_false | ethnicity | 97.0 |
| d_calibrated_false | sex | 98.0 |
| d_calibrated_false | sex_ethnicity | 99.0 |
| d_calibrated_true | age | 96.0 |
| d_calibrated_true | ethnicity | 98.0 |
| d_calibrated_true | sex | 96.0 |
| d_calibrated_true | sex_ethnicity | 95.0 |
| d_equalodds_false | age | 99.0 |
| d_equalodds_false | ethnicity | 97.0 |
| d_equalodds_false | sex | 98.0 |
| d_equalodds_false | sex_ethnicity | 99.0 |
| d_equalodds_true | age | 96.0 |
| d_equalodds_true | ethnicity | 98.0 |
| d_equalodds_true | sex | 96.0 |
| d_equalodds_true | sex_ethnicity | 95.0 |

Performance

result = audit_result.xs(('performance',), level=(0,))
result = result.reset_index()
result = result.pivot(
    index=['metric', 'attribute'],  
    columns='stage',             
    values='value'   
)

result
| Metric | Attribute | 01-labeled |
| --- | --- | --- |
| FN | age | 1151 |
| FN | ethnicity | 987 |
| FN | sex | 1475 |
| FN | sex_ethnicity | 619 |
| FP | age | 1146 |
| FP | ethnicity | 1027 |
| FP | sex | 1565 |
| FP | sex_ethnicity | 653 |
| TN | age | 1060 |
| TN | ethnicity | 973 |
| TN | sex | 1428 |
| TN | sex_ethnicity | 605 |
| TP | age | 1121 |
| TP | ethnicity | 1013 |
| TP | sex | 1532 |
| TP | sex_ethnicity | 623 |
| accuracy | age | 48.7 |
| accuracy | ethnicity | 49.65 |
| accuracy | sex | 49.33 |
| accuracy | sex_ethnicity | 49.12 |
| f1 | age | 49.39 |
| f1 | ethnicity | 50.15 |
| f1 | sex | 50.2 |
| f1 | sex_ethnicity | 49.48 |
| poor_performance | age | 57.58 |
| poor_performance | ethnicity | 59.58 |
| poor_performance | sex | 59.05 |
| poor_performance | sex_ethnicity | 58.57 |
| precision | age | 49.45 |
| precision | ethnicity | 49.66 |
| precision | sex | 49.47 |
| precision | sex_ethnicity | 48.82 |
| recall | age | 49.34 |
| recall | ethnicity | 50.65 |
| recall | sex | 50.95 |
| recall | sex_ethnicity | 50.16 |

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

eticas_audit-0.1.0.tar.gz (12.2 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

eticas_audit-0.1.0-py3-none-any.whl (6.2 kB)

Uploaded Python 3

File details

Details for the file eticas_audit-0.1.0.tar.gz.

File metadata

  • Download URL: eticas_audit-0.1.0.tar.gz
  • Upload date:
  • Size: 12.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.11

File hashes

Hashes for eticas_audit-0.1.0.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 8067d190b80d57016901d6d83960d0bb67e9374c08895c0703c0768693268b74 |
| MD5 | ccf35ad13b6d8dd7169e981c795e10ed |
| BLAKE2b-256 | a2d1ab0584be2b2c3997725fe5ab2aaa084619c0d166c7c62af62958db49d57f |


File details

Details for the file eticas_audit-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: eticas_audit-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 6.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.11

File hashes

Hashes for eticas_audit-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 7af493e29fab9310abf1bd3f631ab21cb6cd696d6b39a5f3e0d08193893cb713 |
| MD5 | fc21d268a9120395a16b6ca1c0acdb89 |
| BLAKE2b-256 | 8400bd53035d70a806e88389007fea146a490d5ccbbd135ea1a60512c331be86 |

