Comprehensive AI Ethics and Bias Detection Toolkit with SAP Integration

🧠 Fairsight Toolkit

Python 3.8+ | License: MIT | SAP HANA

Fairsight is a production-ready Python toolkit for detecting bias, ensuring fairness, and maintaining ethical standards in machine learning models and datasets. Built with enterprise integration in mind, it features seamless SAP HANA Cloud and SAP Analytics Cloud connectivity.

🌟 Key Features

  • 🔍 Comprehensive Bias Detection: Statistical parity, disparate impact, equal opportunity, and more
  • ⚖️ Fairness Metrics: Demographic parity, equalized odds, predictive parity
  • 🔮 Model Explainability: SHAP and LIME integration for interpretable AI
  • 📊 Enterprise Integration: Native SAP HANA Cloud and SAP Analytics Cloud support
  • 📋 Justified Attributes: Smart handling of business-justified discriminatory features
  • 🚀 Easy to Use: Simple API for both datasets and trained models
  • 📈 Automated Reporting: Beautiful, actionable audit reports
  • 🏢 Production Ready: Enterprise-grade logging, error handling, and scalability

🛠️ Installation

Basic Installation

pip install fairsight

With SAP Integration

pip install "fairsight[sap]"

Development Installation

git clone https://github.com/vijayk/fairsight.git
cd fairsight
pip install -e ".[dev,sap]"

🚀 Quick Start

Basic Dataset Audit

from fairsight import FSAuditor

# Simple dataset audit
auditor = FSAuditor(
    dataset="data/hiring_data.csv",
    sensitive_features=["gender", "race"],
    target="hired",
    justified_attributes=["experience_years"]  # Job-relevant factors
)

results = auditor.run_audit()
print(f"Ethical Score: {results['ethical_score']}/100")

Model + Dataset Audit

from fairsight import FSAuditor
from sklearn.ensemble import RandomForestClassifier

# Train your model (X_train and y_train are your own prepared training features and labels)
model = RandomForestClassifier()
model.fit(X_train, y_train)

# Comprehensive audit
auditor = FSAuditor(
    dataset="data/loan_data.csv",
    model=model,
    sensitive_features=["gender", "race", "age"],
    target="loan_approved",
    justified_attributes=["credit_score", "income"],  # Financially relevant
    fairness_threshold=0.8
)

# Run complete audit
audit_results = auditor.run_audit()

# Export results
auditor.export_results("audit_report.json")

Handling "Justified" Attributes

Fairsight's key innovation is its handling of justified attributes, that is, features that may appear discriminatory but have a legitimate business justification:

from fairsight import FSAuditor

# Example: House loan approval
auditor = FSAuditor(
    dataset="house_loans.csv",
    sensitive_features=["gender", "race", "job"],  
    justified_attributes=["job"],  # Job status is legally justified for loans
    target="approved"
)

results = auditor.run_audit()

# Job-related disparities won't be flagged as bias
# Gender/race disparities will still be detected
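
The idea can be illustrated without the library. The following is a minimal sketch, not Fairsight's actual implementation: it assumes bias is flagged when the disparate impact ratio falls below a threshold, and that attributes listed as justified are reported but never flagged. The helper names `disparate_impact` and `flag_bias`, the `JUSTIFIED` label, and the toy rows are all hypothetical.

```python
def disparate_impact(rows, attribute, target, privileged):
    """Positive-outcome rate of the unprivileged group divided by that of the privileged group."""
    def positive_rate(group):
        return sum(r[target] for r in group) / len(group)

    priv = [r for r in rows if r[attribute] == privileged]
    unpriv = [r for r in rows if r[attribute] != privileged]
    # Raises ZeroDivisionError if a group is empty or the privileged rate is 0.
    return positive_rate(unpriv) / positive_rate(priv)


def flag_bias(rows, sensitive, justified, target, privileged, threshold=0.8):
    """Label each sensitive attribute; justified attributes are reported but not flagged."""
    findings = {}
    for attr in sensitive:
        di = disparate_impact(rows, attr, target, privileged[attr])
        if attr in justified:
            findings[attr] = (di, "JUSTIFIED")
        else:
            findings[attr] = (di, "FAIR" if di >= threshold else "BIASED")
    return findings


# Toy loan data, invented for illustration only.
rows = [
    {"gender": "male", "job": "employed", "approved": 1},
    {"gender": "male", "job": "employed", "approved": 1},
    {"gender": "male", "job": "unemployed", "approved": 0},
    {"gender": "male", "job": "unemployed", "approved": 1},
    {"gender": "female", "job": "employed", "approved": 1},
    {"gender": "female", "job": "employed", "approved": 1},
    {"gender": "female", "job": "unemployed", "approved": 0},
    {"gender": "female", "job": "unemployed", "approved": 0},
]

findings = flag_bias(
    rows,
    sensitive=["gender", "job"],
    justified={"job"},
    target="approved",
    privileged={"gender": "male", "job": "employed"},
)
for attr, (di, label) in findings.items():
    print(f"{attr}: disparate impact {di:.2f} [{label}]")
```

In this toy data the job disparity is much larger than the gender disparity, yet only gender is flagged, because job is declared justified.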

🚀 Quick Start: One-liner Wrappers

Fairsight provides convenient wrapper functions for the most common bias and fairness analysis tasks. These wrappers let you run a full analysis in a single line of code.

Dataset Bias Detection (Wrapper)

from fairsight import detect_dataset_bias
import pandas as pd

df = pd.read_csv('data.csv')
results = detect_dataset_bias(df, protected_attributes=['gender', 'race'], target_column='outcome')
for r in results:
    print(r)

Output:

BiasResult(gender.Disparate Impact: 0.82 [FAIR])
BiasResult(gender.Statistical Parity Difference: 0.05 [FAIR])
BiasResult(race.Disparate Impact: 0.76 [BIASED])
BiasResult(race.Statistical Parity Difference: 0.18 [BIASED])

Model Bias Detection (Wrapper)

from fairsight import detect_model_bias
from sklearn.ensemble import RandomForestClassifier
import pandas as pd

df = pd.read_csv('data.csv')
model = RandomForestClassifier().fit(df.drop('outcome', axis=1), df['outcome'])
results = detect_model_bias(model, df, protected_attributes=['gender'], target_column='outcome')
for r in results:
    print(r)

Full Dataset Audit (Wrapper)

from fairsight import audit_dataset
import pandas as pd

df = pd.read_csv('data.csv')
results = audit_dataset(df, protected_attributes=['gender'], target_column='outcome')
print(results['bias_detection'])
print(results['fairness_metrics'])

Full Model Audit (Wrapper)

from fairsight import audit_model
from sklearn.ensemble import RandomForestClassifier
import pandas as pd

df = pd.read_csv('data.csv')
X = df.drop('outcome', axis=1)
y = df['outcome']
model = RandomForestClassifier().fit(X, y)
results = audit_model(model, X, y, protected_attributes=['gender'])
print(results['bias_detection'])
print(results['fairness_metrics'])

🏗️ Architecture

fairsight/
├── __init__.py             # Main package exports
├── auditor.py              # FSAuditor main class
├── bias_detection.py       # Enhanced bias detection with justified attributes
├── dataset_audit.py        # Comprehensive dataset auditing
├── model_audit.py          # Model performance and bias auditing
├── explainability.py       # SHAP/LIME model explanations
├── fairness_metrics.py     # Fairness metric computations
├── report_generator.py     # Automated report generation
├── dashboard_push.py       # SAP HANA Cloud integration
└── utils.py                # Utility functions

📊 SAP Integration

SAP HANA Cloud Setup

from fairsight import Dashboard, FSAuditor

# Configure SAP HANA connection
dashboard = Dashboard({
    "host": "your-hana-instance.hanacloud.ondemand.com",
    "port": 443,
    "user": "DBADMIN", 
    "password": "your_password",
    "encrypt": True
})

# Audit results automatically pushed to HANA
auditor = FSAuditor(
    dataset="data.csv",
    sensitive_features=["gender"],
    enable_sap_integration=True
)

results = auditor.run_audit()  # Automatically pushes to HANA
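
To keep credentials out of source code, the connection dictionary can be built from environment variables. This is a small sketch using only the standard library; the dictionary keys mirror the example above, while the variable names `HANA_HOST`, `HANA_PORT`, `HANA_USER`, and `HANA_PASSWORD` and the helper `hana_config_from_env` are assumptions, not part of Fairsight.

```python
import os


def hana_config_from_env(env=os.environ):
    """Build a HANA connection dict from environment variables (names are hypothetical)."""
    return {
        "host": env["HANA_HOST"],
        "port": int(env.get("HANA_PORT", "443")),  # default to the port used above
        "user": env["HANA_USER"],
        "password": env["HANA_PASSWORD"],
        "encrypt": True,
    }


# Demonstration with an explicit mapping instead of the real environment.
example_env = {
    "HANA_HOST": "your-hana-instance.hanacloud.ondemand.com",
    "HANA_USER": "DBADMIN",
    "HANA_PASSWORD": "your_password",
}
config = hana_config_from_env(example_env)
print({key: value for key, value in config.items() if key != "password"})
```

The resulting dictionary has the same shape as the one passed to `Dashboard` above; a missing variable raises `KeyError` early instead of failing at connection time.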

SAP Analytics Cloud Dashboard

# Generate SAP Analytics Cloud configuration
dashboard_config = dashboard.create_sac_dashboard_config()

# Use this configuration to set up your SAC dashboard
print(dashboard_config)

🔍 Comprehensive Example

import pandas as pd
from fairsight import FSAuditor
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load data
df = pd.read_csv("hiring_dataset.csv")

# Define protected and justified attributes
protected_attrs = ["gender", "race", "age"]
justified_attrs = ["years_experience", "education_level"]  # Job-relevant

# Split data
X = df.drop("hired", axis=1)
y = df["hired"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Comprehensive audit
auditor = FSAuditor(
    model=model,
    X_test=X_test,
    y_test=y_test,
    sensitive_features=protected_attrs,
    justified_attributes=justified_attrs,
    fairness_threshold=0.8,
    enable_sap_integration=True
)

# Run audit with all components
results = auditor.run_audit(
    include_dataset=True,
    include_model=True,
    include_bias_detection=True,
    generate_report=True,
    push_to_dashboard=True
)

# Print summary
print("=" * 50)
print(f"🏆 ETHICAL SCORE: {results['ethical_score']}/100")
print(f"📊 OVERALL ASSESSMENT: {results['executive_summary']['overall_assessment']}")
print("=" * 50)

# Key findings
for finding in results['executive_summary']['key_findings']:
    print(f"✅ {finding}")

# Critical issues
for issue in results['executive_summary']['critical_issues']:
    print(f"🚨 {issue}")

# Recommendations
for rec in results['executive_summary']['recommendations']:
    print(f"💡 {rec}")

# Export detailed results
auditor.export_results("detailed_audit_results.json")

# View audit history
history = auditor.get_audit_history(limit=5)
print(history)

📋 Key Metrics

Bias Detection Metrics

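  • Disparate Impact: Ratio of positive-outcome rates between groups, checked against the 80% rule
  • Statistical Parity Difference: Difference in positive-outcome rates between groups
  • Equal Opportunity Difference: Difference in true positive rate (TPR) across groups
  • Predictive Parity: Difference in precision across groups
  • Equalized Odds: Differences in both TPR and false positive rate (FPR) across groups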

Fairness Metrics

  • Demographic Parity: Equal positive prediction rates
  • Equal Opportunity: Equal TPR for qualified individuals
  • Predictive Equality: Equal FPR across groups
  • Overall Accuracy Equality: Equal accuracy across groups
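
The metrics above can be worked through by hand on a small example. The sketch below uses plain Python rather than Fairsight, with invented toy data. It assumes differences are computed as unprivileged minus privileged and that disparate impact is the unprivileged rate divided by the privileged rate; the library's own sign conventions may differ.

```python
# (y_true, y_pred) pairs per group; toy data invented for illustration.
records = {
    "privileged":   [(1, 1), (1, 1), (1, 0), (0, 1), (0, 0)],
    "unprivileged": [(1, 1), (1, 0), (0, 0), (0, 0), (0, 1)],
}


def group_rates(pairs):
    """Confusion-matrix rates for one group."""
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "positive_rate": (tp + fp) / len(pairs),  # share predicted positive
        "tpr": tp / (tp + fn),                    # true positive rate (recall)
        "fpr": fp / (fp + tn),                    # false positive rate
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


priv = group_rates(records["privileged"])
unpriv = group_rates(records["unprivileged"])

metrics = {
    "disparate_impact": unpriv["positive_rate"] / priv["positive_rate"],
    "statistical_parity_difference": unpriv["positive_rate"] - priv["positive_rate"],
    "equal_opportunity_difference": unpriv["tpr"] - priv["tpr"],
    "predictive_parity_difference": unpriv["precision"] - priv["precision"],
    "predictive_equality_difference": unpriv["fpr"] - priv["fpr"],
    "accuracy_difference": unpriv["accuracy"] - priv["accuracy"],
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Here the positive rates are 0.6 and 0.4, so the disparate impact is about 0.667 and fails the 80% rule, while accuracy is identical in both groups. Equalized odds holds only when both the TPR and the FPR differences are close to zero.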

🎯 Use Cases

1. Hiring & Recruitment

# Audit hiring algorithms
auditor = FSAuditor(
    dataset="hiring_data.csv",
    sensitive_features=["gender", "race", "age"],
    justified_attributes=["experience", "education", "skills_score"],
    target="hired"
)

2. Financial Services

# Audit loan approval models
auditor = FSAuditor(
    model=loan_model,
    sensitive_features=["gender", "race", "marital_status"], 
    justified_attributes=["credit_score", "income", "debt_ratio"],
    target="loan_approved"
)

3. Healthcare

# Audit medical diagnosis systems
auditor = FSAuditor(
    model=diagnosis_model,
    sensitive_features=["gender", "race", "age"],
    justified_attributes=["symptoms", "medical_history", "test_results"],
    target="diagnosis"
)

📊 Example Output

🧠 AI Fairness & Bias Audit Report
===================================

**Ethical Score**: 87/100

🔍 Attribute-wise Bias Analysis
--------------------------------

➤ Gender
- Disparate Impact: 0.85
- Equal Opportunity Difference: 0.08  
- Statistical Parity Difference: 0.12
- **Interpretation**: Minor disparity detected, within acceptable range.

➤ Job (justified attribute)
- Disparate Impact: 0.62
- Equal Opportunity Difference: 0.28
- **Interpretation**: This feature is justified for decision-making per business requirements.

📊 Fairness Metric Gaps
------------------------

| Attribute | Precision Gap | Recall Gap | F1 Score Gap |
|-----------|---------------|-----------|-------------|
| Gender    | 0.05          | 0.07      | 0.06        |
| Job       | 0.15          | 0.18      | 0.16        |

📌 Final Ethical Assessment
----------------------------

✅ The model demonstrates strong ethical integrity with low bias across protected groups.

📋 Note: job is marked as a justified attribute and disparities here are acceptable per business configuration.

🔧 Advanced Configuration

Custom Fairness Thresholds

auditor = FSAuditor(
    dataset="data.csv",
    fairness_threshold=0.85,  # Stricter 85% rule
    sensitive_features=["gender", "race"]
)
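
To see what a stricter threshold changes, consider the disparate impact values from the wrapper output earlier (0.82 and 0.76). The sketch below assumes the threshold is a minimum acceptable disparate impact ratio, which matches the "85% rule" comment above but is not confirmed against Fairsight's internals; `verdict` is a hypothetical helper.

```python
def verdict(disparate_impact, fairness_threshold=0.8):
    """Label a disparate impact ratio against a minimum acceptable threshold."""
    return "FAIR" if disparate_impact >= fairness_threshold else "BIASED"


for di in (0.82, 0.76):
    print(f"DI={di}: 80% rule -> {verdict(di, 0.8)}, 85% rule -> {verdict(di, 0.85)}")
```

A ratio of 0.82 passes the default 80% rule but fails at 0.85, so raising the threshold can turn a previously acceptable attribute into a flagged one.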

Custom Privileged Groups

auditor = FSAuditor(
    dataset="data.csv",
    sensitive_features=["gender", "race"],
    privileged_groups={
        "gender": "male",      # Specify privileged group
        "race": "white"
    }
)
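
The choice of privileged group matters because ratio metrics are computed relative to it. The following plain-Python sketch uses invented toy data and hypothetical helper names, and assumes every row outside the privileged value counts as unprivileged. It shows the same data producing very different ratios depending on the reference group.

```python
def positive_rate(rows, attribute, value, target, match=True):
    """Positive-outcome rate for rows whose attribute equals (or does not equal) value."""
    group = [r for r in rows if (r[attribute] == value) == match]
    return sum(r[target] for r in group) / len(group)


def disparate_impact(rows, attribute, privileged_value, target):
    """Unprivileged positive rate divided by privileged positive rate."""
    priv = positive_rate(rows, attribute, privileged_value, target, match=True)
    unpriv = positive_rate(rows, attribute, privileged_value, target, match=False)
    return unpriv / priv


# Toy data: 3 of 5 males and 2 of 5 females receive the positive outcome.
rows = (
    [{"gender": "male", "outcome": 1}] * 3 + [{"gender": "male", "outcome": 0}] * 2
    + [{"gender": "female", "outcome": 1}] * 2 + [{"gender": "female", "outcome": 0}] * 3
)

di_male_ref = disparate_impact(rows, "gender", "male", "outcome")
di_female_ref = disparate_impact(rows, "gender", "female", "outcome")
print(f"privileged=male:   {di_male_ref:.3f}")
print(f"privileged=female: {di_female_ref:.3f}")
```

With "male" as the reference the ratio is about 0.667 and fails the 80% rule; with "female" as the reference it is about 1.5, which a simple minimum-ratio check would pass. Specifying the privileged group explicitly avoids that ambiguity.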

🧑‍💻 Advanced: Core Class Usage

For advanced users, Fairsight exposes all core classes for maximum flexibility and custom workflows.

BiasDetector (Direct Use)

from fairsight import BiasDetector
import pandas as pd

df = pd.read_csv('data.csv')
detector = BiasDetector(dataset=df, sensitive_features=['gender'], target='outcome')
results = detector.detect_bias_on_dataset()
for r in results:
    print(r)

DatasetAuditor (Direct Use)

from fairsight import DatasetAuditor
import pandas as pd

df = pd.read_csv('data.csv')
auditor = DatasetAuditor(dataset=df, protected_attributes=['gender'], target_column='outcome')
results = auditor.audit()
print(results['bias_detection'])
print(results['fairness_metrics'])

ModelAuditor (Direct Use)

from fairsight import ModelAuditor
from sklearn.ensemble import RandomForestClassifier
import pandas as pd

df = pd.read_csv('data.csv')
X = df.drop('outcome', axis=1)
y = df['outcome']
model = RandomForestClassifier().fit(X, y)
auditor = ModelAuditor(model=model, X_test=X, y_test=y, protected_attributes=['gender'], target_column='outcome')
results = auditor.audit()
print(results['bias_detection'])
print(results['fairness_metrics'])

FairnessMetrics (Direct Use)

from fairsight import FairnessMetrics
import numpy as np

y_true = np.array([1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 0])
protected = np.array([0, 1, 0, 1])
fm = FairnessMetrics(y_true, y_pred, protected_attr=protected, privileged_group=0)
print(fm.demographic_parity())
print(fm.equalized_odds())
print(fm.predictive_parity())
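
For reference, the per-group rates for these four samples can be computed by hand. This sketch is independent of `FairnessMetrics` and does not show what the library returns; it only makes the inputs easier to reason about. The helper names are hypothetical, and a rate with an empty denominator is reported as `None`.

```python
y_true = [1, 0, 1, 0]
y_pred = [1, 0, 0, 0]
protected = [0, 1, 0, 1]


def rate(numerator, denominator):
    """Return numerator / denominator, or None when the rate is undefined."""
    return numerator / denominator if denominator else None


def group_stats(group):
    """Selection rate, TPR, and FPR for samples whose protected value equals group."""
    idx = [i for i, g in enumerate(protected) if g == group]
    positives = [i for i in idx if y_true[i] == 1]
    negatives = [i for i in idx if y_true[i] == 0]
    return {
        "selection_rate": rate(sum(y_pred[i] for i in idx), len(idx)),
        "tpr": rate(sum(y_pred[i] for i in positives), len(positives)),
        "fpr": rate(sum(y_pred[i] for i in negatives), len(negatives)),
    }


privileged = group_stats(0)
unprivileged = group_stats(1)
print("privileged:  ", privileged)
print("unprivileged:", unprivileged)
```

In this toy input the privileged group contains only true positives and the unprivileged group only true negatives, so the FPR of the first and the TPR of the second are undefined. Metrics such as equalized odds need both classes present in every group to be meaningful.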

ExplainabilityEngine (Direct Use)

from fairsight import ExplainabilityEngine
from sklearn.linear_model import LogisticRegression
import pandas as pd

df = pd.read_csv('data.csv')
X = df.drop('outcome', axis=1)
y = df['outcome']
model = LogisticRegression().fit(X, y)
engine = ExplainabilityEngine(model=model, training_data=X, feature_names=list(X.columns))
shap_result = engine.explain_with_shap(X)
print(shap_result)

🧩 Standalone Utilities (Quick Use)

Fairsight exposes key utilities as standalone functions for maximum flexibility. You can use these independently of the main pipeline:

from fairsight import (
    explain_with_shap, explain_with_lime, detect_illegal_data,
    preprocess_data, calculate_privilege_groups, generate_html_report
)

# The calls below assume df, model, X, and feature_names are already defined.

# Preprocessing
df_processed, encoders = preprocess_data(df, target_column='outcome', protected_attributes=['gender'])

# Privilege group calculation
priv_groups = calculate_privilege_groups(df, ['gender'])

# Illegal data detection
illegal_results = detect_illegal_data(df)

# Explainability (SHAP & LIME)
shap_result = explain_with_shap(model, X, feature_names)
lime_result = explain_with_lime(model, X, feature_names)

# Quick HTML report
dummy_bias = {'gender': {'statistical_parity': 0.1}}
dummy_fairness = {'gender': {'demographic_parity': 0.12}}
report_path = generate_html_report(dummy_bias, dummy_fairness, model_name='DemoModel')
print(f"HTML report at: {report_path}")

📚 References & Citations

Algorithms & Metrics

  • Reweighing (Bias Mitigation):
    • Kamiran, F., & Calders, T. (2012). Data preprocessing techniques for classification without discrimination. Knowledge and Information Systems, 33(1), 1-33.
  • Fairness Metrics:
    • Demographic Parity, Equalized Odds, Equal Opportunity, Predictive Parity, Disparate Impact, Statistical Parity, etc. are based on open academic literature, e.g.:
      • Hardt, M., Price, E., & Srebro, N. (2016). Equality of Opportunity in Supervised Learning. NeurIPS.
      • Feldman, M., et al. (2015). Certifying and removing disparate impact. KDD.
      • Barocas, S., Hardt, M., & Narayanan, A. (2019). Fairness and Machine Learning. fairmlbook.org
  • Explainability:
    • SHAP: Lundberg, S. M., & Lee, S.-I. (2017). A Unified Approach to Interpreting Model Predictions. NeurIPS.
    • LIME: Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). "Why Should I Trust You?": Explaining the Predictions of Any Classifier. KDD.
  • Generalized Entropy Index:
    • Speicher, T., et al. (2018). A Unified Approach to Quantifying Algorithmic Unfairness: Measuring Individual & Group Unfairness via Inequality Indices. KDD.

Libraries Used

  • scikit-learn (BSD-3-Clause License): Machine learning models and utilities
  • pandas (BSD-3-Clause License): Data processing
  • numpy (BSD License): Numerical computing
  • SHAP (MIT License): Model explainability
  • LIME (MIT License): Model explainability
  • matplotlib, seaborn (matplotlib: PSF License, seaborn: BSD): Visualization

All algorithms and metrics are implemented based on open academic literature and open-source libraries. No proprietary or closed-source code is used.


🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • SAP HANA Cloud for enterprise data integration
  • SHAP & LIME for model explainability
  • scikit-learn for machine learning utilities
  • pandas & numpy for data processing

Made with ❤️ for Ethical AI

Fairsight Toolkit - Making AI Fair, Transparent, and Accountable
