Comprehensive AI Ethics and Bias Detection Toolkit with SAP Integration
Fairsight Toolkit
Fairsight is a production-ready Python toolkit for detecting bias, ensuring fairness, and maintaining ethical standards in machine learning models and datasets. Built with enterprise integration in mind, it features seamless SAP HANA Cloud and SAP Analytics Cloud connectivity.
Key Features
- Comprehensive Bias Detection: Statistical parity, disparate impact, equal opportunity, and more
- Fairness Metrics: Demographic parity, equalized odds, predictive parity
- Model Explainability: SHAP and LIME integration for interpretable AI
- Enterprise Integration: Native SAP HANA Cloud and SAP Analytics Cloud support
- Justified Attributes: Smart handling of business-justified discriminatory features
- Easy to Use: Simple API for both datasets and trained models
- Automated Reporting: Beautiful, actionable audit reports
- Production Ready: Enterprise-grade logging, error handling, and scalability
Installation
Basic Installation
pip install fairsight
With SAP Integration
# Quote the extras so shells such as zsh do not expand the brackets
pip install "fairsight[sap]"
Development Installation
git clone https://github.com/vijayk/fairsight.git
cd fairsight
pip install -e ".[dev,sap]"
Quick Start
Basic Dataset Audit
from fairsight import FSAuditor
# Simple dataset audit
auditor = FSAuditor(
    dataset="data/hiring_data.csv",
    sensitive_features=["gender", "race"],
    target="hired",
    justified_attributes=["experience_years"],  # Job-relevant factors
)
results = auditor.run_audit()
print(f"Ethical Score: {results['ethical_score']}/100")
Model + Dataset Audit
from fairsight import FSAuditor
from sklearn.ensemble import RandomForestClassifier
# Train your model
# (X_train and y_train are assumed to have been prepared from data/loan_data.csv)
model = RandomForestClassifier()
model.fit(X_train, y_train)
# Comprehensive audit
auditor = FSAuditor(
    dataset="data/loan_data.csv",
    model=model,
    sensitive_features=["gender", "race", "age"],
    target="loan_approved",
    justified_attributes=["credit_score", "income"],  # Financially relevant
    fairness_threshold=0.8,
)
# Run complete audit
audit_results = auditor.run_audit()
# Export results
auditor.export_results("audit_report.json")
Handling "Justified" Attributes
The key innovation of Fairsight is its handling of justified attributes: features that may appear discriminatory but have a legitimate business justification.
# Example: House loan approval
auditor = FSAuditor(
    dataset="house_loans.csv",
    sensitive_features=["gender", "race", "job"],
    justified_attributes=["job"],  # Job status is legally justified for loans
    target="approved",
)
results = auditor.run_audit()
# Job-related disparities won't be flagged as bias
# Gender/race disparities will still be detected
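How Fairsight accounts for a justified attribute internally is not described here. One common way to reason about it is conditional statistical parity: compare approval rates across a protected group only among people who share the same value of the justified attribute. The sketch below uses made-up rows, and its helper names (`selection_rates`, `conditional_parity_gaps`) are hypothetical, not part of the Fairsight API.

```python
from collections import defaultdict


def selection_rates(rows, group_key, outcome_key):
    """Positive-outcome rate for each value of group_key."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for row in rows:
        counts[row[group_key]][0] += row[outcome_key]
        counts[row[group_key]][1] += 1
    return {group: pos / total for group, (pos, total) in counts.items()}


def conditional_parity_gaps(rows, sensitive_key, justified_key, outcome_key):
    """Largest selection-rate gap between sensitive groups, per justified stratum."""
    strata = defaultdict(list)
    for row in rows:
        strata[row[justified_key]].append(row)
    gaps = {}
    for stratum, members in strata.items():
        rates = selection_rates(members, sensitive_key, outcome_key)
        gaps[stratum] = max(rates.values()) - min(rates.values())
    return gaps


# Toy loan data (fabricated for illustration only)
rows = [
    {"gender": "f", "job": "employed", "approved": 1},
    {"gender": "f", "job": "employed", "approved": 1},
    {"gender": "m", "job": "employed", "approved": 1},
    {"gender": "m", "job": "employed", "approved": 1},
    {"gender": "f", "job": "unemployed", "approved": 0},
    {"gender": "f", "job": "unemployed", "approved": 0},
    {"gender": "f", "job": "unemployed", "approved": 0},
    {"gender": "m", "job": "unemployed", "approved": 0},
]

raw_rates = selection_rates(rows, "gender", "approved")
gaps_within_job = conditional_parity_gaps(rows, "gender", "job", "approved")
print(raw_rates)
print(gaps_within_job)
```

In this toy data the raw approval rates differ by gender, but the gap disappears once rows are compared within the same job status. A gap that persists inside a stratum is the kind of disparity that a justified attribute cannot explain.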
Quick Start: One-liner Wrappers
Fairsight provides convenient wrapper functions for the most common bias and fairness analysis tasks. These wrappers let you run a full analysis in a single line of code.
Dataset Bias Detection (Wrapper)
from fairsight import detect_dataset_bias
import pandas as pd
df = pd.read_csv('data.csv')
results = detect_dataset_bias(df, protected_attributes=['gender', 'race'], target_column='outcome')
for r in results:
    print(r)
Output:
BiasResult(gender.Disparate Impact: 0.82 [FAIR])
BiasResult(gender.Statistical Parity Difference: 0.05 [FAIR])
BiasResult(race.Disparate Impact: 0.76 [BIASED])
BiasResult(race.Statistical Parity Difference: 0.18 [BIASED])
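For readers who want to check such numbers by hand, here is a minimal sketch of the two metrics in that output, using their standard definitions. The function names are hypothetical and the data is made up; reporting the statistical parity difference as an absolute value is an assumption based on the positive values shown above, not a documented Fairsight convention.

```python
def positive_rate(outcomes):
    """Share of positive (1) outcomes in a list of 0/1 labels."""
    return sum(outcomes) / len(outcomes)


def disparate_impact(unprivileged, privileged):
    """Ratio of the unprivileged group's positive rate to the privileged group's."""
    return positive_rate(unprivileged) / positive_rate(privileged)


def statistical_parity_difference(unprivileged, privileged):
    """Absolute difference in positive rates (sign convention is an assumption)."""
    return abs(positive_rate(privileged) - positive_rate(unprivileged))


# Toy outcomes: 5 of 10 privileged and 3 of 10 unprivileged rows are positive
privileged_outcomes = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
unprivileged_outcomes = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

di = disparate_impact(unprivileged_outcomes, privileged_outcomes)
spd = statistical_parity_difference(unprivileged_outcomes, privileged_outcomes)
print(f"Disparate Impact: {di:.2f}")
print(f"Statistical Parity Difference: {spd:.2f}")
```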
Model Bias Detection (Wrapper)
from fairsight import detect_model_bias
from sklearn.ensemble import RandomForestClassifier
import pandas as pd
df = pd.read_csv('data.csv')
model = RandomForestClassifier().fit(df.drop('outcome', axis=1), df['outcome'])
results = detect_model_bias(model, df, protected_attributes=['gender'], target_column='outcome')
for r in results:
    print(r)
Full Dataset Audit (Wrapper)
from fairsight import audit_dataset
import pandas as pd
df = pd.read_csv('data.csv')
results = audit_dataset(df, protected_attributes=['gender'], target_column='outcome')
print(results['bias_detection'])
print(results['fairness_metrics'])
Full Model Audit (Wrapper)
from fairsight import audit_model
from sklearn.ensemble import RandomForestClassifier
import pandas as pd
df = pd.read_csv('data.csv')
X = df.drop('outcome', axis=1)
y = df['outcome']
model = RandomForestClassifier().fit(X, y)
results = audit_model(model, X, y, protected_attributes=['gender'])
print(results['bias_detection'])
print(results['fairness_metrics'])
Architecture
fairsight/
├── __init__.py           # Main package exports
├── auditor.py            # FSAuditor main class
├── bias_detection.py     # Enhanced bias detection with justified attributes
├── dataset_audit.py      # Comprehensive dataset auditing
├── model_audit.py        # Model performance and bias auditing
├── explainability.py     # SHAP/LIME model explanations
├── fairness_metrics.py   # Fairness metric computations
├── report_generator.py   # Automated report generation
├── dashboard_push.py     # SAP HANA Cloud integration
└── utils.py              # Utility functions
SAP Integration
SAP HANA Cloud Setup
from fairsight import Dashboard, FSAuditor
# Configure SAP HANA connection
# (placeholder credentials; do not commit real passwords to source control)
dashboard = Dashboard({
    "host": "your-hana-instance.hanacloud.ondemand.com",
    "port": 443,
    "user": "DBADMIN",
    "password": "your_password",
    "encrypt": True,
})
# Audit results automatically pushed to HANA
auditor = FSAuditor(
    dataset="data.csv",
    sensitive_features=["gender"],
    enable_sap_integration=True,
)
results = auditor.run_audit()  # Automatically pushes to HANA
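The connection dictionary above embeds a password in source code. A minimal sketch of one alternative is to build the same dictionary from environment variables. The dictionary keys come from the example above; the variable names (`HANA_HOST`, `HANA_PORT`, `HANA_USER`, `HANA_PASSWORD`) and the helper `hana_config_from_env` are hypothetical and are not defined by Fairsight.

```python
import os


def hana_config_from_env(env=os.environ):
    """Build the connection dict from environment variables.

    The variable names are assumptions made for this sketch.
    A missing required variable raises KeyError instead of failing later.
    """
    return {
        "host": env["HANA_HOST"],
        "port": int(env.get("HANA_PORT", "443")),
        "user": env["HANA_USER"],
        "password": env["HANA_PASSWORD"],
        "encrypt": True,
    }


# Demonstration with an explicit mapping instead of the real environment
example_env = {
    "HANA_HOST": "your-hana-instance.hanacloud.ondemand.com",
    "HANA_USER": "DBADMIN",
    "HANA_PASSWORD": "your_password",
}
config = hana_config_from_env(example_env)
print({key: value for key, value in config.items() if key != "password"})
```

The resulting dictionary has the same shape as the one passed to Dashboard above, so it could be handed over in its place.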
SAP Analytics Cloud Dashboard
# Generate SAP Analytics Cloud configuration
dashboard_config = dashboard.create_sac_dashboard_config()
# Use this configuration to set up your SAC dashboard
print(dashboard_config)
Comprehensive Example
import pandas as pd
from fairsight import FSAuditor
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
# Load data
df = pd.read_csv("hiring_dataset.csv")
# Define protected and justified attributes
protected_attrs = ["gender", "race", "age"]
justified_attrs = ["years_experience", "education_level"]  # Job-relevant
# Split data (fixed seed so the split is reproducible)
X = df.drop("hired", axis=1)
y = df["hired"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train model
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
# Comprehensive audit
auditor = FSAuditor(
    model=model,
    X_test=X_test,
    y_test=y_test,
    sensitive_features=protected_attrs,
    justified_attributes=justified_attrs,
    fairness_threshold=0.8,
    enable_sap_integration=True,
)
# Run audit with all components
results = auditor.run_audit(
    include_dataset=True,
    include_model=True,
    include_bias_detection=True,
    generate_report=True,
    push_to_dashboard=True,
)
# Print summary
print("=" * 50)
print(f"ETHICAL SCORE: {results['ethical_score']}/100")
print(f"OVERALL ASSESSMENT: {results['executive_summary']['overall_assessment']}")
print("=" * 50)
# Key findings
for finding in results['executive_summary']['key_findings']:
    print(f"✅ {finding}")
# Critical issues
for issue in results['executive_summary']['critical_issues']:
    print(f"🚨 {issue}")
# Recommendations
for rec in results['executive_summary']['recommendations']:
    print(f"💡 {rec}")
# Export detailed results
auditor.export_results("detailed_audit_results.json")
# View audit history
history = auditor.get_audit_history(limit=5)
print(history)
Key Metrics
Bias Detection Metrics
- Disparate Impact: 80% rule compliance
- Statistical Parity Difference: Difference in positive rates
- Equal Opportunity Difference: Difference in TPR across groups
- Predictive Parity: Difference in precision across groups
- Equalized Odds: Both TPR and FPR differences
Fairness Metrics
- Demographic Parity: Equal positive prediction rates
- Equal Opportunity: Equal TPR for qualified individuals
- Predictive Equality: Equal FPR across groups
- Overall Accuracy Equality: Equal accuracy across groups
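Most of the metrics in the two lists above reduce to comparing a handful of confusion-matrix rates between groups. The sketch below computes those rates per group and reports privileged-minus-unprivileged gaps. It follows the standard textbook definitions; the function names and toy labels are made up for illustration and are not Fairsight's implementation.

```python
def group_rates(y_true, y_pred, groups, group):
    """Confusion-matrix rates for the rows that belong to one group."""
    tp = fp = tn = fn = 0
    for truth, pred, g in zip(y_true, y_pred, groups):
        if g != group:
            continue
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 0:
            tn += 1
        else:
            fn += 1

    def ratio(num, den):
        # NaN marks a rate that is undefined for this group
        return num / den if den else float("nan")

    total = tp + fp + tn + fn
    return {
        "positive_rate": ratio(tp + fp, total),  # demographic parity
        "tpr": ratio(tp, tp + fn),               # equal opportunity
        "fpr": ratio(fp, fp + tn),               # predictive equality
        "precision": ratio(tp, tp + fp),         # predictive parity
        "accuracy": ratio(tp + tn, total),       # overall accuracy equality
    }


def rate_gaps(y_true, y_pred, groups, privileged, unprivileged):
    """Privileged-minus-unprivileged gap for every rate."""
    priv = group_rates(y_true, y_pred, groups, privileged)
    unpriv = group_rates(y_true, y_pred, groups, unprivileged)
    return {name: priv[name] - unpriv[name] for name in priv}


# Toy labels: six rows in group "a" followed by six rows in group "b"
y_true = [1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0]
groups = ["a"] * 6 + ["b"] * 6

gaps = rate_gaps(y_true, y_pred, groups, privileged="a", unprivileged="b")
for name, gap in gaps.items():
    print(f"{name}: {gap:+.3f}")
```

Equalized odds looks at the `tpr` and `fpr` gaps together. In this toy data both groups have the same accuracy, yet the true positive rate differs by a third, which is why a single metric rarely tells the whole story.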
Use Cases
1. Hiring & Recruitment
# Audit hiring algorithms
auditor = FSAuditor(
    dataset="hiring_data.csv",
    sensitive_features=["gender", "race", "age"],
    justified_attributes=["experience", "education", "skills_score"],
    target="hired",
)
2. Financial Services
# Audit loan approval models
auditor = FSAuditor(
    model=loan_model,
    sensitive_features=["gender", "race", "marital_status"],
    justified_attributes=["credit_score", "income", "debt_ratio"],
    target="loan_approved",
)
3. Healthcare
# Audit medical diagnosis systems
auditor = FSAuditor(
    model=diagnosis_model,
    sensitive_features=["gender", "race", "age"],
    justified_attributes=["symptoms", "medical_history", "test_results"],
    target="diagnosis",
)
Example Output
AI Fairness & Bias Audit Report
===================================
**Ethical Score**: 87/100
Attribute-wise Bias Analysis
--------------------------------
➤ Gender
- Disparate Impact: 0.85
- Equal Opportunity Difference: 0.08
- Statistical Parity Difference: 0.12
- **Interpretation**: Minor disparity detected, within acceptable range.
➤ Job (justified attribute)
- Disparate Impact: 0.62
- Equal Opportunity Difference: 0.28
- **Interpretation**: This feature is justified for decision-making per business requirements.
Fairness Metric Gaps
------------------------
| Attribute | Precision Gap | Recall Gap | F1 Score Gap |
|-----------|---------------|------------|--------------|
| Gender    | 0.05          | 0.07       | 0.06         |
| Job       | 0.15          | 0.18       | 0.16         |
Final Ethical Assessment
----------------------------
✅ The model demonstrates strong ethical integrity with low bias across protected groups.
Note: job is marked as a justified attribute and disparities here are acceptable per business configuration.
Advanced Configuration
Custom Fairness Thresholds
auditor = FSAuditor(
    dataset="data.csv",
    fairness_threshold=0.85,  # Stricter 85% rule
    sensitive_features=["gender", "race"],
)
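To see what a stricter threshold changes, here is a minimal sketch of a ratio-rule check in the spirit of the 80% rule. The function `passes_ratio_rule` is hypothetical. How Fairsight applies `fairness_threshold` internally is not documented here, so treat this only as an illustration of the arithmetic.

```python
def passes_ratio_rule(rate_a, rate_b, threshold=0.8):
    """Return True when the lower selection rate is at least
    `threshold` times the higher one."""
    low, high = sorted((rate_a, rate_b))
    if high == 0:
        return True  # nobody was selected in either group, so there is no disparity
    return low / high >= threshold


# Selection rates of 41% and 50% give a ratio of 0.82
default_rule = passes_ratio_rule(0.41, 0.50)                  # 80% rule
strict_rule = passes_ratio_rule(0.41, 0.50, threshold=0.85)   # 85% rule
print(default_rule, strict_rule)
```

The same pair of rates passes at the default threshold and fails at 0.85, so raising the threshold flags disparities that the 80% rule would accept.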
Custom Privileged Groups
auditor = FSAuditor(
    dataset="data.csv",
    sensitive_features=["gender", "race"],
    privileged_groups={
        "gender": "male",  # Specify privileged group
        "race": "white",
    },
)
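When no privileged group is specified, a toolkit has to choose a reference group somehow. The rule Fairsight uses is not described here. One plausible heuristic, sketched below with a hypothetical helper and made-up data, is to take the group with the highest positive-outcome rate as the reference.

```python
from collections import defaultdict


def infer_privileged_group(values, outcomes):
    """Pick the group with the highest positive-outcome rate.

    This is an illustrative heuristic, not Fairsight's documented behavior.
    Ties go to the group that appears first in `values`.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for value, outcome in zip(values, outcomes):
        totals[value] += 1
        positives[value] += outcome
    return max(totals, key=lambda value: positives[value] / totals[value])


# Toy column of a sensitive feature and matching 0/1 outcomes
genders = ["male", "male", "male", "female", "female", "female"]
outcomes = [1, 1, 0, 1, 0, 0]

inferred = infer_privileged_group(genders, outcomes)
print(inferred)
```

Setting `privileged_groups` explicitly, as in the example above, avoids depending on any such heuristic and keeps audits comparable across datasets.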
Advanced: Core Class Usage
For advanced users, Fairsight exposes all core classes for maximum flexibility and custom workflows.
BiasDetector (Direct Use)
from fairsight import BiasDetector
import pandas as pd
df = pd.read_csv('data.csv')
detector = BiasDetector(dataset=df, sensitive_features=['gender'], target='outcome')
results = detector.detect_bias_on_dataset()
for r in results:
    print(r)
DatasetAuditor (Direct Use)
from fairsight import DatasetAuditor
import pandas as pd
df = pd.read_csv('data.csv')
auditor = DatasetAuditor(dataset=df, protected_attributes=['gender'], target_column='outcome')
results = auditor.audit()
print(results['bias_detection'])
print(results['fairness_metrics'])
ModelAuditor (Direct Use)
from fairsight import ModelAuditor
from sklearn.ensemble import RandomForestClassifier
import pandas as pd
df = pd.read_csv('data.csv')
X = df.drop('outcome', axis=1)
y = df['outcome']
model = RandomForestClassifier().fit(X, y)
auditor = ModelAuditor(model=model, X_test=X, y_test=y, protected_attributes=['gender'], target_column='outcome')
results = auditor.audit()
print(results['bias_detection'])
print(results['fairness_metrics'])
FairnessMetrics (Direct Use)
from fairsight import FairnessMetrics
import numpy as np
y_true = np.array([1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 0])
protected = np.array([0, 1, 0, 1])
fm = FairnessMetrics(y_true, y_pred, protected_attr=protected, privileged_group=0)
print(fm.demographic_parity())
print(fm.equalized_odds())
print(fm.predictive_parity())
ExplainabilityEngine (Direct Use)
from fairsight import ExplainabilityEngine
from sklearn.linear_model import LogisticRegression
import pandas as pd
df = pd.read_csv('data.csv')
X = df.drop('outcome', axis=1)
y = df['outcome']
model = LogisticRegression().fit(X, y)
engine = ExplainabilityEngine(model=model, training_data=X, feature_names=list(X.columns))
shap_result = engine.explain_with_shap(X)
print(shap_result)
Standalone Utilities (Quick Use)
Fairsight exposes key utilities as standalone functions for maximum flexibility. You can use these independently of the main pipeline:
from fairsight import (
    explain_with_shap, explain_with_lime, detect_illegal_data,
    preprocess_data, calculate_privilege_groups, generate_html_report
)
# df, model, and X are assumed to be defined as in the earlier examples
feature_names = list(X.columns)
# Preprocessing
df_processed, encoders = preprocess_data(df, target_column='outcome', protected_attributes=['gender'])
# Privilege group calculation
priv_groups = calculate_privilege_groups(df, ['gender'])
# Illegal data detection
illegal_results = detect_illegal_data(df)
# Explainability (SHAP & LIME)
shap_result = explain_with_shap(model, X, feature_names)
lime_result = explain_with_lime(model, X, feature_names)
# Quick HTML report
dummy_bias = {'gender': {'statistical_parity': 0.1}}
dummy_fairness = {'gender': {'demographic_parity': 0.12}}
report_path = generate_html_report(dummy_bias, dummy_fairness, model_name='DemoModel')
print(f"HTML report at: {report_path}")
References & Citations
Algorithms & Metrics
- Reweighing (Bias Mitigation):
  - Kamiran, F., & Calders, T. (2012). Data preprocessing techniques for classification without discrimination. Knowledge and Information Systems, 33(1), 1-33. Springer Link
- Fairness Metrics:
  - Demographic Parity, Equalized Odds, Equal Opportunity, Predictive Parity, Disparate Impact, Statistical Parity, etc. are based on open academic literature, e.g.:
    - Hardt, M., Price, E., & Srebro, N. (2016). Equality of Opportunity in Supervised Learning. NeurIPS. arXiv
    - Feldman, M., et al. (2015). Certifying and removing disparate impact. KDD. arXiv
    - Barocas, S., Hardt, M., & Narayanan, A. (2019). Fairness and Machine Learning. fairmlbook.org
- Explainability:
- Generalized Entropy Index:
  - Speicher, T., et al. (2018). A Unified Approach to Quantifying Algorithmic Unfairness: Measuring Individual & Group Unfairness via Inequality Indices. KDD. arXiv
Libraries Used
- scikit-learn (BSD-3-Clause License): Machine learning models and utilities
- pandas (BSD-3-Clause License): Data processing
- numpy (BSD License): Numerical computing
- SHAP (MIT License): Model explainability
- LIME (MIT License): Model explainability
- matplotlib, seaborn (matplotlib: PSF License, seaborn: BSD): Visualization
All algorithms and metrics are implemented based on open academic literature and open-source libraries. No proprietary or closed-source code is used.
Contributing
We welcome contributions! Please see our Contributing Guide for details.
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- SAP HANA Cloud for enterprise data integration
- SHAP & LIME for model explainability
- scikit-learn for machine learning utilities
- pandas & numpy for data processing
Support
- Email: support@fairsight.com
- GitHub Issues: Create an issue
- Documentation: fairsight.readthedocs.io
Made with ❤️ for Ethical AI
Fairsight Toolkit - Making AI Fair, Transparent, and Accountable
Download files
File details
Details for the file fairsight-1.0.0.tar.gz.
File metadata
- Download URL: fairsight-1.0.0.tar.gz
- Upload date:
- Size: 67.3 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c3183034aa8e96331790ab843970843421e2a9d980fed79dc72877f443d96ad8 |
| MD5 | 5cd6de098f7e0d1918003dc95a848170 |
| BLAKE2b-256 | 181d90e22d5567965131c920a381fab38e1425f9ee67b5f1bf5c72f8fdde2f38 |
File details
Details for the file fairsight-1.0.0-py3-none-any.whl.
File metadata
- Download URL: fairsight-1.0.0-py3-none-any.whl
- Upload date:
- Size: 80.9 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d54393e4166b99f01883f36090bd77ccd9d24a5edda771e94e38d9c759e6eabf |
| MD5 | 538db78d8992809e617607c887cc0be9 |
| BLAKE2b-256 | 359c84cc1930cb01120e7c23eecac1c2b313d3fc888d4e8f04ece6de2d929b3e |