
A tool for detecting biases in machine learning datasets


Bias Detection Engine for ML Datasets

A tool for detecting class imbalance, label leakage, and distributional bias in machine learning datasets before model training begins.

Features

  • Predictive Imbalance Detection: Identifies class imbalance and feature distribution skews
  • Label Leakage Analysis: Detects potential data leakage and feature dependencies
  • Distributional Bias: Analyzes demographic and categorical feature distributions
  • Information Theoretic Metrics: Measures feature importance and mutual information
  • Autoencoder-based Anomaly Detection: Identifies unusual patterns in the data
  • Contrastive Learning Analysis: Evaluates feature representations and relationships
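
To make the first feature concrete, here is a minimal sketch of the kind of statistics a class imbalance check might report. It is not the package's implementation: the function name `class_imbalance_summary` and the shape of the returned dictionary are assumptions made for illustration, and only the standard library is used.

```python
from collections import Counter
import math


def class_imbalance_summary(labels):
    """Summarize how evenly labels are spread across classes.

    Hypothetical helper for illustration; not part of bias_detection_engine.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("labels must not be empty")

    # Ratio of the most common class to the least common class (1.0 = balanced).
    imbalance_ratio = max(counts.values()) / min(counts.values())

    # Shannon entropy in bits, divided by log2(number of classes),
    # so that 1.0 means a uniform class distribution.
    if len(counts) > 1:
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        normalized_entropy = entropy / math.log2(len(counts))
    else:
        normalized_entropy = 0.0  # a single class carries no information

    return {
        "counts": dict(counts),
        "imbalance_ratio": imbalance_ratio,
        "normalized_entropy": normalized_entropy,
    }


if __name__ == "__main__":
    skewed = ["negative"] * 90 + ["positive"] * 10
    print(class_imbalance_summary(skewed))
```

A high imbalance ratio together with a normalized entropy well below 1.0 suggests that the minority class may need resampling or reweighting before training.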

Installation

pip install -r requirements.txt

Project Structure

bias_detection_engine/
├── core/
│   ├── __init__.py
│   ├── imbalance_detector.py
│   ├── leakage_detector.py
│   ├── distribution_analyzer.py
│   └── feature_analyzer.py
├── utils/
│   ├── __init__.py
│   ├── preprocessing.py
│   └── visualization.py
├── models/
│   ├── __init__.py
│   ├── autoencoder.py
│   └── contrastive_learner.py
├── notebooks/
│   └── examples/
└── tests/
    └── __init__.py

Usage

from bias_detection_engine.core import ImbalanceDetector, LeakageDetector, DistributionAnalyzer
from bias_detection_engine.utils import preprocess_data

# Load and preprocess your dataset
# (`your_dataset` is a placeholder; substitute your own data)
data = preprocess_data(your_dataset)

# Detect imbalances
imbalance_detector = ImbalanceDetector()
imbalance_report = imbalance_detector.analyze(data)

# Check for label leakage
leakage_detector = LeakageDetector()
leakage_report = leakage_detector.analyze(data)

# Analyze distributions
distribution_analyzer = DistributionAnalyzer()
distribution_report = distribution_analyzer.analyze(data)
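
The format of the reports returned by `analyze` is not described here. As a standalone illustration of what a label leakage check can compute, the sketch below scores each discrete feature by the share of label entropy it explains, using mutual information. The helper names (`mutual_information`, `label_entropy`, `leakage_scores`) are hypothetical and are not the package's API; the code uses only the standard library and assumes discrete feature values.

```python
from collections import Counter
import math


def mutual_information(xs, ys):
    """Mutual information in bits between two equal-length discrete sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    if n == 0:
        raise ValueError("sequences must not be empty")

    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def label_entropy(ys):
    """Shannon entropy in bits of a discrete label sequence."""
    n = len(ys)
    return -sum((c / n) * math.log2(c / n) for c in Counter(ys).values())


def leakage_scores(features, labels):
    """Map each feature name to the share of label entropy it explains (0 to 1)."""
    h = label_entropy(labels)
    return {
        name: (mutual_information(column, labels) / h if h > 0 else 0.0)
        for name, column in features.items()
    }


if __name__ == "__main__":
    labels = [0, 1, 0, 1, 0, 1, 0, 1]
    features = {
        "approved_flag": ["n", "y", "n", "y", "n", "y", "n", "y"],  # mirrors the label
        "region": [0, 0, 1, 1, 0, 0, 1, 1],  # independent of the label
    }
    print(leakage_scores(features, labels))
```

A score close to 1.0 means the feature almost fully determines the label, which is a common sign of leakage and worth investigating before training.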

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
