A tool for detecting biases in machine learning datasets
Bias Detection Engine for ML Datasets
A tool for detecting several types of bias in machine learning datasets before model training begins.
Features
- Predictive Imbalance Detection: Identifies class imbalance and feature distribution skews
- Label Leakage Analysis: Detects potential data leakage and feature dependencies
- Distributional Bias: Analyzes demographic and categorical feature distributions
- Information-Theoretic Metrics: Measures feature importance and mutual information
- Autoencoder-based Anomaly Detection: Identifies unusual patterns in the data
- Contrastive Learning Analysis: Evaluates feature representations and relationships
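To make the first few checks concrete, here is a minimal standalone sketch of what class imbalance and label leakage measurements can look like. It uses only the Python standard library and is not the package's own implementation; the function names `imbalance_ratio` and `mutual_information` and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (1.0 = balanced)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def mutual_information(xs, ys):
    """Mutual information, in bits, between two discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


# Hypothetical toy data: six negatives, two positives.
labels = [0, 0, 0, 0, 0, 0, 1, 1]
leaky = ["a", "a", "a", "a", "a", "a", "b", "b"]  # fully determines the label
noise = ["u", "v", "u", "v", "u", "v", "u", "v"]  # independent of the label

ratio = imbalance_ratio(labels)
mi_leaky = mutual_information(leaky, labels)
mi_noise = mutual_information(noise, labels)

print(f"imbalance ratio: {ratio:.1f}")
print(f"MI(leaky; label): {mi_leaky:.3f} bits")
print(f"MI(noise; label): {mi_noise:.3f} bits")
```

A feature whose mutual information with the label approaches the label's own entropy is a candidate for leakage, while a value near zero suggests the feature carries little information about the label.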
Installation
pip install -r requirements.txt
Project Structure
bias_detection_engine/
├── core/
│   ├── __init__.py
│   ├── imbalance_detector.py
│   ├── leakage_detector.py
│   ├── distribution_analyzer.py
│   └── feature_analyzer.py
├── utils/
│   ├── __init__.py
│   ├── preprocessing.py
│   └── visualization.py
├── models/
│   ├── __init__.py
│   ├── autoencoder.py
│   └── contrastive_learner.py
├── notebooks/
│   └── examples/
└── tests/
    └── __init__.py
Usage
from bias_detection_engine.core import ImbalanceDetector, LeakageDetector, DistributionAnalyzer
from bias_detection_engine.utils import preprocess_data

# Load and preprocess your dataset
# (`your_dataset` is a placeholder for your own data)
data = preprocess_data(your_dataset)

# Detect imbalances
imbalance_detector = ImbalanceDetector()
imbalance_report = imbalance_detector.analyze(data)

# Check for label leakage
leakage_detector = LeakageDetector()
leakage_report = leakage_detector.analyze(data)

# Analyze distributions
distribution_analyzer = DistributionAnalyzer()
distribution_report = distribution_analyzer.analyze(data)
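The structure of the reports returned above is not documented here. As a rough illustration of the kind of quantity a distribution analysis might report, the sketch below computes the total variation distance between a categorical feature's distribution in the positive and negative classes. This is a standalone, standard-library example under assumed names (`category_shares`, `total_variation_distance`) and made-up data, not the behavior of `DistributionAnalyzer`.

```python
from collections import Counter


def category_shares(values):
    """Map each category to its share of the given values."""
    counts = Counter(values)
    total = len(values)
    return {k: c / total for k, c in counts.items()}


def total_variation_distance(a, b):
    """Half the L1 distance between two categorical distributions (0 to 1)."""
    pa, pb = category_shares(a), category_shares(b)
    keys = set(pa) | set(pb)
    return 0.5 * sum(abs(pa.get(k, 0.0) - pb.get(k, 0.0)) for k in keys)


# Hypothetical rows of (group, label).
rows = [("f", 1), ("f", 0), ("m", 1), ("m", 1),
        ("m", 1), ("f", 0), ("m", 0), ("f", 0)]

positives = [group for group, label in rows if label == 1]
negatives = [group for group, label in rows if label == 0]

tvd = total_variation_distance(positives, negatives)
print(f"total variation distance: {tvd:.2f}")
```

A distance of 0 means the feature is distributed identically in both classes; values closer to 1 indicate that group membership and label are strongly entangled in the dataset.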
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.
File details
Details for the file bias_detection_engine-0.1.0.tar.gz.
File metadata
- Download URL: bias_detection_engine-0.1.0.tar.gz
- Upload date:
- Size: 3.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7f6baba2fb3887dcb765891544eaf0bfdec33e69897fc379bbb162d3d0c100f8 |
| MD5 | 66cb7e01eb1aedd619154b6b82014a7f |
| BLAKE2b-256 | 45ef22807b510546c8c953b1581b8a41d7ce0d6fe99bcc77dad0029aca6276ec |
File details
Details for the file bias_detection_engine-0.1.0-py3-none-any.whl.
File metadata
- Download URL: bias_detection_engine-0.1.0-py3-none-any.whl
- Upload date:
- Size: 3.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cdfcffd6f6265a80fa65cc6ccd53977b167bb547785ae0117380377c20a07b48 |
| MD5 | d4ea4fa1a43c1af831bab326b5321da3 |
| BLAKE2b-256 | 393843950762eac50e4704a9e6782a736c85e0324355756cff5ec1602efa8954 |