A library for detecting and analyzing bias in text, datasets, and language models.
BiasCheck: An Open-Source Library for Bias Detection
BiasCheck is a robust and modular Python library designed to analyze and detect bias in text, models, and datasets. It provides tools for researchers, data scientists, and developers to measure various forms of bias (e.g., stereotypical, cultural) and assess the quality of language model outputs or textual data.
Features
- Modular Design: BiasCheck offers modular and extensible classes for different bias analysis tasks.
- Bias Detection: Analyze text, datasets, language models, or databases for various types of bias.
- Support for RAG: Automatically create Retrieval-Augmented Generation (RAG) pipelines using documents or PDFs.
- Sentiment Analysis: Assess sentiment polarity alongside bias.
- Visualization: Visualize flagged sentences and bias types in your analysis.
Main Classes
1. DocuCheck
Analyze bias in standalone text documents or files.
Key Features:
- Accepts text data or documents (e.g., PDF, TXT).
- Detects flagged sentences and calculates a bias score.
- Optionally uses a list of polarizing terms for context-specific bias detection.
Example:
from biascheck.analysis.docucheck import DocuCheck

# Text to analyze and a list of polarizing terms for context-specific detection
data = "This is a sample document that may contain biases."
terms = ["biased", "lazy", "discrimination"]

analyzer = DocuCheck(data=data, terms=terms)
result = analyzer.analyze(verbose=False)  # returns flagged sentences and a bias score
print(result)
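DocuCheck also accepts document files (e.g., PDF or TXT). The sketch below assumes the file is passed through a document argument, mirroring ModuCheck and RAGCheck; check the library documentation for the exact parameter name.
from biascheck.analysis.docucheck import DocuCheck

# Hypothetical sketch: analyzing a PDF instead of raw text.
# The `document` keyword is an assumption borrowed from the ModuCheck/RAGCheck examples.
analyzer = DocuCheck(document="report.pdf", terms=["biased", "discrimination"])
result = analyzer.analyze(verbose=True)  # verbose output lists each flagged sentence
print(result)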
2. SetCheck
Analyze entire datasets (e.g., DataFrames) for skewed or biased records.
Key Features:
- Works with Python DataFrames and CSV files.
- Adds bias-related columns to the dataset.
- Returns flagged records and overall bias analysis.
Example:
from biascheck.analysis.setcheck import SetCheck

# Records to analyze; input_cols names the text column(s) to check
data = [{"text": "A biased example."}, {"text": "A neutral sentence."}]
terms = ["bias", "stereotype"]

analyzer = SetCheck(data=data, input_cols=["text"], terms=terms)
flagged_df = analyzer.analyze(top_n=5)  # DataFrame of the top flagged records
print(flagged_df)
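Because SetCheck works with DataFrames, a CSV can be loaded with pandas and passed in directly. This is a minimal sketch assuming the data argument accepts a DataFrame, as the feature list suggests; reviews.csv and the review column are placeholder names.
import pandas as pd
from biascheck.analysis.setcheck import SetCheck

# Load a CSV and analyze one of its text columns (file and column names are hypothetical)
df = pd.read_csv("reviews.csv")
analyzer = SetCheck(data=df, input_cols=["review"], terms=["bias", "stereotype"])
flagged_df = analyzer.analyze(top_n=10)
print(flagged_df.head())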
3. ModuCheck
Analyze bias in language model outputs using Hugging Face models.
Key Features:
- Supports Hugging Face models and pipelines.
- Detects bias in generated outputs based on user-provided topics.
- Automatically builds a RAG pipeline if a document is provided.
- Saves flagged outputs and bias results to a DataFrame.
Example:
from biascheck.analysis.moducheck import ModuCheck
from transformers import pipeline

# Initialize a Hugging Face text-generation pipeline
model = pipeline("text-generation", model="gpt2")

# Topics used to prompt the model; providing a document builds a RAG pipeline automatically
topics = ["The role of gender in leadership", "Cultural diversity"]
analyzer = ModuCheck(model=model, terms=["bias", "stereotype"], document="file.pdf")
result = analyzer.analyze(topics=topics, num_responses=5)  # generates and scores responses per topic
print(result)
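Since ModuCheck saves flagged outputs and bias results to a DataFrame, the result can be written out for later review. The line below assumes result is a pandas DataFrame, per the feature list; the file name is arbitrary.
# Assumes `result` is a pandas DataFrame, as the feature list above indicates
result.to_csv("moducheck_flagged_outputs.csv", index=False)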
4. RAGCheck
Analyze bias in RAG pipelines by combining document retrieval and natural language generation.
Key Features:
- Builds Retrieval-Augmented Generation pipelines from documents or PDFs.
- Supports hypothesis-based contextual bias detection using NLI models (the general idea is sketched after the example below).
- Integrates FAISS for vectorized document retrieval.
- Identifies bias in retrieved content and generated outputs.
Example:
from biascheck.analysis.ragcheck import RAGCheck
from transformers import pipeline

# Initialize a Hugging Face text-generation pipeline
model = pipeline("text-generation", model="gpt2")

# Build a RAG pipeline over the PDF and flag bias in retrieved and generated text
terms = ["bias", "discrimination"]
analyzer = RAGCheck(model=model, document="sample.pdf", terms=terms, verbose=True)
result = analyzer.analyze(top_n=5)
print(result)
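For readers unfamiliar with hypothesis-based detection, the snippet below illustrates the general NLI idea using a standalone Hugging Face zero-shot classifier. This is not RAGCheck's internal code; the model name, labels, and hypothesis template are illustrative choices.
from transformers import pipeline

# Zero-shot classification uses an NLI model under the hood: each candidate label
# is turned into a hypothesis and scored for entailment against the input sentence.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

sentence = "Women are too emotional to lead engineering teams."
result = classifier(
    sentence,
    candidate_labels=["stereotypical", "neutral"],
    hypothesis_template="This sentence is {}.",
)
print(result["labels"][0], result["scores"][0])  # top label and its confidence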
5. Visualiser
Visualize the results of bias analysis.
Key Features:
- Generates bar charts for flagged bias categories.
- Visualizes flagged sentences and bias distribution.
Example:
from biascheck.visualisation.visualiser import Visualiser

# flagged_records is the flagged output from a prior analysis (e.g., SetCheck.analyze())
visualiser = Visualiser()
visualiser.plot_bias_categories(flagged_records)
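Putting the pieces together, a minimal end-to-end sketch (assuming the flagged DataFrame returned by SetCheck can be passed straight to the plotter):
from biascheck.analysis.setcheck import SetCheck
from biascheck.visualisation.visualiser import Visualiser

# Run a dataset analysis, then chart the flagged bias categories
data = [{"text": "A biased example."}, {"text": "A neutral sentence."}]
flagged_records = SetCheck(data=data, input_cols=["text"], terms=["bias", "stereotype"]).analyze(top_n=5)
Visualiser().plot_bias_categories(flagged_records)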
6. BaseCheck (under construction)
Analyze bias in databases using the same workflow as the rest of the library.
Key Features:
- Database Compatibility: Supports both vector databases (e.g., FAISS) and graph databases (e.g., Neo4j).
- Saves flagged outputs and bias results to a DataFrame.
Installation
Prerequisites
- Python 3.9 or 3.10
- pip (Python package installer)
- For GPU support: CUDA-compatible GPU and CUDA toolkit
Basic Installation
For CPU-only installation:
pip install biascheck
Optional Dependencies
For GPU support (requires CUDA-compatible GPU):
pip install "biascheck[gpu]"
For development and testing:
pip install "biascheck[test]"
For all features (GPU + testing):
pip install "biascheck[all]"
Platform-Specific Notes
macOS
- No additional requirements for basic installation
- For GPU support, ensure you have CUDA installed via Homebrew or other package manager
Linux
- No additional requirements for basic installation
- For GPU support, ensure CUDA toolkit is installed
- Some distributions may require additional system packages for PDF processing
Windows
- No additional requirements for basic installation
- For GPU support, ensure CUDA toolkit is installed
- May require Visual C++ Redistributable for some dependencies
Troubleshooting
If you encounter any installation issues:
- Ensure you're using Python 3.9 or 3.10
- Try creating a fresh virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install biascheck
- For GPU-related issues, verify CUDA installation:
nvidia-smi # Should show GPU information
- If specific dependencies fail, try installing them separately:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install biascheck
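After installing a CUDA build of PyTorch, you can confirm it detects the GPU (assuming PyTorch is the backend BiasCheck uses for GPU acceleration):
python -c "import torch; print(torch.cuda.is_available())"  # should print True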
System Requirements
- Minimum 4GB RAM (8GB recommended)
- 2GB free disk space
- For GPU support: NVIDIA GPU with CUDA support
Usage
Run Examples
The notebooks/ directory contains example scripts for all analysis classes:
python notebooks/moducheck_example.py
python notebooks/docucheck_example.py
Contributing
We welcome contributions! Please fork the repository, make your changes, and submit a pull request. Ensure all new features are covered with appropriate tests.
Future Work
- Multimodal Support: Expand the library to include image, video, and audio bias detection.
- Enhanced RAG Pipelines: Improve integration with custom retrievers.
- Advanced Bias Categories: Expand predefined bias categories for deeper contextual analysis.
Contact
For questions, suggestions, or feedback, reach out to the project maintainer:
- Name: Arjun Balaji
File details
Details for the file biascheck-0.8.9.tar.gz.
File metadata
- Download URL: biascheck-0.8.9.tar.gz
- Upload date:
- Size: 21.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.17
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6586707d31b4e896a259446f7495ad74b6271890a20ee020e231bb04e195f9ed |
| MD5 | 7ec8283c33acd098d4ef5a9a20b1f70a |
| BLAKE2b-256 | 2c557738bb7da553cb8f1a98e0b3d7931072b200346114e707c5701455217764 |
File details
Details for the file biascheck-0.8.9-py3-none-any.whl.
File metadata
- Download URL: biascheck-0.8.9-py3-none-any.whl
- Upload date:
- Size: 27.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.17
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 15abb12b3e4a770e0a5d25ed142734c997d67668eca40f4ac0e097e933778a81 |
| MD5 | 9c534159b8eb8ce050132d2c539a4506 |
| BLAKE2b-256 | bb225d4602d12b73be97dae79e2acd519cb34eb36e9f98316fab05e604264fff |