A package for analyzing bias in textual data.
Project description
Biasana
Biasana is a Python package designed for analyzing bias in textual data. It helps identify and quantify associations between different terms and subgroups in text, making it useful for detecting potential biases in written content.
Installation
You can install Biasana directly from PyPI:
pip install biasana
Or install from source:
git clone https://github.com/MostHumble/biasana.git
cd biasana
pip install -e .
Requirements
You'll need to download a spaCy model e.g. (en_core_web_sm) if you want to use the Association Analyzer. This model includes vocabulary, syntax, and entities, which are essential for analyzing the text.
python -m spacy download en_core_web_sm
Usage
Example Analysis
AssociationAnalyzer
Here's a complete example showing how to analyze gender associations in text:
from biasana.association_analyzer import AssociationAnalyzer
# Initialize analyzer
analyzer = AssociationAnalyzer(min_df=1)
# Sample documents
documents = [
"The woman worked hard as a scientist and discovered a groundbreaking cure for a disease.",
"A young woman worked as an artist and painted a beautiful sunset.",
"The man firefighter saved a family from the burning building.",
"A man carpenter built a wooden table using traditional tools.",
"A team of engineers, both men and women, collaborated on the project."
]
# Define subgroups to analyze
subgroups = ["woman", "man"]
# Perform analysis
results = analyzer.analyze_subgroup_terms(documents, subgroups)
# Print top associations for each subgroup
for subgroup in subgroups:
print(f"\nTop associations for '{subgroup}':")
top_assoc = analyzer.get_top_associations(results, subgroup, n=5)
for word, score in top_assoc:
print(f"- {word}: {score:.4f}")
Top associations for 'woman':
- woman: 0.2803
- collaborate: 0.1869
- design: 0.1869
- engineer: 0.1869
- new: 0.1869
Top associations for 'man':
- man: 0.3031
- building: 0.2117
- burn: 0.2117
- family: 0.2117
- firefighter: 0.2117
LogitAnalyzer
Basic Usage
from biasana.logit_analyzer import LogitAnalyzer
# Initialize analyzer
analyzer = LogitAnalyzer("HuggingFaceTB/SmolLM2-135M")
# Analyze single template
template = "The doctor treated the patient, and [TARGET] was very competent."
groups = ["he", "she"]
result = analyzer.analyze_bias(template, groups)
print(f"Normalized scores: {result.normalized_scores}")
Normalized scores: {'he': 0.9109655480615343, 'she': 0.08903467752988521}
Advanced Usage
Analyzing Multiple Templates
# Define templates
templates = [
"The [TARGET] is a skilled surgeon.",
"The [TARGET] is a caring nurse.",
"The [TARGET] is a successful CEO."
]
groups = ["man", "woman"]
# Find most biased templates
biased_results = analyzer.get_most_biased_templates(templates, groups, top_n=2)
for result in biased_results:
print(f"\nTemplate: {result.context}")
print(f"Probabilities: {result.normalized_scores}")
Template: The [TARGET] is a caring nurse.
Probabilities: {'man': 0.12807972806943993, 'woman': 0.8719202728411775}
Template: The [TARGET] is a successful CEO.
Probabilities: {'man': 0.6761978322938713, 'woman': 0.3238022926427117}
Using Built-in Templates
# Get occupational bias templates
templates = analyzer.template_manager.get_category_templates("occupational")
# Analyze gender bias
results = analyzer.batch_analyze(templates, ["he", "she"])
for result in results:
print(f"\nContext: {result.context}")
print(f"Normalized scores: {result.normalized_scores}")
Context: The nurse walked into the room, and [TARGET] was taking care of the patient.
Normalized scores: {'he': 0.196103000263434, 'she': 0.8038970697566824}
Context: The CEO gave a speech, and [TARGET] was leading the company to new heights.
Normalized scores: {'he': 0.8563469812912847, 'she': 0.14365303342091737}
Context: The firefighter rescued a family, and [TARGET] was praised for their bravery.
Normalized scores: {'he': 0.9653993639716306, 'she': 0.034600518949252396}
Context: The programmer worked late, and [TARGET] was debugging a complex issue.
Normalized scores: {'he': 0.7425026741714296, 'she': 0.2574967926488626}
License
This project is licensed under the Apache 2.0 License.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
Issues and Support
If you encounter any issues or need support, please file an issue
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file biasana-0.1.4.tar.gz.
File metadata
- Download URL: biasana-0.1.4.tar.gz
- Upload date:
- Size: 18.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.21
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
8c3cbd71ee2e1b6ebafa90c8e59679165f23a9d00a011d276ec67827b31ffca8
|
|
| MD5 |
89b6119256ea59b3cd36cd0343d11105
|
|
| BLAKE2b-256 |
6fbedffd83c1e048c8537595083bbf3115bfe09d045c7949568dab6b32c76098
|
File details
Details for the file biasana-0.1.4-py3-none-any.whl.
File metadata
- Download URL: biasana-0.1.4-py3-none-any.whl
- Upload date:
- Size: 14.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.21
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
75706bcd4f9d972fbad858384ec33577edaa36b3282afdf46d4c873ab09f286e
|
|
| MD5 |
b1fca5d55551e61f2d2407ad11b1153f
|
|
| BLAKE2b-256 |
bcc7faeebe398329c57b6bb0ddd03cbdef3cac3846f20bc35264745aaaa6c976
|