
Detecting higher-dimensional forms of proxy bias

📄 Applied in real-world audit: audit report

☁️ Web app on Algorithm Audit website

Note: This module is still considered experimental, so conclusions drawn from the results should be carefully reviewed by domain experts.

Key takeaways – Why unsupervised bias detection?

  • Quantitative-qualitative joint method: Data-driven bias testing combined with the balanced and context-sensitive judgment of human experts;
  • Normative advice commission: Expert-led, deliberative assessment to establish unfair treatment;
  • Bias scan tool: Scalable method based on machine learning to detect algorithmic bias;
  • Unsupervised bias detection: No user data needed on protected attributes;
  • Detects complex bias: Identifies unfairly treated groups characterized by a mixture of features; detects intersectional bias;
  • Model-agnostic: Works for all binary AI classifiers;
  • Open-source and not-for-profit: Easy to use and available for the entire AI auditing community.

How does this tool fit into our quantitative-qualitative AI auditing framework?

The Joint Fairness Assessment Method (JFAM) developed by NGO Algorithm Audit combines data-driven bias testing with the normative and context-sensitive judgment of human experts, to determine fair AI on a case-by-case basis. The data-driven component comprises this unsupervised clustering tool (available as a free-to-use web app) that discovers complex and hidden forms of bias. It thereby tackles the difficult problem of detecting proxy discrimination that stems from unforeseen and higher-dimensional forms of bias, including intersectional forms of discrimination. The results of the bias scan tool serve as a starting point for a deliberative assessment by human experts to evaluate potential discrimination and unfairness in an AI system.

As an example, we applied our bias detection tool to a BERT-based disinformation classifier and distilled a set of pressing questions about its performance and possible biases. We presented these questions to an independent advice commission composed of four academic experts on fair AI, and two civil society organizations working on disinformation detection. The advice commission believes there is a low risk of (higher-dimensional) proxy discrimination by the reviewed disinformation classifier. The commission judged that the differences in treatment identified by the quantitative bias scan can be justified, if certain conditions apply. The full advice can be read in our algoprudence case repository (ALGO:AA:2023:01).

Our joint approach to AI auditing is supported by 20+ actors from the international AI auditing community, including journalists, civil society organizations, NGOs, corporate data scientists and academics. In sum, it combines the power of rigorous, machine learning-informed bias testing with the balanced judgment of human experts, to determine fair AI in a concrete way.

1The bias scan tool is based on the k-means Hierarchical Bias-Aware Clustering (HBAC) method as described in Bias-Aware Hierarchical Clustering for detecting the discriminated groups of users in recommendation systems, Misztal-Radecka and Indurkhya, Information Processing and Management (2021). [link] Additional research indicates that, in comparison to other clustering algorithms, k-means HBAC works best to detect bias in real-world datasets.

2The uploaded data is instantly deleted from the server after being processed.

3Real-time Rumor Debunking on Twitter, Liu et al., Proceedings of the 24th ACM International on Conference on Information and Knowledge Management (2015).

Bias detection tool manual

A .csv file of max. 1 GB, with columns: features and a performance metric. Note: only the naming of the columns matters, not their order. The dataframe displayed in Table 1 is digestible by the web app.

| feat_1 | feat_2 | ... | feat_n | performance metric |
|--------|--------|-----|--------|--------------------|
| 10     | 1      | ... | 0.1    | 1                  |
| 20     | 2      | ... | 0.2    | 1                  |
| 30     | 3      | ... | 0.3    | 0                  |

Table 1 – Structure of input data in the bias detection tool

Feature values can be numeric or categorical. The numeric performance metric is context-dependent: it can, for instance, represent being 'selected for examination' (yes or no), being 'assigned to a high-risk category' (yes or no), or a false positive (yes or no). Low scores are considered a negative bias, i.e., if being selected for examination is considered harmful, 'selected for examination=Yes' should be codified as 0 and 'selected for examination=No' as 1.
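To make the encoding concrete, the snippet below builds a minimal CSV in the expected shape using only the standard library. The feature columns ("age", "income") are hypothetical; the tool only requires that a performance metric column accompany the feature columns.

```python
import csv
import io

# Hypothetical raw records; "selected_for_examination" is the outcome we
# want to turn into the numeric performance metric.
people = [
    {"age": 35, "income": 55000, "selected_for_examination": "Yes"},
    {"age": 40, "income": 45000, "selected_for_examination": "No"},
    {"age": 20, "income": 30000, "selected_for_examination": "No"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["age", "income", "performance metric"])
writer.writeheader()
for p in people:
    # Being selected is considered harmful here, so codify Yes -> 0, No -> 1,
    # making low scores correspond to negative bias.
    metric = 0 if p["selected_for_examination"] == "Yes" else 1
    writer.writerow({"age": p["age"], "income": p["income"],
                     "performance metric": metric})

print(buf.getvalue())
```

Writing `buf.getvalue()` to a `.csv` file yields input the web app can digest.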

Example – Hierarchical Bias-Aware Clustering

Note: The feature labels used in this example can easily be exchanged for other numeric targets. This flexibility enables adaptation to detect (higher-dimensional) bias in various AI classifiers.

from unsupervised_bias_detection.cluster import BiasAwareHierarchicalKMeans

X = [[35, 55000, 1], # age, income, number of cars
     [40, 45000, 0], 
     [20, 30000, 0]]
y = [1, 0, 0]  # flagged for fraud examination (yes:0, no:1)
hbac = BiasAwareHierarchicalKMeans(n_iter=1, min_cluster_size=1).fit(X, y)
hbac.n_clusters_
>>> 2
hbac.scores_
>>> array([ 0.5, -0.5])
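The cluster scores above can be turned into row-level findings. A minimal sketch, assuming the fitted estimator also exposes a scikit-learn-style `labels_` attribute (one cluster index per sample) alongside `scores_` (this attribute name is an assumption, not confirmed by the example above):

```python
# Stand-ins for hbac.labels_ and hbac.scores_ from the example above.
labels = [0, 1, 1]   # cluster index per input row
scores = [0.5, -0.5]  # bias score per cluster

# The cluster with the lowest (most negative) score is the group the tool
# flags as potentially treated unfairly.
worst_cluster = min(range(len(scores)), key=lambda c: scores[c])
members = [i for i, lab in enumerate(labels) if lab == worst_cluster]
print(worst_cluster, members)
```

The rows in `members` are the starting point for the deliberative assessment by human experts described above.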

Schematic overview

image

Contributing Members

20+ endorsements from various parts of the AI auditing community

Journalism

  • Gabriel Geiger, Investigative Reporter Algorithms and Automated Decision-Making at Lighthouse Reports

Civil society organisations

  • Maldita, an independent journalistic platform focused on the control of disinformation and public discourse through fact-checking and data journalism techniques
  • Demos, Britain's leading cross-party think-tank
  • AI Forensics, a European non-profit that investigates influential and opaque algorithms
  • NLAIC, The Netherlands AI Coalition
  • Progressive Café, public platform of young Dutch intellectuals, represented by Kiza Magendane
  • Dutch AI Ethics Community, represented by Samaa Mohammad
  • Simone Maria Parazzoli, OECD Observatory of Public Sector Innovation (OPSI)

Industry

  • Selma Muhammad, Trustworthy AI consultant at Deloitte
  • Laurens van der Maas, Data Scientist at AWS
  • Xiaoming op de Hoek, Data Scientist at Rabobank
  • Jan Overgoor, Data Scientist at SPAN
  • Dasha Simons, Trustworthy AI consultant at IBM

Academia

  • Anne Meuwese, Professor in Public Law & AI at Leiden University
  • Hinda Haned, Professor in Responsible Data Science at University of Amsterdam
  • Raphaële Xenidis, Associate Professor in EU law at Sciences Po Paris
  • Marlies van Eck, Assistant Professor in Administrative Law & AI at Radboud University
  • Aileen Nielsen, Fellow Law&Tech at ETH Zürich
  • Vahid Niamadpour, PhD-candidate in Linguistics at Leiden University
  • Ola Al Khatib, PhD-candidate in the legal regulation of algorithmic decision-making at Utrecht University

Help and Support

This project is still in its early stages, and the documentation is a work in progress. In the meantime, feel free to open an issue, and we'll do our best to assist you.

Contributing

Your contributions are highly encouraged! There are many opportunities for potential projects, so please reach out if you'd like to get involved. Whether it's code, notebooks, examples, or documentation, every contribution is valuable—so don’t hesitate to jump in. To contribute, simply fork the project, make your changes, and submit a pull request. We’ll work with you to address any issues and get your code merged into the main branch.
