unsupervised-bias-detection

These details have not been verified by PyPI

Project description

Detecting higher-dimensional forms of proxy bias

📄 Applied in real-world audit: audit report

Note: This module is still considered experimental, so conclusions drawn from the results should be carefully reviewed by domain experts.

Key takeaways – Why unsupervised bias detection?

Quantitative-qualitative joint method: Data-driven bias testing combined with the balanced and context-sensitive judgment of human experts;
Normative advice commission: Expert-led, deliberative assessment to establish unfair treatment;
Bias scan tool: Scalable method based on machine learning to detect algorithmic bias;
Unsupervised bias detection: No user data needed on protected attributes;
Detects complex bias: Identifies unfairly treated groups characterized by mixture of features, detects intersectional bias;
Model-agnostic: Works for all binary AI classifiers;
Open-source and not-for-profit: Easy to use and available for the entire AI auditing community.



How does this tool fit in our quantitative-qualitative AI auditing framework?

The Joint Fairness Assessment Method (JFAM), developed by the NGO Algorithm Audit, combines data-driven bias testing with the normative and context-sensitive judgment of human experts to determine fair AI on a case-by-case basis. The data-driven component comprises this unsupervised clustering tool (available as a free-to-use web app) that discovers complex and hidden forms of bias. It thereby tackles the difficult problem of detecting proxy discrimination that stems from unforeseen and higher-dimensional forms of bias, including intersectional forms of discrimination. The results of the bias scan tool serve as a starting point for a deliberative assessment by human experts to evaluate potential discrimination and unfairness in an AI system.

As an example, we applied our bias detection tool to a BERT-based disinformation classifier and distilled a set of pressing questions about its performance and possible biases. We presented these questions to an independent advice commission composed of four academic experts on fair AI, and two civil society organizations working on disinformation detection. The advice commission believes there is a low risk of (higher-dimensional) proxy discrimination by the reviewed disinformation classifier. The commission judged that the differences in treatment identified by the quantitative bias scan can be justified, if certain conditions apply. The full advice can be read in our algoprudence case repository (ALGO:AA:2023:01).

Our joint approach to AI auditing is supported by 20+ actors from the international AI auditing community, including journalists, civil society organizations, NGOs, corporate data scientists and academics. In sum, it combines the power of rigorous, machine learning-informed bias testing with the balanced judgment of human experts, to determine fair AI in a concrete way.

¹ The bias scan tool is based on the k-means Hierarchical Bias-Aware Clustering (HBAC) method as described in Bias-Aware Hierarchical Clustering for detecting the discriminated groups of users in recommendation systems, Misztal-Radecka, Indurkhya, Information Processing and Management (2021). Additional research indicates that k-means HBAC, in comparison to other clustering algorithms, works best to detect bias in real-world datasets.

² The uploaded data is instantly deleted from the server after being processed.

³ Real-time Rumor Debunking on Twitter, Liu et al., Proceedings of the 24th ACM International on Conference on Information and Knowledge Management (2015).

Bias detection tool manual

The web app accepts a .csv file of at most 1 GB that contains the feature columns and a performance metric column. Note: only the naming of the columns matters, not their order. The dataframe displayed in Table 1 is digestible by the web app.

feat_1	feat_2	...	feat_n	performance metric
10	1	...	0.1	1
20	2	...	0.2	1
30	3	...	0.3	0

Table 1 – Structure of input data in the bias detection tool

Feature values can be numeric or categorical. The numeric performance metric is context-dependent. The variable can, for instance, represent being 'selected for examination' (yes or no), 'assigned to a high-risk category' (yes or no) or 'false positive' (yes or no). Low scores are considered to be a negative bias: if being selected for examination is considered harmful, 'selected for examination=Yes' should be coded as 0 and 'selected for examination=No' should be coded as 1.
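The encoding rule above can be sketched as follows. This is a minimal illustration using only the Python standard library, not part of the tool itself: the helper `encode_metric`, the example records and the feature column names are hypothetical, and the column name `performance metric` is taken from Table 1.

```python
import csv
import io


def encode_metric(harmful_outcome: bool) -> int:
    """Return 0 for the harmful outcome and 1 otherwise (low score = negative bias)."""
    return 0 if harmful_outcome else 1


# Hypothetical records: (age, income, selected for examination)
records = [
    (35, 55000, False),
    (40, 45000, True),
    (20, 30000, True),
]

# Write the rows to an in-memory CSV; replace the buffer with a file to upload it.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["age", "income", "performance metric"])
for age, income, selected in records:
    # Being selected for examination is treated as harmful here, so it maps to 0.
    writer.writerow([age, income, encode_metric(selected)])

csv_text = buffer.getvalue()
print(csv_text)
```

If the outcome of interest is beneficial rather than harmful, the mapping would be reversed so that the disadvantaged outcome still receives the low score.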

Example – Hierarchical Bias-Aware Clustering

Note: The feature labels used in this example can easily be changed for numeric targets. This flexibility enables adaptation to detect (higher-dimensional) bias in various AI classifiers.

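# Assumption: the class is exposed by the package's clustering submodule.
# The package name uses underscores; a hyphenated name is not a valid Python import.
from unsupervised_bias_detection.clustering import BiasAwareHierarchicalKMeans

X = [[35, 55000, 1],  # age, income, number of cars
     [40, 45000, 0],
     [20, 30000, 0]]
y = [1, 0, 0]  # flagged for fraud examination (yes: 0, no: 1)

# Fit the clustering on the features X and the performance metric y.
hbac = BiasAwareHierarchicalKMeans(n_iter=1, min_cluster_size=1).fit(X, y)
hbac.n_clusters_
>>> 2
hbac.scores_
>>> array([ 0.5, -0.5])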
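One possible reading of the two scores above is sketched below. It assumes that a cluster's score is the mean performance metric inside the cluster minus the mean outside it, and that the first two rows of the example end up in one cluster and the third row in the other. Both are assumptions made for illustration: this page does not define the score or show the cluster assignment, and the helper `cluster_bias_scores` is hypothetical rather than part of the package.

```python
from statistics import mean


def cluster_bias_scores(y, labels):
    """Mean metric inside each cluster minus the mean metric outside it.

    Requires at least two clusters, otherwise the 'outside' group is empty.
    """
    scores = {}
    for cluster in sorted(set(labels)):
        inside = [value for value, label in zip(y, labels) if label == cluster]
        outside = [value for value, label in zip(y, labels) if label != cluster]
        scores[cluster] = mean(inside) - mean(outside)
    return scores


y = [1, 0, 0]        # performance metric from the example above
labels = [0, 0, 1]   # hypothetical cluster assignment, not taken from the library

scores = cluster_bias_scores(y, labels)
print(scores)  # {0: 0.5, 1: -0.5}
```

Under these assumptions the cluster with the negative score is the one whose members receive the less favourable outcome on average, which makes it the natural candidate for review by domain experts.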

Schematic overview

Contributing Members

20+ endorsements from various parts of the AI auditing community

Journalism

Gabriel Geiger, Investigative Reporter Algorithms and Automated Decision-Making at Lighthouse Reports

Civil society organisations

Maldita, an independent journalistic platform focused on the control of disinformation and public discourse through fact-checking and data journalism techniques
Demos, Britain's leading cross-party think-tank
AI Forensics, a European non-profit that investigates influential and opaque algorithms
NLAIC, The Netherlands AI Coalition
Progressive Café, public platform of young Dutch intellectuals, represented by Kiza Magendane
Dutch AI Ethics Community, represented by Samaa Mohammad
Simone Maria Parazzoli, OECD Observatory of Public Sector Innovation (OPSI)

Industry

Selma Muhammad, Trustworthy AI consultant at Deloitte
Laurens van der Maas, Data Scientist at AWS
Xiaoming op de Hoek, Data Scientist at Rabobank
Jan Overgoor, Data Scientist at SPAN
Dasha Simons, Trustworthy AI consultant at IBM

Academia

Anne Meuwese, Professor in Public Law & AI at Leiden University
Hinda Haned, Professor in Responsible Data Science at University of Amsterdam
Raphaële Xenidis, Associate Professor in EU law at Sciences Po Paris
Marlies van Eck, Assistant Professor in Administrative Law & AI at Radboud University
Aileen Nielsen, Fellow Law&Tech at ETH Zürich
Vahid Niamadpour, PhD-candidate in Linguistics at Leiden University
Ola Al Khatib, PhD-candidate in the legal regulation of algorithmic decision-making at Utrecht University

Help and Support

This project is still in its early stages, and the documentation is a work in progress. In the meantime, feel free to open an issue, and we'll do our best to assist you.

Contributing

Your contributions are highly encouraged! There are many opportunities for potential projects, so please reach out if you'd like to get involved. Whether it's code, notebooks, examples, or documentation, every contribution is valuable—so don’t hesitate to jump in. To contribute, simply fork the project, make your changes, and submit a pull request. We’ll work with you to address any issues and get your code merged into the main branch.

Release history

0.2.1 (Sep 16, 2024)
0.2.0 (Aug 13, 2024), this version
0.1.0 (Apr 22, 2024)
