hyperfair

A Python library for generating, evaluating, and improving rankings under fairness constraints.

These details have not been verified by PyPI

Project links

Repository

Project description

hyperFA*IR

A hypergeometric approach to fair rankings with finite candidate pool

Overview

hyperFA*IR is a framework for researchers and practitioners who care about fairness in ranked outcomes. Using hypergeometric tests and Monte Carlo methods, it lets you assess, visualize, and enforce fairness in any ranking drawn from a finite candidate pool.

Whether you are working with admissions, hiring, recommendations, or any ranked selection process, hyperFA*IR provides the tools you need to ensure equitable representation and transparency.

Features

Statistical Fairness Testing: Perform rigorous statistical tests (single or sequential) to detect under- or over-representation of protected groups at any cutoff in your ranking.
Monte Carlo Simulations: Accurately estimate p-values and confidence intervals for complex, sequential fairness tests using efficient Monte Carlo algorithms.
Fairness-Constrained Re-ranking: Automatically adjust unfair rankings to satisfy fairness constraints, with support for custom significance levels and test directions.
Quota and Weighted Sampling Models: Explore the impact of quotas and group-weighted selection on your rankings.
Comprehensive Visualization: Instantly visualize rankings, group proportions, confidence intervals, and fairness bounds to communicate results clearly.
Performance and Scalability: Designed for large datasets, with optimized algorithms that outperform existing methods in both speed and accuracy.

Installation

Clone the repository and install dependencies:

git clone https://github.com/CSHVienna/hyper_fair.git
cd hyper_fair
pip install -r requirements.txt

Quick Start

Get started with a single function call that computes a fairness p-value for your ranking.

Suppose you have a ranking where 1 indicates a protected candidate and 0 an unprotected candidate. To test whether the protected group is under-represented in the top-$k$ positions, simply run:

from hyperfair import measure_fairness_multiple_points

pvalue, _ = measure_fairness_multiple_points(
    x_seq=[0, 0, 0, 0, 1, 0, 1, 1, 1, 1],  # 1=protected, 0=unprotected
    k=10,  # Test the top 10 positions
    test_side='lower',  # Test for under-representation
    n_exp=100_000  # Number of Monte Carlo simulations
)
print(f"P-value: {pvalue:.3f}")
# Example output (a Monte Carlo estimate, so it varies slightly between runs): P-value: 0.023

The p-value is the probability, under random selection from the same candidate pool, of observing as few protected candidates in the top $k$ as you did, or fewer. A small p-value (e.g., < 0.05) is evidence that the protected group is under-represented beyond what chance alone would explain.
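
To make the meaning of that probability concrete, here is a minimal sketch of the exact hypergeometric tail at a single cutoff, using only the standard library. The helper `hypergeom_lower_tail` is hypothetical and written for illustration; it is not part of hyperfair, and it covers one cutoff only, not the sequential test that `measure_fairness_multiple_points` performs.

```python
from math import comb

def hypergeom_lower_tail(n_total, n_protected, k, observed):
    """P(X <= observed) when k candidates are drawn without replacement
    from n_total candidates, n_protected of whom are protected."""
    denom = comb(n_total, k)
    return sum(
        comb(n_protected, i) * comb(n_total - n_protected, k - i)
        for i in range(observed + 1)
    ) / denom

ranking = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]  # same ranking as in the example above
k = 4                                      # a single cutoff: the top 4 only
observed = sum(ranking[:k])                # protected candidates in the top 4 (here 0)
p = hypergeom_lower_tail(len(ranking), sum(ranking), k, observed)
print(f"{p:.4f}")  # 0.0238
```

With 5 protected candidates in a pool of 10, only 5 of the 210 possible top-4 sets contain no protected candidate, which gives 5/210.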

How to use the library

Loading data from a Pandas DataFrame

To analyze fairness in your rankings, you first need to load your data. The most straightforward way is to load it from a Pandas DataFrame, and the package provides the load_data_from_pandas_df function for this purpose. This function extracts the ranking and the protected attribute from your DataFrame and prepares them for fairness analysis.

Suppose your DataFrame has the following structure:

ID	SES	Score
0	Low SES	4.82
1	High SES	6.87
2	High SES	7.84
3	Low SES	4.17
4	High SES	4.71
...	...	...

Here, SES (socioeconomic status) is the protected attribute, and Score is the ranking criterion.

To load and process this data, use:

from code.data_loader import load_data_from_pandas_df
import pandas as pd

df = pd.read_csv(CSV_PATH)  # CSV_PATH: path to your own CSV file
ranking, ids = load_data_from_pandas_df(
    df,
    protected_attribute='SES',
    binary_dict={'Low SES': 1, 'High SES': 0},
    id_attribute='ID',
    order_by='Score',
    ascending=False  # Highest score is ranked first; set to True if lower scores are better
)

Inputs:

df: A pandas DataFrame containing your ranking data.
protected_attribute: The column name in df that indicates group membership.
binary_dict: A dictionary mapping the values in the protected attribute column to 1 (protected group) and 0 (unprotected group).
id_attribute: The column name in df that uniquely identifies each candidate. If None, the DataFrame index is used.
order_by: The column name in df used to rank candidates.

Outputs:

ranking: A NumPy array of 0s and 1s, ordered by rank (after sorting), where 1 indicates a protected candidate and 0 an unprotected candidate. This is the main input for all fairness analysis functions.
ids: A NumPy array of candidate IDs, ordered in the same way as ranking.

These outputs allow you to analyze the representation of protected and unprotected groups at every position in the ranking, and are required for all subsequent fairness tests and visualizations in the package.
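
The transformation described above can be sketched in plain Python on the five rows of the example table. This is an illustration under stated assumptions, not the library's implementation: `to_ranking` is a hypothetical helper, it works on tuples instead of a DataFrame, and it returns lists where the library returns NumPy arrays.

```python
# Rows from the example table: (ID, SES, Score)
rows = [
    (0, "Low SES", 4.82),
    (1, "High SES", 6.87),
    (2, "High SES", 7.84),
    (3, "Low SES", 4.17),
    (4, "High SES", 4.71),
]
binary_dict = {"Low SES": 1, "High SES": 0}

def to_ranking(rows, binary_dict, ascending=False):
    """Sort rows by score and encode group membership as 0/1."""
    # reverse=True puts the highest score first when ascending is False
    ordered = sorted(rows, key=lambda r: r[2], reverse=not ascending)
    ranking = [binary_dict[ses] for _, ses, _ in ordered]
    ids = [cid for cid, _, _ in ordered]
    return ranking, ids

ranking, ids = to_ranking(rows, binary_dict)
print(ranking)  # [0, 0, 1, 0, 1]
print(ids)      # [2, 1, 0, 4, 3]
```

The two lists stay aligned position by position, so `ids[i]` always names the candidate whose group is `ranking[i]`.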

Sequential tests for fairness

A key feature of hyperFA*IR is the ability to test rankings for fairness at multiple cutoffs. For example, you may want to check if the protected group is under-represented in the top $k$ positions of your ranking, or across all prefixes [1:j] for $j=1,\ldots,k$.
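
For reference, the test at a single prefix can be written out as follows. The notation is introduced here for illustration and is an assumption, not taken from the package: $N$ is the size of the candidate pool, $K$ the number of protected candidates in it, $X_j$ the number of protected candidates among the top $j$ positions under random selection, and $x_j$ the observed count. The lower-tail (under-representation) p-value at prefix $j$ is then the hypergeometric tail

$$
p_j = \Pr(X_j \le x_j) = \sum_{i=0}^{x_j} \frac{\binom{K}{i}\binom{N-K}{j-i}}{\binom{N}{j}}.
$$

Checking all prefixes $j = 1, \ldots, k$ produces $k$ dependent tests on the same ranking, so the individual $p_j$ cannot each be compared with the nominal significance level without a correction for multiple testing.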

To do this, use the measure_fairness_multiple_points function. This function performs sequential statistical tests (using Monte Carlo simulations) to determine whether the observed representation of the protected group is consistent with random selection.

Example usage:

from hyperfair import measure_fairness_multiple_points

pvalue, generatedData = measure_fairness_multiple_points(
    x_seq=ranking,      # binary array: 1 if protected, 0 otherwise, ordered by rank
    k=30,               # number of top positions to test
    test_side='lower',  # test for under-representation
    n_exp=1000000       # number of Monte Carlo simulations
)

x_seq: Binary array indicating protected group membership, sorted by ranking.
k: Number of top positions (prefixes) to test.
test_side: Use 'lower' to test for under-representation, 'upper' for over-representation, or 'two-sided' for both.
n_exp: Number of Monte Carlo simulations for estimating p-values.

The function returns the p-value for the fairness test and a generatedData object that can be reused for further analysis or re-ranking.
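
To illustrate how a Monte Carlo sequential test of this kind can work, here is a minimal sketch of one common construction: take the smallest per-prefix hypergeometric p-value as the test statistic and estimate how often a randomly shuffled ranking produces a value at least as small. This is an assumption about the general technique, not a description of hyperfair's internals. All function names below are hypothetical, only the 'lower' side is covered, and the estimates need not match the output of `measure_fairness_multiple_points`.

```python
import random
from fractions import Fraction
from math import comb

def lower_tail_table(n_total, n_protected, k):
    """table[j][x] = exact P(X_j <= x) for a prefix of length j."""
    table = {}
    for j in range(1, k + 1):
        denom = comb(n_total, j)
        cum, row = Fraction(0), []
        for x in range(min(j, n_protected) + 1):
            cum += Fraction(comb(n_protected, x) * comb(n_total - n_protected, j - x), denom)
            row.append(cum)
        table[j] = row
    return table

def min_prefix_pvalue(seq, k, table):
    """Smallest lower-tail p-value over the prefixes 1..k of seq."""
    count, best = 0, Fraction(1)
    for j in range(1, k + 1):
        count += seq[j - 1]  # protected candidates seen so far
        best = min(best, table[j][count])
    return best

def sequential_pvalue(x_seq, k, n_exp=10_000, seed=0):
    """Monte Carlo estimate of P(min prefix p-value <= observed) under random order."""
    rng = random.Random(seed)
    table = lower_tail_table(len(x_seq), sum(x_seq), k)
    observed = min_prefix_pvalue(x_seq, k, table)
    shuffled = list(x_seq)
    hits = 0
    for _ in range(n_exp):
        rng.shuffle(shuffled)  # random ranking with the same group sizes
        if min_prefix_pvalue(shuffled, k, table) <= observed:
            hits += 1
    return hits / n_exp

p = sequential_pvalue([0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1], k=12)
print(f"Estimated sequential p-value: {p:.3f}")
```

Exact fractions are used for the tail probabilities so that ties between prefixes are compared exactly instead of being decided by floating-point rounding. The precomputed table keeps each simulated ranking to one pass over its first `k` positions.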

For a more detailed guide and practical examples, see example.ipynb.

Project details

Release history

0.1.3 (Jun 20, 2025)
0.1.2 (Jun 20, 2025)
0.1.1 (Jun 20, 2025), this version
