benchbench

Tools for measuring sensitivity and diversity of multi-task benchmarks.

These details have not been verified by PyPI

Project description

BenchBench is a Python package for evaluating multi-task benchmarks. It measures two properties: diversity, and sensitivity to irrelevant variations such as label noise injection and the addition of irrelevant candidate models. Viewing multi-task benchmarks through a social choice lens, the package exposes the fundamental trade-off between diversity and stability in both cardinal and ordinal benchmarks.

For more information, including the motivations behind the measures and our empirical findings, please see our paper.
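
To make the cardinal/ordinal distinction and the "irrelevant candidate model" variation concrete, here is a small standalone sketch. It does not use BenchBench and does not reproduce the package's measures: the scores are made up, and the mean-score and Borda-count aggregations are textbook stand-ins chosen for illustration only.

```python
from statistics import mean

# Hypothetical scores (higher is better): one dict per model, one entry per task.
scores = {
    "A": {"t1": 0.9, "t2": 0.9, "t3": 0.9, "t4": 0.5, "t5": 0.5},
    "B": {"t1": 0.8, "t2": 0.8, "t3": 0.8, "t4": 0.9, "t5": 0.9},
}
# A weak extra model that should not affect how A and B compare.
irrelevant = {"C": {"t1": 0.1, "t2": 0.1, "t3": 0.1, "t4": 0.7, "t5": 0.7}}


def cardinal_ranking(scores):
    """Rank models by their mean score across tasks (uses the numeric values)."""
    return sorted(scores, key=lambda m: mean(scores[m].values()), reverse=True)


def borda_ranking(scores):
    """Rank models by Borda count (uses only the per-task orderings).

    On each task a model earns one point for every model it outscores.
    """
    tasks = next(iter(scores.values())).keys()
    points = {m: 0 for m in scores}
    for task in tasks:
        for m in scores:
            points[m] += sum(
                scores[m][task] > scores[other][task]
                for other in scores
                if other != m
            )
    return sorted(scores, key=points.get, reverse=True)


with_c = {**scores, **irrelevant}

print(cardinal_ranking(scores), cardinal_ranking(with_c))
# ['B', 'A'] ['B', 'A', 'C']
print(borda_ranking(scores), borda_ranking(with_c))
# ['A', 'B'] ['B', 'A', 'C']
```

In this toy data the rank-based aggregate puts A ahead of B until model C is added, after which B moves ahead, even though neither A's nor B's scores changed. The mean-based ranking of A and B is unaffected by C here; that says nothing about how cardinal benchmarks respond to other variations such as label noise.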

Quick Start

To install the package, simply run:

pip install benchbench

Example Usage

To evaluate a cardinal benchmark, you can use the following code:

from benchbench.data import load_cardinal_benchmark
from benchbench.measures.cardinal import get_diversity, get_sensitivity

# Load the bundled GLUE results and the list of task columns.
data, cols = load_cardinal_benchmark('GLUE')
# Compute both measures over those task columns.
diversity = get_diversity(data, cols)
sensitivity = get_sensitivity(data, cols)

To evaluate an ordinal benchmark, you can use the following code:

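from benchbench.data import load_ordinal_benchmark
from benchbench.measures.ordinal import get_diversity, get_sensitivity

# Load the bundled HELM accuracy results and the list of task columns.
data, cols = load_ordinal_benchmark('HELM-accuracy')
# Same function names as the cardinal case, imported from the ordinal module.
diversity = get_diversity(data, cols)
sensitivity = get_sensitivity(data, cols)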

To use your own benchmark, provide a pandas DataFrame and a list of the columns that correspond to tasks. See the documentation for details.
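
As a rough sketch of what such an input might look like, the snippet below builds a small results table by hand. The layout is an assumption, not something this page specifies: one row per model, one numeric column per task, plus a `model` identifier column. All names and numbers are hypothetical, and `evaluate_cardinal` is an illustrative wrapper rather than part of the package; it only forwards to the two functions from the usage example above.

```python
import pandas as pd

# Hypothetical results: one row per model, one numeric column per task.
data = pd.DataFrame(
    {
        "model": ["model-a", "model-b", "model-c"],
        "task_1": [0.91, 0.85, 0.78],
        "task_2": [0.64, 0.70, 0.59],
        "task_3": [0.88, 0.80, 0.83],
    }
)

# The task columns are every column except the model identifier.
cols = [c for c in data.columns if c != "model"]


def evaluate_cardinal(data, cols):
    """Illustrative wrapper: run the cardinal measures on a custom table.

    The import is deferred so the table above can be built and inspected
    even where benchbench is not installed.
    """
    from benchbench.measures.cardinal import get_diversity, get_sensitivity

    return get_diversity(data, cols), get_sensitivity(data, cols)
```

With the package installed, `evaluate_cardinal(data, cols)` would pass the table through the same two calls shown earlier. Whether an identifier column is expected, and what the functions return, is best confirmed in the documentation.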

Reproduce the Paper

To reproduce our results, open cardinal.ipynb, ordinal.ipynb, and banner.ipynb in Google Colab; each notebook runs with one click.

Project details

Release history

1.0.1, Oct 12, 2025

1.0.0, Apr 29, 2024 (this version)

Download files

Download the file for your platform.

Source Distribution

benchbench-1.0.0.tar.gz (209.2 kB)

Uploaded Apr 29, 2024 Source

Built Distribution

benchbench-1.0.0-py3-none-any.whl (243.2 kB)

Uploaded Apr 29, 2024 Python 3

File details

Details for the file benchbench-1.0.0.tar.gz.

File metadata

Download URL: benchbench-1.0.0.tar.gz
Upload date: Apr 29, 2024
Size: 209.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.0.0 CPython/3.9.19

File hashes

Hashes for benchbench-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`f7c3a7ed05c87b928676230bb00142d0e5081fe653205a6f6c79145aa2d7be1a`
MD5	`84c6a203ea2935a04d2dcbaa947d9481`
BLAKE2b-256	`339e5343fc7affadb088d843229f83506cfe272df9ab3e1936591dd746ef0425`

File details

Details for the file benchbench-1.0.0-py3-none-any.whl.

File metadata

Download URL: benchbench-1.0.0-py3-none-any.whl
Upload date: Apr 29, 2024
Size: 243.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.0.0 CPython/3.9.19

File hashes

Hashes for benchbench-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`dcc5a97c6bda191c50134441b986839546636d3959eff3380a46c31d2e062405`
MD5	`ab6f9bc76ec4a6f221bd682474eed43e`
BLAKE2b-256	`5139033c843e3f9e6aec8ac4a0102a154ccdfa897a80a85a3f6dababba355b66`
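
The digests listed above can be checked against a downloaded file before installing it. The following is a minimal sketch using only Python's standard `hashlib` module; the helper names `sha256_of` and `matches_published_hash` are made up for this example, and the file is assumed to have been downloaded under its original name.

```python
import hashlib

# SHA256 digests copied from the tables above.
PUBLISHED_SHA256 = {
    "benchbench-1.0.0.tar.gz": "f7c3a7ed05c87b928676230bb00142d0e5081fe653205a6f6c79145aa2d7be1a",
    "benchbench-1.0.0-py3-none-any.whl": "dcc5a97c6bda191c50134441b986839546636d3959eff3380a46c31d2e062405",
}


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_published_hash("benchbench-1.0.0.tar.gz", PUBLISHED_SHA256["benchbench-1.0.0.tar.gz"])` should return `True` for an intact copy of the source distribution.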
