BTZSC: A Benchmark for Zero-Shot Text Classification across Cross-Encoders, Embedding Models, Rerankers, and LLMs

BTZSC

A unified benchmark for zero-shot text classification across embedding models, cross-encoders, rerankers, and LLMs.


Overview

BTZSC is a benchmark package for evaluating zero-shot text classification models under a unified interface. It helps you compare very different model families using the same datasets, task groupings, and metrics.

It is also the evaluation harness behind the BTZSC Hugging Face leaderboard: you can run the benchmark locally, export a leaderboard-ready JSON artifact, and submit new entries to keep the public results up to date.

The package includes:

  • Dataset loaders for BTZSC benchmark tasks.
  • A shared benchmark runner across model adapters.
  • Built-in adapters for embedding, NLI, reranker, and LLM-style models.
  • Baseline comparison utilities and a CLI for reproducible evaluation.

For Users

Installation

Install with pip:

pip install btzsc

Install with uv in an existing project:

uv add btzsc

Run as a standalone CLI tool with uvx (no project install needed):

uvx btzsc list-datasets

Quick Start (Python API)

Use this as a recommended first workflow:

  1. Start with one or two task groups to validate your setup.
  2. Inspect summary and per-dataset outputs.
  3. Compare against bundled baselines.
  4. Export a leaderboard-ready JSON artifact.

API notes:

  • BTZSCBenchmark(tasks=...) accepts either task groups ("sentiment", "topic", "intent", "emotion") or explicit dataset names. Leave empty to run all datasets.
  • evaluate(model=..., model_type=...) returns a BTZSCResults object.
  • model_type is required when model is a string model ID (if you pass a BaseModel instance, you can omit it). Choose from embedding, nli, reranker, llm.
  • Use max_samples for quick smoke tests; increase batch_size for throughput if your hardware allows it.

from btzsc import BTZSCBenchmark

benchmark = BTZSCBenchmark(tasks=["sentiment", "topic"])
results = benchmark.evaluate(
	model="intfloat/e5-base-v2",
	model_type="embedding",
	batch_size=64,
)

print(results.summary())
print(results.per_dataset())

# Compare against bundled baselines
print(results.compare_baselines(metric="f1"))

# Export leaderboard-ready JSON
results.to_json("results/embedding/e5-base-v2.json")

Quick Start (CLI)

Equivalent end-to-end CLI flow:

Note: when --model is a model ID string, you must also provide --type.

# 1) Explore benchmark metadata
btzsc list-datasets
btzsc list-model-types

# 2) Run an initial benchmark
btzsc evaluate --model intfloat/e5-base-v2 --type embedding --tasks sentiment,topic

# 3) Compare with packaged baselines
btzsc baselines --metric f1 --top 10

# 4) Export JSON for leaderboard submission
btzsc evaluate \
	--model intfloat/e5-base-v2 \
	--type embedding \
	--output-json results/embedding/e5-base-v2.json

# 5) Validate the JSON locally
btzsc validate-result results/embedding/e5-base-v2.json

Tip: run a small pilot first, then repeat with your full task scope for final reporting.

Supported Model Types

BTZSC currently supports these adapter families:

  • embedding
  • nli
  • reranker
  • llm

Pass the model type explicitly (model_type in Python or --type in CLI).

Extending with Custom Models

To make a custom model compatible with BTZSC, implement an adapter that subclasses BaseModel.

Contract requirements:

  • predict_scores(texts, labels, batch_size) must return a score matrix with shape (len(texts), len(labels)) where higher means more likely.
  • predict(texts, labels, batch_size) must return predicted label indices with shape (len(texts),).
  • Set model_type on your class. Use embedding, nli, reranker, or llm when applicable.

import numpy as np

from btzsc.models.base import BaseModel


class MyCustomAdapter(BaseModel):
	model_type = "embedding"

	def __init__(self, model_name: str = "my-org/my-model"):
		self.model_name = model_name

	def predict_scores(
		self,
		texts: list[str],
		labels: list[str],
		batch_size: int = 32,
	) -> np.ndarray:
		# Replace this with your real scoring implementation.
		return np.zeros((len(texts), len(labels)), dtype=float)

	def predict(
		self,
		texts: list[str],
		labels: list[str],
		batch_size: int = 32,
	) -> np.ndarray:
		scores = self.predict_scores(texts, labels, batch_size=batch_size)
		return scores.argmax(axis=1)

Run it in the benchmark:

from btzsc import BTZSCBenchmark

benchmark = BTZSCBenchmark(tasks=["sentiment", "topic"])
custom_model = MyCustomAdapter("my-org/my-model")

results = benchmark.evaluate(
	model=custom_model,
	batch_size=32,
	max_samples=200,
)

print(results.summary())
results.to_json("results/custom/my-model.json")

When you pass a BaseModel instance to evaluate(), you do not need model_type=... in the call.

Submitting to the Leaderboard

After exporting your JSON (results.to_json(...) or --output-json), first validate it:

btzsc validate-result results/<model_type>/<model-name>.json

Then publish it to the results dataset repo:

https://huggingface.co/datasets/btzsc/btzsc-results

Required destination path format:

results/<model_type>/<model-name>.json

Example:

results/embedding/e5-base-v2.json
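Before uploading, you can sanity-check that a local file path matches this layout. The regex below is purely illustrative and not part of the package (the official check is btzsc validate-result); it only encodes the results/<model_type>/<model-name>.json convention and the four adapter families listed above.

```python
import re

# results/<model_type>/<model-name>.json, with the four supported adapter families
PATTERN = re.compile(r"^results/(embedding|nli|reranker|llm)/[^/]+\.json$")

print(bool(PATTERN.match("results/embedding/e5-base-v2.json")))  # True
print(bool(PATTERN.match("results/e5-base-v2.json")))            # False: missing model_type folder
```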

You can submit using any of these workflows:

  1. Web UI (no clone required)

  2. Git workflow (clone/fork + push)

    • Clone (or fork) btzsc/btzsc-results, add your JSON at the required path, then push.
    • If you pushed to a fork, open a PR to btzsc/btzsc-results.
git lfs install
git clone https://huggingface.co/datasets/btzsc/btzsc-results
cd btzsc-results

# Copy your exported JSON into the correct folder
mkdir -p results/reranker
cp /path/to/my_result.json results/reranker/my-model.json

git add results/reranker/my-model.json
git commit -m "Add BTZSC results for my-model"
git push

  3. API workflow (huggingface_hub, PR-based)

    • Authenticate first (huggingface-cli login or HF_TOKEN).
    • create_pr=True creates a PR branch instead of pushing directly to main.

from huggingface_hub import HfApi

api = HfApi()
api.upload_file(
    path_or_fileobj="results/reranker/my-model.json",
    path_in_repo="results/reranker/my-model.json",
    repo_id="btzsc/btzsc-results",
    repo_type="dataset",
    commit_message="Add BTZSC results for my-model",
    create_pr=True,
)

The leaderboard Space reads from this results dataset and updates as new valid entries are added.

For full submission requirements, see hf/results_repo/SUBMISSION.md.

Benchmark Protocol

BTZSC follows a strict zero-shot protocol:

  • 22 English single-label datasets
  • 4 task families: sentiment, topic, intent, emotion
  • No BTZSC-label training or tuning on evaluation datasets
  • Primary leaderboard metric: macro-F1
  • Secondary metrics: accuracy, macro-precision, macro-recall
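As a reminder of what the primary metric computes, macro-F1 is the unweighted mean of per-class F1 scores, so minority classes count as much as majority ones. The sketch below is a minimal pure-Python illustration of that definition, not the package's own implementation (which lives in src/btzsc/metrics.py).

```python
def macro_f1(y_true: list[int], y_pred: list[int]) -> float:
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

print(macro_f1([0, 0, 1, 1], [0, 1, 1, 1]))  # ~0.7333: mean of per-class F1s 2/3 and 4/5
```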

The leaderboard is continuously updated as new submissions are added.

Dataset

BTZSC benchmark data is available on Hugging Face:

https://huggingface.co/datasets/btzsc/btzsc

To load the raw paired-format rows with datasets:

from datasets import get_dataset_config_names, load_dataset

repo_id = "btzsc/btzsc"

# Each dataset is a config name (e.g. "agnews", "imdb", ...)
print(get_dataset_config_names(repo_id)[:5])

# Load one dataset's test split
ds = load_dataset(repo_id, "agnews", split="test")
print(ds.column_names)
print(ds[0])

The dataset stores rows in a paired format, (text, hypothesis, labels), where labels is a binary entailment label. The package reconstructs grouped multiclass samples internally for evaluation.
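The grouping step can be illustrated in plain Python: rows that share the same text form one multiclass sample, and the entailed hypothesis marks the gold label. This is a simplified sketch over toy rows, not the package's loader (see src/btzsc/data.py for the real logic).

```python
from collections import defaultdict

# Toy rows in the paired (text, hypothesis, labels) format described above;
# labels == 1 marks the entailed (gold) hypothesis for that text.
rows = [
    {"text": "great movie", "hypothesis": "This text is positive.", "labels": 1},
    {"text": "great movie", "hypothesis": "This text is negative.", "labels": 0},
    {"text": "awful plot", "hypothesis": "This text is positive.", "labels": 0},
    {"text": "awful plot", "hypothesis": "This text is negative.", "labels": 1},
]

# Group paired rows by their shared text.
grouped = defaultdict(list)
for row in rows:
    grouped[row["text"]].append(row)

# Rebuild one multiclass sample per text: candidate labels + gold index.
samples = []
for text, group in grouped.items():
    labels = [r["hypothesis"] for r in group]
    gold = next(i for i, r in enumerate(group) if r["labels"] == 1)
    samples.append({"text": text, "labels": labels, "gold": gold})

print(samples[0]["gold"])  # 0: the entailed hypothesis for "great movie"
```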

Citing

@inproceedings{aarab2026btzsc,
	title     = {BTZSC: A Benchmark for Zero-Shot Text Classification Across Cross-Encoders, Embedding Models, and Rerankers},
	author    = {Aarab, Ilias},
	booktitle = {International Conference on Learning Representations (ICLR) 2026},
	year      = {2026},
	note      = {OpenReview PDF: https://openreview.net/pdf?id=IxMryAz2p3},
	url       = {https://openreview.net/forum?id=IxMryAz2p3}
}

License

Released under the MIT license.


For Developers

Developer Setup

git clone https://github.com/IliasAarab/btzsc.git
cd btzsc
uv sync --dev

Project Structure

High-level layout:

  • src/btzsc/benchmark.py: benchmark orchestration and result objects.
  • src/btzsc/data.py: dataset loading and task grouping.
  • src/btzsc/metrics.py: metric computation and summaries.
  • src/btzsc/baselines.py: baseline loading and comparison table creation.
  • src/btzsc/models/: model adapters (embedding, nli, reranker, llm).
  • src/btzsc/cli.py: command-line interface.

Quality Checks

Run formatting, linting, and typing checks before opening a PR:

uv run ruff format
uv run ruff check
uv run pyright

Packaging and Release

Build locally:

uv build

Release process:

  1. Bump version in pyproject.toml.
  2. Commit and push to main.
  3. Create and push a version tag, for example:

git tag v0.1.1
git push origin v0.1.1

GitHub Actions builds and publishes tagged releases to PyPI via trusted publishing.
