Tournament graphs for Pareto-optimal zero-shot LLM reranking
Project description
BlitzRank
Principled Zero-shot Ranking Agents with Tournament Graphs
BlitzRank is a principled zero-shot reranking framework that uses tournament graphs to extract maximal information from each LLM call, achieving Pareto optimality across 14 benchmarks and 5 LLMs with 25–40% fewer queries.
Algorithm visualization on the 25 horses puzzle: find the 3 fastest horses from 25, racing 5 at a time.
BlitzRank converges in 7 rounds vs Sliding Window's 11 rounds.
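For intuition, here is a minimal sketch of the classic 7-race solution to the puzzle (an illustration of the idea, not BlitzRank's code): each race is scheduled so that its outcome eliminates as many horses as possible, the same maximal-information-per-call principle BlitzRank applies to LLM comparisons.

import random

horses = list(range(25))
speeds = {h: random.random() for h in horses}             # hidden ground-truth speeds
def race(group):                                          # one race = one comparison call
    return sorted(group, key=lambda h: speeds[h], reverse=True)

groups = [horses[i:i + 5] for i in range(0, 25, 5)]
heats = [race(g) for g in groups]                         # races 1-5: one heat per group
final = race([heat[0] for heat in heats])                 # race 6: the five heat winners
a, b, c = final[:3]                                       # a is the fastest overall
heat_a = next(h for h in heats if h[0] == a)              # a's original heat, fastest first
heat_b = next(h for h in heats if h[0] == b)
candidates = [heat_a[1], heat_a[2], b, heat_b[1], c]      # the only horses that can still place 2nd or 3rd
playoff = race(candidates)                                # race 7
top3 = [a, playoff[0], playoff[1]]
assert top3 == sorted(horses, key=lambda h: speeds[h], reverse=True)[:3]

A sliding window over the same 25 horses, by contrast, needs the 11 rounds shown in the visualization.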
Installation
uv pip install blitzrank
From source:
git clone https://github.com/ContextualAI/BlitzRank.git
cd BlitzRank
uv pip install -e .
Install with additional dependencies for baselines (AcuRank, TourRank):
uv pip install "blitzrank[all]"
# or from source:
uv pip install -e ".[all]"
Quick Start
from blitzrank import BlitzRank, rank
ranker = BlitzRank()
query = "capital of France"
docs = [
"Berlin is the capital of Germany.",
"Paris is the capital of France.",
"Tokyo is the capital of Japan.",
]
# Any LiteLLM-compatible model works — just set the appropriate API keys as env variables
indices = rank(ranker, model="openai/gpt-4.1", query=query, docs=docs, topk=2) # [1, 0]
top_docs = [docs[i] for i in indices]
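Model calls are routed through LiteLLM, so set the matching provider key before running; for the openai/ prefix that is OPENAI_API_KEY (shown here as an example, other providers read their own variables):

import os
os.environ["OPENAI_API_KEY"] = "sk-..."  # or export it in your shell before starting Python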
Evaluate on a Benchmark
from blitzrank import BlitzRank, evaluate
ranker = BlitzRank()
rankings, metrics = evaluate(ranker, dataset="msmarco/dl19/bm25", model="openai/gpt-4.1")
print(metrics) # {"ndcg@10": 0.72, "map@10": 0.51}
print(rankings) # [{"query": "...", "ranking": [3, 0, 7, ...]}, ...]
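Because rankings is a plain list of dicts, it can be written straight to disk for later analysis (the file name below is just an example):

import json
with open("dl19_rankings.json", "w") as f:
    json.dump(rankings, f, indent=2)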
Dataset names follow the format collection/split/retriever; for example, beir/scifact/bm25 is the scifact split of the BEIR collection with BM25 first-stage retrieval.
| Category | Datasets |
|---|---|
| MSMARCO | msmarco/dl19/bm25, msmarco/dl20/bm25, msmarco/dl21/bm25, msmarco/dl22/bm25, msmarco/dl23/bm25, msmarco/dlhard/bm25 |
| BEIR | beir/nfcorpus/bm25, beir/fiqa/bm25, beir/trec-covid/bm25, beir/nq/bm25, beir/hotpotqa/bm25, beir/scifact/bm25, beir/arguana/bm25, beir/quora/bm25, beir/scidocs/bm25, beir/fever/bm25, beir/climate-fever/bm25, beir/dbpedia-entity/bm25, beir/robust04/bm25, beir/signal1m/bm25, beir/trec-news/bm25, beir/webis-touche2020/bm25 |
| BRIGHT | bright/aops/infx, bright/biology/infx, bright/earth_science/infx, bright/economics/infx, bright/leetcode/infx, bright/pony/infx, bright/psychology/infx, bright/robotics/infx, bright/stackoverflow/infx, bright/sustainable_living/infx, bright/theoremqa_questions/infx, bright/theoremqa_theorems/infx |
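Any name from the table can be passed directly to evaluate. For instance, a quick pass over one dataset per category might look like this (a sketch; assumes the relevant API keys are set):

from blitzrank import BlitzRank, evaluate

ranker = BlitzRank()
for dataset in ["msmarco/dl19/bm25", "beir/scifact/bm25", "bright/biology/infx"]:
    _, metrics = evaluate(ranker, dataset=dataset, model="openai/gpt-4.1")
    print(f"{dataset:<24} nDCG@10={metrics['ndcg@10']:.3f}")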
Baselines
All methods share the same interface: create a ranker (optionally with method-specific parameters) and pass it, together with a model, to rank or evaluate.
from blitzrank import BlitzRank, SlidingWindow, SetWise, PairWise, TourRank, AcuRank, rank
query = "capital of France"
docs = ["Berlin is in Germany", "Paris is in France", "Tokyo is in Japan"]
for Method in [BlitzRank, SlidingWindow, SetWise, PairWise, TourRank, AcuRank]:
    indices = rank(Method(), model="openai/gpt-4.1", query=query, docs=docs, topk=2)
    print(Method.__name__, indices)  # prints the method name and its top-2 indices
Available methods: BlitzRank, SlidingWindow, SetWise, PairWise, TourRank, AcuRank
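Method-specific options are passed to the constructor. As a sketch, the arguments below are the same ones used in the reproduction script further down (not an exhaustive list of each method's parameters):

from blitzrank import BlitzRank, SlidingWindow, AcuRank, rank

# Reuses query/docs from the snippet above.
for ranker in [BlitzRank(window_size=10), SlidingWindow(num_rounds=2), AcuRank(tol=1e-4)]:
    indices = rank(ranker, model="openai/gpt-4.1", query=query, docs=docs, topk=2)
    print(type(ranker).__name__, indices)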
Reproducing Paper Results
Run all methods across all 14 datasets and 5 LLMs from the paper (Table 3):
from blitzrank import BlitzRank, SlidingWindow, SetWise, PairWise, TourRank, AcuRank, evaluate
# 6 TREC-DL + 8 BEIR = 14 benchmarks
DATASETS = [
# TREC-DL
"msmarco/dl19/bm25", "msmarco/dl20/bm25", "msmarco/dl21/bm25",
"msmarco/dl22/bm25", "msmarco/dl23/bm25", "msmarco/dlhard/bm25",
# BEIR
"beir/trec-covid/bm25", "beir/nfcorpus/bm25", "beir/signal1m/bm25",
"beir/trec-news/bm25", "beir/robust04/bm25", "beir/webis-touche2020/bm25",
"beir/dbpedia-entity/bm25", "beir/scifact/bm25",
]
MODELS = [
"openai/gpt-4.1",
"vertex_ai/gemini-3-flash-preview",
"openrouter/deepseek/deepseek-v3.2",
"openrouter/qwen/qwen3-235b-a22b-2507",
"openrouter/z-ai/glm-4.7",
]
RANKERS = {
"Blitz-k20": BlitzRank(window_size=20),
"Blitz-k10": BlitzRank(window_size=10),
"SW": SlidingWindow(),
"SW-R2": SlidingWindow(num_rounds=2),
"Setwise": SetWise(),
"Pairwise": PairWise(),
"TourRank": TourRank(),
"TourRank-R2": TourRank(num_rounds=2),
"AcuRank": AcuRank(),
"AcuRank-H": AcuRank(tol=1e-4),
}
for dataset in DATASETS:
    for model in MODELS:
        for name, ranker in RANKERS.items():
            rankings, metrics = evaluate(ranker, dataset=dataset, model=model)
            print(f"{name:>12} | {dataset:<28} | {model:<40} | nDCG@10={metrics['ndcg@10']:.3f}")
📖 Custom datasets and methods →
Acknowledgements
This project builds upon the following open-source repositories: RankGPT, LLM-Rankers, AcuRank, Pyserini, and LiteLLM.
Citation
@article{blitzrank2026,
  title={BlitzRank: Principled Zero-shot Ranking Agents with Tournament Graphs},
  author={Agrawal, Sheshansh and Nguyen, Thien Hang and Kiela, Douwe},
  journal={arXiv preprint arXiv:2602.05448},
  year={2026}
}
Download files
File details
Details for the file blitzrank-0.1.0.tar.gz.
File metadata
- Download URL: blitzrank-0.1.0.tar.gz
- Upload date:
- Size: 51.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | dba48986fa9d4938bd16a9fca72ad95b784e7f6071ba4c472c7740d8610a06e5 |
| MD5 | 9f0510e014521c869efcb7953adb71b7 |
| BLAKE2b-256 | bf0728ebd9a89ef9b18db84f4de222aa258dd2c6aa4e9a6ed81159ef5bcf776d |
File details
Details for the file blitzrank-0.1.0-py3-none-any.whl.
File metadata
- Download URL: blitzrank-0.1.0-py3-none-any.whl
- Upload date:
- Size: 62.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 432af1897b084d7001d7f2bd892f0935be121b8f2705659f2ca6f9f4fdc82a1f |
| MD5 | 47dfed59bfd2a5bddd01fcb523e80793 |
| BLAKE2b-256 | 8b234bf0730ea4c254e572a7055097ddf33f65225a4d38cce801c733ae4e09bc |