PyTerrier Integration for Generative Rankers
🤖 PyTerrier Generative
Generative listwise ranking with PyTerrier. PyTerrier Generative supports generative rankers and listwise ranking algorithms.
📘 Overview
PyTerrier Generative provides:
- Pre-configured rankers: RankZephyr, RankVicuna, RankGPT, LiT5.
- Flexible algorithms: Sliding window, single window, top-down partitioning, setwise.
- Efficient batching: Automatic batching of ranking windows.
- Customizable prompts: Jinja2 templates or Python callables.
- Multiple backends: vLLM, HuggingFace Transformers, OpenAI.
🚀 Getting Started
Install from PyPI
pip install pyterrier-generative
Install from source
git clone https://github.com/Parry-Parry/pyterrier-generative.git
cd pyterrier-generative
pip install -e .
Quick Example
import pyterrier as pt
from pyterrier_generative import RankZephyr
pt.init()
# Create ranker
ranker = RankZephyr.v1(window_size=20)
# Use in pipeline
pipeline = pt.BatchRetrieve(index) % 100 >> ranker
results = pipeline.search("machine learning")
🎯 Pre-configured Rankers
RankZephyr
from pyterrier_generative import RankZephyr, Algorithm
# Use default variant
ranker = RankZephyr(window_size=20)
# Or specify algorithm and parameters
ranker = RankZephyr(
    algorithm=Algorithm.SLIDING_WINDOW,
    window_size=20,
    stride=10
)
Variants: v1 → castorini/rank_zephyr_7b_v1_full
Backend: vLLM (default), HuggingFace
RankVicuna
from pyterrier_generative import RankVicuna
ranker = RankVicuna(window_size=20)
Variants: v1 → castorini/rank_vicuna_7b_v1
Backend: vLLM (default), HuggingFace
RankGPT
from pyterrier_generative import RankGPT
# Use GPT-3.5 (default)
ranker = RankGPT.gpt35(api_key="sk-...")
# Or GPT-4
ranker = RankGPT.gpt4(api_key="sk-...")
Variants: gpt35, gpt35_16k, gpt4, gpt4_turbo
Backend: OpenAI
LiT5
from pyterrier_generative import LiT5
ranker = LiT5(
    model_path='castorini/LiT5-Distill-large',
    window_size=20
)
Architecture: Fusion-in-Decoder (FiD)
Backend: PyTerrier-T5
⚙️ Custom Rankers
Build your own ranker with custom prompts and backends; more details on backends are available in PyTerrier RAG:
from pyterrier_generative import GenerativeRanker, Algorithm
from pyterrier_rag.backend.vllm import VLLMBackend
# Create custom backend
backend = VLLMBackend(
    model_id="meta-llama/Llama-3-8B-Instruct",
    max_new_tokens=100
)
# Custom Jinja2 prompt
prompt = """
Rank these passages for: {{ query }}
{% for p in passages %}
[{{ loop.index }}] {{ p }}
{% endfor %}
Ranking:
"""
# Create ranker
ranker = GenerativeRanker(
    model=backend,
    prompt=prompt,
    algorithm=Algorithm.SLIDING_WINDOW,
    window_size=20,
    stride=10
)
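Prompts can also be Python callables (as noted in the overview). A minimal sketch of a callable that mirrors the Jinja2 template above; the exact signature GenerativeRanker expects from a callable is an assumption here, so check the package documentation:

```python
def make_prompt(query, passages):
    """Build a listwise ranking prompt from a query and candidate passages."""
    lines = [f"Rank these passages for: {query}"]
    for i, passage in enumerate(passages, start=1):
        lines.append(f"[{i}] {passage}")
    lines.append("Ranking:")
    return "\n".join(lines)

# Produces the same layout as the Jinja2 template above.
print(make_prompt("neural ranking", ["doc one", "doc two"]))
```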
🔄 Ranking Algorithms
Sliding Window
Processes documents in overlapping windows, refining rankings iteratively.
ranker = RankZephyr(
    algorithm=Algorithm.SLIDING_WINDOW,
    window_size=20,  # Documents per window
    stride=10        # Window overlap
)
Best for: Exhaustive search
Complexity: O(n/stride) windows
Top-Down Partitioning
Recursively partitions documents around pivot elements.
ranker = RankZephyr(
    algorithm=Algorithm.TDPART,
    window_size=20,
    buffer=20,
    cutoff=10,
    max_iters=100
)
Best for: Efficient top-k search
Complexity: O(log n) windows (best case)
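To illustrate the idea of pivot-based partitioning, here is a single-level sketch under simplifying assumptions (the library's implementation recurses with the `buffer` and `max_iters` parameters above, and its exact window handling will differ):

```python
def tdpart_topk(docs, rank_window, k=10, window_size=20):
    """Single-level top-down partitioning sketch.

    `rank_window` stands in for an LLM call that returns its input in ranked
    order. The k-th item of the first ranked window becomes the pivot; later
    windows include the pivot, and only documents placed above the pivot
    survive to a final refinement pass.
    """
    ranked = rank_window(list(docs[:window_size]))
    pivot = ranked[k - 1]      # k-th best so far acts as the pivot
    candidates = ranked[:k]    # current top-k, pivot included
    rest = list(docs[window_size:])
    step = window_size - 1     # leave one slot per window for the pivot
    # Partition the remaining documents against the pivot, window by window.
    for i in range(0, len(rest), step):
        ordered = rank_window(rest[i:i + step] + [pivot])
        candidates.extend(ordered[:ordered.index(pivot)])
    # Final refinement over everything that beat the pivot.
    return rank_window(candidates)[:k]

# Toy usage: "ranking" is plain descending sort on integer scores.
top = tdpart_topk(list(range(50)), lambda w: sorted(w, reverse=True), k=10)
print(top[:3])  # [49, 48, 47]
```

Only documents that beat the pivot are ever re-ranked, which is why deep candidate lists need far fewer model calls than a full sliding-window pass.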
Single Window
Ranks top-k documents in one pass.
ranker = RankZephyr(
    algorithm=Algorithm.SINGLE_WINDOW,
    window_size=20  # Top-k to rank
)
Best for: Small candidate sets, speed-critical applications
Complexity: O(1) window
Setwise
Compares small sets of documents, organised via heapsort, to extract the top-k.
ranker = RankZephyr(
    algorithm=Algorithm.SETWISE,
    k=10  # Top-k to extract
)
Best for: High-precision top-k ranking
Complexity: O(n log k) comparisons
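The intuition behind set-based top-k extraction can be sketched with a tournament in place of the library's heapsort (so this is an illustration of the comparison pattern, not the package's actual implementation):

```python
def setwise_topk(docs, pick_best, k=10, set_size=3):
    """Tournament-style top-k selection driven by set comparisons.

    `pick_best` stands in for an LLM call: given a small set of documents,
    it returns the index of the most relevant one. Each round reduces the
    pool to a single winner, which is removed; k rounds yield the top-k.
    """
    pool = list(docs)
    top = []
    while pool and len(top) < k:
        candidates = list(range(len(pool)))
        while len(candidates) > 1:
            winners = []
            for i in range(0, len(candidates), set_size):
                group = candidates[i:i + set_size]
                winners.append(group[pick_best([pool[j] for j in group])])
            candidates = winners
        top.append(pool.pop(candidates[0]))
    return top

# Toy usage: "the model" just picks the largest integer in each set.
best = lambda group: max(range(len(group)), key=lambda i: group[i])
print(setwise_topk([3, 1, 4, 1, 5, 9, 2, 6], best, k=3))  # [9, 6, 5]
```

Each comparison only ever shows the model `set_size` documents at once, which keeps prompts short even for large candidate pools.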
Backend Selection
vLLM (fastest for local models):
ranker = RankZephyr(backend='vllm') # Default
HuggingFace (maximum compatibility):
ranker = RankZephyr(backend='hf')
OpenAI (no local GPU needed):
ranker = RankGPT.gpt35(api_key="...")
🔌 PyTerrier Integration
Basic Re-ranking
import pyterrier as pt
from pyterrier_generative import RankZephyr
bm25 = pt.BatchRetrieve(index)
ranker = RankZephyr(window_size=20)
pipeline = bm25 % 100 >> ranker
results = pipeline.search("information retrieval")
Multi-stage Pipeline
from pyterrier_generative import RankGPT
# Three-stage ranking: BM25 → Dense → Generative
pipeline = (
    bm25 % 1000
    >> dense_ranker % 100
    >> RankGPT.gpt35(api_key="...")
)
Comparative Evaluation
from pyterrier_generative import RankZephyr, RankVicuna
rankers = {
    "BM25": bm25,
    "BM25 >> RankZephyr": bm25 % 100 >> RankZephyr(),
    "BM25 >> RankVicuna": bm25 % 100 >> RankVicuna(),
}
pt.Experiment(
    list(rankers.values()),
    topics,
    qrels,
    eval_metrics=["map", "ndcg_cut_10"],
    names=list(rankers.keys())
)
🎨 Advanced Features
System Prompts (for chat models)
from pyterrier_generative import GenerativeRanker
ranker = GenerativeRanker(
    model=backend,
    system_prompt="You are an expert search engine. Rank documents by relevance.",
    prompt="Query: {{ query }}\n...",
    algorithm=Algorithm.SLIDING_WINDOW
)
Custom Generation Parameters
from pyterrier_rag.backend.vllm import VLLMBackend
backend = VLLMBackend(
    model_id="castorini/rank_zephyr_7b_v1_full",
    max_new_tokens=100,
    generation_args={
        'temperature': 0.0,
        'top_p': 1.0,
        'max_tokens': 100
    }
)
📊 How It Works
Listwise Ranking
Traditional pointwise/pairwise rankers score documents independently or in pairs. Listwise ranking considers all documents together:
Input: Query + [Doc1, Doc2, ..., DocN]
Model: "Rank these documents: 3, 1, 5, 2, 4"
Output: Reordered documents by LLM preference
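Turning the model's textual permutation back into a document ordering is the core post-processing step. A hedged sketch (the package's own parser will be more elaborate, e.g. handling the `[3] > [1] > ...` format some rankers emit):

```python
import re

def apply_permutation(docs, model_output):
    """Reorder docs according to a model's '3, 1, 5, 2, 4'-style output.

    Indices are 1-based; duplicate or out-of-range indices are dropped, and
    unmentioned documents keep their original relative order at the tail,
    so malformed output degrades gracefully.
    """
    seen = []
    for tok in re.findall(r"\d+", model_output):
        idx = int(tok) - 1
        if 0 <= idx < len(docs) and idx not in seen:
            seen.append(idx)
    remainder = [i for i in range(len(docs)) if i not in seen]
    return [docs[i] for i in seen + remainder]

docs = ["D1", "D2", "D3", "D4", "D5"]
print(apply_permutation(docs, "3, 1, 5, 2, 4"))  # ['D3', 'D1', 'D5', 'D2', 'D4']
```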
Sliding Window Algorithm
For large document sets, sliding windows enable manageable ranking:
Documents: [D1, D2, D3, ..., D100]
Window 1: [D1...D20] → Rank → [D5, D2, D8, ...]
Window 2: [D11...D30] → Rank → [D15, D12, D18, ...]
Window 3: [D21...D40] → Rank → [D25, D22, D28, ...]
...
Final: Merge rankings → [D5, D15, D25, D2, ...]
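The window-and-merge process above can be sketched in a few lines. A common variant (used by RankGPT-style rerankers) slides back-to-front so that strong documents bubble toward the top through the overlapping regions; `rank_window` below stands in for the LLM call:

```python
def sliding_window_rerank(docs, rank_window, window_size=20, stride=10):
    """Back-to-front sliding-window re-ranking sketch.

    `rank_window` takes a list of documents and returns the same documents
    in ranked order. Each window overlaps the previous one by
    (window_size - stride) positions, letting strong documents move forward.
    """
    docs = list(docs)
    start = max(len(docs) - window_size, 0)
    while True:
        end = start + window_size
        docs[start:end] = rank_window(docs[start:end])
        if start == 0:
            break
        start = max(start - stride, 0)
    return docs

# Toy usage: each "document" carries an integer relevance score.
ranked = sliding_window_rerank(
    [("d%d" % i, i % 7) for i in range(30)],
    rank_window=lambda window: sorted(window, key=lambda d: -d[1]),
    window_size=10,
    stride=5,
)
```

With the overlap, the single most relevant document is guaranteed to reach the front after one full pass, even though no window ever sees more than `window_size` documents at once.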
🔬 Research
If you use PyTerrier Generative in your research, please cite:
@software{pyterrier_generative,
  title = {PyTerrier Generative: Listwise Ranking with Large Language Models},
  author = {Parry, Andrew},
  year = {2025},
  url = {https://github.com/Parry-Parry/pyterrier-generative}
}
👥 Authors
- Andrew Parry - University of Glasgow
🤝 Contributing
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new features
- Ensure all tests pass (pytest)
- Submit a pull request
🧾 Version History
| Version | Date | Changes |
|---|---|---|
| 0.1 | 2025-01-14 | Initial release with batching and 4 algorithms |
📄 License
This project is licensed under the MIT License — see the LICENSE file for details.