PyTerrier Integration for Generative Rankers

🤖 PyTerrier Generative

Generative listwise ranking with PyTerrier. PyTerrier Generative provides pre-configured generative rankers together with a suite of listwise ranking algorithms.

📘 Overview

PyTerrier Generative provides:

  • Pre-configured rankers: RankZephyr, RankVicuna, RankGPT, LiT5.
  • Flexible algorithms: Sliding window, single window, top-down partitioning, setwise.
  • Efficient batching: Automatic batching of ranking windows.
  • Customizable prompts: Jinja2 templates or Python callables.
  • Multiple backends: vLLM, HuggingFace Transformers, OpenAI.

🚀 Getting Started

Install from PyPI

pip install pyterrier-generative

Install from source

git clone https://github.com/Parry-Parry/pyterrier-generative.git
cd pyterrier-generative
pip install -e .

Quick Example

import pyterrier as pt
from pyterrier_generative import RankZephyr

pt.init()

# Create ranker
ranker = RankZephyr.v1(window_size=20)

# Use in pipeline
pipeline = pt.BatchRetrieve(index) % 100 >> ranker
results = pipeline.search("machine learning")

🎯 Pre-configured Rankers

RankZephyr

from pyterrier_generative import RankZephyr, Algorithm

# Use default variant
ranker = RankZephyr(window_size=20)

# Or specify algorithm and parameters
ranker = RankZephyr(
    algorithm=Algorithm.SLIDING_WINDOW,
    window_size=20,
    stride=10
)

Variants: v1 (castorini/rank_zephyr_7b_v1_full)
Backend: vLLM (default), HuggingFace

RankVicuna

from pyterrier_generative import RankVicuna

ranker = RankVicuna(window_size=20)

Variants: v1 (castorini/rank_vicuna_7b_v1)
Backend: vLLM (default), HuggingFace

RankGPT

from pyterrier_generative import RankGPT

# Use GPT-3.5 (default)
ranker = RankGPT.gpt35(api_key="sk-...")

# Or GPT-4
ranker = RankGPT.gpt4(api_key="sk-...")

Variants: gpt35, gpt35_16k, gpt4, gpt4_turbo
Backend: OpenAI

LiT5

from pyterrier_generative import LiT5

ranker = LiT5(
    model_path='castorini/LiT5-Distill-large',
    window_size=20
)

Architecture: Fusion-in-Decoder (FiD)
Backend: PyTerrier-T5

⚙️ Custom Rankers

Build your own ranker with custom prompts and backends. More details on backends can be found in PyTerrier RAG:

from pyterrier_generative import GenerativeRanker, Algorithm
from pyterrier_rag.backend.vllm import VLLMBackend

# Create custom backend
backend = VLLMBackend(
    model_id="meta-llama/Meta-Llama-3-8B-Instruct",
    max_new_tokens=100
)

# Custom Jinja2 prompt
prompt = """
Rank these passages for: {{ query }}

{% for p in passages %}
[{{ loop.index }}] {{ p }}
{% endfor %}

Ranking:
"""

# Create ranker
ranker = GenerativeRanker(
    model=backend,
    prompt=prompt,
    algorithm=Algorithm.SLIDING_WINDOW,
    window_size=20,
    stride=10
)
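Prompts can also be supplied as Python callables instead of Jinja2 templates (per the overview). A minimal sketch, assuming the callable receives the query and the window's passages and returns the prompt string (the exact signature expected by `GenerativeRanker` may differ):

```python
def rank_prompt(query, passages):
    """Build a listwise ranking prompt for one window of passages."""
    lines = [f"Rank these passages for: {query}", ""]
    # Number passages from 1, matching the Jinja2 template above
    for i, passage in enumerate(passages, start=1):
        lines.append(f"[{i}] {passage}")
    lines.extend(["", "Ranking:"])
    return "\n".join(lines)

# ranker = GenerativeRanker(model=backend, prompt=rank_prompt, ...)
```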

🔄 Ranking Algorithms

Sliding Window

Processes documents in overlapping windows, refining rankings iteratively.

ranker = RankZephyr(
    algorithm=Algorithm.SLIDING_WINDOW,
    window_size=20,   # Documents per window
    stride=10         # Window overlap
)

Best for: Exhaustive re-ranking
Complexity: O(n/stride) windows

Top-Down Partitioning

Recursively partitions documents around pivot elements.

ranker = RankZephyr(
    algorithm=Algorithm.TDPART,
    window_size=20,
    buffer=20,
    cutoff=10,
    max_iters=100
)

Best for: Efficient top-k search
Complexity: O(log n) windows (best case)
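The partitioning idea can be sketched as a quickselect-style procedure. This is a simplified sketch, not the library's implementation: the hypothetical `prefers(a, b)` oracle stands in for the model's judgement, and the real algorithm batches windowed comparisons rather than comparing one pair at a time:

```python
def top_k_partition(docs, k, prefers):
    """Return the top-k docs by recursively partitioning around a pivot.

    `prefers(a, b)` returns True when doc `a` should rank above doc `b`
    (a stand-in for the LLM's judgement). The result is the top-k *set*;
    a final re-rank would order it, as in the real algorithm.
    """
    if len(docs) <= k:
        return docs
    pivot, rest = docs[0], docs[1:]
    above = [d for d in rest if prefers(d, pivot)]      # rank above the pivot
    below = [d for d in rest if not prefers(d, pivot)]  # rank below the pivot
    if len(above) >= k:
        # The whole top-k lies above the pivot: recurse there only
        return top_k_partition(above, k, prefers)
    # Keep everything above the pivot plus the pivot, fill the rest from below
    return above + [pivot] + top_k_partition(below, k - len(above) - 1, prefers)

# With numeric relevance scores as a stand-in oracle:
docs = [3, 9, 1, 7, 5, 8, 2]
top3 = top_k_partition(docs, 3, lambda a, b: a > b)  # → the set {9, 8, 7}
```

Documents below the pivot are never compared again once the top partition is large enough, which is where the sub-linear window count comes from.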

Single Window

Ranks top-k documents in one pass.

ranker = RankZephyr(
    algorithm=Algorithm.SINGLE_WINDOW,
    window_size=20    # Top-k to rank
)

Best for: Small candidate sets, speed-critical applications
Complexity: O(1) window

Setwise

Heapsort driven by set-based comparisons, where the model picks the most relevant document from each small set.

ranker = RankZephyr(
    algorithm=Algorithm.SETWISE,
    k=10              # Top-k to extract
)

Best for: High-precision top-k ranking
Complexity: O(n log k) comparisons
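The setwise idea can be sketched as repeated "pick the best of a small set" calls. This is a simplified tournament-style sketch rather than the library's heapsort implementation; the hypothetical `best_of(subset)` callable stands in for a single LLM call over a small set of passages:

```python
def setwise_top_k(docs, k, best_of, set_size=4):
    """Extract the top-k docs via repeated best-of-set selections.

    `best_of(subset)` returns the most relevant doc in `subset`
    (a stand-in for one LLM call over a small set).
    """
    remaining = list(docs)
    ranked = []
    while remaining and len(ranked) < k:
        # Tournament: reduce candidates set_size at a time to one winner
        candidates = remaining
        while len(candidates) > 1:
            candidates = [best_of(candidates[i:i + set_size])
                          for i in range(0, len(candidates), set_size)]
        ranked.append(candidates[0])
        remaining.remove(candidates[0])
    return ranked

# With numeric scores, max() stands in for the model's best-of-set call:
ranked = setwise_top_k([3, 9, 1, 7, 5], k=2, best_of=max)  # → [9, 7]
```

Each selection touches every remaining document once; a heap-based variant reuses earlier comparisons to reach the O(n log k) bound quoted above.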

Backend Selection

vLLM (fastest for local models):

ranker = RankZephyr(backend='vllm')  # Default

HuggingFace (maximum compatibility):

ranker = RankZephyr(backend='hf')

OpenAI (no local GPU needed):

ranker = RankGPT.gpt35(api_key="...")

🔌 PyTerrier Integration

Basic Re-ranking

import pyterrier as pt
from pyterrier_generative import RankZephyr

bm25 = pt.BatchRetrieve(index)
ranker = RankZephyr(window_size=20)

pipeline = bm25 % 100 >> ranker
results = pipeline.search("information retrieval")

Multi-stage Pipeline

from pyterrier_generative import RankGPT

# Three-stage ranking: BM25 → Dense → Generative
pipeline = (
    bm25 % 1000
    >> dense_ranker % 100
    >> RankGPT.gpt35(api_key="...")
)

Comparative Evaluation

from pyterrier_generative import RankZephyr, RankVicuna

rankers = {
    "BM25": bm25,
    "BM25 >> RankZephyr": bm25 % 100 >> RankZephyr(),
    "BM25 >> RankVicuna": bm25 % 100 >> RankVicuna(),
}

pt.Experiment(rankers, topics, qrels, eval_metrics=["map", "ndcg_cut_10"])

🎨 Advanced Features

System Prompts (for chat models)

from pyterrier_generative import GenerativeRanker

ranker = GenerativeRanker(
    model=backend,
    system_prompt="You are an expert search engine. Rank documents by relevance.",
    prompt="Query: {{ query }}\n...",
    algorithm=Algorithm.SLIDING_WINDOW
)

Custom Generation Parameters

from pyterrier_rag.backend.vllm import VLLMBackend

backend = VLLMBackend(
    model_id="castorini/rank_zephyr_7b_v1_full",
    max_new_tokens=100,
    generation_args={
        'temperature': 0.0,
        'top_p': 1.0,
        'max_tokens': 100
    }
)

📊 How It Works

Listwise Ranking

Traditional pointwise/pairwise rankers score documents independently or in pairs. Listwise ranking considers all documents together:

Input:  Query + [Doc1, Doc2, ..., DocN]
Model:  "Rank these documents: 3, 1, 5, 2, 4"
Output: Reordered documents by LLM preference
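Internally, the generated permutation string has to be parsed back into a document ordering. A minimal sketch of that step (assumed behavior, not the library's exact parser; real parsers must also tolerate malformed model output):

```python
import re

def apply_ranking(docs, generated):
    """Reorder docs according to a generated permutation like '3, 1, 5, 2, 4'.

    Indices are 1-based. Out-of-range or repeated indices are skipped,
    and any unmentioned docs are appended in their original order.
    """
    order = []
    for token in re.findall(r"\d+", generated):
        i = int(token) - 1
        if 0 <= i < len(docs) and i not in order:
            order.append(i)
    # Fall back to original order for anything the model did not mention
    order.extend(i for i in range(len(docs)) if i not in order)
    return [docs[i] for i in order]

docs = ["D1", "D2", "D3", "D4", "D5"]
apply_ranking(docs, "3, 1, 5, 2, 4")  # → ["D3", "D1", "D5", "D2", "D4"]
```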

Sliding Window Algorithm

For large document sets, sliding windows enable manageable ranking:

Documents: [D1, D2, D3, ..., D100]

Window 1: [D1...D20]  → Rank → [D5, D2, D8, ...]
Window 2: [D11...D30] → Rank → [D15, D12, D18, ...]
Window 3: [D21...D40] → Rank → [D25, D22, D28, ...]
...

Final: Merge rankings → [D5, D15, D25, D2, ...]
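The window schedule above can be sketched as follows. This is a simplification (the real implementation batches windows, and typically slides from the bottom of the list upward so strong documents bubble toward the top):

```python
def sliding_window_rerank(docs, rank_window, window_size=20, stride=10):
    """Re-rank docs by applying `rank_window` to overlapping windows,
    sliding from the end of the list toward the front."""
    docs = list(docs)
    start = max(len(docs) - window_size, 0)
    while True:
        end = start + window_size
        docs[start:end] = rank_window(docs[start:end])  # reorder one window in place
        if start == 0:
            break
        start = max(start - stride, 0)
    return docs

# With numeric scores, sorting stands in for the LLM's window ranking:
out = sliding_window_rerank([5, 1, 9, 3, 8, 2, 7],
                            lambda w: sorted(w, reverse=True),
                            window_size=4, stride=2)
# → [9, 8, 7, 5, 1, 3, 2]: the top documents have bubbled to the front
```

Because consecutive windows overlap by `window_size - stride` documents, a strong document found late in the list can ride each overlap upward, one window at a time.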

🔬 Research

If you use PyTerrier Generative in your research, please cite:

@software{pyterrier_generative,
  title = {PyTerrier Generative: Listwise Ranking with Large Language Models},
  author = {Parry, Andrew},
  year = {2025},
  url = {https://github.com/Parry-Parry/pyterrier-generative}
}

👥 Authors

🤝 Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new features
  4. Ensure all tests pass (pytest)
  5. Submit a pull request

🧾 Version History

Version | Date       | Changes
0.1     | 2025-01-14 | Initial release with batching and 4 algorithms

📄 License

This project is licensed under the MIT License — see the LICENSE file for details.
