PyTerrier Integration for Generative Rankers

🤖 PyTerrier Generative

Generative listwise ranking with PyTerrier. PyTerrier Generative provides pre-configured generative rankers together with a suite of listwise ranking algorithms.

📘 Overview

PyTerrier Generative provides:

  • Pre-configured rankers: RankZephyr, RankVicuna, RankGPT, LiT5.
  • Flexible algorithms: Sliding window, single window, top-down partitioning, setwise.
  • Efficient batching: Automatic batching of ranking windows.
  • Customizable prompts: Jinja2 templates or Python callables.
  • Multiple backends: vLLM, HuggingFace Transformers, OpenAI.

🚀 Getting Started

Install from PyPI

pip install pyterrier-generative

Install from source

git clone https://github.com/Parry-Parry/pyterrier-generative.git
cd pyterrier-generative
pip install -e .

Quick Example

import pyterrier as pt
from pyterrier_generative import RankZephyr

pt.init()

# Create ranker
ranker = RankZephyr.v1(window_size=20)

# Use in pipeline
pipeline = pt.BatchRetrieve(index) % 100 >> ranker
results = pipeline.search("machine learning")

🎯 Pre-configured Rankers

RankZephyr

from pyterrier_generative import RankZephyr, Algorithm

# Use default variant
ranker = RankZephyr(window_size=20)

# Or specify algorithm and parameters
ranker = RankZephyr(
    algorithm=Algorithm.SLIDING_WINDOW,
    window_size=20,
    stride=10
)

Variants: v1 (castorini/rank_zephyr_7b_v1_full)
Backend: vLLM (default), HuggingFace

RankVicuna

from pyterrier_generative import RankVicuna

ranker = RankVicuna(window_size=20)

Variants: v1 (castorini/rank_vicuna_7b_v1)
Backend: vLLM (default), HuggingFace

RankGPT

from pyterrier_generative import RankGPT

# Use GPT-3.5 (default)
ranker = RankGPT.gpt35(api_key="sk-...")

# Or GPT-4
ranker = RankGPT.gpt4(api_key="sk-...")

Variants: gpt35, gpt35_16k, gpt4, gpt4_turbo
Backend: OpenAI

LiT5

from pyterrier_generative import LiT5

ranker = LiT5(
    model_path='castorini/LiT5-Distill-large',
    window_size=20
)

Architecture: Fusion-in-Decoder (FiD)
Backend: PyTerrier-T5

⚙️ Custom Rankers

Build your own ranker with custom prompts and backends. More details on backends can be found in PyTerrier RAG:

from pyterrier_generative import GenerativeRanker, Algorithm
from pyterrier_rag.backend.vllm import VLLMBackend

# Create custom backend
backend = VLLMBackend(
    model_id="meta-llama/Meta-Llama-3-8B-Instruct",
    max_new_tokens=100
)

# Custom Jinja2 prompt
prompt = """
Rank these passages for: {{ query }}

{% for p in passages %}
[{{ loop.index }}] {{ p }}
{% endfor %}

Ranking:
"""

# Create ranker
ranker = GenerativeRanker(
    model=backend,
    prompt=prompt,
    algorithm=Algorithm.SLIDING_WINDOW,
    window_size=20,
    stride=10
)
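Prompts can also be supplied as Python callables instead of Jinja2 templates (per the overview). A minimal sketch, assuming the callable receives the query and the window's passages and returns the prompt string (the exact signature expected by `GenerativeRanker` may differ):

```python
def rank_prompt(query, passages):
    """Build a listwise ranking prompt for one window of passages."""
    lines = [f"Rank these passages for: {query}", ""]
    # Number passages from 1, matching the Jinja2 template above
    for i, passage in enumerate(passages, start=1):
        lines.append(f"[{i}] {passage}")
    lines.extend(["", "Ranking:"])
    return "\n".join(lines)

# ranker = GenerativeRanker(model=backend, prompt=rank_prompt, ...)
```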

🔄 Ranking Algorithms

Sliding Window

Processes documents in overlapping windows, refining rankings iteratively.

ranker = RankZephyr(
    algorithm=Algorithm.SLIDING_WINDOW,
    window_size=20,   # Documents per window
    stride=10         # Window overlap
)

Best for: Exhaustive re-ranking
Complexity: O(n/stride) windows

Top-Down Partitioning

Recursively partitions documents around pivot elements.

ranker = RankZephyr(
    algorithm=Algorithm.TDPART,
    window_size=20,
    buffer=20,
    cutoff=10,
    max_iters=100
)

Best for: Efficient top-k search
Complexity: O(log n) windows (best case)
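The partitioning idea can be sketched as a quickselect-style procedure. This is a simplified sketch, not the library's implementation: the hypothetical `prefers(a, b)` oracle stands in for the model's judgement, and the real algorithm batches windowed comparisons rather than comparing one pair at a time:

```python
def top_k_partition(docs, k, prefers):
    """Return the top-k docs by recursively partitioning around a pivot.

    `prefers(a, b)` returns True when doc `a` should rank above doc `b`
    (a stand-in for the LLM's judgement). The result is the top-k *set*;
    a final re-rank would order it, as in the real algorithm.
    """
    if len(docs) <= k:
        return docs
    pivot, rest = docs[0], docs[1:]
    above = [d for d in rest if prefers(d, pivot)]      # rank above the pivot
    below = [d for d in rest if not prefers(d, pivot)]  # rank below the pivot
    if len(above) >= k:
        # The whole top-k lies above the pivot: recurse there only
        return top_k_partition(above, k, prefers)
    # Keep everything above the pivot plus the pivot, fill the rest from below
    return above + [pivot] + top_k_partition(below, k - len(above) - 1, prefers)

# With numeric relevance scores as a stand-in oracle:
docs = [3, 9, 1, 7, 5, 8, 2]
top3 = top_k_partition(docs, 3, lambda a, b: a > b)  # → the set {9, 8, 7}
```

Documents below the pivot are never compared again once the top partition is large enough, which is where the sub-linear window count comes from.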

Single Window

Ranks top-k documents in one pass.

ranker = RankZephyr(
    algorithm=Algorithm.SINGLE_WINDOW,
    window_size=20    # Top-k to rank
)

Best for: Small candidate sets, speed-critical applications
Complexity: O(1) window

Setwise

Heapsort driven by set-based comparisons, where the model picks the most relevant document from each small set.

ranker = RankZephyr(
    algorithm=Algorithm.SETWISE,
    k=10              # Top-k to extract
)

Best for: High-precision top-k ranking
Complexity: O(n log k) comparisons
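The setwise idea can be sketched as repeated "pick the best of a small set" calls. This is a simplified tournament-style sketch rather than the library's heapsort implementation; the hypothetical `best_of(subset)` callable stands in for a single LLM call over a small set of passages:

```python
def setwise_top_k(docs, k, best_of, set_size=4):
    """Extract the top-k docs via repeated best-of-set selections.

    `best_of(subset)` returns the most relevant doc in `subset`
    (a stand-in for one LLM call over a small set).
    """
    remaining = list(docs)
    ranked = []
    while remaining and len(ranked) < k:
        # Tournament: reduce candidates set_size at a time to one winner
        candidates = remaining
        while len(candidates) > 1:
            candidates = [best_of(candidates[i:i + set_size])
                          for i in range(0, len(candidates), set_size)]
        ranked.append(candidates[0])
        remaining.remove(candidates[0])
    return ranked

# With numeric scores, max() stands in for the model's best-of-set call:
ranked = setwise_top_k([3, 9, 1, 7, 5], k=2, best_of=max)  # → [9, 7]
```

Each selection touches every remaining document once; a heap-based variant reuses earlier comparisons to reach the O(n log k) bound quoted above.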

Backend Selection

vLLM (fastest for local models):

ranker = RankZephyr(backend='vllm')  # Default

HuggingFace (maximum compatibility):

ranker = RankZephyr(backend='hf')

OpenAI (no local GPU needed):

ranker = RankGPT.gpt35(api_key="...")

🔌 PyTerrier Integration

Basic Re-ranking

import pyterrier as pt
from pyterrier_generative import RankZephyr

bm25 = pt.BatchRetrieve(index)
ranker = RankZephyr(window_size=20)

pipeline = bm25 % 100 >> ranker
results = pipeline.search("information retrieval")

Multi-stage Pipeline

from pyterrier_generative import RankGPT

# Three-stage ranking: BM25 → Dense → Generative
pipeline = (
    bm25 % 1000
    >> dense_ranker % 100
    >> RankGPT.gpt35(api_key="...")
)

Comparative Evaluation

from pyterrier_generative import RankZephyr, RankVicuna

rankers = {
    "BM25": bm25,
    "BM25 >> RankZephyr": bm25 % 100 >> RankZephyr(),
    "BM25 >> RankVicuna": bm25 % 100 >> RankVicuna(),
}

pt.Experiment(rankers, topics, qrels, eval_metrics=["map", "ndcg_cut_10"])

🎨 Advanced Features

System Prompts (for chat models)

from pyterrier_generative import GenerativeRanker

ranker = GenerativeRanker(
    model=backend,
    system_prompt="You are an expert search engine. Rank documents by relevance.",
    prompt="Query: {{ query }}\n...",
    algorithm=Algorithm.SLIDING_WINDOW
)

Custom Generation Parameters

from pyterrier_rag.backend.vllm import VLLMBackend

backend = VLLMBackend(
    model_id="castorini/rank_zephyr_7b_v1_full",
    max_new_tokens=100,
    generation_args={
        'temperature': 0.0,
        'top_p': 1.0,
        'max_tokens': 100
    }
)

📊 How It Works

Listwise Ranking

Traditional pointwise/pairwise rankers score documents independently or in pairs. Listwise ranking considers all documents together:

Input:  Query + [Doc1, Doc2, ..., DocN]
Model:  "Rank these documents: 3, 1, 5, 2, 4"
Output: Reordered documents by LLM preference
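Internally, the generated permutation string has to be parsed back into a document ordering. A minimal sketch of that step (assumed behavior, not the library's exact parser; real parsers must also tolerate malformed model output):

```python
import re

def apply_ranking(docs, generated):
    """Reorder docs according to a generated permutation like '3, 1, 5, 2, 4'.

    Indices are 1-based. Out-of-range or repeated indices are skipped,
    and any unmentioned docs are appended in their original order.
    """
    order = []
    for token in re.findall(r"\d+", generated):
        i = int(token) - 1
        if 0 <= i < len(docs) and i not in order:
            order.append(i)
    # Fall back to original order for anything the model did not mention
    order.extend(i for i in range(len(docs)) if i not in order)
    return [docs[i] for i in order]

docs = ["D1", "D2", "D3", "D4", "D5"]
apply_ranking(docs, "3, 1, 5, 2, 4")  # → ["D3", "D1", "D5", "D2", "D4"]
```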

Sliding Window Algorithm

For large document sets, sliding windows enable manageable ranking:

Documents: [D1, D2, D3, ..., D100]

Window 1: [D1...D20]  → Rank → [D5, D2, D8, ...]
Window 2: [D11...D30] → Rank → [D15, D12, D18, ...]
Window 3: [D21...D40] → Rank → [D25, D22, D28, ...]
...

Final: Merge rankings → [D5, D15, D25, D2, ...]
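The window schedule above can be sketched as follows. This is a simplification (the real implementation batches windows, and typically slides from the bottom of the list upward so strong documents bubble toward the top):

```python
def sliding_window_rerank(docs, rank_window, window_size=20, stride=10):
    """Re-rank docs by applying `rank_window` to overlapping windows,
    sliding from the end of the list toward the front."""
    docs = list(docs)
    start = max(len(docs) - window_size, 0)
    while True:
        end = start + window_size
        docs[start:end] = rank_window(docs[start:end])  # reorder one window in place
        if start == 0:
            break
        start = max(start - stride, 0)
    return docs

# With numeric scores, sorting stands in for the LLM's window ranking:
out = sliding_window_rerank([5, 1, 9, 3, 8, 2, 7],
                            lambda w: sorted(w, reverse=True),
                            window_size=4, stride=2)
# → [9, 8, 7, 5, 1, 3, 2]: the top documents have bubbled to the front
```

Because consecutive windows overlap by `window_size - stride` documents, a strong document found late in the list can ride each overlap upward, one window at a time.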

🔬 Research

If you use PyTerrier Generative in your research, please cite:

@software{pyterrier_generative,
  title = {PyTerrier Generative: Listwise Ranking with Large Language Models},
  author = {Parry, Andrew},
  year = {2025},
  url = {https://github.com/Parry-Parry/pyterrier-generative}
}

👥 Authors

🤝 Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new features
  4. Ensure all tests pass (pytest)
  5. Submit a pull request

🧾 Version History

Version | Date       | Changes
0.1     | 2025-01-14 | Initial release with batching and 4 algorithms

📄 License

This project is licensed under the MIT License — see the LICENSE file for details.
