PyTerrier RAG pipelines

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python
Topic
- Text Processing
- Text Processing :: Indexing

Project description

PyTerrier RAG

PyTerrier-RAG is an extension for PyTerrier that makes it easier to build retrieval-augmented generation (RAG) pipelines. It supports:

Easy access to common QA datasets
Pre-built indices for common corpora
Popular reader models, such as Fusion-in-Decoder and Llama
Evaluation measures

It also provides access to all of the retrievers (sparse, learned sparse and dense) and rerankers (from MonoT5 to RankGPT) available through the wider PyTerrier ecosystem.

Installation is as easy as pip install pyterrier-rag.

Example Notebooks

The following example notebooks can be run on Google Colab:

Sparse Retrieval with FiD and FlanT5 readers: sparse_retrieval_FiD_FlanT5.ipynb
SearchR1 with Sparse Retrieval and MonoT5: examples/search-r1.ipynb

RAG Readers

Fusion in Decoder: pyterrier_rag.readers.T5FiD, pyterrier_rag.readers.BARTFiD
OpenAI: pyterrier_rag.readers.OpenAIReader
VLLM: pyterrier_rag.readers.VLLMReader
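Fusion-in-Decoder encodes each retrieved passage together with the question independently, then lets the decoder attend over all of the encoder outputs jointly. The sketch below illustrates only the input-construction step, assuming the `question: ... title: ... context: ...` template from the original FiD paper; whether pyterrier_rag.readers.T5FiD uses exactly this template is an assumption, and the helper name fid_inputs is hypothetical.

```python
from typing import Dict, List


def fid_inputs(question: str, passages: List[Dict[str, str]], n: int = 10) -> List[str]:
    """Build one encoder input string per passage (hypothetical helper).

    Assumes the FiD-paper template. Each string would be encoded separately;
    the decoder then attends over all encoded passages at once.
    """
    return [
        f"question: {question} title: {p['title']} context: {p['text']}"
        for p in passages[:n]  # only the top-n retrieved passages are read
    ]


docs = [
    {"title": "Chemical reaction", "text": "A chemical reaction transforms one set of substances into another."},
    {"title": "Combustion", "text": "Combustion is a reaction between a fuel and an oxidant."},
]
for line in fid_inputs("What are chemical reactions?", docs):
    print(line)
```

Because each passage is encoded on its own, encoder cost grows linearly with the number of passages, which is one reason RAG pipelines typically apply a rank cutoff before the reader.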

RAG pipelines can be formulated as easily as:

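import pyterrier as pt
import pyterrier_rag
import pyterrier_t5

# BM25 over the pre-built Wikipedia index, returning document text for the reader
bm25 = pt.Artifact.from_hf('pyterrier/ragwiki-terrier').bm25(include_fields=['docno', 'text', 'title'])
fid = pyterrier_rag.readers.T5FiD()
bm25_rag = bm25 % 10 >> fid                                # top-10 BM25 results -> FiD reader
monoT5_rag = bm25 % 10 >> pyterrier_t5.MonoT5() >> fid     # rerank with MonoT5 before reading
monoT5_rag.search("What are chemical reactions?")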
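In PyTerrier, `%` applies a rank cutoff and `>>` chains stages, so `bm25 % 10 >> fid` means "take the top 10 BM25 results and pass them to the reader". Because Python's `%` binds more tightly than `>>`, the expression parses as `(bm25 % 10) >> fid`. The toy sketch below mimics the two operators on plain Python lists to make the data flow concrete; the Stage class and the stand-in retriever and reader are hypothetical, and this is not PyTerrier's implementation (which passes pandas DataFrames between transformers).

```python
from typing import Callable


class Stage:
    """Toy pipeline stage (hypothetical; not PyTerrier's Transformer)."""

    def __init__(self, fn: Callable):
        self.fn = fn

    def __call__(self, x):
        return self.fn(x)

    def __mod__(self, k: int) -> "Stage":
        # rank cutoff: keep only the top-k results produced by this stage
        return Stage(lambda x: self.fn(x)[:k])

    def __rshift__(self, other: "Stage") -> "Stage":
        # composition: feed this stage's output into the next stage
        return Stage(lambda x: other(self.fn(x)))


# stand-in retriever: 20 fake (docno, score) results in descending score order
retriever = Stage(lambda query: [(f"d{i}", 20.0 - i) for i in range(20)])
# stand-in reader: reports how many documents it was given
reader = Stage(lambda results: f"answer from {len(results)} docs")

pipeline = retriever % 10 >> reader  # parsed as (retriever % 10) >> reader
print(pipeline("What are chemical reactions?"))  # answer from 10 docs
```

Each operator returns a new stage rather than running anything, so a pipeline can be defined once and then applied to many queries, which mirrors how the declarative pipelines above are built before `.search()` is called.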

Try it out now with the example notebook: sparse_retrieval_FiD_FlanT5.ipynb.

Agentic RAG

These frameworks use search as a tool: the reasoning model decides when to search, and then integrates the retrieved results into the input for its next invocation:

Search-R1: pyterrier_rag.SearchR1 https://arxiv.org/pdf/2503.09516
Search-O1: pyterrier_rag.SearchO1 https://arxiv.org/abs/2501.05366
R1-Searcher: pyterrier_rag.R1Searcher https://arxiv.org/abs/2503.05592

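import pyterrier as pt
import pyterrier_rag
import pyterrier_t5

# BM25 over the pre-built Wikipedia index, keeping text fields for the reasoning model
bm25 = pt.Artifact.from_hf('pyterrier/ragwiki-terrier').bm25(include_fields=['docno', 'text', 'title'])
monoT5 = pyterrier_t5.MonoT5()

# Search-R1 invokes the BM25 + MonoT5 pipeline whenever the model decides to search
r1_monoT5 = pyterrier_rag.SearchR1(bm25 % 20 >> monoT5)
r1_monoT5.search("What are chemical reactions?")

# Search-O1 takes the reasoning model as a reader, plus the same retrieval pipeline
o1_monoT5 = pyterrier_rag.SearchO1(
    pyterrier_rag.readers.CausalLMReader("deepseek-ai/DeepSeek-R1-Distill-Qwen-14B"),
    bm25 % 20 >> monoT5)
o1_monoT5.search("What are chemical reactions?")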
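The control flow shared by these frameworks can be sketched as a loop: the model generates until it emits either a search request or a final answer; on a search request, the retriever runs and its results are appended to the context before the model is invoked again. The sketch below assumes Search-R1-style `<search>`, `<information>` and `<answer>` tags and uses scripted stand-ins for the model and retriever. The function names and the max_turns budget are hypothetical, and this is not the pyterrier_rag.SearchR1 implementation.

```python
import re
from typing import Callable, List

SEARCH = re.compile(r"<search>(.*?)</search>", re.S)
ANSWER = re.compile(r"<answer>(.*?)</answer>", re.S)


def agentic_answer(
    question: str,
    generate: Callable[[str], str],
    search: Callable[[str], List[str]],
    max_turns: int = 4,
) -> str:
    """Alternate between model generation and retrieval (hypothetical sketch)."""
    context = f"Question: {question}\n"
    for _ in range(max_turns):
        step = generate(context)
        context += step
        answer = ANSWER.search(step)
        if answer:  # the model decided it has enough information
            return answer.group(1).strip()
        query = SEARCH.search(step)
        if query:  # the model asked to search: retrieve and feed results back
            docs = search(query.group(1).strip())
            context += "\n<information>" + " ".join(docs) + "</information>\n"
    return ""  # no answer within the turn budget


# scripted stand-ins: the first turn searches, the second turn answers
def fake_generate(context: str) -> str:
    if "<information>" not in context:
        return "<search>chemical reaction definition</search>"
    return "<answer>processes that transform substances into new ones</answer>"


def fake_search(query: str) -> List[str]:
    return ["A chemical reaction transforms reactants into products."]


print(agentic_answer("What are chemical reactions?", fake_generate, fake_search))
```

The turn budget matters in practice: without it, a model that keeps issuing searches would never terminate.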

Try these frameworks out now with our example notebooks, such as examples/search-r1.ipynb.

Datasets

Queries and gold answers of common datasets can be accessed through the PyTerrier datasets API: pt.get_dataset("rag:nq").get_topics() and pt.get_dataset("rag:nq").get_answers(). The following QA datasets are available:

Natural Questions: "rag:nq"
HotpotQA: "rag:hotpotqa"
TriviaQA: "rag:triviaqa"
Musique: "rag:musique"
WebQuestions: "rag:web_questions"
WoW: "rag:wow"
PopQA: "rag:popqa"

We also provide pre-built indices for standard RAG corpora. For instance, a BM25 retriever for the Wikipedia corpus used by NQ can be obtained from a pre-built index that is automatically downloaded from Hugging Face:

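import pyterrier as pt

sparse_index = pt.Artifact.from_hf('pyterrier/ragwiki-terrier')
# tokenise the query, retrieve with BM25 (keeping text fields), then reset the query to its original form
bm25 = pt.rewrite.tokenise() >> sparse_index.bm25(include_fields=['docno', 'text', 'title']) >> pt.rewrite.reset()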
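For readers unfamiliar with the weighting model behind `sparse_index.bm25(...)`, the following is a textbook BM25 scorer over a tiny in-memory corpus. It assumes the common defaults k1=1.2 and b=0.75 and a smoothed IDF of the form log(1 + (N - df + 0.5) / (df + 0.5)); Terrier's own BM25 may differ in IDF details and tokenisation, so scores will not match the pre-built index. The corpus and the function name bm25_scores are illustrative only.

```python
import math
from collections import Counter
from typing import Dict


def bm25_scores(query: str, docs: Dict[str, str], k1: float = 1.2, b: float = 0.75) -> Dict[str, float]:
    """Score every document against the query with textbook BM25 (illustrative)."""
    tokenised = {docno: text.lower().split() for docno, text in docs.items()}
    n_docs = len(tokenised)
    avgdl = sum(len(toks) for toks in tokenised.values()) / n_docs
    # document frequency: in how many documents each term appears
    df = Counter(term for toks in tokenised.values() for term in set(toks))
    scores = {}
    for docno, toks in tokenised.items():
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # term-frequency saturation (k1) with document-length normalisation (b)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[docno] = score
    return scores


corpus = {
    "d1": "a chemical reaction transforms reactants into products",
    "d2": "the river flows into the sea",
    "d3": "combustion is a chemical reaction with oxygen",
}
ranking = sorted(bm25_scores("chemical reaction", corpus).items(), key=lambda kv: -kv[1])
print(ranking)
```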

Dense indices are also provided, e.g. E5 on Wikipedia:

import pyterrier_dr
e5 = pyterrier_dr.E5() >> pt.Artifact.from_hf("pyterrier/ragwiki-e5.flex") >> sparse_index.text_loader(['docno', 'title', 'text'])
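Dense retrieval replaces term matching with nearest-neighbour search over embeddings: the query encoder (here E5) maps the query to a vector, the index scores each passage vector by inner product, and a text loader then attaches titles and text by docno. The sketch below walks through those three steps with hand-made 3-dimensional vectors standing in for E5 embeddings. It uses brute-force search, the helper names are hypothetical, and it is not meant to reflect how pyterrier_dr implements its indices.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def dense_search(q_vec: Vector, index: Dict[str, Vector], k: int = 2) -> List[Tuple[str, float]]:
    """Brute-force inner-product search: score every passage, keep the top k."""
    scored = [(docno, dot(q_vec, d_vec)) for docno, d_vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def load_text(results: List[Tuple[str, float]], texts: Dict[str, Dict[str, str]]) -> List[dict]:
    """Attach stored fields by docno, in the spirit of a text loader stage."""
    return [{"docno": d, "score": s, **texts[d]} for d, s in results]


# toy vectors standing in for E5 passage embeddings
index = {"d1": [0.9, 0.1, 0.0], "d2": [0.0, 0.2, 0.9], "d3": [0.7, 0.6, 0.1]}
texts = {
    "d1": {"title": "Chemical reaction", "text": "Reactants become products."},
    "d2": {"title": "River", "text": "A river flows to the sea."},
    "d3": {"title": "Combustion", "text": "Burning is a reaction with oxygen."},
}
query_vec = [1.0, 0.3, 0.0]  # pretend output of the query encoder
for row in load_text(dense_search(query_vec, index), texts):
    print(row)
```

Keeping text in a separate store and joining it by docno is why the dense pipeline above reuses `sparse_index.text_loader(...)`: the dense index only needs to hold vectors.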

Evaluation

An experiment comparing multiple RAG pipelines can be expressed using PyTerrier's pt.Experiment() API:

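import pyterrier as pt
import pyterrier_rag

dataset = pt.get_dataset("rag:nq")
pt.Experiment(
    [pipe1, pipe2],  # any two RAG pipelines, e.g. bm25_rag and monoT5_rag
    dataset.get_topics(),
    dataset.get_answers(),
    [pyterrier_rag.measures.EM, pyterrier_rag.measures.F1]
)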

Available measures include:

Answer length: pyterrier_rag.measures.AnswerLen
Answers of 0 length: pyterrier_rag.measures.AnswerZeroLen
Exact match percentage: pyterrier_rag.measures.EM
F1: pyterrier_rag.measures.F1
BERTScore (measures similarity of answer with relevant documents): pyterrier_rag.measures.BERTScore
ROUGE, e.g. pyterrier_rag.measures.ROUGE1F
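As a rough guide to what EM and F1 compute, the sketch below follows the widely used SQuAD-style definitions: answers are normalised (lowercased, punctuation and the articles a/an/the removed, whitespace collapsed), EM checks for an exact string match, F1 measures token overlap, and each prediction is scored against its best-matching gold answer. That pyterrier_rag.measures uses exactly this normalisation is an assumption, and the function names are hypothetical.

```python
import re
import string
from collections import Counter
from typing import List

PUNCT = set(string.punctuation)


def normalise(text: str) -> str:
    """SQuAD-style answer normalisation."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: List[str]) -> float:
    # 1.0 if the normalised prediction equals any normalised gold answer
    return float(any(normalise(prediction) == normalise(g) for g in gold_answers))


def token_f1(prediction: str, gold: str) -> float:
    pred_toks, gold_toks = normalise(prediction).split(), normalise(gold).split()
    common = Counter(pred_toks) & Counter(gold_toks)  # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)


def f1(prediction: str, gold_answers: List[str]) -> float:
    # score against the best-matching gold answer
    return max(token_f1(prediction, g) for g in gold_answers)


print(exact_match("The Eiffel Tower!", ["Eiffel Tower"]))  # 1.0
print(f1("Paris, France", ["Paris"]))
```

Taking the maximum over gold answers matters for datasets such as NQ and TriviaQA, where a question can have several acceptable answer strings.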

Use the baseline kwarg to conduct significance testing in your experiment - see the pt.Experiment() documentation for more examples.
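Significance testing here compares two pipelines on the same queries, so a paired test is appropriate. The sketch below computes a paired t statistic from per-query scores using only the standard library; the per-query F1 values are illustrative inputs, not results, and the function name paired_t is hypothetical. Which test pt.Experiment applies by default, and how it handles multiple comparisons, should be checked in its documentation rather than inferred from this example.

```python
import math
from statistics import mean, stdev
from typing import List


def paired_t(baseline: List[float], system: List[float]) -> float:
    """Paired t statistic over per-query differences (system minus baseline)."""
    if len(baseline) != len(system) or len(baseline) < 2:
        raise ValueError("need two equal-length score lists with at least two queries")
    diffs = [s - b for b, s in zip(baseline, system)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("per-query differences have zero variance")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


# illustrative per-query F1 scores for a baseline and a candidate pipeline
base = [0.20, 0.50, 0.40, 0.10, 0.60]
cand = [0.30, 0.55, 0.50, 0.10, 0.70]
t = paired_t(base, cand)
print(f"t = {t:.3f} with {len(base) - 1} degrees of freedom")
```

Pairing by query removes the large variation in difficulty between questions, which an unpaired comparison of the two mean scores would otherwise treat as noise.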

Citations

If you use PyTerrier-RAG for your research, please cite our work:

Constructing and Evaluating Declarative RAG Pipelines in PyTerrier. Craig Macdonald, Jinyuan Fang, Andrew Parry and Zaiqiao Meng. In Proceedings of SIGIR 2025. https://arxiv.org/abs/2506.10802

Credits

Craig Macdonald, University of Glasgow
Jinyuan Fang, University of Glasgow
Andrew Parry, University of Glasgow
Zaiqiao Meng, University of Glasgow
Sean MacAvaney, University of Glasgow

Release history

0.3.0 (Dec 22, 2025)
0.2.3 (Jul 14, 2025)
0.2.2 (Jul 13, 2025)
0.2.1 (Jul 3, 2025)
0.2.0 (Jul 1, 2025), this version
0.1.0 (May 7, 2025)

Download files

Source Distribution

pyterrier_rag-0.2.0.tar.gz (59.5 kB)

Uploaded Jul 1, 2025 Source

Built Distribution

pyterrier_rag-0.2.0-py3-none-any.whl (58.0 kB)

Uploaded Jul 1, 2025 Python 3

File details

Details for the file pyterrier_rag-0.2.0.tar.gz.

File metadata

Download URL: pyterrier_rag-0.2.0.tar.gz
Upload date: Jul 1, 2025
Size: 59.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.5

File hashes

Hashes for pyterrier_rag-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`8bec917661c720574a2d9f395c7161f6c48ab937307cc91b47968dc2ee9b1e88`
MD5	`719aff79afb6a35d736d3c0f8ac8141c`
BLAKE2b-256	`b8136087246f38dc13446518394d5120bae58beec7757efa8f286115b634e071`

File details

Details for the file pyterrier_rag-0.2.0-py3-none-any.whl.

File metadata

Download URL: pyterrier_rag-0.2.0-py3-none-any.whl
Upload date: Jul 1, 2025
Size: 58.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.5

File hashes

Hashes for pyterrier_rag-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5df05702db85d062e5048fa00f8652cdcd2a418b4317031cd415eeef23581421`
MD5	`6cc46fda1f55f563e0c40091b174063f`
BLAKE2b-256	`a3133bc33e7d6eb79c803461ecf151eb68a45060dd983727e47cb742eb85c1ef`
