PyTerrier RAG

PyTerrier-RAG is an extension for PyTerrier that makes it easier to produce retrieval augmented generation pipelines. PyTerrier-RAG supports:

  1. Easy access to common QA datasets
  2. Pre-built indices for common corpora
  3. Popular reader models, such as Fusion-in-Decoder and Llama
  4. Evaluation measures

It also gives access to all of the retrievers (sparse, learned sparse and dense) and rerankers (from MonoT5 to RankGPT) available through the wider PyTerrier ecosystem.

Installation is as easy as pip install pyterrier-rag.

Example Notebooks

Try it out here on Google Colab now by clicking the "Open in Colab" button!

RAG Readers

  • Fusion in Decoder: pyterrier_rag.readers.T5FiD, pyterrier_rag.readers.BARTFiD
  • OpenAI: pyterrier_rag.readers.OpenAIReader
  • VLLM: pyterrier_rag.readers.VLLMReader

RAG pipelines can be formulated as easily as:

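import pyterrier as pt
import pyterrier_rag
from pyterrier_t5 import MonoT5

# BM25 over the pre-built Wikipedia index; the reranker and reader need the text and title fields
bm25 = pt.Artifact.from_hf('pyterrier/ragwiki-terrier').bm25(include_fields=['docno', 'text', 'title'])
fid = pyterrier_rag.readers.T5FiD()
bm25_rag = bm25 % 10 >> fid
monoT5_rag = bm25 % 10 >> MonoT5() >> fid
monoT5_rag.search("What are chemical reactions?")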
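
To make the `%` and `>>` operators in the pipelines above more concrete, here is a minimal, pure-Python sketch of how such operators can be built with operator overloading. It is a toy and not PyTerrier's implementation: the `Stage` class, the `answer` column and the stand-in retriever and reader are all hypothetical, and rows are assumed to arrive already sorted by score.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class Stage:
    """Toy pipeline stage: a function from a list of rows to a list of rows."""

    def __init__(self, fn: Callable[[List[Row]], List[Row]]):
        self.fn = fn

    def __call__(self, rows: List[Row]) -> List[Row]:
        return self.fn(rows)

    def __rshift__(self, other: "Stage") -> "Stage":
        # a >> b: feed the output of a into b
        return Stage(lambda rows: other(self(rows)))

    def __mod__(self, k: int) -> "Stage":
        # a % k: keep only the first k rows produced by a
        return Stage(lambda rows: self(rows)[:k])


# Hypothetical stand-ins for a retriever and a reader.
retriever = Stage(lambda rows: [
    {"query": rows[0]["query"], "docno": f"d{i}", "score": 10 - i}
    for i in range(5)
])
reader = Stage(lambda rows: [
    {"query": rows[0]["query"],
     "answer": " ".join(str(r["docno"]) for r in rows)}
])

# % binds more tightly than >>, so this is (retriever % 3) >> reader.
pipeline = retriever % 3 >> reader
result = pipeline([{"query": "What are chemical reactions?"}])
print(result[0]["answer"])
```

Because Python gives `%` higher precedence than `>>`, a rank cutoff written directly after a retriever applies to that retriever before the next stage runs, which is why `bm25 % 10 >> fid` needs no parentheses.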

Try it out now with the example notebook: sparse_retrieval_FiD_FlanT5.ipynb.

Agentic RAG

These frameworks use search as a tool: the reasoning model decides when to search, and the retrieved results are then integrated into the input for the next invocation of the model:

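import pyterrier as pt
import pyterrier_rag
import pyterrier_t5

# Retrieval pipeline used as the search tool: BM25 over the pre-built Wikipedia index, reranked by MonoT5
bm25 = pt.Artifact.from_hf('pyterrier/ragwiki-terrier').bm25(include_fields=['docno', 'text', 'title'])
monoT5 = pyterrier_t5.MonoT5()
r1_monoT5 = pyterrier_rag.SearchR1(bm25 % 20 >> monoT5)
r1_monoT5.search("What are chemical reactions?")

# Search-o1 takes the reasoning model explicitly, followed by the retrieval pipeline
o1_monoT5 = pyterrier_rag.SearchO1(
    pyterrier_rag.readers.CausalLMReader("deepseek-ai/DeepSeek-R1-Distill-Qwen-14B"),
    bm25 % 20 >> monoT5)
o1_monoT5.search("What are chemical reactions?")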
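
The search-as-a-tool loop described above can be sketched in plain Python. This is an illustration of the general idea only, not the code behind `SearchR1` or `SearchO1`: the `<search>` and `<information>` tag format, the `agentic_answer` function, and the toy model and retriever are all assumptions made for this example.

```python
import re
from typing import Callable, List

SEARCH_TAG = re.compile(r"<search>(.*?)</search>", re.DOTALL)


def agentic_answer(
    question: str,
    generate: Callable[[str], str],
    search: Callable[[str], List[str]],
    max_turns: int = 3,
) -> str:
    """Alternate generation and retrieval until the model stops asking to search."""
    prompt = question
    output = ""
    for _ in range(max_turns):
        output = generate(prompt)
        match = SEARCH_TAG.search(output)
        if match is None:
            return output  # the model answered without requesting a search
        passages = search(match.group(1).strip())
        # Append the model output and the retrieved passages to the next prompt
        prompt += (
            "\n" + output
            + "\n<information>" + " ".join(passages) + "</information>"
        )
    return output  # turn budget exhausted; return the last output as-is


# Hypothetical stand-ins for the reasoning model and the retrieval pipeline.
def toy_generate(prompt: str) -> str:
    if "<information>" not in prompt:
        return "<search>chemical reaction definition</search>"
    return "A chemical reaction transforms substances into different substances."


def toy_search(query: str) -> List[str]:
    return ["A chemical reaction transforms one set of substances into another."]


answer = agentic_answer("What are chemical reactions?", toy_generate, toy_search)
print(answer)
```

The `max_turns` budget is a design choice for the sketch: it guarantees termination even if the model keeps issuing search requests.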

Try these frameworks out now with our example notebooks:

Datasets

Queries and gold answers of common datasets can be accessed through the PyTerrier datasets API: pt.get_dataset("rag:nq").get_topics() and pt.get_dataset("rag:nq").get_answers(). The following QA datasets are available:

  • Natural Questions: "rag:nq"
  • HotpotQA: "rag:hotpotqa"
  • TriviaQA: "rag:triviaqa"
  • Musique: "rag:musique"
  • WebQuestions: "rag:web_questions"
  • WoW: "rag:wow"
  • PopQA: "rag:popqa"

We also provide pre-built indices for standard RAG corpora. For instance, a BM25 retriever for the Wikipedia corpus for NQ can be obtained from a pre-built index that is automatically downloaded from HuggingFace:

sparse_index = pt.Artifact.from_hf('pyterrier/ragwiki-terrier')
bm25 = pt.rewrite.tokenise() >> sparse_index.bm25(include_fields=['docno', 'text', 'title']) >> pt.rewrite.reset()

Dense indices are also provided, e.g. E5 on Wikipedia:

import pyterrier_dr
e5 = pyterrier_dr.E5() >> pt.Artifact.from_hf("pyterrier/ragwiki-e5.flex") >> sparse_index.text_loader(['docno', 'title', 'text'])

Evaluation

An experiment comparing multiple RAG pipelines can be expressed using PyTerrier's pt.Experiment() API:

pt.Experiment(
    [pipe1, pipe2],
    dataset.get_topics(),
    dataset.get_answers(),
    [pyterrier_rag.measures.EM, pyterrier_rag.measures.F1]
)

Available measures include:

  • Answer length: pyterrier_rag.measures.AnswerLen
  • Answers of 0 length: pyterrier_rag.measures.AnswerZeroLen
  • Exact match percentage: pyterrier_rag.measures.EM
  • F1: pyterrier_rag.measures.F1
  • BERTScore (measures similarity of answer with relevant documents): pyterrier_rag.measures.BERTScore
  • ROUGE, e.g. pyterrier_rag.measures.ROUGE1F
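
For readers unfamiliar with the answer-level measures, the following is a minimal sketch of exact match and token-level F1 as they are commonly defined for QA evaluation (SQuAD-style normalisation). It is an assumption that `pyterrier_rag.measures.EM` and `pyterrier_rag.measures.F1` follow the same conventions; the function names here are illustrative and are not part of the package.

```python
import re
import string
from collections import Counter
from typing import List


def normalize(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: List[str]) -> float:
    """1.0 if the normalised prediction equals any normalised gold answer."""
    pred = normalize(prediction)
    return float(any(pred == normalize(g) for g in gold_answers))


def token_f1(prediction: str, gold_answers: List[str]) -> float:
    """Best token-overlap F1 of the prediction against any gold answer."""
    pred_tokens = normalize(prediction).split()
    best = 0.0
    for gold in gold_answers:
        gold_tokens = normalize(gold).split()
        overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
        if overlap == 0:
            continue
        precision = overlap / len(pred_tokens)
        recall = overlap / len(gold_tokens)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


gold = ["Paris", "the city of Paris"]
print(exact_match("The Paris.", gold))
print(token_f1("Paris, France", gold))
```

Taking the maximum over gold answers reflects that datasets such as Natural Questions list several acceptable answers per question.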

Use the baseline kwarg to conduct significance testing in your experiment - see the pt.Experiment() documentation for more examples.
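
As background, significance testing between two pipelines is usually done with a paired test over per-query scores. The sketch below computes a paired t statistic by hand, assuming a paired t-test is the test being applied (check the pt.Experiment() documentation for the tests and corrections it actually supports). The function name and the scores are made up for illustration.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def paired_t_statistic(baseline: Sequence[float], system: Sequence[float]) -> float:
    """t statistic of the per-query score differences (system minus baseline)."""
    if len(baseline) != len(system) or len(baseline) < 2:
        raise ValueError("need two equally sized samples with at least two queries")
    diffs = [s - b for b, s in zip(baseline, system)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


# Made-up per-query F1 scores for two pipelines over the same five queries.
baseline_f1 = [0.2, 0.5, 0.0, 0.4, 0.6]
system_f1 = [0.4, 0.5, 0.3, 0.6, 0.7]
t = paired_t_statistic(baseline_f1, system_f1)
print(round(t, 3))
```

Turning the statistic into a p-value requires the t distribution with n - 1 degrees of freedom, for example via `scipy.stats.ttest_rel`, which performs the whole test in one call.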

Citations

If you use PyTerrier-RAG for your research, please cite our work:

Constructing and Evaluating Declarative RAG Pipelines in PyTerrier. Craig Macdonald, Jinyuan Fang, Andrew Parry and Zaiqiao Meng. In Proceedings of SIGIR 2025. https://arxiv.org/abs/2506.10802

Credits

  • Craig Macdonald, University of Glasgow
  • Jinyuan Fang, University of Glasgow
  • Andrew Parry, University of Glasgow
  • Zaiqiao Meng, University of Glasgow
  • Sean MacAvaney, University of Glasgow
