PyTerrier
PyTerrier - v1.0
🔍 Retrieve. 🧠 Rerank. 💬 Answer. ⚙️ Experiment.
Overview
Build sparse, learned sparse, and dense indexing and retrieval pipelines for search and RAG use cases, and conduct experiments on standard datasets.
For example, you can build a re-ranking pipeline that combines a Terrier BM25 retriever with the MonoT5 neural reranker (each of these is a PyTerrier Transformer class):
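```python
import pyterrier as pt
import pyterrier_t5

# BM25 over a pre-built Terrier index of the Vaswani corpus; `% 100` keeps the top 100 results per query
bm25 = pt.terrier.TerrierIndex.from_hf("pyterrier/vaswani.terrier").bm25() % 100
# `>>` chains transformers: retrieve, fetch the document text, then rerank with MonoT5
monot5 = bm25 >> pt.get.get_text(pt.get_dataset('irds:vaswani')) >> pyterrier_t5.MonoT5ReRanker()
monot5.search("What are chemical reactions?")
```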
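The >> and % operators in the example above compose transformers into a pipeline. As a rough mental model only, the sketch below mimics those two operators with a hypothetical ToyTransformer class over plain Python lists. It is not PyTerrier's implementation (which works on pandas DataFrames), and the corpus, scoring function, and "reranker" are made up for illustration.

```python
from typing import Any, Callable, List, Tuple

# A ranking is a list of (docno, score) pairs, best first.
Ranking = List[Tuple[str, float]]


class ToyTransformer:
    """Hypothetical stand-in for a PyTerrier transformer; illustrates the operators only."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn

    def transform(self, inp: Any) -> Any:
        return self.fn(inp)

    def __rshift__(self, other: "ToyTransformer") -> "ToyTransformer":
        # a >> b: run a, then feed its output into b
        return ToyTransformer(lambda inp: other.transform(self.transform(inp)))

    def __mod__(self, k: int) -> "ToyTransformer":
        # a % k: keep only the first k entries of a's (already sorted) ranking
        return ToyTransformer(lambda inp: self.transform(inp)[:k])


# Hypothetical three-document corpus.
DOCS = {"d1": "chemical reactions of acids", "d2": "reaction kinetics", "d3": "radio waves"}


def toy_retrieve(query: str) -> Ranking:
    # Score = number of query terms that appear verbatim in the document.
    terms = query.lower().split()
    scored = [(d, float(sum(t in text.split() for t in terms))) for d, text in DOCS.items()]
    return sorted(scored, key=lambda pair: -pair[1])


def toy_rerank(ranking: Ranking) -> Ranking:
    # Arbitrary stand-in for a neural reranker: prefer shorter documents.
    rescored = [(d, 1.0 / len(DOCS[d])) for d, _ in ranking]
    return sorted(rescored, key=lambda pair: -pair[1])


first_stage = ToyTransformer(toy_retrieve) % 2        # analogous to `bm25 % 100`
pipeline = first_stage >> ToyTransformer(toy_rerank)  # analogous to `bm25 >> ... >> reranker`
print(pipeline.transform("chemical reactions"))
```

Because the cutoff is applied before the reranker, d3 is never rescored. This mirrors why a cutoff such as % 100 bounds how many documents an expensive reranker has to score.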
In notebook environments, PyTerrier transformers and pipelines can be visualised.
You can easily build pipelines for query expansion, learning-to-rank, dense retrieval, and even RAG.
Once you have working pipelines, you can formulate an experiment to compare their effectiveness using the pt.Experiment function:
```python
from pyterrier.measures import *

# compare both pipelines on the Vaswani topics and relevance judgements
pt.Experiment(
    [bm25, monot5],
    pt.get_dataset('vaswani').get_topics(),
    pt.get_dataset('vaswani').get_qrels(),
    [nDCG@10, AP@100]
)
```
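For intuition about what nDCG@10 and AP@100 report, here is a minimal per-query sketch using the textbook definitions: graded gains with a log2 rank discount for nDCG, and precision at each relevant hit divided by the total number of relevant documents for AP. PyTerrier computes these measures through the ir_measures package, whose implementations may differ in edge cases such as tied scores or negative labels, and pt.Experiment reports the mean over all topics by default. The function names and toy data here are hypothetical.

```python
import math
from typing import Dict, List


def ndcg_at_k(ranking: List[str], qrels: Dict[str, int], k: int = 10) -> float:
    # DCG: gain of each retrieved doc (negative labels treated as 0), discounted by log2(rank + 1)
    dcg = sum(max(qrels.get(doc, 0), 0) / math.log2(i + 2) for i, doc in enumerate(ranking[:k]))
    # Ideal DCG: the same formula over the best possible ordering of the judged relevant docs
    ideal = sorted((g for g in qrels.values() if g > 0), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def ap_at_k(ranking: List[str], qrels: Dict[str, int], k: int = 100) -> float:
    # Denominator counts ALL relevant docs, so relevant docs missing from the top k lower AP
    n_relevant = sum(1 for g in qrels.values() if g > 0)
    if n_relevant == 0:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranking[:k], start=1):
        if qrels.get(doc, 0) > 0:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant hit
    return precision_sum / n_relevant


qrels = {"d1": 1, "d3": 1, "d5": 1}  # hypothetical judgements for one query
ranking = ["d1", "d2", "d3"]          # hypothetical system output
print(ndcg_at_k(ranking, qrels), ap_at_k(ranking, qrels))
```

For this toy ranking, AP is (1/1 + 2/3) / 3, because the relevant document d5 is never retrieved.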
You can easily perform retrieval experiments on many standard datasets, including all of those available through the ir_datasets package. For example, use pt.datasets.get_dataset("irds:medline/2004/trec-genomics-2004")
to get the TREC Genomics 2004 dataset. A full catalogue of ir_datasets is available here.
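Each dataset supplies topics (queries) and qrels (relevance judgements). Qrels are typically distributed in the four-column TREC format, qid iteration docno relevance. The sketch below is a hypothetical helper, not part of PyTerrier (whose get_qrels() already does this for you), that parses that format into rows named qid, docno, and label, mirroring the columns of PyTerrier's qrels DataFrames.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[str, int]]


def parse_trec_qrels(text: str) -> List[Row]:
    """Hypothetical helper: parse TREC-format qrels ("qid iteration docno relevance")."""
    rows: List[Row] = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        qid, _iteration, docno, relevance = fields[:4]
        # Column names mirror PyTerrier's qrels DataFrames: qid, docno, label.
        rows.append({"qid": qid, "docno": docno, "label": int(relevance)})
    return rows


SAMPLE_QRELS = """\
1 0 doc-12 1
1 0 doc-40 0
2 0 doc-7 2
"""

for row in parse_trec_qrels(SAMPLE_QRELS):
    print(row)
```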
Installation
The easiest way to get started with PyTerrier is to use one of our Colab notebooks.
Linux, Google Colab, Windows, or macOS:
```shell
pip install 'pyterrier[all]'
```
You may need to set the JAVA_HOME environment variable if PyJNIus cannot find your Java installation.
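If PyJNIus cannot locate Java, it helps to know which directory JAVA_HOME should point at. The following is a hypothetical helper, not part of PyTerrier or PyJNIus, which checks an existing JAVA_HOME and otherwise derives a candidate from the java executable on the PATH, assuming the usual JDK layout of <home>/bin/java.

```python
import os
import shutil
from pathlib import Path
from typing import Mapping, Optional


def find_java_home(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Hypothetical helper: guess a JAVA_HOME value, or return None if Java can't be found."""
    # 1. Accept an explicit JAVA_HOME if it looks like a Java installation (has a bin/ directory).
    java_home = env.get("JAVA_HOME")
    if java_home and Path(java_home, "bin").is_dir():
        return java_home
    # 2. Otherwise search the PATH taken from the same mapping (an empty PATH searches nowhere).
    java = shutil.which("java", path=env.get("PATH", ""))
    if java:
        # Follow symlinks such as /usr/bin/java -> .../jdk/bin/java, then strip bin/java.
        return str(Path(java).resolve().parent.parent)
    return None


if __name__ == "__main__":
    home = find_java_home()
    print(f"export JAVA_HOME={home}" if home else "No Java installation found; install a JDK first.")
```

On macOS, /usr/bin/java is a launcher stub rather than a symlink into a JDK, so the PATH fallback may not give a usable value there; /usr/libexec/java_home prints the appropriate directory instead.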
PyTerrier Extensions
PyTerrier has additional plugins for everything from dense retrieval to RAG:
- PyTerrier_DR: [GitHub] - single-representation dense retrieval
- PyTerrier_RAG: [GitHub] - retrieval augmented generation and LLM access
- PyTerrier_ColBERT: [GitHub] - multiple-representation dense retrieval and/or neural reranking
- PyTerrier_PISA: [GitHub] - fast in-memory indexing and retrieval using PISA
- PyTerrier_T5: [GitHub] - neural reranking: monoT5, duoT5
- PyTerrier_GenRank: [GitHub] - generative listwise reranking: RankVicuna, RankZephyr
- PyTerrier_doc2query: [GitHub] - neural augmented indexing
- PyTerrier_SPLADE: [GitHub] - neural augmented indexing
You can see examples of how to use these, including notebooks that run on Google Colab, in the contents of our Search Solutions 2022 tutorial.
Open Source Licence
PyTerrier is subject to the terms detailed in the Mozilla Public License Version 2.0. The Mozilla Public License can be found in the file LICENSE.txt. By using this software, you have agreed to the licence.
Citation Licence
The source and binary forms of PyTerrier are subject to the following citation license:
By downloading and using PyTerrier, you agree to cite the undernoted paper describing PyTerrier in any kind of material you produce where PyTerrier was used to conduct search or experimentation, whether it be a research paper, dissertation, article, poster, presentation, or documentation. By using this software, you have agreed to the citation licence.
```bibtex
@inproceedings{pyterrier2020ictir,
    author    = {Craig Macdonald and Nicola Tonellotto},
    title     = {Declarative Experimentation in Information Retrieval using PyTerrier},
    booktitle = {Proceedings of ICTIR 2020},
    year      = {2020}
}
```
Credits
- Craig Macdonald, University of Glasgow
- Sean MacAvaney, University of Glasgow
- Nicola Tonellotto, University of Pisa
- Alex Tsolov, University of Glasgow
- Arthur Câmara, TU Delft
- Alberto Ueda, Federal University of Minas Gerais
- Chentao Xu, University of Glasgow
- Sarawoot Kongyoung, University of Glasgow
- Zhan Su, Copenhagen University
- Marcus Schutte, TU Delft
- Lukas Zeit-Altpeter, Friedrich Schiller University Jena
Download files
File details
Details for the file pyterrier-1.0.2.tar.gz.
File metadata
- Download URL: pyterrier-1.0.2.tar.gz
- Upload date:
- Size: 231.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 624605755eefd1db16bc3a4919bab4af501f8b39070c467f68aef0feb33a2cb9 |
| MD5 | bed864ce962ce268adcd1b6bfa335d51 |
| BLAKE2b-256 | 1d6790e05af94d72da5c50f96ca1f9574677d4a4bd8d31cb8a6181bc963ab00b |
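The digests in the table above let you confirm that a downloaded file is intact. A minimal sketch using Python's standard hashlib module follows. The helper names digest_of_file and verify are hypothetical, and the "blake2b-256" label is mapped to BLAKE2b with a 32-byte digest, which is how the 256-bit variant is selected in hashlib.

```python
import hashlib


def digest_of_file(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 16) -> str:
    """Hex digest of a file, read in chunks so large files need not fit in memory."""
    if algorithm.lower() == "blake2b-256":
        h = hashlib.blake2b(digest_size=32)  # BLAKE2b truncated to a 256-bit digest
    else:
        h = hashlib.new(algorithm)  # e.g. "sha256" or "md5"
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str, expected_hex: str, algorithm: str = "sha256") -> bool:
    # hexdigest() is lowercase, so normalise the published value before comparing
    return digest_of_file(path, algorithm) == expected_hex.strip().lower()
```

For example, verify("pyterrier-1.0.2.tar.gz", "624605755eefd1db16bc3a4919bab4af501f8b39070c467f68aef0feb33a2cb9") should return True for an untampered download. pip can also enforce this at install time through its hash-checking mode (--require-hashes, with --hash=sha256:... entries in a requirements file).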
Provenance
The following attestation bundles were made for pyterrier-1.0.2.tar.gz:
Publisher: publish-to-pypi.yml on terrier-org/pyterrier
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pyterrier-1.0.2.tar.gz
- Subject digest: 624605755eefd1db16bc3a4919bab4af501f8b39070c467f68aef0feb33a2cb9
- Sigstore transparency entry: 844904288
- Sigstore integration time:
- Permalink: terrier-org/pyterrier@530e63c79c8cdd9fb8a3ec498416bc5103efe340
- Branch / Tag: refs/heads/master
- Owner: https://github.com/terrier-org
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-to-pypi.yml@530e63c79c8cdd9fb8a3ec498416bc5103efe340
- Trigger Event: workflow_dispatch
File details
Details for the file pyterrier-1.0.2-py3-none-any.whl.
File metadata
- Download URL: pyterrier-1.0.2-py3-none-any.whl
- Upload date:
- Size: 208.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1b2f7c06592aecbf298ab6025737a5c52a63276f28377fcb4c65ef60c0c170e8 |
| MD5 | d74e820fe6cff853a7f63b8b720031e4 |
| BLAKE2b-256 | ff2dea62243026e1c68837d188c416b0fa5f0d9632601c37335a73e4b2aad256 |
Provenance
The following attestation bundles were made for pyterrier-1.0.2-py3-none-any.whl:
Publisher: publish-to-pypi.yml on terrier-org/pyterrier
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pyterrier-1.0.2-py3-none-any.whl
- Subject digest: 1b2f7c06592aecbf298ab6025737a5c52a63276f28377fcb4c65ef60c0c170e8
- Sigstore transparency entry: 844904291
- Sigstore integration time:
- Permalink: terrier-org/pyterrier@530e63c79c8cdd9fb8a3ec498416bc5103efe340
- Branch / Tag: refs/heads/master
- Owner: https://github.com/terrier-org
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-to-pypi.yml@530e63c79c8cdd9fb8a3ec498416bc5103efe340
- Trigger Event: workflow_dispatch