PyTerrier implementation of Adaptive Re-Ranking using a Corpus Graph (CIKM 2022)
pyterrier_adaptive
Getting Started
Install with pip:
pip install --upgrade git+https://github.com/terrierteam/pyterrier_adaptive.git
Basic Example over the MS MARCO passage corpus (making use of the pyterrier_t5 and pyterrier_pisa plugins):
import pyterrier as pt
pt.init()
from pyterrier_t5 import MonoT5ReRanker
from pyterrier_pisa import PisaIndex
from pyterrier_adaptive import GAR, CorpusGraph
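dataset = pt.get_dataset('irds:msmarco-passage')
# First-stage retriever: BM25 over a PISA index
retriever = PisaIndex.from_dataset('msmarco_passage').bm25()
# Scorer: attach the passage text, then score each query-passage pair with MonoT5
scorer = pt.text.get_text(dataset, 'text') >> MonoT5ReRanker(verbose=False, batch_size=16)
# BM25 corpus graph, limited to 8 neighbours per document
graph = CorpusGraph.from_dataset('msmarco_passage', 'corpusgraph_bm25_k16').to_limit_k(8)
# Retrieve, adaptively re-rank with GAR, then attach the text for display
pipeline = retriever >> GAR(scorer, graph) >> pt.text.get_text(dataset, 'text')
pipeline.search('clustering hypothesis information retrieval')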
# qid  query                                        docno    rank  score      iteration  text
#   1  clustering hypothesis information retrieval  2180710     0  -0.017059          0  Cluster analysis or clustering is the task of ...
#   1  clustering hypothesis information retrieval  8430269     1  -0.166563          1  Clustering is the grouping of a particular set...
#   1  clustering hypothesis information retrieval  1091429     2  -0.208345          1  Clustering is a fundamental data analysis meth...
#   1  clustering hypothesis information retrieval  2180711     3  -0.341018          5  Cluster analysis or clustering is the task of ...
#   1  clustering hypothesis information retrieval  6031959     4  -0.367014          5  Cluster analysis or clustering is the task of ...
#  ..  ...                                          ...       ...  ...              ...  ...
# The iteration column indicates which GAR batch the document was scored in:
# even = initial retrieval, odd = corpus graph, -1 = backfilled
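The iteration values noted above (even for initial retrieval, odd for the corpus graph, -1 for backfilled) can be illustrated with a toy loop. This is a simplified sketch, not the library's implementation: the function `toy_adaptive_rerank`, the dictionary-based graph, the rule that prioritises neighbours of high-scoring documents, and the scores given to backfilled documents are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple


def toy_adaptive_rerank(
    initial: List[str],
    neighbours: Dict[str, List[str]],
    score: Callable[[str], float],
    budget: int,
    batch_size: int,
) -> List[Tuple[str, float, int]]:
    """Toy alternation between initial results and graph neighbours.

    Returns (docno, score, iteration) tuples, best first. Hypothetical
    helper for illustration only; not part of pyterrier_adaptive.
    """
    scored: Dict[str, Tuple[float, int]] = {}
    pool = list(initial)             # unscored docs from the initial retrieval
    frontier: Dict[str, float] = {}  # unscored graph neighbours -> best source score
    iteration = 0
    while len(scored) < budget and (pool or frontier):
        if iteration % 2 == 0:
            candidates = pool  # even: follow the initial retrieval order
        else:
            # odd: neighbours of the highest-scoring documents first (assumption)
            candidates = sorted(frontier, key=frontier.get, reverse=True)
        batch = candidates[: min(batch_size, budget - len(scored))]
        for docno in batch:
            scored[docno] = (score(docno), iteration)
        for docno in batch:
            if docno in pool:
                pool.remove(docno)
            frontier.pop(docno, None)
        for docno in batch:
            for n in neighbours.get(docno, []):
                if n not in scored:
                    frontier[n] = max(frontier.get(n, float("-inf")), scored[docno][0])
        iteration += 1  # an empty source just yields an empty batch

    ranked = sorted(
        ((d, s, it) for d, (s, it) in scored.items()),
        key=lambda r: r[1],
        reverse=True,
    )
    # Backfill: unscored initial results go below everything scored, iteration -1.
    floor = min((s for _, s, _ in ranked), default=0.0)
    for i, docno in enumerate(pool, start=1):
        ranked.append((docno, floor - i, -1))
    return ranked


# Toy data: d8 and d9 are not in the initial results and are only reachable via the graph.
toy_scores = {"d1": 0.9, "d2": 0.2, "d3": 0.1, "d4": 0.05, "d8": 0.7, "d9": 0.8}
toy_graph = {"d1": ["d9"], "d2": ["d1"], "d9": ["d8"]}
results = toy_adaptive_rerank(
    ["d1", "d2", "d3", "d4"], toy_graph, toy_scores.__getitem__, budget=4, batch_size=1
)
for docno, s, it in results:
    print(docno, round(s, 2), it)
```

With a budget of four scored documents, the toy run spends two of them on graph neighbours that the initial retrieval missed, and the two initial results left unscored are appended with iteration -1.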
Evaluation on a test collection (TREC DL19):
from pyterrier.measures import *
dataset = pt.get_dataset('irds:msmarco-passage/trec-dl-2019/judged')
pt.Experiment(
    [retriever, retriever >> scorer, retriever >> GAR(scorer, graph)],
    dataset.get_topics(),
    dataset.get_qrels(),
    [nDCG, MAP(rel=2), R(rel=2)@1000],
    names=['bm25', 'bm25 >> monot5', 'bm25 >> GAR(monot5)']
)
# name                  nDCG      AP(rel=2)  R(rel=2)@1000
# bm25                  0.602325  0.303099   0.755495
# bm25 >> monot5        0.696293  0.481259   0.755495
# bm25 >> GAR(monot5)   0.724501  0.489978   0.825952
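To read the table above: R(rel=2)@1000 is identical for bm25 and bm25 >> monot5, which is consistent with plain re-ranking only reordering the retrieved candidates, while the GAR row reports a higher recall. The short sketch below computes the relative change of GAR over plain re-ranking from the reported numbers; the helper `relative_gain` and the dictionary layout are illustrative, and the values are copied from the table rather than re-measured.

```python
# Values copied from the results table above.
reported = {
    "bm25": {"nDCG": 0.602325, "AP(rel=2)": 0.303099, "R(rel=2)@1000": 0.755495},
    "bm25 >> monot5": {"nDCG": 0.696293, "AP(rel=2)": 0.481259, "R(rel=2)@1000": 0.755495},
    "bm25 >> GAR(monot5)": {"nDCG": 0.724501, "AP(rel=2)": 0.489978, "R(rel=2)@1000": 0.825952},
}


def relative_gain(new: float, old: float) -> float:
    """Relative change of `new` over `old`, e.g. 0.05 means +5%."""
    return (new - old) / old


baseline = reported["bm25 >> monot5"]
gar = reported["bm25 >> GAR(monot5)"]
for measure in baseline:
    gain = relative_gain(gar[measure], baseline[measure])
    print(f"{measure}: {gain:+.1%}")
```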
Reproduction
Detailed instructions to come!
Building a Corpus Graph
You can construct a corpus graph that stores the $k$ nearest neighbours of each document using any retriever transformer and a corpus iterator.
Example:
import pyterrier as pt
from pyterrier_adaptive import CorpusGraph
from pyterrier_pisa import PisaIndex
dataset = pt.get_dataset('irds:msmarco-passage')
# Build the index needed for BM25 retrieval (if it doesn't already exist)
idx = PisaIndex('msmarco-passage.pisa', threads=45) # adjust for your resources
if not idx.built():
    idx.index(dataset.get_corpus_iter())
# Build the corpus graph
K = 16 # number of nearest neighbours
graph16 = CorpusGraph.from_retriever(
    idx.bm25(num_results=K+1), # K+1 needed because retriever will return original document
    dataset.get_corpus_iter(),
    'msmarco-passage.gbm25.16',
    k=K)
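The idea behind the construction above can be sketched on a toy corpus: each document's text is issued as a query, the document itself is dropped from the results, and the top $k$ remaining hits become its neighbours. This is a minimal illustration under stated assumptions, not what `CorpusGraph.from_retriever` does internally; `make_overlap_retriever` and `build_toy_graph` are hypothetical helpers, and the term-overlap scoring is a stand-in for BM25.

```python
from typing import Callable, Dict, List


def make_overlap_retriever(docs: Dict[str, str]) -> Callable[[str, int], List[str]]:
    """Toy retriever: rank documents by the number of terms shared with the query."""
    tokens = {docno: set(text.lower().split()) for docno, text in docs.items()}

    def retrieve(query: str, num_results: int) -> List[str]:
        q = set(query.lower().split())
        # Sort by overlap (descending), breaking ties by docno for determinism.
        ranked = sorted(tokens, key=lambda d: (-len(q & tokens[d]), d))
        return ranked[:num_results]

    return retrieve


def build_toy_graph(
    docs: Dict[str, str],
    retrieve: Callable[[str, int], List[str]],
    k: int,
) -> Dict[str, List[str]]:
    """Map each docno to its k nearest neighbours, best first."""
    graph = {}
    for docno, text in docs.items():
        # Ask for k + 1 results because the document is expected to retrieve itself.
        hits = retrieve(text, k + 1)
        graph[docno] = [h for h in hits if h != docno][:k]
    return graph


toy_docs = {
    "a": "corpus graph re-ranking",
    "b": "graph re-ranking with neural scorer",
    "c": "neural scorer training",
    "d": "cooking pasta at home",
}
toy_graph = build_toy_graph(toy_docs, make_overlap_retriever(toy_docs), k=2)
for docno, nbrs in toy_graph.items():
    print(docno, nbrs)
```

Requesting only `k` results would leave `k - 1` usable neighbours whenever the document appears in its own result list, which is why the retriever is asked for one extra hit.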
You can load a corpus graph using the .load(path) function. You can simulate lower $k$ values using .to_limit_k(k):
graph16 = CorpusGraph.load('msmarco-passage.gbm25.16')
graph8 = graph16.to_limit_k(8)
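Conceptually, simulating a lower $k$ amounts to keeping only the first $k$ neighbours of every document. The sketch below shows this on a plain dictionary; it assumes neighbour lists are stored best-first, which the source does not state explicitly, and `limit_k` is a hypothetical helper rather than the library's `.to_limit_k` method.

```python
from typing import Dict, List


def limit_k(graph: Dict[str, List[str]], k: int) -> Dict[str, List[str]]:
    """Return a new graph keeping only the first k neighbours of each document."""
    return {docno: nbrs[:k] for docno, nbrs in graph.items()}


# Toy graph with 4 neighbours per document, assumed to be ordered best-first.
graph4 = {
    "a": ["b", "c", "d", "e"],
    "b": ["a", "c", "e", "d"],
}
graph2 = limit_k(graph4, 2)
print(graph2)
```

Because the smaller graph is derived from the larger one, a single graph built with a generous $k$ can be reused to compare several neighbourhood sizes without rebuilding.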
Citation
Adaptive Re-Ranking with a Corpus Graph. Sean MacAvaney, Nicola Tonellotto and Craig Macdonald. In Proceedings of CIKM 2022.
@inproceedings{gar2022,
  title = {Adaptive Re-Ranking with a Corpus Graph},
  booktitle = {Proceedings of ACM CIKM},
  author = {Sean MacAvaney and Nicola Tonellotto and Craig Macdonald},
  year = 2022
}