Skip to main content

PyTerrier implementation of Adaptive Re-Ranking using a Corpus Graph (CIKM 2022)

Project description

pyterrier_adaptive

PyTerrier implementation of Adaptive Re-Ranking using a Corpus Graph (CIKM 2022).

Getting Started

Install with pip:

pip install --upgrade git+https://github.com/terrierteam/pyterrier_adaptive.git

Basic Example over the MS MARCO passage corpus (making use of the pyterrier_t5 and pyterrier_pisa plugins):

Try examples in Google Colab! Open In Colab

import pyterrier as pt
from pyterrier_t5 import MonoT5ReRanker
from pyterrier_pisa import PisaIndex
from pyterrier_adaptive import GAR, NpTopKCorpusGraph

dataset = pt.get_dataset('irds:msmarco-passage')
retriever = PisaIndex.from_dataset('msmarco_passage').bm25()
scorer = pt.text.get_text(dataset, 'text') >> MonoT5ReRanker(verbose=False, batch_size=16)
graph = NpTopKCorpusGraph.from_dataset('msmarco_passage', 'corpusgraph_bm25_k16').to_limit_k(8)

pipeline = retriever >> GAR(scorer, graph) >> pt.text.get_text(dataset, 'text')

pipeline.search('clustering hypothesis information retrieval')
# qid                                        query    docno  rank       score  iteration                                               text
#   1  clustering hypothesis information retrieval  2180710     0   -0.017059          0  Cluster analysis or clustering is the task of ...
#   1  clustering hypothesis information retrieval  8430269     1   -0.166563          1  Clustering is the grouping of a particular set...
#   1  clustering hypothesis information retrieval  1091429     2   -0.208345          1  Clustering is a fundamental data analysis meth...
#   1  clustering hypothesis information retrieval  2180711     3   -0.341018          5  Cluster analysis or clustering is the task of ...
#   1  clustering hypothesis information retrieval  6031959     4   -0.367014          5  Cluster analysis or clustering is the task of ...
#  ..                                          ...      ...   ...         ...        ...                                                ...
#                iteration column indicates which GAR batch the document was scored in ^
#                even=initial retrieval   odd=corpus graph    -1=backfilled

Evaluation on a test collection (TREC DL19):

from pyterrier.measures import *
dataset = pt.get_dataset('irds:msmarco-passage/trec-dl-2019/judged')
pt.Experiment(
    [retriever, retriever >> scorer, retriever >> GAR(scorer, graph)],
    dataset.get_topics(),
    dataset.get_qrels(),
    [nDCG, MAP(rel=2), R(rel=2)@1000],
    names=['bm25', 'bm25 >> monot5', 'bm25 >> GAR(monot5)']
)
#                name      nDCG  AP(rel=2)  R(rel=2)@1000
#                bm25  0.602325   0.303099       0.755495
#      bm25 >> monot5  0.696293   0.481259       0.755495
# bm25 >> GAR(monot5)  0.724501   0.489978       0.825952

Reproduction

Detailed instructions to come!

Building a Corpus Graph

You can construct a $k$ corpus graph using any retriever transformer and a corpus iterator.

Example:

from pyterrier_adaptive import CorpusGraph
from pyterrier_pisa import PisaIndex
dataset = pt.get_dataset('irds:msmarco-passage')

# Build the index needed for BM25 retrieval (if it doesn't already exist)
idx = PisaIndex('msmarco-passage.pisa', threads=45) # adjust for your resources
if not idx.built():
    idx.index(dataset.get_corpus_iter())

# Build the corpus graph
K = 16 # number of nearest neighbours
graph16 = CorpusGraph.from_retriever(
    idx.bm25(num_results=K+1), # K+1 needed because retriever will return original document
    dataset.get_corpus_iter(),
    'msmarco-passage.gbm25.16',
    k=K)

You can load a corpus graph using the .load(path) function. You can simulate lower $k$ values using .to_limit_k(k)

graph16 = CorpusGraph.load('msmarco-passage.gbm25.16')
graph8 = graph16.to_limit_k(8)

Citation

Adaptive Re-Ranking with a Corpus Graph. Sean MacAvaney, Nicola Tonellotto and Craig Macdonald. In Proceedings of CIKM 2022.

@inproceedings{gar2022,
  title = {Adaptive Re-Ranking with a Corpus Graph},
  booktitle = {Proceedings of ACM CIKM},
  author = {Sean MacAvaney and Nicola Tonellotto and Craig Macdonald},
  year = 2022
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pyterrier_adaptive-0.2.1.tar.gz (14.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pyterrier_adaptive-0.2.1-py3-none-any.whl (13.7 kB view details)

Uploaded Python 3

File details

Details for the file pyterrier_adaptive-0.2.1.tar.gz.

File metadata

  • Download URL: pyterrier_adaptive-0.2.1.tar.gz
  • Upload date:
  • Size: 14.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.5

File hashes

Hashes for pyterrier_adaptive-0.2.1.tar.gz
Algorithm Hash digest
SHA256 f3a34af73d09d6c47e94336803c428c90afeee3dbafa7182ec6a3d111e3a0c5c
MD5 3c026a36bc60c4a2f19187c8e87cdb5b
BLAKE2b-256 e947f2e076cf5d176079eb88446c1a84594445ba9c166c7416ba6473ecd3b0e1

See more details on using hashes here.

File details

Details for the file pyterrier_adaptive-0.2.1-py3-none-any.whl.

File metadata

File hashes

Hashes for pyterrier_adaptive-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 1c087db3d9fb830b158f60f017a89a04b43aad66467765c3439a99d9c3b37152
MD5 b0ccd0e025c0f3b6d1d1a5f6bde32f0d
BLAKE2b-256 034893ceb8988876481ae9523a0c22b07564f43de0537e9368e9ff22728c2f9c

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page