Skip to main content

PyTerrier implementation of Adaptive Re-Ranking using a Corpus Graph (CIKM 2022)

Project description

pyterrier_adaptive

PyTerrier implementation of Adaptive Re-Ranking using a Corpus Graph (CIKM 2022).

Getting Started

Install with pip:

pip install --upgrade git+https://github.com/terrierteam/pyterrier_adaptive.git

Basic Example over the MS MARCO passage corpus (making use of the pyterrier_t5 and pyterrier_pisa plugins):

Try examples in Google Colab! Open In Colab

import pyterrier as pt
pt.init()
from pyterrier_t5 import MonoT5ReRanker
from pyterrier_pisa import PisaIndex
from pyterrier_adaptive import GAR, CorpusGraph

dataset = pt.get_dataset('irds:msmarco-passage')
retriever = PisaIndex.from_dataset('msmarco_passage').bm25()
scorer = pt.text.get_text(dataset, 'text') >> MonoT5ReRanker(verbose=False, batch_size=16)
graph = CorpusGraph.from_dataset('msmarco_passage', 'corpusgraph_bm25_k16').to_limit_k(8)

pipeline = retriever >> GAR(scorer, graph) >> pt.text.get_text(dataset, 'text')

pipeline.search('clustering hypothesis information retrieval')
# qid                                        query    docno  rank       score  iteration                                               text
#   1  clustering hypothesis information retrieval  2180710     0   -0.017059          0  Cluster analysis or clustering is the task of ...
#   1  clustering hypothesis information retrieval  8430269     1   -0.166563          1  Clustering is the grouping of a particular set...
#   1  clustering hypothesis information retrieval  1091429     2   -0.208345          1  Clustering is a fundamental data analysis meth...
#   1  clustering hypothesis information retrieval  2180711     3   -0.341018          5  Cluster analysis or clustering is the task of ...
#   1  clustering hypothesis information retrieval  6031959     4   -0.367014          5  Cluster analysis or clustering is the task of ...
#  ..                                          ...      ...   ...         ...        ...                                                ...
#                iteration column indicates which GAR batch the document was scored in ^
#                even=initial retrieval   odd=corpus graph    -1=backfilled

Evaluation on a test collection (TREC DL19):

from pyterrier.measures import *
dataset = pt.get_dataset('irds:msmarco-passage/trec-dl-2019/judged')
pt.Experiment(
    [retriever, retriever >> scorer, retriever >> GAR(scorer, graph)],
    dataset.get_topics(),
    dataset.get_qrels(),
    [nDCG, MAP(rel=2), R(rel=2)@1000],
    names=['bm25', 'bm25 >> monot5', 'bm25 >> GAR(monot5)']
)
#                name      nDCG  AP(rel=2)  R(rel=2)@1000
#                bm25  0.602325   0.303099       0.755495
#      bm25 >> monot5  0.696293   0.481259       0.755495
# bm25 >> GAR(monot5)  0.724501   0.489978       0.825952

Reproduction

Detailed instructions to come!

Building a Corpus Graph

You can construct a $k$ corpus graph using any retriever transformer and a corpus iterator.

Example:

from pyterrier_adaptive import CorpusGraph
from pyterrier_pisa import PisaIndex
dataset = pt.get_dataset('irds:msmarco-passage')

# Build the index needed for BM25 retrieval (if it doesn't already exist)
idx = PisaIndex('msmarco-passage.pisa', threads=45) # adjust for your resources
if not idx.built():
    idx.index(dataset.get_corpus_iter())

# Build the corpus graph
K = 16 # number of nearest neighbours
graph16 = CorpusGraph.from_retriever(
    idx.bm25(num_results=K+1), # K+1 needed because retriever will return original document
    dataset.get_corpus_iter(),
    'msmarco-passage.gbm25.16',
    k=K)

You can load a corpus graph using the .load(path) function. You can simulate lower $k$ values using .to_limit_k(k)

graph16 = CorpusGraph.load('msmarco-passage.gbm25.16')
graph8 = graph16.to_limit_k(8)

Citation

Adaptive Re-Ranking with a Corpus Graph. Sean MacAvaney, Nicola Tonellotto and Craig Macdonald. In Proceedings of CIKM 2022.

@inproceedings{gar2022,
  title = {Adaptive Re-Ranking with a Corpus Graph},
  booktitle = {Proceedings of ACM CIKM},
  author = {Sean MacAvaney and Nicola Tonellotto and Craig Macdonald},
  year = 2022
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pyterrier_adaptive-0.2.0.tar.gz (13.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pyterrier_adaptive-0.2.0-py3-none-any.whl (13.3 kB view details)

Uploaded Python 3

File details

Details for the file pyterrier_adaptive-0.2.0.tar.gz.

File metadata

  • Download URL: pyterrier_adaptive-0.2.0.tar.gz
  • Upload date:
  • Size: 13.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.7

File hashes

Hashes for pyterrier_adaptive-0.2.0.tar.gz
Algorithm Hash digest
SHA256 6c67f09e13048d18affbb9cc2817725314ac944c0cb36fe2540d742e191b56d9
MD5 5bc4d996d014da299528ebbddf3fd359
BLAKE2b-256 a94af0b4e5d9955fd9a6a8587562b499e0ed56feb18d317588e20557fe9d008f

See more details on using hashes here.

File details

Details for the file pyterrier_adaptive-0.2.0-py3-none-any.whl.

File metadata

File hashes

Hashes for pyterrier_adaptive-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 26cb085120a7d67216072e4b3986a85e13f2a57f89ed1fd99ac25066f231b827
MD5 a5b57d1d2af63542f25a913ef866575a
BLAKE2b-256 3d707ddc21a6e1248ff51662dda07e2ce96a605bb99787766f903be17917c09f

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page