PyTerrier implementation of Adaptive Re-Ranking using a Corpus Graph (CIKM 2022)

pyterrier_adaptive

Getting Started

Install with pip:

pip install --upgrade git+https://github.com/terrierteam/pyterrier_adaptive.git

Basic example over the MS MARCO passage corpus, using the pyterrier_t5 and pyterrier_pisa plugins:

import pyterrier as pt
pt.init()
from pyterrier_t5 import MonoT5ReRanker
from pyterrier_pisa import PisaIndex
from pyterrier_adaptive import GAR, CorpusGraph

dataset = pt.get_dataset('irds:msmarco-passage')
retriever = PisaIndex.from_dataset('msmarco_passage').bm25()
scorer = pt.text.get_text(dataset, 'text') >> MonoT5ReRanker(verbose=False, batch_size=16)
graph = CorpusGraph.from_dataset('msmarco_passage', 'corpusgraph_bm25_k16').to_limit_k(8)

pipeline = retriever >> GAR(scorer, graph) >> pt.text.get_text(dataset, 'text')

pipeline.search('clustering hypothesis information retrieval')
# qid                                        query    docno  rank       score  iteration                                               text
#   1  clustering hypothesis information retrieval  2180710     0   -0.017059          0  Cluster analysis or clustering is the task of ...
#   1  clustering hypothesis information retrieval  8430269     1   -0.166563          1  Clustering is the grouping of a particular set...
#   1  clustering hypothesis information retrieval  1091429     2   -0.208345          1  Clustering is a fundamental data analysis meth...
#   1  clustering hypothesis information retrieval  2180711     3   -0.341018          5  Cluster analysis or clustering is the task of ...
#   1  clustering hypothesis information retrieval  6031959     4   -0.367014          5  Cluster analysis or clustering is the task of ...
#  ..                                          ...      ...   ...         ...        ...                                                ...
#                iteration column indicates which GAR batch the document was scored in ^
#                even=initial retrieval   odd=corpus graph    -1=backfilled
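
To make the iteration column concrete, here is a minimal pure-Python sketch of an alternating loop that would produce such values. It is a simplified illustration and not the library's implementation: `gar_sketch`, its parameters, the toy documents, and the way backfilled documents are scored are all assumptions made for this example.

```python
import heapq


def gar_sketch(initial, neighbours, score_fn, budget, batch_size):
    """Toy adaptive re-ranking loop (hypothetical helper, not the library's code).

    initial:    list of (docno, first_stage_score) pairs
    neighbours: dict mapping docno -> list of neighbouring docnos
    score_fn:   callable docno -> float, standing in for the expensive scorer
    Returns a list of (docno, score, iteration) tuples, best first.
    """
    # pools[0] holds the initial results, pools[1] the graph frontier.
    # heapq is a min-heap, so priorities are negated to pop the best first.
    pools = [[(-s, d) for d, s in initial], []]
    heapq.heapify(pools[0])
    scored = {}  # docno -> (score, iteration)
    iteration = 0
    while len(scored) < budget and (pools[0] or pools[1]):
        pool = pools[iteration % 2]  # even: initial results, odd: frontier
        limit = min(batch_size, budget - len(scored))
        batch = []
        while pool and len(batch) < limit:
            _, docno = heapq.heappop(pool)
            if docno not in scored and docno not in batch:
                batch.append(docno)
        for docno in batch:
            score = score_fn(docno)
            scored[docno] = (score, iteration)
            # Neighbours inherit the score of the document that led to them.
            for n in neighbours.get(docno, []):
                if n not in scored:
                    heapq.heappush(pools[1], (-score, n))
        iteration += 1
    ranking = sorted(((d, s, it) for d, (s, it) in scored.items()),
                     key=lambda row: -row[1])
    # Backfill: unscored initial results go to the bottom with iteration -1.
    # Assumption: they keep first-stage order and score below all scored docs.
    floor = min((s for _, s, _ in ranking), default=0.0)
    leftovers = [d for d, _ in sorted(initial, key=lambda p: -p[1])
                 if d not in scored]
    for i, d in enumerate(leftovers, start=1):
        ranking.append((d, floor - i, -1))
    return ranking


# Hypothetical toy data: d9 and d8 are reachable only through the graph.
initial = [("d1", 3.0), ("d2", 2.0), ("d3", 1.0), ("d4", 0.5)]
neighbours = {"d1": ["d9"], "d2": ["d1"], "d9": ["d8"]}
relevance = {"d1": 0.9, "d2": 0.2, "d3": 0.1, "d4": 0.05, "d9": 0.8, "d8": 0.7}

ranking = gar_sketch(initial, neighbours, relevance.get, budget=3, batch_size=1)
for docno, score, iteration in ranking:
    print(docno, iteration)
```

With a budget of three scored documents, d9 is scored in an odd iteration because it was reached through the graph, while d3 and d4 are never scored and are backfilled with iteration -1.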

Evaluation on a test collection (TREC DL19):

from pyterrier.measures import *
dataset = pt.get_dataset('irds:msmarco-passage/trec-dl-2019/judged')
pt.Experiment(
    [retriever, retriever >> scorer, retriever >> GAR(scorer, graph)],
    dataset.get_topics(),
    dataset.get_qrels(),
    [nDCG, MAP(rel=2), R(rel=2)@1000],
    names=['bm25', 'bm25 >> monot5', 'bm25 >> GAR(monot5)']
)
#                name      nDCG  AP(rel=2)  R(rel=2)@1000
#                bm25  0.602325   0.303099       0.755495
#      bm25 >> monot5  0.696293   0.481259       0.755495
# bm25 >> GAR(monot5)  0.724501   0.489978       0.825952
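
In the table above, bm25 and bm25 >> monot5 share the same R(rel=2)@1000 because a plain re-ranker only reorders the documents it was given, whereas GAR can also score documents reached through the corpus graph. The sketch below illustrates that effect with a hand-written recall function and made-up document lists; the `recall` helper and all document identifiers are hypothetical and unrelated to the reported numbers.

```python
def recall(ranked, relevant, cutoff):
    """Fraction of relevant documents found in the top `cutoff` results."""
    return len(set(ranked[:cutoff]) & relevant) / len(relevant)


# Hypothetical toy rankings, not taken from TREC DL19.
relevant = {"d1", "d3", "d7", "d8"}
first_stage = ["d1", "d2", "d3", "d4"]
reranked = ["d3", "d1", "d4", "d2"]  # same documents, new order
adaptive = ["d7", "d3", "d1", "d4"]  # d7 was pulled in via the graph

r_first = recall(first_stage, relevant, cutoff=4)
r_rerank = recall(reranked, relevant, cutoff=4)
r_adaptive = recall(adaptive, relevant, cutoff=4)
print(r_first, r_rerank, r_adaptive)
```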

Reproduction

Detailed instructions to come!

Building a Corpus Graph

You can construct a $k$-nearest-neighbour corpus graph, in which each document is linked to its $k$ most similar documents, using any retriever transformer and a corpus iterator.

Example:

import pyterrier as pt
from pyterrier_adaptive import CorpusGraph
from pyterrier_pisa import PisaIndex
dataset = pt.get_dataset('irds:msmarco-passage')

# Build the index needed for BM25 retrieval (if it doesn't already exist)
idx = PisaIndex('msmarco-passage.pisa', threads=45) # adjust for your resources
if not idx.built():
    idx.index(dataset.get_corpus_iter())

# Build the corpus graph
K = 16 # number of nearest neighbours
graph16 = CorpusGraph.from_retriever(
    idx.bm25(num_results=K+1), # K+1 needed because retriever will return original document
    dataset.get_corpus_iter(),
    'msmarco-passage.gbm25.16',
    k=K)
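
The K+1 in the example above can be illustrated without any index. The following sketch builds a tiny neighbour graph by using each document's own text as the query, which means the document itself normally comes back as the top hit and has to be dropped. The token-overlap retriever, the `build_graph` helper, and the four-document corpus are hypothetical stand-ins for BM25 and MS MARCO, not part of the library.

```python
def toy_retrieve(query_tokens, corpus, num_results):
    """Rank documents by token overlap with the query (stand-in for BM25)."""
    scored = []
    for docno, text in corpus.items():
        overlap = len(query_tokens & set(text.split()))
        if overlap > 0:
            scored.append((overlap, docno))
    # Highest overlap first; ties broken by docno for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [docno for _, docno in scored[:num_results]]


def build_graph(corpus, k):
    """Map each docno to (up to) its k nearest neighbours."""
    graph = {}
    for docno, text in corpus.items():
        # Ask for k + 1 hits because the document usually retrieves itself.
        hits = toy_retrieve(set(text.split()), corpus, num_results=k + 1)
        graph[docno] = [h for h in hits if h != docno][:k]
    return graph


corpus = {
    "a": "corpus graph ranking",
    "b": "corpus graph retrieval",
    "c": "neural ranking model",
    "d": "graph retrieval index",
}
toy_graph = build_graph(corpus, k=2)
print(toy_graph)
```

Requesting only k results would leave most documents with k - 1 neighbours once the self-match is removed.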

You can load a saved corpus graph with CorpusGraph.load(path), and simulate lower $k$ values with .to_limit_k(k):

graph16 = CorpusGraph.load('msmarco-passage.gbm25.16')
graph8 = graph16.to_limit_k(8)
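
Conceptually, limiting $k$ amounts to keeping only the first $k$ neighbours of every document. The sketch below shows this on a plain dictionary, assuming each neighbour list is stored in decreasing order of similarity. `limit_k` and the toy graph are hypothetical and say nothing about how the library stores its graphs on disk.

```python
def limit_k(graph, k):
    """Return a copy of the graph with each neighbour list cut to k entries."""
    return {docno: nbrs[:k] for docno, nbrs in graph.items()}


# Hypothetical graph with k=4; neighbours are ordered most similar first.
toy_graph4 = {
    "a": ["b", "c", "d", "e"],
    "b": ["a", "d", "c", "e"],
}
toy_graph2 = limit_k(toy_graph4, 2)
print(toy_graph2)
```

Because the truncation only drops the tail of each list, a graph built once with a large $k$ can serve experiments with any smaller value.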

Citation

Adaptive Re-Ranking with a Corpus Graph. Sean MacAvaney, Nicola Tonellotto and Craig Macdonald. In Proceedings of CIKM 2022.

@inproceedings{gar2022,
  title = {Adaptive Re-Ranking with a Corpus Graph},
  booktitle = {Proceedings of ACM CIKM},
  author = {Sean MacAvaney and Nicola Tonellotto and Craig Macdonald},
  year = 2022
}
