
OpenNIR: A Complete Neural Ad-Hoc Ranking Pipeline (Experimaestro version)


OpenNIR (experimaestro version)

OpenNIR-xpm is an end-to-end neural ad-hoc ranking pipeline.

This is an adaptation of OpenNIR using experiment manager tools (experimaestro and datamaestro).

Quick start

Install OpenNIR (XPM version) with

```sh
pip install OpenNIR-XPM
```

then use this file for training (other examples are available on GitHub):

```python
import logging
import os
from pathlib import Path

from datamaestro import prepare_dataset
from experimaestro.click import click, forwardoption
from experimaestro import experiment
from experimaestro_ir.models import BM25
from experimaestro_ir.anserini import SearchCollection
from experimaestro_ir.evaluation import TrecEval
from onir.datasets.robust import RobustDataset
from onir.predictors.reranker import Reranker
from onir.random import Random
from onir.rankers.drmm import Drmm
from onir.tasks.learner import Learner
from onir.tasks.evaluate import Evaluate
from onir.trainers.pointwise import PointwiseTrainer
from onir.vocab.wordvec_vocab import WordvecUnkVocab

logging.basicConfig(level=logging.INFO)


# --- Defines the experiment

@click.command()
@click.argument("workdir", type=Path)
@click.option("--port", type=int, default=12345, help="Port for monitoring")
@click.option("--debug", is_flag=True, help="Print debug information")
@forwardoption.max_epoch(Learner)  # forwards Learner's max_epoch parameter as a CLI option
def cli(port, workdir, debug, max_epoch):
    """Runs an experiment"""
    logging.getLogger().setLevel(logging.DEBUG if debug else logging.INFO)

    # Sets the working directory and the name of the xp
    with experiment(workdir, "drmm", port=port) as xp:
        random = Random()
        # Anserini (used below for BM25) runs on Java, so JAVA_HOME must be set
        xp.setenv("JAVA_HOME", os.environ["JAVA_HOME"])

        # Prepare the collection: 50-dimensional GloVe (6B) embeddings and TREC Robust
        wordembs = prepare_dataset("edu.stanford.glove.6b.50")
        vocab = WordvecUnkVocab(data=wordembs, random=random)
        robust = RobustDataset.prepare()
        # Training, validation and test subsets of fold 1
        train, val, test = robust('trf1'), robust('vaf1'), robust('f1')

        # Train with OpenNIR DRMM model
        ranker = Drmm(vocab=vocab).tag("ranker", "drmm")
        predictor = Reranker()
        trainer = PointwiseTrainer()
        learner = Learner(
            trainer=trainer,
            random=random,
            ranker=ranker,
            valid_pred=predictor,
            train_dataset=train,
            val_dataset=val,
            max_epoch=max_epoch,
        )
        model = learner.submit()

        # Evaluate the trained model on the test subset
        evaluate = Evaluate(dataset=test, model=model, predictor=predictor).submit()

        # Search and evaluate with BM25 (baseline)
        bm25_search = (
            SearchCollection(index=test.index, topics=test.assessed_topics.topics, model=BM25())
            .tag("model", "bm25")
            .submit()
        )
        bm25_eval = TrecEval(
            assessments=test.assessed_topics.assessments, run=bm25_search
        ).submit()

        # Block until all submitted jobs have finished
        xp.wait()

        print(f"Results for DRMM\n{evaluate.results.read_text()}\n")
        print(f"Results for BM25\n{bm25_eval.results.read_text()}\n")


if __name__ == "__main__":
    cli()
```

Save the script as test.py and start it with the following command (outputs are stored in the drmm-test folder):

```sh
python test.py --port 12345 drmm-test
```
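The script compares DRMM against a BM25 baseline computed with Anserini. To show what that baseline computes, here is a minimal pure-Python sketch of BM25 scoring over tokenized documents. It is not the Anserini or experimaestro_ir code. The function name `bm25_scores` is hypothetical, and `k1=0.9`, `b=0.4` are assumed to be Anserini's usual defaults. It uses the textbook formula with a Lucene-style (always positive) IDF, so exact scores may differ from Anserini's.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=0.9, b=0.4):
    """Scores each tokenized document in `docs` against the tokenized `query`.

    Hypothetical helper for illustration; k1 and b defaults are assumptions.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # Lucene-style IDF, which never goes negative
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and length normalization (b)
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Sorting document indices by decreasing score, e.g. `sorted(range(len(docs)), key=lambda i: -scores[i])`, gives the kind of first-stage run that a reranker such as DRMM then rescores.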

Features

The features below come from the original OpenNIR.

Rankers

Available in the onir.rankers module

  • DRMM onir.rankers.drmm.Drmm (since 0.1.2) paper
  • (planned) Duet (local model) paper
  • (planned) MatchPyramid paper
  • (planned) KNRM paper
  • (planned) PACRR paper
  • (planned) ConvKNRM paper
  • (since 0.1.4) Vanilla BERT paper
  • CEDR models onir.rankers.cedr_drmm.CedrDrmm paper
  • (planned) MatchZoo models source
  • (planned) MatchZoo's KNRM
  • (planned) MatchZoo's ConvKNRM
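DRMM, the ranker used in the quick start, does not score a document from raw term similarities. For each query term, it first builds a histogram of that term's similarities to all document terms, and a feed-forward network then scores those histograms. The sketch below covers only the histogram step, in pure Python. It assumes toy embedding vectors, 5 bins with the last bin reserved for exact matches, and the log-count ("LCH") variant. The names are hypothetical, and this is not OpenNIR's `Drmm` code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def matching_histograms(query_vecs, doc_vecs, n_bins=5):
    """Log-count matching histograms, one per query term.

    Similarities in [-1, 1) are spread over n_bins - 1 equal-width bins;
    the last bin only counts exact matches (similarity 1).
    """
    width = 2.0 / (n_bins - 1)
    histograms = []
    for q in query_vecs:
        counts = [0] * n_bins
        for d in doc_vecs:
            sim = cosine(q, d)
            if sim >= 1.0 - 1e-9:
                counts[-1] += 1  # exact-match bin
            else:
                idx = int((sim + 1.0) / width)
                counts[min(idx, n_bins - 2)] += 1  # guard against rounding at the edge
        # Log-count histogram: log(1 + count) for each bin
        histograms.append([math.log1p(c) for c in counts])
    return histograms
```

With `n_bins=5` the bins are [-1, -0.5), [-0.5, 0), [0, 0.5), [0.5, 1) and an exact-match bin. Because of the logarithm, a document that repeats a term many times does not dominate the histogram.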

Datasets

Evaluation Metrics

  • map (from trec_eval)
  • ndcg (from trec_eval)
  • ndcg@X (from trec_eval, gdeval)
  • p@X (from trec_eval)
  • err@X (from gdeval)
  • mrr (from trec_eval)
  • rprec (from trec_eval)
  • judged@X (implemented in python)
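Several of these metrics can be defined in a few lines each. The following pure-Python sketch computes P@k, judged@k, reciprocal rank (averaged over queries, it gives MRR) and ERR@k for a single query. It assumes the ranking is a list of document ids and the judgments (qrels) are a dict mapping document ids to integer grades. It is not the trec_eval, gdeval or OpenNIR code. Treating grades > 0 as relevant, dividing judged@k by k and using `max_grade=4` for ERR are assumptions.

```python
def precision_at_k(ranking, qrels, k):
    """P@k: fraction of the top-k documents judged relevant (grade > 0)."""
    return sum(1 for d in ranking[:k] if qrels.get(d, 0) > 0) / k


def judged_at_k(ranking, qrels, k):
    """judged@k: fraction of the top-k documents that have any judgment."""
    return sum(1 for d in ranking[:k] if d in qrels) / k


def reciprocal_rank(ranking, qrels):
    """1 / rank of the first relevant document, or 0 if none is retrieved."""
    for rank, d in enumerate(ranking, start=1):
        if qrels.get(d, 0) > 0:
            return 1.0 / rank
    return 0.0


def err_at_k(ranking, qrels, k, max_grade=4):
    """ERR@k under the cascade model: the user stops at rank r with probability R_r."""
    err, p_continue = 0.0, 1.0
    for rank, d in enumerate(ranking[:k], start=1):
        grade = max(qrels.get(d, 0), 0)
        r = (2 ** grade - 1) / 2 ** max_grade  # probability the document satisfies the user
        err += p_continue * r / rank
        p_continue *= 1 - r
    return err
```

judged@k is the one that measures the test collection rather than the ranker: a low value means that many top-ranked documents were never assessed, so the other metrics probably underestimate the system.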

Vocabularies

  • (planned) Binary term matching vocab=binary (i.e., changes the interaction matrix from cosine similarity to binary indicators)
  • Pretrained word vectors. Find the list with datamaestro search tag:"word embeddings"
  • BERT-based encoders
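Interaction-based rankers such as DRMM work from a query-by-document matrix of term similarities. The planned binary vocabulary changes only how this matrix is filled. The minimal sketch below shows both variants. It assumes toy embeddings stored in a dict. The function `interaction_matrix` and its `binary` flag are hypothetical and are not how `vocab=binary` is, or will be, implemented.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def interaction_matrix(query_terms, doc_terms, embeddings=None, binary=False):
    """Rows are query terms, columns are document terms.

    binary=False: cosine similarity of the word vectors (embeddings required).
    binary=True: 1.0 for an exact term match, 0.0 otherwise (no embeddings used).
    """
    if binary:
        return [[1.0 if q == d else 0.0 for d in doc_terms] for q in query_terms]
    return [[cosine(embeddings[q], embeddings[d]) for d in doc_terms] for q in query_terms]
```

With embeddings, related terms such as "cat" and "kitten" get a high but non-exact similarity. Binary indicators discard this soft matching and keep only exact matches.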

Citing OpenNIR

If you use OpenNIR, please cite the original OpenNIR WSDM demonstration paper, and see the acknowledgements of the original OpenNIR.

```bibtex
@InProceedings{macavaney:wsdm2020-onir,
  author = {MacAvaney, Sean},
  title = {{OpenNIR}: A Complete Neural Ad-Hoc Ranking Pipeline},
  booktitle = {{WSDM} 2020},
  year = {2020}
}
```

If you have space, you can also cite mine:

```bibtex
@inproceedings{10.1145/3397271.3401410,
author = {Piwowarski, Benjamin},
title = {Experimaestro and Datamaestro: Experiment and Dataset Managers (for IR)},
year = {2020},
doi = {10.1145/3397271.3401410},
booktitle = {Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval},
location = {Virtual Event, China},
series = {SIGIR ’20}
}
```
