OpenNIR: A Complete Neural Ad-Hoc Ranking Pipeline (Experimaestro version)
OpenNIR (experimaestro version)
OpenNIR-xpm is an end-to-end neural ad-hoc ranking pipeline.
This is an adaptation of OpenNIR using experiment manager tools (experimaestro and datamaestro).
Quick start
The following example trains a DRMM ranker on TREC Robust 2004 and then evaluates it:
import logging
import os
from pathlib import Path
from datamaestro import prepare_dataset
from experimaestro.click import click, forwardoption
from experimaestro import experiment
from onir.datasets.robust import RobustDataset
from onir.predictors.reranker import Reranker
from onir.random import Random
from onir.rankers.drmm import Drmm
from onir.tasks.learner import Learner
from onir.tasks.evaluate import Evaluate
from onir.trainers.pointwise import PointwiseTrainer
from onir.vocab.wordvec_vocab import WordvecUnkVocab
logging.basicConfig(level=logging.INFO)
# --- Defines the experiment
@forwardoption.max_epoch(Learner)
@click.option("--debug", is_flag=True, help="Print debug information")
@click.option("--port", type=int, default=12345, help="Port for monitoring")
@click.argument("workdir", type=Path)
@click.command()
def cli(port, workdir, debug, max_epoch):
    """Runs an experiment"""
    logging.getLogger().setLevel(logging.DEBUG if debug else logging.INFO)

    # Sets the working directory and the name of the xp
    with experiment(workdir, "drmm", port=port) as xp:
        random = Random()
        xp.setenv("JAVA_HOME", os.environ["JAVA_HOME"])

        # Prepare the collection
        wordembs = prepare_dataset("edu.stanford.glove.6b.50")
        vocab = WordvecUnkVocab(data=wordembs, random=random)
        robust = RobustDataset.prepare().submit()

        # Train with OpenNIR DRMM model
        ranker = Drmm(vocab=vocab).tag("ranker", "drmm")
        predictor = Reranker()
        trainer = PointwiseTrainer()
        learner = Learner(trainer=trainer, random=random, ranker=ranker, valid_pred=predictor,
                          train_dataset=robust.subset('trf1'), val_dataset=robust.subset('vaf1'),
                          max_epoch=max_epoch)
        model = learner.submit()

        # Evaluate
        Evaluate(dataset=robust.subset('f1'), model=model, predictor=predictor).submit()


if __name__ == "__main__":
    cli()
Features
The features below are from OpenNIR
Rankers
Available in the onir.rankers module:
- DRMM onir.rankers.drmm.Drmm (paper)
- (planned) Duet (local model) (paper)
- (planned) MatchPyramid (paper)
- (planned) KNRM (paper)
- (planned) PACRR (paper)
- (planned) ConvKNRM (paper)
- (planned) Vanilla BERT config/vanilla_bert (paper)
- CEDR models onir.rankers.cedr_drmm.CedrDrmm (paper)
- (planned) MatchZoo models (source)
  - (planned) MatchZoo's KNRM
  - (planned) MatchZoo's ConvKNRM
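DRMM, the ranker used in the quick start, builds its input from histograms of similarities between each query term and the document terms. The sketch below illustrates that idea only; it is not the code in onir.rankers.drmm. The function name `matching_histogram`, the default of five bins, and the `log(1 + count)` smoothing are assumptions made for this example.

```python
import math


def matching_histogram(sims, n_bins=5, log_counts=True):
    """Histogram of similarities between one query term and all document terms.

    sims: similarity values in [-1, 1] (for example cosine similarities).
    The last bin is reserved for exact matches (similarity of 1); the
    remaining bins split [-1, 1) into equal-width intervals.
    """
    counts = [0] * n_bins
    width = 2.0 / (n_bins - 1)
    for s in sims:
        if s >= 1.0:
            counts[-1] += 1  # exact-match bin
        else:
            idx = int((s + 1.0) / width)
            counts[min(max(idx, 0), n_bins - 2)] += 1
    if log_counts:
        # log(1 + count) is one common smoothing choice (assumption)
        return [math.log(1 + c) for c in counts]
    return counts


# One query term compared against five document terms
raw = matching_histogram([1.0, 0.2, -0.7, 0.6, 0.2], log_counts=False)
print(raw)
```

With five bins the intervals are [-1, -0.5), [-0.5, 0), [0, 0.5), [0.5, 1) and the exact-match bin, so the histogram length does not depend on the document length.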
Datasets
- TREC Robust 2004
- (planned) MS-MARCO config/msmarco
- (planned) ANTIQUE config/antique
- (planned) TREC CAR config/car
- (planned) New York Times config/nyt (for content-based weak supervision)
- (planned) TREC Arabic, Mandarin, and Spanish config/multiling/* (for zero-shot multilingual transfer learning) (instructions)
Evaluation Metrics
- map (from trec_eval)
- ndcg (from trec_eval)
- ndcg@X (from trec_eval, gdeval)
- p@X (from trec_eval)
- err@X (from gdeval)
- mrr (from trec_eval)
- rprec (from trec_eval)
- judged@X (implemented in Python)
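To make the cutoff metrics in the list above concrete, here is a minimal pure-Python sketch of judged@X, p@X and the per-query reciprocal rank behind mrr. It is an illustration, not the OpenNIR, trec_eval or gdeval implementation. The function names, the `qrels` dictionary layout (document id to relevance label), and the choice to divide by the cutoff `k` rather than by the number of retrieved documents are assumptions.

```python
def judged_at_k(ranking, qrels, k):
    """Fraction of the top-k documents that have any relevance judgment."""
    return sum(1 for doc_id in ranking[:k] if doc_id in qrels) / k


def precision_at_k(ranking, qrels, k, min_rel=1):
    """Fraction of the top-k documents judged relevant (label >= min_rel)."""
    return sum(1 for doc_id in ranking[:k] if qrels.get(doc_id, 0) >= min_rel) / k


def reciprocal_rank(ranking, qrels, min_rel=1):
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranking, start=1):
        if qrels.get(doc_id, 0) >= min_rel:
            return 1.0 / rank
    return 0.0


# Hypothetical ranking and judgments for a single query
ranking = ["d3", "d1", "d7", "d2"]
qrels = {"d1": 1, "d2": 0, "d3": 0}  # d7 is unjudged

print(judged_at_k(ranking, qrels, 3))
print(precision_at_k(ranking, qrels, 3))
print(reciprocal_rank(ranking, qrels))
```

judged@X is useful alongside p@X: a low value means many top-ranked documents were never assessed, so precision-style scores may understate quality.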
Vocabularies
- (planned) Binary term matching vocab=binary (i.e., changes the interaction matrix from cosine similarity to binary indicators)
- Pretrained word vectors. Find the list with datamaestro search tag:"word embeddings"
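The difference between the two vocabulary options can be sketched as follows: with word vectors, each cell of the query-document interaction matrix is a cosine similarity; with binary term matching, it is 1 only when the two terms are identical. This is a toy illustration and not OpenNIR code. The helper names and the two-dimensional embeddings are made up for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cosine_interaction(query, doc, emb):
    """Rows are query terms, columns are document terms, cells are cosines."""
    return [[cosine(emb[q], emb[d]) for d in doc] for q in query]


def binary_interaction(query, doc):
    """Same shape, but each cell is 1.0 only on an exact term match."""
    return [[1.0 if q == d else 0.0 for d in doc] for q in query]


# Hypothetical 2-dimensional embeddings, for illustration only
emb = {
    "cheap": [1.0, 0.0],
    "flights": [0.0, 1.0],
    "inexpensive": [0.8, 0.6],
}
query = ["cheap", "flights"]
doc = ["inexpensive", "flights"]

print(cosine_interaction(query, doc, emb))
print(binary_interaction(query, doc))
```

In the cosine matrix, "cheap" and "inexpensive" receive a high score even though the strings differ; the binary matrix only rewards the exact match on "flights".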
Citing OpenNIR
If you use OpenNIR, please cite the original OpenNIR WSDM demonstration paper, and see the acknowledgements of the original OpenNIR project.
@InProceedings{macavaney:wsdm2020-onir,
author = {MacAvaney, Sean},
title = {{OpenNIR}: A Complete Neural Ad-Hoc Ranking Pipeline},
booktitle = {{WSDM} 2020},
year = {2020}
}
If you have space, you can also cite the Experimaestro and Datamaestro paper:
@inproceedings{10.1145/3397271.3401410,
author = {Piwowarski, Benjamin},
title = {Experimaestro and Datamaestro: Experiment and Dataset Managers (for IR)},
year = {2020},
doi = {10.1145/3397271.3401410},
booktitle = {Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval},
location = {Virtual Event, China},
series = {SIGIR ’20}
}