ir_measures

Check out our documentation website: ir-measur.es

Provides a common interface to many IR measure tools.

Provided by the Terrier Team @ Glasgow. Find us at terrierteam/ir_measures.

Getting Started

Install via pip

pip install ir-measures

Python API

from ir_measures import iter_calc, calc_aggregate
from ir_measures import AP, nDCG, RR, P

qrels = {
    'Q0': {"D0": 0, "D1": 1},
    "Q1": {"D0": 0, "D3": 2}
}
run = {
    'Q0': {"D0": 1.2, "D1": 1.0},
    "Q1": {"D0": 2.4, "D3": 3.6}
}

# aggregated results
calc_aggregate([AP, nDCG, RR, nDCG@10, P(rel=2)@10], qrels, run)
# {AP: 0.75, nDCG: 0.8154648767857288, RR: 0.75, nDCG@10: 0.8154648767857288, P(rel=2)@10: 0.05}

# by query
for metric in iter_calc([AP, nDCG, RR, nDCG@10, P(rel=2)@10], qrels, run):
    print(metric)
# Metric(query_id='Q0', measure=AP, value=0.5)
# Metric(query_id='Q0', measure=RR, value=0.5)
# Metric(query_id='Q0', measure=nDCG, value=0.6309297535714575)
# Metric(query_id='Q0', measure=nDCG@10, value=0.6309297535714575)
# Metric(query_id='Q1', measure=AP, value=1.0)
# Metric(query_id='Q1', measure=RR, value=1.0)
# Metric(query_id='Q1', measure=nDCG, value=1.0)
# Metric(query_id='Q1', measure=nDCG@10, value=1.0)
# Metric(query_id='Q0', measure=P(rel=2)@10, value=0.0)
# Metric(query_id='Q1', measure=P(rel=2)@10, value=0.1)
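
Measures can also be built from their string names, which is handy when they come from a configuration file or command-line argument. A minimal sketch, assuming the parse_measure helper documented on ir-measur.es:

import ir_measures

# parse_measure (assumed helper) maps a string like 'nDCG@10' to a measure object
measure = ir_measures.parse_measure('nDCG@10')
ir_measures.calc_aggregate([measure], qrels, run)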

Qrels can be provided in the following formats:

# dict of dict
qrels = {
    'Q0': {
        "D0": 1,
        "D1": 0,
    },
    "Q1": {
        "D0": 0,
        "D3": 2
    }
}

# dataframe
import pandas as pd
qrels = pd.DataFrame([
    {'query_id': "Q0", 'doc_id': "D0", 'relevance': 1},
    {'query_id': "Q0", 'doc_id': "D1", 'relevance': 0},
    {'query_id': "Q1", 'doc_id': "D0", 'relevance': 0},
    {'query_id': "Q1", 'doc_id': "D3", 'relevance': 2},
])

# any iterable of namedtuples (e.g., list, generator, etc)
from ir_measures.util import GenericQrel
qrels = [
    GenericQrel("Q0", "D0", 1),
    GenericQrel("Q0", "D1", 0),
    GenericQrel("Q1", "D0", 0),
    GenericQrel("Q1", "D3", 2),
]
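
Qrels in the standard TREC format can also be streamed from a file. A minimal sketch, assuming the read_trec_qrels helper and a hypothetical file path:

import ir_measures

# read_trec_qrels (assumed helper) lazily yields qrels from a TREC-formatted file
qrels = ir_measures.read_trec_qrels('path/to/qrels')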

Runs can be provided in the following formats:

# dict of dict
run = {
    'Q0': {
        "D0": 1.2,
        "D1": 1.0,
    },
    "Q1": {
        "D0": 2.4,
        "D3": 3.6
    }
}

# dataframe
import pandas as pd
run = pd.DataFrame([
    {'query_id': "Q0", 'doc_id': "D0", 'score': 1.2},
    {'query_id': "Q0", 'doc_id': "D1", 'score': 1.0},
    {'query_id': "Q1", 'doc_id': "D0", 'score': 2.4},
    {'query_id': "Q1", 'doc_id': "D3", 'score': 3.6},
])

# any iterable of namedtuples (e.g., list, generator, etc)
from ir_measures.util import GenericScoredDoc
run = [
    GenericScoredDoc("Q0", "D0", 1.2),
    GenericScoredDoc("Q0", "D1", 1.0),
    GenericScoredDoc("Q1", "D0", 2.4),
    GenericScoredDoc("Q1", "D3", 3.6),
]
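
Similarly, a TREC-formatted run can be streamed from a file. A minimal sketch, assuming the read_trec_run helper and a hypothetical path:

import ir_measures

# read_trec_run (assumed helper) lazily yields scored documents from a run file
run = ir_measures.read_trec_run('path/to/run')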

Measures

AP

The [Mean] Average Precision ([M]AP). The average precision of a single query is the mean of the precision scores at each relevant item returned in a search results list.

AP is typically used for adhoc ranking tasks where retrieving as many relevant items as possible is important. It is commonly referred to as MAP when the mean of AP is taken over the query set.

Parameters:

  • cutoff (int) - ranking cutoff threshold
  • rel (int) - minimum relevance score to be considered relevant (inclusive)
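
Both parameters use the syntax from the example above; a short sketch with the qrels and run from Getting Started:

from ir_measures import AP, calc_aggregate

# AP over the top 100 results, and AP that only counts grade >= 2 as relevant
calc_aggregate([AP@100, AP(rel=2)], qrels, run)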

Bpref

Binary Preference (Bpref). This measure examines the relative ranks of judged relevant and non-relevant documents. Non-judged documents are not considered.

Parameters:

  • rel (int) - minimum relevance score to be considered relevant (inclusive)

ERR

The Expected Reciprocal Rank (ERR) is a precision-focused measure. In essence, it is an extension of reciprocal rank that encapsulates both graded relevance and a more realistic cascade-based model of how users browse a ranking.

Parameters:

  • cutoff (int) - ranking cutoff threshold
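
The cascade model can be written down compactly. A rough sketch of the standard formulation (Chapelle et al., 2009), where the gain mapping and maximum grade are illustrative assumptions rather than ir_measures internals:

def err(gains, max_grade=2):
    # gains: graded relevance of the ranked documents, in rank order
    score, p_continue = 0.0, 1.0
    for rank, g in enumerate(gains, start=1):
        p_stop = (2**g - 1) / 2**max_grade  # chance the user is satisfied here
        score += p_continue * p_stop / rank
        p_continue *= 1 - p_stop
    return score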

IPrec

Interpolated Precision at a given recall cutoff. Used for building precision-recall graphs. Unlike most measures, where @ indicates an absolute cutoff threshold, here @ sets the recall cutoff.

Parameters:

  • recall (float) - recall threshold
  • rel (int) - minimum relevance score to be considered relevant (inclusive)

Judged

Percentage of the top k (cutoff) results that have relevance judgments. Equivalent to P@k with a rel lower than any judgment.

Parameters:

  • cutoff (int) - ranking cutoff threshold

NumQ

The total number of queries.

NumRel

The number of relevant documents the query has (independent of what the system retrieved).

Parameters:

  • rel (int) - minimum relevance score to be counted (inclusive)

NumRet

The number of results returned. When rel is provided, counts the number of documents returned with at least that relevance score (inclusive).

Parameters:

  • rel (int) - minimum relevance score to be counted (inclusive), or all documents returned if NOT_PROVIDED

P

Basic measure that computes the percentage of documents in the top cutoff results that are labeled as relevant. cutoff is a required parameter, and can be provided as P@cutoff.

Parameters:

  • cutoff (int) - ranking cutoff threshold
  • rel (int) - minimum relevance score to be considered relevant (inclusive)

R

Recall@k (R@k). The fraction of relevant documents for a query that have been retrieved by rank k.

Parameters:

  • cutoff (int) - ranking cutoff threshold
  • rel (int) - minimum relevance score to be considered relevant (inclusive)

RBP

The Rank-Biased Precision (RBP) models a user who scans the ranking from the top and proceeds from each result to the next with a fixed persistence probability p (Moffat & Zobel, 2008). Relevant documents contribute to the score with a weight that decays geometrically with rank.

Parameters:

  • cutoff (int) - ranking cutoff threshold
  • p (float) - persistence
  • rel (int) - minimum relevance score to be considered relevant (inclusive), or NOT_PROVIDED to use graded relevance
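
A rough sketch of the formulation with binary relevance; how each provider handles graded labels may differ:

def rbp(rels, p=0.8):
    # rank k is weighted p**(k-1); the (1 - p) factor normalizes the weights
    return (1 - p) * sum(rel * p**(k - 1) for k, rel in enumerate(rels, start=1))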

RR

The [Mean] Reciprocal Rank ([M]RR) is a precision-focused measure that scores based on the reciprocal of the rank of the first relevant document. An optional cutoff can be provided to limit the depth explored. rel (default 1) controls which relevance level is considered relevant.

Parameters:

  • cutoff (int) - ranking cutoff threshold
  • rel (int) - minimum relevance score to be considered relevant (inclusive)
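
With the qrels and run from Getting Started, Q0's first relevant document (D1) appears at rank 2 (reciprocal rank 0.5) and Q1's at rank 1 (1.0), averaging to the 0.75 shown earlier:

from ir_measures import RR, calc_aggregate

calc_aggregate([RR, RR@10], qrels, run)
# {RR: 0.75, RR@10: 0.75}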

Rprec

The precision at R, where R is the number of relevant documents for a given query. Has the cute property that it is also the recall at R.

Parameters:

  • rel (int) - minimum relevance score to be considered relevant (inclusive)

SetP

The Set Precision (SetP); i.e., the number of relevant documents retrieved divided by the total number retrieved.

Parameters:

  • rel (int) - minimum relevance score to be considered relevant (inclusive)

Success

1 if a document with at least rel relevance is found in the first cutoff documents, else 0.

Parameters:

  • cutoff (int) - ranking cutoff threshold
  • rel (int) - minimum relevance score to be considered relevant (inclusive)

infAP

Inferred AP. An AP implementation that accounts for pooled-but-unjudged documents by assuming they are relevant at the same proportion as the judged documents. Essentially, it skips documents that were pooled but not judged, and assumes documents without judgments are non-relevant.

Pooled-but-unjudged documents are indicated by a relevance score of -1, by convention. Note that not all qrels use this convention.

Parameters:

  • rel (int) - minimum relevance score to be considered relevant (inclusive)

nDCG

The normalized Discounted Cumulative Gain (nDCG). Uses graded relevance labels, rewarding systems that place the highest-graded documents at the top of the ranking. It is normalized with respect to the ideal DCG, i.e., the DCG of the documents ranked in descending order of graded label.

Parameters:

  • cutoff (int) - ranking cutoff threshold
  • dcg (str) - DCG formulation
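
A rough sketch of the common log2 formulation; providers differ in the exact DCG variant (e.g., gdeval uses an exponential gain), so treat this as illustrative only:

import math

def dcg(gains):
    # gains: graded labels of the ranked documents, in rank order
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def ndcg(gains):
    ideal = dcg(sorted(gains, reverse=True))  # ideal ordering: best labels first
    return dcg(gains) / ideal if ideal > 0 else 0.0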

Aliases

  • BPref → Bpref
  • MAP → AP
  • MRR → RR
  • NDCG → nDCG
  • NumRelRet → NumRet(rel=1)
  • RPrec → Rprec

Providers

gdeval

Supported Measures:

  • nDCG
  • ERR

judged

python implementation of judgment rate

Supported Measures:

  • Judged

msmarco

MS MARCO's implementation of RR

Supported Measures:

  • RR

pytrec_eval

https://github.com/cvangysel/pytrec_eval

@inproceedings{VanGysel2018pytreceval,
  title={Pytrec\_eval: An Extremely Fast Python Interface to trec\_eval},
  author={Van Gysel, Christophe and de Rijke, Maarten},
  publisher={ACM},
  booktitle={SIGIR},
  year={2018},
}

Supported Measures:

  • P
  • RR
  • Rprec
  • AP
  • nDCG
  • R
  • Bpref
  • NumRet
  • NumQ
  • NumRel
  • SetP
  • Success
  • IPrec
  • infAP
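
A specific provider can be pinned when its implementation is needed. A minimal sketch, assuming the provider object is exposed as ir_measures.pytrec_eval, as on the documentation site:

import ir_measures
from ir_measures import AP, nDCG

# use the pytrec_eval backend explicitly rather than the automatic selection
ir_measures.pytrec_eval.calc_aggregate([AP, nDCG@10], qrels, run)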

trectools

https://github.com/joaopalotti/trectools

@inproceedings{palotti2019,
 author = {Palotti, Joao and Scells, Harrisen and Zuccon, Guido},
 title = {TrecTools: an open-source Python library for Information Retrieval practitioners involved in TREC-like campaigns},
 series = {SIGIR'19},
 year = {2019},
 location = {Paris, France},
 publisher = {ACM}
}

Supported Measures:

  • P
  • RR
  • Rprec
  • AP
  • nDCG
  • Bpref
  • RBP

Credits

  • Sean MacAvaney, University of Glasgow
  • Craig Macdonald, University of Glasgow
