ir_measures

Check out our documentation website: ir-measur.es

Provides a common interface to many IR measure tools.

Provided by the Terrier Team @ Glasgow. Find us at terrierteam/ir_measures.

Getting Started

Install via pip

pip install ir-measures

Python API

from ir_measures import iter_calc, calc_aggregate
from ir_measures import AP, nDCG, RR, P

qrels = {
    'Q0': {"D0": 0, "D1": 1},
    "Q1": {"D0": 0, "D3": 2}
}
run = {
    'Q0': {"D0": 1.2, "D1": 1.0},
    "Q1": {"D0": 2.4, "D3": 3.6}
}

# aggregated results
calc_aggregate([AP, nDCG, RR, nDCG@10, P(rel=2)@10], qrels, run)
# {AP: 0.75, nDCG: 0.8154648767857288, RR: 0.75, nDCG@10: 0.8154648767857288, P(rel=2)@10: 0.05}

# by query
for metric in iter_calc([AP, nDCG, RR, nDCG@10, P(rel=2)@10], qrels, run):
    print(metric)
# Metric(query_id='Q0', measure=AP, value=0.5)
# Metric(query_id='Q0', measure=RR, value=0.5)
# Metric(query_id='Q0', measure=nDCG, value=0.6309297535714575)
# Metric(query_id='Q0', measure=nDCG@10, value=0.6309297535714575)
# Metric(query_id='Q1', measure=AP, value=1.0)
# Metric(query_id='Q1', measure=RR, value=1.0)
# Metric(query_id='Q1', measure=nDCG, value=1.0)
# Metric(query_id='Q1', measure=nDCG@10, value=1.0)
# Metric(query_id='Q0', measure=P(rel=2)@10, value=0.0)
# Metric(query_id='Q1', measure=P(rel=2)@10, value=0.1)
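For intuition, the numbers above can be reproduced from the textbook formulas. The following stdlib-only sketch is illustrative, not the package's implementation (ir_measures delegates evaluation to providers such as pytrec_eval):

```python
import math

qrels = {'Q0': {"D0": 0, "D1": 1}, "Q1": {"D0": 0, "D3": 2}}
run = {'Q0': {"D0": 1.2, "D1": 1.0}, "Q1": {"D0": 2.4, "D3": 3.6}}

def ranking(scores):
    # Doc ids sorted by descending retrieval score
    return sorted(scores, key=scores.get, reverse=True)

def ap(rels, scores, min_rel=1):
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking(scores), start=1):
        if rels.get(doc, 0) >= min_rel:
            hits += 1
            total += hits / rank  # precision at each relevant doc
    n_rel = sum(1 for r in rels.values() if r >= min_rel)
    return total / n_rel if n_rel else 0.0

def rr(rels, scores, min_rel=1):
    for rank, doc in enumerate(ranking(scores), start=1):
        if rels.get(doc, 0) >= min_rel:
            return 1.0 / rank
    return 0.0

def ndcg(rels, scores):
    # Exponential gain (2^rel - 1); for the labels used here the
    # linear-gain formulation yields the same nDCG values
    def dcg(gains):
        return sum((2 ** g - 1) / math.log2(rank + 1)
                   for rank, g in enumerate(gains, start=1))
    sys_gains = [rels.get(doc, 0) for doc in ranking(scores)]
    ideal_gains = sorted(rels.values(), reverse=True)
    return dcg(sys_gains) / dcg(ideal_gains)

def mean(xs):
    return sum(xs) / len(xs)

print(mean([ap(qrels[q], run[q]) for q in qrels]))    # 0.75
print(mean([rr(qrels[q], run[q]) for q in qrels]))    # 0.75
print(mean([ndcg(qrels[q], run[q]) for q in qrels]))  # ≈ 0.8155
```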

Qrels can be provided in the following formats:

# dict of dict
qrels = {
    'Q0': {
        "D0": 1,
        "D1": 0,
    },
    "Q1": {
        "D0": 0,
        "D3": 2
    }
}

# dataframe
qrels = pd.DataFrame([
    {'query_id': "Q0", 'doc_id': "D0", 'relevance': 1},
    {'query_id': "Q0", 'doc_id': "D1", 'relevance': 0},
    {'query_id': "Q1", 'doc_id': "D0", 'relevance': 0},
    {'query_id': "Q1", 'doc_id': "D3", 'relevance': 2},
])

# any iterable of namedtuples (e.g., list, generator, etc)
from ir_measures.util import GenericQrel
qrels = [
    GenericQrel("Q0", "D0", 1),
    GenericQrel("Q0", "D1", 0),
    GenericQrel("Q1", "D0", 0),
    GenericQrel("Q1", "D3", 2),
]

Runs can be provided in the following formats:

# dict of dict
run = {
    'Q0': {
        "D0": 1.2,
        "D1": 1.0,
    },
    "Q1": {
        "D0": 2.4,
        "D3": 3.6
    }
}

# dataframe
run = pd.DataFrame([
    {'query_id': "Q0", 'doc_id': "D0", 'score': 1.2},
    {'query_id': "Q0", 'doc_id': "D1", 'score': 1.0},
    {'query_id': "Q1", 'doc_id': "D0", 'score': 2.4},
    {'query_id': "Q1", 'doc_id': "D3", 'score': 3.6},
])

# any iterable of namedtuples (e.g., list, generator, etc)
from ir_measures.util import GenericScoredDoc
run = [
    GenericScoredDoc("Q0", "D0", 1.2),
    GenericScoredDoc("Q0", "D1", 1.0),
    GenericScoredDoc("Q1", "D0", 2.4),
    GenericScoredDoc("Q1", "D3", 3.6),
]
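All three formats carry the same information. A small stdlib sketch that flattens the dict-of-dict form into namedtuple records; the `Qrel` and `ScoredDoc` types below are stand-ins defined here for illustration, mirroring the positional fields of `GenericQrel` and `GenericScoredDoc`:

```python
from collections import namedtuple

# Stand-in types for illustration only
Qrel = namedtuple('Qrel', ['query_id', 'doc_id', 'relevance'])
ScoredDoc = namedtuple('ScoredDoc', ['query_id', 'doc_id', 'score'])

def qrels_from_dict(qrels_dict):
    # Flatten {query_id: {doc_id: relevance}} into records
    return [Qrel(q, doc, rel)
            for q, docs in qrels_dict.items()
            for doc, rel in docs.items()]

def run_from_dict(run_dict):
    # Flatten {query_id: {doc_id: score}} into records
    return [ScoredDoc(q, doc, score)
            for q, docs in run_dict.items()
            for doc, score in docs.items()]

run = {'Q0': {"D0": 1.2, "D1": 1.0}, "Q1": {"D0": 2.4, "D3": 3.6}}
print(run_from_dict(run)[0])  # ScoredDoc(query_id='Q0', doc_id='D0', score=1.2)
```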

Command Line Interface

ir_measures also functions as a command line interface, with syntax similar to trec_eval.

Example:

ir_measures /path/to/qrels /path/to/run P@10 'P(rel=2)@5 nDCG@15 Judged@10' NumQ NumRel NumRet NumRelRet
P@10    0.4382
P(rel=2)@5  0.0827
nDCG@15 0.4357
Judged@10   0.9812
NumQ    249.0000
NumRel  17412.0000
NumRet  241339.0000
NumRet(rel=1)   10272.0000

Syntax:

ir_measures qrels run measures... [-q] [-n]
  • qrels: a TREC-formatted QRELS file
  • run: a TREC-formatted results file
  • measures: one or more measure name strings. Note that in bash, parentheses must be placed in single quotes. For simplicity, you can provide multiple measures in a single quoted string; they are split on whitespace.
  • -q: provide results for each query individually
  • -n: when used with -q, skips summary statistics
  • -p: number of decimal places to report results (default: 4)
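Both files use the standard TREC column layouts (qrels: `query_id iteration doc_id relevance`; run: `query_id Q0 doc_id rank score run_tag`, where the second run column is an unused literal). A sketch that writes the earlier example data in these formats:

```python
import os
import tempfile

qrels = {'Q0': {"D0": 0, "D1": 1}, "Q1": {"D0": 0, "D3": 2}}
run = {'Q0': {"D0": 1.2, "D1": 1.0}, "Q1": {"D0": 2.4, "D3": 3.6}}

def write_trec_qrels(qrels, path):
    # Columns: query_id iteration doc_id relevance
    with open(path, 'w') as f:
        for q, docs in qrels.items():
            for doc, rel in docs.items():
                f.write(f"{q} 0 {doc} {rel}\n")

def write_trec_run(run, path, tag='myrun'):
    # Columns: query_id Q0 doc_id rank score run_tag
    with open(path, 'w') as f:
        for q, docs in run.items():
            ranked = sorted(docs.items(), key=lambda kv: kv[1], reverse=True)
            for rank, (doc, score) in enumerate(ranked, start=1):
                f.write(f"{q} Q0 {doc} {rank} {score} {tag}\n")

out_dir = tempfile.mkdtemp()
qrels_path = os.path.join(out_dir, 'qrels.txt')
run_path = os.path.join(out_dir, 'run.txt')
write_trec_qrels(qrels, qrels_path)
write_trec_run(run, run_path)
print(open(qrels_path).read().splitlines()[0])  # Q0 0 D0 0
print(open(run_path).read().splitlines()[0])    # Q0 Q0 D0 1 1.2 myrun
```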

Measures

AP

The [Mean] Average Precision ([M]AP). The average precision of a single query is the mean of the precision scores at each relevant item returned in a search results list.

AP is typically used for adhoc ranking tasks where finding as many relevant items as possible is desired. It is commonly referred to as MAP when the AP scores are averaged over the query set.

Parameters:

  • cutoff (int) - ranking cutoff threshold
  • rel (int) - minimum relevance score to be considered relevant (inclusive)

Bpref

Binary Preference (Bpref). This measure examines the relative ranks of judged relevant and non-relevant documents. Non-judged documents are not considered.

Parameters:

  • rel (int) - minimum relevance score to be considered relevant (inclusive)
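As a sketch of the idea, each relevant document is penalized by the fraction of judged non-relevant documents ranked above it. This follows the standard Buckley & Voorhees formulation; trec_eval's handling of edge cases (e.g., no judged non-relevant documents) may differ:

```python
def bpref(rels, scores, min_rel=1):
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_rel = sum(1 for r in rels.values() if r >= min_rel)
    # Judged non-relevant only; negative labels (pooled-but-unjudged) excluded
    n_nonrel = sum(1 for r in rels.values() if 0 <= r < min_rel)
    if n_rel == 0:
        return 0.0
    denom = min(n_rel, n_nonrel)
    total, nonrel_above = 0.0, 0
    for doc in ranked:
        if doc not in rels:
            continue  # unjudged documents are ignored entirely
        if rels[doc] >= min_rel:
            total += 1.0 - (min(nonrel_above, denom) / denom if denom else 0.0)
        else:
            nonrel_above += 1
    return total / n_rel
```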

ERR

The Expected Reciprocal Rank (ERR) is a precision-focused measure. In essence, it is an extension of reciprocal rank that incorporates both graded relevance and a more realistic cascade-based user model of how users browse a ranking.

Parameters:

  • cutoff (int) - ranking cutoff threshold
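A sketch of the usual formulation (Chapelle et al.), in which the user stops at rank i with probability R_i = (2^g_i - 1) / 2^g_max, and the measure accumulates 1/i weighted by the probability of reaching rank i. The `max_grade` parameter (the maximum possible relevance label) must be supplied by the caller here:

```python
def err(rels, scores, max_grade, cutoff=None):
    ranked = sorted(scores, key=scores.get, reverse=True)
    if cutoff is not None:
        ranked = ranked[:cutoff]
    total, p_reach = 0.0, 1.0
    for rank, doc in enumerate(ranked, start=1):
        # Probability the user is satisfied (stops) at this rank
        stop = (2 ** rels.get(doc, 0) - 1) / 2 ** max_grade
        total += p_reach * stop / rank
        p_reach *= 1.0 - stop
    return total
```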

IPrec

Interpolated Precision at a given recall cutoff. Used for building precision-recall graphs. Unlike most measures, where @ indicates an absolute cutoff threshold, here @ sets the recall cutoff.

Parameters:

  • recall (float) - recall threshold
  • rel (int) - minimum relevance score to be considered relevant (inclusive)

Judged

Percentage of results in the top k (cutoff) results that have relevance judgments. Equivalent to P@k with a rel lower than any judgment.

Parameters:

  • cutoff (int) - ranking cutoff threshold
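A minimal sketch, assuming at least `cutoff` results are returned so the denominator is simply the cutoff:

```python
def judged_at(rels, scores, cutoff):
    # Fraction of the top-`cutoff` results that appear in the qrels at all
    ranked = sorted(scores, key=scores.get, reverse=True)[:cutoff]
    return sum(1 for doc in ranked if doc in rels) / cutoff
```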

NumQ

The total number of queries.

NumRel

The number of relevant documents the query has (independent of what the system retrieved).

Parameters:

  • rel (int) - minimum relevance score to be counted (inclusive)

NumRet

The number of results returned. When rel is provided, counts the number of documents returned with at least that relevance score (inclusive).

Parameters:

  • rel (int) - minimum relevance score to be counted (inclusive), or all documents returned if NOT_PROVIDED

P

Basic measure that computes the percentage of documents in the top cutoff results that are labeled as relevant. cutoff is a required parameter, and can be provided as P@cutoff.

Parameters:

  • cutoff (int) - ranking cutoff threshold
  • rel (int) - minimum relevance score to be considered relevant (inclusive)
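A minimal sketch, which reproduces the per-query P(rel=2)@10 values from the example near the top of this page:

```python
def precision_at(rels, scores, cutoff, min_rel=1):
    # Top-`cutoff` doc ids by descending score; the denominator is the
    # cutoff itself, as in trec_eval
    ranked = sorted(scores, key=scores.get, reverse=True)[:cutoff]
    return sum(1 for doc in ranked if rels.get(doc, 0) >= min_rel) / cutoff

# Q1 from the earlier example: one rel>=2 doc in the top 10
print(precision_at({"D0": 0, "D3": 2}, {"D0": 2.4, "D3": 3.6}, 10, min_rel=2))  # 0.1
```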

R

Recall@k (R@k). The fraction of relevant documents for a query that have been retrieved by rank k.

Parameters:

  • cutoff (int) - ranking cutoff threshold
  • rel (int) - minimum relevance score to be considered relevant (inclusive)
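A minimal sketch of the same counting as precision, but divided by the total number of relevant documents instead of the cutoff:

```python
def recall_at(rels, scores, cutoff, min_rel=1):
    ranked = sorted(scores, key=scores.get, reverse=True)[:cutoff]
    n_rel = sum(1 for r in rels.values() if r >= min_rel)
    retrieved_rel = sum(1 for doc in ranked if rels.get(doc, 0) >= min_rel)
    return retrieved_rel / n_rel if n_rel else 0.0
```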

RBP

The Rank-Biased Precision (RBP). Models a user who scans a ranked list from the top and proceeds from each rank to the next with a fixed persistence probability p.

Parameters:

  • cutoff (int) - ranking cutoff threshold
  • p (float) - persistence
  • rel (int) - minimum relevance score to be considered relevant (inclusive), or NOT_PROVIDED to use graded relevance
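Under that user model, RBP for binary relevance is (1 - p) * Σ_k p^(k-1) * rel_k. A sketch of this binary form (the graded-relevance variant instead scales each term by the document's gain):

```python
def rbp(rels, scores, p=0.8, min_rel=1):
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Each relevant doc at rank k contributes p^(k-1), weighted by (1 - p)
    return (1.0 - p) * sum(p ** (rank - 1)
                           for rank, doc in enumerate(ranked, start=1)
                           if rels.get(doc, 0) >= min_rel)
```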

RR

The [Mean] Reciprocal Rank ([M]RR) is a precision-focused measure that scores based on the reciprocal of the rank of the first relevant document retrieved. An optional cutoff can be provided to limit the depth explored. rel (default 1) controls which relevance level is considered relevant.

Parameters:

  • cutoff (int) - ranking cutoff threshold
  • rel (int) - minimum relevance score to be considered relevant (inclusive)

Rprec

The precision at R, where R is the number of relevant documents for a given query. Has the cute property that it is also the recall at R.

Parameters:

  • rel (int) - minimum relevance score to be considered relevant (inclusive)
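A minimal sketch:

```python
def rprec(rels, scores, min_rel=1):
    # R = number of relevant documents; precision over the top-R results
    n_rel = sum(1 for r in rels.values() if r >= min_rel)
    ranked = sorted(scores, key=scores.get, reverse=True)[:n_rel]
    return sum(1 for doc in ranked if rels.get(doc, 0) >= min_rel) / n_rel if n_rel else 0.0
```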

SetP

The Set Precision (SetP); i.e., the number of relevant documents retrieved divided by the total number of documents retrieved.

Parameters:

  • rel (int) - minimum relevance score to be considered relevant (inclusive)

Success

1 if a document with at least rel relevance is found in the first cutoff documents, else 0.

Parameters:

  • cutoff (int) - ranking cutoff threshold
  • rel (int) - minimum relevance score to be considered relevant (inclusive)
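A minimal sketch:

```python
def success_at(rels, scores, cutoff, min_rel=1):
    # 1.0 if any sufficiently relevant doc appears in the top `cutoff`
    ranked = sorted(scores, key=scores.get, reverse=True)[:cutoff]
    return 1.0 if any(rels.get(doc, 0) >= min_rel for doc in ranked) else 0.0
```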

infAP

Inferred AP. An AP variant that accounts for pooled-but-unjudged documents by assuming they are relevant at the same rate as the judged documents. In essence, it skips documents that were pooled but not judged, and treats other unjudged documents as non-relevant.

By convention, pooled-but-unjudged documents are indicated by a relevance score of -1. Note that not all qrels use this convention.

Parameters:

  • rel (int) - minimum relevance score to be considered relevant (inclusive)

nDCG

The normalized Discounted Cumulative Gain (nDCG). Uses graded relevance labels, rewarding systems that place the most relevant documents at the top of the ranking. It is normalized with respect to the ideal DCG, i.e., the DCG of the documents ranked in descending order of relevance label.

Parameters:

  • cutoff (int) - ranking cutoff threshold
  • dcg (str) - DCG formulation

Aliases

  • BPref → Bpref
  • MAP → AP
  • MRR → RR
  • NDCG → nDCG
  • NumRelRet → NumRet(rel=1)
  • RPrec → Rprec

Providers

gdeval

gdeval

Supported Measures:

  • nDCG
  • ERR

judged

Python implementation of the judgment rate (the Judged measure)

Supported Measures:

  • Judged

msmarco

MS MARCO's implementation of RR

Supported Measures:

  • RR

pytrec_eval

pytrec_eval

https://github.com/cvangysel/pytrec_eval

@inproceedings{VanGysel2018pytreceval,
  title={Pytrec\_eval: An Extremely Fast Python Interface to trec\_eval},
  author={Van Gysel, Christophe and de Rijke, Maarten},
  publisher={ACM},
  booktitle={SIGIR},
  year={2018},
}

Supported Measures:

  • P
  • RR
  • Rprec
  • AP
  • nDCG
  • R
  • Bpref
  • NumRet
  • NumQ
  • NumRel
  • SetP
  • Success
  • IPrec
  • infAP

trectools

trectools

https://github.com/joaopalotti/trectools

@inproceedings{palotti2019,
 author = {Palotti, Joao and Scells, Harrisen and Zuccon, Guido},
 title = {TrecTools: an open-source Python library for Information Retrieval practitioners involved in TREC-like campaigns},
 series = {SIGIR'19},
 year = {2019},
 location = {Paris, France},
 publisher = {ACM}
}

Supported Measures:

  • P
  • RR
  • Rprec
  • AP
  • nDCG
  • Bpref
  • RBP

Credits

  • Sean MacAvaney, University of Glasgow
  • Craig Macdonald, University of Glasgow
