
Tie-aware Retrieval Metrics (TRM)

A lightweight Python library for reliable evaluation of retrieval systems in the presence of tied relevance scores.

When retrieval models operate in low numerical precision (e.g., BF16, FP16), many candidate documents receive identical scores, creating spurious ties. Conventional tie-oblivious evaluation arbitrarily breaks these ties, leading to unstable and potentially misleading metric values. TRM resolves this by computing expected metric values over all possible orderings of tied candidates, along with score range and bias diagnostics.
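For illustration (this snippet is not part of the library), distinct full-precision scores can collapse to identical values after a half-precision round-trip, which Python's standard `struct` module can simulate with its `"e"` (IEEE 754 binary16) format code:

```python
import struct

def to_fp16(x: float) -> float:
    """Round-trip a Python float through IEEE 754 half precision."""
    return struct.unpack("e", struct.pack("e", x))[0]

scores_fp32 = [0.9900, 0.9712, 0.9711, 0.9710, 0.9500]
scores_fp16 = [to_fp16(s) for s in scores_fp32]

print(len(set(scores_fp32)))  # 5 distinct scores in full precision
print(len(set(scores_fp16)))  # 3 -- the middle three collapse into a tie
```

Any tie-oblivious metric then depends on the arbitrary order in which those three tied documents happen to be listed.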

Reference: Kisu Yang, Yoonna Jang, Hwanseok Jang, Kenneth Choi, Isabelle Augenstein, Heuiseok Lim. Reliable Evaluation Protocol for Low-Precision Retrieval. ACL 2026.

Installation

pip install tie-aware-retrieval-metrics

Or install from source:

git clone https://github.com/KisuYang/tie-aware-retrieval-metrics.git
cd tie-aware-retrieval-metrics
pip install -e .

Quick Start

import trm

# Per-query relevance scores and labels
scores = [
    [0.99, 0.97, 0.97, 0.97, 0.95],  # query 1: three docs share score 0.97
]
is_relevant = [
    [False, True, False, True, False],  # query 1: docs at indices 1 and 3 are relevant
]

result = trm.evaluate(
    scores=scores,
    is_relevant=is_relevant,
    metrics=["ndcg", "mrr", "recall"],
    k_list=[3, 5],
)

# Macro-averaged results
for metric in ["ndcg", "mrr", "recall"]:
    for k in [3, 5]:
        r = result.metrics[metric][k]
        print(f"{metric}@{k}: E[M]={r.expected:.4f}  "
              f"M_obl={r.oblivious:.4f}  "
              f"M_max={r.maximum:.4f}  "
              f"M_min={r.minimum:.4f}  "
              f"range={r.range:.4f}  "
              f"bias={r.bias:.4f}")

Output:

ndcg@3: E[M]=0.4623  M_obl=0.3869  M_max=0.6934  M_min=0.3066  range=0.3869  bias=-0.0754
ndcg@5: E[M]=0.6383  M_obl=0.6509  M_max=0.6934  M_min=0.5706  range=0.1228  bias=0.0126
mrr@3:  E[M]=0.4444  M_obl=0.5000  M_max=0.5000  M_min=0.3333  range=0.1667  bias=0.0556
mrr@5:  E[M]=0.4444  M_obl=0.5000  M_max=0.5000  M_min=0.3333  range=0.1667  bias=0.0556
recall@3: E[M]=0.6667  M_obl=0.5000  M_max=1.0000  M_min=0.5000  range=0.5000  bias=-0.1667
recall@5: E[M]=1.0000  M_obl=1.0000  M_max=1.0000  M_min=1.0000  range=0.0000  bias=0.0000
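To see where an E[M] value comes from, the tie orderings can be enumerated by hand. In the Quick Start query, rank 1 (score 0.99) is non-relevant, and ranks 2-4 form a tie group in which two of three docs are relevant; averaging MRR over all orderings of that group reproduces the 0.4444 (= 4/9) shown above. This is an independent check, not the library's implementation:

```python
from itertools import permutations
from statistics import mean

tie_group = [True, False, True]  # the three docs scored 0.97; two are relevant

def mrr_at_5(order):
    # Rank 1 is non-relevant, so the tie group occupies ranks 2-4.
    for rank, rel in enumerate(order, start=2):
        if rel:
            return 1.0 / rank
    return 0.0

expected = mean(mrr_at_5(p) for p in permutations(tie_group))
print(round(expected, 4))  # 0.4444, matching E[M] for mrr@5 above
```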

API Reference

You can also import individual functions directly:

from trm import evaluate, build_tie_groups

trm.evaluate(scores, is_relevant, metrics=None, k_list=None)

Compute tie-aware retrieval metrics over a set of queries.

Parameters:

  • scores (list of list of float): Per-query relevance scores for each candidate document.
  • is_relevant (list of list of bool): Per-query binary relevance labels.
  • metrics (list of str, optional): Metrics to compute. Supported: "ndcg", "mrr", "map", "recall", "precision", "f1", "hits". Default: ["ndcg", "mrr", "map", "recall"].
  • k_list (list of int, optional): Cutoff values. Default: [1, 3, 5, 10, 20, 50, 100].

Returns: EvaluationOutput with:

  • .metrics[metric_name][k] → TieAwareResult (macro-averaged)
  • .per_query[metric_name][k] → list of per-query TieAwareResult
  • .to_dict() → flat dictionary for logging

TieAwareResult

Attribute Description
.expected E[M] — expected score over all tie orderings
.oblivious M_obl — tie-oblivious (index-preserving) score
.maximum M_max — best-case score
.minimum M_min — worst-case score
.range M_max - M_min (Eq. 4)
.bias M_obl - E[M] (Eq. 5)
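The last two attributes are derivable from the first four. As a hedged sketch (the library's actual class may differ), plugging in the rounded ndcg@3 row from the Quick Start output:

```python
from dataclasses import dataclass

@dataclass
class TieAwareResultSketch:
    """Illustrative stand-in for trm's TieAwareResult; fields follow the table above."""
    expected: float
    oblivious: float
    maximum: float
    minimum: float

    @property
    def range(self) -> float:
        return self.maximum - self.minimum  # Eq. 4

    @property
    def bias(self) -> float:
        return self.oblivious - self.expected  # Eq. 5

r = TieAwareResultSketch(expected=0.4623, oblivious=0.3869, maximum=0.6934, minimum=0.3066)
print(f"range={r.range:.4f}  bias={r.bias:.4f}")  # range=0.3868  bias=-0.0754
```

(The Quick Start prints range=0.3869 because it computes from unrounded values; the four-decimal inputs here give 0.3868.)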

trm.build_tie_groups(scores, is_relevant)

Build tie groups from raw scores and relevance labels.

Returns: list of (group_size, num_relevant) tuples sorted by descending score.
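A minimal sketch of the documented behavior for a single query (the library's implementation may differ):

```python
def build_tie_groups_sketch(scores, is_relevant):
    """Group candidates by identical score; return (group_size, num_relevant)
    tuples ordered by descending score, mirroring trm.build_tie_groups."""
    groups = {}
    for score, rel in zip(scores, is_relevant):
        size, num_rel = groups.get(score, (0, 0))
        groups[score] = (size + 1, num_rel + int(rel))
    return [groups[s] for s in sorted(groups, reverse=True)]

print(build_tie_groups_sketch(
    [0.99, 0.97, 0.97, 0.97, 0.95],
    [False, True, False, True, False],
))  # [(1, 0), (3, 2), (1, 0)]
```

With the Quick Start scores, the three docs at 0.97 form a single group of size 3 containing both relevant docs.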

Supported Metrics

Metric        Key          Paper Reference
nDCG@k        "ndcg"       Eq. 14-16
MRR@k         "mrr"        Eq. 17-21
MAP@k         "map"        Eq. 22-24
Recall@k      "recall"     Eq. 10
Precision@k   "precision"  Eq. 11
F1@k          "f1"         Eq. 12
Hits@k        "hits"       Eq. 9
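For set-based metrics such as recall, a tie group that straddles the cutoff k turns the expectation into an average over which tied docs land above the cutoff, since each same-size subset of the group is equally likely under a uniform random ordering. An independent check for recall@3 in the Quick Start (rank 1 non-relevant, ranks 2-3 filled by 2 of the 3 tied docs):

```python
from itertools import combinations
from statistics import mean

tie_group = [True, False, True]  # docs scored 0.97; two of three relevant
total_relevant = 2               # relevant docs in the whole candidate list

def recall_at_3(above_cutoff):
    # Rank 1 (score 0.99) is non-relevant; ranks 2-3 hold the chosen tied docs.
    return sum(above_cutoff) / total_relevant

expected = mean(recall_at_3(c) for c in combinations(tie_group, 2))
print(round(expected, 4))  # 0.6667, matching E[M] for recall@3 above
```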

Citation

@inproceedings{yang2026reliable,
    title     = {Reliable Evaluation Protocol for Low-Precision Retrieval},
    author    = {Yang, Kisu and Jang, Yoonna and Jang, Hwanseok and Choi, Kenneth and Augenstein, Isabelle and Lim, Heuiseok},
    booktitle = {Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL)},
    year      = {2026},
}

License

Apache License 2.0
