ranx: A Blazing Fast Python Library for Ranking Evaluation and Comparison
🔥 News
🤖 Dev Bulletin
- [ver 0.1.7]
  - Creating big Qrels and Run from Python dictionaries, Pandas DataFrames, or loading them from files is now much faster.
  - It's now possible to load/save Qrels and Run from/to JSON files using Qrels.from_file(path, type="json") and Qrels.save(path, type="json") (same for Run).
- [ver 0.1.6]
  - It's now possible to export a Report as a Python dictionary with Report.to_dict() and save it as a JSON file with Report.save(path). More details here.
- ranx now works on Google Colab. Unfortunately, Google Colab takes some time to compile the Numba functions the first time they are called.
⚡️ Introduction
ranx is a library of fast ranking evaluation metrics implemented in Python, leveraging Numba for high-speed vector operations and automatic parallelization.
It allows you to compare different runs, perform statistical tests, and export a LaTeX table for your scientific publications.
We strongly encourage you to check the example folder to learn how to use ranx in just a few minutes.
✨ Available Metrics
- Hits
- Precision
- Recall
- rPrecision
- Mean Reciprocal Rank (MRR)
- Mean Average Precision (MAP)
- Normalized Discounted Cumulative Gain (NDCG)
The metrics have been tested against TREC Eval for correctness.
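Metrics are requested by lowercase string identifiers, and a rank cutoff can be appended with @k, as in the Usage section below. A minimal sketch of the identifier format:
# Metric identifiers as used later in this README;
# appending "@k" restricts the metric to the top k retrieved documents.
single_metric = "ndcg@5"
several_metrics = ["map@100", "mrr@100", "ndcg@10"]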
🔌 Installation
pip install ranx
💡 Usage
Create Qrels and Run
from ranx import Qrels, Run, evaluate, compare
qrels = Qrels()
qrels.add_multi(
q_ids=["q_1", "q_2"],
doc_ids=[
["doc_12", "doc_25"], # q_1 relevant documents
["doc_11", "doc_2"], # q_2 relevant documents
],
scores=[
[5, 3], # q_1 relevance judgements
[6, 1], # q_2 relevance judgements
],
)
run = Run()
run.add_multi(
q_ids=["q_1", "q_2"],
doc_ids=[
["doc_12", "doc_23", "doc_25", "doc_36", "doc_32", "doc_35"],
["doc_12", "doc_11", "doc_25", "doc_36", "doc_2", "doc_35"],
],
scores=[
[0.9, 0.8, 0.7, 0.6, 0.5, 0.4],
[0.9, 0.8, 0.7, 0.6, 0.5, 0.4],
],
)
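As mentioned in the News section, Qrels and Run can also be saved to and loaded from JSON files. A minimal sketch using the qrels and run created above (the file paths are just placeholders):
# Save to JSON and load back, per the 0.1.7 feature described in the News section
# (the paths below are placeholder examples).
qrels.save("qrels.json", type="json")
run.save("run.json", type="json")

qrels = Qrels.from_file("qrels.json", type="json")
run = Run.from_file("run.json", type="json")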
Evaluate
# Compute score for a single metric
evaluate(qrels, run, "ndcg@5")
>>> 0.7861
# Compute scores for multiple metrics at once
evaluate(qrels, run, ["map@5", "mrr"])
>>> {"map@5": 0.6416, "mrr": 0.75}
# Computed metric scores are saved in the Run object
run.mean_scores
>>> {"ndcg@5": 0.7861, "map@5": 0.6416, "mrr": 0.75}
# Access scores for each query
dict(run.scores)
>>> {"ndcg@5": {"q_1": 0.9430, "q_2": 0.6292},
"map@5": {"q_1": 0.8333, "q_2": 0.4500},
"mrr": {"q_1": 1.0000, "q_2": 0.5000}}
Compare
# Compare different runs and perform statistical tests
report = compare(
qrels=qrels,
runs=[run_1, run_2, run_3, run_4, run_5],
metrics=["map@100", "mrr@100", "ndcg@10"],
max_p=0.01 # P-value threshold
)
print(report)
Output:
# Model MAP@100 MRR@100 NDCG@10
--- ------- ---------- ---------- ----------
a model_1 0.3202ᵇ 0.3207ᵇ 0.3684ᵇᶜ
b model_2 0.2332 0.2339 0.239
c model_3 0.3082ᵇ 0.3089ᵇ 0.3295ᵇ
d model_4 0.3664ᵃᵇᶜ 0.3668ᵃᵇᶜ 0.4078ᵃᵇᶜ
e model_5 0.4053ᵃᵇᶜᵈ 0.4061ᵃᵇᶜᵈ 0.4512ᵃᵇᶜᵈ
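As mentioned in the News section, the resulting report can also be exported; a minimal sketch (the file path is a placeholder):
# Export the comparison report, per the 0.1.6 feature described in the News section.
report_dict = report.to_dict()   # report as a Python dictionary
report.save("report.json")       # save the report as a JSON file (placeholder path)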
📖 Examples
- Overview: This notebook shows the main features of ranx.
- Create Qrels and Run: This notebook shows different ways of creating Qrels and Run.
📚 Documentation
To be updated! Please, refer to the examples in the meantime.
🎓 Citation
If you use ranx to evaluate results for your scientific publication, please consider citing it:
@misc{ranx2021,
title = {ranx: A Blazing-Fast Python Library for Ranking Evaluation and Comparison},
author = {Bassani, Elias},
year = {2021},
publisher = {GitHub},
howpublished = {\url{https://github.com/AmenRa/ranx}},
}
🎁 Feature Requests
Would you like to see a new metric implemented? Please, open a new issue.
🤘 Want to contribute?
Would you like to contribute? Please, drop me an e-mail.
📄 License
ranx is open-source software licensed under the MIT license.