ranx: A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion
🔥 News
- 📌 [October 10, 2022] I released a new sharing platform for pre-computed runs called ranxhub, click here to learn more!
- [November 2, 2022] ranx 0.3.3 is out! This release adds support for changing the Qrels relevance level, i.e., the minimum relevance judgment score for a document to be considered relevant. You can now define metric-wise relevance levels by appending -l<num> to metric names (e.g., evaluate(qrels, run, ["map@100-l2", "ndcg-l3"])) or set the relevance level Qrels-wise with qrels.set_relevance_level(2); see the sketch after this list.
- [October 10, 2022] ranx 0.3 is out! This release adds integration with ranxhub, a new sharing platform for pre-computed runs. Click here for a quick example. Click here to learn how to share your own runs with the community and lead by example!
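A minimal sketch of the relevance-level feature from the November 2 note above, using toy graded judgments made up for illustration:
from ranx import Qrels, Run, evaluate

# Toy graded judgments made up for illustration
qrels = Qrels({"q_1": {"d_1": 1, "d_2": 2, "d_3": 3}})
run = Run({"q_1": {"d_1": 0.9, "d_2": 0.8, "d_3": 0.7}})

# Metric-wise: only judgments >= 2 count as relevant for this metric
evaluate(qrels, run, "map@100-l2")

# Qrels-wise: set the relevance level once for all subsequent evaluations
qrels.set_relevance_level(2)
evaluate(qrels, run, "map@100")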
⚡️ Introduction
ranx ([raŋks]) is a library of fast ranking evaluation metrics implemented in Python, leveraging Numba for high-speed vector operations and automatic parallelization. It offers a user-friendly interface to evaluate and compare Information Retrieval and Recommender Systems. ranx allows you to perform statistical tests and export LaTeX tables for your scientific publications. Moreover, ranx provides several fusion algorithms and normalization strategies, and an automatic fusion optimization functionality. ranx was featured in ECIR 2022 and CIKM 2022.
If you use ranx to evaluate results or to conduct experiments involving fusion for your scientific publication, please consider citing it: evaluation bibtex, fusion bibtex.
For a quick overview, follow the Usage section.
For an in-depth overview, follow the Examples section.
✨ Features
Metrics
- Hits
- Hit Rate
- Precision
- Recall
- F1
- r-Precision
- Bpref
- Rank-biased Precision (RBP)
- Mean Reciprocal Rank (MRR)
- Mean Average Precision (MAP)
- Normalized Discounted Cumulative Gain (NDCG)
The metrics have been tested against TREC Eval for correctness.
Statistical Tests
Please refer to Smucker et al., Carterette, and Fuhr for additional information on statistical tests for Information Retrieval.
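A rough sketch of how a specific test can be selected in practice: recent ranx versions expose a stat_test argument on compare (the argument name and the "fisher" value below are assumptions of this sketch; check the documentation of your installed version). The toy qrels and runs are made up for illustration.
from ranx import Qrels, Run, compare

# Toy data made up purely for illustration
qrels = Qrels({"q_1": {"d_1": 1}, "q_2": {"d_2": 1}})
run_1 = Run({"q_1": {"d_1": 0.9, "d_3": 0.8}, "q_2": {"d_2": 0.9, "d_3": 0.8}})
run_2 = Run({"q_1": {"d_3": 0.9, "d_1": 0.8}, "q_2": {"d_3": 0.9, "d_2": 0.8}})

# Assumption: stat_test selects the significance test, e.g., "student" for the
# paired t-test or "fisher" for Fisher's randomization test
report = compare(
    qrels=qrels,
    runs=[run_1, run_2],
    metrics=["ndcg@10"],
    stat_test="fisher",
    max_p=0.01,
)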
Off-the-shelf Qrels
You can load qrels from ir-datasets as simply as:
from ranx import Qrels

qrels = Qrels.from_ir_datasets("msmarco-document/dev")
A full list of the available qrels is provided here.
Off-the-shelf Runs
You can load runs from ranxhub as simply as:
from ranx import Run

run = Run.from_ranxhub("run-id")
A full list of the available runs is provided here.
Fusion Algorithms
Name | Name | Name | Name | Name |
---|---|---|---|---|
CombMIN | CombMNZ | RRF | MAPFuse | BordaFuse |
CombMED | CombGMNZ | RBC | PosFuse | Weighted BordaFuse |
CombANZ | ISR | WMNZ | ProbFuse | Condorcet |
CombMAX | Log_ISR | Mixed | SegFuse | Weighted Condorcet |
CombSUM | LogN_ISR | BayesFuse | SlideFuse | Weighted Sum |
Please refer to the documentation for further details.
Normalization Strategies
Please refer to the documentation for further details.
🔌 Requirements
python>=3.8
As of v0.3.5, ranx requires python>=3.8.
💾 Installation
pip install ranx
💡 Usage
Create Qrels and Run
from ranx import Qrels, Run
qrels_dict = { "q_1": { "d_12": 5, "d_25": 3 },
"q_2": { "d_11": 6, "d_22": 1 } }
run_dict = { "q_1": { "d_12": 0.9, "d_23": 0.8, "d_25": 0.7,
"d_36": 0.6, "d_32": 0.5, "d_35": 0.4 },
"q_2": { "d_12": 0.9, "d_11": 0.8, "d_25": 0.7,
"d_36": 0.6, "d_22": 0.5, "d_35": 0.4 } }
qrels = Qrels(qrels_dict)
run = Run(run_dict)
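Qrels and runs can also be loaded from files; a minimal sketch, assuming the from_file constructors and TREC-formatted input (file names are hypothetical):
from ranx import Qrels, Run

# Assumption: from_file parses TREC-style qrels/run files; paths are made up
trec_qrels = Qrels.from_file("qrels.trec", kind="trec")
trec_run = Run.from_file("run.trec", kind="trec")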
Evaluate
from ranx import evaluate
# Compute score for a single metric
evaluate(qrels, run, "ndcg@5")
>>> 0.7861
# Compute scores for multiple metrics at once
evaluate(qrels, run, ["map@5", "mrr"])
>>> {"map@5": 0.6416, "mrr": 0.75}
Compare
from ranx import compare
# Compare different runs and perform Two-sided Paired Student's t-Test
report = compare(
qrels=qrels,
runs=[run_1, run_2, run_3, run_4, run_5],
metrics=["map@100", "mrr@100", "ndcg@10"],
max_p=0.01 # P-value threshold
)
Output:
print(report)
# Model MAP@100 MRR@100 NDCG@10
--- ------- -------- -------- ---------
a model_1 0.320ᵇ 0.320ᵇ 0.368ᵇᶜ
b model_2 0.233 0.234 0.239
c model_3 0.308ᵇ 0.309ᵇ 0.330ᵇ
d model_4 0.366ᵃᵇᶜ 0.367ᵃᵇᶜ 0.408ᵃᵇᶜ
e model_5 0.405ᵃᵇᶜᵈ 0.406ᵃᵇᶜᵈ 0.451ᵃᵇᶜᵈ
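The report can also be exported as a LaTeX table for publications, as mentioned in the introduction; a minimal sketch, assuming the to_latex method name (check the documentation of your version):
# Export the comparison table as LaTeX (method name assumed)
print(report.to_latex())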
Fusion
from ranx import fuse, optimize_fusion
best_params = optimize_fusion(
qrels=train_qrels,
runs=[train_run_1, train_run_2, train_run_3],
norm="min-max", # The norm. to apply before fusion
method="wsum", # The fusion algorithm to use (Weighted Sum)
metric="ndcg@100", # The metric to maximize
)
combined_test_run = fuse(
runs=[test_run_1, test_run_2, test_run_3],
norm="min-max",
method="wsum",
params=best_params,
)
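If you do not need to optimize fusion parameters, runs can also be fused directly with any of the algorithms listed above; a minimal sketch using Reciprocal Rank Fusion, assuming the lowercase "rrf" method identifier (see the fusion documentation):
from ranx import fuse

# Fuse without learned parameters, e.g., with Reciprocal Rank Fusion
combined_test_run = fuse(
    runs=[test_run_1, test_run_2, test_run_3],
    norm="min-max",
    method="rrf",  # identifier assumed; check the fusion docs
)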
📖 Examples
The following examples are available:
- Overview
- Qrels and Run
- Evaluation
- Comparison and Report
- Fusion
- Share your runs with ranxhub
📚 Documentation
Browse the documentation for more details and examples.
🎓 Citation
If you use ranx to evaluate results for your scientific publication, please consider citing our ECIR 2022 paper:
BibTeX
@inproceedings{DBLP:conf/ecir/Bassani22,
author = {Elias Bassani},
title = {ranx: {A} Blazing-Fast Python Library for Ranking Evaluation and Comparison},
booktitle = {{ECIR} {(2)}},
series = {Lecture Notes in Computer Science},
volume = {13186},
pages = {259--264},
publisher = {Springer},
year = {2022}
}
If you use the fusion functionalities provided by ranx for conducting the experiments of your scientific publication, please consider citing our CIKM 2022 paper:
BibTeX
@inproceedings{DBLP:conf/cikm/BassaniR22,
author = {Elias Bassani and
Luca Romelli},
title = {ranx.fuse: {A} Python Library for Metasearch},
booktitle = {{CIKM}},
pages = {4808--4812},
publisher = {{ACM}},
year = {2022}
}
🎁 Feature Requests
Would you like to see other features implemented? Please open a feature request.
🤘 Want to contribute?
Would you like to contribute? Please drop me an e-mail.
📄 License
ranx is an open-sourced software licensed under the MIT license.