
Fast information retrieval evaluation metrics in numba


Information Retrieval Evaluation with numba

License: MIT

This project provides simple and tested numba implementations of popular information retrieval metrics. The source code is clear and easy to understand. All functions have pydoc help strings.

The metrics can be used to determine the quality of rankings that are returned by a retrieval or recommender system.

Alternative library

If you don't need numba and want a library written in pure Python, check out plurch/ir_evaluation.

Installation

Requirements:

Python 3.11 or 3.12

numba>=0.60.0

ir_eval_numba can be installed from PyPI with:

pip install ir_eval_numba

Usage

Metric functions will generally accept the following arguments:

actual (npt.NDArray[IntType]): An array of integer ground truth relevant items.

predicted (npt.NDArray[IntType]): An array of integer predicted items, ordered by relevance.

k (int): The number of top predictions to consider.

Each function returns the computed metric as a float.

Unit tests

Unit tests with easy-to-follow scenarios and sample data are included.

Run unit tests

uv run pytest

Metrics

Recall

Recall is defined as the ratio of the total number of relevant items retrieved within the top-k predictions to the total number of relevant items in the entire database.

Usage scenario: Prioritize returning all relevant items from the database. Early retrieval stages, where many candidates are returned, should focus on this metric.

from ir_eval_numba.metrics import recall
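To make the definition concrete, here is a minimal plain-Python sketch of recall at k. It is not the library's numba implementation; the name `recall_sketch` and the choice to return 0.0 when there are no relevant items are assumptions made for illustration.

```python
def recall_sketch(actual, predicted, k):
    """Hypothetical reference: share of relevant items found in the top-k predictions."""
    relevant = set(actual)
    if not relevant:
        return 0.0  # assumption: no relevant items yields 0.0
    # Count distinct relevant items among the first k predictions.
    hits = len(relevant.intersection(predicted[:k]))
    return hits / len(relevant)


actual = [1, 2, 3, 4]
predicted = [3, 9, 1, 7, 2]
print(recall_sketch(actual, predicted, 3))  # 2 of the 4 relevant items are in the top 3
```

Raising k can only keep recall the same or increase it, which is why this metric suits stages that return many candidates.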

Precision

Precision is defined as the ratio of the total number of relevant items retrieved within the top-k predictions to the total number of returned items (k).

Usage scenario: Minimize false positives in predictions. Later ranking stages should focus on this metric.

from ir_eval_numba.metrics import precision
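The definition can be written out as a short plain-Python sketch. This is an illustration only, not the library's numba code; the name `precision_sketch` and the handling of non-positive k are assumptions.

```python
def precision_sketch(actual, predicted, k):
    """Hypothetical reference: share of the top-k predictions that are relevant."""
    if k <= 0:
        return 0.0  # assumption: a non-positive k yields 0.0
    relevant = set(actual)
    hits = len(relevant.intersection(predicted[:k]))
    # Divide by k, the number of returned items, as in the definition above.
    return hits / k


actual = [1, 2, 3, 4]
predicted = [3, 9, 1, 7, 2]
print(precision_sketch(actual, predicted, 3))  # 2 of the top 3 predictions are relevant
```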

F1 Score

The F1-score is calculated as the harmonic mean of precision and recall. The F1-score provides a balanced view of a system's performance by taking into account both precision and recall.

Usage scenario: Use when finding all relevant documents is just as important as minimizing irrelevant ones (e.g. in information retrieval).

from ir_eval_numba.metrics import f1_score
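A minimal plain-Python sketch of the harmonic mean described above follows. It is not the library's implementation; the name `f1_sketch` and the zero-division fallbacks are assumptions.

```python
def f1_sketch(actual, predicted, k):
    """Hypothetical reference: harmonic mean of precision@k and recall@k."""
    relevant = set(actual)
    if not relevant or k <= 0:
        return 0.0  # assumption: degenerate inputs yield 0.0
    hits = len(relevant.intersection(predicted[:k]))
    precision = hits / k
    recall = hits / len(relevant)
    if precision + recall == 0:
        return 0.0  # no hits at all, avoid dividing by zero
    return 2 * precision * recall / (precision + recall)


# precision = 2/3 and recall = 2/4, so F1 = 4/7
print(f1_sketch([1, 2, 3, 4], [3, 9, 1, 7, 2], 3))
```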

Average Precision (AP)

Average Precision is calculated as the mean of precision values at each rank where a relevant item is retrieved within the top k predictions.

Usage scenario: Evaluates how well relevant items are ranked within the top-k returned list.

from ir_eval_numba.metrics import average_precision
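The sketch below follows the wording above literally: it averages precision over the ranks at which a relevant item appears. Other normalizations exist (for example dividing by the total number of relevant items), and the library may use a different one, so treat `average_precision_sketch` as a hypothetical illustration rather than the library's behavior.

```python
def average_precision_sketch(actual, predicted, k):
    """Hypothetical reference: mean of precision values at each relevant rank in the top k."""
    relevant = set(actual)
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(predicted[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    # Assumption: normalize by the number of hits; 0.0 if nothing relevant was found.
    return precision_sum / hits if hits else 0.0


# Relevant items sit at ranks 1, 3 and 5: (1/1 + 2/3 + 3/5) / 3 = 34/45
print(average_precision_sketch([1, 2, 3, 4], [3, 9, 1, 7, 2], 5))
```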

Mean Average Precision (MAP)

MAP is the mean of the Average Precision (AP - see above) scores computed for multiple queries.

Usage scenario: Reflects overall performance of AP for multiple queries. A good holistic metric that balances the tradeoff between recall and precision.

from ir_eval_numba.metrics import mean_average_precision
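As an illustration, MAP can be sketched in plain Python by averaging per-query AP scores. The function names, the list-of-lists input shape, and the AP normalization (dividing by the number of hits) are assumptions here and may differ from the library.

```python
def average_precision_sketch(actual, predicted, k):
    """Hypothetical AP helper: mean of precision values at each relevant rank."""
    relevant = set(actual)
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(predicted[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision_sketch(actual_list, predicted_list, k):
    """Hypothetical reference: mean of AP over paired queries."""
    scores = [
        average_precision_sketch(actual, predicted, k)
        for actual, predicted in zip(actual_list, predicted_list)
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Query 1 has AP = 5/6 and query 2 has AP = 1/2, so MAP = 2/3
actual_list = [[1, 2], [4]]
predicted_list = [[1, 3, 2], [5, 4, 6]]
print(mean_average_precision_sketch(actual_list, predicted_list, 3))
```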

Normalized Discounted Cumulative Gain (nDCG)

nDCG evaluates the quality of a predicted ranking by comparing it to an ideal ranking (i.e., perfect ordering of relevant items). It accounts for the position of relevant items in the ranking, giving higher weight to items appearing earlier.

Usage scenario: Prioritize returning relevant items higher in the returned top-k list. A good holistic metric.

from ir_eval_numba.metrics import ndcg
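A minimal sketch of nDCG with binary relevance is shown below, assuming the common log2 position discount and an ideal ranking that places all relevant items first. The name `ndcg_sketch` is hypothetical, and the library's exact gain and discount choices are not confirmed by this document.

```python
import math


def ndcg_sketch(actual, predicted, k):
    """Hypothetical reference: binary-relevance nDCG with a log2 discount."""
    relevant = set(actual)
    # DCG: each relevant item at 1-based rank r contributes 1 / log2(r + 1).
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(predicted[:k], start=1)
        if item in relevant
    )
    # Ideal DCG: all relevant items (at most k of them) occupy the top ranks.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


print(ndcg_sketch([1, 2], [1, 2, 3], 3))  # relevant items ranked first
print(ndcg_sketch([1, 2], [3, 1, 2], 3))  # same items pushed down one rank, lower score
```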

Reciprocal Rank (RR)

Reciprocal Rank (RR) assigns a score based on the reciprocal of the rank at which the first relevant item is found.

Usage scenario: Useful when the topmost recommendation holds significant value. Use this when users are presented with one or very few returned results.

from ir_eval_numba.metrics import reciprocal_rank
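The idea can be sketched in a few lines of plain Python. The name `reciprocal_rank_sketch`, the `k` cutoff, and the 0.0 result when no relevant item is found are assumptions for illustration, not the library's documented behavior.

```python
def reciprocal_rank_sketch(actual, predicted, k):
    """Hypothetical reference: 1 / rank of the first relevant item in the top k."""
    relevant = set(actual)
    for rank, item in enumerate(predicted[:k], start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0  # assumption: no relevant item in the top k yields 0.0


# The first relevant item (2) appears at rank 3, so the score is 1/3
print(reciprocal_rank_sketch([1, 2], [5, 6, 2, 1], 4))
```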

Mean Reciprocal Rank (MRR)

MRR calculates the mean of the Reciprocal Rank (RR) scores for a set of queries.

Usage scenario: Reflects overall performance of RR for multiple queries.

from ir_eval_numba.metrics import mean_reciprocal_rank
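For illustration, MRR can be sketched as the average of per-query RR scores. The function names and the list-of-lists input shape are assumptions and may not match the library's signature.

```python
def reciprocal_rank_sketch(actual, predicted, k):
    """Hypothetical RR helper: 1 / rank of the first relevant item in the top k."""
    relevant = set(actual)
    for rank, item in enumerate(predicted[:k], start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank_sketch(actual_list, predicted_list, k):
    """Hypothetical reference: mean of RR over paired queries."""
    scores = [
        reciprocal_rank_sketch(actual, predicted, k)
        for actual, predicted in zip(actual_list, predicted_list)
    ]
    return sum(scores) / len(scores) if scores else 0.0


# RR per query: 1 (rank 1), 1/4 (rank 4), 0 (no hit), so MRR = 1.25 / 3
actual_list = [[1], [2], [3]]
predicted_list = [[1, 7, 8, 9], [7, 8, 9, 2], [7, 8, 9, 6]]
print(mean_reciprocal_rank_sketch(actual_list, predicted_list, 4))
```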

Online Resources

Pinecone - Evaluation Measures in Information Retrieval

Spot Intelligence - Mean Average Precision

Spot Intelligence - Mean Reciprocal Rank

google-research/ials
