Simple library to calculate Rank-biased Overlap between two lists

Rank-biased Overlap (RBO)

This project contains a Python implementation of Rank-Biased Overlap (RBO) from: Webber, William, Alistair Moffat, and Justin Zobel. "A similarity measure for indefinite rankings." ACM Transactions on Information Systems (TOIS) 28.4 (2010): 20.

Introduction

For a more general introduction, please refer to this blog post.

RBO compares two ranked lists and returns a value between zero and one that quantifies their similarity. An RBO of zero indicates the lists are completely different, and an RBO of one means they are identical. The terms 'different' and 'identical' require a little more clarification.

Given two ranked lists:

A = ["a", "b", "c", "d", "e"]
B = ["e", "d", "c", "b", "a"]

Both lists rank the same 5 items ("a", "b", "c", "d" and "e"), but in exactly opposite order. In this case the similarity between A and B should be larger than 0 (they contain the same items, i.e., they are conjoint) but smaller than 1 (the order of the items is different). If there is a third ranked list

C = ["f", "g", "h", "i", "j"]

which ranks 5 totally different items, then the similarity between A and C should be 0. A similarity measure needs to handle such non-conjoint cases as well.

The RBO measure can handle ranked lists with different lengths as well, with proper extrapolation. For example, the RBO between the list A and list

D = ["a", "b", "c", "d", "e", "f", "g"]

will be 1.
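The cases above can be checked with a minimal sketch that averages, over each depth d, the fraction of items shared by the two top-d prefixes. This is not the library's code: `average_overlap` is a hypothetical helper, it weights all depths equally, it assumes no item repeats within a list, and it simply truncates to the shorter list instead of performing the extrapolation described in the paper.

```python
def average_overlap(s, t):
    """Mean, over depths 1..k, of the share of items common to both top-d prefixes."""
    k = min(len(s), len(t))  # simplification: compare only ranks present in both lists
    if k == 0:
        return 0.0
    seen_s, seen_t = set(), set()
    total = 0.0
    for d in range(1, k + 1):
        seen_s.add(s[d - 1])
        seen_t.add(t[d - 1])
        total += len(seen_s & seen_t) / d  # agreement at depth d
    return total / k


A = ["a", "b", "c", "d", "e"]
B = ["e", "d", "c", "b", "a"]
C = ["f", "g", "h", "i", "j"]
D = ["a", "b", "c", "d", "e", "f", "g"]

ab = average_overlap(A, B)  # same items, reversed order: strictly between 0 and 1
ac = average_overlap(A, C)  # no shared items: 0
ad = average_overlap(A, D)  # D agrees with A at every shared depth: 1
print(ab, ac, ad)
```

In this sketch the reversed pair scores above 0 only because the prefixes start to share items from depth 3 onwards, which is the sense in which order matters.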

Usage

Installation using pip

To install the RBO module into the current interpreter with pip:

pip install rbo

Computing RBO

The RankingSimilarity class implements the different flavours of RBO, with references to the corresponding equations in the paper. The example below computes the similarity of two ranked lists S and T:

In [1]: import rbo

In [2]: S = [1, 2, 3]

In [3]: T = [1, 3, 2]

In [4]: rbo.RankingSimilarity(S, T).rbo()
Out[4]: 0.8333333333333334
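The value 0.8333 is consistent with averaging the prefix agreement over the three depths with equal weights. The README does not state the default weighting, so treat the derivation below as an assumption that happens to reproduce the output, where $A_d$ denotes the fraction of items shared by the top-$d$ prefixes of S and T:

```latex
\begin{aligned}
A_1 &= \frac{|\{1\} \cap \{1\}|}{1} = 1, \\
A_2 &= \frac{|\{1,2\} \cap \{1,3\}|}{2} = \frac{1}{2}, \\
A_3 &= \frac{|\{1,2,3\} \cap \{1,3,2\}|}{3} = 1, \\
\frac{A_1 + A_2 + A_3}{3} &= \frac{5}{6} \approx 0.8333.
\end{aligned}
```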

Accepted data types are Python lists and NumPy arrays. A pandas Series can be used by passing its underlying NumPy array, as shown below. This restriction is necessary because using [] on a pandas Series queries the index, which might not number items contiguously, or might even be non-numeric.

In [1]: import pandas as pd

In [2]: import rbo

In [3]: S = [1, 2, 3]

In [4]: U = pd.Series([1, 3, 2])

In [5]: rbo.RankingSimilarity(S, U.values).rbo()
Out[5]: 0.8333333333333334

Computing extrapolated RBO

There is an extension of the vanilla RBO implementation in which we extrapolate from the visible lists, assuming that the degree of agreement seen up to depth $k$ continues indefinitely.

This extrapolated version is implemented as the RankingSimilarity.rbo_ext() method.
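As a rough illustration of the extrapolation idea, the sketch below follows the form given in Webber et al. for two prefixes evaluated to the same depth $k$: a geometrically weighted sum of the observed agreements plus the depth-$k$ agreement carried forward. It is not the library's implementation. The name `rbo_ext_sketch` and its explicit `p` argument are hypothetical, lists of different lengths are simply truncated to the shorter one, and items are assumed not to repeat within a list.

```python
def rbo_ext_sketch(s, t, p):
    """Extrapolated RBO for two prefixes compared to the same depth k (illustrative only)."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    k = min(len(s), len(t))  # assumption: compare only up to the shared depth
    if k == 0:
        return 0.0
    seen_s, seen_t = set(), set()
    weighted_sum = 0.0
    agreement = 0.0
    for d in range(1, k + 1):
        seen_s.add(s[d - 1])
        seen_t.add(t[d - 1])
        agreement = len(seen_s & seen_t) / d  # X_d / d, the agreement at depth d
        weighted_sum += agreement * p ** d
    # Observed part, plus the depth-k agreement assumed to continue beyond depth k.
    return (1 - p) / p * weighted_sum + agreement * p ** k


same = rbo_ext_sketch([1, 2, 3], [1, 2, 3], p=0.9)
disjoint = rbo_ext_sketch([1, 2, 3], [4, 5, 6], p=0.9)
swapped = rbo_ext_sketch([1, 2, 3], [1, 3, 2], p=0.9)
print(same, disjoint, swapped)
```

Smaller values of `p` put more weight on the top ranks, so a disagreement near the head of the lists lowers the score more than one near the tail.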

Development

Refer to the Makefile for supplementary development tasks, e.g., running unit tests or checking for proper packaging. Please let me know if there is any issue.

