
SM-RS benchmark: data loaders and canonical task evaluators for the single- and multi-objective recommendations dataset.


SM-RS

The single- and multi-objective recommendations benchmark — self-declared user propensities (relevance · diversity · novelty · exploration) linked to contextual impressions, item selections, and perceived quality.


SM-RS is, to our knowledge, the only public recommender-systems dataset linking users' self-declared propensities toward beyond-accuracy objectives with contextual impressions, item selections, and explicit perceived-quality judgments. This repository is the benchmark code: data loaders and the canonical evaluator for each task, so everyone reports comparable numbers. The data lives on Hugging Face; the leaderboard lives in the dataset card.

Dataset & leaderboard: 🤗 pdokoupil/SM-RS

Cite BOTH the SM-RS 2.0 (TORS'26) and SM-RS (SIGIR'24) papers; see Citing.

Install

pip install sm-rs                 # core (numpy / pandas / scikit-learn)
pip install "sm-rs[lstm]"         # + TensorFlow, for the optional LSTM baseline

Quick start (Task 1: propensity estimation)

import numpy as np
from smrs.tasks import task1_propensity as t1

# reproducible 80/20 split (seed 2024), reading a local SM-RS copy for now
X_train, X_test, y_train, y_test = t1.split(data_dir="path/to/sm-rs")

# ... train your estimator, produce an (N, 4) array of [rel, div, nov, exp] ...
predictions = np.full((len(y_test), 4), 0.25)        # placeholder

print(t1.evaluate(predictions, y_test))   # {'MAE': ..., 'MSE': ..., 'KLDiv': ...}
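Before calling `t1.evaluate`, it can help to sanity-check a model's output with a local re-implementation of the three Task 1 metrics. The sketch below is not the package's evaluator: `propensity_metrics` is a hypothetical helper, and it assumes each row is a distribution over the four objectives, KL computed as KL(true ‖ pred) per user and then averaged, and a small `eps` clip to avoid log(0). The canonical numbers always come from `t1.evaluate`.

```python
import numpy as np

def propensity_metrics(pred, true, eps=1e-12):
    """Illustrative Task 1 scoring (hypothetical helper, NOT the smrs evaluator).

    pred, true: (N, 4) arrays of [rel, div, nov, exp]; each row is assumed
    to be a distribution over the four objectives (sums to 1).
    """
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    err = pred - true
    mae = float(np.abs(err).mean())          # averaged over users and objectives
    mse = float((err ** 2).mean())
    # Assumed convention: KL(true || pred) per user, then averaged over users;
    # eps clipping avoids log(0) when an objective has zero mass.
    p = np.clip(true, eps, None)
    q = np.clip(pred, eps, None)
    kl = float(np.sum(p * np.log(p / q), axis=1).mean())
    return {"MAE": mae, "MSE": mse, "KLDiv": kl}

# Example: the uniform placeholder scored against a user who only cares about rel + div
print(propensity_metrics([[0.25] * 4], [[0.5, 0.5, 0.0, 0.0]]))
```

The KL direction and smoothing are the main places a local check could disagree with the official scorer, so treat large discrepancies in that column as a convention mismatch first.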

The six tasks

All six draw on the same dataset; they differ in what you predict and how it's scored. The canonical evaluator for each lives in smrs.tasks.*.

| # | Task | You produce | Metric(s) | Evaluator |
|---|---|---|---|---|
| 1 | Propensity estimation | a 4-vector propensity per user | MAE · MSE · KL | task1_propensity |
| 2 | Results proportionality | a top-k list matching target propensities | MAE · KL · wSUM · Pearson ρ | task2_proportionality¹ |
| 3 | Selections-aware reranking | a reranked impression list | nDCG@10 · Precision@5 | task3_reranking |
| 4 | Diversity-metric definition | per-list diversity values | MAE · MSE · KL | task4_diversity |
| 5 | Perceived quality (5.1 rel / 5.2 div / 5.3 nov / 5.4 ser) | per-objective perception | MAE · MSE · Kendall τ | task5_perceived |
| 6 | Satisfaction (6.1 / 6.2) | overall satisfaction | MAE · MSE · Kendall τ | task6_satisfaction |

¹ Task 2's metric layer is implemented, but turning a top-k list into achieved objective proportions requires the derived matrices (see Data); that step will ship together with the source→rating-matrix builder.
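For Task 3, the two list metrics are standard ranking measures. The sketch below shows one common binary-relevance formulation, assuming an item counts as relevant when the user selected it from that impression; `task3_reranking` is the canonical scorer and may differ in details such as tie handling or how short lists are treated.

```python
import numpy as np

def ndcg_at_k(ranked, selected, k=10):
    """Binary-relevance nDCG@k: an item has gain 1 if the user selected it, else 0."""
    selected = set(selected)
    gains = np.array([1.0 if item in selected else 0.0 for item in ranked[:k]])
    discounts = 1.0 / np.log2(np.arange(2, len(gains) + 2))   # positions 1..k -> log2(2..k+1)
    dcg = float(np.sum(gains * discounts))
    # Ideal ordering: every selected item present in the list, placed first
    n_ideal = min(sum(1 for item in ranked if item in selected), k)
    idcg = float(np.sum(1.0 / np.log2(np.arange(2, n_ideal + 2))))
    return dcg / idcg if idcg > 0 else 0.0

def precision_at_k(ranked, selected, k=5):
    """Share of the top-k positions occupied by selected items (always divides by k)."""
    selected = set(selected)
    return sum(1 for item in ranked[:k] if item in selected) / k

# A reranker that moves the selected item from position 2 to position 1 lifts nDCG to 1.0
print(ndcg_at_k([7, 3, 9, 1], selected={3}), ndcg_at_k([3, 7, 9, 1], selected={3}))
```

Computing the ideal DCG from the selected items that actually appear in the impression list keeps nDCG in [0, 1] for a pure reranking task, where the candidate set is fixed.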

Data

Two layers:

  1. Core tables (the collected study data, CC-BY, hosted on Hugging Face): behaviors, propensities, objective_perceptions, criteria_values, comparative_diversity, users, movies, books. Items are referenced by ID.

    Auto-downloaded from the Hub and cached — no manual download:

    from smrs import data
    df = data.load("propensities")                 # downloads from HF, cached
    # offline / local copy (e.g. OSF download):
    df = data.load("propensities", data_dir="path/to/sm-rs")   # or set $SMRS_DATA_DIR
    

    Users of the 🤗 datasets library can equivalently call load_dataset("pdokoupil/SM-RS", "behaviors").

  2. Derived matrices (recomputed locally, not downloaded). The list-scoring tasks (2, 3) need per-item and per-pair artifacts: relevance via an item-item model, intra-list diversity via a distance matrix, and novelty via popularity. Rather than ship multi-GB blobs, or redistribute the third-party catalogs they come from, the benchmark recomputes them from a rating matrix you build from your own download of the public source datasets (movies: MovieLens 25M, the "Latest" snapshot at collection time, plus MovieLens Tag Genome 2021; books: goodbooks-10k):

    from smrs import derived
    art = derived.build_artifacts(rating_matrix_movies)   # {item_item, distance_matrix, mean_popularities}
    

    derived provides the deterministic pieces: popularity, cosine_distance (1 − cosine over item rating-vectors), and ease_item_item (EASE^R closed form, used as the relevance model). This keeps the benchmark lightweight and license-clean. The bit-exact original artifacts are archived on OSF (v2) for strict reproduction.

    Get the source datasets with the bundled fetcher, which downloads them from the official hosts (GroupLens, the goodbooks repo) under their respective licenses; alternatively, place the files in the destination directory manually:

    smrs-fetch --list                      # show sources + licenses, no download
    smrs-fetch --dest ./sm-rs-sources      # download MovieLens 25M, Tag Genome 2021, goodbooks-10k
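The `derived` module is described as providing popularity, cosine distance, and an EASE^R item-item relevance model. The following is an independent NumPy sketch of those ideas plus intra-list diversity over a distance matrix; it is not the package's code. The function names are hypothetical, popularity is assumed to mean the share of users with positive feedback on an item, `X` is a dense users × items 0/1 matrix, and `lam=500.0` is an arbitrary placeholder, not the pinned EASE λ.

```python
import numpy as np

def item_popularity(X):
    """Share of users with positive feedback on each item (X: users x items, 0/1)."""
    return X.mean(axis=0)

def item_cosine_distance(X, eps=1e-12):
    """1 - cosine similarity between the item columns of X."""
    norms = np.linalg.norm(X, axis=0)
    Xn = X / np.maximum(norms, eps)          # unit-length item vectors (all-zero columns stay zero)
    return 1.0 - Xn.T @ Xn

def ease_closed_form(X, lam=500.0):
    """EASE^R item-item weights: B = I - P diag(1/diag(P)), with P = (X^T X + lam*I)^-1."""
    G = X.T @ X + lam * np.eye(X.shape[1])
    P = np.linalg.inv(G)
    B = -P / np.diag(P)                      # column j divided by P[j, j]
    np.fill_diagonal(B, 0.0)                 # zero-diagonal constraint of EASE^R
    return B

def intra_list_diversity(items, D):
    """Mean pairwise distance between the items of one recommendation list."""
    idx = np.asarray(items)
    n = len(idx)
    if n < 2:
        return 0.0
    sub = D[np.ix_(idx, idx)]
    off_diag = ~np.eye(n, dtype=bool)        # ignore each item's distance to itself
    return float(sub[off_diag].mean())

X = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 0]], dtype=float)
D = item_cosine_distance(X)
B = ease_closed_form(X, lam=1.0)
print(item_popularity(X), intra_list_diversity([0, 1, 2], D))
```

With `B` computed, a user's relevance scores over all items are the row `X[u] @ B`. Inverting an items × items Gram matrix is cheap for goodbooks-10k but worth checking for memory on the MovieLens catalog.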
    

Why recompute? MovieLens may not be redistributed, and a 5 GB download hurts adoption. You obtain MovieLens/goodbooks under their own licenses; we ship only the study data, the id-maps (movies.json: movieId→imdbId; books.json: book_index→goodreads_id), and the recompute code. The canonical scorer is pinned (sources above; positive feedback = rating ≥ 3; EASE λ) so results stay comparable. For strict reproduction of the paper's numbers, use the OSF artifacts.
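The "positive feedback = rating ≥ 3" rule turns raw (user, item, rating) triples into the 0/1 matrix that the artifacts are computed from. A minimal sketch, assuming the ratings are already loaded as three parallel arrays; `binary_rating_matrix` is a hypothetical helper, not part of `smrs`. Items that received only low ratings keep an all-zero column rather than being dropped, so item indices stay aligned with the catalog seen in the ratings.

```python
import numpy as np

def binary_rating_matrix(user_ids, item_ids, ratings, threshold=3.0):
    """Dense users x items 0/1 matrix with 1 where rating >= threshold.

    Returns (X, users, items): row i of X is users[i], column j is items[j].
    """
    users, u_idx = np.unique(np.asarray(user_ids), return_inverse=True)
    items, i_idx = np.unique(np.asarray(item_ids), return_inverse=True)
    X = np.zeros((len(users), len(items)), dtype=np.float32)
    positive = np.asarray(ratings, dtype=float) >= threshold   # positive feedback = rating >= 3
    X[u_idx[positive], i_idx[positive]] = 1.0
    return X, users, items

X, users, items = binary_rating_matrix([10, 10, 20], ["a", "b", "a"], [5, 2, 3])
print(X, users, items)
```

A dense array is only practical for small inputs; at MovieLens 25M scale the same index arrays would be fed into a sparse matrix (for example `scipy.sparse.csr_matrix`) instead.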

Reproduction check

examples/reproduce_perceived.py reproduces the paper's Linear Regression baseline for Tasks 5 & 6 (the one baseline needing no derived/source data), scored with this package's evaluators — MAE matches the paper exactly:

| Subtask | MAE (ours / paper) | MSE (ours / paper) | Kendall τ (ours / paper) |
|---|---|---|---|
| 5.1 relevance | 0.235 / 0.235 | 0.086 / 0.085 | 0.076 / 0.080 |
| 5.2 diversity | 0.222 / 0.222 | 0.080 / 0.061 | 0.197 / 0.196 |
| 5.3 novelty | 0.259 / 0.259 | 0.104 / 0.104 | 0.143 / 0.143 |
| 5.4 serendipity | 0.270 / 0.270 | 0.104 / 0.103 | 0.036 / 0.039 |
| 6 satisfaction | 0.255 / 0.255 | 0.102 / 0.102 | 0.039 / 0.045 |

SMRS_DATA_DIR=/path/to/sm-rs python examples/reproduce_perceived.py
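The shipped script is the authoritative reproduction. For orientation, here is an independent sketch of the same fit-then-score pattern using scikit-learn's `LinearRegression` and SciPy's `kendalltau`. Everything about the data is hypothetical: the features and target are synthetic stand-ins for the SM-RS perception/satisfaction tables, and the 80/20 split with seed 2024 merely mirrors the Task 1 split. It does not reproduce the numbers in the table above.

```python
import numpy as np
from scipy.stats import kendalltau
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

def fit_and_score(X, y, seed=2024):
    """Fit a linear-regression baseline on an 80/20 split and score it (MAE, MSE, Kendall tau)."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)
    pred = LinearRegression().fit(X_tr, y_tr).predict(X_te)
    tau, _ = kendalltau(y_te, pred)            # rank agreement between truth and prediction
    return {
        "MAE": float(mean_absolute_error(y_te, pred)),
        "MSE": float(mean_squared_error(y_te, pred)),
        "Kendall tau": float(tau),
    }

# Synthetic stand-in data (hypothetical): real features and perception/satisfaction
# targets would come from the SM-RS tables.
rng = np.random.default_rng(2024)
X = rng.normal(size=(200, 5))
y = X @ np.array([0.3, -0.2, 0.1, 0.0, 0.05]) + rng.normal(scale=0.1, size=200)
scores = fit_and_score(X, y)
print(scores)
```

Kendall τ is reported alongside the error metrics because it rewards getting the ordering of users right even when a model's absolute predictions are biased.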

Submitting to the leaderboard

Self-service (no submission server): run the canonical evaluate() for a task, then open a PR adding your row to the leaderboard in the dataset card, with a link to reproduce. The shipped baselines are the rows to beat.

Citing

Please cite both papers (GitHub's "Cite this repository" reads CITATION.cff):

@article{dokoupil2026smrs2,
  author  = {Dokoupil, Patrik and Peska, Ladislav},
  title   = {SM-RS 2.0: User-perceived Qualities of Single- and Multi-Objective Recommender Systems},
  journal = {ACM Transactions on Recommender Systems},
  volume  = {4}, number = {3}, year = {2026},
  doi     = {10.1145/3754459}
}
@inproceedings{dokoupil2024smrs,
  author    = {Dokoupil, Patrik and Peska, Ladislav and Boratto, Ludovico},
  title     = {SM-RS: Single- and Multi-Objective Recommendations with Contextual Impressions and Beyond-Accuracy Propensity Scores},
  booktitle = {Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  series    = {SIGIR '24}, pages = {988--995}, year = {2024},
  doi       = {10.1145/3626772.3657863}
}

License

Code: MIT (see LICENSE). Data: CC-BY-4.0 (see the dataset card).



Download files


Source Distribution

sm_rs-0.1.0.tar.gz (27.0 kB)

Uploaded Source

Built Distribution


sm_rs-0.1.0-py3-none-any.whl (24.7 kB)

Uploaded Python 3

File details

Details for the file sm_rs-0.1.0.tar.gz.

File metadata

  • Download URL: sm_rs-0.1.0.tar.gz
  • Upload date:
  • Size: 27.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.12

File hashes

Hashes for sm_rs-0.1.0.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | 62f7a41a18f8ba2a4daa66d97a903fda9939190338a46e62faf7223d0f7c6a38 |
| MD5 | d3a083fa1554085f1c67e46700d2fefb |
| BLAKE2b-256 | 13b6421bc272e356fd92f5741ea6225b6e0010e4b476f074a3c101b6000b8e26 |


File details

Details for the file sm_rs-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: sm_rs-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 24.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.12

File hashes

Hashes for sm_rs-0.1.0-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | 26b9a8051e7172f0510bd986fb529641e7b0e59022c694a7f6f7bff62a86c6ed |
| MD5 | 518592be71a0fd4f83196dcda9731e34 |
| BLAKE2b-256 | 44b4baab5e2e1911fed14fb512eb7f62b0fcc5cef9d3f4b40c377290365ec4bb |

