SM-RS benchmark: data loaders and canonical task evaluators for the single- and multi-objective recommendations dataset.
SM-RS
The single- and multi-objective recommendations benchmark — self-declared user propensities (relevance · diversity · novelty · exploration) linked to contextual impressions, item selections, and perceived quality.
SM-RS is, to our knowledge, the only public recommender-systems dataset linking users' self-declared propensities toward beyond-accuracy objectives with contextual impressions, item selections, and explicit perceived-quality judgments. This repository is the benchmark code: data loaders and the canonical evaluator for each task, so everyone reports comparable numbers. The data lives on Hugging Face; the leaderboard lives in the dataset card.
Dataset & leaderboard: 🤗 pdokoupil/SM-RS
Cite BOTH the SM-RS 2.0 (TORS'26) and SM-RS (SIGIR'24) papers; see Citing.
Install
pip install sm-rs # core (numpy / pandas / scikit-learn)
pip install "sm-rs[lstm]" # + TensorFlow, for the optional LSTM baseline
Quick start (Task 1: propensity estimation)
import numpy as np
from smrs.tasks import task1_propensity as t1
# reproducible 80/20 split (seed 2024), reading a local SM-RS copy for now
X_train, X_test, y_train, y_test = t1.split(data_dir="path/to/sm-rs")
# ... train your estimator, produce an (N, 4) array of [rel, div, nov, exp] ...
predictions = np.full((len(y_test), 4), 0.25) # placeholder
print(t1.evaluate(predictions, y_test)) # {'MAE': ..., 'MSE': ..., 'KLDiv': ...}
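To make the three reported numbers concrete, here is a minimal sketch of how MAE, MSE and a KL divergence could be computed over (N, 4) propensity arrays. It is not the package's evaluator: the function name propensity_metrics, the KL direction (true against predicted), the eps clipping and the row renormalisation are all assumptions made for illustration. Use t1.evaluate for any number you report.

```python
import numpy as np

def propensity_metrics(pred, true, eps=1e-12):
    """Illustrative MAE, MSE and mean KL(true || pred) over rows of 4-vectors.

    Hypothetical helper, not part of smrs; the canonical scorer may differ
    in KL direction, smoothing, or normalisation.
    """
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    if pred.shape != true.shape:
        raise ValueError("pred and true must have the same shape")
    err = pred - true
    # Clip away zeros and renormalise each row so the logarithm is defined.
    p = np.clip(true, eps, None)
    q = np.clip(pred, eps, None)
    p = p / p.sum(axis=1, keepdims=True)
    q = q / q.sum(axis=1, keepdims=True)
    kl = np.sum(p * np.log(p / q), axis=1)
    return {
        "MAE": float(np.mean(np.abs(err))),
        "MSE": float(np.mean(err ** 2)),
        "KLDiv": float(np.mean(kl)),
    }

# Two toy users with [rel, div, nov, exp] propensities.
y_true = np.array([[0.40, 0.30, 0.20, 0.10],
                   [0.25, 0.25, 0.25, 0.25]])
y_pred = np.full_like(y_true, 0.25)  # the uniform placeholder from the quick start
scores = propensity_metrics(y_pred, y_true)
print(scores)
```

A uniform prediction is a useful sanity baseline: any estimator worth reporting should score below it on all three metrics.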
The six tasks
All six draw on the same dataset; they differ in what you predict and how it's
scored. The canonical evaluator for each lives in smrs.tasks.*.
| # | Task | You produce | Metric(s) | Evaluator |
|---|---|---|---|---|
| 1 | Propensity estimation | a 4-vector propensity per user | MAE · MSE · KL | ✅ task1_propensity |
| 2 | Results proportionality | a top-k list matching target propensities | MAE · KL · wSUM · Pearson ρ | ✅ task2_proportionality¹ |
| 3 | Selections-aware reranking | a reranked impression list | nDCG@10 · Precision@5 | ✅ task3_reranking |
| 4 | Diversity-metric definition | per-list diversity values | MAE · MSE · KL | ✅ task4_diversity |
| 5 | Perceived quality (5.1 rel / 5.2 div / 5.3 nov / 5.4 ser) | per-objective perception | MAE · MSE · Kendall τ | ✅ task5_perceived |
| 6 | Satisfaction (6.1 / 6.2) | overall satisfaction | MAE · MSE · Kendall τ | ✅ task6_satisfaction |
¹ Task 2's metric layer is implemented. Turning a top-k list into achieved objective proportions requires the derived matrices (see Data); that step will ship with the source→rating-matrix builder.
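As an illustration of the Task 3 metrics in the table, the sketch below computes Precision@5 and a binary-gain nDCG@10 for one reranked impression list. It assumes that the items a user selected count as the relevant ones and that gains are binary; both are assumptions, and the function names are hypothetical. The canonical definitions are whatever task3_reranking implements.

```python
import math

def precision_at_k(ranked_items, selected, k=5):
    """Fraction of the top-k positions that hold a selected item.

    Divides by k even when the list is shorter than k.
    """
    top = ranked_items[:k]
    return sum(1 for item in top if item in selected) / k

def ndcg_at_k(ranked_items, selected, k=10):
    """Binary-gain nDCG: DCG of the ranking over the best achievable DCG."""
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked_items[:k])
        if item in selected
    )
    ideal_hits = min(len(selected), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Toy impression: the user selected items "b" and "e".
impression = ["a", "b", "c", "d", "e", "f"]
selected = {"b", "e"}
reranked = ["b", "e", "a", "c", "d", "f"]  # a hypothetical reranker's output

print(ndcg_at_k(impression, selected), ndcg_at_k(reranked, selected))
print(precision_at_k(impression, selected), precision_at_k(reranked, selected))
```

In this toy case the reranking lifts nDCG@10 to its maximum while Precision@5 is unchanged, because both selected items were already inside the top five. This is why the two metrics are reported together.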
Data
Two layers:
- Core tables (the collected study data, CC-BY, hosted on Hugging Face): behaviors, propensities, objective_perceptions, criteria_values, comparative_diversity, users, movies, books. Items are referenced by ID. The tables are auto-downloaded from the Hub and cached, so no manual download is needed:

      from smrs import data
      df = data.load("propensities")  # downloads from HF, cached
      # offline / local copy (e.g. OSF download):
      df = data.load("propensities", data_dir="path/to/sm-rs")  # or set $SMRS_DATA_DIR

  Users of the 🤗 datasets library can equivalently do load_dataset("pdokoupil/SM-RS", "behaviors").

- Derived matrices: recomputed locally, not downloaded. The list-scoring tasks (2, 3) need per-item / per-pair artifacts (relevance via item-item, intra-list diversity via a distance matrix, novelty via popularity). Rather than ship multi-GB blobs, or redistribute the third-party catalogs they come from, the benchmark recomputes them from a rating matrix you build from your own download of the public source datasets. For movies these are MovieLens 25M (the "Latest" snapshot at collection time) plus MovieLens Tag Genome 2021; for books, goodbooks-10k:

      from smrs import derived
      art = derived.build_artifacts(rating_matrix_movies)  # {item_item, distance_matrix, mean_popularities}

  derived provides the deterministic pieces: popularity, cosine_distance (1 − cosine over item rating-vectors), and ease_item_item (EASE^R closed form, used as the relevance model). This keeps the benchmark lightweight and license-clean. The bit-exact original artifacts are archived on OSF (v2) for strict reproduction.

  Get the source datasets with the bundled fetcher. It downloads from the official hosts (GroupLens, the goodbooks repo) under their licenses; alternatively, place the files there manually:

      smrs-fetch --list                    # show sources + licenses, no download
      smrs-fetch --dest ./sm-rs-sources    # download MovieLens 25M, Tag Genome 2021, goodbooks-10k

  Why recompute? MovieLens may not be redistributed, and a 5 GB download hurts adoption. You obtain MovieLens/goodbooks under their own licenses; we ship only the study data, the id-maps (movies.json: movieId→imdbId; books.json: book_index→goodreads_id), and the recompute code. The canonical scorer is pinned (sources above; positive feedback = rating ≥ 3; EASE λ) so results stay comparable. For strict reproduction of the paper's numbers, use the OSF artifacts.
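The three deterministic pieces named above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the code in smrs.derived: the function bodies are a reading of the descriptions in this section, popularity is assumed to mean the share of users who rated an item, and the regularisation value lam=10.0 is a placeholder, since the pinned EASE λ is not given here. Only the rating ≥ 3 threshold comes from the text.

```python
import numpy as np

def popularity(ratings):
    """Share of users who rated each item (assumed definition; 0 = unrated)."""
    return (ratings > 0).mean(axis=0)

def cosine_distance(ratings):
    """1 - cosine similarity between item rating-vectors (matrix columns)."""
    norms = np.linalg.norm(ratings, axis=0)
    norms[norms == 0] = 1.0  # items nobody rated keep an all-zero vector
    unit = ratings / norms
    return 1.0 - unit.T @ unit

def ease_item_item(ratings, lam=10.0, min_positive=3):
    """EASE^R closed form on binarised feedback.

    lam is a hypothetical value; min_positive=3 follows the pinned
    'positive feedback = rating >= 3' rule.
    """
    X = (ratings >= min_positive).astype(float)
    G = X.T @ X + lam * np.eye(X.shape[1])  # regularised Gram matrix
    P = np.linalg.inv(G)
    B = P / (-np.diag(P))                   # B[i, j] = -P[i, j] / P[j, j]
    np.fill_diagonal(B, 0.0)                # an item may not predict itself
    return B

# Toy users x items rating matrix (0 = unrated).
R = np.array([[5, 4, 0],
              [4, 5, 1],
              [0, 2, 5],
              [1, 0, 4]], dtype=float)

pop = popularity(R)
D = cosine_distance(R)
B = ease_item_item(R)
print(pop, D.round(3), B.round(3), sep="\n")
```

On this toy matrix items 0 and 1 are liked by the same users, so they end up close in cosine distance and receive a positive EASE weight, while item 2 gets a zero weight against both. Inverting the Gram matrix is cubic in the number of items, which is one reason these artifacts are built once and cached rather than recomputed per evaluation.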
Reproduction check
examples/reproduce_perceived.py reproduces the paper's Linear Regression
baseline for Tasks 5 & 6 (the one baseline needing no derived/source data), scored
with this package's evaluators. MAE matches the paper to three decimals in every
row; MSE and Kendall τ differ in some rows:
| subtask | MAE (ours / paper) | MSE (ours / paper) | Kendall τ (ours / paper) |
|---|---|---|---|
| 5.1 relevance | 0.235 / 0.235 | 0.086 / 0.085 | 0.076 / 0.080 |
| 5.2 diversity | 0.222 / 0.222 | 0.080 / 0.061 | 0.197 / 0.196 |
| 5.3 novelty | 0.259 / 0.259 | 0.104 / 0.104 | 0.143 / 0.143 |
| 5.4 serendipity | 0.270 / 0.270 | 0.104 / 0.103 | 0.036 / 0.039 |
| 6 satisfaction | 0.255 / 0.255 | 0.102 / 0.102 | 0.039 / 0.045 |
SMRS_DATA_DIR=/path/to/sm-rs python examples/reproduce_perceived.py
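For readers unfamiliar with the Kendall τ column in the table above, the following sketch computes the tau-b variant with plain Python by counting concordant and discordant pairs. Whether the package's evaluators use tau-b or another variant is an assumption here, and kendall_tau_b is a hypothetical name. The quadratic pair loop is meant for small illustrations only.

```python
import math
from itertools import combinations

def kendall_tau_b(x, y):
    """Kendall tau-b rank correlation, O(n^2) reference implementation."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue          # tied in both variables: counted nowhere
        if dx == 0:
            ties_x += 1       # tied only in x
        elif dy == 0:
            ties_y += 1       # tied only in y
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + ties_x) * (untied + ties_y))
    return (concordant - discordant) / denom if denom else float("nan")

# Predicted vs. reported perceived quality for four toy users.
predicted = [0.1, 0.4, 0.5, 0.9]
reported = [0.2, 0.6, 0.3, 0.8]
print(kendall_tau_b(predicted, reported))
```

Unlike MAE and MSE, τ only looks at the ordering of the predictions, so a model can have a low error and still a τ near zero if it predicts close to the mean for everyone. The small τ values in the table are consistent with that pattern.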
Submitting to the leaderboard
Self-service (no submission server): run the canonical evaluate() for a task,
then open a PR that adds your row to the leaderboard in the dataset card, with a
link to code that reproduces the result. The shipped baselines are the rows to beat.
Citing
Please cite both papers (GitHub's "Cite this repository" reads
CITATION.cff):
@article{dokoupil2026smrs2,
author = {Dokoupil, Patrik and Peska, Ladislav},
title = {SM-RS 2.0: User-perceived Qualities of Single- and Multi-Objective Recommender Systems},
journal = {ACM Transactions on Recommender Systems},
volume = {4}, number = {3}, year = {2026},
doi = {10.1145/3754459}
}
@inproceedings{dokoupil2024smrs,
author = {Dokoupil, Patrik and Peska, Ladislav and Boratto, Ludovico},
title = {SM-RS: Single- and Multi-Objective Recommendations with Contextual Impressions and Beyond-Accuracy Propensity Scores},
booktitle = {Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval},
series = {SIGIR '24}, pages = {988--995}, year = {2024},
doi = {10.1145/3626772.3657863}
}
License
Code: MIT (see LICENSE). Data: CC-BY-4.0 (see the dataset card).