


recsys

A framework-agnostic benchmarking harness for recommendation algorithms. Benchmarks are pinned contracts (dataset + split + eval protocol + metrics), algorithms plug in behind a narrow protocol, and every run is persisted to a parquet store so multi-seed comparisons are one command away.

Quickstart

One-time setup:

uv sync

Datasets:

  • MovieLens 20M must be extracted manually to ./datasets/ml-20m/ (the directory containing ratings.csv, movies.csv, tags.csv, ...). There is no auto-download for MovieLens.
  • KuaiRec and KuaiRand auto-download from Zenodo on first use and cache under ./datasets/. The loaders (src/recsys/data/kuairec.py, kuairand.py) expose download(dataset_root=...) and load(dataset_root=...) with a default that resolves to the repo-root datasets/ directory regardless of CWD. Partial downloads are rejected (Content-Length verified) so a stalled transfer cannot poison the cache. Known caveat: Zenodo is currently throttling the 432 MB KuaiRec.zip at ~0-22 KB/s, so the first KuaiRec run may be slow; KuaiRand-Pure (47 MB) finishes in minutes.
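The partial-download guard can be illustrated with a small standalone check (the function name here is illustrative, not the loaders' actual API): a cached archive is accepted only if its on-disk size matches the Content-Length the server reported.

```python
from pathlib import Path

def is_complete_download(path: Path, content_length: int) -> bool:
    """Accept a cached archive only if its size matches the server's
    Content-Length; anything else is a partial transfer that must be
    re-fetched rather than left to poison the cache."""
    return path.exists() and path.stat().st_size == content_length
```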

Run an experiment (benchmark x algorithm x seeds):

uv run recsys bench --experiment conf/experiments/deepfm_on_movielens_ctr.yaml     --seeds 1,2,3
uv run recsys bench --experiment conf/experiments/din_on_movielens_seq.yaml        --seeds 1,2,3
uv run recsys bench --experiment conf/experiments/popularity_on_movielens_ctr.yaml --seeds 1,2,3
uv run recsys bench --experiment conf/experiments/popularity_on_kuairand_ctr.yaml  --seeds 1,2,3
uv run recsys bench --experiment conf/experiments/popularity_on_kuairec_ctr.yaml   --seeds 1,2,3

View aggregated results (mean +/- std across seeds):

uv run recsys report --benchmark movielens_ctr
uv run recsys report --benchmark movielens_seq
uv run recsys report --benchmark kuairand_ctr
uv run recsys report --benchmark kuairec_ctr
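Conceptually, `recsys report` is a group-by over the parquet rows: collect each algorithm's metric across seeds, then print mean +/- std. A dependency-free sketch of that aggregation (row fields are assumptions, not the store's real schema):

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical result rows, standing in for what recsys report reads
# from results/<benchmark>.parquet; field names are assumptions.
rows = [
    {"algo": "popularity", "seed": 1, "auc": 0.70},
    {"algo": "popularity", "seed": 2, "auc": 0.72},
    {"algo": "popularity", "seed": 3, "auc": 0.74},
    {"algo": "deepfm",     "seed": 1, "auc": 0.80},
    {"algo": "deepfm",     "seed": 2, "auc": 0.81},
    {"algo": "deepfm",     "seed": 3, "auc": 0.82},
]

# group metric values by algorithm
by_algo: dict[str, list[float]] = defaultdict(list)
for row in rows:
    by_algo[row["algo"]].append(row["auc"])

# mean +/- std across seeds, as the report table prints
report = {algo: (mean(v), stdev(v)) for algo, v in by_algo.items()}
for algo, (m, s) in sorted(report.items()):
    print(f"{algo:12s} auc = {m:.3f} +/- {s:.3f}")
```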

List what's registered:

uv run recsys list benchmarks
uv run recsys list algorithms

Results land in results/<benchmark>.parquet (git-ignored). The default experiment YAMLs bake in a fast "smoke" trainer profile (limit_train_batches: 4, limit_val_batches: 4, max_epochs: 1) so a gating run completes in seconds; remove those overrides for a real training run.
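As a concrete illustration, an experiment YAML with the smoke profile might look like the following. The trainer values are the ones quoted above; the surrounding keys are assumptions about the layout, so check an existing file under conf/experiments/ for the real schema.

```yaml
# conf/experiments/my_algo_on_movielens_ctr.yaml (illustrative layout)
benchmark: conf/benchmarks/movielens_ctr.yaml
algorithm: conf/algorithms/my_algo.yaml
seeds: [1, 2, 3]
trainer:            # fast "smoke" profile -- remove for a real run
  limit_train_batches: 4
  limit_val_batches: 4
  max_epochs: 1
```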

Adding a new algorithm

  1. Pick a backend.
    • Classical / non-neural: subclass recsys.algorithms.base.Algorithm under src/recsys/algorithms/classical/. Do not import torch at module scope. The runner branches on isinstance(algo, Algorithm) and calls fit(train, val) directly — no Lightning Trainer, no nn.Module shim.
    • Lightning-backed neural: add a plain nn.Module under src/recsys/algorithms/torch/ (see deepfm.py, din.py). The runner wraps it in engine.CTRTask and runs L.Trainer.fit against the benchmark's datamodule.
  2. Declare compatibility. Set class attributes supported_tasks: set[TaskType] and required_roles: set[str] (role names from FeatureRole: user, item, context, sequence, group, label).
  3. Register. Decorate the class with @ALGO_REGISTRY.register("my_algo") from recsys.utils.
  4. Side-effect import. Add the module to src/recsys/algorithms/classical/__init__.py or src/recsys/algorithms/torch/__init__.py so the decorator fires at import time. This step is mandatory. If you skip it, recsys list algorithms will not show your algo.
  5. Write a config. Add conf/algorithms/my_algo.yaml with the constructor kwargs.
  6. Write an experiment. Add conf/experiments/my_algo_on_<benchmark>.yaml that points at the benchmark and algo YAMLs and sets seeds: + optional trainer: overrides.
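The classical path above can be sketched end to end with a toy popularity-style baseline. Everything below is illustrative: the Registry class, Algorithm base, and method signatures are self-contained stand-ins for recsys.algorithms.base.Algorithm and ALGO_REGISTRY, not imports from the package.

```python
from collections import Counter

class Registry:
    """Illustrative stand-in for recsys's registries."""
    def __init__(self):
        self._items: dict[str, type] = {}

    def register(self, name: str):
        def deco(cls):
            self._items[name] = cls  # fires when the module is imported
            return cls
        return deco

    def get(self, name: str) -> type:
        return self._items[name]

ALGO_REGISTRY = Registry()

class Algorithm:
    """Stand-in base class: declares compatibility, defines fit/predict."""
    supported_tasks: set[str] = set()
    required_roles: set[str] = set()

@ALGO_REGISTRY.register("my_popularity")
class MyPopularity(Algorithm):
    supported_tasks = {"ctr"}
    required_roles = {"item", "label"}  # role names from FeatureRole

    def fit(self, train, val=None):
        # score each item by how often it received a positive label
        self.counts = Counter(r["item"] for r in train if r["label"] == 1)
        return self

    def predict_topk(self, users, k, candidates=None):
        top = [item for item, _ in self.counts.most_common(k)]
        return {u: top for u in users}
```

In the real repo it is the side-effect import in algorithms/classical/__init__.py that makes the decorator fire, which is why step 4 is mandatory.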

Adding a new benchmark

  1. Builder. Add a data builder under src/recsys/data/builders/ that produces a DatasetBundle (train/val/test + feature_map + feature_specs). Reuse the split modules under data/splits/ and negative samplers under data/negatives/; write new ones there if needed.
  2. Datamodule (optional). If the benchmark feeds a Lightning-backed algo, wrap the builder in a BuilderDataModule under src/recsys/data/datamodules/.
  3. Benchmark class. Create src/recsys/benchmarks/my_bench.py that subclasses recsys.benchmarks.base.Benchmark, pins its Task (CTR / retrieval / sequential) and its metric_names list, and implements build() -> BenchmarkData and version().
  4. Register. @BENCHMARK_REGISTRY.register("my_bench") and import the module from src/recsys/benchmarks/__init__.py.
  5. Config. Add conf/benchmarks/my_bench.yaml with the data/eval blocks the benchmark class consumes.

Metrics and splits are pinned by the benchmark class, not by config. If you need a different metric set or a different split, that is a new benchmark, not a new config — this is what makes benchmark comparisons meaningful.

Architecture at a glance

FeatureSpec + FeatureRole (src/recsys/schemas/features.py). Every column in a benchmark is annotated with a role (user, item, context, sequence, group, label). Builders partition columns by role and algos declare which roles they need; the runner can reject incompatible combinations before fit is called.
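A self-contained sketch of that role-based gating (types simplified; the real FeatureSpec in src/recsys/schemas/features.py carries more fields than a name and a role):

```python
from dataclasses import dataclass
from enum import Enum

class FeatureRole(Enum):
    USER = "user"
    ITEM = "item"
    CONTEXT = "context"
    SEQUENCE = "sequence"
    GROUP = "group"
    LABEL = "label"

@dataclass(frozen=True)
class FeatureSpec:
    name: str
    role: FeatureRole

def compatible(specs: list[FeatureSpec],
               required_roles: set[FeatureRole]) -> bool:
    """Runner-side check: reject an (algo, benchmark) pair before fit()
    is ever called if the benchmark cannot supply every role the algo
    declares in required_roles."""
    available = {spec.role for spec in specs}
    return required_roles <= available
```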

Algorithm (src/recsys/algorithms/base.py). A framework-agnostic protocol: fit(train, val), predict_scores(batch), predict_topk(users, k, candidates), save/load. Classical baselines like popularity live under algorithms/classical/, inherit directly from Algorithm, and never import torch at module scope — the runner bypasses Lightning for them entirely. Neural models live under algorithms/torch/ as plain nn.Module subclasses; the runner wraps them in an engine.CTRTask Lightning module at fit time.

Task (src/recsys/tasks/base.py). Declares the I/O contract between an algo and the evaluator: which roles are required, which prediction method is called, and how predictions turn into metric values. v1 ships CTRTask, RetrievalTask, and SequentialTask.

Benchmark (src/recsys/benchmarks/base.py). Immutable bundle of dataset + task + split + eval protocol + metric set. build() returns a BenchmarkData with train/val/test, feature map, feature specs, and candidate item list. version() is a hash of (dataset, split, eval_protocol) so result rows are traceable.
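The traceability idea behind version() can be shown with a minimal stand-in: hash the pinned (dataset, split, eval_protocol) triple so that any change to the contract changes the identifier. The hash choice and field layout here are assumptions, not the actual implementation.

```python
import hashlib
import json

def benchmark_version(dataset: str, split: dict, eval_protocol: dict) -> str:
    """Stable short hash of the pinned contract. Result rows tagged with
    this value stay traceable to the exact benchmark definition; editing
    the split or eval protocol yields a different version."""
    payload = json.dumps(
        {"dataset": dataset, "split": split, "eval": eval_protocol},
        sort_keys=True,  # key order must not affect the hash
    )
    return hashlib.sha256(payload.encode()).hexdigest()[:12]
```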

Runtime flow. recsys bench loads an experiment YAML and calls recsys.runner.run_experiment(algo_cfg, benchmark_cfg, seed, trainer_overrides, results_dir, store). That function builds the benchmark, builds the algo, and then branches: classical algos (isinstance(algo, Algorithm)) go through algo.fit(train, val) directly with no Lightning involvement; neural algos get wrapped in engine.CTRTask and trained via L.Trainer.fit against the benchmark's datamodule. The fitted algo (or Lightning task) is handed to benchmark.task.evaluate(...), and a RunResult row is written to results/<benchmark>.parquet. recsys report reads the same parquet and prints a mean +/- std table across seeds.
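The branching described above reduces to an isinstance dispatch. A stripped-down stand-in (toy classes, not the real runner; the Lightning path is only mimicked here):

```python
class Algorithm:
    """Stand-in for recsys.algorithms.base.Algorithm."""
    def fit(self, train, val):
        self.fitted = True
        return self

class NeuralModel:
    """Stand-in for a plain nn.Module under algorithms/torch/."""

def run(algo, train, val):
    if isinstance(algo, Algorithm):
        # classical path: direct fit, no Lightning involvement
        return algo.fit(train, val)
    # neural path: the real runner wraps the module in engine.CTRTask
    # and calls L.Trainer.fit; represented here by a placeholder string
    return f"wrapped and trained {type(algo).__name__}"
```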

Scope (v1 + landed v2.0 work)

  • Benchmarks: movielens_ctr, movielens_seq, kuairec_ctr, kuairand_ctr. KuaiRec/KuaiRand loaders auto-download the archives from Zenodo to ./datasets/ on first use; subsequent runs hit the cache.
  • Algorithms: deepfm (CTR), din (sequential, with working ranking metrics), popularity (classical baseline — bypasses Lightning entirely via the framework-agnostic fit path).
  • Metrics: AUC, LogLoss for CTR; NDCG@{10,50}, Recall@{10,50}, HR@{10,50}, MRR for ranking. Ranking metrics now compute correctly for sequential dict-batch algos like DIN.
  • Infrastructure: parquet result store keyed by (benchmark, algo, config_hash, seed, timestamp), argparse CLI (bench, report, list), multi-seed runs as the default.

Deferred to v2

Session / conversational / cold-start tasks, beyond-accuracy metrics (coverage, diversity, novelty, fairness), statistical significance tests, experiment-tracker hooks (MLflow / W&B), hyperparameter sweeps, a typer-based CLI, additional classical / neural baselines (item-KNN, BPR-MF, SASRec, BERT4Rec, ...), and additional benchmarks (Amazon Reviews, Yoochoose / Diginetica / RetailRocket, MovieLens 1M, Netflix Prize, Yelp, Last.fm, Criteo / Avazu). See docs/dev.md for the full v1 plan and v2 roadmap.
