Framework-agnostic recsys benchmarking harness — pinned benchmarks, pluggable algorithms, reproducible runs.
# recsys
A framework-agnostic benchmarking harness for recommendation algorithms. Benchmarks are pinned contracts (dataset + split + eval protocol + metrics), algorithms plug in behind a narrow protocol, and every run is persisted to a parquet store so multi-seed comparisons are one command away.
## Quickstart
One-time setup:

```
uv sync
```
Datasets:

- MovieLens 20M must be extracted manually to `./datasets/ml-20m/` (the directory containing `ratings.csv`, `movies.csv`, `tags.csv`, ...). There is no auto-download for MovieLens.
- KuaiRec and KuaiRand auto-download from Zenodo on first use and cache under `./datasets/`. The loaders (`src/recsys/data/kuairec.py`, `kuairand.py`) expose `download(dataset_root=...)` and `load(dataset_root=...)` with a default that resolves to the repo-root `datasets/` directory regardless of CWD. Partial downloads are rejected (Content-Length verified), so a stalled transfer cannot poison the cache. Known caveat: Zenodo is currently throttling the 432 MB `KuaiRec.zip` at ~0-22 KB/s, so the first KuaiRec run may be slow; KuaiRand-Pure (47 MB) finishes in minutes.
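The partial-download guard can be sketched as a standalone size check. This is a hypothetical helper mirroring the behavior described above, not the project's actual loader code:

```python
from pathlib import Path

def verify_complete(path: Path, expected_bytes: int) -> Path:
    # Compare the on-disk size against the server-reported Content-Length.
    # A truncated archive is deleted so it cannot poison the cache.
    actual = path.stat().st_size
    if actual != expected_bytes:
        path.unlink()
        raise IOError(f"truncated download: got {actual} of {expected_bytes} bytes")
    return path
```

On the next `load(...)` call the loader then sees no cached file and re-downloads from scratch.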
Run an experiment (benchmark x algorithm x seeds):

```
uv run recsys bench --experiment conf/experiments/deepfm_on_movielens_ctr.yaml --seeds 1,2,3
uv run recsys bench --experiment conf/experiments/din_on_movielens_seq.yaml --seeds 1,2,3
uv run recsys bench --experiment conf/experiments/popularity_on_movielens_ctr.yaml --seeds 1,2,3
uv run recsys bench --experiment conf/experiments/popularity_on_kuairand_ctr.yaml --seeds 1,2,3
uv run recsys bench --experiment conf/experiments/popularity_on_kuairec_ctr.yaml --seeds 1,2,3
```
View aggregated results (mean +/- std across seeds):

```
uv run recsys report --benchmark movielens_ctr
uv run recsys report --benchmark movielens_seq
uv run recsys report --benchmark kuairand_ctr
uv run recsys report --benchmark kuairec_ctr
```
List what's registered:

```
uv run recsys list benchmarks
uv run recsys list algorithms
```
Results land in `results/<benchmark>.parquet` (git-ignored). The default experiment YAMLs bake in a fast "smoke" trainer profile (`limit_train_batches: 4`, `limit_val_batches: 4`, `max_epochs: 1`) so the gate runs in seconds; remove those overrides for a real training run.
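An experiment YAML in roughly this shape would match the description above. The `trainer:` keys are the smoke-profile overrides quoted above; the `benchmark:` / `algorithm:` field names are illustrative guesses, not the project's confirmed schema:

```yaml
# conf/experiments/deepfm_on_movielens_ctr.yaml -- illustrative shape, not the real file
benchmark: conf/benchmarks/movielens_ctr.yaml
algorithm: conf/algorithms/deepfm.yaml
seeds: [1, 2, 3]
trainer:
  max_epochs: 1           # smoke profile: remove these three for a real run
  limit_train_batches: 4
  limit_val_batches: 4
```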
## Adding a new algorithm
1. **Pick a backend.**
   - Classical / non-neural: subclass `recsys.algorithms.base.Algorithm` under `src/recsys/algorithms/classical/`. Do not import `torch` at module scope. The runner branches on `isinstance(algo, Algorithm)` and calls `fit(train, val)` directly — no Lightning `Trainer`, no `nn.Module` shim.
   - Lightning-backed neural: add a plain `nn.Module` under `src/recsys/algorithms/torch/` (see `deepfm.py`, `din.py`). The runner wraps it in `engine.CTRTask` and runs `L.Trainer.fit` against the benchmark's datamodule.
2. **Declare compatibility.** Set class attributes `supported_tasks: set[TaskType]` and `required_roles: set[str]` (role names from `FeatureRole`: `user`, `item`, `context`, `sequence`, `group`, `label`).
3. **Register.** Decorate the class with `@ALGO_REGISTRY.register("my_algo")` from `recsys.utils`.
4. **Side-effect import.** Add the module to `src/recsys/algorithms/classical/__init__.py` or `src/recsys/algorithms/torch/__init__.py` so the decorator fires at import time. This step is mandatory: if you skip it, `recsys list algorithms` will not show your algo.
5. **Write a config.** Add `conf/algorithms/my_algo.yaml` with the constructor kwargs.
6. **Write an experiment.** Add `conf/experiments/my_algo_on_<benchmark>.yaml` that points at the benchmark and algo YAMLs and sets `seeds:` plus optional `trainer:` overrides.
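The register-then-import mechanics boil down to a decorator that mutates a module-level registry, which is why the `__init__.py` import is load-bearing. A self-contained sketch of the pattern (not the project's actual `recsys.utils` code; string task names stand in for `TaskType`):

```python
class Registry:
    def __init__(self) -> None:
        self._items: dict[str, type] = {}

    def register(self, name: str):
        def deco(cls: type) -> type:
            # Runs as a side effect of importing the module that defines cls,
            # which is why the algo module must be imported from __init__.py.
            self._items[name] = cls
            return cls
        return deco

    def names(self) -> list[str]:
        return sorted(self._items)

ALGO_REGISTRY = Registry()

@ALGO_REGISTRY.register("my_algo")
class MyAlgo:
    supported_tasks = {"ctr"}                     # stand-in for set[TaskType]
    required_roles = {"user", "item", "label"}    # FeatureRole names

print(ALGO_REGISTRY.names())  # -> ['my_algo']
```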
## Adding a new benchmark
1. **Builder.** Add a data builder under `src/recsys/data/builders/` that produces a `DatasetBundle` (train/val/test + `feature_map` + `feature_specs`). Reuse the split modules under `data/splits/` and negative samplers under `data/negatives/`; write new ones there if needed.
2. **Datamodule (optional).** If the benchmark feeds a Lightning-backed algo, wrap the builder in a `BuilderDataModule` under `src/recsys/data/datamodules/`.
3. **Benchmark class.** Create `src/recsys/benchmarks/my_bench.py` that subclasses `recsys.benchmarks.base.Benchmark`, pins its `Task` (CTR / retrieval / sequential) and its `metric_names` list, and implements `build() -> BenchmarkData` and `version()`.
4. **Register.** `@BENCHMARK_REGISTRY.register("my_bench")` and import the module from `src/recsys/benchmarks/__init__.py`.
5. **Config.** Add `conf/benchmarks/my_bench.yaml` with the data/eval blocks the benchmark class consumes.
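A `version()` that hashes the pinned contract (as described in the architecture section below) can be sketched like this; the class body and pin values are hypothetical, and a real benchmark subclasses `recsys.benchmarks.base.Benchmark`:

```python
import hashlib

class MyBench:
    # Hypothetical pins for illustration only.
    dataset = "movielens-20m"
    split = "user-temporal-80-10-10"
    eval_protocol = "full-ranking"
    metric_names = ["auc", "logloss"]

    def version(self) -> str:
        # Hash only (dataset, split, eval_protocol): the same contract always
        # yields the same version string, making result rows traceable.
        key = "|".join((self.dataset, self.split, self.eval_protocol))
        return hashlib.sha256(key.encode()).hexdigest()[:12]
```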
Metrics and splits are pinned by the benchmark class, not by config. If you need a different metric set or a different split, that is a new benchmark, not a new config — this is what makes benchmark comparisons meaningful.
## Architecture at a glance
**FeatureSpec + FeatureRole** (`src/recsys/schemas/features.py`). Every column in a benchmark is annotated with a role (`user`, `item`, `context`, `sequence`, `group`, `label`). Builders partition columns by role and algos declare which roles they need; the runner can reject incompatible combinations before `fit` is called.

**Algorithm** (`src/recsys/algorithms/base.py`). A framework-agnostic protocol: `fit(train, val)`, `predict_scores(batch)`, `predict_topk(users, k, candidates)`, `save`/`load`. Classical baselines like popularity live under `algorithms/classical/`, inherit directly from `Algorithm`, and never import `torch` at module scope — the runner bypasses Lightning for them entirely. Neural models live under `algorithms/torch/` as plain `nn.Module` subclasses; the runner wraps them in an `engine.CTRTask` Lightning module at fit time.

**Task** (`src/recsys/tasks/base.py`). Declares the I/O contract between an algo and the evaluator: which roles are required, which prediction method is called, and how predictions turn into metric values. v1 ships `CTRTask`, `RetrievalTask`, and `SequentialTask`.

**Benchmark** (`src/recsys/benchmarks/base.py`). Immutable bundle of dataset + task + split + eval protocol + metric set. `build()` returns a `BenchmarkData` with train/val/test, feature map, feature specs, and candidate item list. `version()` is a hash of `(dataset, split, eval_protocol)` so result rows are traceable.

**Runtime flow.** `recsys bench` loads an experiment YAML and calls `recsys.runner.run_experiment(algo_cfg, benchmark_cfg, seed, trainer_overrides, results_dir, store)`. That function builds the benchmark, builds the algo, and then branches: classical algos (`isinstance(algo, Algorithm)`) go through `algo.fit(train, val)` directly with no Lightning involvement; neural algos get wrapped in `engine.CTRTask` and trained via `L.Trainer.fit` against the benchmark's datamodule. The fitted algo (or Lightning task) is handed to `benchmark.task.evaluate(...)`, and a `RunResult` row is written to `results/<benchmark>.parquet`. `recsys report` reads the same parquet and prints a mean +/- std table across seeds.
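The classical-vs-neural branch reduces to a single `isinstance` check. A minimal stand-in (the real types live in `recsys.algorithms.base` and `recsys.engine`; the Lightning side is elided):

```python
class Algorithm:
    """Stand-in for recsys.algorithms.base.Algorithm."""
    def fit(self, train, val): ...

class PopularityBaseline(Algorithm):
    def fit(self, train, val):
        self.fitted = True  # a real baseline would count item frequencies here

def run(algo, train, val) -> str:
    if isinstance(algo, Algorithm):
        algo.fit(train, val)  # classical path: no Lightning, no nn.Module shim
        return "classical"
    # neural path: wrap in engine.CTRTask and call L.Trainer.fit (omitted)
    return "lightning"

print(run(PopularityBaseline(), train=None, val=None))  # -> classical
```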
## Scope (v1 + landed v2.0 work)
- Benchmarks: `movielens_ctr`, `movielens_seq`, `kuairec_ctr`, `kuairand_ctr`. KuaiRec/KuaiRand loaders auto-download the archives from Zenodo to `./datasets/` on first use; subsequent runs hit the cache.
- Algorithms: `deepfm` (CTR), `din` (sequential, with working ranking metrics), `popularity` (classical baseline — bypasses Lightning entirely via the framework-agnostic fit path).
- Metrics: AUC, LogLoss for CTR; NDCG@{10,50}, Recall@{10,50}, HR@{10,50}, MRR for ranking. Ranking metrics now compute correctly for sequential dict-batch algos like DIN.
- Infrastructure: parquet result store keyed by `(benchmark, algo, config_hash, seed, timestamp)`, argparse CLI (`bench`, `report`, `list`), multi-seed runs as the default.
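The `report` aggregation (mean +/- std across seeds) needs nothing more than a group-by over the stored rows. A stdlib sketch with invented numbers, standing in for rows read back from the parquet store:

```python
from collections import defaultdict
from statistics import mean, stdev

# (algo, seed, auc) rows as they might come back from results/<benchmark>.parquet;
# the metric values are invented for illustration.
rows = [("popularity", 1, 0.612), ("popularity", 2, 0.608), ("popularity", 3, 0.616),
        ("deepfm", 1, 0.781), ("deepfm", 2, 0.779), ("deepfm", 3, 0.786)]

by_algo: defaultdict[str, list[float]] = defaultdict(list)
for algo, _seed, auc in rows:
    by_algo[algo].append(auc)

for algo, aucs in sorted(by_algo.items()):
    print(f"{algo}: {mean(aucs):.3f} +/- {stdev(aucs):.3f}")
```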
## Deferred to v2
Session / conversational / cold-start tasks, beyond-accuracy metrics (coverage, diversity, novelty, fairness), statistical significance tests, experiment-tracker hooks (MLflow / W&B), hyperparameter sweeps, a `typer`-based CLI, additional classical / neural baselines (item-KNN, BPR-MF, SASRec, BERT4Rec, ...), and additional benchmarks (Amazon Reviews, Yoochoose / Diginetica / RetailRocket, MovieLens 1M, Netflix Prize, Yelp, Last.fm, Criteo / Avazu). See `docs/dev.md` for the full v1 plan and v2 roadmap.
## Project details
### File details: `recsysk-0.1.0.tar.gz`

- Download URL: recsysk-0.1.0.tar.gz
- Upload date:
- Size: 86.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.3

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | `21499683971aa5363711b1f7f7907354fce4f9a52b55c7c6431d98eacc60ee7e` |
| MD5 | `f012d0fad731e12cba537642753fd259` |
| BLAKE2b-256 | `99bfaa33523983f1af0486ff414c2e250d57c8035527c9fdfde8ac5a4524db6d` |
### File details: `recsysk-0.1.0-py3-none-any.whl`

- Download URL: recsysk-0.1.0-py3-none-any.whl
- Upload date:
- Size: 122.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.3

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | `b359552889e3b8703fdf4bfeaaeb9c7a84b5a78a87d2399538f7dd954775ab35` |
| MD5 | `f93f2ba626f21455b2eb5bb33970377d` |
| BLAKE2b-256 | `0b6bbaa63aed02c2064189b20d0991c4c9e228a3fd7db5f2327dcd93239f955f` |