Recipe-driven recommender training and serving on irspack.
Recotem
Recipe-driven recommender training and serving, built on
irspack. One YAML recipe describes
where the data lives, how to train, and where to write the result.
`recotem train` produces a signed binary artifact; `recotem serve`
mounts it as a `/predict/{name}` HTTP endpoint and hot-swaps when a new
artifact appears. No database, no message broker, no admin UI.
Why Recotem
Most recommender stacks pull in a service mesh of databases, queues, and control planes before you can train your first model. Recotem keeps the moving parts to a recipe file and a binary artifact:
- Single binary, two commands. `recotem train` runs as a batch job; `recotem serve` runs as a long-lived FastAPI process. They share nothing but the artifact file on disk (or object storage).
- Reproducible by construction. Recipes are versioned with your code; artifacts are HMAC-signed with a SHA-checked header you can inspect without loading the model.
- Hot-swap, no restart. The serving process watches the artifact directory and atomically swaps the in-memory model when training emits a new file.
- Bring-your-own scheduler. `recotem train` is a normal process: drive it from cron, Airflow, a Kubernetes CronJob, or anything else.
Features
- Recipe-driven: 1 YAML = 1 model = 1 `/predict/{name}` endpoint
- Hyperparameter search across irspack algorithms via Optuna
- Pluggable data sources (built-in: CSV / Parquet / BigQuery / SQL / GA4; extend via Python entry points)
- HMAC-signed artifacts with multi-key rotation and a deterministic FQCN allow-list at deserialization time
- API-key authentication (`X-API-Key`); keys hashed at rest
- fsspec paths everywhere: local, S3, GCS, HTTPS, anything fsspec speaks
- Optional Prometheus metrics endpoint, structured JSON logs with built-in secret redaction
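To make the signing and rotation idea concrete, here is a minimal sketch of HMAC-SHA256 signing with key ids, using only the standard library. It is an illustration under assumptions, not Recotem's artifact format or code: the `KEYS` mapping and the `sign` / `verify` helpers are hypothetical names, and the real header layout is not reproduced here.

```python
import hashlib
import hmac

# Hypothetical keyring: key id -> secret bytes. Demo values only.
KEYS = {"old": bytes.fromhex("11" * 32), "new": bytes.fromhex("22" * 32)}


def sign(payload: bytes, kid: str, keys: dict[str, bytes]) -> tuple[str, str]:
    """Return (kid, hex MAC) for the payload, signed with the named key."""
    mac = hmac.new(keys[kid], payload, hashlib.sha256).hexdigest()
    return kid, mac


def verify(payload: bytes, kid: str, mac: str, keys: dict[str, bytes]) -> bool:
    """Accept the payload only if the kid is known and the MAC matches."""
    key = keys.get(kid)
    if key is None:
        return False
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, mac)
```

Because the verifier looks the key up by id, an artifact signed with the old key stays loadable while new artifacts are signed with the new one; dropping the old entry from the keyring completes the rotation.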
Data Sources
- CSV / Parquet: local files or any fsspec-reachable URL (S3, GCS, Azure, HTTPS).
- BigQuery: SQL queries with Storage Read API support.
- SQL (PostgreSQL / MySQL / MariaDB / SQLite): via SQLAlchemy 2. See docs/data-sources/sql.md.
- Google Analytics 4: direct Data API integration (no BigQuery Export needed). See docs/data-sources/ga4.md.
- Custom plugins: implement the `DataSource` Protocol and register via `recotem.datasources` entry points.
Install
```bash
pip install recotem                # core
pip install "recotem[bigquery]"    # BigQuery data source
pip install "recotem[metrics]"     # Prometheus metrics endpoint
pip install "recotem[postgres]"    # PostgreSQL via psycopg
pip install "recotem[mysql]"       # MySQL/MariaDB via PyMySQL
pip install "recotem[sqlite]"      # SQLite (stdlib)
pip install "recotem[ga4]"         # Google Analytics 4 Data API
```
Requires Python 3.12+. A multi-arch Docker image is published to
ghcr.io/codelibs/recotem.
Quickstart
The repository ships with a self-contained example at
examples/quickstart/ — recipe, dataset, and
artifact directory all in one place. Train a TopPop recommender from a
60-user CSV in under a minute.
```bash
# 1. Set demo keys. DEMO ONLY: for production, generate fresh keys with
#    `recotem keygen --type signing` and `recotem keygen --type api`.
export RECOTEM_SIGNING_KEYS="dev:0000000000000000000000000000000000000000000000000000000000000000"
export RECOTEM_API_PLAINTEXT="recotem-quickstart-demo-key-0000"
export RECOTEM_API_KEYS="dev:sha256:21be5c3be85b8d68123df9f9b6a26d8e307db30350ea8bcc844883e22ebcf125"

# 2. Train, serve
recotem train examples/quickstart/recipe.yaml
recotem serve --recipes examples/quickstart/ &
# Wait for the server to become ready before sending traffic.
until curl -s -o /dev/null -w "%{http_code}" http://localhost:8080/health | grep -q "200"; do sleep 1; done

# 3. Predict
curl -X POST http://localhost:8080/predict/top_picks \
  -H "X-API-Key: $RECOTEM_API_PLAINTEXT" \
  -H "Content-Type: application/json" \
  -d '{"user_id": "u01", "cutoff": 5}'
```

```json
{
  "items": [{"item_id": "i00", "score": 0.91}],
  "model": {"recipe": "top_picks", "trained_at": "...",
            "best_class": "TopPopRecommender", "kid": "dev"},
  "request_id": "..."
}
```
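The same predict call can be made from Python with only the standard library. The sketch below mirrors the curl request above (URL, headers, and JSON body); `build_predict_request` and `predict` are hypothetical helper names, not part of Recotem.

```python
import json
import urllib.request


def build_predict_request(base_url: str, recipe: str, api_key: str,
                          user_id: str, cutoff: int) -> urllib.request.Request:
    """Build the POST /predict/{recipe} request without sending it."""
    body = json.dumps({"user_id": user_id, "cutoff": cutoff}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict/{recipe}",
        data=body,
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


def predict(request: urllib.request.Request, timeout: float = 5.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the quickstart server running, `predict(build_predict_request("http://localhost:8080", "top_picks", os.environ["RECOTEM_API_PLAINTEXT"], "u01", 5))` (after `import os`) should return a dictionary shaped like the JSON response shown above.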
The recipe itself is 11 lines — every other field has a sensible default.
See examples/quickstart/recipe.yaml
for the source of truth and
docs/recipe-reference.md for the full schema.
Which env var is needed where?
| Variable | Required by | Purpose |
|---|---|---|
| `RECOTEM_SIGNING_KEYS` | `train` and `serve` | HMAC sign / verify artifact files (server keeps plaintext; needed for both sides) |
| `RECOTEM_API_KEYS` | `serve` | Authenticate `/predict` callers (server keeps hash only) |
| `X-API-Key: <plaintext>` | HTTP clients | Sent by clients on every `/predict` call; server re-hashes and compares |
Both variables accept multiple comma-separated entries (kid:value,kid2:value,…)
to enable zero-downtime key rotation — that is why they are pluralised.
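The `kid:value` list format and the hash-then-compare check can be sketched as below. This is an illustration, not Recotem's implementation: `parse_keyring`, `hash_api_key`, and `is_authorized` are hypothetical names, and the sketch assumes the stored value is `sha256:` followed by the hex SHA-256 of the UTF-8 plaintext, which the actual hashing scheme may not match.

```python
import hashlib
import hmac


def parse_keyring(raw: str) -> dict[str, str]:
    """Split 'kid:value,kid2:value' into {kid: value}.

    Only the first ':' separates kid from value, so values such as
    'sha256:<hex>' survive intact.
    """
    keyring: dict[str, str] = {}
    for position, entry in enumerate(raw.split(",")):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing comma
        kid, sep, value = entry.partition(":")
        if not sep or not kid or not value:
            # Report the position only, so secrets never reach the logs.
            raise ValueError(f"malformed entry at position {position}")
        if kid in keyring:
            raise ValueError(f"duplicate key id: {kid!r}")
        keyring[kid] = value
    return keyring


def hash_api_key(plaintext: str) -> str:
    """Assumed scheme: 'sha256:' + hex digest of the UTF-8 plaintext."""
    return "sha256:" + hashlib.sha256(plaintext.encode("utf-8")).hexdigest()


def is_authorized(presented: str, keyring: dict[str, str]) -> bool:
    """Accept the key if its hash matches any entry, old or new."""
    candidate = hash_api_key(presented)
    return any(hmac.compare_digest(candidate, stored)
               for stored in keyring.values())
```

Checking against every entry is what makes rotation zero-downtime in this sketch: clients holding the old key and clients holding the new key are both accepted until the old entry is removed.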
Architecture
┌────────────────────────────────────────────────────────────────────────┐
│ recotem (single Python package) │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ recipe.yaml ──▶ recotem train ──▶ artifact.recotem ──▶ recotem serve │
│ (batch job) (HMAC-signed) (FastAPI, │
│ hot-swap) │
│ │
│ any scheduler local FS, S3, POST /predict/{name}│
│ (cron / k8s / …) GCS, fsspec X-API-Key auth │
│ │
└────────────────────────────────────────────────────────────────────────┘
train and serve communicate only via signed artifact files. They
can run on different machines; the watcher swaps models per recipe based
on file mtime.
Documentation
- Getting started — Docker Compose / pip walkthrough end-to-end
- Recipe reference — every field documented
- Operations — key rotation, sizing, troubleshooting
- Security — threat model, IAM scopes, secrets handling
- Plugin authoring — write a custom data source
- Documentation index
Contributing
Issues and pull requests welcome. Development uses uv for dependency management:
```bash
uv sync --all-extras
uv run pytest tests
uv run ruff check src tests
```
See CLAUDE.md (or the project guidelines therein) for the full
contributor workflow.
License