

Recotem


Recipe-driven recommender training and serving, built on irspack. One YAML recipe describes where the data lives, how to train, and where to write the result — recotem train produces a signed binary artifact, recotem serve mounts it as a /predict/{name} HTTP endpoint and hot-swaps when a new artifact appears. No database, no message broker, no admin UI.

Why Recotem

Most recommender stacks pull in a service mesh of databases, queues, and control planes before you can train your first model. Recotem keeps the moving parts to a recipe file and a binary artifact:

  • Single package, two commands. recotem train runs as a batch job; recotem serve runs as a long-lived FastAPI process. They share nothing but the artifact file on disk (or in object storage).
  • Reproducible by construction. Recipes are versioned with your code; artifacts are HMAC-signed with a SHA-checked header you can inspect without loading the model.
  • Hot-swap, no restart. The serving process watches the artifact directory and atomically swaps the in-memory model when training emits a new file.
  • Bring-your-own scheduler. recotem train is a normal process — drive it from cron, Airflow, a Kubernetes CronJob, or anything else.
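
To make the signing idea concrete, here is a minimal sketch of an HMAC-signed file whose header can be read without touching the payload. The layout (one JSON header line followed by the raw payload), the header field names, and the helpers sign_artifact, read_header, and verify_artifact are illustrative assumptions, not Recotem's actual artifact format.

```python
import hashlib
import hmac
import json


def sign_artifact(payload: bytes, kid: str, key: bytes) -> bytes:
    """Prefix the payload with a one-line JSON header (hypothetical layout)."""
    header = {
        "kid": kid,  # which signing key produced the tag
        "sha256": hashlib.sha256(payload).hexdigest(),
        "hmac": hmac.new(key, payload, hashlib.sha256).hexdigest(),
    }
    # json.dumps emits no newline by default, so "\n" safely ends the header.
    return json.dumps(header).encode() + b"\n" + payload


def read_header(blob: bytes) -> dict:
    """Parse only the header line; the payload is never deserialized."""
    header_line, _, _ = blob.partition(b"\n")
    return json.loads(header_line)


def verify_artifact(blob: bytes, keys: dict[str, bytes]) -> bytes:
    """Return the payload if the HMAC tag matches a known key, else raise."""
    header_line, _, payload = blob.partition(b"\n")
    header = json.loads(header_line)
    key = keys.get(header["kid"])
    if key is None:
        raise ValueError("unknown key id")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(expected, header["hmac"]):
        raise ValueError("signature mismatch")
    return payload
```

Looking up the key by kid is what would let old and new signing keys coexist during a rotation in a scheme like this one.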

Features

  • Recipe-driven: 1 YAML = 1 model = 1 /predict/{name} endpoint
  • Hyperparameter search across irspack algorithms via Optuna
  • Pluggable data sources (built-in: CSV / Parquet / BigQuery / SQL / GA4; extend via Python entry points)
  • HMAC-signed artifacts with multi-key rotation and a deterministic FQCN allow-list at deserialization time
  • API-key authentication (X-API-Key); keys hashed at rest
  • fsspec paths everywhere — local, S3, GCS, HTTPS, anything fsspec speaks
  • Optional Prometheus metrics endpoint, structured JSON logs with built-in secret redaction
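
An FQCN allow-list at deserialization time can be sketched with the standard library's pickle.Unpickler.find_class hook. This assumes a pickle-based payload, which the text above does not confirm; the ALLOWED set and the safe_loads helper are hypothetical names chosen for illustration.

```python
import io
import pickle

# Hypothetical allow-list of fully qualified class names (FQCNs).
ALLOWED = frozenset({"collections.OrderedDict"})


class AllowListUnpickler(pickle.Unpickler):
    """Unpickler that refuses any global not named in ALLOWED."""

    def find_class(self, module: str, name: str):
        fqcn = f"{module}.{name}"
        if fqcn not in ALLOWED:
            raise pickle.UnpicklingError(f"class not allowed: {fqcn}")
        return super().find_class(module, name)


def safe_loads(data: bytes):
    """Deserialize data, resolving only allow-listed classes."""
    return AllowListUnpickler(io.BytesIO(data)).load()
```

Because the check runs before any class is resolved, a payload that references an unexpected class fails early instead of executing its constructor.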

Data Sources

  • CSV / Parquet — local files or any fsspec-reachable URL (S3, GCS, Azure, HTTPS).
  • BigQuery — SQL queries with Storage Read API support.
  • SQL (PostgreSQL / MySQL / MariaDB / SQLite) — via SQLAlchemy 2. See docs/data-sources/sql.md.
  • Google Analytics 4 — direct Data API integration (no BigQuery Export needed). See docs/data-sources/ga4.md.
  • Custom plugins — implement the DataSource Protocol and register via recotem.datasources entry-points.

Install

pip install recotem                 # core
pip install "recotem[bigquery]"     # BigQuery data source
pip install "recotem[metrics]"      # Prometheus metrics endpoint
pip install "recotem[postgres]"     # PostgreSQL via psycopg
pip install "recotem[mysql]"        # MySQL/MariaDB via PyMySQL
pip install "recotem[sqlite]"       # SQLite (stdlib)
pip install "recotem[ga4]"          # Google Analytics 4 Data API

Requires Python 3.12+. A multi-arch Docker image is published to ghcr.io/codelibs/recotem.

Quickstart

The repository ships with a self-contained example at examples/quickstart/ — recipe, dataset, and artifact directory all in one place. Train a TopPop recommender from a 60-user CSV in under a minute.

# 1. Set demo keys. DEMO ONLY — for production, generate fresh keys with
#    `recotem keygen --type signing` and `recotem keygen --type api`.
export RECOTEM_SIGNING_KEYS="dev:0000000000000000000000000000000000000000000000000000000000000000"
export RECOTEM_API_PLAINTEXT="recotem-quickstart-demo-key-0000"
export RECOTEM_API_KEYS="dev:sha256:21be5c3be85b8d68123df9f9b6a26d8e307db30350ea8bcc844883e22ebcf125"

# 2. Train, serve
recotem train examples/quickstart/recipe.yaml
recotem serve --recipes examples/quickstart/ &

# Wait for the server to become ready before sending traffic.
until curl -s -o /dev/null -w "%{http_code}" http://localhost:8080/health | grep -q "200"; do sleep 1; done

# 3. Predict
curl -X POST http://localhost:8080/predict/top_picks \
  -H "X-API-Key: $RECOTEM_API_PLAINTEXT" \
  -H "Content-Type: application/json" \
  -d '{"user_id": "u01", "cutoff": 5}'

Example response:

{
  "items": [{"item_id": "i00", "score": 0.91}],
  "model": {"recipe": "top_picks", "trained_at": "...",
            "best_class": "TopPopRecommender", "kid": "dev"},
  "request_id": "..."
}
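
The same call can be made from Python with only the standard library. This is a minimal sketch that mirrors the curl command above; the helper names build_predict_request and predict are made up for illustration, and the URL, recipe name, and body fields are taken from the quickstart.

```python
import json
import urllib.request


def build_predict_request(
    base_url: str, recipe: str, api_key: str, user_id: str, cutoff: int
) -> urllib.request.Request:
    """Build the POST /predict/{recipe} request without sending it."""
    body = json.dumps({"user_id": user_id, "cutoff": cutoff}).encode()
    return urllib.request.Request(
        f"{base_url}/predict/{recipe}",
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def predict(
    base_url: str,
    recipe: str,
    api_key: str,
    user_id: str,
    cutoff: int = 5,
    timeout: float = 5.0,
) -> dict:
    """Send the request and decode the JSON response body."""
    req = build_predict_request(base_url, recipe, api_key, user_id, cutoff)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

With the quickstart server running, predict("http://localhost:8080", "top_picks", os.environ["RECOTEM_API_PLAINTEXT"], "u01") would issue the same request as the curl example (after importing os).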

The recipe itself is 11 lines — every other field has a sensible default. See examples/quickstart/recipe.yaml for the source of truth and docs/recipe-reference.md for the full schema.

Which env var is needed where?

Variable               | Required by     | Purpose
-----------------------+-----------------+----------------------------------------------------------------------------------
RECOTEM_SIGNING_KEYS   | train and serve | HMAC sign / verify artifact files (server keeps plaintext; needed for both sides)
RECOTEM_API_KEYS       | serve           | Authenticate /predict callers (server keeps hash only)
X-API-Key: <plaintext> | HTTP clients    | Sent by clients on every /predict call; server re-hashes and compares

Both variables accept multiple comma-separated entries (kid:value,kid2:value,…) to enable zero-downtime key rotation — that is why they are pluralised.

Architecture

┌────────────────────────────────────────────────────────────────────────┐
│                  recotem (single Python package)                       │
├────────────────────────────────────────────────────────────────────────┤
│                                                                        │
│   recipe.yaml ──▶ recotem train ──▶ artifact.recotem ──▶ recotem serve │
│                   (batch job)        (HMAC-signed)        (FastAPI,    │
│                                                            hot-swap)   │
│                                                                        │
│   any scheduler          local FS, S3,             POST /predict/{name}│
│   (cron / k8s / …)       GCS, fsspec               X-API-Key auth      │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘

train and serve communicate only via signed artifact files. They can run on different machines; the watcher swaps models per recipe based on file mtime.
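
An mtime-based hot swap can be sketched as below. This is an illustration under simplifying assumptions (one local file, a caller-supplied loader, polling via refresh), not Recotem's watcher; the HotSwapModel class is a hypothetical name.

```python
import os
import threading
from typing import Any, Callable


class HotSwapModel:
    """Holds one model and replaces it when the file's mtime changes."""

    def __init__(self, path: str, loader: Callable[[str], Any]) -> None:
        self._path = path
        self._loader = loader
        self._lock = threading.Lock()
        self._mtime: int | None = None
        self._model: Any = None

    def refresh(self) -> bool:
        """Reload if the mtime changed; return True when a swap happened."""
        mtime = os.stat(self._path).st_mtime_ns
        if mtime == self._mtime:
            return False
        # Load the new model fully before taking the lock, so readers
        # keep using the old model until the replacement is ready.
        new_model = self._loader(self._path)
        with self._lock:
            self._model, self._mtime = new_model, mtime
        return True

    def get(self) -> Any:
        """Return the model currently being served."""
        with self._lock:
            return self._model
```

A serving process would call refresh() on a timer in a background thread, while request handlers call get(); a failed load raises before the swap, so the previous model stays in place.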

Contributing

Issues and pull requests welcome. Development uses uv for dependency management:

uv sync --all-extras
uv run pytest tests
uv run ruff check src tests

See CLAUDE.md (or the project guidelines therein) for the full contributor workflow.

License

Apache License 2.0.
