Minima CLI: cost-aware LLM model routing — recommend cheaper models, backed by Mubit memory.

Minima

Recommend a cheaper LLM for each task, so LLM-driven workflows spend less on tokens without losing the quality the task actually needs.

Minima only recommends — it never proxies a call, runs a model, rewrites a prompt, or caches. It is a stack-agnostic advice layer backed by Mubit memory: ask which model to use, run that model yourself, then tell Minima how it went. Because it sits beside your call rather than in front of it, it adds zero latency to your real LLM request.

  POST /v1/recommend  ──▶  you run the model  ──▶  POST /v1/feedback
   (recall + rank)          (your stack)            (write outcome, reinforce memory)
        ▲                                                           │
        └──────────────  recommendations get sharper  ─────────────┘

Why it works

Minima is backed by Mubit memory. Every POST /v1/feedback writes a task → model → outcome record; every POST /v1/recommend recalls the most similar past records and picks the cheapest model expected to clear a quality bar. The longer it runs, the sharper the picks.
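As a rough illustration of the selection rule described above, the sketch below picks the cheapest candidate whose expected quality clears a bar. The `Candidate` type, its fields, and `pick_cheapest_clearing` are hypothetical names for this example, not Minima's internals; how Minima derives expected quality from recalled records is not shown here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """Hypothetical per-model summary built from recalled outcome records."""
    model_id: str
    expected_quality: float   # assumed to lie in [0, 1]
    expected_cost_usd: float


def pick_cheapest_clearing(candidates: Sequence[Candidate],
                           quality_bar: float) -> Optional[Candidate]:
    """Return the cheapest candidate whose expected quality meets the bar, or None."""
    eligible = [c for c in candidates if c.expected_quality >= quality_bar]
    if not eligible:
        return None
    # Ties on cost are broken in favor of higher expected quality.
    return min(eligible, key=lambda c: (c.expected_cost_usd, -c.expected_quality))


# Illustrative numbers only, not taken from any real catalog.
candidates = [
    Candidate("small-model", 0.78, 0.0004),
    Candidate("mid-model", 0.91, 0.0021),
    Candidate("large-model", 0.97, 0.0150),
]
choice = pick_cheapest_clearing(candidates, quality_bar=0.85)
print(choice.model_id if choice else "no candidate clears the bar")
```

With a bar of 0.85 the smallest model is filtered out and the mid-priced one wins on cost; lowering the bar lets the cheapest model through.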

A cost_quality_tradeoff slider (0 = cheapest acceptable, 10 = highest quality) moves the bar. When memory is thin or conflicting, Minima can escalate to a cheap-LLM reasoner (configurable, off by default).
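One way to picture the slider and the escalation trigger is sketched below. The linear mapping, the `MIN_BAR`/`MAX_BAR` values, and the thresholds in `should_escalate` are all assumptions made for illustration; the source does not specify how Minima maps the slider to a bar or how it measures thin or conflicting memory.

```python
from statistics import pstdev
from typing import Sequence

MIN_BAR = 0.60   # assumed bar at slider 0 (cheapest acceptable)
MAX_BAR = 0.95   # assumed bar at slider 10 (highest quality)


def quality_bar_for(slider: int) -> float:
    """Linearly interpolate a quality bar from a 0..10 slider value."""
    if not 0 <= slider <= 10:
        raise ValueError("cost_quality_tradeoff must be between 0 and 10")
    return MIN_BAR + (MAX_BAR - MIN_BAR) * slider / 10


def should_escalate(quality_scores: Sequence[float],
                    min_records: int = 5,
                    max_spread: float = 0.25) -> bool:
    """Escalate when evidence is thin (few records) or conflicting (high spread)."""
    if len(quality_scores) < min_records:
        return True
    return pstdev(quality_scores) > max_spread


print(round(quality_bar_for(3), 3))
print(should_escalate([0.9, 0.92]))                     # thin: only two records
print(should_escalate([0.9, 0.88, 0.93, 0.91, 0.9]))    # enough, and consistent
print(should_escalate([0.1, 0.95, 0.2, 0.9, 0.15]))     # enough, but conflicting
```

In Minima itself the reasoner is off by default, so a check like `should_escalate` would only matter once escalation is configured.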

Cost ranking that reflects reality

A flat token estimate assumes a fixed completion length, so it ignores reasoning/thinking tokens and mis-ranks a model with cheap list prices but heavy internal reasoning. Minima ranks candidates by what they really cost, choosing one basis for the whole candidate set:

rescaled (best) — this request's input priced + the model's observed output-token behavior; size-exact and reasoning-aware.
observed — robust median of realized $/call from recalled outcomes.
estimate (cold start) — token estimate from catalog prices.

The basis climbs estimate → observed → rescaled as your /feedback calls accumulate realized tokens and cost. See Concepts → Cost-basis tiers.
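The three tiers can be sketched as follows. This is a simplified illustration under stated assumptions, not Minima's implementation: `ModelStats`, `choose_basis`, `cost_usd`, the `min_n` threshold, the fixed 500-token completion length, and the prices are all hypothetical. It does keep the one property the text calls out, which is that a single basis is chosen for the whole candidate set.

```python
from dataclasses import dataclass, field
from statistics import median
from typing import List, Sequence


@dataclass
class ModelStats:
    """Hypothetical per-model inputs; prices are USD per million tokens."""
    model_id: str
    input_price: float
    output_price: float
    observed_costs_usd: List[float] = field(default_factory=list)
    observed_output_tokens: List[int] = field(default_factory=list)


def choose_basis(models: Sequence[ModelStats], min_n: int = 3) -> str:
    """Pick one basis for the whole set so candidates are compared like for like."""
    if all(len(m.observed_output_tokens) >= min_n for m in models):
        return "rescaled"
    if all(len(m.observed_costs_usd) >= min_n for m in models):
        return "observed"
    return "estimate"


def cost_usd(m: ModelStats, basis: str, input_tokens: int,
             assumed_output_tokens: int = 500) -> float:
    """Price one candidate for this request under the chosen basis."""
    if basis == "observed":
        return median(m.observed_costs_usd)
    # rescaled: this request's input plus the model's typical observed output.
    # estimate: this request's input plus a fixed assumed completion length.
    out = median(m.observed_output_tokens) if basis == "rescaled" else assumed_output_tokens
    return (input_tokens * m.input_price + out * m.output_price) / 1_000_000


models = [
    ModelStats("cheap-but-verbose", 0.25, 1.25, [0.006, 0.007, 0.008], [4000, 5000, 6000]),
    ModelStats("pricier-but-terse", 1.00, 5.00, [0.003, 0.004, 0.005], [300, 400, 500]),
]
basis = choose_basis(models)
ranked = sorted(models, key=lambda m: cost_usd(m, basis, input_tokens=1760))
print(basis, [m.model_id for m in ranked])
```

With these made-up numbers, the flat estimate would rank the low-list-price model first, while the rescaled basis ranks it second because its observed output is roughly ten times longer.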

Endpoints

Endpoint	Purpose
`POST /v1/recommend`	Recommend a model for one task.
`POST /v1/recommend/workflow`	Recommend a model per step of a multi-step workflow.
`POST /v1/feedback`	Report an outcome and close the learning loop.
`GET /v1/models`	The current model catalog (cost + capability priors).
`GET /v1/strategies`	Rules Mubit has promoted for a namespace (explainability).
`GET /v1/health`	Service, Mubit, catalog, and reasoner status.
`POST|GET|DELETE /v1/admin/tenants`	Tenant provisioning (multi-tenant mode only).

Full schemas, fields, warnings, and error formats: API Reference.

Quickstart

uv sync --extra dev
cp .env.example .env                       # set MUBIT_API_KEY (+ MUBIT_ENDPOINT if not local)

# optional: seed cold-start memory so day-one picks are grounded
uv run minima-seed --dataset synthetic --limit 2000 --lane minima:default

make run                                   # uvicorn on :8080 (interactive docs at /docs)

# recommend
curl -s localhost:8080/v1/recommend -H 'content-type: application/json' -d '{
  "task": {"task": "Summarize this incident report into 3 bullets.",
           "task_type": "summarization"},
  "cost_quality_tradeoff": 3
}' | jq

# ...run the recommended model yourself, then close the loop
curl -s localhost:8080/v1/feedback -H 'content-type: application/json' -d '{
  "recommendation_id": "<from above>", "chosen_model_id": "claude-haiku-4-5",
  "outcome": "success", "quality_score": 0.95,
  "input_tokens": 1760, "output_tokens": 110, "actual_cost_usd": 0.0021,
  "verified_in_production": true
}' | jq

Minima talks to a Mubit runtime at MUBIT_ENDPOINT (defaults to http://127.0.0.1:3000; start one with make run-mubit in the Mubit repo) and uses Mubit's server-side embeddings, so it needs no embedding model of its own.
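For callers who prefer not to shell out to curl, the same recommend and feedback calls from the Quickstart can be sketched with only the Python standard library. The helper names `build_post`, `send`, and `run_loop` are hypothetical, and reading `recommendation_id` from the response body is an assumption based on the Python client example; check the API Reference for the actual response schema.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # where `make run` serves Minima in the Quickstart


def build_post(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request without sending it."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


recommend_request = build_post("/v1/recommend", {
    "task": {"task": "Summarize this incident report into 3 bullets.",
             "task_type": "summarization"},
    "cost_quality_tradeoff": 3,
})


def run_loop() -> dict:
    """Recommend, then report an outcome. Requires a running Minima server."""
    recommendation = send(recommend_request)
    # ... run the recommended model yourself here ...
    return send(build_post("/v1/feedback", {
        "recommendation_id": recommendation["recommendation_id"],  # assumed field name
        "chosen_model_id": "claude-haiku-4-5",
        "outcome": "success", "quality_score": 0.95,
        "input_tokens": 1760, "output_tokens": 110, "actual_cost_usd": 0.0021,
        "verified_in_production": True,
    }))
```

Nothing is sent until `run_loop()` is called, so the request objects can be inspected or logged first.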

Python client

from minima_client import MinimaClient

with MinimaClient("http://localhost:8080") as minima:
    rec = minima.recommend("Write a Python CSV parser.", cost_quality_tradeoff=3)
    # ... run rec.recommended_model.model_id yourself ...
    minima.feedback(rec.recommendation_id, rec.recommended_model.model_id, "success",
                    quality_score=0.95, input_tokens=180, output_tokens=640,
                    actual_cost_usd=0.0034, verified_in_production=True)

Sync + async clients and zero-code autocapture: Python Client SDK.
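The `actual_cost_usd` value passed to feedback has to come from your own stack. A minimal way to compute it from token counts is sketched below; `realized_cost_usd` is a hypothetical helper, and the per-million-token prices are placeholders rather than any provider's real rates. If your provider bills reasoning or thinking tokens as output, they should be included in `output_tokens`.

```python
def realized_cost_usd(input_tokens: int, output_tokens: int,
                      input_price_per_mtok: float,
                      output_price_per_mtok: float) -> float:
    """Cost of one call given token counts and USD prices per million tokens."""
    total = input_tokens * input_price_per_mtok + output_tokens * output_price_per_mtok
    return total / 1_000_000


# Placeholder prices for illustration only; use your provider's current rates.
cost = realized_cost_usd(input_tokens=180, output_tokens=640,
                         input_price_per_mtok=1.00, output_price_per_mtok=5.00)
print(f"{cost:.6f}")
```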

Documentation

Doc	What's in it
Getting Started	Install, configure, run, first recommendation.
Concepts	The loop, the algorithm, cost-basis tiers, escalation, how it improves.
API Reference	Every endpoint, full schemas, warnings, errors.
Configuration	Every environment variable + tuning guidance.
Python Client SDK	`minima_client` clients + autocapture.
Cold-Start Seeding	Load history so day-one picks are grounded.
Multi-Tenancy	One deployment, many orgs, per-org Mubit instances.
Operations	Deployment, health, degradation, monitoring, secrets.
Examples	Guided tour of the runnable examples.
Agent Harness	`minima_harness`: a Minima-routing port of PI's agent toolkit.

Examples

Runnable, progressively advanced — in examples/:

#	Example	Shows
1	`01_quickstart.sh`	Raw `curl` against every endpoint.
2	`02_recommend_and_feedback.py`	The core loop with the SDK.
3	`03_constraints_and_tradeoff.py`	Constraints + slider sweep.
4	`04_workflow.py`	Per-step workflow recommendations.
5	`05_autocapture.py`	Zero-code intake via `mubit.learn`.
6	`06_routed_llm_call.py`	Routing a real Claude call + feedback.
7	`07_multitenant_admin.py`	Provision an org, call as that tenant.
8	`harness_warmup.py`	The `minima_harness` agent loop (demo mode needs no keys).

Agent harness

minima_harness/ is a lean Python port of @earendil-works/pi's agent toolkit, made Minima-native: an Agent runtime with tool calling plus a MinimaAgent that routes every prompt through Minima and feeds the realized tokens/cost/quality back. It is the "run the model yourself" half of the Minima loop, packaged.

import asyncio

from minima_harness.minima import MinimaAgent, HarnessConfig


async def main() -> None:
    agent = MinimaAgent(HarnessConfig.from_env())   # MINIMA_URL, candidates, judge policy
    await agent.prompt("Summarize this incident.", task_type="summarization", slider=3)
    # -> Minima picked the model, the agent ran it, judged quality, fed the outcome back


asyncio.run(main())   # agent.prompt is awaited, so it must run inside an event loop

Try it with no keys via the in-process demo:

uv run python examples/harness_warmup.py          # demo (in-process Minima + fake provider)
uv run python examples/harness_warmup.py --live   # real Minima + real providers

Full architecture, the loop mapping, and extension guide: Agent Harness.

Configuration

All configuration is via environment variables (see .env.example and Configuration). The only required value is MUBIT_API_KEY (in single-tenant mode). Notable knobs:

MINIMA_USE_OBSERVED_COST / MINIMA_OBSERVED_COST_MIN_N — rank by realized cost.
MINIMA_REASONER_PROVIDER — enable the cheap-LLM escalation tier (anthropic / gemini).
MINIMA_RECOMMENDATION_STORE=sqlite — durable recommendation resolution (multi-worker).
MINIMA_MULTITENANT — serve many orgs from one deployment.
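A small pre-flight check along these lines can catch a missing key before the server starts. This is only a sketch: `missing_required` is a hypothetical helper, the set of truthy strings accepted for `MINIMA_MULTITENANT` is an assumption, and the check deliberately validates nothing in multi-tenant mode because the source does not say what is required there.

```python
import os
from typing import List, Mapping

TRUTHY = {"1", "true", "yes", "on"}  # assumed spellings; see Configuration for the real ones


def missing_required(env: Mapping[str, str]) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    multitenant = env.get("MINIMA_MULTITENANT", "").strip().lower() in TRUTHY
    if multitenant:
        return []  # per-tenant requirements are out of scope for this sketch
    return [] if env.get("MUBIT_API_KEY") else ["MUBIT_API_KEY"]


missing = missing_required(os.environ)
if missing:
    print("missing required settings:", ", ".join(missing))
```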

Development

make install     # uv sync --extra dev
make test        # unit + integration (no Mubit needed)
make lint        # ruff + mypy
make live        # end-to-end against a running Mubit (pytest -m live)
make eval        # offline RouterBench savings evaluation (pytest -m eval)
make fmt         # ruff --fix + format
make seed        # minima-seed (LIMIT=, LANE= overridable)

Project layout

src/minima/
  api/routers/      recommend · feedback · models · strategies · health · admin
  recommender/      engine · classify · aggregate · score · escalation · propensity · recstore
  memory/           adapter (only Mubit touchpoint) · records · keys · threadpool
  catalog/          store · merge · refresh · sources/{litellm,openrouter} · data/*.json
  llm/              base · anthropic · gemini · registry   (the escalation reasoner)
  tenancy/          runtime · registry · context · keys · secrets
  seeding/          routerbench · synthetic · run_seed (minima-seed CLI)
  schemas/          common · recommend · workflow · feedback · models_catalog · strategies · admin
src/minima_harness/   ported pi-ai (ai/) + pi-agent-core (agent/) + Minima integration (minima/) — see docs/harness.md
client_sdk/minima_client/   client (sync+async) · autocapture · errors
docs/               full documentation       examples/   runnable examples
tests/              unit · integration (FakeMemory) · live (-m live) · eval (-m eval)

License

Minima is source-available under the Functional Source License, Version 1.1, Apache 2.0 Future License (FSL-1.1-Apache-2.0).

You may use, copy, modify, and self-host Minima for any Permitted Purpose — internal use, non-commercial education/research, and professional services for a licensee. The one restriction is a Competing Use: you may not offer Minima (or a substantially similar product/service) to others as a commercial or hosted offering that competes with us. Two years after each version is published, that version automatically converts to the Apache License 2.0.

Release history

0.5.0 (this version): Jul 1, 2026
0.4.10: Jun 26, 2026
0.4.9: Jun 26, 2026

Download files

Source Distribution

minima_cli-0.5.0.tar.gz (906.8 kB)

Uploaded Jul 1, 2026 Source

Built Distribution

minima_cli-0.5.0-py3-none-any.whl (284.3 kB)

Uploaded Jul 1, 2026 Python 3

File details

Details for the file minima_cli-0.5.0.tar.gz.

File metadata

Download URL: minima_cli-0.5.0.tar.gz
Upload date: Jul 1, 2026
Size: 906.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.14

File hashes

Hashes for minima_cli-0.5.0.tar.gz
Algorithm	Hash digest
SHA256	`beb5a9f22b9f95e5b784c9ddda1503495ec7fd26103e609261324ebef5ee3538`
MD5	`fb554e713563a3370f6821ded9201104`
BLAKE2b-256	`4f508a56de998b2f3130f5adb95150dbec21607322940d3e67f1d45e08760d34`

File details

Details for the file minima_cli-0.5.0-py3-none-any.whl.

File metadata

Download URL: minima_cli-0.5.0-py3-none-any.whl
Upload date: Jul 1, 2026
Size: 284.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.14

File hashes

Hashes for minima_cli-0.5.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`98e2365e37781dc19e87f4d22b892a5754cd9f7d6055ee94162b4be5f6342c14`
MD5	`fc9ed9d3693d7b50addcf2d87ad58d0c`
BLAKE2b-256	`1e6b9dc601a26a39e50638a79221e990cfc5e117b1044595d838195c540a540b`
