Budgeted LLM task orchestration with hard limits and long-context RLM.
# enzu

Budgeted LLM tasks that scale beyond context.
enzu is a Python-first toolkit for AI engineers and builders who need reliable, budgeted LLM runs. It enforces hard limits (tokens, time, cost), switches to RLM when context is large, and works across OpenAI-compatible providers. Use it from Python, the CLI, or the HTTP API.
## 30-second quickstart

```shell
uv add enzu
export OPENAI_API_KEY=sk-...
python -c "from enzu import ask; print(ask('Say hello in one sentence.'))"
```
## What enzu is (and isn't)

- enzu is a budget + reliability layer for LLM work: caps that actually stop execution when you hit token/time/cost limits.
- enzu isn't a giant agent framework. It's meant to stay small, composable, and easy to drop into existing code.
## Why enzu

- Hard budgets by default: token, time, and cost caps that actually stop work
- RLM mode for long context: recursive subcalls when prompts are too large
- Provider-agnostic: works with OpenAI-compatible APIs and bring-your-own model
- Production-ready surfaces: Python SDK, CLI worker, and HTTP API
## At a glance
| enzu is | enzu is not |
|---|---|
| A budget-first execution engine | A prompt library or template system |
| Hard stops when limits are hit | Best-effort throttling |
| RLM for tasks that exceed context | A vector DB or RAG framework |
| Provider-agnostic (OpenAI-compatible) | Tied to one vendor |
| Lightweight (~2k LOC core) | A full agent framework |
## Quickstart (Python)

```shell
uv add enzu
# or: pip install enzu
export OPENAI_API_KEY=sk-...
```

```python
from enzu import Enzu, ask

print(ask("What is 2+2?"))

client = Enzu()  # Auto-detects provider from env
answer = client.run(
    "Summarize the key points",
    data="...long document...",
    tokens=400,
)
print(answer)
```

Tip: Set `OPENAI_API_KEY`, `OPENROUTER_API_KEY`, or another provider key. You can always pass `model=` and `provider=` explicitly.
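The auto-detection the tip describes can be sketched as a tiny resolver that picks a provider from whichever key is present. The key names come from this README; the precedence order and the `detect_provider` helper are illustrative assumptions, not enzu's actual logic:

```python
import os

# Illustrative sketch (not enzu's real detection code): choose a provider
# based on which API key is set in the environment. Precedence is assumed.
def detect_provider(env=None):
    env = os.environ if env is None else env
    if env.get("OPENAI_API_KEY"):
        return "openai"
    if env.get("OPENROUTER_API_KEY"):
        return "openrouter"
    return None  # fall back to explicit provider=/model= arguments
```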
## Budget hard-stop (killer feature)

enzu enforces budgets as physics, not policy. When you set a limit, the system will stop:

```python
from enzu import Enzu

client = Enzu()

# Ask for 500 words but cap at 50 tokens - enzu stops deterministically
result = client.run(
    "Write a 500-word essay on climate change.",
    data="...long research document...",
    tokens=50,  # Hard cap: output stops here
)
# Result: "[PARTIAL - budget exhausted]..." - work stopped, no runaway costs
```

See `examples/budget_hardstop_demo.py` for the full demo.
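The hard-stop behavior can be illustrated with a toy stand-in: consume a token stream, cut at the cap, and report a terminal state. Whitespace-split "tokens" and the state strings here are simplifications for illustration; enzu's real accounting uses provider token counts.

```python
# Toy illustration of a deterministic hard stop (not enzu internals):
# whitespace-split words stand in for real tokenizer counts.
def hard_stop(token_stream, cap):
    out, used = [], 0
    for tok in token_stream:
        if used >= cap:
            return " ".join(out), "BUDGET_EXCEEDED"  # partial output, stopped
        out.append(tok)
        used += 1
    return " ".join(out), "SUCCESS"

text, outcome = hard_stop(iter("one two three four five".split()), cap=3)
```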
## Typed outcomes (predictable handling)

Every run returns a typed `Outcome` for deterministic error handling:

```python
from enzu import Enzu, Outcome

client = Enzu()
result = client.run("Analyze this", data=doc, tokens=100, return_report=True)

if result.outcome == Outcome.SUCCESS:
    print(result.answer)
elif result.outcome == Outcome.BUDGET_EXCEEDED:
    print(f"Partial result: {result.answer}" if result.partial else "Budget hit")
elif result.outcome == Outcome.TIMEOUT:
    handle_timeout()

# Also: PROVIDER_ERROR, TOOL_ERROR, VERIFICATION_FAILED, CANCELLED, INVALID_REQUEST
```

See `examples/typed_outcomes_demo.py` for the full demo.
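Because the outcomes form a closed set, handling can also be table-driven instead of an `if`/`elif` chain. A stand-alone sketch using plain strings and dicts in place of enzu's typed result object; the retry policies in the handlers are illustrative, not enzu recommendations:

```python
# Dispatch table over the outcome names listed above. Result dicts stand in
# for enzu's typed result object; the handler policies are illustrative.
HANDLERS = {
    "SUCCESS": lambda r: r["answer"],
    "BUDGET_EXCEEDED": lambda r: r.get("answer") or "<no partial output>",
    "TIMEOUT": lambda r: "<retry with a larger seconds budget>",
    "PROVIDER_ERROR": lambda r: "<retry with backoff>",
}

def handle(result):
    return HANDLERS.get(result["outcome"], lambda r: "<unhandled>")(result)
```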
## RLM mode (reasoning over long context)

When your input exceeds context limits, enzu automatically switches to RLM (Reasoning Language Model) mode: recursive subcalls that break the problem into manageable pieces.

```python
from enzu import Enzu

client = Enzu()

# Pass a large document - enzu auto-detects and uses RLM
answer = client.run(
    "Who is credited with the first algorithm?",
    data=open("large_research_paper.txt").read(),  # 100k+ tokens
    tokens=500,
)
```

RLM mode provides progress callbacks, step-by-step reasoning, and budget enforcement across all subcalls.
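The general shape of recursive subcalls can be sketched as a reduction over chunks: split, answer per chunk, then reduce the partial answers until a single call fits. This illustrates the idea only, not enzu's actual RLM implementation; `ask` stands in for a model call.

```python
# Recursive reduction over long input (illustrative, not enzu's code):
# split the text, answer each chunk, then recurse on the joined partials.
def rlm_reduce(text, ask, chunk_chars=4000):
    if len(text) <= chunk_chars:
        return ask(text)  # fits in one subcall
    chunks = [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
    partials = " ".join(ask(chunk) for chunk in chunks)
    return rlm_reduce(partials, ask, chunk_chars)
```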
## Use cases

1. Cost-controlled batch processing

   ```python
   # Process 1000 documents with a $10 budget cap
   client = Enzu(cost=10.0)
   for doc in documents:
       result = client.run("Extract key entities", data=doc)
   ```

2. Research assistant with guardrails

   ```python
   # Research task with time and token limits
   answer = client.run(
       "Research recent AI safety papers and summarize",
       seconds=60,   # Max 1 minute
       tokens=1000,  # Max 1000 output tokens
   )
   ```

3. Long document analysis

   ```python
   # Analyze a document too large for the context window
   summary = client.run(
       "Summarize the main arguments and conclusions",
       data=open("100_page_report.pdf.txt").read(),
       tokens=500,
   )
   ```
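The semantics behind the `cost=10.0` cap in use case 1 can be shown with a small stand-alone budget object that refuses work once the cap is crossed. The class name and per-document cost are made up for illustration; this is the hard-stop idea, not enzu's accounting code.

```python
# Stand-alone sketch of a hard cost cap: charging past the cap raises,
# so a batch loop stops deterministically instead of overspending.
class CostBudget:
    def __init__(self, cap_usd):
        self.cap_usd = cap_usd
        self.spent_usd = 0.0

    def charge(self, amount_usd):
        if self.spent_usd + amount_usd > self.cap_usd:
            raise RuntimeError("budget exhausted")
        self.spent_usd += amount_usd

budget = CostBudget(cap_usd=5.0)
processed = 0
for _ in range(1000):
    try:
        budget.charge(1.0)  # hypothetical cost per document
    except RuntimeError:
        break  # hard stop: remaining documents are not processed
    processed += 1
```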
## Job mode (async delegation)

For long-running tasks, use job mode to submit and poll:

```python
import time

from enzu import Enzu, JobStatus

client = Enzu()

# Submit a job (returns immediately)
job = client.submit("Analyze this large dataset", data=data, cost=5.0)
print(f"Job ID: {job.job_id}")

# Poll for completion
while job.status in (JobStatus.PENDING, JobStatus.RUNNING):
    time.sleep(1)
    job = client.status(job.job_id)

# Get the result
if job.status == JobStatus.COMPLETED:
    print(job.answer)

# Or cancel if needed
# client.cancel(job.job_id)
```

See `examples/job_delegation_demo.py` for the full demo.
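The poll loop above can be wrapped with a client-side deadline so a stuck job doesn't poll forever. This helper is a generic sketch: `fetch_status` stands in for `client.status`, and plain dicts with status strings stand in for the job object.

```python
import time

# Generic poll-until-terminal helper with a wall-clock deadline.
def poll_until_done(fetch_status, job_id, interval=1.0, timeout=300.0,
                    active_states=("PENDING", "RUNNING")):
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_status(job_id)
        if job["status"] not in active_states:
            return job  # terminal: e.g. COMPLETED, FAILED, or CANCELLED
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job['status']} after {timeout}s")
        time.sleep(interval)
```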
## HTTP API (server)

```shell
uv pip install "enzu[server]"
uvicorn enzu.server:app --host 0.0.0.0 --port 8000

curl http://localhost:8000/v1/run \
  -H "Content-Type: application/json" \
  -d '{"task":"Say hello","model":"gpt-4o","provider":"openai"}'
```

If you set `ENZU_API_KEY`, pass `X-API-Key` on every request.
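From Python, the same endpoint can be called with the standard library. The payload fields mirror the curl example and the `X-API-Key` note above; the helper name and defaults are illustrative, and no other request schema is assumed.

```python
import json
import urllib.request

# Build a POST request for the /v1/run endpoint shown above (illustrative
# helper; only task/model/provider and X-API-Key come from this README).
def build_run_request(task, model="gpt-4o", provider="openai",
                      api_key=None, base_url="http://localhost:8000"):
    payload = {"task": task, "model": model, "provider": provider}
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["X-API-Key"] = api_key
    return urllib.request.Request(
        f"{base_url}/v1/run",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )

req = build_run_request("Say hello", api_key="secret")
# urllib.request.urlopen(req) would send it; omitted here.
```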
## CLI worker

```shell
cat <<'JSON' | enzu
{
  "provider": "openai",
  "task": {
    "task_id": "hello-1",
    "input_text": "Say hello in one sentence.",
    "model": "gpt-4o"
  }
}
JSON
```
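When driving the worker from a script, the stdin envelope can be built with the standard library. The field names mirror the example above, which is the only schema assumed here; piping the result into the `enzu` process is left out.

```python
import json

# Build the JSON envelope the CLI worker reads on stdin (fields taken from
# the example above; no other schema is assumed).
def cli_envelope(task_id, input_text, model="gpt-4o", provider="openai"):
    return json.dumps({
        "provider": provider,
        "task": {"task_id": task_id, "input_text": input_text, "model": model},
    }, indent=2)

envelope = cli_envelope("hello-1", "Say hello in one sentence.")
```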
## Docs

- `docs/README.md` - Start here
- `docs/QUICKREF.md` - Providers, env vars, model formats
- `docs/DEPLOYMENT_QUICKSTART.md` - CLI + integration patterns
- `docs/SERVER.md` - HTTP API
- `docs/PYTHON_API_REFERENCE.md` - Full Python API
- `docs/COOKBOOK.md` - Patterns and recipes
- `docs/BUDGETS_AS_PHYSICS.md` - Essay: budgets, containment, typed outcomes for delegated agents
- `docs/RUN_METRICS.md` - p95 cost/run and terminal state distributions
## Examples

- `examples/budget_hardstop_demo.py` - Killer demo: budget cap stops work deterministically
- `examples/typed_outcomes_demo.py` - Typed outcomes for predictable error handling
- `examples/job_delegation_demo.py` - Async job mode with polling
- `examples/python_quickstart.py` - Minimal Python usage
- `examples/python_budget_guardrails.py` - Hard budget limits
- `examples/budget_cap_total_tokens.py` - Tiny total-token cap (hard stop)
- `examples/budget_cap_seconds.py` - Tiny time cap (hard stop)
- `examples/budget_cap_cost_openrouter.py` - Tiny cost cap (OpenRouter only)
- `examples/run_metrics_demo.py` - p50/p95 cost/run and terminal state distributions
- `examples/retry_tracking_demo.py` - Retry tracking and budget attribution
- `examples/rlm_with_context.py` - RLM run over longer context
- `examples/chat_with_budget.py` - TaskSpec + budgets + success criteria
- `examples/http_quickstart.sh` - HTTP API run
- `examples/research_with_exa.py` - Research tool + synthesis
- `examples/file_chatbot.py` - File-based chat loop
- `examples/file_researcher.py` - Session-based research loop
## Contributing

See `CONTRIBUTING.md`.

## Requirements

Python 3.9+
## File details

Details for the file `enzu-0.3.0.tar.gz`.

- Download URL: enzu-0.3.0.tar.gz
- Size: 392.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.16

| Algorithm | Hash digest |
|---|---|
| SHA256 | `cb6bb2a6ae3e135551a86086e6112a08535208de0290f3fc2dc45995fe1cf0ef` |
| MD5 | `cc6bf92d281cfe59e04cd2a6f7de63b4` |
| BLAKE2b-256 | `e71bb7fe3e9312dd2d99367a22463bdba7cd0abb52fe23ed590a7b65fcb45236` |
## File details

Details for the file `enzu-0.3.0-py3-none-any.whl`.

- Download URL: enzu-0.3.0-py3-none-any.whl
- Size: 291.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.16

| Algorithm | Hash digest |
|---|---|
| SHA256 | `94e1ebcf1d63dd39b413b7971e5af5e27c6a80ff6b5fe9f2d280e47d7e1f3595` |
| MD5 | `1a699497de65cf27ad76916fee65f1e3` |
| BLAKE2b-256 | `ce8e5835cf0c1de4be078bbe3a16583d32a14d376836c6d16c6ee1a021de01ca` |