Budgeted LLM task orchestration with hard limits and long-context RLM.

enzu

Budgeted LLM tasks that scale beyond context.

enzu is a Python-first toolkit for AI engineers and builders who need reliable, budgeted LLM runs. It enforces hard limits (tokens, time, cost), switches to RLM when context is large, and works across OpenAI-compatible providers. Use it from Python, the CLI, or the HTTP API.

30-second quickstart

uv add enzu
export OPENAI_API_KEY=sk-...
python -c "from enzu import ask; print(ask('Say hello in one sentence.'))"

What enzu is (and isn’t)

  • enzu is a budget and reliability layer for LLM work: its caps stop execution when a run hits its token, time, or cost limit.
  • enzu isn’t a giant agent framework. It’s meant to stay small, composable, and easy to drop into existing code.

Why enzu

  • Hard budgets by default: tokens, time, and cost caps that actually stop work
  • RLM mode for long context: recursive subcalls when prompts are too large
  • Provider-agnostic: OpenAI-compatible APIs and bring-your-own model
  • Production-ready surfaces: Python SDK, CLI worker, and HTTP API

What enzu is / isn't

enzu is                                | enzu is not
A budget-first execution engine        | A prompt library or template system
Hard stops when limits are hit         | Best-effort throttling
RLM for tasks that exceed context      | A vector DB or RAG framework
Provider-agnostic (OpenAI-compatible)  | Tied to one vendor
Lightweight (~2k LOC core)             | A full agent framework

Quickstart (Python)

uv add enzu
# or: pip install enzu
export OPENAI_API_KEY=sk-...

from enzu import Enzu, ask

print(ask("What is 2+2?"))

client = Enzu()  # Auto-detects from env
answer = client.run(
    "Summarize the key points",
    data="...long document...",
    tokens=400,
)
print(answer)

Tip: Set OPENAI_API_KEY, OPENROUTER_API_KEY, or another provider key. You can always pass model= and provider= explicitly.

Budget hard-stop (killer feature)

enzu enforces budgets as physics, not policy: a limit is a hard stop, not a guideline. When a run reaches its cap, execution ends:

from enzu import Enzu

client = Enzu()

# Ask for 500 words but cap at 50 tokens - enzu stops deterministically
result = client.run(
    "Write a 500-word essay on climate change.",
    data="...long research document...",
    tokens=50,  # Hard cap: output stops here
)
# Result: "[PARTIAL - budget exhausted]..." - work stopped, no runaway costs

See examples/budget_hardstop_demo.py for the full demo.
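
To make the difference between a hard stop and best-effort throttling concrete, here is a minimal, self-contained sketch of a cap check that runs before every unit of work. It is not enzu's implementation: `run_with_caps`, its parameters, and the whitespace-split "tokens" are hypothetical stand-ins chosen for illustration.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def run_with_caps(
    tokens: Iterable[str],
    max_tokens: int,
    max_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[str, Optional[str]]:
    """Consume a token stream until it ends or a cap is reached.

    Returns (text, exhausted). `exhausted` is None on normal completion,
    or the name of the cap that stopped the run ("tokens" or "seconds").
    Hypothetical helper for illustration only, not part of enzu.
    """
    deadline = clock() + max_seconds
    out = []
    for tok in tokens:
        # Check both caps before accepting more work, so a limit is never exceeded.
        if len(out) >= max_tokens:
            return " ".join(out), "tokens"
        if clock() >= deadline:
            return " ".join(out), "seconds"
        out.append(tok)
    return " ".join(out), None


text, exhausted = run_with_caps(
    "one two three four five".split(), max_tokens=3, max_seconds=5.0
)
print(text, exhausted)
```

The caller always gets back whatever was produced so far plus the reason the run ended, which is the same shape of information the partial result above conveys.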

Typed outcomes (predictable handling)

Every run returns a typed Outcome for deterministic error handling:

from enzu import Enzu, Outcome

client = Enzu()
doc = "...long document..."  # placeholder input
result = client.run("Analyze this", data=doc, tokens=100, return_report=True)

if result.outcome == Outcome.SUCCESS:
    print(result.answer)
elif result.outcome == Outcome.BUDGET_EXCEEDED:
    print(f"Partial result: {result.answer}" if result.partial else "Budget hit")
elif result.outcome == Outcome.TIMEOUT:
    print("Timed out")  # replace with your own timeout handling
# Also: PROVIDER_ERROR, TOOL_ERROR, VERIFICATION_FAILED, CANCELLED, INVALID_REQUEST

See examples/typed_outcomes_demo.py for the full demo.
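
One thing typed outcomes make easy is a small, explicit policy for what to do next. The sketch below is an assumption-laden illustration, not enzu behavior: it uses a local stand-in `Outcome` enum (so it runs without enzu installed), and both the `next_action` helper and the choice of which outcomes count as retryable are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    """Local stand-in mirroring the outcome names listed above.

    In real code, import Outcome from enzu instead. The string values
    here are assumptions made so the sketch runs on its own.
    """

    SUCCESS = "success"
    BUDGET_EXCEEDED = "budget_exceeded"
    TIMEOUT = "timeout"
    PROVIDER_ERROR = "provider_error"
    TOOL_ERROR = "tool_error"
    VERIFICATION_FAILED = "verification_failed"
    CANCELLED = "cancelled"
    INVALID_REQUEST = "invalid_request"


# Assumed policy: only transient failures are worth retrying.
RETRYABLE = {Outcome.TIMEOUT, Outcome.PROVIDER_ERROR}


def next_action(outcome: Outcome, attempt: int, max_attempts: int = 3) -> str:
    """Map an outcome to "done", "partial", "retry", or "fail" (hypothetical)."""
    if outcome is Outcome.SUCCESS:
        return "done"
    if outcome is Outcome.BUDGET_EXCEEDED:
        # Retrying would spend more budget; keep any partial answer instead.
        return "partial"
    if outcome in RETRYABLE and attempt < max_attempts:
        return "retry"
    return "fail"


print(next_action(Outcome.TIMEOUT, attempt=1))
```

Keeping the policy in one function means callers branch on a handful of actions rather than on every outcome.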

RLM mode (reasoning over long context)

When your input exceeds context limits, enzu automatically switches to RLM (Reasoning Language Model) mode: recursive subcalls that break the problem into manageable pieces.

from enzu import Enzu

client = Enzu()

# Pass a large document - enzu auto-detects and uses RLM
answer = client.run(
    "Who is credited with the first algorithm?",
    data=open("large_research_paper.txt").read(),  # 100k+ tokens
    tokens=500,
)

RLM mode provides progress callbacks, step-by-step reasoning, and budget enforcement across all subcalls.
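
The general idea of recursive subcalls under a shared budget can be sketched in a few lines. This is a conceptual toy, not how enzu implements RLM: the halving strategy, the call-count budget, and every name here (`recursive_answer`, `CallBudget`, `fake_model`) are assumptions for illustration, and the model is a plain function so the snippet runs offline.

```python
from dataclasses import dataclass
from typing import Callable

Model = Callable[[str, str], str]  # (task, text) -> answer


class BudgetExhausted(Exception):
    """Raised when the shared call budget is used up."""


@dataclass
class CallBudget:
    remaining: int

    def spend(self) -> None:
        # Every subcall draws from the same budget object.
        if self.remaining <= 0:
            raise BudgetExhausted("no model calls left")
        self.remaining -= 1


def recursive_answer(
    task: str, text: str, model: Model, budget: CallBudget, max_chars: int
) -> str:
    """Answer `task` over `text`, splitting in half while it is too large."""
    if max_chars < 1:
        raise ValueError("max_chars must be at least 1")
    if len(text) <= max_chars:
        budget.spend()
        return model(task, text)
    mid = len(text) // 2
    left = recursive_answer(task, text[:mid], model, budget, max_chars)
    right = recursive_answer(task, text[mid:], model, budget, max_chars)
    budget.spend()  # combining the two partial answers is also a model call
    return model(task, left + "\n" + right)


def fake_model(task: str, text: str) -> str:
    """Stand-in for an LLM call: returns a short prefix of its input."""
    return text[:20]


budget = CallBudget(remaining=10)
print(recursive_answer("Summarize", "lorem ipsum " * 20, fake_model, budget, max_chars=80))
print("calls left:", budget.remaining)
```

Because the budget object is passed down through every level, a deep recursion cannot spend more than the top-level caller allowed.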

Use cases

1. Cost-controlled batch processing

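from enzu import Enzu

# Process 1000 documents with a $10 budget cap
client = Enzu(cost=10.0)
documents = ["...document 1...", "...document 2..."]  # placeholder inputs
for doc in documents:
    result = client.run("Extract key entities", data=doc)
    print(result)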

2. Research assistant with guardrails

# Research task with time and token limits
answer = client.run(
    "Research recent AI safety papers and summarize",
    seconds=60,   # Max 1 minute
    tokens=1000,  # Max 1000 output tokens
)

3. Long document analysis

# Analyze a document too large for context window
summary = client.run(
    "Summarize the main arguments and conclusions",
    data=open("100_page_report.pdf.txt").read(),
    tokens=500,
)

Job mode (async delegation)

For long-running tasks, use job mode to submit and poll:

from enzu import Enzu, JobStatus
import time

client = Enzu()
data = "...large dataset..."  # placeholder input

# Submit a job (returns immediately)
job = client.submit("Analyze this large dataset", data=data, cost=5.0)
print(f"Job ID: {job.job_id}")

# Poll for completion
while job.status in (JobStatus.PENDING, JobStatus.RUNNING):
    time.sleep(1)
    job = client.status(job.job_id)

# Get result
if job.status == JobStatus.COMPLETED:
    print(job.answer)

# Or cancel if needed
# client.cancel(job.job_id)

See examples/job_delegation_demo.py for the full demo.
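
The polling loop above runs until the job leaves the pending and running states. If you also want a local timeout and a growing delay between polls, a generic helper along these lines may be useful. It is a sketch that does not depend on enzu: `poll_until` and its parameters are hypothetical names, and the doubling backoff is one possible choice.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_until(
    fetch: Callable[[], T],
    is_done: Callable[[T], bool],
    timeout: float = 60.0,
    interval: float = 1.0,
    max_interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fetch` until `is_done(state)` is true or `timeout` seconds pass.

    The delay between polls doubles each round, capped at `max_interval`.
    Hypothetical helper for illustration only, not part of enzu.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = fetch()
        if is_done(state):
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the local timeout")
        sleep(interval)
        interval = min(interval * 2, max_interval)


# Offline demo with a scripted sequence of states instead of a real job.
states = iter(["PENDING", "RUNNING", "COMPLETED"])
final = poll_until(lambda: next(states), lambda s: s == "COMPLETED", interval=0.01)
print(final)
```

With the job API shown above, the call would look roughly like `poll_until(lambda: client.status(job.job_id), lambda j: j.status not in (JobStatus.PENDING, JobStatus.RUNNING))`. Note that a local timeout only stops the waiting; cancelling the job itself is a separate `client.cancel(job.job_id)` call.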

HTTP API (server)

uv pip install "enzu[server]"
uvicorn enzu.server:app --host 0.0.0.0 --port 8000

# In another terminal:
curl http://localhost:8000/v1/run \
  -H "Content-Type: application/json" \
  -d '{"task":"Say hello","model":"gpt-4o","provider":"openai"}'

If ENZU_API_KEY is set on the server, send an X-API-Key header with every request.
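
The same request can be made from Python with only the standard library. This is a minimal sketch under stated assumptions: the endpoint, JSON fields, and X-API-Key header come from the curl example above, while `build_run_request` and `send` are hypothetical helper names, and treating the response body as JSON is an assumption (see docs/SERVER.md for the actual response shape).

```python
import json
import urllib.request
from typing import Any, Optional


def build_run_request(
    task: str,
    model: str,
    provider: str,
    api_key: Optional[str] = None,
    base_url: str = "http://localhost:8000",
) -> urllib.request.Request:
    """Build a POST /v1/run request mirroring the curl example (hypothetical helper)."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only needed when the server was started with ENZU_API_KEY set.
        headers["X-API-Key"] = api_key
    body = json.dumps({"task": task, "model": model, "provider": provider}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/run", data=body, headers=headers, method="POST"
    )


def send(req: urllib.request.Request, timeout: float = 30.0) -> Any:
    """Send the request and decode the body, assuming the server replies with JSON."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_run_request("Say hello", "gpt-4o", "openai")
print(req.get_method(), req.full_url)
# With the server running: print(send(req))
```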

CLI worker

cat <<'JSON' | enzu
{
  "provider": "openai",
  "task": {
    "task_id": "hello-1",
    "input_text": "Say hello in one sentence.",
    "model": "gpt-4o"
  }
}
JSON
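
If you drive the CLI worker from a Python script rather than a shell heredoc, the payload can be built and piped over stdin. A minimal sketch, assuming the `enzu` executable is on PATH and accepts the JSON shape shown above; `build_cli_payload` and `run_cli` are hypothetical helper names, and the worker's output is returned as raw text because its format is not described here.

```python
import json
import subprocess


def build_cli_payload(task_id: str, input_text: str, model: str, provider: str) -> dict:
    """Build the JSON document the CLI example above sends on stdin."""
    return {
        "provider": provider,
        "task": {
            "task_id": task_id,
            "input_text": input_text,
            "model": model,
        },
    }


def run_cli(payload: dict, timeout: float = 60.0) -> str:
    """Pipe the payload to the `enzu` command and return its stdout unparsed."""
    proc = subprocess.run(
        ["enzu"],
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return proc.stdout


payload = build_cli_payload("hello-1", "Say hello in one sentence.", "gpt-4o", "openai")
print(json.dumps(payload, indent=2))
# With enzu installed and a provider key set: print(run_cli(payload))
```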

Docs

  • docs/README.md - Start here
  • docs/QUICKREF.md - Providers, env vars, model formats
  • docs/DEPLOYMENT_QUICKSTART.md - CLI + integration patterns
  • docs/SERVER.md - HTTP API
  • docs/PYTHON_API_REFERENCE.md - Full Python API
  • docs/COOKBOOK.md - Patterns and recipes
  • docs/BUDGETS_AS_PHYSICS.md - Essay: budgets, containment, typed outcomes for delegated agents
  • docs/RUN_METRICS.md - p95 cost/run and terminal state distributions

Examples

  • examples/budget_hardstop_demo.py - Killer demo: budget cap stops work deterministically
  • examples/typed_outcomes_demo.py - Typed outcomes for predictable error handling
  • examples/job_delegation_demo.py - Async job mode with polling
  • examples/python_quickstart.py - Minimal Python usage
  • examples/python_budget_guardrails.py - Hard budget limits
  • examples/budget_cap_total_tokens.py - Tiny total-token cap (hard stop)
  • examples/budget_cap_seconds.py - Tiny time cap (hard stop)
  • examples/budget_cap_cost_openrouter.py - Tiny cost cap (OpenRouter only)
  • examples/run_metrics_demo.py - p50/p95 cost/run and terminal state distributions
  • examples/retry_tracking_demo.py - Retry tracking and budget attribution
  • examples/rlm_with_context.py - RLM run over longer context
  • examples/chat_with_budget.py - TaskSpec + budgets + success criteria
  • examples/http_quickstart.sh - HTTP API run
  • examples/research_with_exa.py - Research tool + synthesis
  • examples/file_chatbot.py - File-based chat loop
  • examples/file_researcher.py - Session-based research loop

Contributing

See CONTRIBUTING.md.

Requirements

Python 3.9+
