# agentprod

Production patterns for indie AI agents — token bucket, cost-aware routing, retry, observability. Extracted from running a multi-LLM trading agent in production.
agentprod is a small Python library of the four things you reach for once your AI agent leaves your laptop and starts charging your credit card at 3 AM:
| Module | What it gives you | Why it exists |
|---|---|---|
| `Router` | Cost-aware model selection (cheapest model that meets the quality bar) | Burning Sonnet on "what is the price?" is how you go bankrupt |
| `Throttle` | Async token bucket with jitter + hard timeout | Provider rate limits don't just slow you down, they cascade |
| `retry_call` / `retry_async` | Pattern-based detection of transient failures | LLM SDKs change their exception classes every release; the error string is stable |
| `CostTracker` | Per-call USD ledger with arbitrary labels (agent, user, route) | "Which agent burned $40 last night" is a question the provider dashboard can't answer |
No hard dependency on LangChain / LangGraph / OpenAI SDK. Bring your own LLM client. agentprod just gives you the production scaffolding around it.
## Status

Alpha (v0.0.1). APIs may change before 1.0. Battle-tested in one production system; tests cover the core paths, but the public surface is intentionally small until usage shapes it.
## Install

```bash
# Pure stdlib — no required deps
pip install agentprod

# With tenacity for richer retry semantics
pip install "agentprod[retry]"
```

Python 3.10+.
## Quickstart

The full example is in examples/quickstart.py. Skeleton:

```python
import asyncio
from agentprod import (
    Complexity, Router, Throttle, retry_async,
    CostTracker, ModelPricing,
)

router = Router(model_for={
    Complexity.SIMPLE: "gpt-4o-mini",
    Complexity.MODERATE: "gpt-4o",
    Complexity.COMPLEX: "claude-sonnet-4-6",
})
throttle = Throttle(capacity=10, refill_per_sec=10)
PRICING = {
    "gpt-4o-mini": ModelPricing(input_per_1k=0.00015, output_per_1k=0.0006),
    "gpt-4o": ModelPricing(input_per_1k=0.0025, output_per_1k=0.01),
}
cost = CostTracker(jsonl_path=".data/cost.jsonl")

async def handle(query: str, *, agent: str) -> str:
    model = router.select(query)
    await throttle.acquire(timeout=1.0, label=f"llm:{model}")
    text, in_tok, out_tok = await retry_async(
        lambda: your_llm_call(model, query),
        max_attempts=3,
    )
    cost.record(
        model=model,
        input_tokens=in_tok, output_tokens=out_tok,
        pricing=PRICING[model],
        labels={"agent": agent},
    )
    return text
```
## Each piece in 30 seconds

### Router — cost-aware model selection
Pick the cheapest model that can handle the query:
```python
from agentprod import Complexity, Router

router = Router(
    model_for={
        Complexity.SIMPLE: "gpt-4o-mini",
        Complexity.MODERATE: "gpt-4o",
        Complexity.COMPLEX: "claude-sonnet-4-6",
    },
    # Optional: bump domain terms to a higher tier
    complex_keywords=("DCF", "valuation", "portfolio"),
    simple_keywords=("price of", "ticker"),
)

router.select("what is the price of AAPL?")
# → "gpt-4o-mini"
router.select("compare AAPL and MSFT cash flow over 5 years")
# → "claude-sonnet-4-6"
```
Three-tier classifier (simple / moderate / complex) using:
- Simple-keyword regex (wins over everything — short queries shouldn't hit the expensive model just because they happen to contain a long word)
- Complex-keyword count
- Word-count thresholds (CJK width-aware — works on Korean / Japanese / Chinese mixed input)
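The precedence described above can be sketched in a few lines. This is an illustrative reimplementation, not agentprod's actual classifier: it omits the CJK width-aware word counting, and the keyword lists and the 20-word threshold are made-up defaults.

```python
from enum import Enum

class Complexity(Enum):
    SIMPLE = 1
    MODERATE = 2
    COMPLEX = 3

def classify(query: str,
             simple_keywords=("price of", "ticker"),
             complex_keywords=("DCF", "valuation", "portfolio"),
             word_threshold: int = 20) -> Complexity:
    q = query.lower()
    # 1. A simple-keyword hit wins over everything else.
    if any(kw.lower() in q for kw in simple_keywords):
        return Complexity.SIMPLE
    # 2. A complex-keyword hit escalates to the expensive tier.
    if any(kw.lower() in q for kw in complex_keywords):
        return Complexity.COMPLEX
    # 3. Otherwise fall back to query length.
    if len(query.split()) > word_threshold:
        return Complexity.COMPLEX
    return Complexity.MODERATE
```

The ordering is the point: a short query containing a long word still routes cheap, because the simple-keyword check runs first.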
### Throttle — asyncio token bucket
```python
from agentprod import Throttle, ThrottleTimeout

bucket = Throttle(
    capacity=12,        # max burst size
    refill_per_sec=12,  # sustained rps
    jitter_ms=(5, 30),  # avoid thundering herd
    on_acquire=lambda r: log.info("throttle wait: %s", r),
)

try:
    await bucket.acquire(timeout=1.0, label="GET /quote")
    # ... make your call ...
except ThrottleTimeout:
    # bucket couldn't free a slot in time — drop and try next cycle
    return None
```
Why not aiolimiter / asyncio-throttle? Two things:
- Hard timeout with explicit exception. A wait that exceeds your timeout is usually a signal to drop the request, not to keep queueing.
- Metrics callback. The callback can be sync or async, and any exception it raises is swallowed, so you can ship throttle waits to your observability stack without wrapping the bucket.
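The hard-timeout behavior can be sketched as a minimal token bucket. The names mirror the library's surface, but this is a sketch under assumptions, not agentprod's implementation (no jitter, no labels, no metrics callback):

```python
import asyncio
import time

class ThrottleTimeout(Exception):
    pass

class TokenBucket:
    def __init__(self, capacity: float, refill_per_sec: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity
        self.last = time.monotonic()

    def _refill(self) -> None:
        # Top up tokens based on elapsed time, capped at burst capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now

    async def acquire(self, timeout: float) -> None:
        deadline = time.monotonic() + timeout
        while True:
            self._refill()
            if self.tokens >= 1:
                self.tokens -= 1
                return
            if time.monotonic() >= deadline:
                # Out of budget: fail loudly instead of queueing forever.
                raise ThrottleTimeout(f"no token within {timeout}s")
            await asyncio.sleep(min(0.01, deadline - time.monotonic()))
```

The `raise` on deadline is what breaks the cascade: callers get an explicit signal to shed load rather than a queue that grows under pressure.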
### retry — pattern-based transient detection
```python
from agentprod import is_retryable, retry_call, retry_async

# Decision function — drop into any retry library
if is_retryable(exc):
    ...

# Or use the wrapper (uses tenacity if installed, manual backoff otherwise)
result = retry_call(
    lambda: openai_client.chat.completions.create(...),
    max_attempts=3,
)

# Async version
result = await retry_async(
    lambda: anthropic_client.messages.create(...),
    max_attempts=3,
)
```
Default patterns cover OpenAI / Anthropic / Google / bare httpx error strings: rate limit, 429, 500, 502, 503, overloaded, timeout, server error, too many requests, connection reset.
Why string matching: provider SDKs reshuffle their exception classes every release. The message is the most stable contract.
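A detector in this spirit is a single regex over the stringified exception. This is an illustrative sketch built from the pattern list quoted above; agentprod's actual defaults and helper names may differ:

```python
import re

# Patterns from the list above, matched case-insensitively against str(exc).
TRANSIENT_PATTERNS = re.compile(
    r"rate limit|429|500|502|503|overloaded|timeout"
    r"|server error|too many requests|connection reset",
    re.IGNORECASE,
)

def looks_retryable(exc: Exception) -> bool:
    # Match on the message, not the class: the class moves between SDK
    # releases, the provider's error string rarely does.
    return bool(TRANSIENT_PATTERNS.search(str(exc)))
```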
### CostTracker — per-call ledger with labels
```python
from agentprod import CostTracker, ModelPricing

pricing = ModelPricing(
    input_per_1k=0.0025,
    output_per_1k=0.01,
    cached_input_per_1k=0.00125,  # optional, for providers with prompt caching
)

tracker = CostTracker(jsonl_path=".data/cost.jsonl")
tracker.record(
    model="gpt-4o",
    input_tokens=1234, output_tokens=567, cached_input_tokens=800,
    pricing=pricing,
    labels={"agent": "fundamental_analyst", "user": "u_123", "route": "/analyze"},
)

tracker.total_usd()                         # 12.4583
tracker.total_usd(where={"user": "u_123"})  # 0.42
tracker.by_label("agent")                   # {"fundamental_analyst": 0.42, ...}
tracker.by_model()                          # {"gpt-4o": 12.4583}
```
Why bring your own pricing: model prices change weekly. A library that ships its own catalog goes stale fast.
## Why these four
These are the four pieces I rebuilt in three different agent codebases before deciding to extract them once. Every production AI agent eventually needs:
- Cost discipline at the routing layer. Per-call cost discipline alone isn't enough — by the time you see a $300 bill, the spend is sunk. Routing is where the economics start.
- Rate-limit resilience that doesn't cascade. A single 429 turns into 50 once your retries pile up. Token bucket + hard timeout breaks the cascade.
- Retry that survives SDK upgrades. I've had three OpenAI SDK upgrades break my retry code because the exception classes moved. String matching the message has outlived all of them.
- Cost attribution by label, not just total. "We spent $40 last night" is useless. "The fundamental_analyst agent spent $38 on retries against gpt-4o" is fixable.
Everything else in your agent is your business logic and shouldn't live in a library.
## Non-goals
- No LLM client wrapping. Use OpenAI / Anthropic / LangChain / your own. agentprod gives you the scaffolding around the call, not the call itself.
- No model catalog. Prices change too fast.
- No vector DB / RAG / evaluation. Different problem domain.
- No multiprocessing. The Throttle is asyncio-only by design. If you need cross-process throttling, you want a Redis-backed leaky bucket.
## Development

```bash
git clone https://github.com/whdrnr2583-cmd/agentprod
cd agentprod
pip install -e ".[dev]"
pytest
```
## License

MIT.