Skip to main content

A tiny, obvious agent runtime for Python. No graphs, no chains, no ceremony.

Project description

drangue

A tiny, obvious agent runtime for Python.

An agent is just a model plus tools. Running it is one call. No graphs, no chains, no base classes to inherit. You can read the whole loop in one sitting.

Install

The core has zero dependencies. Install the adapter for the backend you want:

pip install "drangue[openai]"     # OpenAI, DeepSeek, Groq, Ollama, and more
pip install "drangue[anthropic]"  # Claude

The whole thing

from drangue import Agent, tool

@tool
def error_rate(service: str) -> str:
    """Return the recent 5xx error rate and p99 latency for a service."""
    # In production this queries Prometheus/Datadog; here it returns a sample.
    return f"{service}: 4.2% 5xx over 15m (baseline 0.3%), p99 latency 1.8s"

agent = Agent(
    model="claude-opus-4-8",
    tools=[error_rate],
    instructions="You are an on-call assistant. Be concise and specific.",
)

result = agent.run_sync("Is the checkout service healthy right now?")
print(result.output)

That is the entire surface for the happy path. A tool is a typed function. The decorator reads its signature and docstring and builds the schema for you. No manual JSON, no Pydantic required (though you can pass plain functions too).

The core is async. run_sync is the convenience wrapper for scripts; inside async code you await instead:

result = await agent.run("Is the checkout service healthy right now?")

See every step

Inspection is one flag, not a separate service:

result = agent.run_sync("Is the checkout service healthy right now?", trace=True)
* tool   error_rate(service='checkout')
       -> checkout: 4.2% 5xx over 15m (baseline 0.3%), p99 latency 1.8s
* model  checkout is unhealthy: 4.2% 5xx is 14x the 0.3% baseline, p99 at 1.8s. Worth paging.

result.usage reports the token totals for the run, and result.events is the full event log the run was driven from.

Drive the loop yourself

stream yields each event as it is appended to the log, so you stay in control:

async for event in agent.stream("Is the orders service healthy?"):
    if event.type == "model_decision":
        for call in event.payload["tool_calls"]:
            print("calling", call["name"], call["arguments"])
    elif event.type == "run_finished":
        print(event.payload["output"])

Cost control

Cap a run's spend and it stops gracefully before an unaffordable step, using the token usage recorded in the log:

from drangue import Agent, Budget

agent = Agent("claude-opus-4-8", tools=tools, budget=Budget(max_tokens=200_000))
# or a dollar budget with a price table:
agent = Agent("claude-opus-4-8", tools=tools, budget=Budget(
    max_usd=0.50,
    prices={"claude-opus-4-8": {"input": 15.0, "output": 75.0}},  # $ per 1M tokens
))

Route each step to the cheapest model that can handle it. The model that actually ran is recorded per step, so routing is visible in the trace and counted in the budget:

from drangue import Agent, RuleRouter

router = RuleRouter(
    default=cheap_model,
    rules=[(lambda messages, i: i == 0, smart_model)],   # only the first step is judgment
)
agent = Agent(model=router, tools=tools)

For repeated runs, AnthropicModel("claude-opus-4-8", cache=True) marks the stable prefix (system prompt and tool definitions) for prompt caching. Context is already ordered stable-to-volatile, so the cacheable part stays at the front.

Resilient tools

Tools are bounded by default and never crash a run: an exception comes back to the model as a clean, structured failure it can reason about. Opt into more with options on @tool:

from drangue import tool, RateLimitError

@tool(timeout=5.0, retries=3, backoff=0.5)
def fetch_metrics(service: str) -> str:
    """Fetch metrics, retried on transient failures."""
    resp = http_get(service)
    if resp.status == 429:
        raise RateLimitError(retry_after=resp.headers["Retry-After"])  # retried, honoring the hint
    return resp.text

The wrapper applies, in order: timeout, classify the failure, retry transient ones with exponential backoff (reusing the idempotency key), validate the result, then return a clean failure or a marked-degraded fallback. The model receives, for example:

{"ok": false, "tool": "fetch_metrics", "error": {"category": "timeout", "message": "timeout"}}

Guardrails

Constrain what the agent can do, enforced in code regardless of what the model decides (a prompt instruction is the thing injection overrides). A blocked call comes back to the model as a clean failure it can reason about.

from drangue import Agent, Guardrails

guard = Guardrails(
    allow={"read_metrics", "search"},        # the agent is a constrained principal
    require_approval_for_irreversible=True,   # gate the dangerous ones
    approver=lambda name, args: ask_human(name, args),
    input_guard=lambda text: "blocked" if looks_malicious(text) else None,
    output_guard=lambda name, args: detect_exfiltration(name, args),
)
agent = Agent("claude-opus-4-8", tools=tools, guardrails=guard)

Mark a tool's stakes so gates can act on them:

@tool(reversible=False, requires_approval=True)
def delete_database(name: str) -> str:
    """Delete a database (irreversible, always gated)."""
    ...

The layers are independent: an allow-list bounds reach, action gates stop the irreversible, and the input and output guards catch malicious content on the way in and suspicious actions on the way out. No single layer is sufficient; together they make a successful injection survivable.

Evals and deploy gates

Score the agent statistically across correctness, safety, and efficiency, then gate deploys on real regressions. A run is repeated several times (agents are non-deterministic) and produces a profile, not a pass/fail.

from drangue import Agent, Scenario, Gate, evaluate, output_contains, forbids_tool

scenarios = [
    Scenario("answers", "what is 2+3?", checks=[output_contains("5")], runs=5),
    Scenario("safety", "clean up the database",
             checks=[forbids_tool("delete_all")], runs=5),
]

baseline = (await evaluate(deployed_agent, scenarios)).profile()
candidate = (await evaluate(new_agent, scenarios)).profile()

decision = Gate().evaluate(baseline, candidate)
if not decision.passed:
    raise SystemExit(f"deploy blocked: {decision.blocks}")

Safety is exact set membership (a rule, not a judge); open-ended correctness can use an LLM Judge. The gate compares against the baseline, blocks on safety and on correctness past a noise band, warns on efficiency, and records explicit overrides. Turn a traced production failure into a regression scenario with scenario_from_result(result, name, checks=...), so the eval set grows from what actually went wrong.

Human in the loop

Autonomy is granted per action, not per agent. Each tool runs in one of three modes: shadow (propose, do not execute), assisted (pause for a human), or autonomous (execute, review later).

from drangue import Agent, Autonomy

agent = Agent("claude-opus-4-8", tools=tools, store=SQLiteStore("runs.db"),
              autonomy=Autonomy(default="autonomous", modes={"wire_funds": "assisted"}))

result = await agent.run("pay the invoice", run_id="pay-1")
if result.status == "paused":
    for p in result.pending_approvals:
        print(p["tool"], p["arguments"], "because:", p["reasoning"])  # the case, not a bare action
    await agent.approve("pay-1")        # or agent.reject("pay-1", reason="...")
    result = await agent.resume("pay-1")

An assisted action is a durable pause: the approval request and the human's decision are events in the log, so a paused run survives a process restart and resumes by replay. The side effect happens once, only after approval.

Durable runs

Point an Agent at a durable store and give a run a stable run_id. If the process dies mid-run, a new one resumes from exactly where it stopped: recorded steps are replayed as facts, so the model is not re-called and side effects do not happen twice.

from drangue import Agent, SQLiteStore

agent = Agent("claude-opus-4-8", tools=[book_flight], store=SQLiteStore("runs.db"))
result = await agent.run("Book my trip", run_id="trip-42")   # crash, rerun, same id -> resumes

A tool that causes a side effect can declare an idempotency_key parameter. The runtime injects a stable key derived from the run and step (it never appears in the model-facing schema), so the tool can deduplicate downstream:

@tool
def book_flight(city: str, idempotency_key: str = "") -> str:
    """Book a flight."""
    return charge_once(city, key=idempotency_key)

Cheap and local models

drangue ships two adapters. One of them, OpenAIModel, talks to any OpenAI-compatible endpoint, which is most of the cheap and free backends. You choose the backend with base_url; the agent loop does not change.

from drangue import Agent, OpenAIModel

# Free and local. Install Ollama, run `ollama pull llama3.1`. No API key, no per-token cost.
agent = Agent(
    model=OpenAIModel("llama3.1", base_url="http://localhost:11434/v1", api_key="ollama"),
    tools=[get_weather],
)

Swap the model line for a cheap hosted backend without touching anything else:

Backend How
Ollama / LM Studio OpenAIModel("llama3.1", base_url="http://localhost:11434/v1", api_key="ollama") (free, local)
DeepSeek OpenAIModel("deepseek-chat", base_url="https://api.deepseek.com")
Groq OpenAIModel("llama-3.1-8b-instant", base_url="https://api.groq.com/openai/v1")
OpenRouter OpenAIModel("...", base_url="https://openrouter.ai/api/v1")
OpenAI OpenAIModel("gpt-4o-mini")
Claude "claude-opus-4-8" or AnthropicModel("claude-opus-4-8")

api_key and base_url fall back to the OPENAI_API_KEY and OPENAI_BASE_URL environment variables when omitted. See examples/cheap.py.

Bring your own model

model can be a string (the default Anthropic adapter), one of the adapters above, or any object with an async generate method. That seam is how you swap providers, add caching, or pass a fake model in tests. The OpenAI and Anthropic adapters are both tested fully offline against fake clients, see tests/test_openai_model.py.

What drangue does not do

Keeping the surface obvious is the point.

  • No graph or DAG concept. You write a normal agent; the runtime drives the loop.
  • No prompt-template engine. f-strings are fine.
  • No built-in RAG or vector store. That is a different library.
  • No provider-specific code in the core. Adapters are swappable.
  • No mandatory config files.

Architecture

The simple facade sits on a small, durable-by-design core: an orchestrator decides each step deterministically, an executor performs it, and every step is appended to a store as an event log. The log is the source of truth; the run is a fold of it. That shape is what lets observability, durability, and recovery layer on without changing the facade. See ROADMAP.md.

Roadmap

The current focus is the production core (ROADMAP.md):

  • Done: orchestrator/executor split, event log, async core.
  • Done: observability (per-step timing and cost, a trace tree, console and OpenTelemetry tracers, reasoning capture).
  • Done: durable resume after a crash (SQLite store, replay, idempotency keys, the three state scopes).
  • Done: hardened tool calls (timeouts, retries with backoff, schema validation, clean structured failures, fallbacks).
  • Done: cost and latency (per-run token and dollar budgets, model routing, prompt caching).
  • Done: security and guardrails (permission scoping, action gates, input and output guards, reversibility metadata).
  • Done: human-in-the-loop rollout (per-action shadow/assisted/autonomous modes, durable pause-approve-resume).
  • Done: eval harness and deploy gates (statistical scoring across correctness, safety, and efficiency; baseline-relative gating; LLM judge; scenarios grown from production failures).

All of Chapters 4 to 12 are implemented (Phases 0 to 7).

Develop

pip install -e ".[dev]"
python run_tests.py     # no pytest needed; uses a tiny async runner

Contributions are welcome under MIT with a DCO sign-off (git commit -s) — see CONTRIBUTING.md.

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

drangue-0.1.0.tar.gz (55.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

drangue-0.1.0-py3-none-any.whl (47.6 kB view details)

Uploaded Python 3

File details

Details for the file drangue-0.1.0.tar.gz.

File metadata

  • Download URL: drangue-0.1.0.tar.gz
  • Upload date:
  • Size: 55.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for drangue-0.1.0.tar.gz
Algorithm Hash digest
SHA256 3e3178fcf8658c4464c01b29ecb74cae0e6dd6ed78a9f0915f04571677f1ad10
MD5 84b5f03cf75027b83ce121196cd10da8
BLAKE2b-256 6d84179b068a8291a51f239e3284293b366bd6f64bcd2678a71e56e8b211048d

See more details on using hashes here.

Provenance

The following attestation bundles were made for drangue-0.1.0.tar.gz:

Publisher: release.yml on om-er/drangue

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file drangue-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: drangue-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 47.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for drangue-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 25ec69ae7fc775b2d24b1c16d81401e2c8b57667d83971553d6bada8e55a1582
MD5 c9d7e34cd920ae07a1843ace767b8210
BLAKE2b-256 8dfaad127c056943fb331c7ab80218cb9f7fa10087c4d112fe1ed9c9f81f606c

See more details on using hashes here.

Provenance

The following attestation bundles were made for drangue-0.1.0-py3-none-any.whl:

Publisher: release.yml on om-er/drangue

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page