A durable-execution-first framework for building production AI agents

These details have not been verified by PyPI

Project links

Project description

Kestrion

A durable-execution-first framework for building production AI agents.

Status: pre-alpha (0.1.0). Core engine, decorator API, and three LLM providers are built and tested (35 passing tests). MCP integration, scheduler, CLI, and Postgres support are designed but not yet implemented — see Roadmap below.

Why Kestrion

Most agent frameworks are strong at authoring an agent loop. Kestrion is built around a narrower, specific bet: state is never mutated directly — it's derived by folding an immutable log of events. That single decision is what makes the following true by construction, not by careful discipline on the part of whoever writes a given agent:

Crash recovery is the default. Any process — the original one or a brand new one — can reconstruct a run's exact state from the store and continue it.
Human-approval gates pause the run itself, not just a function call. A tool marked as requiring approval can't be invoked without it, enforced centrally by the engine.
Observability comes from the same log everything else does — token counts, cost, and full trace history, not a separate system bolted on after.

Install

pip install kestrion[anthropic]   # or [openai], [ollama], or [all]

Each LLM provider is an optional extra. If you only use Ollama, you never need the anthropic or openai packages installed.

Quickstart

import asyncio
from kestrion.agent.agent import Agent
from kestrion.agent.decorators import tool
from kestrion.llm.anthropic_provider import AnthropicProvider

@tool
def get_cluster_state() -> dict:
    """Read current deployment replica counts."""
    return {"deployment": "checkout-api", "replicas": 2}

@tool(requires_approval=True)
def apply_manifest(yaml: str) -> dict:
    """kubectl apply a manifest against the cluster."""
    # real kubectl call would go here
    return {"applied": True}

async def main():
    agent = Agent(
        provider=AnthropicProvider(model="claude-sonnet-4-6"),
        tools=[get_cluster_state, apply_manifest],
        store="sqlite:///agent_runs.db",
    )
    result = await agent.run("Check checkout-api and scale it up by one if it's under 3 replicas")
    print(result.status)   # "waiting_on_human" — paused before the mutating call
    print(result.output)

asyncio.run(main())

The run pauses with status=waiting_on_human the moment the model decides to call apply_manifest, since that tool is marked requires_approval=True. Nothing executes against the real cluster until that's explicitly approved.

Resuming a paused run

Resuming works from a completely independent process — this is the actual crash-recovery guarantee, not just a convenience method:

# Anywhere else, any time later, sharing only the same store file:
state.scratch["_approved_tools"] = {"apply_manifest": True}
# (persist that as a checkpoint — see examples/kubectl_agent for the full pattern;
#  Agent.approve() is not yet a polished one-liner, see Known Gaps below)

result = await agent.resume(run_id)
print(result.status)  # "completed"

What you can build with this today

Tool-calling agents where some actions are safe to auto-run and others need a human in the loop first — infrastructure agents, ops bots, anything touching a database or cluster.
Agents that need to survive a crash or restart mid-task. agent.resume(run_id) works from a totally different process than the one that started the run.
Long-running approval workflows — start a run, let it sit paused for hours, approve it from a Slack bot or web UI later, resume it from anywhere with access to the same store.
Multi-turn tool use — the agent loop keeps calling tools and feeding results back to the model until it produces a final answer with no more tool calls.

Known gaps (honest, not aspirational)

No MCP integration yet. Every tool today is a hand-written Python function via @tool. Connecting to external MCP servers is the next phase of work.
No real concurrency control. Running many agents at once against a shared rate limit isn't implemented.
No CLI or deploy story. kestrion deploy --target k8s doesn't exist yet — you'd containerize and deploy this yourself today.
Agent.approve() is a stub. Approving a paused run currently means manually setting state.scratch["_approved_tools"] and saving a checkpoint by hand (see examples/kubectl_agent), not a polished one-line call.
SQLite only. A CheckpointStore Protocol exists so Postgres can be added without touching the engine, but that implementation doesn't exist yet.

Examples

examples/kubectl_agent — the original worked example, demonstrating pause-on-approval and resume-after-restart using the raw Engine/Node primitives directly (useful for understanding what Agent builds on top of).

Development

git clone https://github.com/<your-username>/kestrion.git
cd kestrion
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
pytest tests/ -v

Roadmap

See the full phased build plan in the repo for the path to 1.0.0. Short version: MCP client/server integration, a scheduler for safe concurrent execution, a CLI with Kubernetes deploy support, Postgres-backed storage, and a docs site are next.

License

Apache 2.0

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.1.0

Jun 25, 2026

0.0.1

Jun 25, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

kestrion-0.1.0.tar.gz (30.8 kB view details)

Uploaded Jun 25, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

kestrion-0.1.0-py3-none-any.whl (27.8 kB view details)

Uploaded Jun 25, 2026 Python 3

File details

Details for the file kestrion-0.1.0.tar.gz.

File metadata

Download URL: kestrion-0.1.0.tar.gz
Upload date: Jun 25, 2026
Size: 30.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for kestrion-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`7c712956603513bd698ba2041d1bff9a98e5484bea40c5c16951742eaa199fbe`
MD5	`9ed674e46926e7d63eb795f713c5db39`
BLAKE2b-256	`cbe5c35994f016d7c9873bfb36084866180d8872b5fa84c5b03fa78d18e66fb2`

See more details on using hashes here.

File details

Details for the file kestrion-0.1.0-py3-none-any.whl.

File metadata

Download URL: kestrion-0.1.0-py3-none-any.whl
Upload date: Jun 25, 2026
Size: 27.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for kestrion-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0a23fe3584de8fe0a178ca248aad0eff4a92deb5d4c5ffb1d43326c637570a0e`
MD5	`7ef982539492a3d731e1620370f546ba`
BLAKE2b-256	`b1ca06c71d1dbb788b09a20afd24762b904ea7921acaff9c5902c6ff00750d72`

See more details on using hashes here.

kestrion 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Kestrion

Why Kestrion

Install

Quickstart

Resuming a paused run

What you can build with this today

Known gaps (honest, not aspirational)

Examples

Development

Roadmap

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes