Deterministic state-machine guards for AI-agent workflows: enforce which tools an agent may call, in which order, with loop detection, cost budgets and step caps.

statepilot

Deterministic state-machine guards for AI-agent workflows.

Define a state machine, then enforce — at runtime — which tools your agent may call, in which order, with loop detection, a cost budget, and a hard step cap. The agent only gets to do what the state machine allows. Anything else raises.

Zero runtime dependencies in the core. Fully typed. Python 3.10+.

pip install statepilot

The problem

A recurring theme in 2026 agent tooling is "wrap the non-deterministic LLM in deterministic code." A few data points that frame the gap:

  • Statewright (Rust + MCP) put deterministic state machines for agents on the map — it reached the Hacker News front page and lives at https://github.com/statewright/statewright. The framing clearly resonates.
  • llm-canary ships policy gates for agent traces — tool order, cost budgets, runaway-loop checks — but as a post-hoc test layer over recorded traces, not as runtime enforcement.
  • Orchestrators like LangGraph, CrewAI and the OpenAI Agents SDK are excellent at routing, but they don't hand you a small, hard rule that says "tool X is illegal in state Y, full stop."

The missing piece is a Python-native runtime guard: a thin layer you put in front of every tool call that enforces the allowed transitions and trips on loops and budget overruns. That is what statepilot is.

It does not orchestrate, plan, or call your LLM. It is the bouncer at the door.

Quickstart (Python builder)

from statepilot import StateMachine, Pilot

machine = (
    StateMachine.builder()
    .initial("research")
    .transition("research", "research", tool="search")      # looping allowed...
    .transition("research", "draft", tool="write_draft")
    .transition("draft", "review", tool="review")
    .transition("review", "draft", tool="revise")           # send it back
    .transition("review", "published", tool="publish")
    .terminal("published")
    .build()
)

pilot = Pilot(machine, budget=5.0, max_state_visits=4, max_steps=20)

pilot.step("search", cost=0.5)        # ok, still in "research"
pilot.step("write_draft", cost=1.0)   # -> "draft"
pilot.step("review")                  # -> "review"
pilot.step("publish")                 # -> "published" (terminal)

pilot.step("review")                  # raises TransitionError: terminal state

Every accepted step is recorded:

for record in pilot.history:
    print(record.index, record.source, "--", record.tool, "->", record.dest)

The @guarded decorator

Bind your actual tool functions to the pilot. The guard runs before the function body, so a violation means the body never executes.

from statepilot import StateMachine, Pilot, guarded, GuardViolation

machine = (
    StateMachine.builder()
    .initial("research")
    .transition("research", "research", tool="search")
    .transition("research", "draft", tool="write_draft")
    .terminal("draft")
    .build()
)
pilot = Pilot(machine, budget=5.0)

# real_search and real_draft below stand in for your own tool implementations.
@guarded(pilot, cost=1.0)                 # tool name defaults to the function name
def search(query: str) -> list[str]:
    return real_search(query)

@guarded(pilot, tool="write_draft")       # or name it explicitly
def make_draft(notes: list[str]) -> str:
    return real_draft(notes)

search("agent guardrails")                # advances the machine, charges 1.0
make_draft(["..."])                       # -> "draft"

try:
    make_draft(["..."])                   # already terminal
except GuardViolation as exc:
    print("blocked:", exc)

YAML definition

Prefer config over code? Define the machine in YAML and load it. (YAML support is an optional extra: pip install "statepilot[yaml]". The quotes stop shells such as zsh from treating the brackets as a glob pattern.)

# pipeline.yaml
initial: research
terminal:
  - published
transitions:
  - {from: research, to: research, tool: search}
  - {from: research, to: draft,    tool: write_draft}
  - {from: draft,    to: review,   tool: review}
  - {from: review,   to: published, tool: publish}

from statepilot import StateMachine, Pilot

machine = StateMachine.from_yaml_file("pipeline.yaml")  # from a file
# or: StateMachine.from_yaml(yaml_string)               # from an inline string
pilot = Pilot(machine, budget=5.0)

The states key may be omitted; the state list is then inferred from initial, terminal, and every state named in transitions. StateMachine.from_dict(...) accepts the same shape if you already have a dict.
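To make the inference rule concrete, here is a small stand-alone sketch of how a state list could be derived from that mapping shape. The helper name infer_states is hypothetical and this is not statepilot's own code; it only assumes the keys shown in pipeline.yaml (initial, terminal, and transitions with from/to).

```python
from typing import Any


def infer_states(data: dict[str, Any]) -> list[str]:
    """Collect every state named in a machine definition, in first-seen order.

    Hypothetical helper for illustration only; assumes the mapping shape
    used in pipeline.yaml (initial, terminal, transitions with from/to).
    """
    seen: dict[str, None] = {}  # a dict keeps insertion order and dedupes
    seen[data["initial"]] = None
    for state in data.get("terminal", []):
        seen[state] = None
    for edge in data.get("transitions", []):
        seen[edge["from"]] = None
        seen[edge["to"]] = None
    return list(seen)


definition = {
    "initial": "research",
    "terminal": ["published"],
    "transitions": [
        {"from": "research", "to": "research", "tool": "search"},
        {"from": "research", "to": "draft", "tool": "write_draft"},
        {"from": "draft", "to": "review", "tool": "review"},
        {"from": "review", "to": "published", "tool": "publish"},
    ],
}

print(infer_states(definition))
```

No separate states list is needed here because every state already appears as initial, as terminal, or at one end of a transition.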

A realistic agent example

"Research, then draft, then review, then publish. Never publish before review. Allow at most 3 research loops. Stop if cost exceeds $5."

from statepilot import StateMachine, Pilot, guarded, GuardViolation

machine = (
    StateMachine.builder()
    .initial("research")
    .transition("research", "research", tool="search")
    .transition("research", "draft", tool="write_draft")
    .transition("draft", "review", tool="review")
    .transition("review", "draft", tool="revise")
    .transition("review", "published", tool="publish")
    .terminal("published")
    .build()
)

# initial visit counts as 1, so max_state_visits=4 allows 3 extra research loops
pilot = Pilot(machine, budget=5.0, max_state_visits=4, max_steps=25)

@guarded(pilot, cost=0.8)
def search(q: str) -> str: ...

@guarded(pilot, cost=1.2)
def write_draft(notes: str) -> str: ...

@guarded(pilot)
def review(draft: str) -> bool: ...

@guarded(pilot, cost=0.3)
def publish(draft: str) -> str: ...

The agent loop calls these as it sees fit. statepilot makes the illegal paths impossible:

  • calling publish() while still in research -> TransitionError
  • a 4th search() loop -> LoopLimitExceeded
  • cumulative cost over $5 -> BudgetExceeded
  • more than 25 steps -> StepLimitExceeded

A runnable version is in examples/research_pipeline.py.
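The visit cap is easy to get off by one, so here is a tiny stand-alone simulation of the counting rule described above (the initial visit counts as 1). It does not import statepilot; the variable names are illustrative, and the check is a guess at the shape of the rule rather than the library's code.

```python
MAX_STATE_VISITS = 4

visits = {"research": 1}   # entering the initial state counts as visit 1
accepted_searches = 0

for attempt in range(1, 6):            # try up to five search() calls in a row
    if visits["research"] + 1 > MAX_STATE_VISITS:
        print(f"search #{attempt}: would exceed the visit cap, blocked")
        break
    visits["research"] += 1            # research -> research self-loop
    accepted_searches += 1
    print(f"search #{attempt}: ok, research visits = {visits['research']}")

# Cost side of the same run: three searches, one draft, one review, one publish.
total_cost = 3 * 0.8 + 1.2 + 0.0 + 0.3   # review() carries no cost
print(f"accepted searches: {accepted_searches}, total cost: {total_cost:.1f}")
```

With these per-tool costs, the longest legal straight path (three searches, draft, review, publish) stays under the $5 budget, so on that path the visit cap is the limit that bites first.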

Why deterministic guards

LLMs are probabilistic. Most of the time the model follows the plan; occasionally it calls publish before review, gets stuck re-searching the same thing, or burns the budget. "Most of the time" is not a guarantee, and prompt-only constraints are suggestions, not enforcement.

A state machine turns those soft expectations into a hard contract that lives in code, runs on every tool call, and is trivial to unit-test. You get:

  • Safety: illegal tool sequences cannot happen; they raise instead.
  • Cost control: a real budget cap, enforced before the expensive call runs.
  • Loop protection: runaway repetition trips a clear, typed exception.
  • Auditability: pilot.history and pilot.to_trace() give you a complete, JSON-serialisable record of what the agent actually did.

It is intentionally small. The whole core is a StateMachine plus a Pilot, and the runtime cost is a dict lookup and a few integer comparisons per step.
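To illustrate what "a dict lookup and a few integer comparisons" can look like, here is a minimal stand-alone sketch of a guard of this kind. TinyGuard is a hypothetical stand-in written for this example, not statepilot's Pilot, and it raises plain RuntimeError rather than the library's typed exceptions.

```python
class TinyGuard:
    """Illustrative stand-in for a runtime guard; not statepilot's Pilot."""

    def __init__(self, transitions, initial, budget=None, max_steps=None):
        # (state, tool) -> destination state: one dict lookup per step
        self._table = dict(transitions)
        self.state = initial
        self.budget = budget
        self.max_steps = max_steps
        self.cost_spent = 0.0
        self.steps_taken = 0

    def step(self, tool, cost=0.0):
        dest = self._table.get((self.state, tool))
        if dest is None:
            raise RuntimeError(f"{tool!r} is not allowed in state {self.state!r}")
        if self.max_steps is not None and self.steps_taken + 1 > self.max_steps:
            raise RuntimeError("step limit exceeded")
        if self.budget is not None and self.cost_spent + cost > self.budget:
            raise RuntimeError("budget exceeded")
        # All checks passed: mutate only now, so a failure leaves state untouched.
        self.state = dest
        self.cost_spent += cost
        self.steps_taken += 1
        return dest


guard = TinyGuard(
    {("research", "search"): "research", ("research", "write_draft"): "draft"},
    initial="research",
    budget=1.0,
)
guard.step("search", cost=0.5)
try:
    guard.step("publish")              # no such edge out of "research"
except RuntimeError as exc:
    print("blocked:", exc)
print(guard.state, guard.cost_spent)
```

Running every check before any mutation is what makes "state is unchanged on failure" cheap to guarantee in a design like this.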

API reference

StateMachine

Immutable, validated machine definition. Carries no runtime state.

  • StateMachine.builder(initial=None) -> StateMachineBuilder — fluent builder.
  • StateMachine.from_dict(data) -> StateMachine — build from a mapping.
  • StateMachine.from_yaml(text) -> StateMachine — build from an inline YAML string (needs the yaml extra).
  • StateMachine.from_yaml_file(path) -> StateMachine — build from a YAML file (needs the yaml extra).
  • .to_dict() — round-trips with from_dict.
  • .allowed_tools(state) -> tuple[str, ...]
  • .resolve(state, tool) -> str | None — destination, or None if disallowed.
  • .is_terminal(state) -> bool

StateMachineBuilder

  • .initial(state), .state(*states), .transition(src, dest, *, tool), .terminal(*states), .build(). Every mutator returns self.

Pilot

Stateful runtime enforcer. Construct with the machine and optional limits:

Pilot(
    machine,
    budget=None,              # cumulative cost cap
    max_steps=None,           # total steps cap
    max_state_visits=None,    # per-state visit cap (initial state counts as 1)
    max_consecutive_tool=None # same tool back-to-back cap
)

  • .step(tool, *, cost=0.0) -> str: validates and applies the step, returning the new state. Raises on violation; state is unchanged on failure.
  • .can(tool, *, cost=0.0) -> bool: pure check; never mutates, never raises.
  • .allowed_tools() -> tuple[str, ...], .state, .done, .steps_taken, .cost_spent, .history.
  • .to_trace() -> dict: JSON-serialisable run trace.
  • .reset(): back to the initial state; clears cost, counters, and history.

@guarded(pilot, *, tool=None, cost=0.0)

Decorator that calls pilot.step(...) before the function body. tool defaults to the function name.
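For readers curious how a check-before-body decorator can be put together, here is a minimal stand-alone sketch. The names guarded_sketch and check are hypothetical and the code is not statepilot's implementation; it only demonstrates the two behaviours described above: the tool name defaults to the function name, and the body never runs when the check raises.

```python
import functools
from typing import Callable, Optional


def guarded_sketch(check: Callable[[str], None], *, tool: Optional[str] = None):
    """Hypothetical decorator: call check(tool_name) before the wrapped body."""

    def decorate(func):
        name = tool or func.__name__   # default to the function's own name

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            check(name)                # raises to veto; the body is never reached
            return func(*args, **kwargs)

        return wrapper

    return decorate


allowed = {"search"}
calls = []


def check(tool_name: str) -> None:
    if tool_name not in allowed:
        raise PermissionError(f"{tool_name} is not allowed right now")


@guarded_sketch(check)
def search(query: str) -> str:
    calls.append("search")
    return f"results for {query}"


@guarded_sketch(check, tool="publish")
def ship(text: str) -> str:
    calls.append("ship")               # must not run when the guard vetoes
    return text


print(search("agent guardrails"))
try:
    ship("draft")
except PermissionError as exc:
    print("blocked:", exc)
print(calls)
```

The calls list records only "search", which shows that the vetoed body was never entered.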

Exceptions

StatepilotError
├── StateMachineError          # invalid machine definition (definition-time)
└── GuardViolation             # runtime rule broken — catch this for "agent misbehaved"
    ├── TransitionError        # tool not allowed in the current state
    ├── LoopLimitExceeded      # state revisited / tool repeated too often
    ├── BudgetExceeded         # cumulative cost over budget
    └── StepLimitExceeded      # too many total steps
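
Read as Python classes, the tree above implies the subclass relationships sketched below. The class definitions here are local stand-ins that mirror the tree so the snippet runs without statepilot installed; in real code you would import the names from the package instead (only the GuardViolation import is shown in the examples above, so treat the other import paths as an assumption). The run_step helper is hypothetical.

```python
# Local stand-ins mirroring the documented hierarchy.
class StatepilotError(Exception): ...
class StateMachineError(StatepilotError): ...
class GuardViolation(StatepilotError): ...
class TransitionError(GuardViolation): ...
class LoopLimitExceeded(GuardViolation): ...
class BudgetExceeded(GuardViolation): ...
class StepLimitExceeded(GuardViolation): ...


def run_step(exc_type: type) -> str:
    """Raise the given error and report which handler caught it."""
    try:
        raise exc_type("simulated")
    except GuardViolation as exc:
        # Any runtime rule break lands here, whichever subclass it is.
        return f"agent misbehaved: {type(exc).__name__}"
    except StatepilotError as exc:
        # Definition-time problems are not GuardViolations.
        return f"configuration problem: {type(exc).__name__}"


print(run_step(BudgetExceeded))
print(run_step(StateMachineError))
```

Catching GuardViolation first keeps "the agent broke a rule" separate from "the machine definition is wrong", which usually call for different handling.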

LangGraph adapter (experimental)

If you orchestrate with LangGraph, statepilot.adapters.guard_node wraps a node so the pilot guards it:

from statepilot import StateMachine, Pilot
from statepilot.adapters import guard_node
# from langgraph.graph import StateGraph

# machine is a StateMachine built as in the Quickstart;
# research_node and draft_node are your own LangGraph node callables.
pilot = Pilot(machine, budget=5.0)
# graph = StateGraph(MyState)
# graph.add_node("research", guard_node(pilot, research_node, cost=1.0))
# graph.add_node("draft",    guard_node(pilot, draft_node))

It targets LangGraph's stable node contract (a callable state -> partial state dict) and never imports langgraph itself, so it adds no import-time dependency and keeps working as long as that node contract holds, even if other parts of the LangGraph API change. It is deliberately minimal and marked experimental: conditional edges, Send fan-out, and checkpoint/resume are out of scope. For full control, drive the Pilot inside your own node functions; that path is fully supported.

The adapter needs no extra dependency — it works with any callable. Install LangGraph in your own project if you use it.

Concurrency

A Pilot holds the mutable state of one agent run and is not thread-safe — use one pilot per run, don't share it across threads, and call pilot.reset() to reuse it. pilot.history is an immutable snapshot (a tuple), so reading or logging it can never desync the run's guards.
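
A minimal sketch of the one-guard-per-run pattern, assuming a thread pool drives several agent runs at once. RunCounter is a hypothetical stand-in for any per-run guard object with mutable state (it is not statepilot's Pilot); the point is only that each run constructs its own instance instead of sharing one.

```python
from concurrent.futures import ThreadPoolExecutor


class RunCounter:
    """Stand-in for a per-run guard object with mutable state."""

    def __init__(self, max_steps: int) -> None:
        self.max_steps = max_steps
        self.steps_taken = 0

    def step(self) -> None:
        if self.steps_taken + 1 > self.max_steps:
            raise RuntimeError("step limit exceeded")
        self.steps_taken += 1


def run_agent(n_steps: int) -> int:
    guard = RunCounter(max_steps=5)    # created inside the run, never shared
    for _ in range(n_steps):
        guard.step()
    return guard.steps_taken


with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_agent, [1, 3, 5, 2]))

print(results)
```

Because no guard object crosses a thread boundary, each run's counters reflect only that run and no locking is needed.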

Status

Beta (0.1.0). The core API (StateMachine, Pilot, @guarded) is what we intend to keep stable. No benchmarks are claimed — the design goal is correctness and a tiny footprint, not throughput. Issues and PRs welcome.

License

MIT © 2026 StudioMeyer. See LICENSE.
