A lightweight Python SDK for building fault-tolerant LLM agent workflows. It enables agent systems to recover from crashes by replaying execution from checkpoints, without re-invoking operations that already completed, such as LLM calls or tool invocations.
Waypoint
A Python SDK for making LLM agent workflows fault-tolerant via event sourcing.
When an agent crashes mid-execution, Waypoint lets you resume from the last successful step—without re-running LLM calls or tool invocations that already completed. It does this by logging every step's input/output to an append-only PostgreSQL journal, then replaying from checkpoints on recovery.
Getting Started
Prerequisites
- Docker + Docker Compose
- uv (Python 3.13+)
Clone & Start
git clone git@github.com:aybruhm/waypoint.git
cd waypoint
make up
This starts PostgreSQL and the API gateway, which listens on http://localhost:9654. The gateway auto-reloads on code changes.
Run Migrations
make run_migrations
Run Examples
# 3-step agent, no LLM
uv run python -m sdk.examples.simple_agent
# Mocked LLM + crash recovery demo
uv run python -m sdk.examples.agent_with_llm_mock
Stop
make down
Makefile Reference
| Command | Description |
|---|---|
| make up / make start | Build & start containers (detached) |
| make down / make stop | Stop & remove containers |
| make run_migrations | Apply pending Alembic migrations |
| make revert_migrations | Roll back last migration |
| make add_migration MSG="msg" | Auto-generate new migration |
| make show_current_db_head | Show current migration version |
| make show_db_heads | List all migration heads |
What It Solves
LLM agent crashes create three problems:
- Wasted spend: LLM calls that succeeded before the crash get re-invoked on retry.
- Lost context: No record of what happened, what state the agent was in, or which step failed.
- Duplicate effects: Retrying a tool call (e.g., an API write) can create duplicates or break idempotency.
Waypoint avoids all three by persisting every step's result. On crash, you resume from the checkpoint—cached LLM responses return instantly, tool outputs are reused, and execution continues from the next step.
Architecture
Agent Code
↓
@checkpoint decorators (Waypoint SDK)
↓
┌────────────────┬─────────────────┬──────────────────┐
│ Event Journal │ Checkpoint Mgr │ Replay Engine │
│ (append-only) │ (progress) │ (deterministic) │
└────────────────┴─────────────────┴──────────────────┘
↓
PostgreSQL
Core Concepts
| Concept | Description |
|---|---|
| Execution | A single run of an agent workflow, identified by a UUID. |
| Step | A decorated async function (@checkpoint("name")). Each step runs once per execution. |
| Checkpoint | A persisted record of a step's input/output + execution position. |
| Event Journal | Append-only log of all steps across all executions (PostgreSQL). |
| Replay | Reconstructing state by reading checkpoints in order, skipping re-execution. |
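The concepts in the table can be pictured with a small in-memory model. This is only a sketch: the `Checkpoint` and `EventJournal` classes, their field names, and the `replay` function are hypothetical illustrations, not Waypoint's actual schema or API, and a Python list stands in for the PostgreSQL journal.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Checkpoint:
    """One persisted step result (hypothetical shape, not Waypoint's schema)."""
    execution_id: str
    step_name: str
    position: int  # order of the step within its execution
    input: Any
    output: Any


@dataclass
class EventJournal:
    """In-memory stand-in for the append-only PostgreSQL journal."""
    _events: list[Checkpoint] = field(default_factory=list)

    def append(self, checkpoint: Checkpoint) -> None:
        # Append-only: existing entries are never updated or deleted.
        self._events.append(checkpoint)

    def for_execution(self, execution_id: str) -> list[Checkpoint]:
        rows = [c for c in self._events if c.execution_id == execution_id]
        return sorted(rows, key=lambda c: c.position)


def replay(journal: EventJournal, execution_id: str) -> dict[str, Any]:
    """Rebuild step outputs by reading checkpoints in order; no step is re-run."""
    return {c.step_name: c.output for c in journal.for_execution(execution_id)}


journal = EventJournal()
execution_id = str(uuid.uuid4())
other_id = str(uuid.uuid4())

# Two steps of one execution, plus an unrelated execution in the same journal.
journal.append(Checkpoint(execution_id, "fetch", 0, {"url": "https://example.com"}, "raw text"))
journal.append(Checkpoint(other_id, "fetch", 0, {}, "unrelated"))
journal.append(Checkpoint(execution_id, "summarize", 1, "raw text", "short summary"))

state = replay(journal, execution_id)
print(state)
```

Because the journal holds all executions, replay filters by execution ID before ordering by position, so entries from other runs never leak into the rebuilt state.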
How It Works
@checkpoint("step_name")
async def my_step(input):
    output = input  # placeholder: the step's real work goes here
    return output
The decorator:
- Checks if a checkpoint exists for this step in the current execution.
- If yes: returns cached output immediately (no function execution).
- If no: runs the function, persists input/output as a checkpoint, returns output.
On crash, create a new Waypoint instance and call resume(execution_id). The SDK rebuilds state from the journal and continues from the next uncompleted step.
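A crash-and-resume cycle might look roughly like the following. Everything here is an assumption made for illustration: `MiniWaypoint` is a hypothetical stand-in rather than the real Waypoint class, its `checkpoint` method and table layout are invented, and an in-memory SQLite database from the standard library replaces PostgreSQL so the sketch runs without external services.

```python
import asyncio
import json
import sqlite3
import uuid


class MiniWaypoint:
    """Hypothetical stand-in for illustration; not the real Waypoint class."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.execution_id = str(uuid.uuid4())
        conn.execute(
            "CREATE TABLE IF NOT EXISTS journal ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "execution_id TEXT, step TEXT, output TEXT)"
        )

    def resume(self, execution_id: str) -> None:
        # Adopt the crashed execution's ID so its checkpoints are found.
        self.execution_id = execution_id

    def checkpoint(self, name: str):
        def decorator(fn):
            async def wrapper(*args):
                row = self.conn.execute(
                    "SELECT output FROM journal WHERE execution_id = ? AND step = ?",
                    (self.execution_id, name),
                ).fetchone()
                if row is not None:
                    return json.loads(row[0])  # completed step: reuse output
                output = await fn(*args)
                self.conn.execute(
                    "INSERT INTO journal (execution_id, step, output) VALUES (?, ?, ?)",
                    (self.execution_id, name, json.dumps(output)),
                )
                self.conn.commit()
                return output
            return wrapper
        return decorator


executed: list[str] = []  # records which step bodies actually ran


async def run_workflow(wp: MiniWaypoint, crash: bool) -> str:
    @wp.checkpoint("plan")
    async def plan(task: str) -> str:
        executed.append("plan")
        return f"plan for {task}"

    @wp.checkpoint("act")
    async def act(plan_text: str) -> str:
        executed.append("act")
        if crash:
            raise RuntimeError("simulated crash")
        return f"done: {plan_text}"

    return await act(await plan("write report"))


conn = sqlite3.connect(":memory:")  # stand-in for PostgreSQL
first_run = MiniWaypoint(conn)
try:
    asyncio.run(run_workflow(first_run, crash=True))
except RuntimeError:
    pass  # "plan" was checkpointed before "act" failed

second_run = MiniWaypoint(conn)  # new instance after the crash
second_run.resume(first_run.execution_id)
result = asyncio.run(run_workflow(second_run, crash=False))
print(result, executed)
```

In this sketch the failed step leaves no checkpoint behind, so on resume only `act` runs again while `plan` is served from the journal.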
Key Properties
- Deterministic replay: On replay, each completed step returns its recorded output instead of re-executing.
- LLM call caching: Cached responses are returned on replay (zero token cost).
- Framework-agnostic: Works with LangChain, CrewAI, custom async agents, FastAPI, etc.
- Minimal integration: Add @checkpoint decorators (one per step). ~3 lines of change per step.
- Full history: Query every step, error, and state transition by execution ID.
When to Use
- Long-running agent workflows (minutes to hours) where crashes are expensive.
- Cost-sensitive apps where re-calling LLMs on retry is unacceptable.
- Teams needing audit trails for agent behavior and debugging.
- Agent-as-a-service platforms running untrusted/user-submitted agents.
When Not to Use (Next Steps)
- Distributed/multi-machine workflows (Waypoint is single-process).
- High-throughput task queues (use Celery, Temporal, etc.).
- Simple chatbots with no multi-step orchestration.
Stack
- Python 3.13+
- asyncio
- FastAPI (gateway demo only; SDK is framework-agnostic)
- PostgreSQL (events + checkpoints)
- Pydantic + JSON serialization