checkpnt
Framework-agnostic state persistence for AI agents.
Resume any agent from exactly where it stopped.
from checkpnt import Client, Framework

# Run inside an async function; graph_state is your agent's current state.
async with Client.sqlite() as client:
    # Save agent state mid-execution
    checkpoint_id = await client.save(
        agent_id="invoice-processor",
        framework=Framework.LANGGRAPH,
        execution_state=graph_state,
        context={"invoice_id": "INV-4821", "step": "validate"},
    )

    # Crash. Restart. Resume from the exact position.
    checkpoint = await client.restore(checkpoint_id)
    graph_state = checkpoint.execution_state
The Problem
AI agents run multi-step processes. Every step, they accumulate state — tool results, decisions, intermediate data, coordination context. That state lives in memory.
The moment anything interrupts the agent — a crash, a restart, a hot reload during development, a deployment — the state is gone. The agent has no memory of where it was. It must start over.
This is not a corner case. It is the daily experience of every team building production agents.
From LangGraph Issue #5790, filed August 2025:
"Every code change triggering hot reload forces recreating conversation state from scratch. This severely impacts development efficiency."
"After restarting, all conversation state is lost. A checkpoint.db file is created but remains empty — it never receives data."
"Users must choose between Send objects OR checkpointing. You cannot use both."
LangGraph maintainers closed Issue #5790 as "by design". The langgraph dev command intentionally replaces any configured checkpointer with in-memory storage, which in practice forces you onto LangGraph Platform for persistence. Checkpnt is built for every developer who refuses that lock-in.
Quick Start
pip install checkpnt
Save and restore state with any framework:
import asyncio

from checkpnt import Client, Framework


async def main():
    async with Client.sqlite() as client:
        # Your agent runs some steps...
        graph_state = {"messages": ["result from step 3"], "step": 3}

        # Save state
        checkpoint_id = await client.save(
            agent_id="my-agent",
            framework=Framework.LANGGRAPH,
            execution_state=graph_state,
            context={"run_id": "run-001"},
            step_index=3,
            step_name="after_tool_call",
        )
        print(f"Saved: {checkpoint_id}")

        # Simulate a crash and restart from here
        checkpoint = await client.restore(checkpoint_id)
        print(f"Restored step: {checkpoint.execution_state['step']}")
        # → Restored step: 3


asyncio.run(main())
LangGraph drop-in, swapping only the checkpointer:
# Before (state lost on every langgraph dev restart):
from langgraph.checkpoint.sqlite import SqliteSaver

with SqliteSaver.from_conn_string("checkpoints.db") as saver:
    app = graph.compile(checkpointer=saver)

# After (state survives restarts, no platform lock-in; run inside async code):
from checkpnt.adapters.langgraph import CheckpntSaver

async with CheckpntSaver.from_sqlite("./checkpnt_local.db") as saver:
    app = graph.compile(checkpointer=saver)
The Five Operations
That is all there is.
| Operation | What it does |
|---|---|
| save() | Persist agent state at any point in execution |
| restore() | Resume from the exact saved position, integrity verified |
| handoff() | Transfer state between agents with the parent chain preserved |
| timeline() | Full execution history, newest first |
| expire() | Delete immediately, or set a TTL on save for auto-expiry |
from checkpnt import Client, Framework

# Run inside an async function; `state` is your agent's current state dict.
async with Client.sqlite() as client:
    # 1. Save
    checkpoint_id = await client.save(
        agent_id="processor",
        framework=Framework.LANGGRAPH,
        execution_state=state,
        session_id="run-001",
        step_index=7,
        ttl_seconds=3600,  # auto-expire after 1 hour
    )

    # 2. Restore
    checkpoint = await client.restore(checkpoint_id)

    # 3. Handoff: Agent A → Agent B, the state travels with it
    handoff_id = await client.handoff(checkpoint_id, target_agent_id="approval-agent")

    # 4. Timeline: full audit trail, newest first
    history = await client.timeline(agent_id="processor", session_id="run-001")
    for cp in history:
        print(f"  step {cp.step_index}: {cp.step_name}")

    # 5. Expire
    await client.expire(checkpoint_id)
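To make the semantics of the five operations concrete, here is a toy, synchronous, in-memory model of them. It is not checkpnt's implementation: the names `ToyCheckpoint` and `ToyCheckpointStore` are hypothetical, it uses random `uuid4` IDs rather than time-ordered ones, and the assumption that a handoff writes a new child checkpoint owned by the target agent is ours.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass(frozen=True)
class ToyCheckpoint:
    """Immutable record; field names loosely mirror the State Schema section."""
    checkpoint_id: str
    agent_id: str
    session_id: Optional[str]
    step_index: int
    execution_state: dict[str, Any]
    parent_id: Optional[str] = None
    ttl_seconds: Optional[int] = None
    created_at: float = field(default_factory=time.time)

    def is_expired(self, now: float) -> bool:
        return self.ttl_seconds is not None and now >= self.created_at + self.ttl_seconds


class ToyCheckpointStore:
    """Hypothetical in-memory model of the five operations (not checkpnt's code)."""

    def __init__(self) -> None:
        self._rows: dict[str, ToyCheckpoint] = {}  # dicts keep insertion order

    def save(self, agent_id: str, execution_state: dict[str, Any],
             session_id: Optional[str] = None, step_index: int = 0,
             parent_id: Optional[str] = None,
             ttl_seconds: Optional[int] = None) -> str:
        cp = ToyCheckpoint(
            checkpoint_id=str(uuid.uuid4()),
            agent_id=agent_id,
            session_id=session_id,
            step_index=step_index,
            execution_state=dict(execution_state),  # shallow copy of the caller's dict
            parent_id=parent_id,
            ttl_seconds=ttl_seconds,
        )
        self._rows[cp.checkpoint_id] = cp
        return cp.checkpoint_id

    def restore(self, checkpoint_id: str, now: Optional[float] = None) -> ToyCheckpoint:
        now = time.time() if now is None else now
        cp = self._rows.get(checkpoint_id)
        if cp is None or cp.is_expired(now):
            raise KeyError(checkpoint_id)
        return cp

    def handoff(self, checkpoint_id: str, target_agent_id: str) -> str:
        # Assumption: a handoff writes a child checkpoint owned by the target agent.
        src = self.restore(checkpoint_id)
        return self.save(target_agent_id, src.execution_state, src.session_id,
                         src.step_index, parent_id=src.checkpoint_id)

    def timeline(self, agent_id: str, session_id: Optional[str] = None) -> list[ToyCheckpoint]:
        rows = [cp for cp in self._rows.values()
                if cp.agent_id == agent_id
                and (session_id is None or cp.session_id == session_id)]
        return rows[::-1]  # reversed insertion order = newest first

    def expire(self, checkpoint_id: str) -> None:
        self._rows.pop(checkpoint_id, None)
```

Because records are frozen and a handoff only ever adds a new row, the parent link from the target agent's checkpoint back to the source is never overwritten.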
LangGraph Adapter
The LangGraphAdapter and CheckpntSaver are the direct answer to Issue #5790.
from checkpnt.adapters.langgraph import LangGraphAdapter, CheckpntSaver

# Pattern 1: Drop-in checkpointer (no changes to your graph definition)
# config identifies the thread, e.g. {"configurable": {"thread_id": "run-001"}}
async with CheckpntSaver.from_sqlite("./checkpnt_local.db") as saver:
    app = graph.compile(checkpointer=saver)
    result = app.invoke(input, config=config)
    # State now persists across langgraph dev restarts, crashes, and deployments.

# Pattern 2: Manual, with full control over checkpoint granularity
# (assumes a compiled `app`, a `config`, and an open `client` from Client.sqlite())
adapter = LangGraphAdapter()
state_snapshot = app.get_state(config)
checkpoint = adapter.extract(
    state_snapshot,
    agent_id="my-agent",
    context={"invoice_id": "INV-001"},  # your own keys, stored separately from framework state
)
checkpoint_id = await client._backend.save(checkpoint)  # writes through the client's backend

# Restore
checkpoint = await client.restore(checkpoint_id)
restored = adapter.reconstruct(checkpoint)
# restored["values"] → your graph state dict
# restored["next"]   → pending nodes
# restored["config"] → RunnableConfig to resume with
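The key idea behind extract/reconstruct is that framework-owned state (values, pending nodes, config) and your own context keys are stored side by side but never merged. The sketch below illustrates that split with a stand-in for LangGraph's snapshot; `SnapshotStandIn`, `toy_extract`, and `toy_reconstruct` are hypothetical names, and the record layout is an assumption, not the adapter's actual storage format.

```python
from typing import Any, NamedTuple


class SnapshotStandIn(NamedTuple):
    """Hypothetical stand-in for the three snapshot fields the adapter returns."""
    values: dict[str, Any]
    next: tuple[str, ...]
    config: dict[str, Any]


def toy_extract(snapshot: SnapshotStandIn, agent_id: str,
                context: dict[str, Any]) -> dict[str, Any]:
    # Framework state and developer context live under separate keys,
    # so a context key can never collide with a graph state key.
    return {
        "agent_id": agent_id,
        "execution_state": {
            "values": dict(snapshot.values),
            "next": list(snapshot.next),
            "config": dict(snapshot.config),
        },
        "agent_context": dict(context),
    }


def toy_reconstruct(record: dict[str, Any]) -> dict[str, Any]:
    # Rebuild the three pieces needed to resume: values, pending nodes, config.
    state = record["execution_state"]
    return {"values": state["values"], "next": tuple(state["next"]), "config": state["config"]}
```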
Proof this is a real problem: the examples/langgraph/ directory contains:
- issue_5790_reproduced.py: the exact code that exposed the bug
- checkpoints.db: the real SQLite database from that session, with 7 checkpoints in an unbroken parent chain, timestamped March 4, 2026
Run python issue_5790_reproduced.py. It works. Then run langgraph dev. Watch it break. That is why this library exists.
Backends
SQLite — local development
client = Client.sqlite("./checkpnt_local.db") # default: ./checkpnt_local.db
Zero dependencies beyond aiosqlite. WAL mode enabled. Full history queries indexed. Works offline. Commits to a file you can inspect directly.
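For readers unfamiliar with WAL mode or indexed history queries, here is a minimal sketch using Python's built-in sqlite3 module (checkpnt itself uses aiosqlite). The table, column, and index names are hypothetical and do not describe checkpnt's actual schema.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS checkpoints (
    checkpoint_id   TEXT PRIMARY KEY,
    agent_id        TEXT NOT NULL,
    session_id      TEXT,
    step_index      INTEGER NOT NULL,
    execution_state TEXT NOT NULL,   -- JSON-encoded
    created_at      REAL NOT NULL
);
-- Covers "history for one agent + session, newest first" without a table scan.
CREATE INDEX IF NOT EXISTS idx_history
    ON checkpoints (agent_id, session_id, created_at);
"""

HISTORY_SQL = (
    "SELECT step_index FROM checkpoints "
    "WHERE agent_id = ? AND session_id = ? ORDER BY created_at DESC"
)


def open_store(path: str) -> tuple[sqlite3.Connection, str]:
    conn = sqlite3.connect(path)
    # WAL lets readers proceed while a writer appends; the pragma returns the active mode.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    conn.executescript(SCHEMA)
    return conn, mode


def save_row(conn: sqlite3.Connection, checkpoint_id: str, agent_id: str,
             session_id: str, step_index: int, state: dict, created_at: float) -> None:
    conn.execute(
        "INSERT INTO checkpoints VALUES (?, ?, ?, ?, ?, ?)",
        (checkpoint_id, agent_id, session_id, step_index, json.dumps(state), created_at),
    )
    conn.commit()


def history(conn: sqlite3.Connection, agent_id: str, session_id: str) -> list[int]:
    return [row[0] for row in conn.execute(HISTORY_SQL, (agent_id, session_id))]
```

Running `EXPLAIN QUERY PLAN` on `HISTORY_SQL` is a quick way to confirm the history query is served by the index rather than a full scan. Note that WAL mode needs a file-backed database; an in-memory database reports `memory` instead.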
Redis — production
client = Client.redis("redis://localhost:6379")
Sub-millisecond restores. Native TTL expiry. Sorted-set history queries: O(log N + M) for M returned checkpoints, not an O(N) scan. Single-pipeline atomic writes. The pub/sub infrastructure is already there for multi-agent coordination (coming in v0.2).
# Spin up Redis locally
docker run -p 6379:6379 redis:7-alpine
Switching backends
# Same API, swap one line
client = Client.sqlite("./dev.db") # local
client = Client.redis("redis://prod:6379") # production
Your agent code never changes.
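The "swap one line" property comes from agent code depending on an interface rather than a concrete store. A minimal sketch of that design, assuming a hypothetical two-method `CheckpointBackend` protocol (not checkpnt's real backend API), with one store that keeps dicts in memory and one that round-trips through JSON strings the way a network store would:

```python
import json
from typing import Any, Protocol


class CheckpointBackend(Protocol):
    """Hypothetical minimal interface; not checkpnt's real backend API."""
    def put(self, key: str, value: dict[str, Any]) -> None: ...
    def get(self, key: str) -> dict[str, Any]: ...


class DictBackend:
    """Keeps copies of dicts in process memory."""
    def __init__(self) -> None:
        self._data: dict[str, dict[str, Any]] = {}

    def put(self, key: str, value: dict[str, Any]) -> None:
        self._data[key] = dict(value)

    def get(self, key: str) -> dict[str, Any]:
        return dict(self._data[key])  # raises KeyError if missing


class JsonStringBackend:
    """Stores serialized JSON, as a networked store such as Redis would."""
    def __init__(self) -> None:
        self._blobs: dict[str, str] = {}

    def put(self, key: str, value: dict[str, Any]) -> None:
        self._blobs[key] = json.dumps(value)

    def get(self, key: str) -> dict[str, Any]:
        return json.loads(self._blobs[key])  # raises KeyError if missing


def run_agent_step(backend: CheckpointBackend, run_id: str) -> int:
    """Agent code sees only the protocol, never a concrete backend."""
    try:
        state = backend.get(run_id)
    except KeyError:
        state = {"step": 0}
    state["step"] += 1
    backend.put(run_id, state)
    return state["step"]
```

Either backend can be passed to `run_agent_step` unchanged, which is the same separation the `Client.sqlite(...)` / `Client.redis(...)` swap relies on.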
State Schema
Every checkpoint carries:
checkpoint.checkpoint_id # time-ordered UUID — efficient range queries
checkpoint.agent_id # stable identifier across restarts
checkpoint.session_id # groups checkpoints within one run
checkpoint.parent_id # forms an append-only execution tree
checkpoint.framework # langgraph | crewai | autogen | custom
checkpoint.schema_version # "1.0" — versioned for forward migrations
checkpoint.step_index # monotonically increasing within a session
checkpoint.execution_state # framework-specific state (LangGraph, CrewAI, etc.)
checkpoint.agent_context # your developer-owned dict (unstructured, no schema enforced)
checkpoint.checksum # SHA-256 — integrity verified on every restore
checkpoint.created_at # UTC timestamp
checkpoint.ttl_seconds # auto-expiry
checkpoint.handoff_target # for multi-agent coordination (v0.2)
Checkpoints are immutable. There is no update(). Every new state creates a new checkpoint with a parent_id link. This gives you:
- Free time travel — walk the parent chain to any previous state
- Free branching — two agents can fork from the same parent
- Free auditability — every state the agent was ever in is reconstructable
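Time travel and branching both fall out of the parent_id link. A minimal sketch, using a hypothetical `Node` record with only the fields needed here: walking parent links recovers the full lineage of any checkpoint, and two children sharing a parent form a branch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical minimal checkpoint record: just the tree-forming fields."""
    checkpoint_id: str
    parent_id: Optional[str]
    step_index: int


def lineage(nodes: dict[str, Node], checkpoint_id: str) -> list[str]:
    """Walk parent links back to the root; return ids oldest-first.

    Append-only writes with immutable parent links mean the chain cannot cycle.
    """
    chain: list[str] = []
    current: Optional[str] = checkpoint_id
    while current is not None:
        chain.append(current)
        current = nodes[current].parent_id
    return chain[::-1]


def branches(nodes: dict[str, Node], parent_id: str) -> list[str]:
    """All checkpoints forked from the same parent."""
    return sorted(n.checkpoint_id for n in nodes.values() if n.parent_id == parent_id)
```

Usage: with `a ← b ← c1` and `b ← c2`, `lineage(nodes, "c2")` returns `["a", "b", "c2"]`, and `branches(nodes, "b")` returns both forks.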
Installation
# Core (SQLite backend)
pip install checkpnt
# With Redis backend
pip install "checkpnt[redis]"
# With LangGraph adapter
pip install "checkpnt[langgraph]"
# Everything
pip install "checkpnt[all]"
Requires Python 3.11+.
Architecture
Checkpnt is built in layers. v0.1 ships Layer 1. The data model is designed to never require a breaking migration to reach Layers 2–5.
Layer 5 — Coordination Protocol typed state contracts between agents, pub/sub
Layer 4 — State Diffing compare runs, diagnose failures
Layer 3 — Time Travel replay any agent from any checkpoint
Layer 2 — Cross-Framework agent starts in LangGraph, resumes in CrewAI
Layer 1 — Survival ← v0.1 crash recovery, five operations, two backends
Read ARCHITECTURE.md for the full design — why every v0.1 field anticipates Layers 2–5, and what was deliberately left out.
Supported Frameworks
| Framework | Adapter | Status |
|---|---|---|
| LangGraph | checkpnt.adapters.langgraph | ✅ v0.1 |
| CrewAI | checkpnt.adapters.crewai | 🔜 v0.2 |
| AutoGen | checkpnt.adapters.autogen | 🔜 v0.2 |
| Custom | Framework.CUSTOM | ✅ v0.1 |
For custom frameworks, pass your state as a plain dict to execution_state. No adapter needed.
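If your custom agent keeps its state in a dataclass, converting it to and from a plain dict is usually all that is needed. A minimal sketch; `MyAgentState` is a hypothetical example class, and the JSON check assumes execution_state should be JSON-serializable, which this README does not state explicitly.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class MyAgentState:
    """Hypothetical state object for a custom framework."""
    task: str
    step: int = 0
    tool_results: list[str] = field(default_factory=list)


def to_execution_state(state: MyAgentState) -> dict:
    payload = asdict(state)
    # Fail fast on non-serializable values (assumption: state is stored as JSON).
    json.dumps(payload)
    return payload


def from_execution_state(payload: dict) -> MyAgentState:
    return MyAgentState(**payload)
```

You would then pass `execution_state=to_execution_state(state)` together with `framework=Framework.CUSTOM`, and rebuild the object after a restore with `from_execution_state(checkpoint.execution_state)`.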
Compared to Alternatives
| | Checkpnt | Mem0 | LangGraph checkpointer | DIY Redis |
|---|---|---|---|---|
| What it persists | Execution state | Semantic memory | Execution state | Whatever you build |
| Framework-agnostic | ✅ | ✅ | ❌ LangGraph only | ✅ |
| Works in langgraph dev | ✅ | — | ❌ by design | ✅ |
| Production backend | Redis | Managed | LangGraph Platform | Redis |
| License | BSL 1.1 (source-available) | Apache 2.0 | MIT | — |
| Schema versioning | ✅ | — | ❌ | — |
| Integrity checks | ✅ | — | ❌ | — |
| Multi-agent handoff | 🔜 v0.2 | ❌ | ❌ | Build it yourself |
Mem0 vs Checkpnt: These are different primitives. Mem0 persists what your agent knows (semantic memory). Checkpnt persists where your agent was (execution state). Most production systems need both.
Development
git clone https://github.com/vaibhav-v2/checkpnt
cd checkpnt
pip install -e ".[dev]"
pytest tests/unit/ # 55 tests, no external dependencies
pytest tests/integration/ # Redis tests — requires Redis running
Contributing
See CONTRIBUTING.md. Issues and PRs welcome.
If you are hitting a state persistence problem that Checkpnt does not solve, open an issue with the framework, the error, and what you expected to happen. That is how this library gets better.
License
Business Source License 1.1 — source-available, not open for competing SaaS forks.
Converts to Apache 2.0 on January 1, 2029.
Built by Tech4Biz Solutions · github.com/vaibhav-v2/checkpnt
File details
Details for the file checkpnt-0.1.0.tar.gz.
File metadata
- Download URL: checkpnt-0.1.0.tar.gz
- Upload date:
- Size: 26.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5729a4ec763fa4de5a918984b51e6fa4386a8ceb641a98dd0fbe5d47b8732ae4 |
| MD5 | 66841f4f9b9d634d972724a2eb1f0054 |
| BLAKE2b-256 | 97855183e80c4e5ee0a40d0c80a84856ffde492ef924a30d97ef398e062f0a6a |
File details
Details for the file checkpnt-0.1.0-py3-none-any.whl.
File metadata
- Download URL: checkpnt-0.1.0-py3-none-any.whl
- Upload date:
- Size: 27.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 45fca1cb54be43ba46fbdc241fa7126714ab24e5e550c083805071cf9d92f666 |
| MD5 | e45ef8b886348d8b858ea27a76a379d0 |
| BLAKE2b-256 | da6deddd30a6e402cbec3a0e02f0a7c18a5dceaf5253821ffb1803e6a1768cd1 |