Checkpoint, recovery, and replay infrastructure for AI agents.

These details have not been verified by PyPI

Project links

Project description

Living AI

Crash recovery, checkpointing, and replay for AI agents — one runtime that works across LangGraph, CrewAI, and OpenAI Agents.

The problem

AI agents crash. A process dies mid-workflow — after the LLM reasoned, after the tool charged a card, three steps into a ten-step plan — and all of that work is gone. You restart from zero, pay for the tokens again, and hope the tool doesn't fire its side effects twice. And when something goes wrong, you can't replay what happened to understand why.

The solution

Living AI records every step of an agent execution to an append-only log, so any run can be:

Recovered — resume from the last durable checkpoint after a crash, replaying only the idempotent work and never re-running side-effecting tool calls (payments, emails, API writes).
Replayed — re-run a recorded execution for debugging, with MOCK_TOOLS mode returning recorded tool responses so you can iterate on reasoning without real API calls.
Audited — inspect cost, latency, and the full node graph of any run.

Why it's different

Most observability and checkpointing tools lock you into one framework. Living AI ships a framework-agnostic core with thin adapters for all three major agent frameworks — the same recovery guarantees whether you use LangGraph, CrewAI, or the OpenAI Agents SDK:

from livingai.adapters import LangGraphAdapter, CrewAIAdapter, OpenAIAgentsAdapter

And the core has zero runtime dependencies — it's pure standard library.

Install

pip install livingai

Crash recovery in 18 lines

import asyncio
from livingai import (
    CheckpointEngine, ExecutionNode, NodeType, RecoveryEngine, SQLiteStore, Status,
)


async def main():
    engine = CheckpointEngine(SQLiteStore("agent.db"))

    # Your agent checkpoints after an expensive step.
    step = ExecutionNode(execution_id="run-1", type=NodeType.PROMPT,
                         status=Status.SUCCESS, output="plan ready")
    await engine.save(step, state=b"...serialized agent state...")

    # A tool with real side effects runs (e.g. charging a card).
    charge = ExecutionNode(execution_id="run-1", type=NodeType.TOOL,
                           status=Status.SUCCESS, output={"receipt": "R-1"})
    await engine.save(charge)

    # 💥 The process crashes. On restart, recover from the durable log:
    recovery = RecoveryEngine(CheckpointEngine(SQLiteStore("agent.db")))
    plan = await recovery.plan("run-1")
    print("resume from :", plan.resume_node_id)        # last durable checkpoint
    print("replay safe :", len(plan.replay_nodes))     # idempotent work to redo
    print("skip effects:", len(plan.skipped_nodes))    # card is NOT re-charged


asyncio.run(main())

resume from : d482c31e-...
replay safe : 0
skip effects: 1          # the card is never charged twice

The examples/ directory has five runnable demos (crash recovery, MOCK_TOOLS debugging, cost tracking, and the LangGraph adapter) — none require an LLM or network.

Performance

Checkpointing is on the hot path of every agent step, so it has to be fast. It is.

Metric	Result	Notes
Checkpoint write (p50)	~0.3 ms	50 KB compressed state blob
Checkpoint write (p95)	~0.8 ms
Checkpoint write (p99)	~1 ms	~50× under the 50 ms budget
Hot recovery read	~4 µs	vs ~190 µs cold — ~40× faster
Compression	60–99%	typical agent state (histories, docs)

Measured on a dev laptop with the default 50 ms overhead budget, 50 KB blobs, 2000 writes — the same configuration you get out of the box. Reproduce with python benchmarks/benchmark.py.

The overhead budget is enforced in code: a checkpoint write that would exceed it is dropped and logged as missed rather than ever blocking your agent thread.

How it works

ExecutionNode ──► CheckpointStore (Tier 2: durable, append-only)
      ▲                  ▲
      │                  │
 Adapters          CheckpointEngine ──► HotCache (Tier 1: LRU + TTL)
 (LangGraph/             │
  CrewAI/           RecoveryEngine ──► RecoveryPlan (replay vs. skip)
  OpenAI)           ReplaySession  ──► FULL / FROM_NODE / MOCK_TOOLS / COUNTERFACTUAL

Every execution is a DAG of ExecutionNode records. The log is never mutated, only appended to — so any point in time can be reconstructed deterministically. TOOL nodes default to non-idempotent, which is how recovery knows never to re-run side effects. See docs/concepts.md for the full model.

CLI

livingai list   --db agent.db          # execution ids
livingai show   run-1 --db agent.db    # the node graph
livingai replay run-1 --db agent.db --mode MOCK_TOOLS

Documentation

Quickstart · Concepts · Checkpointing · Recovery · Replay · CLI · Adapters · Migrating from other checkpointers · API Reference

Quality

108 tests, 100% line coverage — including crash-simulation and stress tests (10k-node graphs, concurrent writers, write contention).
mypy --strict clean across all source files; ships py.typed.
CI matrix on Python 3.9–3.12 with a 100%-coverage gate.

pip install -e ".[dev]"
python -m pytest -q                    # run the suite
mypy --strict livingai                 # type check
python benchmarks/benchmark.py         # reproduce the numbers above

Design principles

Principle	How
Zero-dependency core	Standard library only (`sqlite3`, `asyncio`, `zlib`, `dataclasses`, `uuid`).
Append-only log	Every write inserts a new row; nothing is mutated or deleted.
Framework-agnostic	No framework imports in the core; framework data lives in `metadata`.
Async-first I/O	Storage is `async`; sync SQLite runs off the event loop.
Bounded overhead	Cold writes run under `asyncio.wait_for`; overruns are dropped, never blocking the agent.

Roadmap

Shipped: core data model, checkpoint engine, recovery engine, replay engine, CLI, LangGraph / CrewAI / OpenAI adapters, benchmarks, docs, Redis store, PostgreSQL store.

Optional backends — swap the default SQLite store for Redis or PostgreSQL with a single import (no core changes required):

pip install "livingai[redis]"     # hot Redis store
pip install "livingai[postgres]"  # PostgreSQL cold store

from livingai.stores.redis import RedisStore
from livingai.stores.postgres import PostgresStore

# Redis
engine = CheckpointEngine(RedisStore(url="redis://localhost:6379"))

# PostgreSQL
store = PostgresStore(dsn="postgresql://user:pass@localhost/livingai")
await store.initialize()          # creates tables once
engine = CheckpointEngine(store)

A Docker Compose dev stack (Postgres + Redis) ships with the repo:

docker compose up -d    # starts postgres:5432 + redis:6379

Next: FastAPI cloud backend (5 endpoints), cloud client (CloudSync), web replay dashboard.

Contributing

See CONTRIBUTING.md — development setup, running tests, code style, and how to add a new framework adapter or storage backend.

License

Apache-2.0 — see LICENSE.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.4.1

Jul 2, 2026

0.4.0

Jul 2, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

livingai-0.4.1.tar.gz (55.8 kB view details)

Uploaded Jul 2, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

livingai-0.4.1-py3-none-any.whl (32.9 kB view details)

Uploaded Jul 2, 2026 Python 3

File details

Details for the file livingai-0.4.1.tar.gz.

File metadata

Download URL: livingai-0.4.1.tar.gz
Upload date: Jul 2, 2026
Size: 55.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.11

File hashes

Hashes for livingai-0.4.1.tar.gz
Algorithm	Hash digest
SHA256	`74d114d127729abfdbe32a48c5d616e13c6db9a825941672ed4f125ba207aedc`
MD5	`5c06013868bdba85c913f09f9fd37946`
BLAKE2b-256	`b506472b91c30afedbf09cf1110b57e650f18e8f6e51fcc7647fefe89b652988`

See more details on using hashes here.

File details

Details for the file livingai-0.4.1-py3-none-any.whl.

File metadata

Download URL: livingai-0.4.1-py3-none-any.whl
Upload date: Jul 2, 2026
Size: 32.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.11

File hashes

Hashes for livingai-0.4.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`148df4cb46d736dbd7263b2386768c8f75efe86478cb779c496ad47931e27b81`
MD5	`4b284ef504717eeb28a6b7ae04c3a85c`
BLAKE2b-256	`d55cbb5d2827a0754400ae116d5adb50b2f55bb6414d8f01dabd1a76ba7c5686`

See more details on using hashes here.

livingai 0.4.1

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Living AI

The problem

The solution

Why it's different

Install

Crash recovery in 18 lines

Performance

How it works

CLI

Documentation

Quality

Design principles

Roadmap

Contributing

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes