Agent memory with receipts: an MCP server over an append-only, hash-chained ledger for task state, memory, and verified handoffs across sessions and models.
Continuity
Continuity is the system of record for agent work: an agent continuity layer that captures task state, memory, decisions, provenance, and handoff context across models, sessions, and tools.
Current product premise: Continuity is not "better memory than a Markdown file"
for simple, sequential work. A strong, maintained HANDOFF.md matched
Continuity in the Stage A product-falsification benchmark. Continuity's sharper
claim is that it is a trust layer for AI work: it turns messy agent history
into resumable, inspectable, permissioned operational state, and can compile
that state into a compact human-readable handoff when a plain file is the
simplest interface.
The event log is the product. Task context, project memory, agent memory, workflow state, gates, timeline rows, and the Console are projections of one append-only, hash-chained ledger.
Every event can carry optional payload-level provenance and usage telemetry: task/session identity, agent/model identity, parent and consumed ledger sequence numbers, handoff source, token counts, usage source, and exact microdollar cost. This metadata stays in the payload layer so the immutable chain remains stable while provenance evolves.
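As a rough illustration of the append-only, hash-chained idea described above, the following is a minimal sketch and not Continuity's actual schema or code. The field names (`seq`, `prev_hash`, `hash`), the `TASK_UPDATED` event type, the genesis sentinel, and the provenance keys are all assumptions made for this example. It shows how optional provenance can live inside the payload while the chain fields themselves stay fixed.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed sentinel used as prev_hash for the first entry


def entry_hash(seq: int, prev_hash: str, event_type: str, payload: dict) -> str:
    """Hash the fixed chain fields plus a canonical JSON encoding of the payload."""
    body = json.dumps(
        {"seq": seq, "prev_hash": prev_hash, "type": event_type, "payload": payload},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_event(ledger: list, event_type: str, payload: dict) -> dict:
    """Append one event; earlier entries are never modified."""
    seq = len(ledger) + 1
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS_HASH
    entry = {
        "seq": seq,
        "prev_hash": prev_hash,
        "type": event_type,
        "payload": payload,
        "hash": entry_hash(seq, prev_hash, event_type, payload),
    }
    ledger.append(entry)
    return entry


ledger = []
append_event(ledger, "TASK_UPDATED", {"task_id": "t1", "status": "in_progress"})
append_event(
    ledger,
    "TASK_UPDATED",
    {
        "task_id": "t1",
        "status": "done",
        # Hypothetical provenance keys: they sit in the payload, so adding
        # new ones later does not change the chain's own fields.
        "provenance": {
            "model": "example-model",
            "consumed_seqs": [1],
            "cost_microdollars": 1250,
        },
    },
)
```

Each entry commits to its predecessor through `prev_hash`, so rewriting any earlier event changes every later hash.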
The primary continuation path is structured context compilation: a task-chain
projection that returns current task state, ordered chronology, decisions,
unresolved conflict signals, and provenance chain. It is available through the
Python helper, FastAPI at /projects/{project_id}/tasks/{task_id}/continuation,
and MCP as compile_task_context.
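To make the projection idea concrete, here is a minimal sketch of folding ledger events into a continuation context. It is not the real `compile_task_context` implementation: the function name `compile_context`, the event types `TASK_UPDATED` and `DECISION_RECORDED`, and every payload key are assumptions for illustration, and conflict signals are left out for brevity.

```python
def compile_context(events: list, task_id: str) -> dict:
    """Fold one task's events, in ledger order, into a continuation context."""
    task_events = sorted(
        (e for e in events if e["payload"].get("task_id") == task_id),
        key=lambda e: e["seq"],
    )
    state = {}
    decisions = []
    provenance_chain = []
    for event in task_events:
        payload = event["payload"]
        if event["type"] == "TASK_UPDATED":
            # Later updates override earlier ones, giving the current state.
            state.update(payload.get("state", {}))
        elif event["type"] == "DECISION_RECORDED":
            decisions.append({"seq": event["seq"], "summary": payload["summary"]})
        provenance_chain.append(
            {
                "seq": event["seq"],
                "agent": payload.get("agent"),
                "consumed_seqs": payload.get("consumed_seqs", []),
            }
        )
    return {
        "task_id": task_id,
        "state": state,
        "chronology": [{"seq": e["seq"], "type": e["type"]} for e in task_events],
        "decisions": decisions,
        "provenance_chain": provenance_chain,
    }


events = [
    {"seq": 1, "type": "TASK_UPDATED",
     "payload": {"task_id": "t1", "agent": "agent-a", "state": {"status": "in_progress"}}},
    {"seq": 2, "type": "DECISION_RECORDED",
     "payload": {"task_id": "t1", "agent": "agent-a", "summary": "use SQLite", "consumed_seqs": [1]}},
    {"seq": 3, "type": "TASK_UPDATED",
     "payload": {"task_id": "t2", "agent": "agent-b", "state": {"status": "blocked"}}},
    {"seq": 4, "type": "TASK_UPDATED",
     "payload": {"task_id": "t1", "agent": "agent-b", "state": {"status": "review"}, "consumed_seqs": [1, 2]}},
]
context = compile_context(events, "t1")
```

Because the context is derived entirely from events, it can be recomputed at any time without storing a separate copy of task state.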
Operational conflicts are first-class ledger events. When two agents produce
contradictory task assessments, Continuity records an OPERATIONAL_CONFLICT
that links the exact event sequences in disagreement. The conflict remains in
compiled context, FastAPI at /projects/{project_id}/tasks/{task_id}/conflicts,
and MCP until a human records an OPERATIONAL_CONFLICT_RESOLVED event.
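The conflict lifecycle above can be sketched as a small projection. The event type names come from this README, but the payload keys (`conflicting_seqs`, `conflict_seq`, `actor_kind`) and the function name are assumptions for illustration, not Continuity's actual fields.

```python
def unresolved_conflicts(events: list) -> list:
    """Return conflicts that no human resolution event has closed yet."""
    open_conflicts = {}
    for event in sorted(events, key=lambda e: e["seq"]):
        payload = event["payload"]
        if event["type"] == "OPERATIONAL_CONFLICT":
            open_conflicts[event["seq"]] = {
                "conflict_seq": event["seq"],
                # The exact event sequences that disagree with each other.
                "conflicting_seqs": list(payload["conflicting_seqs"]),
            }
        elif event["type"] == "OPERATIONAL_CONFLICT_RESOLVED":
            # Assumption: only a resolution recorded by a human closes a conflict.
            if payload.get("actor_kind") == "human":
                open_conflicts.pop(payload["conflict_seq"], None)
    return list(open_conflicts.values())


events = [
    {"seq": 5, "type": "OPERATIONAL_CONFLICT", "payload": {"conflicting_seqs": [3, 4]}},
    {"seq": 8, "type": "OPERATIONAL_CONFLICT", "payload": {"conflicting_seqs": [6, 7]}},
    {"seq": 9, "type": "OPERATIONAL_CONFLICT_RESOLVED",
     "payload": {"conflict_seq": 5, "actor_kind": "human"}},
    {"seq": 10, "type": "OPERATIONAL_CONFLICT_RESOLVED",
     "payload": {"conflict_seq": 8, "actor_kind": "agent"}},
]
still_open = unresolved_conflicts(events)
```

In this sketch the conflict at sequence 8 stays open because its only resolution event was not recorded by a human.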
Product Falsification Checkpoint
The Phase 6 entry gate compared three continuation paths: no shared context, a
strong manually maintained HANDOFF.md, and Continuity via MCP. The 30-run
local matrix produced no-context 0/10, strong HANDOFF.md 10/10, and
Continuity 10/10, so the result was a tie and durable runner work remains
blocked. See docs/testing/2026-06-24-product-falsification-results.md.
The next product-validation target is not another runner feature. It is a
Continuity-backed handoff workflow: use the ledger, memory, provenance,
validation, and conflict model to generate or verify a compact HANDOFF.md,
then test whether that beats a manual handoff under concurrency, stale-context
recovery, permissioned context, or audit requirements.
The follow-up IDEA-002 benchmark tested that narrower claim:
CONTINUITY_BENCHMARK_LOCAL_MODEL=qwen3.5:9b make trust-layer-handoff-benchmark
The completed 27-run local matrix produced no-context 0/9, manual handoff
6/9, and Continuity-backed verified handoff 9/9 under the original scoring.
That scoring could not credit the manual arm in provenance-audit scenarios, even
when the manual handoff honestly reported its provenance as unverified. Scoring
was revised on 2026-07-02 to credit provenance honesty equally across arms and
to track ledger-backed verification as a separate capability. The honest claim:
a well-maintained manual handoff
preserves task facts; Continuity uniquely provides ledger-backed source
verification, which a manual handoff is structurally unable to provide. See
docs/testing/2026-06-24-trust-layer-handoff-results.md.
Quick Start
As a user (installs the continuity-mcp MCP server command and
continuity-handoff receipt exporter):
pip install "git+https://github.com/machinedigital-ai/Continuity.git"
claude mcp add continuity --env CONTINUITY_DB="$HOME/continuity.db" -- continuity-mcp
As a developer (from a checkout):
make setup
make test
make serve
New here? The tutorial walks through the full loop in ~15 minutes: connect an agent, record work, resume in a fresh session or a different model, export the verified receipt, and read the receipt fields.
Then open the read-only Console:
http://127.0.0.1:8000/console
Common Commands
make setup # create/update continuity-core/.venv and install requirements
make test # run the test suite
make serve # run FastAPI at http://127.0.0.1:8000
make mcp # run the MCP server
make demo # run the recursive proof demo
make proof # export and verify the multi-model proof artifact
make ollama-chain # run the local multi-model Ollama chain proof
make codex-persistence-proof # run two isolated Codex sessions through Continuity
make claude-persistence-proof # run two isolated Claude sessions through Continuity
make cross-agent-persistence-proof # run the Codex-to-Claude handoff proof
make trust-layer-handoff-benchmark # run the IDEA-002 verified handoff benchmark
make list-verified-handoff-tasks # list exportable task IDs from a ledger
make export-verified-handoff # export verified HANDOFF.md from a task ledger
Proof Artifact
The completed Phase 4 Trust and Proof work exports a shareable Continuity proof artifact from real ledger events:
make proof
The command writes examples/multi_model_code_review.jsonl, a permissioned
internal artifact showing a real multi-agent review sequence: external review,
Codex triage, operational conflict, human resolution, gate approval, selected
timeline, continuation context, provenance, and ledger integrity.
The artifact intentionally exports sanitized summary payloads. It retains original ledger hashes and verifies exported chain-entry metadata, but it is not yet a standalone public notary proof for omitted ledger events.
Local Model Proofs
The local Ollama chain proof tests continuity across installed local models:
make ollama-chain
The dated finding and corrected targeted rerun are recorded in
docs/testing/2026-06-19-local-ollama-chain-proof.md. Across three corrected
runs, Continuity achieved 18/18 context-fidelity assertions, 9/9 provenance
assertions, 9/9 exact model outputs, and 3/3 valid ledgers using independent
Qwen, Gemma, and Ministral model families.
Persistent Agent Proofs
The same-agent proof harness starts two fresh client processes connected only through a dedicated Continuity SQLite ledger. It creates a random challenge after Session A exits, gives Session B only stable project/task/agent IDs, then verifies the output, linked validation, completion turn, ledger integrity, and sanitized proof artifact from recorded events.
make codex-persistence-proof
make claude-persistence-proof
make cross-agent-persistence-proof
Codex runs ephemerally with native memories disabled. Claude runs without session persistence and with only the explicit Continuity MCP configuration. See the persistent-agent proof runbook for exact boundaries, authentication checks, and troubleshooting. The cross-agent mode stores its post-Codex challenge in shared project memory and requires Claude's output and completion turn to carry Codex handoff provenance.
Product Falsification Gate
Before Phase 6 runner work, Continuity is compared against no context and a
strong structured HANDOFF.md. Stage A runs 30 fresh local targets:
CONTINUITY_BENCHMARK_LOCAL_MODEL=qwen3.5:9b make product-falsification-stage-a
All Stage A arms use the same direct Ollama transport. The harness reads the handoff or retrieves Continuity context through the real MCP stdio server before the fresh call, isolating context quality from client tool-use behavior.
Results are written to continuity-core/examples/product_falsification_results.jsonl
and docs/testing/2026-06-24-product-falsification-results.md. Stage B runs
with fresh Codex and Claude clients only when Stage A passes the pre-registered
rules. A tie or loss keeps Phase 6 blocked and triggers simplification or
repositioning; it is not treated as a Continuity win.
The completed Stage A result was a tie: no context 0/10, strong
HANDOFF.md 10/10, and Continuity 10/10. Stage B was therefore skipped and
Phase 6 runner work remains blocked. See the dated benchmark report and
sanitized result rows.
The completed IDEA-002 trust-layer handoff benchmark validated the verified handoff wedge, not the durable runner: continuation quality was comparable to a maintained manual handoff, and only Continuity satisfied ledger-backed source verification (see the scoring revision note in the dated results doc).
The first productized verified handoff surface is now available:
- Python: continuity.handoff.build_verified_handoff(store, project_id=..., task_id=...)
- FastAPI: GET /projects/{project_id}/tasks/{task_id}/verified-handoff
- MCP: export_verified_handoff(project_id, task_id)
- Installed CLI:
continuity-handoff --db continuity-core/continuity.db --list-tasks
continuity-handoff \
--db continuity-core/continuity.db \
--project-id your-project \
--task-id your-task \
--out HANDOFF.md
The exporter renders human-readable Markdown with current task state, decisions,
rejected/superseded signals, unresolved conflicts, and traceable ledger_seq /
event_hash source rows. provenance_status is computed from a full ledger
integrity check at export time: a valid chain renders verified with the tip
hash and Merkle root; a tampered ledger renders integrity_failed with the
broken sequence. It preserves the source of truth in the ledger and does not
unblock durable runner infrastructure.
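The export-time integrity check can be pictured with the following minimal sketch. It is not the exporter's real code: the entry layout, the hashing recipe, the Merkle construction (pairwise SHA-256, with an odd node paired with itself), and the `broken_seq` key are assumptions. Only the `verified` and `integrity_failed` status values come from this README.

```python
import copy
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed sentinel for the first entry's prev_hash


def entry_hash(seq: int, prev_hash: str, payload: dict) -> str:
    body = json.dumps(
        {"seq": seq, "prev_hash": prev_hash, "payload": payload},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def build_chain(payloads: list) -> list:
    chain, prev = [], GENESIS_HASH
    for seq, payload in enumerate(payloads, start=1):
        digest = entry_hash(seq, prev, payload)
        chain.append({"seq": seq, "prev_hash": prev, "payload": payload, "hash": digest})
        prev = digest
    return chain


def merkle_root(leaf_hashes: list) -> str:
    """Pairwise SHA-256 tree over hex digests; an odd node is paired with itself."""
    if not leaf_hashes:
        return hashlib.sha256(b"").hexdigest()
    level = list(leaf_hashes)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [
            hashlib.sha256((level[i] + level[i + 1]).encode("ascii")).hexdigest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


def provenance_status(chain: list) -> dict:
    """Walk the whole chain; report the first entry whose link or hash is wrong."""
    prev = GENESIS_HASH
    for entry in chain:
        expected = entry_hash(entry["seq"], prev, entry["payload"])
        if entry["prev_hash"] != prev or entry["hash"] != expected:
            return {"provenance_status": "integrity_failed", "broken_seq": entry["seq"]}
        prev = entry["hash"]
    return {
        "provenance_status": "verified",
        "tip_hash": prev,
        "merkle_root": merkle_root([e["hash"] for e in chain]),
    }


chain = build_chain([{"status": "open"}, {"status": "in_progress"}, {"status": "done"}])
ok = provenance_status(chain)

tampered = copy.deepcopy(chain)
tampered[1]["payload"]["status"] = "changed"  # edit an event after the fact
bad = provenance_status(tampered)
```

In this sketch, editing the second event's payload leaves its stored hash stale, so the walk stops there and reports that sequence instead of a tip hash.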
Capability Backtest
The current evidence supports this scoped claim:
Continuity gives AI teams verified handoff, task verification, and persistent memory across agents, sessions, tools, and models.
The capability backtest in
docs/testing/2026-07-01-continuity-capability-backtest.md distinguishes what
Continuity already captures from what the verified handoff markdown currently
renders:
| Capability | Captured Today | Rendered In Verified Handoff Today |
|---|---|---|
| task state | yes | yes |
| agent identity | yes | partial |
| model identity | yes | no |
| session identity | yes | no |
| handoff source | yes | no |
| parent/consumed source chain | yes | no |
| model call/output source | yes | partial |
| validation source | yes | partial |
| memory source | yes | partial |
| tool/action source | partial | partial |
The current gap is projection/rendering, not capture. Continuity should not add new capture logic until a test proves existing ledger data cannot answer the product question. The system does not claim endpoint observability, shadow-agent detection, or automatic capture of tools/actions outside Continuity.
Additional explicit non-claims:
- The ledger proves event order and immutability, not authorship. Actor and agent identity are process/MCP/repo-local attribution, not cryptographic or authenticated identity.
- consumed_seqs currently records the prior context available to an event, not a selective causal proof of what shaped it. Selective grounding exists only for memory (grounded_in_seqs) and linked validations.
- Host support means MCP-compatible hosts. Claude Code and Codex paths are proven by isolated tests; other hosts are untested until a dated proof says otherwise.
Model Adapters
The core test suite does not call external model providers. Provider SDKs are optional and imported lazily by their adapters.
- OpenAI/OpenAI-compatible endpoints use OPENAI_API_KEY and optional OPENAI_BASE_URL.
- Anthropic uses ANTHROPIC_API_KEY.
- Ollama uses local HTTP by default at http://localhost:11434, includes a request timeout, and raises AdapterError with contextual failures instead of hanging or leaking low-level urllib errors.
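The Ollama behavior described in the list above (a request timeout, plus contextual errors instead of raw urllib failures) could look roughly like this. It is a sketch under stated assumptions, not the project's adapter: the `AdapterError` class here is a stand-in with the same name, `ollama_generate` and its `opener` parameter are hypothetical, and the 30-second default is arbitrary. The request targets Ollama's `/api/generate` endpoint with streaming disabled.

```python
import json
import socket
import urllib.error
import urllib.request


class AdapterError(RuntimeError):
    """Stand-in error type: raised with context when a model backend call fails."""


def ollama_generate(
    model: str,
    prompt: str,
    base_url: str = "http://localhost:11434",
    timeout: float = 30.0,  # assumed default; bounds the call so it cannot hang
    opener=urllib.request.urlopen,  # injectable so tests need no network
) -> str:
    url = f"{base_url}/api/generate"
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    try:
        with opener(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))["response"]
    except (urllib.error.URLError, socket.timeout, TimeoutError) as exc:
        # Wrap transport failures so callers see one error type with context.
        raise AdapterError(
            f"Ollama request to {url} failed for model {model!r}: {exc}"
        ) from exc
    except (KeyError, ValueError) as exc:
        # Malformed JSON or a missing "response" field.
        raise AdapterError(
            f"Ollama returned an unexpected response for model {model!r}: {exc}"
        ) from exc
```

Chaining with `from exc` keeps the original urllib error available for debugging without making callers depend on it.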
Repository Layout
continuity-core/
continuity/ # ledger, projections, API, MCP, Console, adapters
scripts/ # demos and proof scripts
tests/ # pytest suite
README.md # core package details
docs/
README.md # documentation authority map
strategy/ # current strategic background; roadmap remains authoritative
strategy/archive/ # older strategy/research source material
superpowers/ # historical design specs and implementation plans
testing/ # evidence notes from local and integration tests
examples/ # shareable proof artifacts generated from real ledger data
AGENTS.md
architecture_decisions.md
ROADMAP_AND_HANDOFF.md
Plans
- Roadmap & Handoff is the canonical execution tracker and current Phase 0-7 sequence. Its Execution Checkpoint controls current implementation work.
- Documentation Map explains which docs are authoritative, historical, or evidence-only.
- Master Adversarial Review Prompt can be given to Claude/Kael, the GitHub agent, Codex, or another reviewer.
- Idea Backlog captures ideas without promoting them into active execution.
- Continuity Implementation Plan v3.4 preserves strategic background and the historical phase labels used during its adversarial review.
CI
GitHub Actions runs the Python test suite on pushes and pull requests to
main. If Actions are disabled in repository settings, enable them once; no
manual test initiation is otherwise required.
License
Apache-2.0. See LICENSE and NOTICE.
Notes
- Runtime artifacts such as SQLite databases and exported ledgers are ignored under continuity-core/.
- The known Starlette/FastAPI TestClient deprecation warning is third-party dependency churn and does not indicate a Continuity test failure.