Agent memory with receipts: an MCP server over an append-only, hash-chained ledger for task state, memory, and verified handoffs across sessions and models.

Continuity

Continuity is the system of record for agent work: an agent continuity layer that captures task state, memory, decisions, provenance, and handoff context across models, sessions, and tools.

Current product premise: Continuity is not "better memory than a Markdown file" for simple, sequential work. A strong, maintained HANDOFF.md matched Continuity in the Stage A product-falsification benchmark. Continuity's sharper claim is that it is a trust layer for AI work: it turns messy agent history into resumable, inspectable, permissioned operational state, and can compile that state into a compact human-readable handoff when a plain file is the simplest interface.

The event log is the product. Task context, project memory, agent memory, workflow state, gates, timeline rows, and the Console are projections of one append-only, hash-chained ledger.
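The ledger implementation itself is not shown in this README. The following is a minimal sketch of the idea, assuming SHA-256 over canonical JSON and an all-zero genesis hash (both assumptions, not necessarily Continuity's actual scheme). It shows an append-only hash chain with one projection folded from it. `Entry`, `Ledger`, `task_state`, and the `TASK_UPDATED` event type are hypothetical names.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Optional

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first event


@dataclass
class Entry:
    seq: int
    event_type: str
    payload: dict
    prev_hash: str
    event_hash: str


def hash_entry(seq: int, event_type: str, payload: dict, prev_hash: str) -> str:
    # Canonical JSON (sorted keys, compact separators) so identical events hash identically.
    body = json.dumps(
        {"seq": seq, "type": event_type, "payload": payload, "prev": prev_hash},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


@dataclass
class Ledger:
    entries: list = field(default_factory=list)

    def append(self, event_type: str, payload: dict) -> Entry:
        # Append-only: the class offers no update or delete method.
        seq = len(self.entries) + 1
        prev = self.entries[-1].event_hash if self.entries else GENESIS
        entry = Entry(seq, event_type, payload, prev, hash_entry(seq, event_type, payload, prev))
        self.entries.append(entry)
        return entry

    def first_broken_seq(self) -> Optional[int]:
        # Recompute every link and hash; None means the chain is intact.
        prev = GENESIS
        for e in self.entries:
            expected = hash_entry(e.seq, e.event_type, e.payload, e.prev_hash)
            if e.prev_hash != prev or e.event_hash != expected:
                return e.seq
            prev = e.event_hash
        return None


def task_state(ledger: Ledger, task_id: str) -> dict:
    # A projection: current state is derived by folding the log, never stored on its own.
    state: dict = {}
    for e in ledger.entries:
        if e.payload.get("task_id") == task_id:
            state.update(e.payload.get("fields", {}))
    return state
```

Because each hash covers the previous one, editing an earlier payload in place makes `first_broken_seq()` return that event's sequence number, while `task_state()` is always recomputed from the log.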

Every event can carry optional payload-level provenance and usage telemetry: task/session identity, agent/model identity, parent and consumed ledger sequence numbers, handoff source, token counts, usage source, and exact microdollar cost. This metadata stays in the payload layer so the immutable chain remains stable while provenance evolves.
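The following sketch shows how optional provenance might ride inside a payload. It assumes hypothetical field names and string prices in USD per million tokens (illustrative inputs, not real provider rates). Integer microdollars work because $P per million tokens is exactly P microdollars per token. Rounding half-up to a whole microdollar is an assumption.

```python
from dataclasses import asdict, dataclass, field
from decimal import ROUND_HALF_UP, Decimal
from typing import Optional


@dataclass
class Provenance:
    # Hypothetical field names for the optional metadata listed above.
    task_id: str
    session_id: Optional[str] = None
    agent_id: Optional[str] = None
    model_id: Optional[str] = None
    parent_seq: Optional[int] = None
    consumed_seqs: list = field(default_factory=list)
    handoff_source: Optional[str] = None
    input_tokens: int = 0
    output_tokens: int = 0
    usage_source: str = "unknown"
    cost_microdollars: int = 0


def cost_microdollars(input_tokens: int, output_tokens: int,
                      usd_per_mtok_in: str, usd_per_mtok_out: str) -> int:
    # $P per million tokens == P microdollars per token; Decimal keeps
    # fractional prices exact before rounding to a whole microdollar.
    total = input_tokens * Decimal(usd_per_mtok_in) + output_tokens * Decimal(usd_per_mtok_out)
    return int(total.to_integral_value(rounding=ROUND_HALF_UP))


def event_payload(body: dict, prov: Provenance) -> dict:
    # Provenance is nested inside the payload; the chain envelope
    # (seq, prev_hash, event_hash) keeps its shape as provenance fields grow.
    return {**body, "provenance": asdict(prov)}
```

For example, 1,000 input tokens at "3" and 500 output tokens at "15" cost 3,000 + 7,500 = 10,500 microdollars ($0.0105).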

The primary continuation path is structured context compilation: a task-chain projection that returns current task state, ordered chronology, decisions, unresolved conflict signals, and provenance chain. It is available through the Python helper, FastAPI at /projects/{project_id}/tasks/{task_id}/continuation, and MCP as compile_task_context.
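Here is a minimal standard-library client for the continuation route. It assumes the route is served by `make serve` at the default address, answers GET, and returns JSON. The function names are hypothetical, and the sketch does not assume any particular response keys.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # default address used by `make serve`


def continuation_url(project_id: str, task_id: str, base_url: str = BASE_URL) -> str:
    # Percent-encode IDs so slashes or spaces cannot change the route.
    p = urllib.parse.quote(project_id, safe="")
    t = urllib.parse.quote(task_id, safe="")
    return f"{base_url}/projects/{p}/tasks/{t}/continuation"


def fetch_continuation(project_id: str, task_id: str,
                       base_url: str = BASE_URL, timeout: float = 10.0) -> dict:
    # Assumes a JSON body; inspect the keys rather than relying on a fixed schema.
    url = continuation_url(project_id, task_id, base_url)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `fetch_continuation("your-project", "your-task")` returns the parsed body. Encoding the IDs keeps a task ID such as `t/1` from being read as an extra path segment.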

Operational conflicts are first-class ledger events. When two agents produce contradictory task assessments, Continuity records an OPERATIONAL_CONFLICT event that links the exact event sequences in disagreement. The conflict stays visible in compiled context, in FastAPI at /projects/{project_id}/tasks/{task_id}/conflicts, and in MCP until a human records an OPERATIONAL_CONFLICT_RESOLVED event.
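The rule that a conflict stays open until a human resolves it can be sketched as a projection over events. The payload keys `conflicting_seqs`, `resolves_seq`, and `actor_type` are hypothetical, and Continuity's actual event schema may differ.

```python
def unresolved_conflicts(events: list[dict]) -> list[dict]:
    """Return OPERATIONAL_CONFLICT events that no human resolution points back to.

    Hypothetical shapes: a conflict lists the disagreeing sequences in
    payload["conflicting_seqs"]; a resolution names its conflict in
    payload["resolves_seq"] and its author kind in payload["actor_type"].
    """
    resolved = {
        e["payload"]["resolves_seq"]
        for e in events
        if e["type"] == "OPERATIONAL_CONFLICT_RESOLVED"
        and e["payload"].get("actor_type") == "human"  # only human resolutions close a conflict
    }
    return [
        e for e in events
        if e["type"] == "OPERATIONAL_CONFLICT" and e["seq"] not in resolved
    ]
```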

Product Falsification Checkpoint

The Phase 6 entry gate compared three continuation paths: no shared context, a strong manually maintained HANDOFF.md, and Continuity via MCP. The 30-run local matrix produced no-context 0/10, strong HANDOFF.md 10/10, and Continuity 10/10, so the result was a tie and durable runner work remains blocked. See docs/testing/2026-06-24-product-falsification-results.md.

The next product-validation target is not another runner feature. It is a Continuity-backed handoff workflow: use the ledger, memory, provenance, validation, and conflict model to generate or verify a compact HANDOFF.md, then test whether that beats a manual handoff under concurrency, stale-context recovery, permissioned context, or audit requirements.

The follow-up IDEA-002 benchmark tested that narrower claim:

CONTINUITY_BENCHMARK_LOCAL_MODEL=qwen3.5:9b make trust-layer-handoff-benchmark

The completed 27-run local matrix produced no-context 0/9, manual handoff 6/9, and Continuity-backed verified handoff 9/9 under the original scoring. That scoring could not credit the manual arm in provenance-audit scenarios, even when the manual arm honestly reported its sources as unverified. On 2026-07-02 the scoring was revised to credit provenance honesty equally across arms and to track ledger-backed verification as a separate capability. The honest claim: a well-maintained manual handoff preserves task facts, while Continuity uniquely provides ledger-backed source verification, which a manual handoff is structurally unable to provide. See docs/testing/2026-06-24-trust-layer-handoff-results.md.
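The revised scoring rule can be illustrated with a small sketch. `RunResult`, `honest`, and `score` are hypothetical and are not the benchmark's actual scorer; the dated results doc is authoritative. The point is that an honest "unverified" report passes, while ledger-backed verification is counted separately.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    # Hypothetical per-run record for one benchmark arm.
    facts_correct: bool    # the continuation preserved the task facts
    claimed_status: str    # "verified" or "unverified", as reported by the arm
    ledger_verified: bool  # sources were actually checked against a valid ledger


def honest(r: RunResult) -> bool:
    # Reporting "unverified" is always honest; claiming "verified"
    # is honest only when ledger verification really happened.
    return r.claimed_status == "unverified" or (
        r.claimed_status == "verified" and r.ledger_verified
    )


def score(runs: list[RunResult]) -> dict:
    passed = sum(r.facts_correct and honest(r) for r in runs)
    verified = sum(r.ledger_verified for r in runs)  # tracked as a separate capability
    return {"passed": passed, "ledger_verified": verified, "total": len(runs)}
```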

Quick Start

As a user (installs the continuity-mcp MCP server command and continuity-handoff receipt exporter):

pip install "git+https://github.com/machinedigital-ai/Continuity.git"
claude mcp add continuity --env CONTINUITY_DB="$HOME/continuity.db" -- continuity-mcp

As a developer (from a checkout):

make setup
make test
make serve

New here? The tutorial walks through the full loop in ~15 minutes: connect an agent, record work, resume in a fresh session or a different model, export the verified receipt, and read the receipt fields.

Then open the read-only Console:

http://127.0.0.1:8000/console

Common Commands

make setup                          # create/update continuity-core/.venv and install requirements
make test                           # run the test suite
make serve                          # run FastAPI at http://127.0.0.1:8000
make mcp                            # run the MCP server
make demo                           # run the recursive proof demo
make proof                          # export and verify the multi-model proof artifact
make ollama-chain                   # run the local multi-model Ollama chain proof
make codex-persistence-proof        # run two isolated Codex sessions through Continuity
make claude-persistence-proof       # run two isolated Claude sessions through Continuity
make cross-agent-persistence-proof  # run the Codex-to-Claude handoff proof
make trust-layer-handoff-benchmark  # run the IDEA-002 verified handoff benchmark
make list-verified-handoff-tasks    # list exportable task IDs from a ledger
make export-verified-handoff        # export verified HANDOFF.md from a task ledger

Proof Artifact

The completed Phase 4 Trust and Proof work exports a shareable Continuity proof artifact from real ledger events:

make proof

The command writes examples/multi_model_code_review.jsonl, a permissioned internal artifact showing a real multi-agent review sequence: external review, Codex triage, operational conflict, human resolution, gate approval, selected timeline, continuation context, provenance, and ledger integrity.

The artifact intentionally exports sanitized summary payloads. It retains original ledger hashes and verifies exported chain-entry metadata, but it is not yet a standalone public notary proof for omitted ledger events.
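A sketch shows why a sanitized artifact is not a standalone proof. It assumes hypothetical JSONL rows with `seq`, `prev_hash`, and `event_hash` keys. Hash links between adjacent exported events can be checked, but links across omitted events cannot.

```python
import json
import re
from typing import Iterable

HEX64 = re.compile(r"[0-9a-f]{64}")


def check_artifact(lines: Iterable[str]) -> tuple[bool, list[tuple[int, int]]]:
    """Check exported chain metadata; return (ok, gaps that cannot be verified)."""
    rows = [json.loads(line) for line in lines if line.strip()]
    gaps: list[tuple[int, int]] = []
    for prev, cur in zip(rows, rows[1:]):
        if cur["seq"] <= prev["seq"]:
            return False, gaps  # sequence numbers must strictly increase
        if cur["seq"] == prev["seq"] + 1:
            # Adjacent events: the hash link is checkable from the artifact alone.
            if cur["prev_hash"] != prev["event_hash"]:
                return False, gaps
        else:
            # Omitted events in between: this link needs the full ledger.
            gaps.append((prev["seq"], cur["seq"]))
    ok = all(HEX64.fullmatch(r["event_hash"]) for r in rows)
    return ok, gaps
```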

Local Model Proofs

The local Ollama chain proof tests continuity across installed local models:

make ollama-chain

The dated finding and corrected targeted rerun are recorded in docs/testing/2026-06-19-local-ollama-chain-proof.md. Across three corrected runs, Continuity achieved 18/18 context-fidelity assertions, 9/9 provenance assertions, 9/9 exact model outputs, and 3/3 valid ledgers using independent Qwen, Gemma, and Ministral model families.

Persistent Agent Proofs

The same-agent proof harness starts two fresh client processes connected only through a dedicated Continuity SQLite ledger. It creates a random challenge after Session A exits, gives Session B only stable project/task/agent IDs, then verifies the output, linked validation, completion turn, ledger integrity, and sanitized proof artifact from recorded events.
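The challenge step of the harness can be sketched with two stand-ins: a callable for the fresh Session B process and a dict for shared ledger memory. Both are hypothetical simplifications. The real harness also verifies the linked validation, completion turn, ledger integrity, and proof artifact.

```python
import secrets
from typing import Callable

# Hypothetical stand-in for a fresh client process: it receives a prompt and
# read access to shared ledger memory, and returns its final output text.
Session = Callable[[str, dict], str]


def session_b_prompt(project_id: str, task_id: str, agent_id: str) -> str:
    # Session B is given only stable IDs; the challenge must come from the ledger.
    return f"Continue task {task_id} in project {project_id} as agent {agent_id}."


def run_challenge(shared_memory: dict, session_b: Session) -> bool:
    # Created after Session A has exited, so Session A can never have seen it.
    challenge = secrets.token_hex(8)
    shared_memory["challenge"] = challenge
    prompt = session_b_prompt("demo-project", "demo-task", "agent-b")
    if challenge in prompt:
        raise RuntimeError("challenge leaked into the prompt")
    return challenge in session_b(prompt, shared_memory)
```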

make codex-persistence-proof
make claude-persistence-proof
make cross-agent-persistence-proof

Codex runs ephemerally with native memories disabled. Claude runs without session persistence and with only the explicit Continuity MCP configuration. See the persistent-agent proof runbook for exact boundaries, authentication checks, and troubleshooting. The cross-agent mode stores its post-Codex challenge in shared project memory and requires Claude's output and completion turn to carry Codex handoff provenance.

Product Falsification Gate

Before Phase 6 runner work, Continuity is compared against no context and a strong structured HANDOFF.md. Stage A runs 30 fresh local targets:

CONTINUITY_BENCHMARK_LOCAL_MODEL=qwen3.5:9b make product-falsification-stage-a

All Stage A arms use the same direct Ollama transport. The harness reads the handoff or retrieves Continuity context through the real MCP stdio server before the fresh call, isolating context quality from client tool-use behavior.

Results are written to continuity-core/examples/product_falsification_results.jsonl and docs/testing/2026-06-24-product-falsification-results.md. Stage B runs with fresh Codex and Claude clients only when Stage A passes the pre-registered rules. A tie or loss keeps Phase 6 blocked and triggers simplification or repositioning; it is not treated as a Continuity win.
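The gate logic reduces to a strict-win rule. This sketch is a hypothetical simplification, and the actual pre-registered rules may include margins or per-scenario conditions not modeled here.

```python
def stage_a_decision(no_context: int, manual_handoff: int, continuity: int) -> str:
    """Map Stage A pass counts to the next step (hypothetical simplification)."""
    if continuity > manual_handoff and continuity > no_context:
        return "run_stage_b"
    # A tie or a loss is not treated as a Continuity win.
    return "phase6_blocked"
```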

The completed Stage A result was a tie: no context 0/10, strong HANDOFF.md 10/10, and Continuity 10/10. Stage B was therefore skipped and Phase 6 runner work remains blocked. See the dated benchmark report and sanitized result rows.

The completed IDEA-002 trust-layer handoff benchmark validated the verified handoff wedge, not the durable runner: continuation quality was comparable to a maintained manual handoff, and only Continuity satisfied ledger-backed source verification (see the scoring revision note in the dated results doc).

The first productized verified handoff surface is now available:

  • Python: continuity.handoff.build_verified_handoff(store, project_id=..., task_id=...)
  • FastAPI: GET /projects/{project_id}/tasks/{task_id}/verified-handoff
  • MCP: export_verified_handoff(project_id, task_id)
  • Installed CLI:
continuity-handoff --db continuity-core/continuity.db --list-tasks

continuity-handoff \
  --db continuity-core/continuity.db \
  --project-id your-project \
  --task-id your-task \
  --out HANDOFF.md

The exporter renders human-readable Markdown with current task state, decisions, rejected/superseded signals, unresolved conflicts, and traceable ledger_seq / event_hash source rows. provenance_status is computed from a full ledger integrity check at export time: a valid chain renders verified with the tip hash and Merkle root; a tampered ledger renders integrity_failed with the broken sequence. It preserves the source of truth in the ledger and does not unblock durable runner infrastructure.
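This README does not document the exporter's exact hashing scheme. The sketch below assumes each event hash is SHA-256 of the previous hash concatenated with a serialized body. It also assumes a binary Merkle tree that pairs an odd last node with itself, a common convention not confirmed for Continuity. Under those assumptions, the two `provenance_status` outcomes could be computed like this.

```python
import hashlib

GENESIS = "0" * 64  # assumed previous hash for the first event


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def merkle_root(leaf_hashes: list[str]) -> str:
    # Pairwise SHA-256 over concatenated hex digests, level by level.
    if not leaf_hashes:
        return sha256_hex(b"")
    level = list(leaf_hashes)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # odd node paired with itself (assumed convention)
        level = [sha256_hex((level[i] + level[i + 1]).encode()) for i in range(0, len(level), 2)]
    return level[0]


def provenance_status(entries: list[dict]) -> dict:
    """entries: hypothetical rows with seq, body, prev_hash, event_hash, in seq order."""
    prev = GENESIS
    for e in entries:
        expected = sha256_hex((prev + e["body"]).encode())
        if e["prev_hash"] != prev or e["event_hash"] != expected:
            return {"provenance_status": "integrity_failed", "broken_seq": e["seq"]}
        prev = e["event_hash"]
    return {
        "provenance_status": "verified",
        "tip_hash": prev,
        "merkle_root": merkle_root([e["event_hash"] for e in entries]),
    }
```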

Capability Backtest

The current evidence supports this scoped claim:

Continuity gives AI teams verified handoff, task verification, and persistent memory across agents, sessions, tools, and models.

The capability backtest in docs/testing/2026-07-01-continuity-capability-backtest.md distinguishes what Continuity already captures from what the verified handoff markdown currently renders:

Capability                    Captured Today   Rendered In Verified Handoff Today
task state                    yes              yes
agent identity                yes              partial
model identity                yes              no
session identity              yes              no
handoff source                yes              no
parent/consumed source chain  yes              no
model call/output source      yes              partial
validation source             yes              partial
memory source                 yes              partial
tool/action source            partial          partial

The current gap is projection/rendering, not capture. Continuity should not add new capture logic until a test proves existing ledger data cannot answer the product question. The system does not claim endpoint observability, shadow-agent detection, or automatic capture of tools/actions outside Continuity.

Additional explicit non-claims:

  • The ledger proves event order and immutability, not authorship. Actor and agent identity are process/MCP/repo-local attribution, not cryptographic or authenticated identity.
  • consumed_seqs currently records the prior context available to an event, not a selective causal proof of what shaped it. Selective grounding exists only for memory (grounded_in_seqs) and linked validations.
  • Host support means MCP-compatible hosts. Claude Code and Codex paths are proven by isolated tests; other hosts are untested until a dated proof says otherwise.

Model Adapters

The core test suite does not call external model providers. Provider SDKs are optional and imported lazily by their adapters.

  • OpenAI/OpenAI-compatible endpoints use OPENAI_API_KEY and optional OPENAI_BASE_URL.
  • Anthropic uses ANTHROPIC_API_KEY.
  • Ollama uses local HTTP by default at http://localhost:11434, applies a request timeout, and raises AdapterError with contextual error messages instead of hanging or leaking low-level urllib errors.
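Here is a sketch of the Ollama adapter behavior described above. It calls Ollama's documented `/api/generate` endpoint with `"stream": false`. The `AdapterError` class here is a local stand-in, and the messages are illustrative rather than Continuity's own.

```python
import json
import socket
import urllib.error
import urllib.request


class AdapterError(RuntimeError):
    """Local stand-in for Continuity's AdapterError; the real class may differ."""


def ollama_generate(model: str, prompt: str, base_url: str = "http://localhost:11434",
                    timeout: float = 60.0) -> str:
    # Non-streaming request, so the full completion arrives as one JSON object.
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.load(resp)["response"]
    except urllib.error.HTTPError as exc:
        # HTTPError subclasses URLError, so it must be caught first.
        raise AdapterError(f"Ollama returned HTTP {exc.code} for model {model!r}") from exc
    except urllib.error.URLError as exc:
        raise AdapterError(f"Cannot reach Ollama at {base_url}: {exc.reason}") from exc
    except (socket.timeout, TimeoutError) as exc:
        raise AdapterError(f"Ollama call for {model!r} timed out after {timeout}s") from exc
    except (KeyError, json.JSONDecodeError) as exc:
        raise AdapterError(f"Unexpected Ollama response for model {model!r}") from exc
```

Wrapping each failure mode in one exception type lets callers handle "model unavailable" uniformly, while `from exc` keeps the original urllib error available for debugging.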

Repository Layout

continuity-core/
  continuity/       # ledger, projections, API, MCP, Console, adapters
  scripts/          # demos and proof scripts
  tests/            # pytest suite
  README.md         # core package details

docs/
  README.md         # documentation authority map
  strategy/         # current strategic background; roadmap remains authoritative
  strategy/archive/ # older strategy/research source material
  superpowers/      # historical design specs and implementation plans
  testing/          # evidence notes from local and integration tests

examples/           # shareable proof artifacts generated from real ledger data

AGENTS.md
architecture_decisions.md
ROADMAP_AND_HANDOFF.md

CI

GitHub Actions runs the Python test suite on pushes and pull requests to main. If Actions are disabled in the repository settings, enable them once; after that, tests run automatically.

License

Apache-2.0. See LICENSE and NOTICE.

Notes

  • Runtime artifacts such as SQLite databases and exported ledgers are ignored under continuity-core/.
  • The known Starlette/FastAPI TestClient deprecation warning is third-party dependency churn and does not indicate a Continuity test failure.

Download files

Source Distribution

continuity_mcp-0.1.0.tar.gz (48.9 kB)

Uploaded Source

Built Distribution

continuity_mcp-0.1.0-py3-none-any.whl (53.5 kB)

Uploaded Python 3

File details

Details for the file continuity_mcp-0.1.0.tar.gz.

File metadata

  • Download URL: continuity_mcp-0.1.0.tar.gz
  • Size: 48.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for continuity_mcp-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       adccbc7a793802df41f01361b57787f1de26e8302d2b76f81f5a9b7baa2d9de4
MD5          734fdd516e88676638135e778a9940fc
BLAKE2b-256  e2a4ae5904f4e9635895242f7ab142db170ed0c829972cf6a35bf105b53b03c6

File details

Details for the file continuity_mcp-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: continuity_mcp-0.1.0-py3-none-any.whl
  • Size: 53.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for continuity_mcp-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       e80593d7819db4435caab7454f907c94fb947dd7a18200119f0ea927a93c687c
MD5          721b63c8653e301e3b5b24094af96502
BLAKE2b-256  d8470e65d631a8f44df18d762f531d7dfda09e5799a1de1411b545c0f474e477
