Chimera Memory

Local-first reliability ledger for AI coding-agent work.

Chimera Memory records what an agent tried, which command checked it, what happened, and what receipt proves it. It runs entirely on your machine.


What it records

Each wrapped verification command produces a claim with:

  • session_id — which work session it belongs to
  • agent_id / model_version / harness_id — who ran it and in what tool
  • task_type — what kind of work (test, lint, type, docs, …)
  • VALIDATED or CONTRADICTED — did reality agree with the prediction?
  • stdout_excerpt / stderr_excerpt — bounded output witness when available
  • git state at time of claim

Sessions, claims, and outcomes are stored in .chimera-memory/ as append-only JSONL files. Each new claim gets an integrity chain entry in .chimera-memory/integrity.jsonl.
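As a rough illustration of the append-only layout described above, the sketch below writes one claim per line to a JSONL file and reads the lines back. The Claim dataclass, the outcome and git_state field names, and the helper functions are assumptions made for illustration; the real on-disk schema may differ.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class Claim:
    # Field names follow the list above; `outcome` and `git_state` are assumed names.
    session_id: str
    agent_id: str
    model_version: str
    harness_id: str
    task_type: str
    outcome: str  # "VALIDATED" or "CONTRADICTED"
    stdout_excerpt: str = ""
    stderr_excerpt: str = ""
    git_state: str = ""


def append_claim(path: Path, claim: Claim) -> None:
    """Append one claim as a single JSON line; existing lines are never rewritten."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(claim), sort_keys=True) + "\n")


def read_claims(path: Path) -> list[dict]:
    """Read every claim back, one JSON object per non-empty line."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Opening the file in append mode is what keeps earlier records untouched: a correction is a new line, not an edit to an old one.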


Getting started

See docs/strategy/chimera-memory-first-run-quickstart.md for a step-by-step guide.

Key limitations (v0.6):

  • M2B drift scoring: not built
  • Model ranking or routing: not built
  • Hosted/cloud sync: not built
  • Evidence write import: dry-run only
  • Windows: not tested

Run from the repo root:

uv run chimera-memory --help

Standalone local install (v0.6+)

As of v0.6, local wheel builds work outside the monorepo. Build and install:

# From the repo root, build local wheels for the two public packages
uv build packages/chimera-memory-types --out-dir /tmp/cm-dist
uv build packages/chimera-memory       --out-dir /tmp/cm-dist

# In any Python 3.12+ environment
pip install /tmp/cm-dist/*.whl
chimera-memory --help

Runtime dependencies installed automatically: pydantic, filelock.

Note: Public PyPI publishing has not happened yet. This is local packaging readiness only. Hosted/cloud sync, team SaaS, and remote substrate writes are not implemented. Reliability is not model routing. M2B is not implemented.


Quickstart

# 1. Start a session
uv run chimera-memory session start \
  --branch feat/my-branch \
  --task-label "fix-type-errors" \
  --agent kiro \
  --model claude-sonnet-4.6 \
  --harness-id kiro-cli

# 2. Wrap real verification commands
uv run chimera-memory wrap --task-type type -- \
  uv run mypy packages/chimera-memory/src --ignore-missing-imports
uv run chimera-memory wrap --task-type test -- \
  uv run pytest packages/chimera-memory/tests -q
uv run chimera-memory wrap --task-type lint -- \
  uv run ruff check packages/chimera-memory

uv run chimera-memory session end --status PASSED

# 3. Review
uv run chimera-memory failures          # see what failed, with witness output
uv run chimera-memory status            # dogfood gate progress
uv run chimera-memory receipt latest --markdown   # share-ready proof artifact

The -- separator tells the argument parser where the wrapped command begins. It is required before non-pytest commands; the examples above also use it before pytest.
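To see why the separator matters, here is a minimal sketch of splitting an argument list at the first "--". The split_wrapped_command function is hypothetical and is not the tool's actual parser; it only shows how flags meant for the wrapped command (such as --ignore-missing-imports) are kept away from the wrapper's own options.

```python
def split_wrapped_command(argv: list[str]) -> tuple[list[str], list[str]]:
    """Split argv at the first "--" into (wrapper options, wrapped command)."""
    if "--" not in argv:
        return argv, []
    cut = argv.index("--")
    return argv[:cut], argv[cut + 1:]


options, command = split_wrapped_command(
    ["--task-type", "type", "--", "uv", "run", "mypy", "--ignore-missing-imports"]
)
print(options)  # ['--task-type', 'type']
print(command)  # ['uv', 'run', 'mypy', '--ignore-missing-imports']
```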


Common commands

chimera-memory session start   --branch ... --task-label ... --agent ... --model ... --harness-id ...
chimera-memory wrap            --task-type <type> [-- <command>]
chimera-memory session end     --status PASSED|FAILED|MIXED|INTERRUPTED
chimera-memory failures        [--json]
chimera-memory status          [--json]
chimera-memory verify          [--json]
chimera-memory receipt latest  [--json] [--markdown]
chimera-memory receipt show    <session_id> [--json] [--markdown]
chimera-memory export          --clean-only --output <path>
chimera-memory session list
chimera-memory report          # raw reliability groups (use status for gate progress)

Demo: real failure-fix loop

During development, mypy found a real type error:

chimera_memory/cli.py:164: error: Value of type "object" is not indexable  [index]

Chimera Memory recorded the mypy run as CONTRADICTED, with the error stored as stdout_excerpt. The code was fixed. The same mypy command ran again and settled as VALIDATED. A single session receipt showed both runs.

uv run mypy ...  [type] → CONTRADICTED
uv run mypy ...  [type] → VALIDATED

This is the core loop: reality contradicted the agent, the fix was applied, and the correction was verified — all in one session, with attribution.
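The loop can be sketched in a few lines of Python. This is not the tool's implementation: run_and_settle, EXCERPT_LIMIT, and the rule that exit code 0 settles as VALIDATED are assumptions for illustration, and the two commands are stand-ins for the failing and passing mypy runs.

```python
import subprocess
import sys

EXCERPT_LIMIT = 2000  # assumed bound on stored output, in characters


def run_and_settle(command: list[str], task_type: str) -> dict:
    """Run a verification command and settle it from its exit code."""
    proc = subprocess.run(command, capture_output=True, text=True)
    return {
        "task_type": task_type,
        "command": " ".join(command),
        # Assumption: exit code 0 means the check passed.
        "outcome": "VALIDATED" if proc.returncode == 0 else "CONTRADICTED",
        "stdout_excerpt": proc.stdout[:EXCERPT_LIMIT],
        "stderr_excerpt": proc.stderr[:EXCERPT_LIMIT],
    }


# Stand-ins for the mypy run before and after the fix.
failing = [sys.executable, "-c", "import sys; print('error: not indexable'); sys.exit(1)"]
passing = [sys.executable, "-c", "print('Success')"]

receipt = [run_and_settle(failing, "type"), run_and_settle(passing, "type")]
print([claim["outcome"] for claim in receipt])  # ['CONTRADICTED', 'VALIDATED']
```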


Reliability command

chimera-memory reliability
chimera-memory reliability --json

Read-only. Reports raw validation rates from settled clean claims, grouped by agent/model/task. Does not rank models, route work, or make autonomy decisions. Failure-quality classification (organic vs synthetic) is not yet stored in claim metadata — rates include all CONTRADICTED outcomes.
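A raw validation rate of this kind is just VALIDATED claims divided by all settled claims in a group. The sketch below assumes claims are dictionaries with agent_id, model_version, task_type, and outcome keys (the outcome key name is an assumption), and the two sample claims are made up to mirror the demo loop rather than taken from a real store.

```python
from collections import defaultdict

GroupKey = tuple[str, str, str]


def validation_rates(claims: list[dict]) -> dict[GroupKey, float]:
    """Return VALIDATED / total for each (agent, model, task) group."""
    totals: dict[GroupKey, int] = defaultdict(int)
    validated: dict[GroupKey, int] = defaultdict(int)
    for claim in claims:
        key = (claim["agent_id"], claim["model_version"], claim["task_type"])
        totals[key] += 1
        if claim["outcome"] == "VALIDATED":
            validated[key] += 1
    return {key: validated[key] / totals[key] for key in totals}


# Illustrative data only: one failed and one passing type check.
sample = [
    {"agent_id": "kiro", "model_version": "claude-sonnet-4.6",
     "task_type": "type", "outcome": "CONTRADICTED"},
    {"agent_id": "kiro", "model_version": "claude-sonnet-4.6",
     "task_type": "type", "outcome": "VALIDATED"},
]
print(validation_rates(sample))  # {('kiro', 'claude-sonnet-4.6', 'type'): 0.5}
```

The result is a descriptive rate only; nothing here ranks or compares groups.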


Bridge pipeline (dry-run only)

Export clean evidence events and preview engine ingestion without any writes:

# Export settled, attributed claims as JSONL
uv run chimera-memory export --clean-only --output /tmp/cm-clean-events.jsonl

# Validate the export schema
uv run python -m tools.chimera_memory_ingest_dry_run /tmp/cm-clean-events.jsonl

# Map to engine evidence candidates (no database/substrate writes)
uv run python -m tools.chimera_memory_engine_adapter_dry_run /tmp/cm-clean-events.jsonl

Both bridge tools are dry-run only. writes_performed is always false.


Two-model evidence (local v0.2)

As of v0.2, the store contains evidence from two real AI coding agents:

Agent     Model              Claims  Real failures
kiro      claude-sonnet-4.6  123     2 organic
codebuff  mimo-v2.5-pro      12      2 real

manual and planning-agent entries also exist for workflow/planning tasks.

M2 comparative reliability scoring is not yet built. The data is structurally ready for it once codebuff accumulates ≥25 claims with ≥5 organic failures.


Integrity

New claims are hash-chained into .chimera-memory/integrity.jsonl. Run:

uv run chimera-memory verify
uv run chimera-memory verify --json

Historical records created before the integrity layer was added are reported as LEGACY_UNSIGNED: they are not broken, just unchained. verify reports BROKEN only if a chained record's hash does not match or a claim was appended without a corresponding integrity entry.
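The sketch below shows one way a hash chain with these three outcomes could work. It is not the tool's actual format: the entry layout, the GENESIS value, the claim_id key, the OK status name, and the rule that only claims preceding the first chained claim count as legacy are all assumptions made for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the chain


def entry_hash(prev_hash: str, claim: dict) -> str:
    """Hash the previous entry's hash together with the canonical claim JSON."""
    payload = prev_hash + json.dumps(claim, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def chain_claim(chain: list[dict], claim: dict) -> None:
    """Append an integrity entry linking this claim to the previous entry."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"claim_id": claim["claim_id"], "hash": entry_hash(prev, claim)})


def verify(claims: list[dict], chain: list[dict]) -> dict[str, str]:
    """Classify each claim as OK, LEGACY_UNSIGNED, or BROKEN."""
    entries = {entry["claim_id"]: entry for entry in chain}
    status: dict[str, str] = {}
    prev = GENESIS
    seen_chained = False
    for claim in claims:
        claim_id = claim["claim_id"]
        entry = entries.get(claim_id)
        if entry is None:
            # Unchained before the chain starts: legacy. Unchained after: missing entry.
            status[claim_id] = "BROKEN" if seen_chained else "LEGACY_UNSIGNED"
            continue
        seen_chained = True
        matches = entry["hash"] == entry_hash(prev, claim)
        status[claim_id] = "OK" if matches else "BROKEN"
        prev = entry["hash"]  # continue from the stored hash so one break stays local
    return status
```

Because each hash covers the previous entry's hash, editing a stored claim changes the recomputed value and the mismatch shows up at that record.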

Single-process assumption: the store is designed for single-process local use. Concurrent writes from separate processes can race, and the store is not hardened against that. Ordinary local CLI use runs one process at a time.


Known invocation gotcha

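When wrapping mypy (task type type), pass only the adapter tool (tools/chimera_memory_engine_adapter_dry_run.py), not the ingest tool (tools/chimera_memory_ingest_dry_run.py). The adapter already pulls in the ingest tool transitively, so listing both triggers mypy's "source file found twice" error:

# Correct: the adapter transitively pulls in the ingest tool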
uv run chimera-memory wrap --task-type type -- \
  uv run mypy packages/chimera-memory/src \
    tools/chimera_memory_engine_adapter_dry_run.py \
    --ignore-missing-imports

# Incorrect — causes "source file found twice" mypy error
uv run chimera-memory wrap --task-type type -- \
  uv run mypy packages/chimera-memory/src \
    tools/chimera_memory_ingest_dry_run.py \
    tools/chimera_memory_engine_adapter_dry_run.py \
    --ignore-missing-imports

Release status

  • Local wheel proof exists. The 2-package install (chimera-memory + chimera-memory-types) works in a fresh venv outside the monorepo.
  • Public PyPI: not published. TestPyPI and private registry publishing have not been done.
  • License decision pending. No open-source license has been formally assigned yet.
  • Witness output is redacted by default. chimera-memory wrap redacts secrets (API keys, tokens, passwords, private keys) from captured stdout/stderr before storing. Review receipts before sharing.
  • Command argument redaction. The command string in receipts is also redacted for common secret patterns. However, do not pass literal secret values as command arguments — use environment variables instead (e.g. MY_TOKEN=secret uv run mypy ...).
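For a sense of what pattern-based redaction looks like, and why receipts still deserve a review before sharing, here is a minimal sketch. The two regular expressions and the redact function are illustrative assumptions; they are not the patterns chimera-memory wrap actually uses.

```python
import re

# Illustrative patterns only; the tool's real pattern set is not documented here.
KEY_VALUE = re.compile(r"(?i)\b(api[_-]?key|token|password|secret)\b(\s*[=:]\s*)(\S+)")
PRIVATE_KEY = re.compile(
    r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
    re.DOTALL,
)


def redact(text: str) -> str:
    """Replace secret-looking values while keeping the surrounding text readable."""
    text = KEY_VALUE.sub(r"\1\2[REDACTED]", text)
    return PRIVATE_KEY.sub("[REDACTED PRIVATE KEY]", text)


print(redact("export API_KEY=abc123 ok"))  # export API_KEY=[REDACTED] ok
print(redact("password: hunter2"))         # password: [REDACTED]
```

Pattern matching of this kind only catches secrets that look like the patterns, which is why keeping secrets out of command arguments in the first place is the safer habit.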

Platform support

  • macOS and Linux are the target platforms. Tested on macOS arm64.
  • Windows is not supported. Write-path locking uses filelock (cross-platform), but Windows has not been tested end-to-end.
  • Linux Docker smoke test is pending (Docker unavailable in current environment).

What is not built

The following are explicitly out of scope for v0.6:

  • M2B statistical drift/trend analysis (only an advisory heuristic exists)
  • Routing, autonomy decisions, or model ranking
  • Cloud/hosted sync or team sharing
  • Dashboard, GitHub Action, or CI PR comments
  • ORIAS, trading/finance verticals
  • GraphSource/substrate writes

Current limitations

  • M2 drift/model comparison is not fully built. An advisory drift heuristic exists (chimera-memory drift), but statistical M2B drift/trend analysis is not built. status shows segment counts, not trends.
  • Useful reliability patterns require real failure variance. A store of only VALIDATED claims proves capture works, not that any agent is reliable.
  • The workflow is CLI/manual, not always-on. You must start sessions and wrap commands explicitly.
  • Local-first. Nothing leaves your machine. Data lives in .chimera-memory/ inside the repo.
  • Synthetic failures should not be treated as product evidence. Only naturally occurring failures count.
  • Single-process writes only. The integrity chain is not safe for concurrent multi-process writes.

More detail

See the full demo quickstart with the failure-fix loop walkthrough: ../../docs/strategy/chimera-memory-demo-quickstart-2026-06-04.md
