Claim Memory Graph: a lightweight memory layer for inspectable LLM judge and reviewer decisions.
cmg - Claim Memory Graph
cmg is a lightweight memory layer that makes long-running LLM-as-a-judge and reviewer-agent workflows inspectable, so that unsupported or sycophantic decision shifts are easier to detect and audit.
It is built for applications that ask models to judge, review, rank, triage, or decide over time: LLM-as-a-judge systems, code reviewers, evaluator pipelines, support triage, arbitration flows, and multi-agent review loops.
Instead of treating every answer as an isolated blob of text, cmg records a
small append-only graph of:
| Node | Meaning |
|---|---|
| `Support` | Evidence supplied by the application: rubric items, logs, diffs, test output, user facts, retrieved documents. |
| `Commitment` | A concrete claim the model makes and the support it cites. |
| `Decision` | A verdict that cites active commitments. |
| `Invalidation` | An explicit retraction that says what changed and which prior claim should be retired. |
The model's prose still passes through unchanged. cmg removes optional hidden
annotation blocks from the user-visible response, persists the graph, and returns
Violation records when a new operation no longer lines up with the active claim
history.
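To make the four node kinds concrete, here is a minimal sketch of an append-only graph in plain Python. It is not cmg's internal data model: `Node`, `TinyGraph`, and `active_commitments` are hypothetical names, and only the `s-` and `k-` ID prefixes appear in this README (the `d-` and `x-` prefixes are assumptions).

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass(frozen=True)
class Node:
    node_id: str
    kind: str  # "support" | "commitment" | "decision" | "invalidation"
    content: str
    refs: tuple[str, ...] = ()


@dataclass
class TinyGraph:
    """Hypothetical stand-in for an append-only claim graph."""

    nodes: list[Node] = field(default_factory=list)
    _ids: count = field(default_factory=lambda: count(1), repr=False)

    # "s-" and "k-" follow the README; "d-" and "x-" are assumed prefixes.
    _PREFIX = {"support": "s", "commitment": "k", "decision": "d", "invalidation": "x"}

    def append(self, kind: str, content: str, refs: tuple[str, ...] = ()) -> Node:
        node = Node(f"{self._PREFIX[kind]}-{next(self._ids)}", kind, content, refs)
        self.nodes.append(node)  # nodes are only ever appended, never edited
        return node

    def active_commitments(self) -> list[Node]:
        # A commitment stays active until an invalidation node references it.
        retired = {r for n in self.nodes if n.kind == "invalidation" for r in n.refs}
        return [n for n in self.nodes if n.kind == "commitment" and n.node_id not in retired]


g = TinyGraph()
s = g.append("support", "Unit test test_total fails after the patch")
k = g.append("commitment", "the patch breaks the total calculation", (s.node_id,))
g.append("decision", "request_changes", (k.node_id,))
g.append("invalidation", "test was flaky; rerun passes", (k.node_id,))
print([n.node_id for n in g.active_commitments()])
```

The point of the sketch is that a retraction is itself a new node: the earlier commitment is never deleted, so the history of what was claimed and when it was retired stays inspectable.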
Why this exists
LLM judges and reviewers often give a final verdict without leaving a durable trace of what the verdict depended on. That becomes a problem when the next turn, another reviewer, or user pushback changes the answer.
cmg is meant to answer practical questions:
- Which evidence did the judge cite before choosing a verdict?
- Which claims were still active when the reviewer approved or rejected?
- Did the model reverse its verdict without saying what changed?
- Did a later verdict silently drop a still-active reason?
- Can we log these transitions and inspect them after an eval run or review?
The goal is not to force a model to be correct. The goal is to make its explicit claims and verdict transitions observable.
What it does not claim
cmg is not a benchmark, a scorer, a policy engine, or a replacement for your
evaluator. It does not prove that a verdict is true, reveal hidden chain of
thought, or block model output by itself.
It gives your application a structured memory layer and deterministic telemetry. Your app decides whether a violation should be logged, shown to a human, used to ask for a corrected retraction, or ignored.
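As an illustration of keeping that policy in the application, the sketch below routes violation codes to actions. The `Violation` dataclass is a stand-in (this README only confirms that violation records expose `.code`), and the `POLICY` table and action names are hypothetical choices an app might make, not part of cmg.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Violation:
    """Stand-in record; only the `.code` attribute is taken from the README."""

    code: str


# Hypothetical application policy: cmg itself does not prescribe any of this.
POLICY = {
    "verdict_flip_without_invalidation": "ask_for_retraction",
    "silent_commitment_drop": "show_to_human",
    "unknown_ref": "log",
}


def route(violations: list[Violation]) -> dict[str, list[str]]:
    """Group violation codes by the action the application chose for them."""
    actions: dict[str, list[str]] = {}
    for v in violations:
        action = POLICY.get(v.code, "ignore")  # codes without a policy entry are ignored
        actions.setdefault(action, []).append(v.code)
    return actions


routed = route(
    [
        Violation("unknown_ref"),
        Violation("empty_refs"),
        Violation("verdict_flip_without_invalidation"),
    ]
)
print(routed)
```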
How it fits
    flowchart LR
        Evidence["Evidence / rubric / task data"] --> App["Your app"]
        App --> LLM["LLM judge or reviewer"]
        LLM -->|"prose + optional cmg ops"| CMG["cmg layer"]
        CMG -->|"visible_text"| App
        CMG --> Graph[("Claim Memory Graph")]
        Graph -->|"active commitments<br/>last decision"| LLM
        CMG -->|"Violation records"| Obs["logs / evals / review UI"]
The layer is deliberately small:
- zero runtime dependencies in the core package;
- JSONL storage by default, with a pluggable storage protocol;
- optional OpenAI and Anthropic helpers;
- async, streaming, and sync wrapper APIs;
- deterministic checks that observe, not block.
Install
    pip install claim-memory-graph
    # Optional provider helpers:
    pip install 'claim-memory-graph[openai]'
    pip install 'claim-memory-graph[anthropic]'
The PyPI distribution is claim-memory-graph; the Python import package is
cmg. The core package supports Python 3.10+ and uses only the standard library.
Quickstart
    import asyncio
    from pathlib import Path

    from cmg import ClaimGraph, JsonlStorage, arun_turn, build_annotation_system_prompt


    async def main() -> None:
        async with ClaimGraph(JsonlStorage(Path("review.cmg.jsonl"))) as graph:
            # Pre-seed one piece of evidence the reviewer is allowed to cite.
            support = (await graph.add_support("Unit test test_total fails after the patch")).node

            # Stand-in for a real model call: returns prose plus a fenced cmg block.
            async def reviewer(messages: list[dict[str, str]]) -> str:
                return (
                    "Request changes: the patch appears to break an existing total calculation.\n"
                    "```cmg\n"
                    '{"ops": [{"op": "commitment", '
                    '"content": "the patch breaks the total calculation", '
                    # Literal braces are doubled inside the f-string.
                    f'"refs": ["{support.node_id}"]}}]}}\n'
                    "```"
                )

            result = await arun_turn(
                graph,
                reviewer,
                [
                    {"role": "system", "content": build_annotation_system_prompt()},
                    {
                        "role": "user",
                        "content": f"Review the patch. Evidence {support.node_id}: {support.content}",
                    },
                ],
            )

            # App-owned decision: cite the commitments that were just applied.
            commitment_ids = [
                applied.node.node_id
                for applied in result.applied
                if applied.node.kind == "commitment"
            ]
            if commitment_ids:
                await graph.add_decision("request_changes", commitment_ids)

            print(result.visible_text)
            print([v.code for v in graph.violations()])


    asyncio.run(main())
For a full integration guide, see docs/user-guide.md.
Judge and reviewer workflow
A typical integration uses one graph per conversation, review, eval item, or arbitration case.
- Pre-seed `Support` nodes for facts the model may cite: rubric text, retrieved evidence, code diffs, test output, prior answers, tool results, or human notes.
- Prompt the model with `build_annotation_system_prompt()` and the relevant support IDs.
- Ask the model for short, explicit commitments tied to those support IDs.
- Add or request a `Decision` that cites active commitment IDs.
- Send `Violation` records to logs, metrics, traces, or a review UI.
Important wire-format detail: refs must point to IDs that already exist in the
graph. If a model creates a new commitment in one annotation block, it cannot cite
that freshly minted k-... ID later in the same block because the SDK creates the
ID during ingest. For verdict workflows, use one of these patterns:
- two-pass judge: first collect commitments, then ask for a decision after state injection exposes the new commitment IDs;
- app-owned decision: parse the model's verdict from your normal response format and call `graph.add_decision(...)` with the commitment IDs that were just applied;
- app-owned commitments: create commitments from known structured evidence, then ask the model to choose a decision over those IDs.
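The app-owned decision pattern can be sketched as follows. The `VERDICT: <label>` line and the `ALLOWED` vocabulary are hypothetical; they stand in for whatever response format your application already uses. Only the final, commented-out `graph.add_decision(...)` call comes from the Quickstart.

```python
import re
from typing import Optional

ALLOWED = {"approve", "request_changes"}  # hypothetical verdict vocabulary


def extract_verdict(visible_text: str) -> Optional[str]:
    """Return the label from a 'VERDICT: <label>' line, or None if absent or unknown."""
    match = re.search(r"^VERDICT:[ \t]*(\w+)[ \t]*$", visible_text, flags=re.MULTILINE)
    if match is None:
        return None
    verdict = match.group(1).lower()
    return verdict if verdict in ALLOWED else None


text = "The patch breaks an existing total calculation.\nVERDICT: request_changes"
verdict = extract_verdict(text)
print(verdict)

# With a verdict in hand, the app records the decision itself, citing the
# commitment IDs collected from the turn that was just applied:
#     await graph.add_decision(verdict, commitment_ids)
```

Because the application supplies the commitment IDs, the model never has to cite an ID that did not exist when it wrote its annotation block.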
Violation signals
The headline signals are:
| Code | Meaning |
|---|---|
| `verdict_flip_without_invalidation` | A new decision changed verdict while prior commitments remained active. |
| `silent_commitment_drop` | A later decision kept the same verdict but stopped citing an active prior commitment. |
| `unknown_ref` | An operation cited an ID that is not in the graph. |
| `wrong_ref_kind` | A commitment cited a non-support, or a decision cited a non-commitment. |
| `ref_not_active` | A decision cited a commitment that had already been invalidated. |
| `empty_refs` | A commitment, decision, or invalidation omitted required refs. |
Every operation is still appended. Violations are observations that make drift and unsupported reversals visible to the application.
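The two decision-level signals can be read as simple set comparisons. The function below is one interpretation of the table's descriptions, written as a standalone sketch; `headline_signals` and its argument shapes are assumptions, and cmg's own checks may differ in detail.

```python
from typing import Optional


def headline_signals(
    prev: Optional[tuple[str, set[str]]],  # (verdict, cited commitment IDs) of the last decision
    new: tuple[str, set[str]],  # the decision being appended
    active: set[str],  # commitment IDs that have not been invalidated
) -> list[str]:
    if prev is None:
        return []  # first decision: nothing to compare against
    prev_verdict, prev_refs = prev
    new_verdict, new_refs = new
    still_active = prev_refs & active  # prior reasons nobody has retracted
    if new_verdict != prev_verdict:
        return ["verdict_flip_without_invalidation"] if still_active else []
    dropped = still_active - new_refs  # same verdict, but a live reason went missing
    return ["silent_commitment_drop"] if dropped else []


first = ("request_changes", {"k-1", "k-2"})
flip = headline_signals(first, ("approve", {"k-3"}), active={"k-1", "k-2", "k-3"})
clean = headline_signals(first, ("approve", {"k-3"}), active={"k-3"})
drop = headline_signals(first, ("request_changes", {"k-1"}), active={"k-1", "k-2"})
print(flip, clean, drop)
```

In the second call both prior commitments were invalidated before the verdict changed, so under this reading the reversal is explained and no signal is raised.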
Wire format for annotations
The model may annotate a response with either a fenced block:
```cmg
{"ops": [{"op": "commitment", "content": "...", "refs": ["s-..."]}]}
```
or a self-closing tag:
<cmg ops='[{"op":"commitment","content":"...","refs":["s-..."]}]'/>
Both are optional. If the model emits plain prose, the response still passes through and no graph operation is applied.
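To show what the two annotation forms imply for a parser, here is a rough regex-based sketch that strips them from a reply and collects the ops. It is not cmg's parser: `split_annotations` is a hypothetical helper, it silently skips malformed JSON, it collects fenced ops before tag ops rather than in document order, and it would mis-handle a tag payload containing a single quote.

```python
import json
import re

TICKS = "`" * 3  # built indirectly so this example contains no literal fence marker
FENCE = re.compile(TICKS + r"cmg[ \t]*\n(.*?)\n" + TICKS, re.DOTALL)
TAG = re.compile(r"<cmg\s+ops='(.*?)'\s*/>", re.DOTALL)


def split_annotations(text: str) -> tuple[str, list[dict]]:
    """Return (visible_text, ops); payloads that are not valid JSON are skipped."""
    ops: list[dict] = []

    def collect(payload: str, wrapped: bool) -> None:
        try:
            data = json.loads(payload)
        except json.JSONDecodeError:
            return
        if wrapped:  # fenced form wraps the list in {"ops": [...]}
            data = data.get("ops") if isinstance(data, dict) else None
        if isinstance(data, list):
            ops.extend(op for op in data if isinstance(op, dict))

    for m in FENCE.finditer(text):
        collect(m.group(1), wrapped=True)
    for m in TAG.finditer(text):
        collect(m.group(1), wrapped=False)  # tag form carries the bare list

    visible = TAG.sub("", FENCE.sub("", text)).strip()
    return visible, ops


op = {"op": "commitment", "content": "c", "refs": ["s-1"]}
fenced_reply = "Request changes.\n" + TICKS + "cmg\n" + json.dumps({"ops": [op]}) + "\n" + TICKS
tag_reply = "Looks fine. <cmg ops='" + json.dumps([op]) + "'/>"

print(split_annotations(fenced_reply))
print(split_annotations(tag_reply))
print(split_annotations("Just prose."))
```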
Storage
JsonlStorage(path) writes one JSON record per line with a schema version on
every record. For production systems, implement the Storage protocol with:
- `append_node(node)`;
- `append_violation(violation)`;
- `iter_records()`;
- `aclose()`.
That is enough to back the graph with sqlite, postgres, object storage, or an in-memory test store.
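An in-memory test store might look like the sketch below. The four method names come from the list above, but everything else is an assumption: the README does not state which methods are coroutines or what types they take, so the local `Storage` protocol here (async writes, plain-dict records, a synchronous iterator) is a guess to be checked against the real protocol in `cmg`.

```python
import asyncio
from typing import Any, Iterator, Protocol


class Storage(Protocol):
    """Local restatement of the four methods named above; signatures are assumed."""

    async def append_node(self, node: dict[str, Any]) -> None: ...
    async def append_violation(self, violation: dict[str, Any]) -> None: ...
    def iter_records(self) -> Iterator[dict[str, Any]]: ...
    async def aclose(self) -> None: ...


class MemoryStorage:
    """In-memory test store: keeps records in a list, in append order."""

    def __init__(self) -> None:
        self._records: list[dict[str, Any]] = []
        self.closed = False

    async def append_node(self, node: dict[str, Any]) -> None:
        self._records.append({"type": "node", **node})

    async def append_violation(self, violation: dict[str, Any]) -> None:
        self._records.append({"type": "violation", **violation})

    def iter_records(self) -> Iterator[dict[str, Any]]:
        # Iterate over a snapshot so later appends do not affect a running iteration.
        return iter(list(self._records))

    async def aclose(self) -> None:
        self.closed = True


async def demo() -> list[dict[str, Any]]:
    store: Storage = MemoryStorage()
    await store.append_node({"node_id": "s-1", "kind": "support"})
    await store.append_violation({"code": "unknown_ref"})
    records = list(store.iter_records())
    await store.aclose()
    return records


records = asyncio.run(demo())
print(records)
```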
Eval framework adapters
cmg fits into existing eval frameworks as a judge-side diagnostic layer. In
DeepEval, wrap it in a custom BaseMetric; in Inspect AI, use it inside a custom
scorer and store CMG fields in Score.metadata. The eval framework still owns
datasets, pass/fail scoring, aggregation, and reporting. CMG adds per-item graph
logs, cited commitments, parse warnings, and violation codes, making it easier to
debug why a judge selected a verdict or why it changed position under a second
review pass.
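Independently of the eval framework, the per-item CMG fields have to be flattened into something JSON-friendly before they are attached to a score or metric. The helper below is a framework-agnostic sketch; `cmg_metadata` and its key names are hypothetical, and it deliberately avoids calling any DeepEval or Inspect AI API.

```python
from collections import Counter


def cmg_metadata(violation_codes: list[str], commitment_ids: list[str]) -> dict:
    """Flatten per-item CMG output into JSON-friendly fields for an eval record."""
    counts = Counter(violation_codes)
    return {
        "cmg_violation_counts": dict(sorted(counts.items())),
        "cmg_commitments": list(commitment_ids),
        # Convenience flag for filtering items where the judge reversed itself.
        "cmg_flipped": counts["verdict_flip_without_invalidation"] > 0,
    }


meta = cmg_metadata(
    ["unknown_ref", "verdict_flip_without_invalidation", "unknown_ref"],
    ["k-1", "k-2"],
)
print(meta)
```

A dictionary like this can be stored wherever the framework keeps per-sample metadata, leaving pass/fail scoring to the framework itself.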
Where it is useful
- LLM-as-a-judge pipelines that need an audit trail for verdicts.
- AI code review tools that need to explain why a patch was approved or rejected.
- Multi-reviewer arbitration where each judge should cite evidence.
- Eval harnesses that want to detect capitulation under pushback.
- Support, moderation, or triage agents that should not silently abandon prior claims.
- Scientific or analytical agents that track hypotheses, evidence, and retractions.
License
Apache-2.0.