Claim Memory Graph: a lightweight memory layer for inspectable LLM judge and reviewer decisions.

Maintainers

ML0037

Project description

cmg - Claim Memory Graph

cmg is a lightweight memory layer for inspectable long-running LLM-as-a-judge and reviewer-agent workflows. It makes unsupported or sycophantic decision shifts easier to detect and audit.

It is built for applications that ask models to judge, review, rank, triage, or decide over time: LLM-as-a-judge systems, code reviewers, evaluator pipelines, support triage, arbitration flows, and multi-agent review loops.

Instead of treating every answer as an isolated blob of text, cmg records a small append-only graph of:

Node	Meaning
`Support`	Evidence supplied by the application: rubric items, logs, diffs, test output, user facts, retrieved documents.
`Commitment`	A concrete claim the model makes and the support it cites.
`Decision`	A verdict that cites active commitments.
`Invalidation`	An explicit retraction that says what changed and which prior claim should be retired.

The model's prose still passes through unchanged. cmg removes optional hidden annotation blocks from the user-visible response, persists the graph, and returns Violation records when a new operation no longer lines up with the active claim history.
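The four node kinds can be pictured with a toy model. This is a minimal sketch for illustration only, not cmg's actual classes: `Node`, `ToyGraph`, and the `d-`/`i-` ID prefixes are assumptions (the README itself only shows `s-...` and `k-...` IDs).

```python
from dataclasses import dataclass, field
from typing import Literal

Kind = Literal["support", "commitment", "decision", "invalidation"]


@dataclass(frozen=True)
class Node:
    """Hypothetical node shape: an ID, a kind, free text, and cited IDs."""
    node_id: str
    kind: Kind
    content: str
    refs: tuple[str, ...] = ()


@dataclass
class ToyGraph:
    """Append-only list of nodes; nothing is ever edited or deleted."""
    nodes: list[Node] = field(default_factory=list)

    def append(self, node: Node) -> None:
        self.nodes.append(node)

    def active_commitments(self) -> list[Node]:
        # A commitment stays active until some invalidation cites it.
        retired = {
            ref
            for node in self.nodes
            if node.kind == "invalidation"
            for ref in node.refs
        }
        return [
            node
            for node in self.nodes
            if node.kind == "commitment" and node.node_id not in retired
        ]


graph = ToyGraph()
graph.append(Node("s-1", "support", "test_total fails after the patch"))
graph.append(Node("k-1", "commitment", "the patch breaks the total calculation", ("s-1",)))
graph.append(Node("d-1", "decision", "request_changes", ("k-1",)))
graph.append(Node("i-1", "invalidation", "the test was flaky; a rerun passes", ("k-1",)))
```

After the invalidation is appended, `graph.active_commitments()` is empty, while all four nodes remain in the history for later inspection.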

Why this exists

LLM judges and reviewers often give a final verdict without leaving a durable trace of what the verdict depended on. That becomes a problem when the next turn, another reviewer, or a user pushback changes the answer.

cmg is meant to answer practical questions:

- Which evidence did the judge cite before choosing a verdict?
- Which claims were still active when the reviewer approved or rejected?
- Did the model reverse its verdict without saying what changed?
- Did a later verdict silently drop a still-active reason?
- Can we log these transitions and inspect them after an eval run or review?

The goal is not to force a model to be correct. The goal is to make its explicit claims and verdict transitions observable.

What it does not claim

cmg is not a benchmark, a scorer, a policy engine, or a replacement for your evaluator. It does not prove that a verdict is true, reveal hidden chain of thought, or block model output by itself.

It gives your application a structured memory layer and deterministic telemetry. Your app decides whether a violation should be logged, shown to a human, used to ask for a corrected retraction, or ignored.
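One way an application might act on violations is a small routing table. The sketch below is an assumption about application-side policy, not part of cmg: `ViolationRecord`, `route`, and the three action names are hypothetical, and only the violation code strings come from this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViolationRecord:
    """Hypothetical stand-in for a violation; only `code` is used here."""
    code: str
    detail: str = ""


# Illustrative policy: drift signals go to a human, malformed refs
# trigger a request for a corrected annotation, everything else is logged.
ESCALATE = {"verdict_flip_without_invalidation", "silent_commitment_drop"}
RETRY = {"unknown_ref", "wrong_ref_kind", "ref_not_active", "empty_refs"}


def route(violation: ViolationRecord) -> str:
    """Return the name of the action the application would take."""
    if violation.code in ESCALATE:
        return "human_review"
    if violation.code in RETRY:
        return "ask_for_correction"
    return "log_only"
```

Because cmg only observes, a policy like this lives entirely in your code and can be as strict or as lenient as the workflow requires.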

How it fits

flowchart LR
    Evidence["Evidence / rubric / task data"] --> App["Your app"]
    App --> LLM["LLM judge or reviewer"]
    LLM -->|"prose + optional cmg ops"| CMG["cmg layer"]
    CMG -->|"visible_text"| App
    CMG --> Graph[("Claim Memory Graph")]
    Graph -->|"active commitments<br/>last decision"| LLM
    CMG -->|"Violation records"| Obs["logs / evals / review UI"]

The layer is deliberately small:

- zero runtime dependencies in the core package;
- JSONL storage by default, with a pluggable storage protocol;
- optional OpenAI and Anthropic helpers;
- async, streaming, and sync wrapper APIs;
- deterministic checks that observe, not block.

Install

pip install claim-memory-graph

# Optional provider helpers:
pip install 'claim-memory-graph[openai]'
pip install 'claim-memory-graph[anthropic]'

The PyPI distribution is claim-memory-graph; the Python import package is cmg. The core package supports Python 3.10+ and uses only the standard library.

Quickstart

import asyncio
from pathlib import Path

from cmg import ClaimGraph, JsonlStorage, arun_turn, build_annotation_system_prompt


async def main() -> None:
    # One JSONL-backed graph for this review.
    async with ClaimGraph(JsonlStorage(Path("review.cmg.jsonl"))) as graph:
        # Pre-seed the evidence the reviewer is allowed to cite.
        support = (await graph.add_support("Unit test test_total fails after the patch")).node

        # Stand-in for a real model call: returns prose plus a cmg annotation block.
        async def reviewer(messages: list[dict[str, str]]) -> str:
            return (
                "Request changes: the patch appears to break an existing total calculation.\n"
                "```cmg\n"
                '{"ops": [{"op": "commitment", '
                '"content": "the patch breaks the total calculation", '
                # Literal braces must be doubled inside an f-string.
                f'"refs": ["{support.node_id}"]}}]}}\n'
                "```"
            )

        # Run one turn: the annotation is stripped from the text and applied to the graph.
        result = await arun_turn(
            graph,
            reviewer,
            [
                {"role": "system", "content": build_annotation_system_prompt()},
                {
                    "role": "user",
                    "content": f"Review the patch. Evidence {support.node_id}: {support.content}",
                },
            ],
        )

        # App-owned decision: cite the commitments that were just applied.
        commitment_ids = [
            applied.node.node_id
            for applied in result.applied
            if applied.node.kind == "commitment"
        ]
        if commitment_ids:
            await graph.add_decision("request_changes", commitment_ids)

        print(result.visible_text)
        print([v.code for v in graph.violations()])


asyncio.run(main())

For a full integration guide, see docs/user-guide.md.

Judge and reviewer workflow

A typical integration uses one graph per conversation, review, eval item, or arbitration case.

1. Pre-seed Support nodes for facts the model may cite: rubric text, retrieved evidence, code diffs, test output, prior answers, tool results, or human notes.
2. Prompt the model with build_annotation_system_prompt() and the relevant support IDs.
3. Ask the model for short, explicit commitments tied to those support IDs.
4. Add or request a Decision that cites active commitment IDs.
5. Send Violation records to logs, metrics, traces, or a review UI.

Important wire-format detail: refs must point to IDs that already exist in the graph. If a model creates a new commitment in one annotation block, it cannot cite that freshly minted k-... ID later in the same block because the SDK creates the ID during ingest. For verdict workflows, use one of these patterns:

- two-pass judge: first collect commitments, then ask for a decision after state injection exposes the new commitment IDs;
- app-owned decision: parse the model's verdict from your normal response format and call graph.add_decision(...) with the commitment IDs that were just applied;
- app-owned commitments: create commitments from known structured evidence, then ask the model to choose a decision over those IDs.
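The same-block limitation and the two-pass pattern can be simulated without the SDK. The sketch below is a toy model under stated assumptions: `ToyIngest` is hypothetical, IDs are random hex strings minted at ingest time, and a ref to a missing ID is reported as `unknown_ref` while the operation is still appended. The real ingest logic may differ.

```python
import uuid


class ToyIngest:
    """Hypothetical stand-in for a graph that mints IDs during ingest."""

    def __init__(self) -> None:
        self.kinds: dict[str, str] = {}  # node_id -> node kind

    def _mint(self, prefix: str, kind: str) -> str:
        node_id = f"{prefix}-{uuid.uuid4().hex[:8]}"
        self.kinds[node_id] = kind
        return node_id

    def add_support(self, content: str) -> str:
        return self._mint("s", "support")

    def ingest(self, ops: list[dict]) -> tuple[list[str], list[str]]:
        """Apply one annotation block; return (minted IDs, violation codes)."""
        minted: list[str] = []
        violations: list[str] = []
        for op in ops:
            if any(ref not in self.kinds for ref in op["refs"]):
                violations.append("unknown_ref")
            prefix = "k" if op["op"] == "commitment" else "d"
            # The operation is appended even when a violation was recorded.
            minted.append(self._mint(prefix, op["op"]))
        return minted, violations


# Single block: the model has to guess the new commitment's ID and fails.
graph = ToyIngest()
support_id = graph.add_support("test_total fails after the patch")
_, single_block = graph.ingest([
    {"op": "commitment", "refs": [support_id]},
    {"op": "decision", "refs": ["k-guess"]},
])

# Two passes: the second block cites IDs returned by the first.
graph = ToyIngest()
support_id = graph.add_support("test_total fails after the patch")
commitment_ids, first_pass = graph.ingest([{"op": "commitment", "refs": [support_id]}])
_, second_pass = graph.ingest([{"op": "decision", "refs": commitment_ids}])
```

In this toy run `single_block` contains one `unknown_ref`, while `first_pass` and `second_pass` are both empty.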

Violation signals

The headline signals are:

Code	Meaning
`verdict_flip_without_invalidation`	A new decision changed verdict while prior commitments remained active.
`silent_commitment_drop`	A later decision kept the same verdict but stopped citing an active prior commitment.
`unknown_ref`	An operation cited an ID that is not in the graph.
`wrong_ref_kind`	A commitment cited a non-support, or a decision cited a non-commitment.
`ref_not_active`	A decision cited a commitment that had already been invalidated.
`empty_refs`	A commitment, decision, or invalidation omitted required refs.

Every operation is still appended. Violations are observations that make drift and unsupported reversals visible to the application.
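The two drift signals can be read as simple set comparisons between consecutive decisions. The following is one interpretation of the table above, written as a standalone sketch; `DecisionRecord` and `check_transition` are hypothetical names and do not reproduce cmg's internal checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionRecord:
    """Hypothetical decision shape: a verdict plus cited commitment IDs."""
    verdict: str
    refs: frozenset[str]


def check_transition(
    prev: DecisionRecord, new: DecisionRecord, active: set[str]
) -> list[str]:
    """Compare a new decision with the previous one.

    `active` is the set of commitment IDs not yet invalidated.
    """
    still_active = prev.refs & active
    if not still_active:
        # Everything the old verdict relied on was retracted: no drift signal.
        return []
    if new.verdict != prev.verdict:
        return ["verdict_flip_without_invalidation"]
    if still_active - new.refs:
        return ["silent_commitment_drop"]
    return []


prev = DecisionRecord("request_changes", frozenset({"k-1", "k-2"}))
flip = DecisionRecord("approve", frozenset({"k-3"}))
drop = DecisionRecord("request_changes", frozenset({"k-1"}))

flip_codes = check_transition(prev, flip, {"k-1", "k-2", "k-3"})
drop_codes = check_transition(prev, drop, {"k-1", "k-2"})
clean_codes = check_transition(prev, flip, {"k-3"})  # k-1 and k-2 invalidated first
```

Under this reading, a reversal is only clean when the commitments behind the earlier verdict were explicitly invalidated before the new decision.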

Wire format for annotations

The model may annotate a response with either a fenced block:

```cmg
{"ops": [{"op": "commitment", "content": "...", "refs": ["s-..."]}]}
```

or a self-closing tag:

<cmg ops='[{"op":"commitment","content":"...","refs":["s-..."]}]'/>

Both are optional. If the model emits plain prose, the response still passes through and no graph operation is applied.
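To make the two annotation forms concrete, here is a rough standalone parser. It is a sketch only: `split_annotations`, the regular expressions, and the warning strings are assumptions, and cmg's real parser may accept or reject inputs differently.

```python
import json
import re

# Fenced form: a cmg code fence wrapping {"ops": [...]}.
FENCE_RE = re.compile(r"```cmg[ \t]*\n(.*?)\n```", re.DOTALL)
# Tag form: <cmg ops='[...]'/> with the op list as a JSON array.
TAG_RE = re.compile(r"<cmg\s+ops='(.*?)'\s*/>", re.DOTALL)


def split_annotations(text: str) -> tuple[str, list[dict], list[str]]:
    """Return (visible text, parsed ops, parse warnings)."""
    ops: list[dict] = []
    warnings: list[str] = []

    for match in FENCE_RE.finditer(text):
        try:
            ops.extend(json.loads(match.group(1))["ops"])
        except (ValueError, KeyError, TypeError):
            warnings.append("unparseable fenced block")

    for match in TAG_RE.finditer(text):
        try:
            payload = json.loads(match.group(1))
            if not isinstance(payload, list):
                raise TypeError("ops attribute must be a JSON array")
            ops.extend(payload)
        except (ValueError, TypeError):
            warnings.append("unparseable tag")

    # Annotations are removed; the surrounding prose is otherwise untouched.
    visible = TAG_RE.sub("", FENCE_RE.sub("", text)).strip()
    return visible, ops, warnings
```

Plain prose yields an empty op list and no warnings, which matches the pass-through behaviour described above.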

Storage

JsonlStorage(path) writes one JSON record per line with a schema version on every record. For production systems, implement the Storage protocol with:

- append_node(node);
- append_violation(violation);
- iter_records();
- aclose().

That is enough to back the graph with sqlite, postgres, object storage, or an in-memory test store.
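An in-memory test store might look like the sketch below. Only the four method names come from this README; the rest is assumed: that the methods are coroutines, that `iter_records()` is an async iterator, and the record shape. Check the Storage protocol in the package before relying on any of it.

```python
import asyncio
from typing import Any, AsyncIterator


class InMemoryStorage:
    """Hypothetical test store; signatures are assumptions, not cmg's API."""

    def __init__(self) -> None:
        self._records: list[dict[str, Any]] = []
        self.closed = False

    async def append_node(self, node: Any) -> None:
        self._records.append({"type": "node", "payload": node})

    async def append_violation(self, violation: Any) -> None:
        self._records.append({"type": "violation", "payload": violation})

    async def iter_records(self) -> AsyncIterator[dict[str, Any]]:
        # Iterate over a snapshot so appends during iteration are safe.
        for record in list(self._records):
            yield record

    async def aclose(self) -> None:
        self.closed = True


async def demo() -> list[str]:
    storage = InMemoryStorage()
    await storage.append_node({"node_id": "s-1", "kind": "support"})
    await storage.append_violation({"code": "unknown_ref"})
    kinds = [record["type"] async for record in storage.iter_records()]
    await storage.aclose()
    return kinds


record_types = asyncio.run(demo())
```

Here `record_types` lists the records in append order, which is the property an append-only log needs to preserve whatever the backing store.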

Eval framework adapters

cmg fits into existing eval frameworks as a judge-side diagnostic layer. In DeepEval, wrap it in a custom BaseMetric; in Inspect AI, use it inside a custom scorer and store cmg fields in Score.metadata. The eval framework still owns datasets, pass/fail scoring, aggregation, and reporting. cmg adds per-item graph logs, cited commitments, parse warnings, and violation codes, which make it easier to debug why a judge selected a verdict or why it changed position under a second review pass.
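Whichever framework is used, the per-item diagnostics can be flattened into a plain dictionary before being attached to a metric or score. This is a framework-neutral sketch; `cmg_metadata` and its key names are hypothetical and not defined by cmg, DeepEval, or Inspect AI.

```python
from collections import Counter
from typing import Iterable


def cmg_metadata(
    verdict: str,
    cited_commitments: Iterable[str],
    violation_codes: Iterable[str],
    parse_warnings: Iterable[str],
) -> dict:
    """Build a JSON-friendly summary of one judged item."""
    codes = list(violation_codes)
    warnings = list(parse_warnings)
    return {
        "cmg_verdict": verdict,
        "cmg_cited_commitments": list(cited_commitments),
        "cmg_violation_counts": dict(Counter(codes)),
        "cmg_parse_warnings": warnings,
        # True only when the judge's trace raised no signals at all.
        "cmg_clean": not codes and not warnings,
    }


summary = cmg_metadata(
    "request_changes",
    ["k-1"],
    ["silent_commitment_drop", "silent_commitment_drop"],
    [],
)
```

The resulting dictionary could be passed as score metadata, leaving pass/fail scoring with the framework.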

Where it is useful

- LLM-as-a-judge pipelines that need an audit trail for verdicts.
- AI code review tools that need to explain why a patch was approved or rejected.
- Multi-reviewer arbitration where each judge should cite evidence.
- Eval harnesses that want to detect capitulation under pushback.
- Support, moderation, or triage agents that should not silently abandon prior claims.
- Scientific or analytical agents that track hypotheses, evidence, and retractions.

License

Apache-2.0.

Release history

0.1.1 (Jun 13, 2026)

0.1.0 (May 27, 2026), this version

Download files

Source Distribution

claim_memory_graph-0.1.0.tar.gz (1.8 MB)

Uploaded May 27, 2026 Source

Built Distribution

claim_memory_graph-0.1.0-py3-none-any.whl (21.6 kB)

Uploaded May 27, 2026 Python 3

File details

Details for the file claim_memory_graph-0.1.0.tar.gz.

File metadata

Download URL: claim_memory_graph-0.1.0.tar.gz
Upload date: May 27, 2026
Size: 1.8 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for claim_memory_graph-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`e559816e5d0b2ee523d8230bd8bfee3882f02da1d7edd53781221d669bc57d71`
MD5	`b091e110bb08c65615882226a937b0ea`
BLAKE2b-256	`2554e3cf9a7a48330eec275bfd90d5fd31126eeb82cb004efffb4ac07b270dd2`
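A downloaded file can be checked against the published SHA256 digest with the standard library. This is a generic sketch; `sha256_of` and `matches_published` are hypothetical helper names, and the local file path is whatever your download produced.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, expected_hex: str) -> bool:
    """Compare the local digest with the hex digest listed on the project page."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published(Path("claim_memory_graph-0.1.0.tar.gz"), "e559816e5d0b2ee523d8230bd8bfee3882f02da1d7edd53781221d669bc57d71")` would return True only if the local archive is byte-identical to the published one.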

Provenance

The following attestation bundles were made for claim_memory_graph-0.1.0.tar.gz:

Publisher: publish.yml on MatteoLeonesi/claim-memory-graph-sdk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: claim_memory_graph-0.1.0.tar.gz
- Subject digest: e559816e5d0b2ee523d8230bd8bfee3882f02da1d7edd53781221d669bc57d71
- Sigstore transparency entry: 1640651375
- Sigstore integration time: May 27, 2026
Source repository:
- Permalink: MatteoLeonesi/claim-memory-graph-sdk@d9288782babbf74922068edaae4e6e8a23b59138
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/MatteoLeonesi
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d9288782babbf74922068edaae4e6e8a23b59138
- Trigger Event: release

File details

Details for the file claim_memory_graph-0.1.0-py3-none-any.whl.

File metadata

Download URL: claim_memory_graph-0.1.0-py3-none-any.whl
Upload date: May 27, 2026
Size: 21.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for claim_memory_graph-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`bac189c56a5bcd7782e2c29bb2445f57abb6a1c954777a3a9c03d55f6e1431f1`
MD5	`0456a6b201f9e0c6b6df8ad2cff06d62`
BLAKE2b-256	`3d5a6283ac17af3bfcb4fa10e1bf26d34356856846db3420d95a81d125421fc9`

Provenance

The following attestation bundles were made for claim_memory_graph-0.1.0-py3-none-any.whl:

Publisher: publish.yml on MatteoLeonesi/claim-memory-graph-sdk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: claim_memory_graph-0.1.0-py3-none-any.whl
- Subject digest: bac189c56a5bcd7782e2c29bb2445f57abb6a1c954777a3a9c03d55f6e1431f1
- Sigstore transparency entry: 1640651438
- Sigstore integration time: May 27, 2026
Source repository:
- Permalink: MatteoLeonesi/claim-memory-graph-sdk@d9288782babbf74922068edaae4e6e8a23b59138
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/MatteoLeonesi
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d9288782babbf74922068edaae4e6e8a23b59138
- Trigger Event: release
