
Token optimization layer for multi-agent LangGraph systems — 47–74% token savings via MESI cache coherence


agent-coherence

CCSStore is a drop-in token optimization layer for multi-agent LangGraph systems. It cuts shared-artifact token costs by 47–74% on realistic workloads — via MESI cache coherence, one import change.


pip install "agent-coherence[langgraph]"

# Before
from langgraph.store.memory import InMemoryStore
store = InMemoryStore()

# After — one import change, no other code changes
from ccs.adapters import CCSStore
store = CCSStore(strategy="lazy")

$ python -m examples.langgraph_planner.main

Example: 4-agent planning pipeline

  planner: wrote plan
  researcher: read plan 4×
  executor: read plan 4×
  reviewer: read plan 4×

  CCSStore Benchmark Summary
  ──────────────────────────────────────
  Baseline tokens (no cache):      1476
  CCSStore tokens:                  378
  Tokens saved:                    1098
  Token reduction:                74.4%
  Cache hit rate:                75.0%  (12 get ops)

Saving 1,098 tokens at $3/MTok ≈ $0.003 per run. At 1,000 runs/day: about $3/day, on a plan artifact of only ~120 tokens.

Baseline: tokens you would pay if every agent re-read every shared artifact from scratch — equivalent to a graph without cross-agent caching. This is what InMemoryStore effectively does.
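
As a sanity check, a back-of-the-envelope model roughly reproduces these numbers. This is a simplification for intuition, not the benchmark harness; the 25% miss probability below mirrors the 75% hit rate above:

# Rough cost model, a simplification for intuition (not the benchmark code).
# A = artifact tokens, S = invalidation-signal tokens, p = miss probability.
def broadcast_tokens(reads: int, A: int) -> int:
    return reads * A                              # every read re-receives the artifact

def coherence_tokens(reads: int, writes: int, peers: int, A: int,
                     S: int = 12, p: float = 0.25) -> float:
    return reads * p * A + writes * peers * S     # only post-write reads re-fetch

base = broadcast_tokens(12, 120)                  # 12 reads of a ~120-token plan: 1440
ccs = coherence_tokens(12, 1, 3, 120)             # 3 misses * 120 + 1 write * 3 peers * 12 = 396
print(f"~{1 - ccs / base:.0%} reduction")         # ~72%, close to the measured 74.4%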

Savings scale with artifact size. On a codebase-review workload with 2 KB source files and 3 reviewers: 16,820 tokens saved, $50/day at 1,000 runs. See examples/shared_codebase/.


How it works

When multiple agents share working context — a plan, a codebase, a research document — most orchestration frameworks rebroadcast the full artifact to every agent on every step. On workloads with four or more agents and non-trivial artifacts, synchronization tokens dominate total cost.

CCSStore solves this with the same approach multi-core CPUs have used since 1984: MESI cache coherence. Each shared artifact sits in one of four states per agent — Modified, Exclusive, Shared, or Invalid. Agents read from local cache when valid; only invalid cache entries trigger a network fetch.

  • Reads hit the local cache at zero token cost when the artifact hasn't changed.
  • Writes commit to a coordinator, which sends lightweight invalidation signals (~12 tokens) to peers instead of rebroadcasting the full artifact.
  • Consistency is single-writer-multiple-reader per artifact, with bounded staleness — peers re-fetch on next read.

Five synchronization strategies ship out of the box: lazy (default), eager, lease (TTL-based), access_count, and broadcast.
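
To make the state machine concrete, here is an illustrative sketch of the per-agent bookkeeping. It is not CCSStore's actual internals, just the MESI pattern it applies:

# Illustrative MESI sketch, not CCSStore's implementation.
from enum import Enum

class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"

class AgentCache:
    def __init__(self):
        self.state = State.INVALID
        self.content = None

    def read(self, fetch):
        # A valid entry (M/E/S) is served locally at zero token cost;
        # only INVALID triggers a fetch and pays full artifact tokens.
        if self.state is State.INVALID:
            self.content = fetch()
            self.state = State.SHARED
        return self.content

def write(writer, peers, new_content):
    # The writer commits and moves to MODIFIED; peers get a small
    # invalidation signal instead of the full artifact.
    writer.content, writer.state = new_content, State.MODIFIED
    for peer in peers:
        peer.state = State.INVALID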

Quick start

Namespace convention: namespace[0] is the agent identity; namespace[1:] is the artifact scope. Two agents writing to ("planner", "shared") and ("reviewer", "shared") address the same artifact.
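
A minimal illustration, assuming CCSStore keeps LangGraph's standard store put/get signatures (the "plan" key is arbitrary):

# Two identities, same artifact scope ("shared"): one coherent artifact.
store.put(("planner", "shared"), "plan", {"text": "v1"})
item = store.get(("reviewer", "shared"), "plan")  # reads the planner's write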

Inline benchmark mode — measure token savings on your own workload without any external tooling:

store = CCSStore(strategy="lazy", benchmark=True)
# ... run your graph ...
store.print_benchmark_summary()

Observability — pass on_metric to receive per-operation events:

from ccs.adapters import CCSStore, StoreMetricEvent

events = []
store = CCSStore(strategy="lazy", on_metric=events.append)
# each StoreMetricEvent carries: operation, cache_hit, tokens_consumed, tokens_saved_estimate, tick
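
For instance, a quick aggregation over the fields listed above:

hits = sum(1 for e in events if e.cache_hit)
saved = sum(e.tokens_saved_estimate for e in events)
print(f"hit rate: {hits / len(events):.0%}, ~{saved} tokens saved")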

Telemetry — export to OpenTelemetry or LangSmith with one parameter:

store = CCSStore(strategy="lazy", telemetry="opentelemetry")
store = CCSStore(strategy="lazy", telemetry="langsmith")

Graceful degradation — fall back to a plain dict instead of raising on coherence errors:

store = CCSStore(strategy="lazy", on_error="degrade")
# first degradation emits CoherenceDegradedWarning; store.is_degraded returns True after
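
A sketch of observing degradations at the end of a run, using only the attributes above plus Python's standard warnings module:

import warnings

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # ... run your graph ...

if store.is_degraded:
    print(f"degraded {store.degradation_count}x; first: {caught[0].message}")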

See docs/ccsstore.md for the full guide: namespace convention, strategies, observability, telemetry, graceful degradation, examples, and API reference.

Low-level adapter API

For CrewAI, AutoGen, or custom integrations, use the before_node / commit_outputs surface directly:

from ccs.adapters.langgraph import LangGraphAdapter

adapter = LangGraphAdapter(strategy_name="lazy")
for name in ("planner", "researcher", "executor"):
    adapter.register_agent(name)
plan = adapter.register_artifact(name="plan.md", content="v1")

context = adapter.before_node(agent_name="planner", artifact_ids=[plan.id], now_tick=1)
adapter.commit_outputs(
    agent_name="planner",
    writes={plan.id: context[plan.id]["content"] + "\nStep 1"},
    now_tick=2,
)

Full example: examples/multi_agent_planning.py.

Running the examples

python -m examples.langgraph_planner.main  # 4-agent planning, 74% savings, benchmark output
python -m examples.shared_codebase.main    # 4-agent code review, 16,820 tokens saved, $50/day
python -m examples.code_review.main        # 3-agent, SHARED state demo
python -m examples.research_pipeline.main  # 4-agent, 3 artifacts, 60% hit rate

Real-workload benchmarks

Measured on real LangGraph StateGraph executions using GenericFakeChatModel with no live LLM API calls, so the results are reproducible in CI. Run them yourself:

python benchmarks/langgraph_real/bench_planner.py
python benchmarks/langgraph_real/bench_code_review.py
python benchmarks/langgraph_real/bench_high_churn.py

  Workload                   Agents  Reads:Writes  Hit rate  Baseline tokens  CCSStore tokens  Savings
  Planning (read-heavy)      4       12:1          75%       4,160            1,301            69%
  Code review (moderate)     3       8:3           60%       5,320            2,835            47%
  High-churn (write-heavy)   4       8:4           50%       3,250            2,317            29%

How to read these numbers

Savings scale with read/write ratio. Every write triggers invalidation, which forces the next read to be a miss.

  • Read-heavy workloads (planners, reviewers, summarizers, retrievers): 60–70% savings.
  • Mixed workloads: 40–55% savings.
  • Write-heavy workloads: 25–35% savings.

Where the paper's 84–95% figures come from

The arXiv paper reports 84–95% reduction in simulation under controlled assumptions: sparse reads, high steps-per-artifact ratios, and low artifact volatility. Those numbers represent the protocol's theoretical ceiling.

The real-workload numbers above represent what teams see on real LangGraph graphs today. Both are honest measurements of different things:

                     Simulation (paper)                Real LangGraph (this repo)
  What's measured    Protocol-only token cost          Full graph execution token cost
  Workload           Synthetic, controlled volatility  Realistic agent patterns
  Best case          95% (Planning)                    69% (Planning)
  Worst case         84% (High-churn)                  29% (High-churn)

If you want to reproduce the simulation results from the paper, see REPRODUCE.md.

What this means for adoption

If your multi-agent workload has a read/write ratio above roughly 3:1 — which most planning, research, review, and analysis pipelines do — expect 50–70% savings in production. If your workload is write-heavy, expect 25–35%. Either way, the integration is a one-line import change.

What CCSStore is — and isn't

CCSStore is:

  • A drop-in BaseStore replacement for LangGraph
  • A token optimization layer for multi-agent workloads built on MESI cache coherence
  • A way to detect stale-read bugs that trace-only tools can't see
  • Built on a TLA+-verified protocol

CCSStore is not:

  • A prompt compiler
  • A replacement for LangSmith or Braintrust
  • A guaranteed 95% savings tool
  • A general-purpose key-value store

Architecture

agent-coherence is structured as four composable layers:

  • Protocol (ccs.core, ccs.strategies) — MESI state machine and synchronization strategies. No framework dependencies.
  • Coordinator (ccs.coordinator) — Authority service tracking directory state and publishing invalidations. Runs in-process or out-of-process.
  • Event bus (ccs.bus) — Pluggable transport for invalidation signals. Ships with an in-memory bus; production deployments can swap in Redis, Kafka, NATS, or gRPC streams.
  • Adapters (ccs.adapters) — Framework integrations for LangGraph, CrewAI, and AutoGen. Each ~100 lines; adding a new framework is straightforward.

Each layer is independently useful and independently replaceable.
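
For example, swapping the bus only requires something with publish/subscribe semantics. The interface below is a hypothetical shape for illustration, not ccs.bus's actual API:

from typing import Callable, Protocol

class EventBus(Protocol):
    # Hypothetical interface; ccs.bus's real surface may differ.
    def publish(self, topic: str, payload: bytes) -> None: ...
    def subscribe(self, topic: str, handler: Callable[[bytes], None]) -> None: ...

class InMemoryBus:
    """Single-process bus: delivers invalidation signals synchronously."""
    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[bytes], None]]] = {}

    def publish(self, topic: str, payload: bytes) -> None:
        for handler in self._handlers.get(topic, []):
            handler(payload)

    def subscribe(self, topic: str, handler: Callable[[bytes], None]) -> None:
        self._handlers.setdefault(topic, []).append(handler)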

Guarantees

The protocol is specified in TLA+ and model-checked with TLC. Verified properties:

  • Safety — Single-writer-multiple-reader per artifact and monotonic versions
  • Token Coherence Theorem — Lower bound on savings vs. broadcast for any workload with write probability < 1
  • Liveness — Every invalidated cache eventually reaches a valid state

See Section 6 of the paper for the formal model and proof details.

Why not just...

...use mem0 or Letta? Retrieval-based memory does not solve concurrency. Two agents retrieving the same artifact and writing independently will clobber each other. agent-coherence is a coherence protocol, not a memory store — it composes with any retrieval backend.

...use LangGraph's BaseStore? BaseStore provides persistence, not concurrency safety. The LangGraph docs explicitly warn users to handle concurrent writes themselves. agent-coherence is the layer that does that.

...use A2A? A2A is the transport layer — how agents send tasks to each other. agent-coherence is the artifact coherence layer — how agents share state while they work. They compose.

...use Anthropic / OpenAI prompt caching? Provider-side caching reduces per-agent prompt overhead but does not address inter-agent artifact synchronization. The two are complementary: prompt caching keeps the prefix cheap; coherence keeps the shared artifacts lean.

FAQ

The paper says 84–95%. Why does the README say 47–74%?

Two different measurements, both honest. The paper measures protocol-only overhead in simulation under controlled assumptions. The 47–74% is what you measure running CCSStore on a real LangGraph graph. Use 47–74% for ROI expectations. The 84–95% describes the protocol's theoretical ceiling under ideal conditions.

Why does the high-churn workload only save 29%?

Write-heavy workloads are the protocol's lower bound by design. Every write triggers invalidation, which forces the next read to be a miss.

Will I see the simulation numbers in production?

Almost certainly not. The simulation isolates the protocol from real-world factors like LangGraph's framework overhead, prompt construction, and extra artifact reads. If your workload has a read/write ratio above 3:1, expect 50–70% in production.

Can I get higher savings than 69%?

Yes, but it requires architectural changes beyond CCSStore — for example, partial-read APIs so agents fetch only the artifact fragments they need. CCSStore v0.2 operates at the whole-artifact level.

Status

v0.2 ships inline benchmarking, expanded telemetry, and the shared-codebase example.

Shipped in v0.2:

  • Inline benchmark mode — CCSStore(benchmark=True) + print_benchmark_summary()
  • Degradation visibility — CoherenceDegradedWarning, is_degraded, degradation_count
  • Expanded telemetry — OTel: tokens saved, cache hit/miss counters, degraded-mode gauge; LangSmith: per-run token_reduction_pct, cache_hit_rate, tokens_saved_estimate
  • Shared-codebase example — 4-agent code review pipeline with benchmark output
  • Production benchmarks on real LangGraph deployments (benchmarks/langgraph_real/)
  • Telemetry exporters: OpenTelemetry and LangSmith (ccs.adapters.telemetry)
  • Graceful degradation (on_error="degrade")

Coming next:

  • Optimistic-locking strategy for high-contention workloads
  • Async coordinator for large agent fleets
  • Persistent backend (PostgresStore compatibility)

This is an alpha release. APIs may change before v1.0.

Paper

Token Coherence: Adapting MESI Cache Protocols to Minimize Synchronization Overhead in Multi-Agent LLM Systems (arXiv:2603.15183)

BibTeX
@article{parakhin2026token,
  title   = {Token Coherence: Adapting MESI Cache Protocols to Minimize
             Synchronization Overhead in Multi-Agent LLM Systems},
  author  = {Parakhin, Vladyslav},
  journal = {arXiv preprint arXiv:2603.15183},
  year    = {2026}
}

License

Apache-2.0. See LICENSE.
