Real-time memory health monitor and context rot detector for AI agents
DriftWatch 🧭
Real-time memory health monitoring for AI agents. Detect context rot before your agent goes off the rails.
```text
┌─ DriftWatch ─────────────────────────────────────────────────────────────┐
│ Goal: "Conduct a comprehensive research survey on Python performance"    │
├──────────────────────────┬───────────────────────────────────────────────┤
│ Health Score             │ Signal Breakdown                              │
│                          │                                               │
│ ███████░░░ 0.72          │ Goal Coherence  ████████░░ 0.81               │
│ [HEALTHY]                │ Entropy         ███████░░░ 0.68               │
│                          │ Memory Delta    █████░░░░░ 0.54               │
├──────────────────────────┼───────────────────────────────────────────────┤
│ Turn 12                  │ Tokens: 48,230 / 200,000 (24%)                │
├──────────────────────────┴───────────────────────────────────────────────┤
│ Recent: T08 0.79  T09 0.76  T10 0.68  T11 0.61  T12 0.72                 │
└──────────────────────────────────────────────────────────────────────────┘
```
The problem
Long-running AI agents don't fail all at once; they drift. By the time your agent produces clearly wrong output, it has been silently degrading for dozens of turns. Context rot is the progressive loss of reasoning quality that starts at 60–70% context fill, not at 100%.
A 2025–2026 industry analysis found that roughly 65% of enterprise AI agent failures were caused by context drift or memory loss during multi-step reasoning, not by raw context exhaustion. The degradation is measurable, predictable, and preventable, and DriftWatch is built to measure it, predict it, and help prevent it.
Install
```bash
pip install agent-driftwatch
```
Or from source:
```bash
git clone https://github.com/your-org/driftwatch
cd driftwatch
pip install -e .
```
30-second start
```python
import anthropic
import driftwatch

# Wrap your existing Anthropic client: a one-line change
client = driftwatch.wrap(
    anthropic.Anthropic(),
    goal="Explain the key principles of clean code and give Python examples",
    threshold=0.55,    # trigger action below this health score
    on_drift="alert",  # "checkpoint" | "compact" | "alert" | "none" | callable
    dashboard=True,    # Rich live terminal panel
)

messages = []
topics = [
    "What are the most important principles of clean code?",
    "Can you give a Python example of the Single Responsibility Principle?",
    "How does dependency injection improve testability?",
    "What's the difference between early return and guard clauses?",
    "Give me a before/after refactor of a messy Python function.",
]

for turn, question in enumerate(topics, start=1):
    messages.append({"role": "user", "content": question})
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=512,
        messages=messages,
    )
    messages.append({"role": "assistant", "content": response.content[0].text})

    # Each call records one DriftEvent; the latest is at the end of drift_history
    event = client.drift_history[-1]
    print(f"Turn {turn} | health={event.health_score:.3f} | tokens={event.token_count:,}")
```
Output:
```text
Turn 1 | health=0.914 | tokens=1,240
Turn 2 | health=0.882 | tokens=2,890
Turn 3 | health=0.856 | tokens=4,780
Turn 4 | health=0.824 | tokens=7,120
Turn 5 | health=0.793 | tokens=9,870
```
How it works
DriftWatch computes a composite health score (0.0–1.0) after every turn by combining three independently validated signals:
| Signal | What it measures | Method |
|---|---|---|
| Goal Coherence | How closely the agent's response aligns with the original task intent | Cosine similarity between goal embedding and last-turn embedding (all-MiniLM-L6-v2) |
| Repetition Entropy | Whether the agent is looping or executing diverse actions | Shannon entropy over tool call names / word bigrams in a sliding window |
| Memory Delta | Whether the agent is introducing new facts or just repeating prior context | New-fact ratio via embedding centroid comparison |
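The first two signals in the table can be sketched in plain Python. This is a minimal illustration, not DriftWatch's signals.py: it assumes the embeddings are already available as lists of floats (DriftWatch obtains them from all-MiniLM-L6-v2), clamps cosine similarity to [0, 1] to match the DriftEvent ranges, and normalises bigram entropy by its maximum possible value. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def goal_coherence(goal_vec: Sequence[float], turn_vec: Sequence[float]) -> float:
    """Cosine similarity between two embeddings, clamped to [0, 1] (hypothetical helper)."""
    dot = sum(a * b for a, b in zip(goal_vec, turn_vec))
    norm_goal = math.sqrt(sum(a * a for a in goal_vec))
    norm_turn = math.sqrt(sum(b * b for b in turn_vec))
    if norm_goal == 0.0 or norm_turn == 0.0:
        return 0.0  # similarity is undefined for a zero vector; treat as no coherence
    return max(0.0, min(1.0, dot / (norm_goal * norm_turn)))


def repetition_entropy(text: str) -> float:
    """Shannon entropy of word bigrams, normalised to [0, 1] (hypothetical helper).

    1.0 means every bigram is distinct; values near 0.0 indicate looping.
    """
    words = text.lower().split()
    bigrams = list(zip(words, words[1:]))
    n = len(bigrams)
    if n < 2:
        return 1.0  # assumption: too little text to detect repetition
    counts = Counter(bigrams)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return entropy / math.log2(n)  # log2(n) is the entropy when all n bigrams differ
```

A reply whose bigrams are all distinct, such as "the quick brown fox jumps", scores 1.0, while a looping reply like "try again try again try again try again" scores well below 0.5. The Memory Delta signal is left out because the table describes its centroid comparison only at a high level.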
The composite score is a configurable weighted average:
```text
health_score = 0.50 × goal_coherence
             + 0.30 × repetition_entropy
             + 0.20 × memory_delta
```
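The weighted average above is easy to reproduce. The sketch below takes weights as a dict shaped like the weights= option in the configuration reference and divides by their sum so that custom weights need not add up to 1; whether DriftWatch itself normalises custom weights is an assumption. Plugging in the dashboard's signal values (0.81, 0.68, 0.54) gives about 0.717, which the panel rounds to 0.72.

```python
from typing import Dict, Optional

DEFAULT_WEIGHTS = {
    "goal_coherence": 0.50,
    "repetition_entropy": 0.30,
    "memory_delta": 0.20,
}


def composite_health(signals: Dict[str, float],
                     weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted average of the three signals (sketch; normalisation is an assumption)."""
    weights = weights or DEFAULT_WEIGHTS
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(signals[name] * w for name, w in weights.items()) / total


score = composite_health(
    {"goal_coherence": 0.81, "repetition_entropy": 0.68, "memory_delta": 0.54}
)
print(f"{score:.3f}")  # 0.717
```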
Color thresholds:
- 🟢 >= 0.70: Healthy
- 🟡 0.55–0.70: Warning (drift beginning)
- 🔴 < 0.55: Drift detected
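The colour bands translate into a small classifier. This helper and its band names are hypothetical; it assumes each band's lower bound is inclusive (0.70 counts as healthy, 0.55 as warning), which is consistent with on_drift firing only when the score is below threshold.

```python
def health_status(score: float,
                  drift_threshold: float = 0.55,
                  healthy_threshold: float = 0.70) -> str:
    """Map a health score to the colour bands above (hypothetical helper)."""
    if score >= healthy_threshold:
        return "healthy"  # green
    if score >= drift_threshold:
        return "warning"  # yellow: drift beginning
    return "drift"        # red: drift detected


for s in (0.92, 0.61, 0.52):
    print(s, health_status(s))
```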
Research basis: arXiv:2601.04170 (Rath, Jan 2026), "Agent Drift: Quantifying Behavioral Degradation in Multi-Agent LLM Systems". The paper formally defines semantic drift, coordination drift, and behavioral drift, and introduces the Agent Stability Index (ASI), the composite metric that DriftWatch implements.
Auto-compaction
When on_drift="compact", DriftWatch automatically triggers Anthropic's
compact-2026-01-12 API to summarise the conversation before continuing:
```python
import anthropic
import driftwatch

client = driftwatch.wrap(
    anthropic.Anthropic(),
    goal="Analyse this codebase for dead code",
    on_drift="compact",  # auto-compaction on drift
)
```
Under the hood, when health_score < threshold:
```python
# DriftWatch calls this automatically; model and messages are those of
# the conversation being monitored:
response = client.beta.messages.create(
    betas=["compact-2026-01-12"],
    model=model,
    max_tokens=1024,
    messages=messages,
    context_management={
        "edits": [{
            "type": "compact_20260112",
            "pause_after_compaction": True,
            "instructions": "Preserve: original goal, all tool call results, "
                            "decisions made, files modified. "
                            "Discard: repeated tool outputs, exploratory tangents.",
        }]
    },
)
```
The compacted summary replaces the conversation history, the token count drops back down, and health scores recover, all transparently. Your agent loop code doesn't change at all.
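Replacing the history transparently only works if the list object your loop holds is mutated rather than rebound. A minimal sketch of that idea, assuming the compacted summary arrives as plain text and is stored as a single user message; the helper name and message shape are assumptions, not DriftWatch's code.

```python
from typing import Any, Dict, List


def replace_history_in_place(messages: List[Dict[str, Any]], summary: str) -> None:
    """Swap the whole conversation for one summary message, keeping the same list object."""
    # Slice assignment mutates the caller's list; `messages = [...]` would only
    # rebind the local name and leave the agent loop holding the old history.
    messages[:] = [{"role": "user",
                    "content": f"Summary of the conversation so far:\n{summary}"}]


history = [
    {"role": "user", "content": "Find dead code in utils/"},
    {"role": "assistant", "content": "Scanning utils/ ..."},
]
alias = history  # e.g. the reference held by your agent loop
replace_history_in_place(history, "Goal: find dead code. Done: scanned utils/.")
print(len(alias), alias[0]["role"])  # 1 user
```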
on_drift handlers
| Handler | Behaviour |
|---|---|
| "checkpoint" | Save messages + DriftEvent log to checkpoint_dir/ |
| "compact" | Trigger Anthropic compaction, then save checkpoint |
| "alert" | Print a warning to stderr and continue |
| "none" | Monitor silently, take no action |
| callable | Call fn(client, event) for a fully custom handler |
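```python
def my_handler(client, event):
    # send_slack_alert is your own notification helper (not part of DriftWatch);
    # messages is your agent's conversation list from the enclosing scope
    send_slack_alert(f"Agent drift detected! health={event.health_score:.2f}")
    client.save_checkpoint(messages)

client = driftwatch.wrap(anthropic.Anthropic(), goal="...", on_drift=my_handler)
```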
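To make the handler semantics concrete, here is a sketch of how an on_drift value could be dispatched once a turn's score falls below threshold. Only "alert", "none", and the callable case are actually carried out; the "checkpoint" and "compact" branches just report which action would run, because DriftWatch's checkpoint and compaction internals are not shown here. SimpleEvent and handle_drift are hypothetical names.

```python
import sys
from dataclasses import dataclass
from typing import Any, Callable, Union


@dataclass
class SimpleEvent:
    """Hypothetical stand-in for DriftEvent with just the fields used here."""
    turn: int
    health_score: float


Handler = Union[str, Callable[[Any, SimpleEvent], None]]


def handle_drift(on_drift: Handler, client: Any, event: SimpleEvent,
                 threshold: float = 0.55) -> str:
    """Return the name of the action taken for this event (sketch)."""
    if event.health_score >= threshold:
        return "no-op"  # handlers fire only when health drops below threshold
    if callable(on_drift):
        on_drift(client, event)  # fully custom handler: fn(client, event)
        return "callable"
    if on_drift == "alert":
        print(f"DriftWatch: drift at turn {event.turn} "
              f"(health={event.health_score:.2f})", file=sys.stderr)
        return "alert"
    if on_drift == "none":
        return "none"  # monitor silently
    if on_drift in ("checkpoint", "compact"):
        return on_drift  # placeholder: real code would save state / compact here
    raise ValueError(f"unknown on_drift handler: {on_drift!r}")


seen = []
print(handle_drift(lambda c, e: seen.append(e.turn), None, SimpleEvent(10, 0.52)))  # callable
print(handle_drift("alert", None, SimpleEvent(11, 0.48)))  # alert (warning goes to stderr)
```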
CLI
Replay a session
Visualise a saved event log as a turn-by-turn health timeline:
```bash
driftwatch replay ./dw_checkpoints/events.jsonl
```

```text
DriftWatch Replay – events.jsonl
 Turn │ Health │ GC   │ Entropy │ MemDelta │ Tokens │ Status
──────┼────────┼──────┼─────────┼──────────┼────────┼──────────
    1 │ 0.92   │ 0.95 │ 0.88    │ 0.93     │  1,240 │ healthy
    2 │ 0.89   │ 0.91 │ 0.85    │ 0.91     │  2,890 │ healthy
  ...
   10 │ 0.52   │ 0.58 │ 0.35    │ 0.42     │ 28,700 │ DRIFT
   11 │ 0.48   │ 0.54 │ 0.28    │ 0.38     │ 31,200 │ DRIFT
   12 │ 0.44   │ 0.49 │ 0.22    │ 0.33     │ 33,800 │ compacted
   13 │ 0.83   │ 0.85 │ 0.78    │ 0.88     │  4,200 │ healthy
```
Generate a session report
```bash
driftwatch report ./dw_checkpoints/events.jsonl --format md
```

```markdown
# DriftWatch Session Report

| Metric | Value |
|--------|-------|
| Total turns | 20 |
| Average health | 0.741 |
| First drift turn | T10 |
| Worst health turn | T12 (0.438) |
| Drift events (< 0.55) | 3 |
| Compaction events | 1 |
```
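Most of the headline numbers in this report can be derived from the event log. The sketch below assumes each line of events.jsonl is a JSON object carrying at least the DriftEvent fields turn and health_score; the real log layout, and how compaction events are recorded, may differ. The sample reuses turns 10–13 from the replay above, so its numbers differ from the 20-turn report.

```python
import json
from typing import Any, Dict, List


def load_events(path: str) -> List[Dict[str, Any]]:
    """Read one JSON object per non-empty line (assumed events.jsonl layout)."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def summarize(events: List[Dict[str, Any]], threshold: float = 0.55) -> Dict[str, Any]:
    """Compute report-style metrics from a list of events (sketch)."""
    if not events:
        raise ValueError("no events to summarise")
    scores = [e["health_score"] for e in events]
    drift_turns = [e["turn"] for e in events if e["health_score"] < threshold]
    worst = min(events, key=lambda e: e["health_score"])
    return {
        "total_turns": len(events),
        "average_health": sum(scores) / len(scores),
        "first_drift_turn": min(drift_turns) if drift_turns else None,
        "worst_health_turn": (worst["turn"], worst["health_score"]),
        "drift_events": len(drift_turns),
    }


sample = [
    {"turn": 10, "health_score": 0.52},
    {"turn": 11, "health_score": 0.48},
    {"turn": 12, "health_score": 0.44},
    {"turn": 13, "health_score": 0.83},
]
print(summarize(sample))
```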
Or as JSON:
```bash
driftwatch report ./dw_checkpoints/events.jsonl --format json
```
Try the fixture
```bash
driftwatch replay tests/fixtures/demo_session.jsonl
```
Configuration reference
```python
import anthropic
import driftwatch

client = driftwatch.wrap(
    anthropic.Anthropic(),
    goal="...",                         # required: the semantic anchor
    threshold=0.55,                     # health score that triggers on_drift
    on_drift="checkpoint",              # handler (see table above)
    checkpoint_dir="./dw_checkpoints",  # where to save files
    dashboard=True,                     # Rich live UI (auto-suppressed in CI)
    max_context_tokens=200_000,         # context window for token % display
    weights={                           # override composite signal weights
        "goal_coherence": 0.50,
        "repetition_entropy": 0.30,
        "memory_delta": 0.20,
    },
    log_path=None,                      # custom JSONL log path
)
```
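The dashboard is described as auto-suppressed in CI, but how DriftWatch detects CI is not documented. One common heuristic, sketched here as an assumption, is to check the CI environment variable that many CI services set (GitHub Actions sets CI=true) and whether stdout is an interactive terminal. dashboard_enabled is a hypothetical name.

```python
import os
import sys


def dashboard_enabled(requested: bool = True) -> bool:
    """Heuristic: show a live terminal UI only when asked for, outside CI, on a TTY."""
    if not requested:
        return False
    ci_value = os.environ.get("CI", "").strip().lower()
    in_ci = ci_value not in ("", "0", "false")  # e.g. CI=true on GitHub Actions
    return not in_ci and sys.stdout.isatty()


print(dashboard_enabled())
```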
DriftEvent schema
Every turn produces a DriftEvent. It is described as a Pydantic model; its fields are listed below in dataclass form:
```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class DriftEvent:
    turn: int                   # monotonically increasing (1-based)
    timestamp: datetime         # UTC
    goal_coherence: float       # Signal 1: [0.0, 1.0]
    repetition_entropy: float   # Signal 2: [0.0, 1.0]
    memory_delta: float         # Signal 3: [0.0, 1.0]
    health_score: float         # weighted composite: [0.0, 1.0]
    token_count: int            # input_tokens from API usage
    triggered_checkpoint: bool  # True if on_drift handler fired
    notes: str = ""             # optional annotation
```
Access the full history:
```python
for event in client.drift_history:
    print(f"T{event.turn}: {event.health_score:.3f}")
```
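Because events end up in a JSONL log (the file the replay and report commands read), it can help to see how an event like this serialises. The sketch assumes one JSON object per line with the timestamp stored as an ISO 8601 string; DriftWatch's exact on-disk format is an assumption. The dataclass is redefined so the snippet runs on its own, and the sample values mirror turn 12 of the dashboard with a made-up timestamp.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class DriftEvent:  # same fields as the schema above, redefined for a standalone example
    turn: int
    timestamp: datetime
    goal_coherence: float
    repetition_entropy: float
    memory_delta: float
    health_score: float
    token_count: int
    triggered_checkpoint: bool
    notes: str = ""


def to_jsonl_line(event: DriftEvent) -> str:
    record = asdict(event)
    record["timestamp"] = event.timestamp.isoformat()  # datetime is not JSON-serialisable
    return json.dumps(record)


def from_jsonl_line(line: str) -> DriftEvent:
    record = json.loads(line)
    record["timestamp"] = datetime.fromisoformat(record["timestamp"])
    return DriftEvent(**record)


event = DriftEvent(12, datetime(2026, 1, 15, 9, 30, tzinfo=timezone.utc),
                   0.81, 0.68, 0.54, 0.72, 48_230, False)
line = to_jsonl_line(event)
assert from_jsonl_line(line) == event  # round-trips without loss
print(line)
```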
Roadmap
- OpenAI SDK support
- LangGraph integration (DriftWatchCallbackHandler)
- Multi-agent drift: coordination drift signal across agent network
- GitHub Actions reporter (driftwatch-action)
- Prometheus metrics endpoint
- driftwatch watch <script.py>: subprocess injection (CLI v0.2)
- Grafana dashboard template
Architecture
```text
driftwatch/
├── signals.py     # 3 drift signal classes (offline, no API key)
├── engine.py      # composite scorer + DriftEvent schema
├── wrapper.py     # Anthropic SDK intercept layer
├── checkpoint.py  # save/restore + compaction API
├── dashboard.py   # Rich live terminal UI
└── cli.py         # Typer CLI (replay, report, watch)
```
DriftWatch is an observer: it never modifies the response your code receives from the Anthropic SDK, and it intercepts calls only to evaluate and log them. The sole exception is on_drift="compact", which updates your messages list in place after compaction so your agent continues seamlessly.
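The observer idea can be sketched as a thin proxy around messages.create: it forwards the call, records something about the response, and returns the exact response object unchanged. The sketch uses a fake client so it runs without an API key; ObservedMessages, FakeMessages, and the recorded field are hypothetical and not DriftWatch's wrapper.py. It only relies on the response exposing usage.input_tokens, as Anthropic SDK responses do.

```python
from types import SimpleNamespace
from typing import Any, Callable, List


class ObservedMessages:
    """Wraps a `messages` resource and observes each create() call."""

    def __init__(self, inner: Any, on_response: Callable[[Any], None]) -> None:
        self._inner = inner
        self._on_response = on_response

    def create(self, **kwargs: Any) -> Any:
        response = self._inner.create(**kwargs)
        self._on_response(response)  # evaluate/log only
        return response              # the caller gets the object the SDK returned

    def __getattr__(self, name: str) -> Any:
        return getattr(self._inner, name)  # delegate everything else unchanged


class FakeMessages:
    """Offline stand-in for the SDK's messages resource."""

    def __init__(self) -> None:
        self.last = SimpleNamespace(usage=SimpleNamespace(input_tokens=1240))

    def create(self, **kwargs: Any) -> Any:
        return self.last


fake = FakeMessages()
log: List[int] = []
messages_api = ObservedMessages(fake, lambda r: log.append(r.usage.input_tokens))
resp = messages_api.create(model="any", max_tokens=16, messages=[])
print(log)  # [1240]
```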
Contributing
```bash
git clone https://github.com/your-org/driftwatch
cd driftwatch
pip install -e ".[dev]"
python -m pytest tests/ -v
```
All signal tests run without an API key. PRs welcome!
Citation
If you use DriftWatch in academic research, please cite the foundational work this library is built on:
```bibtex
@article{rath2026agentdrift,
  title   = {Agent Drift: Quantifying Behavioral Degradation in Multi-Agent LLM Systems},
  author  = {Rath and others},
  journal = {arXiv preprint arXiv:2601.04170},
  year    = {2026}
}
```
Related papers:
- arXiv:2505.02709, "Technical Report: Evaluating Goal Drift in Language Model Agents". Defines the GD_actions and GD_inaction metrics.
- arXiv:2510.00615, "ACON: Optimizing Context Compression for Long-horizon LLM Agents". Reports a 26–54% reduction in peak token usage with smart compression.
License
MIT โ see LICENSE.
Built with ❤️ for the AI engineering community.
If DriftWatch saved your agent, give it a ⭐
File details
Details for the file agent_driftwatch-0.1.0.tar.gz.
File metadata
- Download URL: agent_driftwatch-0.1.0.tar.gz
- Size: 28.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9954f4f77b770d1ca0d32e3e4a4ed2e45f9e318504ea94c09050b70b5b26305a |
| MD5 | 62341c3345741f6b4c298cfaae0fbd2d |
| BLAKE2b-256 | 3fb9c488e3dcc93ae17bf521786329d0ef8987bcaf07606a93895d6d25ed51a6 |
File details
Details for the file agent_driftwatch-0.1.0-py3-none-any.whl.
File metadata
- Download URL: agent_driftwatch-0.1.0-py3-none-any.whl
- Upload date:
- Size: 25.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | da9ca13d87533ddd20fd4789eb12e382e6ef6ff71a237e4777095bf6723d200c |
| MD5 | 6c3fe522ddbe402a1268ff12582439a8 |
| BLAKE2b-256 | 528de71785afdf7a1b6ea4ea24e6e294cbed4f29c2b29099fc3cf9ebd5150a62 |