
Diagnose context degradation in LLM agents — find where memory breaks and how to fix it

context-lens

Tells you where your LLM agent's memory breaks, at what token count, and how to fix it.

pip install reguliq-diagnostics

Quickstart

from context_lens.engine.measurement import measure_context_health
from context_lens.reporter import Reporter

# 1. Probe your agent's context window
result = measure_context_health(
    agent_name="my-rag-agent",
    haystack=my_background_text,
    needle="The Q3 revenue was $4.2M",
    question="What was Q3 revenue?",
    expected="4.2M",
)

# 2. Run all 6 classifiers
report = Reporter().run(result)

# 3. View results
report.summary()          # terminal output
report.save("report.html")  # open in browser
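
The quickstart passes an `expected` string, but this page does not say how an answer is judged correct. One simple possibility is a case-insensitive substring match, sketched below. The helper names `is_correct` and `accuracy` are hypothetical and are not part of context-lens; the library may score answers differently.

```python
def is_correct(answer: str, expected: str) -> bool:
    """Case-insensitive substring match (assumed scoring rule, not the library's)."""
    return expected.strip().lower() in answer.lower()


def accuracy(answers: list[str], expected: str) -> float:
    """Fraction of answers that contain the expected string."""
    if not answers:
        return 0.0
    return sum(is_correct(a, expected) for a in answers) / len(answers)


# Made-up agent answers, for illustration only
answers = ["Q3 revenue was $4.2M.", "I could not find that figure."]
score = accuracy(answers, "4.2M")
```

A substring match is forgiving about phrasing but will also accept an answer that merely mentions the expected string, so it suits short, distinctive needles like the one in the quickstart.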

What it finds

Pattern               What it means                                             Severity
beginning_anchored    Model retrieves facts only from the first 15% of context  HIGH
cliff_detector        Accuracy drops >20% between adjacent token counts         HIGH
distractor_confusion  Near-miss facts in context cause wrong answers            HIGH
tool_burial           Accuracy collapses after 3rd+ sequential tool call        MEDIUM
instruction_drift     System-prompt constraints weaken over conversation turns  MEDIUM
recency_bias          Model ignores everything except the last 20% of context   MEDIUM
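
To make the cliff_detector row concrete, the check can be sketched as a scan over adjacent token counts. This is an illustration, not the library's implementation: the function name `find_cliff`, the input shape (token count mapped to accuracy in the range 0 to 1), and the reading of ">20%" as 20 percentage points are all assumptions.

```python
from typing import Optional


def find_cliff(accuracy_by_tokens: dict[int, float], threshold: float = 0.20) -> Optional[int]:
    """Return the first token count where accuracy falls by more than
    `threshold` relative to the next-smaller token count, else None."""
    counts = sorted(accuracy_by_tokens)
    for prev, curr in zip(counts, counts[1:]):
        drop = accuracy_by_tokens[prev] - accuracy_by_tokens[curr]
        if drop > threshold:
            return curr
    return None


# Made-up sweep results, for illustration only
sweep = {1_000: 1.0, 10_000: 0.95, 30_000: 0.40, 60_000: 0.35}
cliff = find_cliff(sweep)
```

Comparing only adjacent token counts means a slow, steady decline never registers as a cliff; it would have to be caught by a different check, such as the overall mean accuracy.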

Demo

ReguliQ (production LangGraph agent) — healthy

Instrumented with real LangGraph callbacks. Peak context: 965 tokens. At that scale, Claude Haiku retrieves with 100% accuracy.

context-lens: ReguliQ
  score: A  |  mean accuracy: 100.0%  |  5 classifiers run
  no patterns detected — context health looks good

View reguliq_report.html

Synthetic unhealthy agent — context degradation detected

Beginning-anchored retrieval plus an accuracy cliff at 30K tokens.

context-lens: my-rag-agent (synthetic)
  score: F  |  mean accuracy: 35.0%  |  5 classifiers run
  cliff: 30,000 tokens
  4 pattern(s) detected:
    [MEDIUM] beginning_anchored  conf=0.50
    [MEDIUM] cliff_detector      conf=0.58
    [HIGH  ] tool_burial         conf=0.62
    [HIGH  ] instruction_drift   conf=0.62

View unhealthy_report.html


Architecture

context_lens/
├── engine/
│   ├── probes.py          # NIAH probe injection + needle-in-haystack runs
│   ├── measurement.py     # sweeps positions × token counts, returns MeasurementResult
│   └── snapshots.py       # ContextSnapshot capture for live agents
│
├── classifiers/           # 6 pattern detectors (detect() + recommend())
│   ├── beginning_anchored.py
│   ├── cliff_detector.py
│   ├── distractor_confusion.py
│   ├── tool_burial.py
│   ├── instruction_drift.py
│   └── recency_bias.py
│
├── instrumentation/
│   └── langgraph.py       # LangGraphInstrumentor — wraps any compiled graph
│
├── reporter.py            # Reporter.run() → ReportData (score + recommendations)
│
└── report/
    ├── renderer.py        # renders ReportData → self-contained HTML (no CDN)
    └── template.html      # dark terminal theme, SVG charts, zero dependencies
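
The tree above notes that each classifier exposes detect() and recommend(). One way such an interface could look is sketched below. Everything here is hypothetical: the `Finding` dataclass, the method signatures, the 0.5 gap cutoff, the use of the gap as a confidence value, and the recommendation text are assumptions for illustration, not the actual context_lens API.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class Finding:
    pattern: str
    severity: str
    confidence: float


class Classifier(Protocol):
    """Assumed shape of a pattern detector: detect, then recommend."""

    def detect(self, accuracy_by_position: dict[float, float]) -> Optional[Finding]: ...
    def recommend(self, finding: Finding) -> str: ...


class BeginningAnchored:
    """Flags agents that retrieve facts mainly from the first 15% of context."""

    def detect(self, accuracy_by_position: dict[float, float]) -> Optional[Finding]:
        early = [a for p, a in accuracy_by_position.items() if p <= 0.15]
        late = [a for p, a in accuracy_by_position.items() if p > 0.15]
        if not early or not late:
            return None
        gap = sum(early) / len(early) - sum(late) / len(late)
        if gap > 0.5:  # assumed cutoff, not taken from the library
            return Finding("beginning_anchored", "HIGH", round(gap, 2))
        return None

    def recommend(self, finding: Finding) -> str:
        # Placeholder advice, not the library's wording
        return "Place critical facts early in the context or re-inject them later."


# Positions are relative depths in the context (0.0 = start, 1.0 = end)
finding = BeginningAnchored().detect({0.0: 1.0, 0.1: 1.0, 0.5: 0.2, 0.9: 0.0})
```

Keeping detection and recommendation as separate methods lets a reporter collect findings from every classifier first and only then decide which recommendations to render.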

How it works

your agent          context-lens
──────────          ────────────────────────────────────
LangGraph    ──►  LangGraphInstrumentor
   graph           │  captures token counts per node
                   ▼
             measure_context_health()
                   │  plants NIAH probes at each
                   │  position × token count cell
                   ▼
             MeasurementResult
                   │  accuracy_by_position()
                   │  accuracy_by_token_count()
                   ▼
             Reporter.run()
                   │  runs all 6 classifiers
                   │  computes A-F grade
                   ▼
             ReportData.save("report.html")
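
The "position × token count" sweep in the diagram above can be pictured as a grid of probe contexts, each with the needle planted at a different depth in a haystack of a different size. The sketch below is a simplified stand-in, not the code in probes.py: `plant_needle` and `build_grid` are hypothetical names, and whitespace-separated words are used in place of real tokens.

```python
from itertools import product


def plant_needle(haystack: str, needle: str, depth: float, n_words: int) -> str:
    """Truncate the haystack to n_words words and insert the needle at a
    relative depth (0.0 = start, 1.0 = end). Words stand in for tokens."""
    words = haystack.split()[:n_words]
    cut = round(depth * len(words))
    return " ".join(words[:cut] + [needle] + words[cut:])


def build_grid(haystack: str, needle: str, depths: list[float], sizes: list[int]) -> dict:
    """Build one probe context per (depth, size) cell."""
    return {
        (d, n): plant_needle(haystack, needle, d, n)
        for d, n in product(depths, sizes)
    }


# Filler text and needle are placeholders for illustration
filler = "lorem " * 50
grid = build_grid(filler, "NEEDLE", depths=[0.0, 0.5, 1.0], sizes=[10, 40])
```

Each cell would then be sent to the agent with the question, and the per-cell results aggregated along either axis, which is what accuracy_by_position() and accuracy_by_token_count() in the diagram suggest.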

Installation

# Core (probing + classifiers + HTML report)
pip install reguliq-diagnostics

# LangGraph instrumentation
pip install "reguliq-diagnostics[langgraph]"

# Development
pip install "reguliq-diagnostics[dev]"

Running the demos

# Unhealthy agent (synthetic — no API key needed)
python examples/unhealthy_agent_demo.py

# ReguliQ (requires API keys + ReguliQ repo)
python examples/reguliq_demo.py

# ReguliQ with Phase 3 baseline only (no API calls)
python examples/reguliq_demo.py --synthetic

Dev

# Setup (Windows)
uv venv && .venv\Scripts\activate
uv pip install -e ".[dev,langgraph]"

# Test
pytest tests/ -v --cov=context_lens

# Build
uv build

209 tests passing


License

MIT
