
Find out where your coding agent starts degrading. Personal context-rot analytics from your own agent sessions.



contextrot

Your coding agent gets worse as its context fills.
contextrot measures it on your own sessions and tells you exactly what to change.


Quick start

uvx contextrot

or, with plain pip (Python 3.9+ — including the stock python3 on macOS):

pip3 install contextrot
contextrot

That's it. No config, no API keys, no uploads. contextrot reads the session transcripts your agent CLI already keeps on disk and answers a question no other tool answers:

At what context fill does my agent start failing, what's causing it, and what is it costing me?

[Screenshot: contextrot terminal report showing the verdict, the rot curve by context fill with confidence intervals, context composition, and prescriptions]

Every report leads with a plain verdict — one of four honest answers:

| Verdict | Meaning |
|---|---|
| Context rot detected | your failure rate climbs significantly as context fills |
| ! Edge rot | flat until near the window limit, then it climbs; compact before you get there |
| No measurable rot | your failure rate stays flat; your setup is working |
| ? Not enough data | keep using your agent and re-run |
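To make the four verdicts concrete, here is a minimal Python sketch of how per-bucket results could be mapped onto them. It is not contextrot's actual logic: the `Bucket` record, the `MIN_TOTAL_STEPS` cut-off of 200 steps, and the `EDGE_FILL` boundary at 80% fill are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical cut-offs, for illustration only (not contextrot's real values).
MIN_TOTAL_STEPS = 200  # below this many steps, refuse to judge
EDGE_FILL = 0.8        # rot that first appears at or above this fill is "edge rot"


@dataclass
class Bucket:
    fill_lo: float   # lower edge of the context-fill bucket (fraction of the window)
    n: int           # steps observed in this bucket
    degraded: bool   # True if the bucket's confidence floor clears the baseline


def verdict(buckets: List[Bucket]) -> str:
    """Map per-bucket results onto one of the four verdicts."""
    if sum(b.n for b in buckets) < MIN_TOTAL_STEPS:
        return "Not enough data"
    flagged = [b for b in buckets if b.degraded]
    if not flagged:
        return "No measurable rot"
    # Where does degradation first show up?
    first = min(flagged, key=lambda b: b.fill_lo)
    if first.fill_lo >= EDGE_FILL:
        return "Edge rot"
    return "Context rot detected"


print(verdict([Bucket(0.0, 150, False), Bucket(0.5, 80, False), Bucket(0.9, 40, True)]))
# → Edge rot
```

Checking data volume before anything else is what lets a tool like this answer "not enough data" instead of over-reading a handful of sessions.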

A tool that can say "you're fine" is a tool you can trust when it says you're not.

Why a benchmark can't tell you this

Research (Chroma's context-rot report, several 2026 papers) shows LLM output quality degrades as input context grows — even far below the window limit. But that research runs synthetic tasks in lab conditions. Your degradation point depends on your projects, your MCP setup, your model, your prompting style.

contextrot measures it where it actually matters: in your own sessions.

How it works

Agent CLIs like Claude Code log every session to local JSONL transcripts. Each step records token accounting (how much context was sent) and behavioral evidence (tool calls, their results, and the agent's own messages). contextrot extracts five independent failure signals per step and correlates them with context fill at that moment:

| Signal | What it catches |
|---|---|
| Edit failures | the agent tried to edit code and missed; the clearest "lost track of file state" event |
| Retry loops | the same tool call repeated after an error: paying twice for one action |
| Re-reads | re-reading files it already read; content scrolled out of effective attention |
| Self-corrections | "I apologize, let me fix that" |
| Tool errors | any failed tool call |
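The signals above can be sketched roughly as follows. This is a simplified illustration, not contextrot's implementation: it assumes the transcript has already been normalized into hypothetical `Step` records, assumes a 200k-token window (`WINDOW_TOKENS`), treats any failed `Edit` call as an edit failure, and matches self-corrections against a small hypothetical phrase list.

```python
import re
from dataclasses import dataclass
from typing import Iterator, List, Optional, Set

WINDOW_TOKENS = 200_000  # assumed context window size

# Hypothetical phrase list; a real detector would need a broader set.
SELF_CORRECTION = re.compile(r"\b(I apologi[sz]e|let me fix that|my mistake)\b", re.IGNORECASE)


@dataclass
class Step:
    context_tokens: int       # tokens in the prompt when this step ran
    tool: Optional[str]       # tool name such as "Read" or "Edit"; None for text-only steps
    tool_input: str = ""      # normalized tool arguments (e.g. a file path)
    is_error: bool = False    # whether the tool call failed
    text: str = ""            # assistant text emitted at this step


def step_signals(steps: List[Step]) -> Iterator[Set[str]]:
    """Yield the set of failure signals observed at each step."""
    seen_reads: Set[str] = set()
    prev: Optional[Step] = None
    for step in steps:
        signals: Set[str] = set()
        if step.tool is not None and step.is_error:
            signals.add("tool_error")
            if step.tool == "Edit":
                signals.add("edit_failure")  # simplification: any failed Edit counts
        if (prev is not None and prev.is_error and step.tool is not None
                and (step.tool, step.tool_input) == (prev.tool, prev.tool_input)):
            signals.add("retry_loop")  # identical call right after a failure
        if step.tool == "Read":
            if step.tool_input in seen_reads:
                signals.add("re_read")  # simplification: ignores legitimate re-reads after edits
            seen_reads.add(step.tool_input)
        if SELF_CORRECTION.search(step.text):
            signals.add("self_correction")
        prev = step
        yield signals


def bucket_counts(steps: List[Step], n_buckets: int = 10) -> List[List[int]]:
    """Return [steps, steps_with_any_signal] for each context-fill bucket.

    Bucket b covers fill = context_tokens / WINDOW_TOKENS in [b/n, (b+1)/n);
    fills at or above 1.0 land in the last bucket.
    """
    counts = [[0, 0] for _ in range(n_buckets)]
    for step, signals in zip(steps, step_signals(steps)):
        # Integer arithmetic keeps bucket edges exact (no float rounding).
        b = min(step.context_tokens * n_buckets // WINDOW_TOKENS, n_buckets - 1)
        counts[b][0] += 1
        counts[b][1] += 1 if signals else 0
    return counts


example = [
    Step(20_000, "Read", "app.py"),
    Step(40_000, "Edit", "app.py", is_error=True),
    Step(60_000, "Edit", "app.py", is_error=True),  # same failing call again: retry loop
    Step(150_000, "Read", "app.py", text="I apologize, let me fix that."),
]
for fill_bucket, (n, k) in enumerate(bucket_counts(example)):
    if n:
        print(f"fill {fill_bucket * 10}-{fill_bucket * 10 + 10}%: {k}/{n} steps with a signal")
```

Per-bucket counts like these are the raw material a confidence interval would then be computed over.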

Statistics are kept honest: Wilson 95% confidence intervals, per-signal breakdowns, visible n-counts, and a degradation threshold that only gets declared when a bucket's confidence floor clears the baseline — one noisy bucket can't scare you. Full method: docs/methodology.md.
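The Wilson score interval is a standard formula, and the first function below implements it directly. The threshold rule is only one hypothetical reading of the sentence above, not contextrot's documented method: the baseline is assumed to be the pooled failure rate of the lowest-fill buckets (`baseline_buckets` is an invented parameter), and the threshold is the first later bucket whose 95% lower bound exceeds that rate.

```python
import math
from typing import List, Optional, Tuple

Z95 = 1.959963984540054  # two-sided 95% standard-normal quantile


def wilson(k: int, n: int, z: float = Z95) -> Tuple[float, float]:
    """Wilson score interval for k failures out of n steps."""
    if n == 0:
        return (0.0, 1.0)  # no data: the rate could be anything
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (max(0.0, center - half), min(1.0, center + half))


def degradation_threshold(buckets: List[Tuple[float, int, int]],
                          baseline_buckets: int = 2) -> Optional[float]:
    """buckets: (fill_lo, failures, steps), sorted by fill_lo.

    Baseline = pooled failure rate of the first `baseline_buckets` buckets
    (an assumption). Returns the fill of the first later bucket whose Wilson
    lower bound exceeds that baseline, or None if no bucket does.
    """
    base_k = sum(k for _, k, _ in buckets[:baseline_buckets])
    base_n = sum(n for _, _, n in buckets[:baseline_buckets])
    if base_n == 0:
        return None
    baseline = base_k / base_n
    for fill_lo, k, n in buckets[baseline_buckets:]:
        if wilson(k, n)[0] > baseline:
            return fill_lo
    return None


buckets = [(0.0, 5, 100), (0.1, 6, 100), (0.2, 1, 5), (0.3, 30, 100), (0.4, 40, 100)]
print(degradation_threshold(buckets))  # → 0.3
```

In this example the 0.2 bucket has a 20% raw failure rate (1 of 5 steps), but its Wilson floor is only about 3.6%, below the 5.5% baseline, so it is not flagged. That is the "one noisy bucket can't scare you" behavior.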

Commands

contextrot                      # full report, last 30 days
contextrot --days 90            # more history = tighter statistics
contextrot -p myproject         # one project only
contextrot --html report.html   # shareable single-file report (still 100% local)
contextrot --json               # every number, recomputable
contextrot sessions             # list what was parsed

How is this different from…

| Tool | Question it answers | What it can't tell you |
|---|---|---|
| ccusage | "How much did I spend?" | anything about output quality; use both, they're complementary |
| Claude Code /context | "What's in my window right now?" | no outcomes, no history, no correlation |
| Langfuse / Phoenix / MLflow | "How is the app I built behaving?" | they require instrumentation; contextrot analyzes the agent you use, with zero setup |
| Chroma's research | "Do models degrade on benchmarks?" | nothing about your workload; contextrot is the personal-data counterpart |

FAQ

The report says $2,000+ but I'm on a $20/month subscription. Is it broken? No — that figure is the token value of your usage priced at API list rates, labeled as such in the report. It exists because tokens are the resource that fills your context window and burns your rate limits, and dollars are the only unit everyone reads instantly. Two honest readings: it's what your usage would cost pay-per-token (enjoy your subscription), and the "burned in degraded steps" share is the fraction of that resource going to rework. It is not, and never claims to be, your bill.
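The "token value" figure is plain arithmetic: tokens in each category times a per-million-token price, summed. The sketch below shows that calculation and the degraded-step share. The rates in `PRICES_PER_MTOK` are illustrative placeholders, not any provider's actual list prices, and the category names and the `(usage, degraded)` step shape are assumptions.

```python
from typing import Dict, List, Tuple

# Illustrative placeholder rates in USD per million tokens. These are NOT real
# list prices; substitute the published rates for your model.
PRICES_PER_MTOK: Dict[str, float] = {
    "input": 2.00,
    "output": 10.00,
    "cache_read": 0.20,
    "cache_write": 2.50,
}


def token_value(usage: Dict[str, int]) -> float:
    """Dollar value of one step's token usage at the placeholder rates."""
    return sum(usage.get(kind, 0) / 1_000_000 * price
               for kind, price in PRICES_PER_MTOK.items())


def degraded_share(steps: List[Tuple[Dict[str, int], bool]]) -> float:
    """Fraction of total token value spent in steps flagged as degraded."""
    total = sum(token_value(usage) for usage, _ in steps)
    burned = sum(token_value(usage) for usage, degraded in steps if degraded)
    return burned / total if total else 0.0


example_steps = [
    ({"cache_read": 9_000_000, "output": 50_000}, False),
    ({"cache_read": 3_000_000, "output": 10_000}, True),
]
print(f"{degraded_share(example_steps):.1%}")  # → 23.3%
```

Note that this is a valuation of the resource consumed, not a bill: on a subscription, the same tokens still fill the context window and count toward rate limits.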

Why is the token flow so large? Agents re-send the entire conversation to the model on every step. A 100-step session averaging 100k tokens of context moves about 10M tokens (100 × 100k), mostly as cache reads. That's normal; it's also exactly why context bloat matters.
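The 10M figure is just a sum over steps, because each step re-sends the whole conversation. The sketch below shows that arithmetic, plus a split into cache reads under the simplifying assumption that everything sent on the previous step is served from the prompt cache (real cache behavior depends on the provider and on cache expiry).

```python
from typing import List, Tuple


def token_flow(context_per_step: List[int]) -> Tuple[int, int]:
    """Return (total tokens sent, tokens that repeat the previous step's prompt)."""
    total = sum(context_per_step)
    # Assumption: the prefix already sent on the previous step is a cache read.
    cached = sum(min(prev, cur) for prev, cur in zip(context_per_step, context_per_step[1:]))
    return total, cached


# 100 steps at a steady 100k tokens of context.
flat_total, flat_cached = token_flow([100_000] * 100)
print(f"{flat_total:,} tokens, {flat_cached / flat_total:.0%} cache reads")
# → 10,000,000 tokens, 99% cache reads

# 200 steps where context grows by 1k tokens per step (1k .. 200k).
grow_total, grow_cached = token_flow([1_000 * (i + 1) for i in range(200)])
print(f"{grow_total:,} tokens, {grow_cached / grow_total:.0%} cache reads")
# → 20,100,000 tokens, 99% cache reads
```

When context keeps growing, total flow grows roughly with the square of session length, which is why long sessions dominate token totals.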

Correlation isn't causation, right? Right, and the report says so on its face. Deep-context steps are also later-in-task steps. contextrot is an observational diagnostic with conservative statistics, not a lab experiment — see methodology.

What about my privacy? contextrot makes zero network calls. Local files in, terminal/local HTML out. Grep the codebase for an HTTP client — there isn't one.

Supported agents

| Agent | Status |
|---|---|
| Claude Code | ✅ today |
| Codex CLI | planned, adapter wanted |
| OpenCode | planned, adapter wanted |
| Gemini CLI | planned, adapter wanted |
| OpenTelemetry GenAI spans | planned |

An adapter is one small file with a fixture and a test — it's the paved first-contribution path.
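contextrot's real adapter interface is not described here, so the following is only a hypothetical shape of "one small file with a fixture and a test": a parser that turns one agent's JSONL transcript into common step records, plus an inline fixture and a pytest-style test. The event format (`type`, `usage.prompt_tokens`, `tool`, `error`) and the `Step` record are invented for illustration.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class Step:
    context_tokens: int     # prompt size at this step
    tool: Optional[str]     # tool called, if any
    is_error: bool          # whether the tool call failed


def parse_transcript(lines: Iterable[str]) -> Iterator[Step]:
    """Hypothetical adapter for an invented JSONL format (one event per line)."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # e.g. a half-written last line from a session still in progress
        if event.get("type") != "step":
            continue  # skip metadata and other non-step events
        yield Step(
            context_tokens=event["usage"]["prompt_tokens"],
            tool=event.get("tool"),
            is_error=bool(event.get("error")),
        )


# Fixture: a tiny transcript in the invented format, ending in a truncated line.
FIXTURE = """\
{"type": "meta", "cli_version": "1.0"}
{"type": "step", "usage": {"prompt_tokens": 1200}, "tool": "Read"}
{"type": "step", "usage": {"prompt_tokens": 5400}, "tool": "Edit", "error": "no match"}
{"type": "step", "usage": {"prompt_tok
"""


def test_parse_transcript():
    steps = list(parse_transcript(FIXTURE.splitlines()))
    assert [s.context_tokens for s in steps] == [1200, 5400]
    assert [s.is_error for s in steps] == [False, True]
```

Keeping the fixture inline makes the test self-describing: a reviewer can see exactly which transcript shapes the adapter handles, including a truncated final line.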

Roadmap

  • contextrot fix — apply prescriptions interactively (disable unused MCP servers, trim CLAUDE.md) with before/after measurement
  • More agent adapters + OTel ingestion
  • Opt-in, anonymized aggregate stats → the State of Context Rot report: real-workload degradation curves across the community (off by default, aggregate-only, documented schema)

Contributing

See CONTRIBUTING.md. Most valuable first PR: an adapter for the agent CLI you use.

License

MIT


Download files


Source Distribution

contextrot-0.1.6.tar.gz (3.2 MB)

Uploaded Source

Built Distribution


contextrot-0.1.6-py3-none-any.whl (31.5 kB)

Uploaded Python 3

File details

Details for the file contextrot-0.1.6.tar.gz.

File metadata

  • Download URL: contextrot-0.1.6.tar.gz
  • Size: 3.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for contextrot-0.1.6.tar.gz
Algorithm Hash digest
SHA256 58eef939be6d9b59c1ae28748281fd8472fd5e3deaa1ff5d5a668b90a328abf9
MD5 93b327287b8d8b10f3987042b4db47a1
BLAKE2b-256 63e66485f9f83c751d55e9442423f3559451dd43aa424ffe54738008fe11dd29


Provenance

The following attestation bundles were made for contextrot-0.1.6.tar.gz:

Publisher: release.yml on Priyanshu-byte-coder/contextrot

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file contextrot-0.1.6-py3-none-any.whl.

File metadata

  • Download URL: contextrot-0.1.6-py3-none-any.whl
  • Size: 31.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for contextrot-0.1.6-py3-none-any.whl
Algorithm Hash digest
SHA256 20ef28bedda07b45ad4e71a5bd727c9ae4c2eb65eef3ef2ac40cedbae02260e7
MD5 9cc320306b676f6bf6d1c4714fb8c7de
BLAKE2b-256 a44b03fb4b965c58b62085d4bfcdb3067d685eec7c9f2ace4297a16c865cf952


Provenance

The following attestation bundles were made for contextrot-0.1.6-py3-none-any.whl:

Publisher: release.yml on Priyanshu-byte-coder/contextrot

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
