
PreCompact ledger and per-component context profiler for Claude Code


claude-code-context-profiler (ccprofile)


A local PreCompact ledger and per-component context profiler for Claude Code.

Stop Claude Code from forgetting file paths, error codes, and decisions after auto-compaction.

ccprofile is a local hook bundle and CLI for Anthropic's Claude Code that intervenes before auto-compaction runs. It extracts a structured session ledger — files read, files modified, errors seen, user preferences — and re-injects it as a persistent system note that survives compaction.

It also ships ccprofile profile: a ccusage-style attribution view of where your 200K-token context is going right now, with eviction candidates scored by size × staleness.
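As a rough illustration of what a size × staleness score could look like, here is a minimal sketch. The `Component` shape, the `eviction_candidates` function, and the definition of staleness as "turns since last use" are assumptions made for this example, not ccprofile's documented formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One attributable slice of the context window (hypothetical shape)."""
    name: str
    tokens: int          # estimated tokens this component occupies
    last_used_turn: int  # turn index at which it was last referenced


def eviction_candidates(components, current_turn, top_n=3):
    """Rank components by tokens * staleness, largest score first.

    Staleness is assumed to be the number of turns since last use.
    """
    def score(c):
        staleness = max(current_turn - c.last_used_turn, 0)
        return c.tokens * staleness

    ranked = sorted(components, key=score, reverse=True)
    return [(c.name, score(c)) for c in ranked[:top_n]]


# Invented example data, not real profiler output.
components = [
    Component("tool_result:big_log.txt", tokens=40_000, last_used_turn=3),
    Component("file:src/app.py", tokens=12_000, last_used_turn=48),
    Component("CLAUDE.md", tokens=2_000, last_used_turn=0),
]
for name, value in eviction_candidates(components, current_turn=50):
    print(f"{name:28s} {value:>12,d}")
```

A large but recently used file scores low here, while a large tool result that has not been touched for many turns rises to the top, which is the intuition behind multiplying the two factors.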

Headline

On 12 real Claude Code sessions, the structured ledger beats a real Anthropic-Haiku-driven compaction summary by +33.5 pp [+24.5, +46.1] overall recall at the same token budget. 95% CI excludes zero across five paired comparisons. See Benchmark results for the full table and docs/benchmark-v0.md for the report.

Quickstart

# From PyPI (CLI is named `ccprofile`; the unrelated `ccprofile` PyPI
# package is something else — install the distribution name shown here):
pip install claude-code-context-profiler

# Or from source:
git clone https://github.com/ks7585/claude-code-context-profiler.git
cd claude-code-context-profiler
python -m venv .venv && . .venv/bin/activate
pip install -e .

# Where is your context going right now?
ccprofile profile

# Install the PreCompact hook (preview first; --apply writes settings.json
# with a timestamped backup):
ccprofile install
ccprofile install --apply

# Reproduce the benchmark on your own ~/.claude/projects/ corpus:
ccprofile bench run --sweep-k 50,100,200,500,1000 --bootstrap-iters 10000
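The preview-then-apply behavior mentioned in the comments above (write settings.json, keep a timestamped backup) could look roughly like the following. This is a sketch under stated assumptions, not ccprofile's actual installer: the function name, the backup naming scheme, and the placeholder hook command are hypothetical, and the shape of the hooks entry is assumed from the general layout of Claude Code hook settings and should be checked against the Claude Code documentation.

```python
import json
import shutil
import tempfile
import time
from pathlib import Path


def add_precompact_hook(settings_path, command, apply=False):
    """Return settings with a PreCompact command hook merged in.

    With apply=False nothing is written (preview only). With apply=True the
    existing file is copied to a timestamped backup before being overwritten.
    """
    settings = json.loads(settings_path.read_text()) if settings_path.exists() else {}
    entries = settings.setdefault("hooks", {}).setdefault("PreCompact", [])

    # Avoid registering the same command twice on repeated installs.
    already = any(
        hook.get("command") == command
        for entry in entries
        for hook in entry.get("hooks", [])
    )
    if not already:
        entries.append({"hooks": [{"type": "command", "command": command}]})

    if apply:
        if settings_path.exists():
            stamp = time.strftime("%Y%m%d-%H%M%S")
            backup = settings_path.with_name(f"{settings_path.name}.{stamp}.bak")
            shutil.copy2(settings_path, backup)
        settings_path.parent.mkdir(parents=True, exist_ok=True)
        settings_path.write_text(json.dumps(settings, indent=2) + "\n")
    return settings


# Demo against a throwaway directory; the command string is a placeholder.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "settings.json"
    path.write_text('{"model": "example"}\n')
    preview = add_precompact_hook(path, "echo hypothetical-precompact-hook")
    print(json.dumps(preview, indent=2))
```

Keeping the preview path free of writes means the default invocation is always safe to run, and the backup gives a one-step rollback if the merged settings turn out to be wrong.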

The profile view

[Screenshot: output of ccprofile profile]

Why

Anthropic's native auto-compaction is a token-saving win, but it routinely loses the things engineers care most about: specific file paths, error codes, decisions, and "approaches we already ruled out." Users work around this with prompts such as "/compact preserve the coding patterns we established", custom CLAUDE.md notes, and external memory plugins. None of those workarounds has been measured.

ccprofile does two things:

  1. Intervention. A PreCompact hook extracts the structured ledger before compaction sees the conversation, so compaction's lossy summarizer operates on less noise and the ledger is re-injected after compaction completes.
  2. Measurement. A paired-run benchmark over real Claude Code session transcripts compares oracle / simulated baseline / real-API baseline / ledger conditions with lexical and LLM-graded semantic probes and percentile-bootstrap CIs.
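To make the intervention step more concrete, the sketch below builds a small ledger from a list of session events and renders it as a text note. The event dictionaries, the `Ledger` class, and the regex heuristics for errors and preferences are simplified assumptions for illustration; they are not Claude Code's real transcript schema or ccprofile's extraction rules.

```python
import re
from dataclasses import dataclass, field

# Heuristics assumed for this sketch only.
ERROR_CLASS = re.compile(r"\b[A-Z][A-Za-z]*(?:Error|Exception)\b")
PREFERENCE_CUE = re.compile(r"\b(always|never|prefer)\b", re.IGNORECASE)


def _add_unique(items, value):
    """Append value while preserving first-seen order and uniqueness."""
    if value not in items:
        items.append(value)


@dataclass
class Ledger:
    files_read: list = field(default_factory=list)
    files_modified: list = field(default_factory=list)
    errors: list = field(default_factory=list)
    preferences: list = field(default_factory=list)

    def render(self):
        """Render non-empty sections as a plain-text note."""
        sections = [
            ("Files read", self.files_read),
            ("Files modified", self.files_modified),
            ("Errors seen", self.errors),
            ("User preferences", self.preferences),
        ]
        lines = ["[session ledger]"]
        for title, items in sections:
            if items:
                lines.append(f"{title}:")
                lines.extend(f"  - {item}" for item in items)
        return "\n".join(lines)


def extract_ledger(events):
    """Walk simplified events and collect exact strings worth preserving."""
    ledger = Ledger()
    for ev in events:
        kind = ev.get("kind")
        if kind == "tool_use" and ev.get("tool") == "Read":
            _add_unique(ledger.files_read, ev["path"])
        elif kind == "tool_use" and ev.get("tool") in {"Edit", "Write"}:
            _add_unique(ledger.files_modified, ev["path"])
        elif kind == "tool_result":
            for name in ERROR_CLASS.findall(ev.get("text", "")):
                _add_unique(ledger.errors, name)
        elif kind == "user" and PREFERENCE_CUE.search(ev.get("text", "")):
            _add_unique(ledger.preferences, ev["text"].strip())
    return ledger


# Invented events for demonstration.
events = [
    {"kind": "tool_use", "tool": "Read", "path": "src/app.py"},
    {"kind": "tool_use", "tool": "Edit", "path": "src/app.py"},
    {"kind": "tool_result", "text": "Traceback ... KeyError: 'user_id'"},
    {"kind": "user", "text": "Always use pytest, never unittest."},
    {"kind": "tool_use", "tool": "Read", "path": "src/app.py"},
]
print(extract_ledger(events).render())
```

The point of the structure is that paths and error names are stored verbatim, so they can be re-injected after compaction without passing through a summarizer.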

What it isn't

  • Not a memory framework. Cross-session memory is solved (Anthropic's memory tool, claude-mem, mem0, Letta). ccprofile operates inside one session.
  • Not a generic observability platform. Langfuse and ccusage already do that.
  • Not a competitor to native compaction. It complements it.

Benchmark results

Measured on 12 real Claude Code sessions under five evaluation conditions, with 10,000-resample percentile bootstrap CIs (sessions paired across conditions). Full report: docs/benchmark-v0.md. Harness: bench/README.md.

| Condition | Overall recall ± 95% CI | file_read | file_modified | error_class | Avg context tokens |
|---|---|---|---|---|---|
| oracle (no compaction) | 99.0% [98.3, 100.0] | 100.0% [100.0, 100.0] | 100.0% [100.0, 100.0] | 94.9% [91.9, 100.0] | 114,389 |
| baseline (simulated lossy compaction) | 28.1% [12.7, 67.6] | 35.3% [13.2, 73.6] | 13.8% [5.8, 53.3] | 47.5% [26.4, 77.8] | 25,724 |
| real_baseline (Claude-Haiku-driven compaction) | 29.1% [13.2, 68.5] | 36.2% [13.7, 74.7] | 15.2% [6.3, 54.8] | 47.5% [26.4, 77.8] | 26,606 |
| ledger (synthetic baseline + ccprofile) | 62.6% [43.5, 98.8] | 84.5% [70.0, 100.0] | 45.7% [24.4, 100.0] | 59.3% [32.2, 95.6] | 26,728 |
| real_ledger (real baseline + ccprofile) | 62.6% [43.5, 98.8] | 84.5% [70.0, 100.0] | 45.7% [24.4, 100.0] | 59.3% [32.2, 95.6] | 27,611 |

Headline deltas (paired bootstrap, 95% CI):

| Comparison | Δ overall | What it means |
|---|---|---|
| ledger − baseline | +34.5 pp [+24.6, +49.3] | Ledger vs the simulated lower bound |
| real_ledger − real_baseline | +33.5 pp [+24.5, +46.1] | Ledger vs a real Anthropic-style auto-compactor |
| baseline − real_baseline | −1.0 pp [−4.1, 0.0] | Synthetic vs real summary: statistically equivalent |
| ledger − real_baseline | +33.5 pp [+24.5, +46.1] | Ledger beats the real compactor |

All four ledger-vs-non-ledger CIs exclude zero. The "but a real compactor would close this gap" objection is empirically refuted: at the same token budget, a model-driven summary recovers only ~1 pp more facts than the synthetic header.
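For readers unfamiliar with the method, a paired percentile bootstrap over per-session recall differences can be sketched as below. The function names and the per-session numbers are invented for illustration and are not the benchmark's data or the harness's code; the only assumption carried over from the text is that sessions are paired across conditions and resampled with replacement.

```python
import random


def percentile(sorted_vals, q):
    """Pick the value at quantile q (0..1) from a pre-sorted list."""
    idx = min(int(q * len(sorted_vals)), len(sorted_vals) - 1)
    return sorted_vals[idx]


def paired_bootstrap_delta(a, b, iters=10_000, seed=0, alpha=0.05):
    """Mean paired difference a - b with a percentile-bootstrap CI.

    Sessions are resampled as pairs, so each resample keeps the same
    session's scores together across both conditions.
    """
    if len(a) != len(b) or not a:
        raise ValueError("a and b must be non-empty and the same length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    rng = random.Random(seed)
    means = sorted(sum(rng.choices(diffs, k=n)) / n for _ in range(iters))
    point = sum(diffs) / n
    return point, percentile(means, alpha / 2), percentile(means, 1 - alpha / 2)


# Invented per-session recall values (one entry per session).
with_ledger = [0.62, 0.71, 0.55, 0.80, 0.66, 0.59]
without_ledger = [0.30, 0.28, 0.21, 0.45, 0.33, 0.25]

point, lo, hi = paired_bootstrap_delta(with_ledger, without_ledger)
print(f"delta = {point * 100:+.1f} pp [{lo * 100:+.1f}, {hi * 100:+.1f}]")
```

Resampling the differences rather than each condition independently is what makes the comparison paired: between-session variance, which is large with only 12 sessions, cancels out of the delta.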

Robustness: ledger wins across all five keep_last_k settings swept (K ∈ {50, 100, 200, 500, 1000}). Even at K=1000, the delta CI is +24.8 pp [+10.4, +33.5].
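One way to picture the keep_last_k sweep and the lexical probe is the toy model below: a simulated compactor keeps the last K messages and replaces the rest with a count-only header, and recall is the fraction of ground-truth strings still present verbatim. The function names, the header wording, and the example messages are assumptions for this sketch and may differ from the actual harness in bench/README.md.

```python
def simulate_compaction(messages, keep_last_k):
    """Keep the last K messages; summarize the rest as a count-only header."""
    if keep_last_k < 1:
        raise ValueError("keep_last_k must be at least 1")
    dropped = messages[:-keep_last_k]
    kept = messages[-keep_last_k:]
    if not dropped:
        return list(kept)
    header = f"[compacted: {len(dropped)} earlier messages omitted]"
    return [header] + list(kept)


def lexical_recall(facts, context_messages):
    """Fraction of facts that appear as exact substrings of the context."""
    if not facts:
        raise ValueError("facts must be non-empty")
    text = "\n".join(context_messages)
    return sum(1 for fact in facts if fact in text) / len(facts)


# Invented session and ground truth.
messages = [
    "Read src/db.py",
    "Test failed with TimeoutError",
    "Discussed retry strategy",
    "Edited src/api.py",
    "Ran tests again",
    "All tests pass",
]
facts = ["src/db.py", "TimeoutError", "src/api.py"]
ledger_note = "[session ledger] files: src/db.py, src/api.py; errors: TimeoutError"

for k in (1, 3, 6):
    compacted = simulate_compaction(messages, k)
    base = lexical_recall(facts, compacted)
    with_ledger = lexical_recall(facts, [ledger_note] + compacted)
    print(f"K={k}: baseline={base:.2f} ledger={with_ledger:.2f}")
```

As K grows, more of the original conversation survives and the baseline catches up, which is why a sweep over K is a reasonable robustness check for the ledger's advantage.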

Cost: ledger overhead ~1,005 tokens (3.9% of baseline context). Extraction latency: ~41 ms per session.

Semantic probe (LLM-graded)

Per fact, claude-haiku-4-5-20251001 is asked "did the agent X earlier in this session? YES / NO" against each condition's post-compaction context. Positive questions come from ground truth; negatives from a deterministic distractor pool.

| Category | baseline | real_baseline | ledger | real_ledger |
|---|---|---|---|---|
| file_read | 35.8% | 37.7% | 86.8% | 83.0% |
| file_modified | 51.2% | 41.5% | 78.0% | 78.0% |
| error_class | 51.3% | 43.6% | 66.7% | 71.8% |
| aggregate | 45.1% | 40.6% | 78.2% | 78.2% |
| specificity | 100% | 100% | 100% | 100% |

Two-probe convergence: lexical and semantic methodologies agree on the same effect, at similar magnitude. Specificity = 100% on every cell — the model never accepts a distractor, confirming the YES answers are grounded in the context rather than the model's priors.
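The recall and specificity figures can be read as the two halves of a YES/NO confusion matrix. The sketch below shows that bookkeeping; the `probe_metrics` function and the sample answers are hypothetical and only illustrate how positives from ground truth and negatives from a distractor pool would be scored.

```python
def probe_metrics(answers):
    """Compute recall and specificity from (expected_yes, model_said_yes) pairs.

    Positive questions come from ground truth (expected_yes=True); negative
    questions come from distractors (expected_yes=False).
    """
    tp = fn = tn = fp = 0
    for expected_yes, said_yes in answers:
        if expected_yes and said_yes:
            tp += 1
        elif expected_yes:
            fn += 1
        elif said_yes:
            fp += 1
        else:
            tn += 1
    recall = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return {"recall": recall, "specificity": specificity}


# Invented grader output: four positive probes, two distractor probes.
answers = [
    (True, True),
    (True, False),
    (True, True),
    (True, True),
    (False, False),
    (False, False),
]
print(probe_metrics(answers))
```

A specificity of 1.0 means no distractor was accepted, so a high recall cannot be explained by the grader simply answering YES to everything.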

A notable finding: on file_modified and error_class, the real_baseline semantic recall is lower than the synthetic baseline. The model compactor writes generic prose ("the agent fixed several errors") that obscures specific facts, while the synthetic header at least preserves counts. The structured ledger preserves exact strings, which a downstream LLM can recover.

Reproduce on your own session corpus

# Lexical-only headline (no API key needed):
ccprofile bench run \
    --sweep-k 50,100,200,500,1000 \
    --bootstrap-iters 10000 --seed 0

# Add the LLM-graded semantic probe:
ccprofile bench run \
    --semantic-probe --probe-concurrency 2 \
    --bootstrap-iters 10000 --seed 0

# Add the real Anthropic-API-driven compaction baseline
# (requires Anthropic tier 2 — input is ~150K tokens per session):
ccprofile bench run \
    --semantic-probe --real-baseline \
    --exclude <current_session_id> \
    --bootstrap-iters 10000 --seed 0

Documentation

Status

Pre-alpha but functional.

PyPI distribution name: claude-code-context-profiler. The short package name ccprofile on PyPI is held by an unrelated single-release "Claude Code permission profile manager" project and is not this package.

License

MIT.
