Skip to main content

Read-only linter for .claude/ and CLAUDE.md — measures token budget and audits agent/skill quality.

Project description

claude-config-auditor

tests

A non-destructive linter and cost auditor for .claude/ and CLAUDE.mdwe annotate, we don't rewrite. Measures the token cost your Claude Code config pays on every session, and audits agent / skill quality (missing descriptions, overlapping routes, broken YAML).

Think of it as ESLint for your context window.

Why this exists

Claude Code loads CLAUDE.md, every .claude/agents/*.md, and every skill's SKILL.md into the context window on every session start. That's a fixed per-session tax — and most projects don't know how big theirs is.

The ecosystem has a lot of "session handoff" and "state management" tools. It doesn't have a linter for the config itself. This is that linter.

  • "How many tokens does my CLAUDE.md actually cost?"
  • "Are two of my agents describing the same job?" (Anthropic's docs say agents are selected based on their description; the exact ranking algorithm isn't published, but overlapping descriptions create routing ambiguity in practice.)
  • "Is my SKILL.md description too vague for Claude to ever invoke it?"

Cost snapshot — five popular frameworks

The headline claude-audit --html numbers from a clean install of five popular Claude Code frameworks, audited with --semantic on:

Framework Always loaded Window Files Findings
BMAD 2,304 tok 1.2% 45 0
wshobson (4 plugins) 2,378 tok 1.2% 41 0
claude-flow (now ruflo) 3,631 tok 1.8% 194 4 warn
SuperClaude 3,725 tok 1.9% 52 0
Claude-Code-Game-Studios 17,267 tok 8.6% 138 0

"Always loaded" is what Claude actually pulls into the main session at startup — full CLAUDE.md + full rules/ + only the YAML frontmatter of every agent and skill. "Window" is that figure expressed as a percentage of a 200k reference context window. "Findings" is the count of agent / skill / health issues the auditor flagged.

The 1.2% → 8.6% spread is informative on its own. A lean framework audits quietly; a heavy one flags exactly where to thin down. Full HTML reports — open in any browser, no network — live under case-studies/.

Produce your own row in this table:

pipx install git+https://github.com/emreyildirim/claude-config-auditor.git
claude-audit ~/your-project --html report.html

Open report.html in any browser. No network calls, nothing modified — just your numbers next to the five above.

What it does

Two modes — audit is the default and is read-only; fix is opt-in and prompts before every change.

audit (default, read-only)

  • Counts tokens for CLAUDE.md, every agent, every skill, every rule, and every slash command.
  • Splits the cost into always-loaded (full CLAUDE.md + full rules + agent/skill frontmatter) and on-demand (agent/skill bodies + slash commands, loaded when the agent runs, the skill is invoked, or the user types /<command>). The always-loaded number is what actually competes for context-window space at session start.
  • Reports both numbers and the always-loaded share of a typical 200k window.
  • Lints agent and skill frontmatter (missing fields, descriptions that are too short or too long, malformed YAML). Per-file eager footprint checks (AGT007 / SKL005) flag the kind of bloat that costs you on every session, not just files that happen to be large.
  • Detects overlapping agent description fields by simple word-overlap.
  • Recognises common third-party framework installs (BMAD, claude-flow, agent-pack / skill-pack / command-pack shapes) and adds that context to relevant findings so a missing CLAUDE.md is read as "intentional, ignore if it suits you" rather than scolding.
  • Outputs a human-readable terminal report, JSON (--json), or a self-contained HTML report with charts (--html).
  • Never modifies any files. Verified by an automated mtime/size snapshot test.

fix (opt-in, prompts before every change)

  • Walks fixable findings one by one: rationale → unified diff → +/- summary → explicit y/n/a/q prompt.
  • Two proposers shipped today: agent-description fixes for AGT003AGT008 (annotates frontmatter with # TODO YAML comments — Claude ignores them so behaviour is unchanged the moment the fix applies), and CLAUDE.md archive (moves stale sections into a sibling CLAUDE.archive.md using conservative veto heuristics).
  • Every accepted change is backed up with SHA-256 manifests; claude-audit revert restores any session and refuses to overwrite hand-edited files unless --force is passed.
  • --dry-run previews without writing; --apply-all batches approval (still prints every diff) for non-interactive use.

What it does not do

  • It does not call the Claude API. Everything is offline.
  • It does not hook into a live session.
  • It does not silently modify your files. audit is read-only; fix is opt-in and prompts before every change.

Design principles — why we annotate, not rewrite

Unlike LLM-autofix linters in this space, this tool refuses to invent agent / skill description text on a developer's behalf. The fix mode either annotates (inserts a discoverable # TODO marker above the field that needs work) or moves content mechanically (relocating a stale CLAUDE.md section into a sibling archive). It never produces new prose. Five reasons that's the contract:

  1. Predictability. A given finding produces the same diff every time. The auditor is a function, not a probabilistic generator — claude-audit fix --dry-run today and a month later show the same output for the same input.
  2. Reversibility. Every applied change is backed up with a SHA-256 manifest, and revert refuses to overwrite hand-edits unless --force is passed. The smaller and more mechanical the change, the more meaningful "revert" is — a one-line YAML comment reverts cleanly; an LLM rewrite that touched a dozen tokens does not.
  3. No API key, no network, no per-run cost. The tool runs offline by default. Nothing about your config is uploaded to a third party. (The optional --accurate flag described below is the single, explicit opt-out and it never changes default behaviour.)
  4. No model-version drift. Heuristics are pinned and live in this repo. An LLM-based linter's output depends on whichever model version it happens to call — same project, different month, different fix.
  5. Human-in-the-loop by design. Writing an agent's description: is a product decision (it shapes how Claude routes to it). A linter should surface the problem, not impersonate the developer making that call.

The roadmap sticks to this contract. The default fix behaviour does not change.

Install

Requires Python 3.10+ on the machine running the auditor. The target project can be in any language — the auditor only reads Markdown and YAML, never executes target code (see FAQ below).

Recommended: pipx (one-time install, works everywhere)

pipx installs Python CLI tools into isolated virtual environments and puts the executables on your PATH. The claude-audit command then works from any directory, regardless of which project venv you happen to have active.

# One-time: install pipx itself if you don't have it.
brew install pipx        # macOS
# or:  python3 -m pip install --user pipx  &&  pipx ensurepath

# Install the auditor (tiktoken comes with it as a hard dependency).
pipx install git+https://github.com/emreyildirim/claude-config-auditor.git

After that, claude-audit --help works from any project directory.

From source (for contributing)

git clone https://github.com/emreyildirim/claude-config-auditor.git
cd claude-config-auditor

python3 -m venv .venv
source .venv/bin/activate
pip install -e '.[dev]'    # editable install + pytest (tiktoken comes from the base deps)

pytest                      # run the test suite
claude-audit --help         # verify it works

Use as a pre-commit hook

The repo ships a .pre-commit-hooks.yaml so the auditor can be wired into the pre-commit framework directly. Add to your project's .pre-commit-config.yaml:

repos:
  - repo: https://github.com/emreyildirim/claude-config-auditor
    rev: main   # pin to a tag or SHA in real usage
    hooks:
      - id: claude-audit
        args: [--fail-on, error]

Then pre-commit install registers the hook. It runs only when files under .claude/ or CLAUDE.md change, exits non-zero on blocking findings (so the commit is rejected), and stays silent otherwise. The --fail-on flag controls how strict the gate is: error (default recommendation) blocks only on real problems; warning is stricter; never makes the hook informational.

Use

The tool has three subcommands. The first two — audit (default) and any of the flag-only invocations — are strictly read-only. The third — fix — is opt-in and asks before every change.

audit — read-only report

# Audit the current directory.
claude-audit

# Audit a specific project.
claude-audit ~/code/my-project

# Machine-readable output.
claude-audit --json > report.json

# Standalone HTML report with charts (opens offline, no network).
claude-audit ~/code/my-project --html report.html

# Fail in CI when there are blocking issues.
claude-audit --fail-on warning

# Custom CLAUDE.md token budget (default 5000).
claude-audit --budget 3000

# Route token counts through Anthropic's count_tokens endpoint for
# ground-truth accuracy. Opt-in only; requires ANTHROPIC_API_KEY in
# the environment. Per-file results are cached at
# ~/.cache/claude-config-auditor/ so repeat audits don't re-hit the API.
ANTHROPIC_API_KEY=sk-ant-... claude-audit --accurate

# Pin the model used for count_tokens (default: claude-sonnet-4-5).
claude-audit --accurate --accurate-model claude-haiku-4-5

# Re-evaluate the AGT008 word-overlap candidates with semantic
# embeddings. Requires the [semantic] extras package and downloads
# an ~80MB MiniLM model the first time it runs.
pip install 'claude-config-auditor[semantic]'
claude-audit --semantic

audit never modifies any file. The default behaviour stays this way forever — Phase 2 was deliberately put behind an explicit subcommand.

fix — propose and apply changes (Phase 2, opt-in)

Walks you through fixable findings. For each one you see:

  • a one-line rationale,
  • a unified diff of what would change (red for removals, green for adds),
  • a per-file +/- summary,
  • and an explicit prompt: [y]es / [n]o / [a]ll-remaining / [q]uit.

Everything applied is backed up under .claude-config-auditor/backups/<timestamp>/ inside the target. Revert is one command away.

# Preview without prompting or writing anything (safest first step).
claude-audit fix . --dry-run

# Interactive: see diff, answer y/n/a/q per change.
claude-audit fix .

# Batch approval (still prints every diff, just skips the per-change
# prompt). Required when stdin is not an interactive terminal (CI).
claude-audit fix . --apply-all

# Put backups somewhere else (default is inside the target).
claude-audit fix . --backup-dir ~/backups/

What fix can currently propose:

  • Annotate weak / overlapping agent descriptions (codes AGT003AGT008). Inserts # TODO (claude-audit, AGTxxx) YAML comments above the description: field. Claude ignores these at load time, so behaviour is unchanged the moment the fix applies; the TODO marks where you need to revise.
  • Move stale CLAUDE.md sections into a sibling archive (code HLT001). Conservative heuristics: protected headings (Rules, Conventions, …) and sections with operational language ("always", "must", "before using", …) are never archived. Each moved section leaves a pointer in the source so the outline survives.

The two proposers ship intentionally different philosophies. agent_description annotates — the TODO marker is a grep-friendly hint, the description text itself is left untouched on purpose, because the auditor will not invent wording on a developer's behalf. claude_md_archive edits — a section is physically moved into a sibling file because the operation is mechanical and reversible. A future Phase 3 may add LLM-assisted description rewriting on top of the existing annotation; for now, the split is the contract.

Example dry-run output is at examples/sample-fix-output.md.

revert — undo a fix run

# Enumerate backup sessions for a target.
claude-audit revert . --list

# Restore the most recent session.
claude-audit revert .

# Restore a specific session by id.
claude-audit revert . 2026-05-19T11-56-05Z-7ee17f

# If you've hand-edited the fix's output and want to overwrite anyway.
claude-audit revert . --force

revert checks each file's SHA-256 against what was on disk when the fix completed. If a file has drifted (you edited it between apply and revert), the revert is refused — your later edits are not silently destroyed. --force opts out of this check.

Safety guarantees (still true with fix in the picture)

  • audit never touches a file. The automated test test_auditor_does_not_modify_target snapshots every file's mtime and size before and after a full run and asserts they're identical.
  • fix never empties or deletes a file. Proposals can edit existing files or create new ones; an "empty after" is rejected at the data model.
  • Every applied change is backed up plain-text and discoverable. .claude-config-auditor/backups/<id>/manifest.json lists every file touched, with SHA-256 before and after.
  • The HTML report writer refuses to write inside the audited target.

Example output

  • Terminal: examples/sample-report.md — annotated terminal output from clean and broken fixtures.
  • HTML dashboard: examples/sample-audit.html — full HTML report (download to open offline, or view raw on GitHub). Light/dark theme, expandable file list, severity-coloured findings, hover tooltips on every metric.

Quick terminal taste:

Always-loaded session footprint
  ~256 tokens  (0.1% of 200k (typical Claude Code default))
  + ~154 tokens on-demand (agent/skill bodies, loaded when invoked)
  The always-loaded figure is paid on every Claude Code session.

By category  (eager / on-demand / total)
  claude.md    1 file(s)   ~80 / — / ~80
  agent        2 file(s)   ~117 / ~108 / ~225
  skill        1 file(s)   ~59 / ~46 / ~105
  command      2 file(s)   ~0 / ~140 / ~140

Findings  0 error  0 warning  0 info
  No issues found.

Slash commands appear with ~0 eager weight because Claude Code does not pull .claude/commands/*.md into context until the user types /<command>.

Case studies

Five real audits against popular Claude Code frameworks (BMAD, claude-flow / ruflo, SuperClaude, wshobson, Claude-Code-Game-Studios) live under case-studies/. Each file is the raw HTML report claude-audit --html --semantic produced on a fresh install — same metric tuning the rest of the README refers to. Use them as a baseline when re-running the auditor against a new release of one of these frameworks, or as a sanity check that the tool produces sensible numbers on a project you trust.

Working with the JSON output

--json writes a machine-readable report to stdout. Pipe it to a file or another tool. The full schema is in examples/sample-report.md; a few common recipes follow.

Save a report:

claude-audit ~/some-project --json > report.json

Pretty-print and inspect top-level keys (requires jq):

claude-audit ~/some-project --json | jq 'keys'

Just the headline numbers:

claude-audit ~/some-project --json | jq '{
  always_loaded: .eager_load_total_tokens,
  on_demand: .on_demand_total_tokens,
  window_pct: .percent_of_window
}'

Top 10 biggest files:

claude-audit ~/some-project --json | jq '.files[:10] | map({
  path: .relpath,
  tokens: .tokens,
  category
})'

Only the errors (blocking findings):

claude-audit ~/some-project --json |
  jq '.findings | map(select(.severity == "error"))'

Count findings by code (useful in CI dashboards):

claude-audit ~/some-project --json |
  jq '[.findings[] | .code] | group_by(.) |
      map({code: .[0], count: length})'

Fail the build only when there are AGT008 overlaps:

overlaps=$(claude-audit ~/some-project --json |
           jq '[.findings[] | select(.code=="AGT008")] | length')
if [ "$overlaps" -gt 0 ]; then
  echo "Agent description overlaps detected — fix before merging."
  exit 1
fi

Python (no jq needed):

import json, subprocess
data = json.loads(subprocess.check_output(
    ["claude-audit", "/path/to/project", "--json"]
))
big = [f for f in data["files"] if f["tokens"] > 5_000]
for f in big:
    print(f["tokens"], f["relpath"])

FAQ

Does this work on non-Python projects? React, Vue, Go, Ruby, Rust…?

Yes — any project, any stack. The auditor only reads Markdown (CLAUDE.md) and YAML (the frontmatter inside .claude/agents/*.md and .claude/skills/*/SKILL.md). It never executes the target project's code, never invokes npm / cargo / go / bundle, never parses application source files.

The only requirement is that the machine running the auditor has Python 3.10+ available. If you install via pipx, that Python lives inside the pipx-managed venv and does not interact with whatever runtime your project uses.

Concrete examples — all work without any extra setup:

my-react-app/    ← npm/Vite project; node_modules/ and .next/ are skipped
my-go-api/       ← go.mod project; vendor/ is skipped
rails-monolith/  ← Ruby on Rails; vendor/, tmp/ are not traversed
rust-cli/        ← Cargo project; target/ is skipped
iOS-app/         ← Xcode project; Pods/, Library/ are skipped

The scanner's directory skip-list covers Rust (target), Go/Ruby/PHP (vendor), JS/TS (node_modules, .next, .nuxt, .turbo, .svelte-kit), iOS (Pods), Android (.gradle, .mvn), Terraform (.terraform), IDE state (.idea, .vscode), and the usual Python caches.

Will the auditor modify any of my files?

Only if you explicitly run claude-audit fix. The default invocation (claude-audit, claude-audit --json, claude-audit --html …) is strictly read-only — the automated test test_auditor_does_not_modify_target snapshots every file's mtime and size before and after a full audit run and fails the suite if anything changes. The HTML report writer also refuses to write inside the audited target directory; you must pass an output path elsewhere.

The Phase 2 fix subcommand (opt-in, never the default) can modify files, but only after:

  1. Showing each change as a unified diff with a per-file +/- summary.
  2. Asking for explicit [y]es / [n]o / [a]ll-remaining / [q]uit approval per proposal. --apply-all batches the approval but still prints every diff first; nothing is ever applied silently.
  3. Writing a full backup with SHA-256 manifests under .claude-config-auditor/backups/<session>/, so claude-audit revert can restore the project to its pre-fix state.

If you never type the literal word fix, no file in your project is ever written to.

Why are the token counts called "estimates"?

Anthropic does not publish the Claude 3+/4 tokenizer's vocabulary, so no fully-offline tool can compute an exact count. By default the auditor uses tiktoken with the cl100k_base encoding (OpenAI's GPT-4 tokenizer) — empirically within ~5-10% of Anthropic's count for the Markdown/YAML content the tool actually scans. If tiktoken cannot be imported (a stripped-down CI image, a no-network install), the auditor falls back to a character-based heuristic at ~4.5 chars/token (tuned against five popular Claude Code frameworks in May 2026). The report explicitly names which method was used so the uncertainty is visible.

To force the heuristic even when tiktoken is installed (useful for benchmarking or for cross-machine comparisons where tiktoken versions differ), set the env var:

CLAUDE_AUDIT_TOKENIZER=heuristic claude-audit ~/my-project

If Anthropic publishes a vendored tokenizer or a count_tokens model suitable for offline use, we'll wire it in and the numbers will sharpen.

Roadmap

  • Phase 1 — shipped: read-only audit for .claude/ and CLAUDE.md.
  • Phase 2 — shipped: opt-in fix mode (annotates weak agent descriptions, archives stale CLAUDE.md sections) and revert with drift detection.
  • Phase 2.5 — shipped: opt-in --accurate flag routes token counts through Anthropic's public count_tokens endpoint. Requires ANTHROPIC_API_KEY in the environment (hard error if missing — the flag is an explicit opt-in and refuses to silently fall back). Each unique (text, model) is counted once and cached under ~/.cache/claude-config-auditor/ so repeat audits don't re-hit the API. Default tokenizer remains tiktoken cl100k_base; nothing about the offline contract changes unless the flag is passed.
  • Phase 3 — shipped: opt-in semantic agent overlap (AGT008-S). Default AGT008 stays at info (word-overlap heuristic). Installing the extras package and passing --semantic re-evaluates every Jaccard candidate pair against cosine similarity over MiniLM sentence embeddings (sentence-transformers/all-MiniLM-L6-v2, ~80MB local model, no network call at audit time after the one-off download). Pairs whose cosine ≥ 0.82 are upgraded to warning and carry both scores in the message; pairs below the threshold are dropped from the report. Default install behaviour is unchanged.

License

MIT — see LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

claude_config_auditor-0.1.0.tar.gz (102.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

claude_config_auditor-0.1.0-py3-none-any.whl (77.0 kB view details)

Uploaded Python 3

File details

Details for the file claude_config_auditor-0.1.0.tar.gz.

File metadata

  • Download URL: claude_config_auditor-0.1.0.tar.gz
  • Upload date:
  • Size: 102.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.5

File hashes

Hashes for claude_config_auditor-0.1.0.tar.gz
Algorithm Hash digest
SHA256 74acc6eb2ce4b812d45e6e6d6d7078ebd9c0bf7d933395731bf27ab0ac844c84
MD5 51ca81feba5dd2ecf9d767e519d9926d
BLAKE2b-256 812a55c55592ab97b8c803c2decdb57890b94fe65b40be398cfad534d57b3431

See more details on using hashes here.

File details

Details for the file claude_config_auditor-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for claude_config_auditor-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 53e87a690c73674d7f4ea35bf3630043144c5debe80d2b4de697cee458fea7d1
MD5 f1466b920c1b3efa53fdc9ffc2491ff7
BLAKE2b-256 5a7df0464857d100c585f56f502212645048c7f08cb9464b31e4c6472957eada

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page