A read-only linter and A-F maturity grader for coding-agent harnesses (Claude Code, Codex).
Harness Scorecard
A read-only linter and A–F maturity grader for coding-agent harnesses. Point it at a
Claude Code or Codex setup — Claude Code's hooks, permissions, rules/*.md, agents, and
CLAUDE.md, or Codex's config.toml (sandbox, approval policy, trust levels), hooks.json,
and AGENTS.md — and it returns a graded scorecard: the overall maturity grade, the specific
gaps, and the guards that are missing, each with rationale. The harness type is auto-detected.
"Harness engineering" became a named discipline in 2026 and everyone is assembling harnesses with no way to tell if theirs is any good. The rubric is the product: every check traces to a documented red-team failure mode, not generic advice.
What it looks like
$ harness-scorecard scan examples/sample-harness
Harness Scorecard v1.0.0
Target: examples/sample-harness (claude-code)
GRADE: F overall 0.28 / 1.00
Scored 10 of 10 rubric dimensions (0 specced, pending).
Capability gates tripped (grade capped):
- HS-D5-01 caps at C (Harness config write/read protected)
D1 Secret protection & credential isolation 0.44 [weight 5]
[PASS] HS-D1-01 Sensitive credential paths denied for read [GATE->D]
All core credential paths are denied for read.
- covered: ~/.ssh, ~/.aws, ~/.gnupg, 1Password/op, gcloud, .env files
[FAIL] HS-D1-02 Sensitive-read Bash backstop
No Bash-level backstop for sensitive reads; deny lists cover only the Read tool.
fix: Add a PreToolUse Bash hook that re-blocks reads of sensitive files.
… (+4 more checks)
D4 Destructive-action & git safety 0.63 [weight 5]
[PASS] HS-D4-01 Push to protected branch effectively blocked [GATE->C]
Push to a protected branch is blocked by the effective floor.
- hook:git-safety
- permissions.deny
[PASS] HS-D4-02 Catastrophic deletion blocked
Catastrophic deletion is blocked by the effective floor.
- hook:block-dangerous-cmds
- hook:dangerous
[FAIL] HS-D4-03 Destructive DB ops on non-local hosts blocked
No effective guard against destructive DB operations on non-local hosts.
- defaultMode=bypassPermissions: autoMode.hard_deny is INERT
fix: Add a PreToolUse Bash db-guard hook that blocks destructive ops on non-local hosts.
… (+2 more checks)
… (+8 more dimensions)
That one line — defaultMode=bypassPermissions: autoMode.hard_deny is INERT — is the whole
thesis rendered live: a rich hard_deny block earns nothing because the mode makes it
inert. The sample above (examples/sample-harness) is
deliberately incomplete to show the findings; run it yourself, or point the tool at your own
~/.claude — a mature harness scores an A.
What makes the grade real
Most config "linters" credit a harness for declaring a rule. This one models the effective enforcement floor. The headline example:
autoMode.hard_deny is inert when permissions.defaultMode == "bypassPermissions".
A naive scorer reads a rich hard_deny block and awards an A. Harness Scorecard reads the
mode, discounts the inert block, and grades against what actually fires — permissions.deny
globs plus the PreToolUse hooks. See docs/rubric.md for the full model,
including capability gates that cap the grade when a critical hole is present (you can't
score an A with readable credentials, no matter how many cheap checks pass).
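The effective-floor idea and the gate cap can be sketched in a few lines of Python. This is an illustrative model only, not the tool's implementation: the `Harness` dataclass, the `effective_guards` and `cap_grade` helpers, and the guard-label strings are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field

# Worst to best, so a higher index means a better grade.
GRADES = ("F", "D", "C", "B", "A")


@dataclass
class Harness:
    """Hypothetical, simplified view of a parsed Claude Code harness."""
    default_mode: str = "default"
    permissions_deny: list[str] = field(default_factory=list)
    hard_deny: list[str] = field(default_factory=list)
    pretooluse_hooks: list[str] = field(default_factory=list)


def effective_guards(h: Harness) -> list[str]:
    """Return only the guards that can actually fire in the configured mode."""
    guards = [f"permissions.deny:{g}" for g in h.permissions_deny]
    guards += [f"hook:{name}" for name in h.pretooluse_hooks]
    if h.default_mode != "bypassPermissions":
        # hard_deny counts only when the mode does not make it inert.
        guards += [f"autoMode.hard_deny:{g}" for g in h.hard_deny]
    return guards


def cap_grade(grade: str, gate_caps: list[str]) -> str:
    """Lower `grade` to the strictest cap among the tripped gates."""
    for cap in gate_caps:
        if GRADES.index(cap) < GRADES.index(grade):
            grade = cap
    return grade


bypassed = Harness(default_mode="bypassPermissions", hard_deny=["Bash(rm -rf *)"])
print(effective_guards(bypassed))  # the rich hard_deny block contributes nothing
print(cap_grade("A", ["C"]))       # a tripped gate caps an otherwise perfect score
```

A cap only ever lowers a grade: a harness already at F stays at F when a gate that caps at C trips, which matches the sample output above.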
It's honest about its own limits, too. A harness that funnels every guard through one opaque dispatcher script hides its logic from static analysis, so the named-guard checks under-credit it. Rather than silently mark it down, the report emits a caveat — "a low score here may be a static-analysis limit, not a missing guard" — so the grade is never misread as "insecure."
Usage
# Grade a harness directory (e.g. your ~/.claude)
harness-scorecard scan ~/.claude
# JSON for tooling, plus a self-contained HTML scorecard
harness-scorecard scan ~/.claude --format json --html scorecard.html
# SARIF 2.1.0 for CI / GitHub code scanning, failing the run below grade C
harness-scorecard scan ~/.claude --sarif harness.sarif --min-grade C
--min-grade {A,B,C,D,F} sets the bar (default B). Exit codes: 0 meets the bar ·
1 below the bar · 2 no harness found.
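For wrapper scripts, the exit-code contract above can be modelled as follows. This is a sketch of the documented behaviour, not code from the tool; `scan_exit_code` is a hypothetical helper, and passing `None` to mean "no harness found" is an assumption made for the example.

```python
from typing import Optional

# Worst to best, so a higher index means a better grade.
GRADES = ("F", "D", "C", "B", "A")


def scan_exit_code(grade: Optional[str], min_grade: str = "B") -> int:
    """Map a scan result to the documented exit codes.

    0: grade meets the bar, 1: grade is below the bar, 2: no harness found.
    """
    if grade is None:
        return 2
    return 0 if GRADES.index(grade) >= GRADES.index(min_grade) else 1


print(scan_exit_code("C"))                 # below the default bar of B
print(scan_exit_code("C", min_grade="C"))  # meets a relaxed bar
```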
Track drift over time
diff compares two scorecards and reports what changed — which checks flipped, which
dimension scores moved, and whether a capability gate newly trips. Each argument is either a
live harness directory or a saved JSON report (scan --json), so the same command covers a CI
regression gate, a before/after audit, or drift between two snapshots:
# Record a baseline, then later fail if the harness grade regresses below it
harness-scorecard scan ~/.claude --json baseline.json
harness-scorecard diff baseline.json ~/.claude # exit 1 if the grade dropped
# Compare two saved snapshots, machine-readable
harness-scorecard diff old.json new.json --format json
Exit codes: 0 no regression (same or better grade) · 1 grade regressed · 2 invalid input.
Gate and dimension moves are reported for context; the letter grade is what fails the gate.
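The regression rule, where only the letter grade decides the exit code, could look roughly like this. The report schema is not documented here, so the top-level `"grade"` key is an assumption, and `diff_exit_code` is a hypothetical helper rather than the tool's API.

```python
import json

# Worst to best, so a higher index means a better grade.
GRADES = ("F", "D", "C", "B", "A")


def diff_exit_code(old_report: str, new_report: str) -> int:
    """Compare two JSON report strings by letter grade.

    0: same or better grade, 1: grade regressed, 2: invalid input.
    Assumes each report carries a top-level "grade" key (hypothetical schema).
    """
    try:
        old_rank = GRADES.index(json.loads(old_report)["grade"])
        new_rank = GRADES.index(json.loads(new_report)["grade"])
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, a missing key, or an unknown grade letter.
        return 2
    return 1 if new_rank < old_rank else 0


print(diff_exit_code('{"grade": "B"}', '{"grade": "C"}'))  # grade dropped
print(diff_exit_code('{"grade": "C"}', '{"grade": "B"}'))  # grade improved
```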
GitHub Action
Grade your harness in CI and upload the findings to code scanning:
- uses: saagpatel/harness-scorecard@v1
  with:
    path: .claude
    min-grade: B
The action writes SARIF and uploads it (requires security-events: write) even when the grade
fails the build, so findings always reach code scanning. Commit a baseline.json and pass
baseline: to also fail the job on any grade regression — a PR that weakens the harness can't
merge:
- uses: saagpatel/harness-scorecard@v1
  with:
    path: .claude
    baseline: .github/harness-baseline.json # fail if the grade drops below this
A complete workflow — permissions, weekly scheduling, SARIF upload — is in
examples/github-workflow.yml.
Guarantees
- Read-only. It never writes to the harness it audits.
- Privacy-preserving. All output redacts secrets, tokens, emails, and absolute home paths. Nothing leaves the machine.
- Dependency-free runtime. The scorer ships stdlib-only — a tool that grades supply-chain hygiene should carry the smallest surface itself.
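To make the privacy guarantee concrete, here is a minimal stdlib-only sketch of output redaction. It covers only two of the listed classes (absolute home paths and email addresses), the regular expression is a simplification, and `redact` is a hypothetical helper, not the tool's actual redactor.

```python
import re
from pathlib import Path
from typing import Optional

# Deliberately simple email pattern; real-world addresses can be more exotic.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact(text: str, home: Optional[str] = None) -> str:
    """Replace the absolute home path with "~" and mask email addresses."""
    home = home or str(Path.home())
    # The lookahead avoids rewriting a longer sibling path such as
    # /Users/samantha when the home directory is /Users/sam.
    home_path = re.compile(re.escape(home) + r"(?![\w.-])")
    text = home_path.sub("~", text)
    return EMAIL.sub("<email>", text)


print(redact("/Users/sam/.claude/settings.json owner sam@example.com",
             home="/Users/sam"))
```

Passing `home` explicitly keeps the function deterministic in tests; in normal use it falls back to the current user's home directory.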
Scope (v1)
Implements all ten rubric dimensions end-to-end for both Claude Code and Codex: secret
protection, egress/exfiltration control, tool-surface & inbound-injection defense,
destructive-action & git safety, harness self-protection & integrity, verification gates,
subagent isolation & governance, recovery/rollback safety, memory/provenance hygiene, and
observability/audit trail (the critical gated trio is D1/D4/D5). Each harness has its own
adapter and check suite over the shared scoring engine; the bypass-aware effective floor maps
to Codex's sandbox_mode = "danger-full-access" + approval_policy = "never" just as it does
to Claude Code's bypassPermissions. The rubric is versioned and emitted in every report.
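The bypass condition for each harness type can be written as one predicate. This sketch assumes the config has already been parsed into a plain dict (from settings JSON for Claude Code, from config.toml for Codex); `floor_is_bypassed` is a hypothetical name, and only the keys and values quoted above come from the text.

```python
def floor_is_bypassed(harness_type: str, config: dict) -> bool:
    """Return True when the configured mode makes declared denies inert."""
    if harness_type == "claude-code":
        mode = config.get("permissions", {}).get("defaultMode")
        return mode == "bypassPermissions"
    if harness_type == "codex":
        # Both settings are needed for the floor to be bypassed.
        return (
            config.get("sandbox_mode") == "danger-full-access"
            and config.get("approval_policy") == "never"
        )
    raise ValueError(f"unknown harness type: {harness_type!r}")


claude = {"permissions": {"defaultMode": "bypassPermissions"}}
codex = {"sandbox_mode": "danger-full-access", "approval_policy": "on-request"}
print(floor_is_bypassed("claude-code", claude))
print(floor_is_bypassed("codex", codex))
```

Here the Codex example keeps an approval prompt in place, so the predicate reports that the floor is not bypassed even though the sandbox is fully open.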
Development
uv sync --frozen # install dev tooling from the lockfile
uv run --no-sync python -m unittest discover -s tests # tests (stdlib runner, zero extra deps)
uv run --no-sync ruff check src/ tests/ # lint
uv run --no-sync ty check src/ # type check