Deterministic security gate + bounded AI remediation prompt generator. NIST SSDF / OWASP ASVS / CWE Top 25 anchored.
secure-code-agent
Deterministic security gate + bounded AI remediation prompts for repos with AI coding agents in the loop. Anchored to NIST SSDF · OWASP ASVS · OWASP Top 10 · MITRE CWE Top 25 · OpenSSF Scorecard · SARIF 2.1.0.
pip install secure-code-agent
secure-code-agent --fail-on-gate \
  --output secure-code-report.md \
  --prompt-output secure-code-remediation-prompt.md \
  --sarif-output secure-code.sarif
The sibling of maintainability-agent. Same shape: deterministic CI gate · plain-file outputs · per-host skill bundle. Different concern: security, not maintainability.
Why this exists
AI coding agents ship code at human-review-saturating speed. Point them at a security finding and the documented anti-patterns are:
| Anti-pattern | What the agent actually does |
|---|---|
| Crypto roulette | "Replace MD5 with SHA-256" → rewrites the hashing module to use a library it saw in training data. |
| Auth-flow rewrite | "Fix the IDOR" → refactors the session model. Now you have an unaudited new auth path. |
| Validation softening | "Make the tests pass after the fix" → weakens the regex / removes the bounds check. |
| Test deletion | "The security test is failing" → deletes the test. |
| Lint disable | "This rule fires repeatedly" → # nosec, # noqa, eslint-disable everywhere. |
| Scope creep | "I fixed the SQLi" → followed by 600 lines of unrelated refactoring. |
| Dependency thrash | "Bumping the vulnerable package" → introduces 12 unrelated new dependencies. |
| Silent behavior change | "It works now" → same input, different output. Downstream callers break. |
Existing scanners (Semgrep, Bandit, CodeQL, Snyk, Trivy) emit findings. None of them ship a bounded prompt back to the agent that says "fix only these specific findings, do not touch crypto/auth/validation/logging, preserve behavior."
That gap is what this tool fills.
The output that matters
Every other security scanner stops at "here's a list of findings." secure-code-agent generates a remediation prompt:
# Security remediation — bounded scope
You are fixing the security findings listed in §FINDINGS below.
This is a constrained task, not a refactor.
## Hard constraints (MUST NOT violate)
1. Fix only the findings listed in §FINDINGS. Do not touch unrelated
code, files, or modules.
2. Do not change cryptographic algorithms, key derivation, IV/nonce
handling, padding modes, or random sources unless a finding in
§FINDINGS explicitly names them as the defect.
3. Do not change authentication flows, session handling, token
lifetime, cookie attributes, or authorization gates unless a
finding in §FINDINGS explicitly names them.
4. Do not weaken input validation, output encoding, sanitization,
bounds checks, regex strictness, or rate limits to make existing
tests pass.
5. Do not disable, delete, or skip security tests. Do not remove
`@_limiter.limit`, `@require_auth`, `@require_csrf`, or similar
decorators.
6. Do not silence linter warnings via `# nosec`, `# noqa`, `# type:
ignore`, `eslint-disable`, `sonar-disable`, or equivalent.
7. Do not introduce new third-party dependencies. Prefer stdlib or
already-vendored libraries.
8. Preserve behavior. Same inputs must produce the same outputs
unless a finding explicitly proves the current behavior is unsafe.
9. Add a focused test that exercises the specific security boundary
you fixed. The test must FAIL on the pre-fix code and PASS on
the post-fix code. No "TODO: add test later".
10. Keep the patch small. If you find yourself rewriting a function
rather than patching it, stop and report the structural issue.
## §FINDINGS
...
Hand the prompt to Claude Code, Codex, Cursor, Copilot, or any agent. The agent now has explicit boundaries. The full template + rationale lives in docs/remediation.md.
Standards anchored, not invented
Every finding maps to the public standards listed below rather than to scanner-specific labels. Operators see which standard is failing, not just which scanner shouted.
| Source | What we use it for |
|---|---|
| NIST SSDF SP 800-218 | Process practice id (e.g. PW.5.1) |
| OWASP Top 10 (2021) | Risk bucket (e.g. A03:2021-Injection) |
| OWASP ASVS 5.0 | Verification requirement (e.g. V5.3) |
| MITRE CWE Top 25 (2025) | Canonical weakness id — the dedupe key |
| OpenSSF Scorecard | Repo + supply-chain hygiene |
| SARIF 2.1.0 | Output format (and external scanner ingest) |
When Semgrep, CodeQL, and Bandit fire on the same SQL-injection sink with three different rule ids, they all map to CWE-89 and the scorer counts one underlying weakness. Not three.
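The dedupe described above can be sketched roughly as follows. This is a minimal illustration, not the tool's implementation: the `Finding` record, its field names, and the choice of (CWE, path, line) as the grouping key are assumptions, and the Semgrep rule id is made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical normalized finding; field names are illustrative."""
    scanner: str
    rule_id: str
    cwe: str
    path: str
    line: int


def dedupe_by_cwe(findings):
    """Group findings that share a CWE id and location into one weakness."""
    groups = {}
    for f in findings:
        key = (f.cwe, f.path, f.line)  # assumed dedupe key
        groups.setdefault(key, []).append(f)
    return groups


findings = [
    # Three scanners, three rule ids, one SQL-injection sink.
    Finding("semgrep", "example.sqli-rule", "CWE-89", "app/db.py", 42),
    Finding("codeql", "py/sql-injection", "CWE-89", "app/db.py", 42),
    Finding("bandit", "B608", "CWE-89", "app/db.py", 42),
    # A different weakness elsewhere.
    Finding("bandit", "B105", "CWE-259", "app/settings.py", 7),
]

weaknesses = dedupe_by_cwe(findings)
for (cwe, path, line), hits in weaknesses.items():
    scanners = ", ".join(sorted(h.scanner for h in hits))
    print(f"{cwe} {path}:{line} reported by {scanners}")
```

Four raw findings collapse into two weaknesses here, and each group keeps its contributing scanners for the report.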
Architecture (orchestrator, not engine)
┌────────────────────────────────────────────────────────────────────┐
│ secure-code-agent CLI │
│ │
│ Config → Scanners (subprocess) → Findings → Scoring → Renderers │
│ │
│ ┌──────────────────┐ │
│ │ Markdown report │ │
│ │ JSON │ │
│ │ SARIF 2.1.0 │ │
│ │ PR comment │ │
│ │ Remediation 🪄 │ │
│ │ Agent standards │ │
│ └──────────────────┘ │
└────────────────────────────────────────────────────────────────────┘
│
│ Scanners (subprocess, version-isolated):
│
├── Bandit (Python SAST)
├── Semgrep (multi-language SAST + SARIF ingest)
├── pip-audit (Python SCA)
├── npm audit (Node SCA)
├── Gitleaks (secret scanning, history-aware)
├── TruffleHog (verified secret scanning)
├── Trivy (containers / IaC / k8s / vuln / secret)
├── Checkov (Terraform / CloudFormation / Helm / k8s)
├── Hadolint (Dockerfile lint)
├── OSV-Scanner (multi-ecosystem SCA via osv.dev)
├── OpenSSF Scorecard (repo hygiene + supply chain)
├── eslint-plugin-security (JS/TS SAST) [v0.3]
├── CodeQL SARIF (ingest GitHub-hosted analysis)
└── Built-in regex rules (high-confidence, low-FP)
We don't reimplement SAST. We invoke best-in-class scanners as subprocesses, parse their canonical output, normalize across CWE/OWASP/ASVS/SSDF, and produce one ranked view.
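As a rough sketch of the "invoke, parse, normalize" flow, the snippet below runs Bandit as a subprocess and maps its JSON results onto a common shape. The normalized dictionary keys are hypothetical, not the tool's schema. The Bandit flags (`-r`, `-f json`, `-q`) and JSON fields (`test_id`, `issue_severity`, `issue_cwe`, `filename`, `line_number`) are the ones recent Bandit releases emit; a canned report is used so the example runs without Bandit installed.

```python
import json
import subprocess


def run_bandit(target="."):
    """Run Bandit as a subprocess and return its JSON report as text.

    Bandit exits nonzero when it finds issues, so the return code is
    deliberately not treated as a failure (check=False).
    """
    proc = subprocess.run(
        ["bandit", "-r", target, "-f", "json", "-q"],
        capture_output=True,
        text=True,
        check=False,
    )
    return proc.stdout


def normalize_bandit(report_text):
    """Map Bandit JSON results onto a hypothetical common finding shape."""
    report = json.loads(report_text)
    normalized = []
    for item in report.get("results", []):
        cwe_id = (item.get("issue_cwe") or {}).get("id")
        normalized.append({
            "scanner": "bandit",
            "rule_id": item["test_id"],
            "cwe": f"CWE-{cwe_id}" if cwe_id else None,
            "severity": item["issue_severity"].lower(),
            "path": item["filename"],
            "line": item["line_number"],
        })
    return normalized


# Canned report standing in for run_bandit() output.
sample = json.dumps({"results": [{
    "test_id": "B608",
    "issue_severity": "MEDIUM",
    "issue_cwe": {"id": 89},
    "filename": "app/db.py",
    "line_number": 42,
}]})
normalized = normalize_bandit(sample)
print(normalized)
```

Keeping the subprocess call separate from the parser means each scanner adapter can be tested against recorded output.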
Full architecture in docs/design.md.
Audit categories (9 buckets, 1 grade)
Findings roll up to nine canonical categories. The grade is driven by the worst category — one CRITICAL secret in git history shouldn't be offset by a clean dependency tree.
| Category | Examples |
|---|---|
| `secrets` | Hardcoded API keys, tokens in history, .env committed |
| `dependencies` | CVE in pinned dep, yanked package, abandoned upstream |
| `code_vulnerabilities` | SQLi, XSS, command-injection, path-traversal, SSRF, XXE, deserialization |
| `auth_authz` | Missing auth gate, IDOR, broken access control, JWT misuse |
| `crypto` | Weak alg, hardcoded IV, ECB, MD5/SHA-1 for security, missing constant-time |
| `supply_chain` | Unpinned action, missing SBOM, no signed releases, low Scorecard |
| `config_iac` | World-readable S3, public security group, Dockerfile USER root, k8s privileged |
| `logging_observability` | Secrets in logs, PII in URLs, missing audit trail on auth events |
| `policy_docs` | Missing SECURITY.md, no responsible-disclosure path, no threat model |
Scoring math + worked examples in docs/scoring.md.
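To illustrate why a worst-category rule behaves differently from an average, here is a small sketch. The 0-10 scale, the example scores, and the use of a plain minimum are all assumptions for illustration; the actual weighting model is the one in docs/scoring.md.

```python
# Hypothetical per-category scores on an assumed 0-10 scale (10 = clean).
category_scores = {
    "secrets": 1.0,  # e.g. one CRITICAL secret in git history
    "dependencies": 10.0,
    "code_vulnerabilities": 9.0,
    "auth_authz": 10.0,
    "crypto": 10.0,
    "supply_chain": 8.0,
    "config_iac": 10.0,
    "logging_observability": 10.0,
    "policy_docs": 10.0,
}


def worst_category(scores):
    """Return the (name, score) of the lowest-scoring category."""
    name = min(scores, key=scores.get)
    return name, scores[name]


def mean_score(scores):
    """Plain average, shown only for contrast with the worst-category rule."""
    return sum(scores.values()) / len(scores)


name, score = worst_category(category_scores)
print(f"worst-category score: {score} ({name})")
print(f"mean score, for contrast: {mean_score(category_scores):.2f}")
```

Under these made-up numbers the average still looks healthy, while the worst-category rule surfaces the leaked secret as the grade driver.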
Hard gates
{
  "gates": {
    "fail_on_severity": ["critical", "high"],
    "fail_on_category": ["secrets", "auth_authz"],
    "fail_on_new": true,
    "min_score": 4.0,
    "require_scanners": ["bandit", "gitleaks"],
    "max_unsuppressed": { "critical": 0, "high": 0, "medium": 10 }
  }
}
Any tripped gate is a nonzero exit. Compose freely.
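A sketch of how such a gate block could be evaluated is shown below. It covers only four of the gates, and the semantics are assumptions: any finding at a listed severity or in a listed category trips the gate, `min_score` is a lower bound, and `max_unsuppressed` is a per-severity count limit. The finding dictionaries are hypothetical.

```python
import json

CONFIG = json.loads("""
{
  "gates": {
    "fail_on_severity": ["critical", "high"],
    "fail_on_category": ["secrets", "auth_authz"],
    "min_score": 4.0,
    "max_unsuppressed": {"critical": 0, "high": 0, "medium": 10}
  }
}
""")


def tripped_gates(gates, findings, score):
    """Return the names of the gates tripped by the unsuppressed findings."""
    tripped = []
    severities = {f["severity"] for f in findings}
    categories = {f["category"] for f in findings}
    if severities & set(gates.get("fail_on_severity", [])):
        tripped.append("fail_on_severity")
    if categories & set(gates.get("fail_on_category", [])):
        tripped.append("fail_on_category")
    if score < gates.get("min_score", float("-inf")):
        tripped.append("min_score")
    for severity, limit in gates.get("max_unsuppressed", {}).items():
        count = sum(1 for f in findings if f["severity"] == severity)
        if count > limit:
            tripped.append(f"max_unsuppressed.{severity}")
    return tripped


findings = [
    {"severity": "medium", "category": "dependencies"},
    {"severity": "high", "category": "crypto"},
]
tripped = tripped_gates(CONFIG["gates"], findings, score=6.2)
exit_code = 1 if tripped else 0  # any tripped gate means a nonzero exit
print(tripped, exit_code)
```

Because each gate is checked independently, adding or removing one from the config does not change how the others behave.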
Suppressions you can't game
.scignore.yaml — every suppression requires a reason AND an expires date (max 365 days). Past-expiry suppressions become CRITICAL findings on their own. You can't ship reason: "we'll fix it later" forever.
- file: services/legacy_billing.py
  rule_id: "*"
  reason: "Slated for rewrite Q3 2026 — gated by initiative INV-44."
  expires: "2026-09-30"
- rule_id: "B101"
  paths: ["tests/"]
  reason: "assert statements are legitimate in test code."
  expires: "2027-05-13"
A wildcard rule (rule_id: "*") requires a file or paths scope, so you cannot suppress every rule across the whole repo.
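The suppression rules above (reason required, expiry required and capped at 365 days, wildcard needs a scope) could be checked along these lines. This is a sketch under assumptions: entries are already parsed into dictionaries, the 365-day cap is measured from the day of the check, and the problem strings are invented for the example.

```python
from datetime import date, timedelta

MAX_LIFETIME = timedelta(days=365)


def check_suppression(entry, today):
    """Return a list of problems for one suppression entry (empty = valid)."""
    problems = []
    if not str(entry.get("reason", "")).strip():
        problems.append("missing reason")
    expires_raw = entry.get("expires")
    if not expires_raw:
        problems.append("missing expires")
    else:
        expires = date.fromisoformat(expires_raw)
        if expires < today:
            problems.append("expired")
        elif expires - today > MAX_LIFETIME:
            problems.append("expires more than 365 days out")
    has_scope = bool(entry.get("file") or entry.get("paths"))
    if entry.get("rule_id") == "*" and not has_scope:
        problems.append("wildcard rule_id without file or paths scope")
    return problems


today = date(2026, 5, 13)  # fixed date so the example is reproducible

scoped_wildcard = {
    "file": "services/legacy_billing.py",
    "rule_id": "*",
    "reason": "Slated for rewrite Q3 2026.",
    "expires": "2026-09-30",
}
global_wildcard = {
    "rule_id": "*",
    "reason": "we'll fix it later",
    "expires": "2028-01-01",
}
stale = {
    "rule_id": "B101",
    "paths": ["tests/"],
    "reason": "",
    "expires": "2026-01-01",
}

for entry in (scoped_wildcard, global_wildcard, stale):
    print(entry["rule_id"], check_suppression(entry, today))
```

In the tool itself an expired entry is reported as a CRITICAL finding; the sketch only flags it.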
Baseline + incremental adoption
secure-code-baseline.json fingerprints every current finding. On the next run:
- Findings present in baseline → acknowledged; don't trip fail_on_new.
- Findings missing from baseline → new; trip the gate.
Bumping a CRITICAL or HIGH finding into the baseline requires --bump-baseline --i-acknowledge-risk. The bump records the operator's git user.email per fingerprint so PR review can see who acknowledged what.
This lets legacy repos adopt the gate without a 200-finding day-one cleanup.
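A minimal sketch of the baseline comparison follows. The fingerprint recipe (SHA-256 over rule id, path, and the matched snippet) and the finding fields are assumptions made for illustration; the real fingerprint format in secure-code-baseline.json may differ.

```python
import hashlib


def fingerprint(finding):
    """Hypothetical stable fingerprint: rule id + path + matched snippet.

    Line numbers are left out on purpose so that unrelated edits which
    shift code up or down do not turn an old finding into a "new" one.
    """
    raw = "|".join([
        finding["rule_id"],
        finding["path"],
        finding["snippet"].strip(),
    ])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def split_against_baseline(findings, baseline_fingerprints):
    """Partition findings into (acknowledged, new) using the baseline set."""
    acknowledged, new = [], []
    for f in findings:
        if fingerprint(f) in baseline_fingerprints:
            acknowledged.append(f)
        else:
            new.append(f)
    return acknowledged, new


old = {"rule_id": "B608", "path": "app/db.py",
       "snippet": "cur.execute(q % name)"}
fresh = {"rule_id": "B602", "path": "app/jobs.py",
         "snippet": "subprocess.run(cmd, shell=True)"}

baseline = {fingerprint(old)}  # what a previous run recorded
acknowledged, new = split_against_baseline([old, fresh], baseline)
fail_on_new_tripped = bool(new)
print(len(acknowledged), len(new), fail_on_new_tripped)
```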
Quickstart
# Install
pip install secure-code-agent

# Initialize agent standards files for your AI coding tools
secure-code-agent --init-agent-standards \
  --target codex --target claude-code --target cursor --target copilot

# Run an audit with hard-gate exit
secure-code-agent --config secure-code-agent.json \
  --fail-on-gate \
  --output secure-code-report.md \
  --json-output secure-code-report.json \
  --sarif-output secure-code.sarif \
  --comment-output secure-code-pr-comment.md \
  --prompt-output secure-code-remediation-prompt.md

# Audit only changed files since main
secure-code-agent --changed-only main...HEAD --fail-on-new

# Ingest external scanner SARIF (CodeQL, Snyk, Trivy, etc.)
secure-code-agent --sarif-import codeql-results.sarif \
  --sarif-import snyk-results.sarif
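For readers unfamiliar with the SARIF layout that the import step consumes, the sketch below walks a SARIF 2.1.0 document and flattens each result. The SARIF property names (`runs`, `tool.driver.name`, `results`, `ruleId`, `level`, `physicalLocation`) come from the specification; the flattened output keys are hypothetical, and falling back to `"warning"` when `level` is absent is a simplification.

```python
import json


def iter_sarif_results(sarif):
    """Yield one flat dict per result in a SARIF 2.1.0 document."""
    for run in sarif.get("runs", []):
        tool = run.get("tool", {}).get("driver", {}).get("name", "unknown")
        for result in run.get("results", []):
            locations = result.get("locations") or [{}]
            physical = locations[0].get("physicalLocation", {})
            yield {
                "scanner": tool,
                "rule_id": result.get("ruleId"),
                "level": result.get("level", "warning"),
                "path": physical.get("artifactLocation", {}).get("uri"),
                "line": physical.get("region", {}).get("startLine"),
                "message": result.get("message", {}).get("text", ""),
            }


# Small inline document standing in for a file such as codeql-results.sarif.
sample_sarif = json.loads("""
{
  "version": "2.1.0",
  "runs": [{
    "tool": {"driver": {"name": "CodeQL"}},
    "results": [{
      "ruleId": "py/sql-injection",
      "level": "error",
      "message": {"text": "Query built from user-controlled input."},
      "locations": [{
        "physicalLocation": {
          "artifactLocation": {"uri": "app/db.py"},
          "region": {"startLine": 42}
        }
      }]
    }]
  }]
}
""")

imported = list(iter_sarif_results(sample_sarif))
print(imported)
```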
Invokable skill / slash command
For agents that support invokable skills, this repo ships a portable skill under skills/secure-code-agent/. The SKILL.md body is the source of truth; per-host adapters live under agents/ and copilot/.
| Host | Install destination | Invocation |
|---|---|---|
| Codex / OpenAI | wired via `skills/secure-code-agent/agents/openai.yaml` | per Codex's skills convention |
| Claude Code | `cp -r skills/secure-code-agent ~/.claude/skills/` | `/secure-code-agent` |
| GitHub Copilot (VS Code) | `cp skills/secure-code-agent/copilot/secure-code-agent.prompt.md .github/prompts/` | `/secure-code-agent` in Copilot Chat |
GitHub Action
- uses: marshallguillory86/secure-code-agent@v0.1.0
  with:
    config: secure-code-agent.json
    changed-only: main...HEAD
    fail-on-gate: true
The action uploads SARIF to GitHub Code Scanning by default. See action.yml and examples/github-actions/ for full workflows.
What this is NOT
- ❌ Not a SAST engine. We delegate to Semgrep / Bandit / CodeQL / etc. — we don't write yet another AST analyzer.
- ❌ Not a runtime defense. No WAF, no IDS, no agent in the request path. Static + supply-chain + config only.
- ❌ Not a SaaS. Findings live as files in your repo. No telemetry. No version-check ping.
- ❌ Not a license scanner. Pair with pip-licenses / license-checker separately.
- ❌ Not an exploit generator. No DAST, no fuzzing.
Design principles
- Deterministic first, AI optional. The audit never calls an LLM by default. The remediation prompt is a generated artifact you choose to hand to an agent.
- Bounded scope. The remediation prompt explicitly forbids touching crypto, auth, validation, logging, and tests.
- Standards-anchored. Public standards only (NIST SSDF, OWASP Top 10, OWASP ASVS, MITRE CWE, OpenSSF Scorecard, SARIF), no invented taxonomy.
- CWE-deduped scoring. One underlying weakness = one finding, regardless of how many scanners found it.
- No vendor lock-in. Markdown, JSON, SARIF, plain files. Pipe anywhere.
- CI-first, local-first. Same binary in pre-commit, local CI, GitHub Actions, GitLab, Buildkite.
Full design philosophy in docs/design.md.
Documentation
- docs/design.md: Architecture + non-goals + scanner protocol
- docs/standards.md: NIST SSDF / OWASP / CWE / Scorecard / SARIF citations
- docs/scoring.md: Weighting model + worked examples
- docs/scanners.md: Per-scanner integrations + caveats
- docs/remediation.md: The prompt template + failure-mode rationale
- docs/threat-model.md: What we defend against (and what we don't)
Versioning
- Semver. v0.x is pre-1.0 — the config schema may evolve. v1.0 locks it.
- SARIF 2.1.0 output is pinned and validated against the OASIS schema in CI.
Get in touch
- Bug reports / feature requests: GitHub Issues
- Security vulnerabilities in this tool: see SECURITY.md
License
MIT — see LICENSE.
Built by Marshall Guillory. The companion to maintainability-agent — both tools encode a single thesis: AI agents need deterministic boundaries, not best-effort guardrails.
File details
Details for the file secure_code_agent-0.2.0.tar.gz.
File metadata
- Download URL: secure_code_agent-0.2.0.tar.gz
- Upload date:
- Size: 50.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 691e91c1e55ff9d693d7f8dba972cca91f0cddbd280473f2da705732dab5302d |
| MD5 | 265a2cb9c88ef2d8f2fc0fbcbeff43ff |
| BLAKE2b-256 | 76626f78310d15bc74f90c64bba178dd0dafe0b38f254c081949458f15c26e33 |
File details
Details for the file secure_code_agent-0.2.0-py3-none-any.whl.
File metadata
- Download URL: secure_code_agent-0.2.0-py3-none-any.whl
- Upload date:
- Size: 66.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e00531b8535ad3bf8094d7dc87aaf790b2f7c0ea13832ee8c7ffd9e47413a231 |
| MD5 | b85bdfc8656b8c3caf551590610d06be |
| BLAKE2b-256 | f8298dfc35be9e03ad2ef00dbcda2a17bc458fb1e8cb39cdd80893a41370a52a |