AI-assisted code sanitization scanner with OWASP ASVS, NIST 800-53, and ASD STIG compliance mapping.
Sanicode
Sanicode scans code in 23 languages for input validation and sanitization gaps using field-sensitive taint analysis and a data flow knowledge graph backed by treeloom's Code Property Graph, then maps every finding to OWASP ASVS 5.0, NIST 800-53, ASD STIG v4r11, PCI DSS 4.0, FedRAMP, and CMMC 2.0. It also scans lockfiles for third-party dependency vulnerabilities via the OSV database and can generate CycloneDX 1.5 SBOMs. Output formats include SARIF (for GitHub Code Scanning), JSON, Markdown, and an HTML dashboard with an interactive knowledge graph.
Unlike pattern-only tools like Bandit or Semgrep, sanicode traces tainted data from source to sink across function boundaries with field-level precision — request.args and request.form["name"] are tracked as distinct taint keys, not flattened to request. Findings carry context about how untrusted input reaches a dangerous call and whether sanitization exists along the path.
Install
pip install sanicode
Requires Python 3.10+.
For a guided walkthrough with a sample vulnerable application, see the Getting Started Guide.
Quick start
Scan a codebase and generate a Markdown report:
sanicode scan .
Generate SARIF output for CI integration:
sanicode scan . -f sarif
Generate an HTML dashboard with an interactive knowledge graph:
sanicode scan . -f html
Generate a DISA STIG Viewer checklist for ATO packages:
sanicode scan . -f stig-checklist
Fail the build if high-severity findings exist:
sanicode scan . --fail-on high
Scan dependencies for known vulnerabilities:
sanicode deps .
Generate a CycloneDX SBOM alongside scan results:
sanicode scan . --sbom sbom.json
Reports are written to sanicode-reports/ by default.
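If you consume the SARIF report outside GitHub Code Scanning, a short script can tally results per rule and level. This is a minimal sketch that reads only standard SARIF 2.1.0 fields (runs[].results[].ruleId and level). It assumes nothing about sanicode-specific properties, and the function names are illustrative.

```python
from __future__ import annotations

import json
from collections import Counter
from pathlib import Path


def summarize_sarif(sarif: dict) -> Counter:
    """Count SARIF results by (ruleId, level) across all runs.

    A result without an explicit "level" is counted as "warning", the SARIF
    default when no rule-level configuration overrides it (not resolved here).
    """
    counts: Counter = Counter()
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            rule = result.get("ruleId", "<unknown>")
            level = result.get("level", "warning")
            counts[(rule, level)] += 1
    return counts


def summarize_sarif_file(path: str | Path) -> Counter:
    """Load a SARIF file from disk and summarize it."""
    with open(path, encoding="utf-8") as fh:
        return summarize_sarif(json.load(fh))
```

To use it, pass summarize_sarif_file the path of a .sarif file produced by sanicode scan . -f sarif. The exact file name under sanicode-reports/ depends on the run.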
CI/CD integration
GitHub Action
- uses: rdwj/sanicode@v0
  with:
    path: .
    fail-on: high
    format: sarif
Pre-commit hook
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/rdwj/sanicode
    rev: v0.12.3
    hooks:
      - id: sanicode
See CI/CD Integration for GitLab CI, Jenkins, Azure DevOps, and Tekton/OpenShift Pipelines.
API server
Start the FastAPI server for remote or hybrid scan mode:
sanicode serve
This starts on port 8080 with Prometheus metrics at /metrics.
Endpoints
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/scan | Submit a scan (async) |
| GET | /api/v1/scan/{id} | Poll scan status |
| GET | /api/v1/scan/{id}/findings | Retrieve findings (JSON or ?format=sarif) |
| GET | /api/v1/scan/{id}/graph | Retrieve knowledge graph |
| POST | /api/v1/analyze | Instant snippet analysis |
| GET | /api/v1/compliance/map | Compliance framework lookup |
| GET | /api/v1/health | Liveness check |
| GET | /metrics | Prometheus metrics |
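Because POST /api/v1/scan is asynchronous, a client submits a scan and then polls GET /api/v1/scan/{id} until the scan finishes. The sketch below shows only the polling side. The "status" key and its terminal values ("completed", "failed") are assumptions, not documented sanicode API, so check the real response schema before relying on them. The fetch function is injected so the loop can be exercised without a running server.

```python
from __future__ import annotations

import json
import time
import urllib.request
from typing import Callable

# Hypothetical terminal states; the real API may use different values.
TERMINAL_STATES = {"completed", "failed"}


def make_status_fetcher(base_url: str, scan_id: str) -> Callable[[], dict]:
    """Return a callable that GETs /api/v1/scan/{id} and decodes the JSON body."""
    url = f"{base_url.rstrip('/')}/api/v1/scan/{scan_id}"

    def fetch() -> dict:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return json.load(resp)

    return fetch


def wait_for_scan(
    fetch_status: Callable[[], dict],
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll until the (assumed) "status" key is terminal, or raise on timeout."""
    deadline = clock() + timeout
    while True:
        body = fetch_status()
        if body.get("status") in TERMINAL_STATES:
            return body
        if clock() >= deadline:
            raise TimeoutError(f"scan not finished within {timeout} seconds")
        sleep(interval)
```

A call such as wait_for_scan(make_status_fetcher("http://localhost:8080", scan_id)) would block until the scan completes. Here scan_id comes from the POST response, whose field name is not documented here.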
CLI commands
sanicode scan . # Scan codebase, generate reports
sanicode scan . -f sarif # SARIF output
sanicode scan . -f json -f sarif # Multiple formats
sanicode scan . -f html # HTML dashboard with interactive graph
sanicode scan . --fail-on high # Exit non-zero on high+ findings
sanicode serve # Start API server on :8080
sanicode report sanicode-reports/scan-result_*.json # Re-generate from saved results
sanicode report scan-result.json -s high # Filter by severity
sanicode report scan-result.json --cwe 89 # Filter by CWE
sanicode config setup # Interactive provider configuration wizard
sanicode config set llm.fast.model granite-nano # Script-friendly config
sanicode config set scan.include_extensions .py,.java # Restrict to specific languages
sanicode config test # Test configured LLM tiers
sanicode config --show # Show resolved configuration
sanicode config --init # Create starter sanicode.toml
sanicode graph . --export graph.json # Export knowledge graph
sanicode graph . --visualize graph.html # Standalone graph visualization
sanicode rules --list # List all detection rules
sanicode validate-rules custom.yaml # Validate custom rule YAML syntax
sanicode test-rules custom.yaml --fixture f.py # Test custom rules against a fixture
sanicode benchmark # Benchmark against Bandit and Semgrep
sanicode scan . -f stig-checklist # STIG Viewer checklist (.ckl) + summary
sanicode scan . -f poam # POA&M entries (CSV + JSON + summary)
sanicode report scan-result.json -f stig-checklist # STIG checklist from saved results
sanicode report scan-result.json -f poam # POA&M from saved results
sanicode enrich bandit.sarif semgrep.sarif # Enrich third-party SARIF with compliance
sanicode enrich *.sarif --merge -o merged.sarif # Merge and enrich multiple SARIF files
sanicode validate-llm # Benchmark LLM pipeline quality (precision/recall/F1 deltas)
sanicode deps . # Scan lockfiles for dependency vulnerabilities
sanicode deps . --format json # JSON output for CI pipelines
sanicode deps . --sbom sbom.json # Generate CycloneDX SBOM
sanicode scan . --no-deps # Skip dependency scanning
sanicode scan . --sbom sbom.json # Include SBOM with scan
sanicode scan . --offline # Skip OSV queries (air-gapped mode)
sanicode scan . --no-llm # Deterministic-only (skip all LLM stages)
sanicode scan . --cwe 89 # Filter findings to specific CWE IDs
sanicode scan . -f json -o result.json # Write JSON report to a single file
sanicode scan . -o results/before/ # Write reports to specific directory
sanicode diff before.json after.json # Compare two scan results (severity-weighted)
sanicode diff before.json after.json -f json # Machine-readable diff for CI pipelines
sanicode score /path/to/repos # Benchmark against unsanitary-code-examples
sanicode fips-check # Validate FIPS 140-2/140-3 compliance
sanicode baseline create # Snapshot current findings as accepted baseline
sanicode baseline audit .sanicode-baseline.json # Inspect baseline contents
sanicode config get llm.fast.model # Read a single config value
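sanicode diff compares two scan results with severity weighting. The actual weights and finding-matching rules are not documented here, so the following is only an illustrative sketch of the idea. It matches findings by a hypothetical fingerprint (file, CWE, line), uses assumed weights per severity, and assumes the finding field names shown.

```python
from __future__ import annotations

from dataclasses import dataclass

# Illustrative weights only; sanicode's real weighting may differ.
SEVERITY_WEIGHTS = {"critical": 10, "high": 5, "medium": 2, "low": 1}


def fingerprint(finding: dict) -> tuple:
    """Hypothetical identity for a finding: file, CWE, and line."""
    return (finding["file"], finding["cwe_id"], finding["line"])


@dataclass
class ScanDiff:
    introduced: list[dict]
    fixed: list[dict]

    @property
    def weighted_delta(self) -> int:
        """Positive means the 'after' scan is worse than 'before'."""

        def weight(findings: list[dict]) -> int:
            return sum(SEVERITY_WEIGHTS.get(f["severity"], 0) for f in findings)

        return weight(self.introduced) - weight(self.fixed)


def diff_findings(before: list[dict], after: list[dict]) -> ScanDiff:
    """Split findings into newly introduced and fixed, by fingerprint."""
    before_keys = {fingerprint(f) for f in before}
    after_keys = {fingerprint(f) for f in after}
    return ScanDiff(
        introduced=[f for f in after if fingerprint(f) not in before_keys],
        fixed=[f for f in before if fingerprint(f) not in after_keys],
    )
```

Keying on line numbers keeps the sketch short but is fragile when code moves. A real tool would likely use a more stable fingerprint.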
Detection rules
705 built-in rules across 23 languages, covering 109 CWEs including 100% of the MITRE Top 25.
Languages: Python, JavaScript/TypeScript, Go, Java, C, C++, C#, Ruby, PHP, Rust, Kotlin, Scala, Bash, SQL, Perl, Lua, MATLAB, R, F#, Julia, Fortran, COBOL, HTML/Jinja2 templates.
Categories include SQL injection, OS command injection, XSS (including Jinja2 template XSS), deserialization, path traversal, SSRF, weak cryptography, hardcoded credentials, insecure random, argument injection, CRLF/header injection, XPath/LDAP/XML injection, template injection, ReDoS, XXE, mass assignment, session and cookie security, sensitive data storage, auth/authz gaps, TLS bypass, memory safety (C/C++), and many more.
For the full live inventory, see the coverage scorecard. Custom YAML rules extend this set — place rule files in rules/ in your project root or ~/.config/sanicode/rules/ and validate with sanicode validate-rules.
Custom rules
id: CUSTOM001
cwe_id: 78
severity: high
pattern:
  targets: [python]
  ast_pattern: "call:subprocess.run"
  args:
    shell: "True"
Rule files are discovered from rules/ in the project root and ~/.config/sanicode/rules/. Run sanicode validate-rules custom.yaml to check syntax before deploying, and sanicode test-rules custom.yaml --fixture f.py to test the rule against a fixture.
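The example rule CUSTOM001 targets calls to subprocess.run that pass shell=True (CWE-78). To make that concrete, the sketch below approximates the same match with Python's standard ast module. It is not sanicode's rule engine, and it does not resolve aliases such as from subprocess import run. It only illustrates which call shapes the YAML describes.

```python
import ast


def find_shell_true_runs(source: str) -> list[int]:
    """Return sorted line numbers of subprocess.run(..., shell=True) calls.

    Only the literal attribute form "subprocess.run" is recognized.
    """
    lines = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_subprocess_run = (
            isinstance(func, ast.Attribute)
            and func.attr == "run"
            and isinstance(func.value, ast.Name)
            and func.value.id == "subprocess"
        )
        if not is_subprocess_run:
            continue
        for kw in node.keywords:
            # Match only a literal True; shell=flag is left to taint analysis.
            if kw.arg == "shell" and isinstance(kw.value, ast.Constant) and kw.value.value is True:
                lines.append(node.lineno)
    # ast.walk is breadth-first, so sort to report in source order.
    return sorted(lines)
```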
Taint analysis
Sanicode performs field-sensitive, dataflow-aware taint tracking at two levels:
- Intra-procedural: reaching-definitions analysis within each function body, with field-level precision. Attribute chains like request.args.get("id") are tracked as dotted taint keys, not flattened to individual identifiers. Prefix matching ensures that tainting request implicitly taints request.args, but tainting only request.args does not falsely taint unrelated attributes.
- Inter-procedural: function summaries propagated across the call graph.
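The prefix-matching rule can be illustrated with a small set of dotted taint keys. In this sketch, a key counts as tainted if the key itself or any dotted prefix of it has been marked. Marking a longer key never taints a shorter key or a sibling. The class and method names are illustrative, not sanicode's internal API.

```python
class TaintSet:
    """Field-sensitive taint keys such as "request.args" (illustrative)."""

    def __init__(self) -> None:
        self._keys: set[str] = set()

    def mark(self, key: str) -> None:
        """Record a dotted key as tainted."""
        self._keys.add(key)

    def is_tainted(self, key: str) -> bool:
        # Check the key and every dotted prefix: "request.args.id" is tainted
        # if "request.args" or "request" was marked, but never the reverse.
        parts = key.split(".")
        return any(".".join(parts[:i]) in self._keys for i in range(1, len(parts) + 1))
```

Splitting on dots, rather than using a plain string startswith, keeps "request.argsx" from being treated as a field of "request.args".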
Taint paths produce high-confidence edges in the knowledge graph, giving the LLM (and human reviewers) evidence of whether untrusted data actually reaches a sink. Each finding carries a node_ref field identifying its CPG node ID (e.g. "sink_3"), allowing downstream tools to traverse the graph from any finding without resolving file/line positions themselves.
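Because each finding carries a node_ref, a downstream tool can walk the graph backwards from the sink to see which nodes feed it. This sketch assumes a hypothetical edge list of (source_id, target_id) pairs. The actual schema of sanicode graph --export output is not described here and may differ, and the node IDs in the usage note, apart from "sink_3", are made up.

```python
from __future__ import annotations

from collections import deque


def upstream_nodes(edges: list[tuple[str, str]], node_ref: str) -> list[str]:
    """Breadth-first walk against edge direction, starting at node_ref.

    Returns every node from which node_ref is reachable, nearest first.
    """
    preds: dict[str, list[str]] = {}
    for src, dst in edges:
        preds.setdefault(dst, []).append(src)

    seen = {node_ref}
    order: list[str] = []
    queue = deque([node_ref])
    while queue:
        node = queue.popleft()
        for pred in preds.get(node, []):
            if pred not in seen:  # guards against cycles in the graph
                seen.add(pred)
                order.append(pred)
                queue.append(pred)
    return order
```

For example, with edges [("source_1", "call_2"), ("call_2", "sink_3")], upstream_nodes(edges, "sink_3") returns ["call_2", "source_1"].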
Dependency scanning
Sanicode discovers lockfiles (requirements.txt, package-lock.json, composer.lock) and queries the OSV database for known vulnerabilities. Findings are mapped to CWE-1395 (Dependency on Vulnerable Third-Party Component) with compliance cross-references to NIST SI-2/RA-5, PCI DSS 6.3.2, and FedRAMP baselines. CycloneDX 1.5 SBOMs can be generated alongside scan results.
Dependency scanning runs automatically during sanicode scan and can be used standalone via sanicode deps. Use --offline for air-gapped environments or --no-deps to skip it entirely.
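To show what the dependency check involves, the following sketch turns exact pins in a requirements.txt into OSV query payloads and sends a payload to the public OSV API (POST https://api.osv.dev/v1/query). It is not sanicode's implementation. It handles only simple name==version lines and skips ranges, URLs, editable installs, and requirements with extras.

```python
from __future__ import annotations

import json
import re
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"
# Simple exact pins such as "requests==2.31.0". Lines with extras are skipped;
# anything after the version (environment markers, comments) is ignored.
PIN_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*([^\s;#]+)")


def osv_queries(requirements_text: str) -> list[dict]:
    """Build one OSV query payload per exactly pinned requirement."""
    queries = []
    for line in requirements_text.splitlines():
        match = PIN_RE.match(line)
        if match:
            name, version = match.groups()
            queries.append({"package": {"name": name, "ecosystem": "PyPI"}, "version": version})
    return queries


def query_osv(payload: dict, timeout: float = 10.0) -> list[dict]:
    """POST one query to OSV and return its vulnerabilities (empty if none)."""
    request = urllib.request.Request(
        OSV_QUERY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response).get("vulns", [])
```

For many packages, OSV also offers a batch endpoint. Sending one query per package keeps the sketch simple. In air-gapped mode (--offline), no such network call would be made.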
Compliance frameworks
Findings map to six frameworks, covering 109 CWEs:
- OWASP ASVS 5.0 — V1: Encoding and Sanitization requirements (L1/L2/L3)
- NIST 800-53 — SI-10 (Information Input Validation), SI-15 (Information Output Filtering), and related controls
- ASD STIG v4r11 — APSC-DV-002510 (CAT I), APSC-DV-002520 (CAT II), APSC-DV-002530 (CAT II), and related checks. Use --format stig-checklist to output a DISA STIG Viewer .ckl file with findings mapped directly to ASD STIG v4r11 checklist items, suitable for submission to STIG assessors.
- PCI DSS 4.0 — Requirement 6 (Develop and Maintain Secure Systems and Software)
- FedRAMP — Baselines (Low, Moderate, High) derived from NIST 800-53 control selection. Findings indicate which FedRAMP authorization baselines are affected.
- CMMC 2.0 — Cybersecurity Maturity Model Certification practices (Level 2+) mapped from NIST 800-53 controls. Useful for DoD supply chain compliance assessments.
Configuration
Create a config file:
sanicode config --init
This writes a sanicode.toml in the current directory. Config is loaded from (in order):
1. --config flag
2. sanicode.toml in the current directory
3. ~/.config/sanicode/config.toml
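The lookup order above can be sketched as a first-match search. This sketch assumes that the first file found wins. The text does not say whether sanicode instead merges values across files, and it also does not say how a --config path that does not exist is handled. The function name is hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def resolve_config_path(
    config_flag: str | None,
    cwd: Path | None = None,
    home: Path | None = None,
) -> Path | None:
    """Return the config file to load, or None to run with built-in defaults.

    Order: explicit --config path, then ./sanicode.toml, then
    ~/.config/sanicode/config.toml. First match wins (an assumption).
    """
    if config_flag is not None:
        # Assumed: an explicit flag is returned as-is so a missing file surfaces as an error later.
        return Path(config_flag)
    cwd = cwd or Path.cwd()
    home = home or Path.home()
    for candidate in (cwd / "sanicode.toml", home / ".config" / "sanicode" / "config.toml"):
        if candidate.is_file():
            return candidate
    return None
```

A None result corresponds to running with no configuration at all, which the next paragraph notes is fully supported.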
Sanicode works fully without any configuration. LLM tiers are optional — without them, the tool runs in degraded mode using AST pattern matching, taint analysis, knowledge graph construction, and compliance lookups. LLM integration adds context-aware reasoning on top of these. Use --no-llm to force deterministic-only mode even when LLM tiers are configured — useful for CI pipelines where speed matters more than LLM-assisted reasoning.
LLM integration (optional)
Preset-based pipeline (recommended)
The simplest way to enable LLM analysis is a single preset. Each preset selects a model, provider, and analysis strategy tuned for that model tier:
[llm]
preset = "local-medium"
| Preset | Model | Strategy | F1 Score | Requirements |
|---|---|---|---|---|
| cloud-haiku | Claude Haiku 4.5 | augment | 1.000 | ANTHROPIC_API_KEY |
| local-large | gpt-oss:20b | augment | 0.970 | 13 GB RAM, Ollama |
| local-medium | granite3.3:8b | augment | 0.930 | 5 GB RAM, Ollama |
| local-small | mistral-nemo | review | 0.896 | 7 GB RAM, Ollama |
Two strategies are supported. augment: the LLM analyzes code independently using CPG context, and its findings are merged with deterministic results. review: the LLM reviews deterministic findings with CPG context — better suited to mid-tier models that benefit from scaffolding. When the two perspectives disagree, a minority report is attached to the finding so both views are preserved.
Strategy guidance: Strong models perform best with augment, and mid-tier models perform best with review. Models below about 7B parameters are not recommended because accuracy drops significantly. When adding a custom model, start with augment if the model is known for strong reasoning, or with review otherwise.
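The disagreement handling can be pictured as a vote per candidate finding. The sketch below combines the minority report described above with the Pass 3 tiebreaker from v0.12.0. That combination is an assumption, and so is the simplified model: each pass returns a yes/no verdict for a finding key, a disagreement is settled by a third verdict, and the losing side is kept as a minority report. All names and fields are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Decision:
    key: str  # hypothetical finding key, e.g. "app.py:42:CWE-89"
    is_vulnerable: bool
    minority_report: Optional[str] = None


def reconcile(
    key: str,
    deterministic: bool,
    llm: bool,
    tiebreaker: Callable[[str], bool],
) -> Decision:
    """Combine two verdicts; on disagreement, a third call decides 2-1."""
    if deterministic == llm:
        return Decision(key, deterministic)
    third = tiebreaker(key)
    # With two opposing votes, the third vote always forms the majority.
    losing = "llm" if third == deterministic else "deterministic"
    return Decision(key, third, minority_report=f"{losing} pass disagreed (voted {not third})")
```

Because the tiebreaker runs only on disagreement, agreeing verdicts cost no extra LLM call.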
Legacy tiers
The three-tier system (fast / analysis / reasoning) is still supported for backward compatibility and gives fine-grained control over which model handles classification, data flow reasoning, and compliance mapping. See the model sizing guide for details.
Supported providers for both approaches: Anthropic, OpenAI, Google, Azure, vLLM, Ollama, OpenShift AI. Run sanicode config setup for an interactive wizard.
Current status
v0.12.3: Fix -o flag file output — -o result.json with -f json or -f sarif now writes a single file instead of creating a directory (#242). Ruff format enforced in CI.
v0.12.2: Java scanning performance — reduced taint sources and deduplicated entry points inside annotated handlers, bringing a 15-file Java scan from 3.24s to 0.30s (10.7x faster) with near-linear scaling. Requires treeloom >=0.8.1 for batch taint convergence and graph index structures (#240, #241).
v0.12.1 fixes: --cwe filter now correctly applies to all output formats — previously accepted but had no effect on reports (#236). Default include_extensions changed from Python-only to all 23 supported languages — Java, Go, JavaScript, and other rules now fire without manual TOML editing (#237). config set scan.include_extensions and scan.exclude_patterns are now scriptable. LLM JSON parsing is more robust with trailing-comma repair and prose extraction (#239). Java scanning performance improved via tree-sitter query caching and iterative AST traversal (#238).
v0.12.0 highlights: Pass 3 tiebreaker for the augment/review LLM pipeline — when deterministic and LLM disagree, a third call produces a 2-1 majority decision (#224). sanicode diff command with severity-weighted scan comparison (#213, #218). HTML/Jinja2 template scanning with SC088/SC089 XSS detection rules (#212, #211). --output-dir, --cwe filter, and timestamped output filenames (#208, #215, #217). Static remediation templates on 8 high-impact rules (#214). CPG integration test harness with fixture contracts (#235). Typer dependency pinned to >=0.15.4 to fix CLI help generation.
v0.11.0 highlights: --no-llm flag for deterministic-only scans, node_ref CPG node mapping, CWE-913 enrichment fix, Apache-2.0 relicense, contributor artifacts, docs reorganization.
v0.10.1 — Documentation accuracy patch. Fixed stale language coverage claims in REFERENCE.md, updated llms-full.txt, regenerated coverage scorecard.
v0.10.0 highlights: CPG-backed knowledge graph via treeloom with cross-function data flow edges, augment/review LLM pipeline validated across 13 models, 703 detection rules across 22 languages (109 CWEs, 100% MITRE Top 25), CI-friendly stdout/stderr separation, FIPS compliance support, air-gapped deployment architecture, rule authoring SDK, and litellm security pin.
Plus everything from earlier releases: field-sensitive taint analysis, SBOM-aware dependency scanning via OSV, CycloneDX 1.5 SBOM generation, FedRAMP/CMMC 2.0 mappings, SARIF enrichment, POA&M generation, STIG checklist output, inter-procedural taint analysis, Grafana dashboards, MLflow integration, and CI/CD integration.
Documentation
Full reference documentation lives in docs/, covering CI/CD integration, air-gapped deployment, FIPS compliance, SARIF output, model sizing, and more.
Other directories: planning/ for in-flight designs, research/ for benchmarks and investigations, demos/ for CI example configs.
License
Apache-2.0
Download files
- Source distribution: sanicode-0.12.3.tar.gz
- Built distribution: sanicode-0.12.3-py3-none-any.whl
File details
Details for the file sanicode-0.12.3.tar.gz.
File metadata
- Download URL: sanicode-0.12.3.tar.gz
- Upload date:
- Size: 1.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 59fc2dfc84a90d87d33b07ff5cc34196bb66b47af9b06b1e2bb6e90faded48ab |
| MD5 | d37edb021840ad5a7ae2e0c86fd8a027 |
| BLAKE2b-256 | 5d318c22e2125d08531f21c467ca162a07b71ce0a63b096e58b2068e8bcca966 |
Provenance
The following attestation bundles were made for sanicode-0.12.3.tar.gz:
Publisher: release.yml on rdwj/sanicode
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sanicode-0.12.3.tar.gz
- Subject digest: 59fc2dfc84a90d87d33b07ff5cc34196bb66b47af9b06b1e2bb6e90faded48ab
- Sigstore transparency entry: 1328213292
- Sigstore integration time:
- Permalink: rdwj/sanicode@f425bbdd38deee27606b450f5ebfa9bdfb93b34f
- Branch / Tag: refs/tags/v0.12.3
- Owner: https://github.com/rdwj
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@f425bbdd38deee27606b450f5ebfa9bdfb93b34f
- Trigger Event: push
File details
Details for the file sanicode-0.12.3-py3-none-any.whl.
File metadata
- Download URL: sanicode-0.12.3-py3-none-any.whl
- Upload date:
- Size: 1.1 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 492a8b2f8d0f9e068fe63533317b97da2fdb5341e26a1e62004216ac95f9a66e |
| MD5 | dd48e6c88f768dbbc2449034a0ef2bbc |
| BLAKE2b-256 | c604676d2d7e31e24bc91cbfe608401368c066ef037f787e752708b420b078b0 |
Provenance
The following attestation bundles were made for sanicode-0.12.3-py3-none-any.whl:
Publisher: release.yml on rdwj/sanicode
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sanicode-0.12.3-py3-none-any.whl
- Subject digest: 492a8b2f8d0f9e068fe63533317b97da2fdb5341e26a1e62004216ac95f9a66e
- Sigstore transparency entry: 1328213298
- Sigstore integration time:
- Permalink: rdwj/sanicode@f425bbdd38deee27606b450f5ebfa9bdfb93b34f
- Branch / Tag: refs/tags/v0.12.3
- Owner: https://github.com/rdwj
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@f425bbdd38deee27606b450f5ebfa9bdfb93b34f
- Trigger Event: push