AI-assisted code sanitization scanner with OWASP ASVS, NIST 800-53, and ASD STIG compliance mapping.

Reason this release was yanked: temporarily private

Project description

Sanicode

Sanicode scans Python, JavaScript/TypeScript, and PHP codebases for input validation and sanitization gaps using taint analysis and a data flow knowledge graph, then maps every finding to OWASP ASVS 5.0, NIST 800-53, ASD STIG v4r11, and PCI DSS 4.0. Output formats include SARIF (for GitHub Code Scanning), JSON, Markdown, and an HTML dashboard with an interactive knowledge graph.

Unlike pattern-only tools such as Bandit or Semgrep, Sanicode traces tainted data from source to sink across function boundaries, so each finding carries context about how untrusted input reaches a dangerous call and whether sanitization exists along the path.

Install

pip install sanicode

Requires Python 3.10+.

Quick start

Scan a codebase and generate a Markdown report:

sanicode scan .

Generate SARIF output for CI integration:

sanicode scan . -f sarif

Generate an HTML dashboard with an interactive knowledge graph:

sanicode scan . -f html

Fail the build if high-severity findings exist:

sanicode scan . --fail-on high

Reports are written to sanicode-reports/ by default.
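
As a rough sketch of consuming the SARIF output in a custom CI step, the snippet below tallies results by level. It relies only on the standard SARIF 2.1.0 layout (runs[].results[].level), not on anything Sanicode-specific. The report path in the usage comment is a hypothetical name, since the exact file name written under sanicode-reports/ is not stated here.

```python
import json
from collections import Counter
from pathlib import Path


def count_levels(sarif: dict) -> Counter:
    """Tally SARIF results by level across all runs."""
    counts: Counter = Counter()
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # Simplification: treat a missing level as "warning", which is
            # the SARIF default when the rule does not override it.
            counts[result.get("level", "warning")] += 1
    return counts


def load_sarif(path: str) -> dict:
    """Read a SARIF file from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Usage (the file name is an assumption; adjust to what the scan wrote):
# counts = count_levels(load_sarif("sanicode-reports/report.sarif"))
# raise SystemExit(1 if counts["error"] else 0)
```

For simple gating, --fail-on already covers this; a script like the one above is only useful when a pipeline needs its own thresholds or summaries.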

CI/CD integration

GitHub Action

- uses: rdwj/sanicode@v0
  with:
    path: .
    fail-on: high
    format: sarif

Pre-commit hook

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/rdwj/sanicode
    rev: v0.3.1
    hooks:
      - id: sanicode

See docs/ci-cd-integration.md for GitLab CI, Jenkins, Azure DevOps, and Tekton/OpenShift Pipelines.

API server

Start the FastAPI server for remote or hybrid scan mode:

sanicode serve

This starts on port 8080 with Prometheus metrics at /metrics.

Endpoints

POST /api/v1/scan                 Submit a scan (async)
GET  /api/v1/scan/{id}            Poll scan status
GET  /api/v1/scan/{id}/findings   Retrieve findings (JSON or ?format=sarif)
GET  /api/v1/scan/{id}/graph      Retrieve knowledge graph
POST /api/v1/analyze              Instant snippet analysis
GET  /api/v1/compliance/map       Compliance framework lookup
GET  /api/v1/health               Liveness check
GET  /metrics                     Prometheus metrics
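
A minimal client-side sketch for the endpoints above, using only the Python standard library. The paths and the ?format=sarif query come from the table. The base URL, the helper names, and the assumption that the liveness endpoint answers with HTTP 200 are illustrative. The request body for POST /api/v1/scan is not documented here, so it is deliberately left out.

```python
from urllib.parse import quote, urljoin
from urllib.request import urlopen

# Assumption: the server was started locally with `sanicode serve`.
BASE_URL = "http://localhost:8080"


def findings_url(scan_id: str, sarif: bool = False, base: str = BASE_URL) -> str:
    """Build the findings URL for a scan, optionally requesting SARIF."""
    # Percent-encode the id so it cannot alter the path structure.
    url = urljoin(base, f"/api/v1/scan/{quote(scan_id, safe='')}/findings")
    return f"{url}?format=sarif" if sarif else url


def is_healthy(base: str = BASE_URL, timeout: float = 5.0) -> bool:
    """Return True if the liveness endpoint answers with HTTP 200 (assumed)."""
    try:
        with urlopen(urljoin(base, "/api/v1/health"), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError, and socket errors
        return False
```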

CLI commands

sanicode scan .                              # Scan codebase, generate reports
sanicode scan . -f sarif                     # SARIF output
sanicode scan . -f json -f sarif             # Multiple formats
sanicode scan . -f html                      # HTML dashboard with interactive graph
sanicode scan . --fail-on high               # Exit non-zero on high+ findings
sanicode serve                               # Start API server on :8080
sanicode report scan-result.json             # Re-generate reports from saved results
sanicode report scan-result.json -s high     # Filter by severity
sanicode report scan-result.json --cwe 89    # Filter by CWE
sanicode config setup                        # Interactive provider configuration wizard
sanicode config set llm.fast.model granite-nano  # Script-friendly config
sanicode config test                         # Test configured LLM tiers
sanicode config --show                       # Show resolved configuration
sanicode config --init                       # Create starter sanicode.toml
sanicode graph . --export graph.json         # Export knowledge graph
sanicode graph . --visualize graph.html      # Standalone graph visualization
sanicode rules --list                        # List all detection rules
sanicode rules --validate custom.yaml        # Validate custom rule file
sanicode benchmark                           # Benchmark against Bandit and Semgrep

Detection rules

21 built-in rules across three languages:

Python (10 rules, SC001–SC010): path traversal, OS command injection, XSS, SQL injection, code injection, weak cryptography, insecure random, deserialization, hardcoded credentials, SSRF.

JavaScript/TypeScript (6 rules, SC200–SC205): path traversal, OS command injection, XSS, weak cryptography, insecure random, hardcoded credentials.

PHP (5 rules, SC100–SC104): OS command injection, XSS, SQL injection, deserialization, hardcoded credentials.

Custom YAML rules extend this set. Place rule files in rules/ in your project root or ~/.config/sanicode/rules/, and validate with sanicode rules --validate.

Custom rules

id: CUSTOM001
cwe_id: 78                 # CWE-78: OS command injection
severity: high
pattern:
  targets: [python]
  ast_pattern: "call:subprocess.run"
  args:
    shell: "True"          # match calls that pass shell=True

Run sanicode rules --validate custom.yaml to check a rule file's syntax before deploying it.
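
To make concrete what a rule like CUSTOM001 is meant to catch, here is a small sketch that finds subprocess.run(..., shell=True) calls with Python's standard ast module. It illustrates the pattern only and is not Sanicode's matcher; the function name is hypothetical, and it handles only the literal subprocess.run spelling, not aliased imports.

```python
import ast


def find_shell_true_calls(source: str) -> list[int]:
    """Return line numbers of subprocess.run(..., shell=True) calls."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_subprocess_run = (
            isinstance(func, ast.Attribute)
            and func.attr == "run"
            and isinstance(func.value, ast.Name)
            and func.value.id == "subprocess"
        )
        # Only a literal True keyword counts; shell=flag is not matched.
        has_shell_true = any(
            kw.arg == "shell"
            and isinstance(kw.value, ast.Constant)
            and kw.value.value is True
            for kw in node.keywords
        )
        if is_subprocess_run and has_shell_true:
            hits.append(node.lineno)
    return sorted(hits)


sample = '''import subprocess
subprocess.run(cmd, shell=True)
subprocess.run(["ls", "-l"])
'''
print(find_shell_true_calls(sample))  # [2]
```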

Taint analysis

Sanicode performs dataflow-aware taint tracking at two levels:

  • Intra-procedural: reaching-definitions analysis within each function body.
  • Inter-procedural: function summaries propagated across the call graph.

Taint paths produce high-confidence edges in the knowledge graph, giving the LLM (and human reviewers) evidence of whether untrusted data actually reaches a sink.
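
The two levels can be illustrated with a toy model. This is a sketch of the general technique, not Sanicode's implementation: it handles straight-line code only (no branches or loops), so reaching definitions reduce to tracking the latest assignment to each variable. The source, sanitizer, and sink names, the Assign record, and the convention that a function's return value lives in a variable called "ret" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical names, chosen only for this illustration.
SOURCES = {"request_param"}
SANITIZERS = {"escape"}
SINKS = {"run_query"}


@dataclass(frozen=True)
class Assign:
    target: str                 # variable being defined
    sources: tuple              # variables read on the right-hand side
    call: Optional[str] = None  # function applied to them, if any


def analyze(body, tainted=(), summaries=None):
    """Intra-procedural pass: return (tainted variables, sink hits)."""
    summaries = summaries or {}
    tainted, hits = set(tainted), []
    for stmt in body:
        flows_in = any(v in tainted for v in stmt.sources)
        if stmt.call in SINKS and flows_in:
            hits.append(stmt.target)
        if stmt.call in SOURCES:
            out = True
        elif stmt.call in SANITIZERS:
            out = False
        elif stmt.call in summaries:
            # Inter-procedural step: reuse the callee's summary.
            out = flows_in and summaries[stmt.call]
        else:
            out = flows_in
        # A new definition replaces whatever the variable held before.
        if out:
            tainted.add(stmt.target)
        else:
            tainted.discard(stmt.target)
    return tainted, hits


def summarize(body, param):
    """Function summary: does taint on `param` reach the 'ret' variable?"""
    tainted, _ = analyze(body, {param})
    return "ret" in tainted


summaries = {
    "wrap": summarize([Assign("ret", ("x",))], "x"),
    "clean": summarize(
        [Assign("tmp", ("x",), "escape"), Assign("ret", ("tmp",))], "x"
    ),
}

main = [
    Assign("user", (), "request_param"),
    Assign("a", ("user",), "wrap"),      # taint passes through wrap()
    Assign("b", ("user",), "clean"),     # clean() sanitizes internally
    Assign("r1", ("a",), "run_query"),   # tainted data reaches the sink
    Assign("r2", ("b",), "run_query"),   # sanitized data reaches the sink
]
tainted, hits = analyze(main, summaries=summaries)
print(hits)  # ['r1']
```

Only the first query is reported, because the summary for clean() records that its return value is no longer tainted.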

Compliance frameworks

Findings map to four frameworks, covering 54 CWEs:

  • OWASP ASVS 5.0 — V1: Encoding and Sanitization requirements (L1/L2/L3)
  • NIST 800-53 — SI-10 (Information Input Validation), SI-15 (Information Output Filtering), and related controls
  • ASD STIG v4r11 — APSC-DV-002510 (CAT I), APSC-DV-002520 (CAT II), APSC-DV-002530 (CAT II), and related checks
  • PCI DSS 4.0 — Requirement 6 (Develop and Maintain Secure Systems and Software)

Configuration

Create a config file:

sanicode config --init

This writes a sanicode.toml in the current directory. Configuration is loaded from the following locations, in order:

  1. --config flag
  2. sanicode.toml in the current directory
  3. ~/.config/sanicode/config.toml
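
The lookup order above could be sketched as follows. This assumes a first-match-wins policy and that a missing --config path falls through to the next candidate; the source does not say whether files are merged or whether a bad --config path is an error, so treat both as assumptions. The function name is hypothetical.

```python
from pathlib import Path
from typing import Optional


def resolve_config(cli_path: Optional[str] = None,
                   cwd: Optional[Path] = None,
                   home: Optional[Path] = None) -> Optional[Path]:
    """Return the first config file that exists, or None if there is none."""
    cwd = cwd or Path.cwd()
    home = home or Path.home()
    candidates = []
    if cli_path:
        candidates.append(Path(cli_path))                              # 1. --config flag
    candidates.append(cwd / "sanicode.toml")                           # 2. current directory
    candidates.append(home / ".config" / "sanicode" / "config.toml")   # 3. user config
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None  # no file found: the tool would run on defaults
```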

Sanicode runs without any configuration. LLM tiers are optional: without them, the tool runs in degraded mode using AST pattern matching, taint analysis, knowledge graph construction, and compliance lookups. LLM integration adds context-aware reasoning on top of these.

LLM tiers (optional)

The config supports three tiers for different task complexities. Supported providers include cloud APIs (Anthropic, OpenAI, Google, Azure) and self-hosted inference (vLLM, Ollama, OpenShift AI). Run sanicode config setup for an interactive wizard that walks through provider selection and endpoint configuration.

Tier        Purpose                                   Recommended model
fast        Classification, severity scoring          Granite Nano, Mistral 7B
analysis    Data flow context, taint reasoning        Granite Code 8B
reasoning   Compliance mapping, graph exploitability  Llama 3.1 70B

Current status

v0.3.0 — Multi-language scanning (Python, JavaScript/TypeScript, PHP), 21 built-in detection rules, intra- and inter-procedural taint analysis, LLM graph reasoning, 54-CWE compliance database with four framework mappings, GitHub Action and pre-commit hook, custom YAML rules, and CI/CD integration guides for six platforms.

License

Apache-2.0

Download files

Source Distribution

sanicode-0.3.3.tar.gz (259.6 kB)

Uploaded Source

Built Distribution

sanicode-0.3.3-py3-none-any.whl (178.1 kB)

Uploaded Python 3

File details

Details for the file sanicode-0.3.3.tar.gz.

File metadata

  • Download URL: sanicode-0.3.3.tar.gz
  • Size: 259.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sanicode-0.3.3.tar.gz
Algorithm     Hash digest
SHA256        99387c66775026e3bd7b101020ee1b70db28e7e2e655d5f65221921f4fc4e0e2
MD5           3107dee47fe4031b7bb7ffff586e012f
BLAKE2b-256   7013c1745df2802811f83ba2c2d94ef0de58ad70bb18b79d324e82a1384b2480

Provenance

The following attestation bundles were made for sanicode-0.3.3.tar.gz:

Publisher: release.yml on rdwj/sanicode

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file sanicode-0.3.3-py3-none-any.whl.

File metadata

  • Download URL: sanicode-0.3.3-py3-none-any.whl
  • Size: 178.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sanicode-0.3.3-py3-none-any.whl
Algorithm     Hash digest
SHA256        c15560dd9c818b02146208f496a05cccc504ac4aa60ab3db9b083410102bbf18
MD5           3af6d1127abaa23659a714354cdfe537
BLAKE2b-256   6e7cc7f98eebaa2b78c6a89eb9734e892dd4c934f2e23ae418500b62de1c112e

Provenance

The following attestation bundles were made for sanicode-0.3.3-py3-none-any.whl:

Publisher: release.yml on rdwj/sanicode

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
