
AI-native code security scanner with cascade analysis and Firecracker-microVM DAST runtime validation


Argus Scanner

We don't flag what we can't exploit.

Argus is an AI-native code security scanner combining a cost-tiered LLM harness (Gemini Flash-Lite triage → Sonnet 4.6 → Opus 4.6 escalation) with runtime DAST detonation in a Firecracker microVM and sandbox-verified remediation. Whether the bug is in code your team wrote (SQL injection, auth bypass, deserialization, command injection, crypto misuse) or in code your stack quietly pulled in (a malicious package, a poisoned CLAUDE.md, a backdoored setup.py, a tampered ML checkpoint loader about to run on someone's machine) — Argus detonates it in the sandbox, captures the exploit firing, generates a patch, replays the same exploit against the patched source, and ships the result as a CI gate.

It targets the gap between "this looks suspicious" (pattern-matching SAST) and "this actually exploits something" (manual reverse engineering).

One scanner. Two threat models. Zero false-positive triage.

Open source. BYOK. Apache 2.0.

You pay your providers directly — Anthropic + Google for the cascade, Fly.io for the optional DAST sandbox. Argus collects nothing.


Quick Start

Get from install to first scan in under 60 seconds:

pip install argus-ai-scanner
export ANTHROPIC_API_KEY="your-anthropic-key"
export GEMINI_API_KEY="your-gemini-key"

# Single file
argus scan path/to/suspicious.py

# Whole repo (current directory)
argus scan-repo .

# CI mode — only files changed vs main, SARIF for GitHub Code Scanning
argus scan-repo . --diff origin/main --output sarif --output-file findings.sarif

# Pre-install supply-chain gate — scan a PyPI package + its dep closure
# BEFORE pip installs anything. Blocks day-zero malware at the ingestion boundary.
argus install requests
argus install -r requirements.txt --dry-run        # CI gate without installing
argus install litellm --strict-coverage             # extra-paranoid mode

Without DAST configured, the CLI degrades gracefully to cascade-only verdicts. DAST mode (Firecracker sandbox) requires a Fly.io account; see docs/dast-setup.md.

Benchmark Performance

Adversarial regression suite, labeled by a 4-LLM consensus oracle. Methodology, sample size, and per-file breakdown: bench_results/v1_1_launch/launch_report.md.

                       Verdict-exact (higher = better)
Argus (cascade + DAST) ██████████████████░░  91.3%
Gemini 3.1 Pro         █████████████████░░░  82.6%
Grok 4.3               █████████████████░░░  82.6%
Opus 4.6               ████████████████░░░░  78.3%
GPT 5.4                ███████████████░░░░░  73.9%

How Argus works (the three pillars)

The coverage matrix further down shows exactly what each of the three pillars does for each file type.

Pillar 1 — Cascade harness (static + AI analysis)

Every recognized file flows through a cost-tiered model cascade. Deterministic preprocessing first (free, no models): SHA-256, multi-stage deobfuscation (base64 / hex / eval-chain), dependency graphing, attack-vector flagging, AI-file-pattern detection. Files with no outbound intent get dropped before a single token is spent.
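The hashing and multi-stage deobfuscation steps can be sketched with the standard library alone. This is a minimal illustration, not Argus's implementation: the helper names (`sha256_hex`, `peel_once`, `deobfuscate`), the minimum length of 8, and the layer cap are all assumptions, and the eval-chain case is left out.

```python
import base64
import hashlib
import re
from typing import Optional, Tuple

# Hypothetical helpers for illustration; not Argus's actual API.
_HEX_RE = re.compile(r"^(?:[0-9a-fA-F]{2})+$")
_B64_RE = re.compile(r"^[A-Za-z0-9+/]+={0,2}$")


def sha256_hex(data: bytes) -> str:
    """Content hash that identifies a file independently of its name."""
    return hashlib.sha256(data).hexdigest()


def peel_once(text: str) -> Optional[str]:
    """Decode one layer of hex or base64; return None if neither applies."""
    candidate = text.strip()
    if len(candidate) < 8:  # assumed floor to avoid decoding short ordinary words
        return None
    try:
        if _HEX_RE.match(candidate):
            return bytes.fromhex(candidate).decode("utf-8")
        if _B64_RE.match(candidate) and len(candidate) % 4 == 0:
            return base64.b64decode(candidate, validate=True).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return None
    return None


def deobfuscate(text: str, max_layers: int = 5) -> Tuple[str, int]:
    """Peel nested encodings until nothing decodes or the layer cap is hit."""
    layers = 0
    while layers < max_layers:
        decoded = peel_once(text)
        if decoded is None:
            break
        text, layers = decoded, layers + 1
    return text, layers
```

Calling `deobfuscate` on a string that was base64-encoded and then hex-encoded returns the original text together with a layer count of 2. The regex checks are heuristics, so a real preprocessor would need extra guards against ordinary strings that happen to decode cleanly.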

Survivors route through a model cascade:

| Cascade stage | Model | Cost / file | Decides |
| --- | --- | --- | --- |
| Triage | Gemini Flash-Lite | ~$0.001 | CLEAN / LOW / HIGH routing |
| Cheap analysis (LOW tier) | Gemini Flash | ~$0.02 | findings on low-priority files |
| Default deep analysis (HIGH tier) | Anthropic Claude Sonnet 4.6 | ~$0.07 | findings on high-priority files |
| High-stakes / borderline escalation | Anthropic Claude Opus 4.6 | ~$0.15 | ~20% of HIGH files |

The harness emits structured findings: CWE, line, severity, code, explanation, suggested fix, proof-of-concept, behavioral profile, attack chains, composite risk score. Aggregate cost is ~$4.65 per 100-file scan on a realistic workload mix; hard per-file + per-scan cost caps abort runs that exceed your declared budget.
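To see how the per-stage prices combine, here is a rough estimator built from the table's figures. The file mix in the usage note is invented, the assumption that Opus runs in addition to Sonnet on escalated files is mine, and `BudgetTracker` is a hypothetical stand-in for the cost-cap behaviour, so treat the output as an order-of-magnitude check and not a reproduction of the ~$4.65 figure.

```python
from dataclasses import dataclass

# Approximate per-file costs in USD, taken from the cascade table.
TRIAGE, FLASH, SONNET, OPUS = 0.001, 0.02, 0.07, 0.15
OPUS_ESCALATION_RATE = 0.20  # "~20% of HIGH files"


def estimate_scan_cost(n_clean: int, n_low: int, n_high: int) -> float:
    """Expected spend, assuming every file is triaged and Opus adds to Sonnet."""
    n_total = n_clean + n_low + n_high
    return (
        n_total * TRIAGE
        + n_low * FLASH
        + n_high * SONNET
        + n_high * OPUS_ESCALATION_RATE * OPUS
    )


class CostCapExceeded(RuntimeError):
    """Raised when cumulative spend passes the declared budget."""


@dataclass
class BudgetTracker:
    """Hypothetical cap enforcer; a max_cost of 0 disables the cap."""

    max_cost: float
    spent: float = 0.0

    def charge(self, usd: float) -> None:
        self.spent += usd
        if self.max_cost > 0 and self.spent > self.max_cost:
            raise CostCapExceeded(f"spent ${self.spent:.2f} of ${self.max_cost:.2f}")
```

With an illustrative mix of 40 clean, 20 LOW, and 40 HIGH files, `estimate_scan_cost(40, 20, 40)` comes to about $4.50, which is in the same range as the quoted aggregate.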

Pillar 2 — DAST runtime detonation

When the harness flags suspicion at sufficient verdict tier, the file moves to a Firecracker microVM (minimal-v2, networked-v2, or ml_tools-v2 image profile) for two phases:

  • Phase A — exploit testing. Plan an exploit per harness finding, run it in the sandbox, capture syscalls / egress / filesystem writes, classify each finding as CONFIRMED / BLOCKED / UNREACHED / NOT_TESTED based on what actually happened.
  • Phase B — exploit discovery. Given accumulated evidence, propose NEW hypotheses the harness missed. A deterministic validator gates the proposals; survivors carry forward into the next iteration's Phase A. Up to 3 iterations or until convergence.
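The alternation between the two phases can be sketched as a bounded loop. Everything here is a simplification under stated assumptions: findings are plain strings, and `test_exploit`, `propose`, and `is_valid` are hypothetical callables standing in for the sandbox run, the hypothesis generator, and the deterministic validator.

```python
from typing import Callable, Dict, Iterable, List

MAX_ITERATIONS = 3  # "Up to 3 iterations or until convergence"


def run_dast_loop(
    initial_findings: Iterable[str],
    test_exploit: Callable[[str], str],
    propose: Callable[[Dict[str, str]], List[str]],
    is_valid: Callable[[str], bool],
) -> Dict[str, str]:
    """Alternate Phase A (test) and Phase B (discover) until nothing new survives."""
    statuses: Dict[str, str] = {}
    pending = list(dict.fromkeys(initial_findings))  # de-duplicate, keep order
    for _ in range(MAX_ITERATIONS):
        if not pending:
            break  # converged: no untested hypotheses remain
        # Phase A: test each pending finding and record what actually happened.
        for finding in pending:
            statuses[finding] = test_exploit(finding)
        # Phase B: propose new hypotheses from the evidence; the validator gates them.
        pending = [
            hypothesis
            for hypothesis in dict.fromkeys(propose(statuses))
            if hypothesis not in statuses and is_valid(hypothesis)
        ]
    return statuses
```

In this sketch, hypotheses proposed in the final iteration are never tested and simply drop out of the result; how Argus reports those is not covered here.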

This is the layer that kills false positives: a pattern that looks like SQL injection but is defended by the file's own escaping is marked BLOCKED, not flagged. It also surfaces what static analysis missed; Phase B has produced confirmed findings that the harness did not catch.

Pillar 3 — Remediation (fix-and-verify)

When Phase A confirms an exploit on text source (Python, JS / TS, shell), Argus generates a patched version, replays the same exploit attempts against the patched code in the same sandbox, and emits per-finding NEUTRALIZED / STILL_EXPLOITABLE / UNVERIFIABLE with sandbox-grounded evidence. You don't get a remediation suggestion; you get a remediation that's been tested.
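One plausible way to derive the per-finding remediation verdict from the replay is shown below. The mapping is an assumption inferred from the status names, not a documented rule, and `remediation_verdict` is a hypothetical helper.

```python
def remediation_verdict(original_status: str, patched_status: str) -> str:
    """Map sandbox statuses before and after patching to a remediation verdict.

    Assumed mapping: only exploits that were CONFIRMED on the original source
    can be judged; a replay that could not run proves nothing either way.
    """
    if original_status != "CONFIRMED":
        return "UNVERIFIABLE"
    if patched_status == "CONFIRMED":
        return "STILL_EXPLOITABLE"  # the same exploit still fires after the patch
    if patched_status in {"BLOCKED", "UNREACHED"}:
        return "NEUTRALIZED"  # the patch defended against or removed the path
    return "UNVERIFIABLE"  # e.g. NOT_TESTED on the patched source
```

The point of the design is that a patch is only called NEUTRALIZED when the same exploit was observed firing before and observed failing after.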

Binary artifact policy. For ML artifacts (.pkl / .pt / .bin / .safetensors / .h5 / .onnx), Argus does NOT auto-patch the binary: an LLM cannot emit valid bytecode-level patches, and a corrupt patched pickle would mislead the replay. Instead, the remediation pillar emits structured guidance: regenerate the model from a clean training pipeline and serialize it with safetensors (which is structurally incapable of carrying executable __reduce__ payloads). Status is UNVERIFIABLE, with the guidance in fix_summary.
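To make the `__reduce__` risk concrete, the sketch below lists the pickle opcodes that import or call objects, using the standard library's `pickletools.genops` so the payload is never loaded. The opcode set and the `scan_pickle` name are illustrative choices, not Argus's rule set.

```python
import pickletools
from typing import Any, List, Tuple

# Opcodes that import a global or invoke a callable during unpickling.
SUSPICIOUS_OPCODES = {
    "GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX",
}


def scan_pickle(data: bytes) -> List[Tuple[int, str, Any]]:
    """Return (byte offset, opcode name, argument) for each suspicious opcode.

    genops only disassembles the stream; nothing is executed or imported.
    For STACK_GLOBAL the argument is None because the module and attribute
    names are pushed onto the stack by earlier string opcodes.
    """
    hits = []
    for opcode, arg, pos in pickletools.genops(data):
        if opcode.name in SUSPICIOUS_OPCODES:
            hits.append((pos, opcode.name, arg))
    return hits
```

A pickle of plain dicts, lists, and numbers produces no hits, while an object whose `__reduce__` returns a callable shows up with a REDUCE opcode. Legitimate checkpoints also use these opcodes to rebuild their classes, so in practice the hits need to be checked against which callables they reference.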

Opt-out: pass --no-remediation to skip this pillar entirely while keeping the harness + DAST active. Use it for compliance scans, CI gates that don't allow source-modification suggestions, read-only audits, or to save ~$0.05/file in patch-generation tokens. The result still includes a structured phase_c block (Phase C is the fix-and-verify stage that follows DAST Phases A and B) with skipped_reason: "phase_c_disabled_by_config", so downstream consumers can distinguish "remediation off" from "ran and found nothing to fix."


Coverage matrix

What each pillar does, per file type. ✅ = supported, ⚠️ = supported with policy nuance, ⏳ = roadmap, ❌ N/A = not applicable to this format.

| File type | Harness analysis | DAST exploit testing | DAST exploit discovery | Remediation |
| --- | --- | --- | --- | --- |
| Python (.py, .pyw, .pyi, .pth) | ✅ | ✅ | ✅ | ✅ patch + replay |
| JavaScript / TypeScript (.js, .mjs, .cjs, .jsx, .ts, .tsx) | ✅ | ✅ | ✅ | ✅ patch + replay |
| Shell (.sh, .bash, .zsh) | ✅ | ✅ | ✅ | ✅ patch + replay |
| Jupyter notebooks (.ipynb) | ✅ cell-by-cell decomposition | ⏳ roadmap | ⏳ roadmap | ⏳ roadmap |
| ML model artifacts (.pkl, .pickle, .pt, .bin, .safetensors, .h5, .hdf5, .keras, .onnx) | ✅ pickletools disassembly | ✅ load-detonation in sandbox | | ⚠️ guidance only (no auto-patch; see binary policy) |
| GitHub Actions workflows (.github/workflows/*.yml) | ✅ deterministic CI-pattern sweep | ⏳ roadmap | ⏳ roadmap | ⏳ roadmap |
| Supply-chain manifests (package.json, requirements.txt, Cargo.lock, go.mod, Gemfile, Pipfile, setup.py, pyproject.toml, pom.xml, build.gradle, *.csproj, etc.) | ✅ parsed for deps + lifecycle hooks | ❌ N/A (no runtime to detonate) | ❌ N/A | ❌ N/A |
| AI-agent config sentinels (CLAUDE.md, AGENTS.md, SKILL.md, .cursorrules, .clinerules, mcp.json, plugin.json, openapi.{yaml,json}, agent-config.{yaml,json,toml}, etc.) | ✅ prompt-injection surface | ❌ N/A | ❌ N/A | ❌ N/A |
| Other languages tagged for harness (Java, Kotlin, Scala, Go, Rust, Ruby, PHP, C#, C/C++, PowerShell, Lua, Perl, R, Swift, Terraform, HCL) | ✅ generic harness analysis | ⏳ roadmap | ⏳ roadmap | ⏳ roadmap |

Per-finding verdicts (where false positives are eliminated)

Every finding ships with one of these statuses:

| Status | Meaning |
| --- | --- |
| CONFIRMED | Sandbox observed the exploit firing. PoC + event trace surfaced with the finding. |
| BLOCKED | Attack was tested; the file's own code defended against it (sanitization, escaping, allowlist). |
| UNREACHED | Attack was tested; the code path is genuinely unreachable. |
| NOT_TESTED | Sandbox couldn't execute the test. Sub-reason: infra_stub / inconclusive / not_planned. |

A CONFIRMED finding looks like this:

{
  "cwe": "CWE-200",
  "type": "data_exfiltration",
  "severity": "critical",
  "status": "CONFIRMED",
  "confidence": 1.0,
  "runtime_evidence": "Mock HTTP server at 127.0.0.1:8000 captured POST body containing 'FAKE_PRIVATE_KEY_CONTENT' and 'ssh-rsa AAAAFAKEKEY user@host'. The malware decoded its base64 payload and POSTed the contents of ~/.ssh/ to the rewritten C2 endpoint.",
  "proof_of_concept": "On any Unix host with SSH keys present, execution sends the full contents of ~/.ssh/ to the remote C2 server over HTTPS."
}

DAST cuts three ways: it confirms exploits with sandbox-captured evidence, refutes false positives with proof of non-exploitability, and verifies remediations by replaying the same exploits against the patched source.
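Downstream tooling will usually want to split findings by status. The sketch below assumes the JSON output is a list of finding objects shaped like the example above; the actual top-level layout of `--output json` is not shown here, so `load_findings` and the other helper names are hypothetical.

```python
import json
from collections import Counter
from typing import Any, Dict, List

Finding = Dict[str, Any]


def load_findings(raw_json: str) -> List[Finding]:
    """Parse a JSON array of findings (assumed layout, see lead-in)."""
    data = json.loads(raw_json)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of finding objects")
    return data


def status_counts(findings: List[Finding]) -> Counter:
    """Tally findings per status; a missing status counts as NOT_TESTED."""
    return Counter(f.get("status", "NOT_TESTED") for f in findings)


def confirmed_first(findings: List[Finding]) -> List[Finding]:
    """Return only CONFIRMED findings, highest confidence first."""
    confirmed = [f for f in findings if f.get("status") == "CONFIRMED"]
    return sorted(confirmed, key=lambda f: f.get("confidence", 0.0), reverse=True)
```

A CI job could fail when `confirmed_first(...)` is non-empty and merely report the BLOCKED and UNREACHED counts for visibility.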

Enterprise Invariants

Anthropic's Claude Security and OpenAI's Codex Security are enterprise-tier and vendor-cloud-only. Argus is the open alternative.

  • BYOK. You control LLM access; bills go to your API meter, not ours.
  • Zero telemetry. Argus itself collects nothing. In cascade-only mode, file content goes only to the LLM providers you configured with your own keys. In DAST mode, file content is also sent to a Fly.io app you own and control, never to Argus-operated infrastructure.
  • Local execution. The pipeline runs on your machine and depends on no Argus-hosted service.

CLI Reference

argus scan <file> — single-file scan

| Flag | Purpose |
| --- | --- |
| --output {json,markdown} | Output format (default: json) |
| --no-dast | Skip DAST verification (cascade-only) |
| --no-remediation | Skip Phase C (fix-and-verify). Phase A + B still run; no patch is generated. Compliance / CI-gate / read-only-audit use cases. Saves ~$0.05/file. |
| --max-cost USD | Abort this file's scan if per-file API spend exceeds USD (default: $1.00; pass 0 to disable) |
| --enable-discovery | Proactive payload sweep: runs library of attack payloads against the file in sandbox; surfaces runtime-confirmed CWEs as new findings (+~$0.25/file) |
| --enable-runtime-probe | Phase B+ runtime exploit probing (v1.5). Sonnet generates concrete attack inputs targeting probe-attractive functions; sandbox executes each; deterministic rules confirm exploits via runtime evidence (return value, side-effect canaries). Python only in v1.5; opt-in (~$0.20–0.50/file). |
| --dast-trigger-verdicts LIST | Comma-separated L1 verdicts that trigger DAST. Default: malicious,critical_malicious. Allowed: clean,suspicious,malicious,critical_malicious |

argus scan-repo <path> — directory tree scan

| Flag | Purpose |
| --- | --- |
| --diff REF | Only scan files differing vs git ref (e.g., --diff origin/main for PR/CI) |
| --output {markdown,json,sarif} | Output format (default: markdown); sarif is SARIF v2.1.0 for GitHub Code Scanning |
| --output-file PATH | Write to file instead of stdout |
| --max-cost USD | Abort the run when cumulative API spend across all files exceeds USD; remaining files are marked cost_cap_reached. Pass 0 or omit to disable |
| --exclude GLOB | Additional gitignore-style exclude pattern (repeatable) |
| --no-gitignore | Ignore .gitignore during walk (default: respected) |
| --max-file-bytes BYTES | Skip files larger than BYTES (default: 1 MiB) |
| --no-dast | Skip DAST verification on every file |
| --no-remediation | Skip Phase C on every file. Phase A + B still run; no patches generated. |
| --enable-discovery | Proactive payload sweep on every DAST-eligible file |
| --enable-runtime-probe | Phase B+ runtime exploit probing (v1.5) on every DAST-eligible Python file. See argus scan for description. |
| --dast-trigger-verdicts LIST | Same as scan |
| --continue-on-error / --no-continue-on-error | On per-file exception, record and continue (default) or abort run |
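When `--output sarif` is written to a file, a CI step can read it back using only the structure that SARIF v2.1.0 itself defines (`runs[].results[]`, each with an optional `level`). How Argus maps its severities onto SARIF levels is not stated here, so the blocking set below is an assumption, as are the helper names.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Iterator

Sarif = Dict[str, Any]


def iter_results(sarif: Sarif) -> Iterator[Dict[str, Any]]:
    """Yield every result object across all runs in a SARIF log."""
    for run in sarif.get("runs") or []:
        yield from run.get("results") or []


def count_by_level(sarif: Sarif) -> Counter:
    """Tally results per level; SARIF treats a missing level as 'warning'."""
    return Counter(r.get("level", "warning") for r in iter_results(sarif))


def should_fail(sarif: Sarif, blocking: Iterable[str] = ("error",)) -> bool:
    """Return True if any result has a level in the (assumed) blocking set."""
    blocking_levels = set(blocking)
    return any(r.get("level", "warning") in blocking_levels for r in iter_results(sarif))
```

A gate script would load `findings.sarif` with `json.load`, call `should_fail`, and exit non-zero when it returns True.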

argus install <pkg> — pre-install supply-chain gate

Stages the package via pip download (no setup.py execution), runs the full Argus pipeline on every wheel/sdist in the dependency closure, then either calls real pip install or blocks with the analysis printed. Catches day-zero supply-chain malware at the ingestion boundary — exactly the class advisory-based scanners (pip-audit, safety) miss.

| Flag | Purpose |
| --- | --- |
| <pkg> | Package spec (e.g. 'requests', 'litellm==1.50.0', 'fastapi[all]'). Mutually exclusive with -r. |
| -r PATH / --requirement PATH | Install from a requirements.txt; Argus scans every wheel in the resolved closure. |
| --block-on LIST | Comma-separated verdict tiers that block install. Default: malicious,critical_malicious. Use suspicious,malicious,critical_malicious for stricter gating. |
| --no-dast | Cascade-only: skip DAST runtime detonation even if Fly is configured. Faster + cheaper, but leaves runtime-only exploits (load-time RCE in pickles, etc.) un-validated. |
| --no-cache | Ignore the wheel-hash verdict cache. Re-scans every artifact from scratch. |
| --cache-dir PATH | Override cache directory (default: ~/.cache/argus/install). |
| --dry-run | Run the scan + report verdict; do NOT call pip install. For CI gating without side effects. |
| --strict-coverage | Escalate verdict to suspicious when Argus could only statically analyze <70% of files in a wheel (rest are typically native binaries: .so, .pyd, .dll, .dylib, .exe). For security-paranoid users / strict CI gates. |
| --max-cost USD | Per-file cost cap (default: $1.00). |
| --max-total-cost USD | Aggregate cost cap across the whole dependency-closure scan (default: $10). When tripped, remaining wheels are flagged suspicious / unscanned-due-to-cost-cap and the install fails closed. Pass 0 to disable. |
| --deep | Full-fidelity scan: thinking_budget=24000 on every Sonnet/Opus call, sequential per-file scan, 4 wheels concurrent. ~5–10× more expensive but catches subtle multi-step exploits the default mode might miss. |
| --no-thinking | Explicit way to set thinking_budget=0. Already the install default; flag exists for script readability. Mutually exclusive with --deep. |
| --parallel N | Max number of artifacts scanned concurrently (default: 8). Pass lower if you hit API rate limits. |
| --enable-runtime-probe | Phase B+ runtime exploit probing (v1.5) on every DAST-eligible Python file in the dependency closure. Adds ~$0.20–0.50/file. See argus scan. |
| --pip EXEC | Pip executable. Default: pip. Pass 'uv pip' for uv-managed envs. |
| --output {text,json} | Output format. Default: text. JSON for CI consumption. |

Phase C is always disabled on the install path. Remediation for a not-yet-installed package is "don't install", not "patch + replay." If the cascade flags a malicious verdict, the install is blocked; the user sees the analysis (CWE, runtime evidence, exfil destination) and decides.

Wheel-hash caching. Verdicts are cached at ~/.cache/argus/install/<sha256>.json. Wheel bytes are immutable on PyPI (re-uploads of the same name+version are rejected), so a verdict is permanently valid for that exact artifact. First-run cost is real; subsequent installs of the same wheel are free.
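A content-addressed cache of this kind takes only a few lines. The sketch below follows the `<sha256>.json` layout described above, but `cached_verdict`, the `scan` callable, and the verdict dictionary shape are placeholders, not Argus internals.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Callable, Dict

Verdict = Dict[str, Any]


def file_sha256(path: Path) -> str:
    """Hash a file in chunks so large wheels are never read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cached_verdict(
    artifact: Path,
    cache_dir: Path,
    scan: Callable[[Path], Verdict],
) -> Verdict:
    """Return the stored verdict for this exact artifact, scanning only on a miss."""
    entry = cache_dir / f"{file_sha256(artifact)}.json"
    if entry.exists():
        return json.loads(entry.read_text(encoding="utf-8"))
    verdict = scan(artifact)  # the expensive step; runs once per unique artifact
    cache_dir.mkdir(parents=True, exist_ok=True)
    entry.write_text(json.dumps(verdict), encoding="utf-8")
    return verdict
```

Because the key is the hash of the bytes and not the file name, a renamed copy of the same wheel hits the cache and a tampered wheel with the same name misses it.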

Coverage transparency. A "clean" verdict on a wheel that's 50% native binaries (.so, .pyd) is honestly weaker evidence than a clean verdict on a wheel that's 100% Python — the report says so. Every artifact verdict reports n_files_unscanned + extension histogram. Native binaries are not silently scrubbed from the verdict — coverage warnings surface. --strict-coverage opt-in escalates the verdict on low-coverage artifacts.
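The coverage figure can be approximated straight from a wheel's file listing, since a wheel is a zip archive. In this sketch only the native-binary extensions named above count as unscanned, which is a simplification; the report keys other than `n_files_unscanned` are invented for illustration.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath
from typing import Any, Dict

NATIVE_EXTENSIONS = {".so", ".pyd", ".dll", ".dylib", ".exe"}
STRICT_THRESHOLD = 0.70  # --strict-coverage escalates below 70% analyzable files


def coverage_report(wheel) -> Dict[str, Any]:
    """Summarize how much of a wheel is statically analyzable.

    `wheel` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(wheel) as archive:
        names = [n for n in archive.namelist() if not n.endswith("/")]
    unscanned = Counter(
        PurePosixPath(n).suffix.lower()
        for n in names
        if PurePosixPath(n).suffix.lower() in NATIVE_EXTENSIONS
    )
    n_unscanned = sum(unscanned.values())
    coverage = 1.0 if not names else (len(names) - n_unscanned) / len(names)
    return {
        "n_files": len(names),
        "n_files_unscanned": n_unscanned,
        "unscanned_extensions": dict(unscanned),
        "coverage": coverage,
        "low_coverage": coverage < STRICT_THRESHOLD,
    }
```

A wheel with two Python files and two native binaries reports a coverage of 0.5, which falls under the 70% threshold that `--strict-coverage` uses.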

Security & Isolation

Argus deliberately detonates potentially malicious code. Host protection is non-negotiable.

  • Hardware-level isolation. Execution happens inside Firecracker microVMs using KVM hardware virtualization.
  • Ephemeral state. Every detonation runs in a pristine microVM that is destroyed after execution. Zero persistence.
  • Strict egress control. Network profiles enforced at the hypervisor level prevent lateral movement during DAST verification.

Documentation

| Topic | Page |
| --- | --- |
| Install guide | docs/install.md |
| API key sourcing | docs/api-keys.md |
| Architecture deep dive | docs/architecture.md |
| DAST sandbox setup | docs/dast-setup.md |
| Cost guide | docs/cost-guide.md |
| Roadmap | ROADMAP.md |
| Contributing | CONTRIBUTING.md |
| Security disclosures | SECURITY.md |

License

Apache License 2.0.
