AI-native code security scanner with cascade analysis and Firecracker-microVM DAST runtime validation
Argus Scanner
We don't flag what we can't exploit.
Argus is an AI-native code security scanner combining a cost-tiered LLM harness (Gemini Flash-Lite triage → Sonnet 4.6 → Opus 4.6 escalation) with runtime DAST detonation in a Firecracker microVM and sandbox-verified remediation. Whether the bug is in code your team wrote (SQL injection, auth bypass, deserialization, command injection, crypto misuse) or in code your stack quietly pulled in (a malicious package, a poisoned CLAUDE.md, a backdoored setup.py, a tampered ML checkpoint loader about to run on someone's machine) — Argus detonates it in the sandbox, captures the exploit firing, generates a patch, replays the same exploit against the patched source, and ships the result as a CI gate.
It targets the gap between "this looks suspicious" (pattern-matching SAST) and "this actually exploits something" (manual reverse engineering).
One scanner. Two threat models. Zero false-positive triage.
Open source. BYOK. Apache 2.0.
You pay your providers directly — Anthropic + Google for the cascade, Fly.io for the optional DAST sandbox. Argus collects nothing.
Quick Start
Get from install to first scan in under 60 seconds:
```bash
pip install argus-ai-scanner
export ANTHROPIC_API_KEY="your-anthropic-key"
export GEMINI_API_KEY="your-gemini-key"

# Single file
argus scan path/to/suspicious.py

# Whole repo (current directory)
argus scan-repo .

# CI mode: only files changed vs main, SARIF for GitHub Code Scanning
argus scan-repo . --diff origin/main --output sarif --output-file findings.sarif

# Pre-install supply-chain gate: scan a PyPI package + its dep closure
# BEFORE pip installs anything. Blocks day-zero malware at the ingestion boundary.
argus install requests
argus install -r requirements.txt --dry-run   # CI gate without installing
argus install litellm --strict-coverage       # extra-paranoid mode
```
Without DAST configured, the CLI gracefully degrades to cascade-only verdicts. DAST mode (Firecracker sandbox) requires a Fly.io account; see docs/dast-setup.md.
Benchmark Performance
Adversarial regression suite, labeled by a 4-LLM consensus oracle. Methodology, sample size, and per-file breakdown: bench_results/v1_1_launch/launch_report.md.
Verdict-exact (higher = better)

```
Argus (cascade + DAST)  ████████████████████  91.3%
Gemini 3.1 Pro          █████████████████░░░  82.6%
Grok 4.3                █████████████████░░░  82.6%
Opus 4.6                █████████████████░░░  78.3%
GPT 5.4                 ████████████████░░░░  73.9%
```
How Argus works (the three pillars)
Argus has three pillars. The coverage matrix below shows exactly what each pillar does for each file type.
Pillar 1 — Cascade harness (static + AI analysis)
Every recognized file flows through a cost-tiered model cascade. Deterministic preprocessing runs first, free and without any model calls: SHA-256 hashing, multi-stage deobfuscation (base64 / hex / eval-chain), dependency graphing, attack-vector flagging, and AI-file-pattern detection. Files with no outbound intent are dropped before a single token is spent.
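The multi-stage deobfuscation step can be pictured as repeatedly peeling encoding layers until nothing decodes any further. The sketch below is a minimal illustration under stated assumptions, not Argus's preprocessor: it only handles whole-string base64 and hex layers (no eval-chain unwrapping, no locating blobs inside larger files), and the names `peel_layers` and `max_depth` are hypothetical.

```python
import base64
import binascii
import re
from typing import Optional

# Even-length runs of hex digits, e.g. "696d706f7274".
HEX_RE = re.compile(r"(?:[0-9a-fA-F]{2})+")


def _looks_like_text(s: str) -> bool:
    """Accept a decoded layer only if it is non-empty, printable text."""
    return bool(s) and all(c.isprintable() or c in "\t\r\n" for c in s)


def _decode_once(text: str) -> Optional[str]:
    """Strip one hex or base64 layer; return None if neither applies cleanly."""
    candidate = text.strip()
    if not candidate:
        return None
    decoders = []
    if HEX_RE.fullmatch(candidate):  # try hex first: hex digits are also valid base64
        decoders.append(bytes.fromhex)
    decoders.append(lambda s: base64.b64decode(s, validate=True))
    for decode in decoders:
        try:
            decoded = decode(candidate).decode("utf-8")
        except (binascii.Error, ValueError, UnicodeDecodeError):
            continue
        if _looks_like_text(decoded):
            return decoded
    return None


def peel_layers(text: str, max_depth: int = 8) -> list[str]:
    """Return the original text followed by each successively decoded layer."""
    layers = [text]
    for _ in range(max_depth):
        decoded = _decode_once(layers[-1])
        if decoded is None or decoded == layers[-1]:
            break
        layers.append(decoded)
    return layers
```

For example, a payload that was hex-encoded and then base64-encoded comes back as three layers: the base64 wrapper, the hex string, and the plaintext. The `max_depth` bound keeps a hostile input from forcing unbounded work.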
Survivors route through a model cascade:
| Cascade stage | Model | Cost / file | Decides |
|---|---|---|---|
| Triage | Gemini Flash-Lite | ~$0.001 | CLEAN / LOW / HIGH routing |
| Cheap analysis (LOW tier) | Gemini Flash | ~$0.02 | findings on low-priority files |
| Default deep analysis (HIGH tier) | Anthropic Claude Sonnet 4.6 | ~$0.07 | findings on high-priority files |
| High-stakes / borderline escalation | Anthropic Claude Opus 4.6 | ~$0.15 | ~20% of HIGH files |
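The routing in the table can be sketched as a small function from a triage label to a model chain plus an estimated per-file cost. This is a hypothetical sketch: the model labels are illustrative strings rather than provider API IDs, the `borderline` / `high_stakes` flags stand in for whatever signal actually triggers escalation, and running Opus after Sonnet (rather than instead of it) is an assumption.

```python
from dataclasses import dataclass

# Approximate per-file cost of each stage, taken from the table above (USD).
# The model labels are illustrative, not provider API model IDs.
STAGE_COST = {
    "gemini-flash-lite": 0.001,
    "gemini-flash": 0.02,
    "claude-sonnet-4.6": 0.07,
    "claude-opus-4.6": 0.15,
}


@dataclass
class Route:
    models: list[str]
    est_cost: float


def route(triage: str, borderline: bool = False, high_stakes: bool = False) -> Route:
    """Choose the model chain for one file from its triage label (CLEAN / LOW / HIGH)."""
    models = ["gemini-flash-lite"]  # triage always runs first
    if triage == "LOW":
        models.append("gemini-flash")
    elif triage == "HIGH":
        models.append("claude-sonnet-4.6")
        if borderline or high_stakes:
            # Assumption: escalation runs after Sonnet rather than replacing it.
            models.append("claude-opus-4.6")
    elif triage != "CLEAN":
        raise ValueError(f"unknown triage label: {triage!r}")
    return Route(models, round(sum(STAGE_COST[m] for m in models), 4))
```

Under these assumptions a CLEAN file costs only the triage call, while an escalated HIGH file costs roughly $0.22.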
The harness emits structured findings: CWE, line, severity, code, explanation, suggested fix, proof-of-concept, behavioral profile, attack chains, composite risk score. Aggregate cost is ~$4.65 per 100-file scan on a realistic workload mix; hard per-file + per-scan cost caps abort runs that exceed your declared budget.
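Hard per-file and per-scan caps amount to a meter that is charged after each model call. A minimal sketch, assuming costs are known per call: the `cost_cap_reached` status for remaining files follows the `scan-repo --max-cost` description, while `file_cap_reached`, `CostMeter`, and `run_scan` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


class FileCapExceeded(RuntimeError):
    """This file's spend crossed the per-file cap."""


class ScanCapExceeded(RuntimeError):
    """Cumulative spend crossed the per-scan cap."""


@dataclass
class CostMeter:
    per_file_cap: float = 1.00  # USD; 0 disables (mirrors the --max-cost default on `scan`)
    per_scan_cap: float = 0.0   # USD; 0 disables
    scan_spent: float = 0.0
    file_spent: dict[str, float] = field(default_factory=dict)

    def charge(self, path: str, usd: float) -> None:
        """Record one model call's cost, then raise if either cap is now exceeded."""
        self.file_spent[path] = self.file_spent.get(path, 0.0) + usd
        self.scan_spent += usd
        if self.per_file_cap and self.file_spent[path] > self.per_file_cap:
            raise FileCapExceeded(path)
        if self.per_scan_cap and self.scan_spent > self.per_scan_cap:
            raise ScanCapExceeded(path)


def run_scan(files, call_costs, meter):
    """call_costs(path) yields the USD cost of each model call made for that file."""
    results = {}
    for i, path in enumerate(files):
        try:
            for usd in call_costs(path):
                meter.charge(path, usd)
        except FileCapExceeded:
            results[path] = "file_cap_reached"  # hypothetical label; the scan continues
            continue
        except ScanCapExceeded:
            for remaining in files[i:]:  # this file and everything after it
                results[remaining] = "cost_cap_reached"
            break
        results[path] = "scanned"
    return results
```

Note that the call which crosses a cap has already been paid for; the cap bounds the overshoot to a single call rather than preventing it.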
Pillar 2 — DAST runtime detonation
When the harness's verdict reaches a DAST-triggering tier (by default malicious or critical_malicious; see --dast-trigger-verdicts), the file moves to a Firecracker microVM (minimal-v1, networked-v1, or ml_tools-v1 image profile) for two phases:
- Phase A: exploit testing. Plan an exploit per harness finding, run it in the sandbox, capture syscalls / egress / filesystem writes, and classify each finding as CONFIRMED / BLOCKED / UNREACHED / NOT_TESTED based on what actually happened.
- Phase B: exploit discovery. Given the accumulated evidence, propose NEW hypotheses the harness missed. A deterministic validator gates the proposals; survivors carry forward into the next iteration's Phase A. This repeats for up to 3 iterations or until convergence.
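The Phase A / Phase B alternation is essentially a bounded fixed-point loop. A minimal sketch under stated assumptions: `test_in_sandbox`, `propose`, and `validate` are hypothetical callables standing in for the sandbox run, the model's hypothesis generation, and the deterministic validator, and "convergence" is taken to mean that no new validated hypothesis was proposed.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

MAX_ITERATIONS = 3  # "up to 3 iterations or until convergence"


@dataclass(frozen=True)
class Hypothesis:
    cwe: str
    description: str


def dast_loop(
    initial: Iterable[Hypothesis],
    test_in_sandbox: Callable[[Hypothesis], str],                 # Phase A: returns a status
    propose: Callable[[dict[Hypothesis, str]], list[Hypothesis]],  # Phase B: new ideas
    validate: Callable[[Hypothesis], bool],                       # deterministic gate
) -> dict[Hypothesis, str]:
    results: dict[Hypothesis, str] = {}
    pending = list(initial)
    for _ in range(MAX_ITERATIONS):
        # Phase A: exploit-test every hypothesis not already tested.
        for hyp in pending:
            if hyp not in results:
                results[hyp] = test_in_sandbox(hyp)
        # Phase B: propose from the evidence so far; keep only validated, untested ones.
        pending = [h for h in propose(dict(results)) if validate(h) and h not in results]
        if not pending:  # converged: nothing new survived the validator
            break
    # Proposals from the final iteration, if any, are left untested.
    return results
```

Making `Hypothesis` frozen lets it serve as a dictionary key, which is what makes "already tested" a cheap membership check.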
This is the layer that kills false positives: a "looks like SQL injection" pattern that the file's own escaping defends against gets BLOCKED, not flagged. It also surfaces what static analysis missed; Phase B has turned up new findings the harness didn't catch.
Pillar 3 — Remediation (fix-and-verify)
When Phase A confirms an exploit on text source (Python, JS / TS, shell), Argus generates a patched version, replays the same exploit attempts against the patched code in the same sandbox, and emits per-finding NEUTRALIZED / STILL_EXPLOITABLE / UNVERIFIABLE with sandbox-grounded evidence. You don't get a remediation suggestion; you get a remediation that's been tested.
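The verification half of fix-and-verify reduces to replaying each confirmed exploit against the patched source and mapping the outcome to a status. A minimal sketch, assuming a hypothetical `replay(patched_source, exploit)` callable that returns True if the exploit fired, False if it did not, or None if the sandbox could not run it; patch generation itself is out of scope here.

```python
from enum import Enum
from typing import Callable, Optional


class FixStatus(str, Enum):
    NEUTRALIZED = "NEUTRALIZED"
    STILL_EXPLOITABLE = "STILL_EXPLOITABLE"
    UNVERIFIABLE = "UNVERIFIABLE"


def verify_patch(
    exploits: dict[str, str],                      # finding id -> exploit that fired originally
    replay: Callable[[str, str], Optional[bool]],  # (patched source, exploit) -> fired?
    patched_source: str,
) -> dict[str, FixStatus]:
    """Replay every originally confirmed exploit against the patched source."""
    verdicts = {}
    for finding_id, exploit in exploits.items():
        fired = replay(patched_source, exploit)
        if fired is None:  # the sandbox could not run this replay
            verdicts[finding_id] = FixStatus.UNVERIFIABLE
        elif fired:
            verdicts[finding_id] = FixStatus.STILL_EXPLOITABLE
        else:
            verdicts[finding_id] = FixStatus.NEUTRALIZED
    return verdicts
```

Keeping "could not run" separate from "did not fire" is the design point: a replay that never executed must not be reported as a successful fix.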
Binary artifact policy. For ML artifacts (.pkl / .pt / .bin / .safetensors / .h5 / .onnx), Argus does NOT auto-patch the binary — the model can't emit valid bytecode-level patches and a corrupt patched pickle would mislead the replay. Instead, the remediation pillar emits structured guidance: regenerate the model from a clean training pipeline and serialize using safetensors (which is structurally incapable of carrying executable __reduce__ payloads). Status is UNVERIFIABLE with the guidance in fix_summary.
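To see why pickle-family checkpoints are treated as executable, the standard library's `pickletools` can list a pickle stream's opcodes without loading it. This is a minimal sketch of that kind of static pass, not Argus's detector. The opcode set is a common heuristic: legitimate pickles of custom classes or NumPy arrays also use these opcodes, so a hit means "can run code on load", not "malicious".

```python
import pickle
import pickletools

# Opcodes that import or call arbitrary objects when a pickle is loaded.
CODE_EXEC_OPCODES = {"GLOBAL", "STACK_GLOBAL", "INST", "OBJ", "REDUCE"}


def risky_opcodes(data: bytes) -> list[tuple[int, str]]:
    """List (offset, opcode) pairs that can execute code, without unpickling."""
    return [
        (pos, op.name)
        for op, _arg, pos in pickletools.genops(data)
        if op.name in CODE_EXEC_OPCODES
    ]


class Payload:
    # __reduce__ tells pickle to call print("pwned") at load time;
    # a real attack would return something like os.system instead.
    def __reduce__(self):
        return (print, ("pwned",))


if __name__ == "__main__":
    print(risky_opcodes(pickle.dumps({"weights": [0.1, 0.2]})))  # -> []
    print(risky_opcodes(pickle.dumps(Payload())))  # a global import followed by REDUCE
```

Plain containers of numbers and strings produce none of these opcodes, which is the property safetensors guarantees structurally rather than heuristically.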
Opt-out: pass --no-remediation to skip this pillar entirely while keeping the harness + DAST active. Use for compliance scans, CI gates that don't allow source-modification suggestions, read-only audits, or to save ~$0.05/file in patch-generation tokens. The result still includes a structured phase_c block with skipped_reason: "phase_c_disabled_by_config" so downstream consumers can distinguish "remediation off" from "ran and found nothing to fix."
Coverage matrix
What each pillar does, per file type. ✅ = supported, ⚠️ = supported with policy nuance, ⏳ = roadmap, ❌ N/A = not applicable to this format.
| File type | Harness analysis | DAST exploit testing | DAST exploit discovery | Remediation |
|---|---|---|---|---|
| Python (.py, .pyw, .pyi, .pth) | ✅ | ✅ | ✅ | ✅ patch + replay |
| JavaScript / TypeScript (.js, .mjs, .cjs, .jsx, .ts, .tsx) | ✅ | ✅ | ✅ | ✅ patch + replay |
| Shell (.sh, .bash, .zsh) | ✅ | ✅ | ✅ | ✅ patch + replay |
| Jupyter notebooks (.ipynb) | ✅ cell-by-cell decomposition | ⏳ roadmap | ⏳ roadmap | ⏳ roadmap |
| ML model artifacts (.pkl, .pickle, .pt, .bin, .safetensors, .h5, .hdf5, .keras, .onnx) | ✅ pickletools disassembly | ✅ load-detonation in sandbox | ❌ | ⚠️ guidance only (no auto-patch; see binary artifact policy) |
| GitHub Actions workflows (.github/workflows/*.yml) | ✅ deterministic CI-pattern sweep | ⏳ roadmap | ⏳ roadmap | ⏳ roadmap |
| Supply-chain manifests (package.json, requirements.txt, Cargo.lock, go.mod, Gemfile, Pipfile, setup.py, pyproject.toml, pom.xml, build.gradle, *.csproj, etc.) | ✅ parsed for deps + lifecycle hooks | ❌ N/A (no runtime to detonate) | ❌ N/A | ❌ N/A |
| AI-agent config sentinels (CLAUDE.md, AGENTS.md, SKILL.md, .cursorrules, .clinerules, mcp.json, plugin.json, openapi.{yaml,json}, agent-config.{yaml,json,toml}, etc.) | ✅ prompt-injection surface | ❌ N/A | ❌ N/A | ❌ N/A |
| Other languages tagged for harness (Java, Kotlin, Scala, Go, Rust, Ruby, PHP, C#, C/C++, PowerShell, Lua, Perl, R, Swift, Terraform, HCL) | ✅ generic harness analysis | ⏳ roadmap | ⏳ roadmap | ⏳ roadmap |
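The matrix above can be read as a lookup from file name or extension to the pillars that apply. A hypothetical, condensed sketch: the pillar labels are invented, the name sets cover only a subset of the manifests and agent-config sentinels listed, and letting the extension win when both match (so setup.py is handled as Python) is an assumption, not documented behavior.

```python
from pathlib import PurePosixPath

FULL = ("harness", "dast_exploit_testing", "dast_discovery", "remediation")
ML_ARTIFACT = ("harness", "dast_exploit_testing", "remediation_guidance")
HARNESS_ONLY = ("harness",)

BY_EXTENSION = {
    **dict.fromkeys((".py", ".pyw", ".pyi", ".pth", ".js", ".mjs", ".cjs", ".jsx",
                     ".ts", ".tsx", ".sh", ".bash", ".zsh"), FULL),
    **dict.fromkeys((".pkl", ".pickle", ".pt", ".bin", ".safetensors", ".h5",
                     ".hdf5", ".keras", ".onnx"), ML_ARTIFACT),
    **dict.fromkeys((".ipynb", ".java", ".kt", ".scala", ".go", ".rs", ".rb", ".php",
                     ".cs", ".c", ".cpp", ".ps1", ".lua", ".pl", ".r", ".swift",
                     ".tf", ".hcl"), HARNESS_ONLY),
}

# Subset of the supply-chain manifests and AI-agent config sentinels.
BY_NAME = {
    "package.json", "requirements.txt", "Cargo.lock", "go.mod", "Gemfile", "Pipfile",
    "pyproject.toml", "pom.xml", "build.gradle", "CLAUDE.md", "AGENTS.md", "SKILL.md",
    ".cursorrules", ".clinerules", "mcp.json", "plugin.json",
}


def pillars_for(path: str) -> tuple[str, ...]:
    """Return the pillars that apply to `path`, or () if it is not recognized."""
    p = PurePosixPath(path)
    suffix = p.suffix.lower()
    if suffix in BY_EXTENSION:  # extension takes precedence (assumption)
        return BY_EXTENSION[suffix]
    if p.name in BY_NAME or p.match(".github/workflows/*.yml"):
        return HARNESS_ONLY
    return ()
```

Dotfiles such as `.cursorrules` have no suffix in `pathlib`, so they fall through to the name lookup as intended.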
Per-finding verdicts (where the FP kill happens)
Every finding ships with one of these statuses:
| Status | Meaning |
|---|---|
| CONFIRMED | Sandbox observed the exploit firing. PoC + event trace surfaced with the finding. |
| BLOCKED | Attack was tested; the file's own code defended against it (sanitization, escaping, allowlist). |
| UNREACHED | Attack was tested; the code path is genuinely unreachable. |
| NOT_TESTED | Sandbox couldn't execute the test. Sub-reason: infra_stub / inconclusive / not_planned. |
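One way to read the table is as a decision over a few observations from the sandbox run. A minimal sketch: the `SandboxObservation` fields are hypothetical, and the mapping of conditions to the `infra_stub` / `inconclusive` / `not_planned` sub-reasons is an assumed interpretation of those labels.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(str, Enum):
    CONFIRMED = "CONFIRMED"
    BLOCKED = "BLOCKED"
    UNREACHED = "UNREACHED"
    NOT_TESTED = "NOT_TESTED"


@dataclass
class SandboxObservation:
    planned: bool          # an exploit plan was produced for this finding
    infra_available: bool  # the microVM actually ran
    conclusive: bool       # the run produced a decisive trace
    path_reached: bool     # the vulnerable code path executed
    exploit_fired: bool    # the attack's effect was observed (egress, write, syscall)


def classify(obs: SandboxObservation) -> tuple[Status, Optional[str]]:
    """Return (status, NOT_TESTED sub-reason or None)."""
    if not obs.planned:
        return Status.NOT_TESTED, "not_planned"
    if not obs.infra_available:
        return Status.NOT_TESTED, "infra_stub"
    if not obs.conclusive:
        return Status.NOT_TESTED, "inconclusive"
    if obs.exploit_fired:
        return Status.CONFIRMED, None
    if obs.path_reached:
        return Status.BLOCKED, None  # code ran, and its defenses held
    return Status.UNREACHED, None
```

The order of the checks matters: a run that never happened must not be reported as BLOCKED or UNREACHED, since both of those claim the attack was actually tested.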
A CONFIRMED finding looks like this:
```json
{
  "cwe": "CWE-200",
  "type": "data_exfiltration",
  "severity": "critical",
  "status": "CONFIRMED",
  "confidence": 1.0,
  "runtime_evidence": "Mock HTTP server at 127.0.0.1:8000 captured POST body containing 'FAKE_PRIVATE_KEY_CONTENT' and 'ssh-rsa AAAAFAKEKEY user@host'. The malware decoded its base64 payload and POSTed the contents of ~/.ssh/ to the rewritten C2 endpoint.",
  "proof_of_concept": "On any Unix host with SSH keys present, execution sends the full contents of ~/.ssh/ to the remote C2 server over HTTPS."
}
```
DAST cuts three ways: it confirms exploits with sandbox-captured evidence, refutes false positives with proof of non-exploitability, and verifies remediations by replaying the same exploits against the patched source.
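Because every finding carries one of these statuses, a CI job can gate on them directly. A minimal sketch, assuming the `--output json` report is either a list of finding objects shaped like the example above or an object with a `"findings"` array; Argus's actual top-level schema may differ, and `failing_findings` / `gate_exit_code` are hypothetical helper names.

```python
import json
from typing import Iterable, Union


def failing_findings(report: Union[list, dict], fail_on: Iterable[str] = ("CONFIRMED",)) -> list[dict]:
    """Return the findings whose status should fail the build."""
    findings = report.get("findings", []) if isinstance(report, dict) else report
    wanted = set(fail_on)
    return [f for f in findings if f.get("status") in wanted]


def gate_exit_code(report_path: str, fail_on: Iterable[str] = ("CONFIRMED",)) -> int:
    """Print failing findings and return a process exit code (1 = fail)."""
    with open(report_path, encoding="utf-8") as fh:
        bad = failing_findings(json.load(fh), fail_on)
    for f in bad:
        print(f"{f.get('status')} {f.get('cwe')} ({f.get('severity')}): {f.get('type')}")
    return 1 if bad else 0
```

A CI step could then call `sys.exit(gate_exit_code("findings.json"))`. Gating only on CONFIRMED by default keeps BLOCKED and UNREACHED findings visible in the report without failing the build.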
Enterprise Invariants
Anthropic's Claude Security and OpenAI's Codex Security are enterprise-tier and vendor-cloud-only. Argus is the open alternative.
- BYOK. You control LLM access; bills go to your API meter, not ours.
- Zero telemetry. In cascade-only mode, nothing leaves your machine. In DAST mode, file content is sent only to a Fly.io app you own and control — never to Argus-operated infrastructure.
- Local execution. Fully self-contained pipeline; no SaaS dependency.
CLI Reference
`argus scan <file>`: single-file scan
| Flag | Purpose |
|---|---|
| `--output {json,markdown}` | Output format (default: json) |
| `--no-dast` | Skip DAST verification (cascade-only) |
| `--no-remediation` | Skip Phase C (fix-and-verify). Phase A + B still run; no patch is generated. For compliance, CI-gate, and read-only-audit use cases. Saves ~$0.05/file. |
| `--max-cost USD` | Abort this file's scan if per-file API spend exceeds USD (default: $1.00; pass 0 to disable) |
| `--enable-discovery` | Proactive payload sweep: runs a library of attack payloads against the file in the sandbox and surfaces runtime-confirmed CWEs as new findings (+~$0.25/file) |
| `--dast-trigger-verdicts LIST` | Comma-separated L1 verdicts that trigger DAST. Default: malicious,critical_malicious. Allowed: clean,suspicious,malicious,critical_malicious |
`argus scan-repo <path>`: directory tree scan
| Flag | Purpose |
|---|---|
| `--diff REF` | Only scan files differing vs git ref (e.g., --diff origin/main for PR/CI) |
| `--output {markdown,json,sarif}` | Output format (default: markdown); sarif is SARIF v2.1.0 for GitHub Code Scanning |
| `--output-file PATH` | Write to file instead of stdout |
| `--max-cost USD` | Abort the run when cumulative API spend across all files exceeds USD; remaining files are marked cost_cap_reached. Pass 0 or omit to disable |
| `--exclude GLOB` | Additional gitignore-style exclude pattern (repeatable) |
| `--no-gitignore` | Ignore .gitignore during walk (default: respected) |
| `--max-file-bytes BYTES` | Skip files larger than BYTES (default: 1 MiB) |
| `--no-dast` | Skip DAST verification on every file |
| `--no-remediation` | Skip Phase C on every file. Phase A + B still run; no patches generated. |
| `--enable-discovery` | Proactive payload sweep on every DAST-eligible file |
| `--dast-trigger-verdicts LIST` | Same as scan |
| `--continue-on-error` / `--no-continue-on-error` | On per-file exception, record and continue (default) or abort run |
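A rough picture of how `--diff`, `--exclude`, and `--max-file-bytes` narrow a run. This is an assumption-laden sketch, not Argus's walker: it compares the working tree against `ref` with plain `git diff --name-only --diff-filter=d` (excluding deleted paths), and uses `fnmatch`, which only approximates gitignore semantics. All function names are hypothetical.

```python
import fnmatch
import subprocess
from pathlib import Path


def changed_files(ref: str, repo: Path = Path(".")) -> list[Path]:
    """Paths that differ between `ref` and the working tree, excluding deletions.

    Assumes `repo` is the repository root, since git prints root-relative paths.
    """
    out = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=d", ref],
        cwd=repo, check=True, capture_output=True, text=True,
    ).stdout
    return [repo / line for line in out.splitlines() if line]


def should_scan(rel_path: str, size: int, excludes=(), max_file_bytes: int = 1 << 20) -> bool:
    # fnmatch only approximates gitignore rules: its "*" also matches "/".
    if any(fnmatch.fnmatch(rel_path, pattern) for pattern in excludes):
        return False
    return size <= max_file_bytes  # default 1 MiB, as with --max-file-bytes


def files_to_scan(ref: str, repo: Path = Path("."), excludes=(), max_file_bytes: int = 1 << 20) -> list[Path]:
    return [
        path for path in changed_files(ref, repo)
        if path.is_file()
        and should_scan(path.relative_to(repo).as_posix(), path.stat().st_size, excludes, max_file_bytes)
    ]
```

Filtering before any model call is what keeps a PR-sized `--diff` run cheap: excluded and oversized files never reach triage.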
`argus install <pkg>`: pre-install supply-chain gate
Stages the package via pip download (no setup.py execution), runs the full Argus pipeline on every wheel/sdist in the dependency closure, then either calls real pip install or blocks with the analysis printed. Catches day-zero supply-chain malware at the ingestion boundary — exactly the class advisory-based scanners (pip-audit, safety) miss.
| Flag | Purpose |
|---|---|
| `<pkg>` | Package spec (e.g. 'requests', 'litellm==1.50.0', 'fastapi[all]'). Mutually exclusive with -r. |
| `-r PATH` / `--requirement PATH` | Install from a requirements.txt; Argus scans every wheel in the resolved closure. |
| `--block-on LIST` | Comma-separated verdict tiers that block install. Default: malicious,critical_malicious. Use suspicious,malicious,critical_malicious for stricter gating. |
| `--no-dast` | Cascade-only: skip DAST runtime detonation even if Fly is configured. Faster + cheaper, but leaves runtime-only exploits (load-time RCE in pickles, etc.) un-validated. |
| `--no-cache` | Ignore the wheel-hash verdict cache. Re-scans every artifact from scratch. |
| `--cache-dir PATH` | Override cache directory (default: ~/.cache/argus/install). |
| `--dry-run` | Run the scan + report verdict; do NOT call pip install. For CI gating without side effects. |
| `--strict-coverage` | Escalate verdict to suspicious when Argus could only statically analyze <70% of files in a wheel (rest are typically native binaries: .so, .pyd, .dll, .dylib, .exe). For security-paranoid users / strict CI gates. |
| `--max-cost USD` | Per-file cost cap (default: $1.00). |
| `--parallel N` | Max number of artifacts scanned concurrently (default: 4). |
| `--pip EXEC` | Pip executable. Default: pip. Pass 'uv pip' for uv-managed envs. |
| `--output {text,json}` | Output format. Default: text. JSON for CI consumption. |
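The block-or-install decision combines `--block-on`, `--dry-run`, and `--pip`. A minimal sketch, assuming per-artifact verdicts are already available as a dict: `install_decision` and `parse_tiers` are hypothetical names, the `-r` requirements path is not modeled, and `shlex.split` is used so that `--pip 'uv pip'` becomes a two-element command.

```python
import shlex
from typing import Optional

TIERS = ("clean", "suspicious", "malicious", "critical_malicious")


def parse_tiers(spec: str) -> frozenset:
    """Parse a --block-on style list such as "malicious,critical_malicious"."""
    tiers = frozenset(t.strip() for t in spec.split(",") if t.strip())
    unknown = tiers - set(TIERS)
    if unknown:
        raise ValueError(f"unknown verdict tier(s): {sorted(unknown)}")
    return tiers


def install_decision(
    verdicts: dict[str, str],  # artifact filename -> verdict tier
    pkg_spec: str,
    block_on: str = "malicious,critical_malicious",
    pip: str = "pip",
    dry_run: bool = False,
) -> tuple[list[str], Optional[list[str]]]:
    """Return (blocking artifacts, pip argv to run or None)."""
    blocking = parse_tiers(block_on)
    blockers = sorted(name for name, tier in verdicts.items() if tier in blocking)
    if blockers or dry_run:
        return blockers, None  # blocked, or report-only: never call pip
    return blockers, [*shlex.split(pip), "install", pkg_spec]  # 'uv pip' -> ['uv', 'pip']
```

A single blocking artifact anywhere in the dependency closure is enough to withhold the install, which matches the gate's purpose of stopping malware at the ingestion boundary.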
Phase C is always disabled on the install path. Remediation for a not-yet-installed package is "don't install", not "patch + replay." If any artifact's verdict falls in the --block-on set (by default malicious or critical_malicious), the install is blocked; the user sees the analysis (CWE, runtime evidence, exfil destination) and decides.
Wheel-hash caching. Verdicts are cached at ~/.cache/argus/install/<sha256>.json. Wheel bytes are immutable on PyPI (re-uploads of the same name+version are rejected), so a verdict is permanently valid for that exact artifact. First-run cost is real; subsequent installs of the same wheel are free.
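A content-addressed cache like this can be sketched in a few lines. The default directory matches the one documented above; the JSON payload and the helper names are hypothetical, and the write goes through a temporary file plus `os.replace` so a reader never sees a half-written entry.

```python
import hashlib
import json
import os
import tempfile
from pathlib import Path
from typing import Optional

DEFAULT_CACHE = Path.home() / ".cache" / "argus" / "install"  # documented default location


def artifact_sha256(artifact: Path) -> str:
    """Hash the artifact in 64 KiB chunks so large wheels are not read into memory at once."""
    digest = hashlib.sha256()
    with artifact.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cached_verdict(artifact: Path, cache_dir: Path = DEFAULT_CACHE) -> Optional[dict]:
    entry = cache_dir / f"{artifact_sha256(artifact)}.json"
    if entry.is_file():
        return json.loads(entry.read_text(encoding="utf-8"))
    return None


def store_verdict(artifact: Path, verdict: dict, cache_dir: Path = DEFAULT_CACHE) -> Path:
    cache_dir.mkdir(parents=True, exist_ok=True)
    entry = cache_dir / f"{artifact_sha256(artifact)}.json"
    # Write to a unique temp file, then rename over the entry so readers never
    # observe a half-written JSON document.
    fd, tmp_name = tempfile.mkstemp(dir=cache_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(verdict, fh)
        os.replace(tmp_name, entry)
    except BaseException:
        os.unlink(tmp_name)
        raise
    return entry
```

Keying on the digest of the downloaded bytes, rather than on name and version, means a tampered mirror copy of the "same" wheel misses the cache and gets scanned.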
Coverage transparency. A "clean" verdict on a wheel that's 50% native binaries (.so, .pyd) is weaker evidence than a clean verdict on a wheel that's 100% Python, and the report says so. Every artifact verdict reports n_files_unscanned plus an extension histogram. Native binaries are not silently dropped from the verdict; coverage warnings surface instead. Opting in with --strict-coverage escalates the verdict on low-coverage artifacts.
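Since a wheel is a zip archive, the coverage figures can be derived from its member list. A minimal sketch, assuming native-binary extensions are the only unscannable files (versioned names such as `libfoo.so.1` would need extra handling); the 70% threshold comes from the `--strict-coverage` description, and the function names are hypothetical.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath

NATIVE_EXTS = {".so", ".pyd", ".dll", ".dylib", ".exe"}
TIER_ORDER = ["clean", "suspicious", "malicious", "critical_malicious"]
STRICT_THRESHOLD = 0.70  # "<70% of files" from the --strict-coverage description


def wheel_members(wheel_path) -> list[str]:
    """List the file entries in a wheel (wheels are zip archives)."""
    with zipfile.ZipFile(wheel_path) as zf:
        return [n for n in zf.namelist() if not n.endswith("/")]


def coverage_report(names: list[str]) -> dict:
    suffixes = [PurePosixPath(n).suffix.lower() for n in names]
    unscanned = sum(1 for s in suffixes if s in NATIVE_EXTS)
    return {
        "n_files": len(names),
        "n_files_unscanned": unscanned,
        "coverage": 1.0 if not names else (len(names) - unscanned) / len(names),
        "extensions": dict(Counter(s or "<none>" for s in suffixes)),
    }


def apply_strict_coverage(verdict: str, coverage: float, strict: bool) -> str:
    """With --strict-coverage, raise anything below 'suspicious' to 'suspicious'."""
    below = TIER_ORDER.index(verdict) < TIER_ORDER.index("suspicious")
    if strict and coverage < STRICT_THRESHOLD and below:
        return "suspicious"
    return verdict
```

Escalation only ever raises a verdict: a malicious finding on a low-coverage wheel stays malicious rather than being softened to suspicious.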
Security & Isolation
Argus deliberately detonates potentially malicious code. Host protection is non-negotiable.
- Hardware-level isolation. Execution happens inside Firecracker microVMs using KVM hardware virtualization.
- Ephemeral state. Every detonation spins up a pristine microVM and is destroyed post-execution. Zero persistence.
- Strict egress control. Network profiles enforced at the hypervisor level prevent lateral movement during DAST verification.
Documentation
| Topic | Page |
|---|---|
| Install guide | docs/install.md |
| API key sourcing | docs/api-keys.md |
| Architecture deep dive | docs/architecture.md |
| DAST sandbox setup | docs/dast-setup.md |
| Cost guide | docs/cost-guide.md |
| Roadmap | ROADMAP.md |
| Contributing | CONTRIBUTING.md |
| Security disclosures | SECURITY.md |
License
Apache 2.0.
Download files
File details
Details for the file argus_ai_scanner-1.3.0.tar.gz.
File metadata
- Download URL: argus_ai_scanner-1.3.0.tar.gz
- Upload date:
- Size: 554.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 40f64c01968298de711808ea8306de3ac01a13f1500961e6503317addbf72349 |
| MD5 | e605936771ed425ba77019526d961ed3 |
| BLAKE2b-256 | 45a8268242dfd8bc6dfff000a6bcc03df4b6e997cc8d0e8ead9c3ee9f6e57ce9 |

Provenance
The following attestation bundles were made for argus_ai_scanner-1.3.0.tar.gz:
Publisher: release.yml on dshochat/Argus_Scanner
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: argus_ai_scanner-1.3.0.tar.gz
- Subject digest: 40f64c01968298de711808ea8306de3ac01a13f1500961e6503317addbf72349
- Sigstore transparency entry: 1496340281
- Sigstore integration time:
- Permalink: dshochat/Argus_Scanner@550d376fd55f096ba73552f77654d14dc59485b3
- Branch / Tag: refs/tags/v1.3.0
- Owner: https://github.com/dshochat
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@550d376fd55f096ba73552f77654d14dc59485b3
- Trigger Event: push
File details
Details for the file argus_ai_scanner-1.3.0-py3-none-any.whl.
File metadata
- Download URL: argus_ai_scanner-1.3.0-py3-none-any.whl
- Upload date:
- Size: 403.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 4e07cc3a6567977b89b8908150fb9722e32296b0371c04a86cff9bd25e8ec4ed |
| MD5 | d864af9a36e5b6ef675fa91d023e7f7d |
| BLAKE2b-256 | 487a04e131e07c3f6ab12cb24d006ab3909dafd5e15845c631436646459f030c |

Provenance
The following attestation bundles were made for argus_ai_scanner-1.3.0-py3-none-any.whl:
Publisher: release.yml on dshochat/Argus_Scanner
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: argus_ai_scanner-1.3.0-py3-none-any.whl
- Subject digest: 4e07cc3a6567977b89b8908150fb9722e32296b0371c04a86cff9bd25e8ec4ed
- Sigstore transparency entry: 1496340388
- Sigstore integration time:
- Permalink: dshochat/Argus_Scanner@550d376fd55f096ba73552f77654d14dc59485b3
- Branch / Tag: refs/tags/v1.3.0
- Owner: https://github.com/dshochat
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@550d376fd55f096ba73552f77654d14dc59485b3
- Trigger Event: push