AI-native code security scanner with cascade analysis and Firecracker-microVM DAST runtime validation

These details have not been verified by PyPI

Project description

Argus

An AI-native code security scanner (Semantic Deep Analysis) that proves exploitability at runtime.

Argus combines a cost-graduated LLM cascade (Gemini Flash-Lite → Sonnet 4.6 → Opus 4.6) with a sandbox tier that executes suspect code in a Firecracker microVM and observes what it actually does. Static-analysis findings get promoted to CONFIRMED only when the sandbox captures concrete runtime evidence — a network call, a file write, a process spawn. Findings that cannot be triggered are marked UNREACHED; findings the file's own defenses block are BLOCKED. No more "the LLM said it might be malicious."

Open source, Apache 2.0, BYOK (bring your own keys). You pay your providers directly: Anthropic and Google for the cascade, Fly.io for the optional DAST sandbox. Argus collects nothing.

What makes it different

Most scanners stop at "this code matches a vulnerability pattern." Argus runs the code, watches what it does, and reports per-finding outcomes:

Status	Meaning
`CONFIRMED`	The sandbox observed the exploit firing at runtime. PoC + event trace are surfaced with the finding.
`BLOCKED`	The attack was tested; the file's own code defended against it (sanitization, escaping, allowlist, etc.).
`UNREACHED`	The attack was tested; the code path is genuinely unreachable.
`NOT_TESTED`	Sandbox couldn't execute the test (with a sub-reason: `infra_stub` / `inconclusive` / `not_planned`).

vs. other approaches

Approach	Output	False-positive burden	Evidence type
Pattern-match scanner (regex / AST)	Syntactic match	High	None
Single frontier LLM (single-call)	Probabilistic opinion (semantic)	Medium-high	LLM reasoning
Argus	Runtime-verified verdict	Low	Sandbox traces

A CONFIRMED finding looks like this in argus scan output:

{
  "cwe": "CWE-200",
  "type": "data_exfiltration",
  "severity": "critical",
  "status": "CONFIRMED",
  "confidence": 1.0,
  "runtime_evidence": "Mock HTTP server at 127.0.0.1:8000 captured POST body containing
    'FAKE_PRIVATE_KEY_CONTENT' and 'ssh-rsa AAAAFAKEKEY user@host'. The malware decoded
    its base64 payload (process_exit step=0) and POSTed the contents of ~/.ssh/ to the
    rewritten C2 endpoint, exactly as L1's hypothesis predicted.",
  "proof_of_concept": "On any Unix host with SSH keys present, execution sends the full
    contents of ~/.ssh/ to the remote C2 server over HTTPS."
}
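
A consumer of this output might want to keep only the runtime-proven findings. The sketch below assumes the JSON output can be read as an array of finding objects shaped like the sample above; `FindingStatus` and `confirmed_findings` are illustrative names rather than part of Argus's API, and the second sample finding is made up for the demonstration.

```python
import json
from enum import Enum


class FindingStatus(str, Enum):
    """The four per-finding outcomes from the status table."""
    CONFIRMED = "CONFIRMED"
    BLOCKED = "BLOCKED"
    UNREACHED = "UNREACHED"
    NOT_TESTED = "NOT_TESTED"


def confirmed_findings(raw_json: str) -> list[dict]:
    """Return only the findings whose status is CONFIRMED.

    Assumes raw_json is a JSON array of finding objects (hypothetical shape).
    """
    findings = json.loads(raw_json)
    return [
        f for f in findings
        if FindingStatus(f["status"]) is FindingStatus.CONFIRMED
    ]


# Two hand-written findings; only the first mirrors the sample above.
sample = json.dumps([
    {"cwe": "CWE-200", "type": "data_exfiltration", "severity": "critical",
     "status": "CONFIRMED", "confidence": 1.0},
    {"cwe": "CWE-78", "type": "command_injection", "severity": "high",
     "status": "BLOCKED", "confidence": 0.6},
])

for finding in confirmed_findings(sample):
    print(finding["cwe"], finding["type"])
```

Using an enum means an unexpected status string raises a `ValueError` instead of being silently dropped.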

This is Argus's moat. Static and single-LLM scanners report suspicion. Argus reports what the code actually did — with concrete evidence, or clear proof it didn't.

Benchmark — Argus vs frontier single-call scanners

Scored against a ground-truth oracle derived from external security research and a multi-vendor LLM consensus (majority agreement):

                       Verdict-exact (higher = better)
Argus (cascade + DAST) ██████████████████░░  91.3%
Gemini 3.1 Pro         █████████████████░░░  82.6%
Grok 4.3               █████████████████░░░  82.6%
Opus 4.6               ████████████████░░░░  78.3%
GPT 5.4                ███████████████░░░░░  73.9%

On verdict-exact score, Argus is 13.0 pp ahead of Opus 4.6 and 17.4 pp ahead of GPT 5.4. On the rich-oracle subset Argus also leads on finding quality: CWE F1 0.297 vs Opus 0.180 (a 65% relative lift) and capability F1 0.771 vs Opus 0.720. Mean verdict-distance, where lower is better, is 0.087 vs Opus 0.217.
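
The gaps quoted above can be checked directly from the chart. This is only a small arithmetic sketch; the helper names `gap_pp` and `relative_lift` are made up here, and the inputs are the figures reported in this section.

```python
# Verdict-exact scores (percent), copied from the chart above.
scores = {
    "Argus": 91.3,
    "Gemini 3.1 Pro": 82.6,
    "Grok 4.3": 82.6,
    "Opus 4.6": 78.3,
    "GPT 5.4": 73.9,
}


def gap_pp(a: str, b: str) -> float:
    """Percentage-point gap between two scanners, rounded to one decimal."""
    return round(scores[a] - scores[b], 1)


def relative_lift(new: float, base: float) -> float:
    """Relative improvement of `new` over `base`, in percent."""
    return round((new / base - 1) * 100, 1)


print(gap_pp("Argus", "Opus 4.6"))   # 13.0
print(gap_pp("Argus", "GPT 5.4"))    # 17.4
print(relative_lift(0.297, 0.180))   # 65.0
```

Percentage points measure the absolute difference between two rates, while the CWE F1 lift is a ratio, which is why the two are reported in different units.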

What single-call scanners cannot produce is runtime evidence. On the same suite, Argus's DAST tier observed 25 CONFIRMED exploits and 1 BLOCKED, backed by concrete sandbox-captured artefacts: network calls, exfil POST bodies, process traces. A single-call LLM has to guess at exploitability; the DAST tier tests it, so a "maybe" finding resolves to CONFIRMED, BLOCKED, or UNREACHED. Separating exploitable findings from mere pattern matches is what cuts the false-positive flood that static-only scanners put on security teams.

Methodology + per-file breakdown: bench_results/v1_1_launch/launch_report.md. Re-run is one command: python -m methodology.run_phase_a_report.

How the cascade keeps it cheap

Most files in a real codebase are clean. Argus is built around that observation: spend $0.0001 to dispatch a clean file in 1 second, $0.07 to deep-analyze a suspicious one, and only invoke the sandbox tier on the small subset of files where runtime confirmation actually matters.

File
  ↓
[$0]  Preprocessing               hash, deobfuscation, deps, attack-vector flags
  ↓
[Gemini Flash-Lite]  Triage       CLEAN | LOW | HIGH         ~$0.0001/file
  ↓
  ├─ CLEAN → return
  ├─ LOW   → Gemini Flash         combined analysis           ~$0.02/file
  └─ HIGH  → Sonnet 4.6           combined analysis           ~$0.07/file (default)
                ↓ borderline / high-stakes
              Opus 4.6            deep analysis                ~$0.15/file (~20% of HIGH)
  ↓
[N=3 Sonnet ensemble]             borderline-uncertainty path
  ↓
[DAST sandbox]                    Sonnet orchestrator + Firecracker microVM
                                   (minimal / networked / ml_tools images)
                                   ↓ inconclusive after 2 iterations
                                  Opus iter-3 escalation
  ↓
[Engine guard]                    DAST never lowers L1's verdict without
                                   sandbox-grounded refutation
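
The model-tier part of the diagram can be read as a small routing function. The sketch below is a simplification under stated assumptions: `route` and `estimated_cost` are hypothetical names, the `borderline` flag stands in for the "borderline / high-stakes" condition (how Argus decides it is not described here), Opus cost is assumed to be added on top of Sonnet, and the ensemble and DAST tiers are left out.

```python
from enum import Enum


class Triage(Enum):
    CLEAN = "CLEAN"
    LOW = "LOW"
    HIGH = "HIGH"


# Approximate per-file costs in USD, taken from the diagram above.
TIER_COST = {
    "triage": 0.0001,
    "flash": 0.02,
    "sonnet": 0.07,
    "opus": 0.15,
}


def route(triage: Triage, borderline: bool = False) -> list[str]:
    """Return the ordered model tiers a file passes through."""
    tiers = ["triage"]          # every file is triaged
    if triage is Triage.LOW:
        tiers.append("flash")
    elif triage is Triage.HIGH:
        tiers.append("sonnet")
        if borderline:          # assumed escalation condition
            tiers.append("opus")
    return tiers                # CLEAN stops after triage


def estimated_cost(tiers: list[str]) -> float:
    """Sum the approximate per-file cost of the tiers visited."""
    return round(sum(TIER_COST[t] for t in tiers), 4)


for verdict, flag in [(Triage.CLEAN, False), (Triage.LOW, False), (Triage.HIGH, True)]:
    path = route(verdict, borderline=flag)
    print(verdict.value, path, estimated_cost(path))
```

The point of the shape is that the common case (CLEAN) exits after the cheapest call, and each more expensive tier is reached only through an explicit condition.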

Cost projection per 100 files

Stage	Calls	API spend
Triage (Flash-Lite)	100	$0.10
LOW analysis (Flash, ~50 files)	50	$1.00
HIGH analysis (Sonnet, ~15 files)	15	$1.05
HIGH + Opus escalation (~5 files)	5	$1.00
Borderline ensemble (Opus, ~3 files)	3	$0.60
DAST verification (~3 files)	3	$0.90
Total		~$4.65
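
As a quick check, the total can be recomputed from the rows. The snippet below only sums the API-spend column of the table above; `stage_spend` is an illustrative name.

```python
from decimal import Decimal

# API spend per stage in USD, copied row by row from the table above.
stage_spend = {
    "Triage (Flash-Lite)": Decimal("0.10"),
    "LOW analysis (Flash)": Decimal("1.00"),
    "HIGH analysis (Sonnet)": Decimal("1.05"),
    "HIGH + Opus escalation": Decimal("1.00"),
    "Borderline ensemble": Decimal("0.60"),
    "DAST verification": Decimal("0.90"),
}

total = sum(stage_spend.values(), Decimal("0"))
print(f"Total per 100 files: ${total}")        # Total per 100 files: $4.65
print(f"Average per file:    ${total / 100}")  # Average per file:    $0.0465
```

`Decimal` is used instead of `float` so the cents add up exactly.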

Hard cost caps (--max-cost <USD> per file, or ScanConfig.max_cost_per_file_usd) abort scans that exceed your declared budget. You'll never get a surprise bill from Argus — the bill comes from your API providers, on a meter you control.
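
One way to picture a per-file cap is a guard that is consulted before each paid call. The sketch below is not Argus's implementation: `CostGuard`, `BudgetExceeded`, and the choice to check before (rather than after) a call are assumptions made for illustration, and the stage costs are example figures.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a call would push spend past the per-file cap."""


class CostGuard:
    """Tracks spend for one file and refuses calls that would exceed the cap."""

    def __init__(self, max_cost_usd: float) -> None:
        self.max_cost_usd = max_cost_usd
        self.spent_usd = 0.0

    def charge(self, stage: str, cost_usd: float) -> None:
        """Record a stage's cost, or raise if it would break the budget."""
        projected = self.spent_usd + cost_usd
        if projected > self.max_cost_usd:
            raise BudgetExceeded(
                f"{stage} would bring spend to ${projected:.4f}, "
                f"over the ${self.max_cost_usd:.2f} cap"
            )
        self.spent_usd = projected


guard = CostGuard(max_cost_usd=0.50)
guard.charge("triage", 0.0001)
guard.charge("sonnet", 0.07)
try:
    guard.charge("dast", 0.90)  # example figure, deliberately over the cap
except BudgetExceeded as exc:
    print(f"aborted: {exc}")
```

Checking the projected spend before the call means the cap is never exceeded, at the price of needing a cost estimate up front.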

Quick start

pip install argus-ai-scanner
export ANTHROPIC_API_KEY=...
export GEMINI_API_KEY=...
argus scan path/to/your/file.py

Requirements:

Python 3.12+
An Anthropic API key — console.anthropic.com
A Google AI Studio key — aistudio.google.com
Optional: a Fly.io account if you want the DAST sandbox tier (see docs/dast-setup.md)

Single-file scan

# Default: cascade + DAST on confirmed-malicious verdicts
uv run argus scan suspicious_package.py

# Tunable DAST coverage — also DAST suspicious files (~30-50% more API spend)
uv run argus scan suspicious_package.py \
  --dast-trigger-verdicts suspicious,malicious,critical_malicious

# Strictest budget mode — DAST only the highest-severity verdict tier
uv run argus scan suspicious_package.py --dast-trigger-verdicts critical_malicious

# Hard cost cap, any verdict
uv run argus scan suspicious_package.py --max-cost 0.50

# Discovery mode — proactive payload sweep for CWEs L1 missed (+~$0.25/file)
uv run argus scan suspicious_package.py --enable-discovery

# Skip DAST entirely (no Fly setup required; cascade-only verdicts)
uv run argus scan suspicious_package.py --no-dast

Repo scan (whole project)

argus scan-repo PATH walks a directory tree, applies file-type and .gitignore filters, and dispatches every supported file through the cascade. For private repos, clone locally first using your existing git credentials, then point Argus at the local path — Argus reads files from disk, not via the GitHub API.

# Whole project, current directory
cd ~/work/my-project
uv run argus scan-repo .

# PR / CI mode — only files changed vs main
uv run argus scan-repo . --diff origin/main

# CI with budget + SARIF output for GitHub Code Scanning
uv run argus scan-repo . \
  --diff origin/main \
  --max-cost 5.00 \
  --output sarif \
  --output-file findings.sarif

# Add a custom exclude pattern on top of .gitignore
uv run argus scan-repo . --exclude "vendor/**" --exclude "**/*.generated.*"

What gets scanned: the file-type allowlist covers Python, JavaScript / TypeScript, shell, Java bytecode, Markdown / RST / AsciiDoc (AI-injection surface), HTML / SVG / XML (XSS / XXE), and supply-chain manifests (package.json, requirements.txt, Cargo.lock, go.mod, Gemfile, composer.json, etc.). AI-agent config sentinels (CLAUDE.md, AGENTS.md, .cursorrules, mcp.json, claude_desktop_config.json, devcontainer.json, …) are explicitly recognized — these are the prime vectors for malicious-instructions-in-config attacks against coding agents. Always-ignored: .git, node_modules, __pycache__, .venv, build dirs, etc.
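
The filtering described above can be sketched as a path classifier. This is a rough illustration, not the scanner's actual logic: the ignored directories and sentinel names are the ones listed in the paragraph (both lists there are open-ended), while `ASSUMED_EXTENSIONS` and the `classify` function are assumptions introduced here.

```python
from pathlib import PurePosixPath

# Directory names the text lists as always ignored (its list ends with "etc.").
ALWAYS_IGNORED_DIRS = {".git", "node_modules", "__pycache__", ".venv"}

# AI-agent config files named in the text.
AGENT_CONFIG_SENTINELS = {
    "CLAUDE.md", "AGENTS.md", ".cursorrules", "mcp.json",
    "claude_desktop_config.json", "devcontainer.json",
}

# Assumed extension allowlist; the real one is not spelled out in the text.
ASSUMED_EXTENSIONS = {".py", ".js", ".ts", ".sh", ".md", ".html", ".svg", ".xml"}


def classify(path: str) -> str:
    """Return 'ignored', 'agent_config', 'scannable', or 'skipped'.

    `path` is a repo-relative path using forward slashes.
    """
    p = PurePosixPath(path)
    # Any ignored directory anywhere above the file excludes it.
    if any(part in ALWAYS_IGNORED_DIRS for part in p.parts[:-1]):
        return "ignored"
    if p.name in AGENT_CONFIG_SENTINELS:
        return "agent_config"
    if p.suffix in ASSUMED_EXTENSIONS:
        return "scannable"
    return "skipped"


for example in ["src/app.py", "node_modules/lodash/index.js",
                ".devcontainer/devcontainer.json", "assets/logo.png"]:
    print(f"{example}: {classify(example)}")
```

Checking sentinel names before extensions keeps a file such as CLAUDE.md in the agent-config category even though `.md` is also on the general allowlist.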

Output formats: --output markdown (default; human summary) / json (full per-file results) / sarif (SARIF v2.1.0 JSON, uploadable to GitHub Code Scanning).
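
A SARIF file can be post-processed with a few lines of Python, for example to fail a CI job when any `error`-level result is present. The sketch below relies only on the standard SARIF 2.1.0 layout (`runs[].results[].level`); which rule IDs and levels Argus actually emits is not specified here, so the demo log is hand-written and `summarize_sarif` is an illustrative helper.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def summarize_sarif(path: Path) -> Counter:
    """Count SARIF results per level across all runs in the log."""
    log = json.loads(path.read_text(encoding="utf-8"))
    counts: Counter = Counter()
    for run in log.get("runs", []):
        for result in run.get("results", []):
            # A result without an explicit level is counted as "warning",
            # the fallback default in SARIF 2.1.0.
            counts[result.get("level", "warning")] += 1
    return counts


# Minimal hand-written SARIF log, used only to exercise the function.
demo_log = {
    "version": "2.1.0",
    "runs": [{
        "tool": {"driver": {"name": "demo"}},
        "results": [
            {"ruleId": "CWE-200", "level": "error", "message": {"text": "example"}},
            {"ruleId": "CWE-79", "message": {"text": "example"}},
        ],
    }],
}

with tempfile.TemporaryDirectory() as tmp:
    sarif_path = Path(tmp) / "findings.sarif"
    sarif_path.write_text(json.dumps(demo_log), encoding="utf-8")
    counts = summarize_sarif(sarif_path)

print(dict(counts))
```

In a CI step, a non-zero `counts["error"]` could be turned into a failing exit code.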

DAST sandbox tier

DAST is optional. Without it, Argus ships verdicts using the L1 cascade alone. With it, you get per-finding CONFIRMED / BLOCKED / UNREACHED evidence backed by real runtime traces.

When enabled, every DAST plan runs in an ephemeral Firecracker microVM (Fly.io managed). The orchestrator:

Reads L1's hypotheses about how the file might be exploitable
Generates a concrete plan — sandbox commands, expected oracle, image hint
Submits to the microVM, which runs the file with file-content materialized at /workspace/<basename>, captures network calls via DNS hijack, and emits a structured event stream
Reads back the events, scores each hypothesis as CONFIRMED / BLOCKED / UNREACHED / NOT_TESTED
Surfaces the captured evidence (runtime_evidence field per finding)

Three sandbox images cover most workloads:

Image	Contents	Use cases
`minimal-v1`	Python 3.13 + Node.js + npm + JRE + bash + curl	Pickle exploits, file I/O, subprocess, basic crypto
`networked-v1`	minimal + curl / wget / nc / dig / openssl	Exfiltration confirmation via real DNS / network captures
`ml_tools-v1`	networked + torch CPU + transformers + safetensors	Malicious model loaders, pickled `__reduce__` payloads

Multi-language coverage today: Python, JavaScript / TypeScript, bash, Java bytecode. Roadmap: Go, Rust, Java source (compile required), .NET.

Full setup: docs/dast-setup.md.

Privacy

In two-tier (no DAST) mode, the only network traffic is the cascade's calls to the LLM providers you configured (Anthropic and Google), made under your own API keys. With DAST enabled, file content is also shipped (gzip + base64) to your own Fly app over the Fly machines API. In both modes, nothing is routed through any Argus-operated infrastructure.
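
The "gzip + base64" packaging mentioned above is a common way to carry arbitrary file bytes inside a text payload. The round trip below uses only the Python standard library; `pack` and `unpack` are illustrative names, and the exact request format Argus sends is not shown here.

```python
import base64
import gzip


def pack(content: bytes) -> str:
    """Compress with gzip, then base64-encode to an ASCII string."""
    return base64.b64encode(gzip.compress(content)).decode("ascii")


def unpack(payload: str) -> bytes:
    """Reverse of pack(): base64-decode, then gunzip."""
    return gzip.decompress(base64.b64decode(payload))


source = b"import os\nprint(os.getcwd())\n" * 20
payload = pack(source)

assert unpack(payload) == source  # lossless round trip
print(len(source), "bytes ->", len(payload), "base64 characters")
```

Base64 keeps the payload safe to embed in JSON, and gzip offsets the roughly one-third size overhead that base64 adds.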

Argus has no telemetry, no opt-in analytics, and no usage reporting. The CLI does not phone home.

Architecture invariants

The non-negotiable design rules — break these in a PR and expect a long review:

Preprocessing is deterministic and free. No model calls in preprocessing/. If you're tempted, the change belongs in analysis/.
The cascade short-circuits clean files cheaply. A clean file costs $0.0001 (triage only). Don't add expensive defaults.
All runners are injectable. scan_file(triage_runner=, sonnet_runner=, opus_runner=, dast_runner=). The engine never hard-codes provider calls — that's how unit tests run with no API spend.
DAST never silently lowers an L1 verdict. A malicious → suspicious downgrade only fires when every L1 finding is sandbox-grounded as BLOCKED or UNREACHED. Without that, L1's verdict stands and dast_keep_l1 is recorded.
Cost guardrails are enforced before any user-facing release. max_cost_per_file_usd aborts mid-scan rather than overrunning the budget.
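
The "DAST never silently lowers an L1 verdict" rule above can be written as a small pure function. This is a sketch of the rule as stated, not the engine's code: `apply_engine_guard` and the `dast_downgrade` label are hypothetical, and treating an empty finding list as "not refuted" is an assumption.

```python
from enum import Enum


class Status(Enum):
    CONFIRMED = "CONFIRMED"
    BLOCKED = "BLOCKED"
    UNREACHED = "UNREACHED"
    NOT_TESTED = "NOT_TESTED"


# Statuses that count as a sandbox-grounded refutation.
REFUTED = {Status.BLOCKED, Status.UNREACHED}


def apply_engine_guard(l1_verdict: str, statuses: list[Status]) -> tuple[str, str]:
    """Return (final_verdict, decision_label) for one file.

    A malicious -> suspicious downgrade happens only when every L1 finding
    was refuted in the sandbox; otherwise L1's verdict is kept.
    """
    all_refuted = bool(statuses) and all(s in REFUTED for s in statuses)
    if l1_verdict == "malicious" and all_refuted:
        return "suspicious", "dast_downgrade"  # label is hypothetical
    return l1_verdict, "dast_keep_l1"


print(apply_engine_guard("malicious", [Status.BLOCKED, Status.UNREACHED]))
print(apply_engine_guard("malicious", [Status.BLOCKED, Status.NOT_TESTED]))
```

A single NOT_TESTED or CONFIRMED finding is enough to keep L1's verdict, which is what makes the downgrade "sandbox-grounded" rather than a default.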

Documentation

Topic	Page
Install + first scan	docs/install.md
API key sourcing	docs/api-keys.md
Cascade architecture	docs/architecture.md
Cost guide + budget knobs	docs/cost-guide.md
DAST sandbox setup (Fly.io)	docs/dast-setup.md
Contributing	CONTRIBUTING.md
Security disclosures	SECURITY.md

Development

uv sync --extra dev
uv run pytest tests/unit -v           # always; no API spend
uv run ruff check . && uv run ruff format .
uv run mypy --strict .
uv run pytest tests/integration -v    # before runner / engine changes; spends API

Argus is Python 3.12+, mypy --strict, ruff for lint and format. Pydantic v2 for cross-boundary structures. structlog for logging — never print() outside the CLI. Tests are split into unit (mocked, mandatory) and integration (live API, optional + manual). CI runs unit + lint + types on every PR; integration tests stay local because nobody wants surprise API bills on their fork.
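
Injectable runners are what let unit tests run without API spend. The example below shows the pattern with a toy engine: the `scan_file` defined here is a simplified stand-in written for this illustration (only two runners, string verdicts), not the real function or its signature.

```python
from typing import Callable

Runner = Callable[[str], str]  # takes file content, returns a label


def scan_file(content: str, *, triage_runner: Runner, sonnet_runner: Runner) -> str:
    """Toy engine: triage first, escalate to the Sonnet runner on HIGH."""
    if triage_runner(content) == "HIGH":
        return sonnet_runner(content)
    return "clean"


def test_high_triage_escalates_to_sonnet() -> None:
    calls: list[str] = []

    def fake_triage(content: str) -> str:
        calls.append("triage")
        return "HIGH"

    def fake_sonnet(content: str) -> str:
        calls.append("sonnet")
        return "malicious"

    verdict = scan_file(
        "eval(payload)",
        triage_runner=fake_triage,
        sonnet_runner=fake_sonnet,
    )
    assert verdict == "malicious"
    assert calls == ["triage", "sonnet"]  # both tiers ran, in order


test_high_triage_escalates_to_sonnet()
print("ok")
```

Because the fakes are plain functions, the test also asserts on call order, which is hard to do against a live provider.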

See CONTRIBUTING.md for the full PR process.

License

Apache License 2.0.

Release history

1.5.0

May 10, 2026

1.3.1

May 10, 2026

1.3.0

May 10, 2026

1.2.1

May 10, 2026

1.2.0

May 8, 2026

1.1.1

May 7, 2026

1.1.0

May 7, 2026

Download files

Source Distribution

argus_ai_scanner-1.1.1.tar.gz (419.5 kB)

Uploaded May 7, 2026 Source

Built Distribution

argus_ai_scanner-1.1.1-py3-none-any.whl (363.8 kB)

Uploaded May 7, 2026 Python 3

File details

Details for the file argus_ai_scanner-1.1.1.tar.gz.

File metadata

Download URL: argus_ai_scanner-1.1.1.tar.gz
Upload date: May 7, 2026
Size: 419.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for argus_ai_scanner-1.1.1.tar.gz
Algorithm	Hash digest
SHA256	`ad6bc2f2a951b1af3fdf0de2f6ca305a5982410bc91041b516ebf835c5f8f761`
MD5	`ac1a9ba4133c0f715bc3e88ce9a4278e`
BLAKE2b-256	`2f4e7512bf39ae71cc6bc2b8b2514fd60478f45f0b41f216e4b1d4fe775002d4`
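
A downloaded sdist can be checked against the SHA256 digest listed above with the standard library. In this sketch the file is assumed to sit in the current directory under its published name; `sha256_of` and `verify` are illustrative helpers.

```python
import hashlib
from pathlib import Path

# SHA256 digest of argus_ai_scanner-1.1.1.tar.gz, copied from the table above.
EXPECTED_SHA256 = "ad6bc2f2a951b1af3fdf0de2f6ca305a5982410bc91041b516ebf835c5f8f761"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True when the file's SHA-256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()


sdist = Path("argus_ai_scanner-1.1.1.tar.gz")  # assumed download location
if sdist.exists():
    print("OK" if verify(sdist) else "MISMATCH")
else:
    print(f"{sdist} not found; download it first")
```

Reading in chunks keeps memory use flat regardless of archive size.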

Provenance

The following attestation bundles were made for argus_ai_scanner-1.1.1.tar.gz:

Publisher: release.yml on dshochat/Argus_Scanner

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: argus_ai_scanner-1.1.1.tar.gz
- Subject digest: ad6bc2f2a951b1af3fdf0de2f6ca305a5982410bc91041b516ebf835c5f8f761
- Sigstore transparency entry: 1456107261
- Sigstore integration time: May 7, 2026
Source repository:
- Permalink: dshochat/Argus_Scanner@31d49039c62d7a1078831a8f3026d24d9aabee69
- Branch / Tag: refs/tags/v1.1.1
- Owner: https://github.com/dshochat
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@31d49039c62d7a1078831a8f3026d24d9aabee69
- Trigger Event: push

File details

Details for the file argus_ai_scanner-1.1.1-py3-none-any.whl.

File metadata

Download URL: argus_ai_scanner-1.1.1-py3-none-any.whl
Upload date: May 7, 2026
Size: 363.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for argus_ai_scanner-1.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9e5a46daaac1ca62af76ad406e1399069bd79c8c44738d6a780d71b24fcb8b77`
MD5	`0487deba7a46b0075e7fc8102d80688e`
BLAKE2b-256	`f765bbb90bd912647bba68f4ddad0ab0db84d2c070a5526f79af2f23dfc038d7`

Provenance

The following attestation bundles were made for argus_ai_scanner-1.1.1-py3-none-any.whl:

Publisher: release.yml on dshochat/Argus_Scanner

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: argus_ai_scanner-1.1.1-py3-none-any.whl
- Subject digest: 9e5a46daaac1ca62af76ad406e1399069bd79c8c44738d6a780d71b24fcb8b77
- Sigstore transparency entry: 1456107359
- Sigstore integration time: May 7, 2026
Source repository:
- Permalink: dshochat/Argus_Scanner@31d49039c62d7a1078831a8f3026d24d9aabee69
- Branch / Tag: refs/tags/v1.1.1
- Owner: https://github.com/dshochat
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@31d49039c62d7a1078831a8f3026d24d9aabee69
- Trigger Event: push

argus-ai-scanner 1.1.1
