
AI agent CLI for security research


Deadend CLI

Warning: This project is under active development. Current features are functional, but the interface and workflows are still changing as the new architecture and features are integrated.

Autonomous pentesting agent using feedback-driven iteration. Achieves ~78% on XBOW benchmarks with fully local execution and a model-agnostic architecture.

📄 Read Technical Deep Dive | 📊 Benchmark Results


What is Deadend CLI?

Deadend CLI is an autonomous web application penetration testing agent that uses feedback-driven iteration to adapt its exploitation strategies. When standard tools fail, it generates custom Python payloads, observes the responses, and refines its approach until an exploit succeeds.

Key features:

  • Fully local execution (no cloud dependencies, zero data exfiltration)
  • Model-agnostic design (works with any deployable LLM)
  • Custom sandboxed tools (Playwright, Docker, WebAssembly)
  • ADaPT-based architecture with supervisor-subagent hierarchy
  • Confidence-based decision making (fail <20%, expand 20-60%, refine 60-80%, validate >80%)
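
The confidence thresholds above can be sketched as a small mapping function. This is only an illustration, not the project's code: the name `choose_action` is hypothetical, and the handling of the exact boundary values (20%, 60%, 80%) is an assumption, since the list does not say which side they fall on.

```python
def choose_action(confidence: float) -> str:
    """Map a confidence score in [0.0, 1.0] to a policy action.

    Boundary handling is an assumption: 0.20 counts as "expand",
    0.60 and 0.80 count as "refine".
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if confidence < 0.20:
        return "fail"
    if confidence < 0.60:
        return "expand"
    if confidence <= 0.80:
        return "refine"
    return "validate"


for score in (0.10, 0.45, 0.70, 0.95):
    print(f"{score:.2f} -> {choose_action(score)}")
```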

Benchmark results: 78% on XBOW validation suite (76/98 challenges), including blind SQL injection exploits where other agents achieved 0%.

Read the architecture breakdown in our technical article →


Core Analysis Capabilities

The framework focuses on intelligent security analysis through:

  • 🔍 Taint Analysis: Automated tracking of data flow from sources to sinks
  • 🎯 Source/Sink Detection: Intelligent identification of entry points and vulnerable functions
  • 🔗 Contextual Tool Integration: Smart connection to specialized tools for testing complex logic patterns
  • 🧠 AI-Driven Reasoning: Context-aware analysis that mimics expert security thinking
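
As a rough illustration of the source-to-sink idea behind taint analysis, the sketch below treats data flow as a directed graph and reports which sinks are reachable from a tainted source. It is a generic textbook-style example, not Deadend CLI's implementation; the function name, the graph format, and the node labels are all hypothetical.

```python
from collections import deque


def find_tainted_sinks(flows, sources, sinks):
    """Return the sinks reachable from any source via data-flow edges."""
    tainted = set(sources)
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for successor in flows.get(node, ()):
            if successor not in tainted:
                tainted.add(successor)
                queue.append(successor)
    return sorted(tainted & set(sinks))


# Hypothetical data-flow edges: each key flows into the listed values.
flows = {
    "request.args['id']": ["user_id"],
    "user_id": ["query"],
    "query": ["db.execute"],
    "config.debug": ["logger.info"],
}

result = find_tainted_sinks(
    flows,
    sources=["request.args['id']"],
    sinks=["db.execute", "os.system"],
)
print(result)
```

Here only `db.execute` is reported, because nothing user-controlled flows into `os.system` in this toy graph.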

🔧 Custom Pentesting Tools

  • Webapp-Specific Tooling: Custom tools designed specifically for web application penetration testing
  • Authentication Handling: Built-in support for session management, cookies, and auth flows
  • Fine-Grained Testing: Precise control over individual requests and parameters
  • Payload Generation: AI-powered payload creation tailored to target context
  • Automated Payload Testing: Generate, inject, and validate payloads in a single workflow

Quick Start

Prerequisites

  • Docker (required)
  • Python 3.11+
  • uv >= 0.5.30
  • Playwright: playwright install

Installation

# Install via pipx (recommended)
pipx install deadend_cli

# Or build from source
git clone https://github.com/xoxruns/deadend-cli.git
cd deadend-cli
uv sync && uv build

First Run

# Initialize configuration
deadend-cli init

Commands

deadend-cli init

Initialize configuration and set up pgvector database

deadend-cli eval-agent

Run evaluation against challenge datasets

  • --eval-metadata-file: Challenge dataset file
  • --provider: AI model provider to use
  • --model-name: AI model to use

deadend-cli version

Display current version


Architecture Summary

The agent uses a two-phase approach (reconnaissance → exploitation) with a supervisor-subagent hierarchy:

  • Supervisor: Maintains high-level goals and delegates to specialized subagents
  • Subagents: Focused toolsets (Requester for HTTP, Shell for commands, Python for payloads)
  • Policy: Confidence scores (0-1.0) determine whether to fail, expand, refine, or validate

Key innovation: When standard tools fail, the agent generates custom exploitation scripts and iterates based on observed feedback—solving challenges like blind SQL injection where static toolchains achieve 0%.

Read full architecture details →


Validation Configuration

The agent uses a composable validation system to determine when the root goal of an assessment has been achieved. Configuration is driven by a YAML file at ~/.cache/deadend/validation.yaml.

How It Works

After every supervisor execution, a validation gate runs a chain of strategies in order. The first strategy that returns stop: true triggers a report and exits the loop. If no strategy stops, the ADaPT policy (expand/refine/fail) continues as normal.
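
The gate behavior described above (run strategies in order, stop at the first one that returns stop: true) can be sketched as follows. This is a simplified model under stated assumptions: `run_validation_gate`, the two stand-in strategies, and the result dictionary shape are hypothetical, and the real judge is an LLM call rather than a string check.

```python
from typing import Callable, Optional

# A strategy takes the execution trace and returns a result dict.
Strategy = Callable[[str], dict]


def run_validation_gate(strategies: list[Strategy], trace: str) -> Optional[dict]:
    """Return the first result with stop=True, or None if no strategy stops."""
    for strategy in strategies:
        result = strategy(trace)
        if result.get("stop"):
            return result
    return None  # caller continues with the expand/refine/fail policy


def flag_strategy(trace: str) -> dict:
    # Stand-in for the regex-based flag check.
    return {"stop": "FLAG{" in trace, "reason": "flag token seen"}


def judge_strategy(trace: str) -> dict:
    # Stand-in for the LLM judge; a real judge would call a model here.
    return {"stop": "goal achieved" in trace, "reason": "judge verdict"}


gate = [flag_strategy, judge_strategy]
print(run_validation_gate(gate, "response body: FLAG{example}"))
print(run_validation_gate(gate, "still enumerating endpoints"))
```

Ordering the cheap check first means the more expensive judge only runs when the regex finds nothing.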

Available strategies:

| Strategy | Cost | What it does |
|----------|------|--------------|
| flag | Zero (regex) | Scans proofs, summaries, and context for a token matching a configurable regex pattern |
| judge | 1 LLM call | Agent that evaluates the full execution trace against the root goal. Self-throttles when no new evidence has appeared |

Configuration File

Create ~/.cache/deadend/validation.yaml. A reference file with all options is at deadend_agent/src/deadend_agent/config/validation.default.yaml.

Examples

CTF with FLAG{} tokens (default if no file exists):

validation_format: "FLAG{}"
validation_type: "flag"
strategies:
  - name: flag
    pattern: "FLAG\\{[^}]+\\}"
  - name: judge

The flag strategy runs first (free regex check). If no match, the judge LLM evaluates whether the goal is done.
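
Inside a double-quoted YAML string, `\\` is an escaped backslash, so the pattern above reaches the regex engine as `FLAG\{[^}]+\}`. The sketch below shows what that expression matches in Python. The helper `find_flag` and the sample texts are hypothetical, and which fields the real strategy scans is not shown here.

```python
import re

# The regex after YAML unescaping: literal "FLAG{", one or more
# characters that are not "}", then a closing "}".
FLAG_PATTERN = re.compile(r"FLAG\{[^}]+\}")


def find_flag(texts):
    """Return the first flag token found in any of the texts, else None."""
    for text in texts:
        match = FLAG_PATTERN.search(text)
        if match:
            return match.group(0)
    return None


print(find_flag(["no token here", "body: FLAG{s3cr3t} trailing"]))
print(find_flag(["FLAG{}"]))  # empty braces do not match: [^}]+ needs one char
```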

HackTheBox:

validation_format: "HTB{}"
validation_type: "flag"
strategies:
  - name: flag
    pattern: "HTB\\{[^}]+\\}"
  - name: judge
    validation_format: "HTB{}"

picoCTF:

validation_format: "picoCTF{}"
validation_type: "flag"
strategies:
  - name: flag
    pattern: "picoCTF\\{[^}]+\\}"
  - name: judge
    validation_format: "picoCTF{}"

Recon / security assessment (no flag to find):

validation_type: "security assessment"
strategies:
  - name: judge

No flag strategy — the LLM judge evaluates whether the recon goal (e.g., "map the attack surface") is satisfied based on accumulated evidence.

Flag-only (fastest, no LLM judge):

validation_format: "FLAG{}"
strategies:
  - name: flag

Only regex matching, no LLM call at all. Cheapest option for CTFs where the flag format is known.

Configuration Reference

Top-level fields:

| Field | Type | Description |
|-------|------|-------------|
| validation_format | string \| null | Token format shown in agent prompts (e.g., "FLAG{}", "HTB{}"). Set to null for assessments without tokens |
| validation_type | string \| null | Type label for the judge prompt (e.g., "flag", "security assessment") |
| strategies | list | Ordered list of strategy configurations |

Per-strategy fields:

| Field | Strategy | Description |
|-------|----------|-------------|
| name | all | Strategy name: "flag" or "judge" |
| pattern | flag | Regex pattern (default: FLAG\{[^}]+\}) |
| validation_type | judge | Override top-level validation_type for this strategy |
| validation_format | judge | Override top-level validation_format for this strategy |
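
One way to read the override fields is that a value set on the judge strategy wins, and otherwise the top-level value applies. The sketch below models that on a plain dictionary standing in for the parsed YAML. The function name `resolve_judge_settings` is hypothetical, and the fallback rule is an assumption drawn from the field descriptions, not confirmed behavior.

```python
def resolve_judge_settings(config: dict) -> list[dict]:
    """Compute the effective type/format for each judge strategy entry."""
    resolved = []
    for strategy in config.get("strategies", []):
        if strategy.get("name") != "judge":
            continue
        resolved.append({
            # Per-strategy value if present, else the top-level value.
            "validation_type": strategy.get(
                "validation_type", config.get("validation_type")),
            "validation_format": strategy.get(
                "validation_format", config.get("validation_format")),
        })
    return resolved


# Dictionary standing in for a parsed validation.yaml (illustrative values).
config = {
    "validation_format": "HTB{}",
    "validation_type": "flag",
    "strategies": [
        {"name": "flag", "pattern": r"HTB\{[^}]+\}"},
        {"name": "judge", "validation_format": "picoCTF{}"},
    ],
}

print(resolve_judge_settings(config))
```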

Programmatic Override

Pass a custom config path when constructing the agent:

agent = DeadEndAgent(
    session_id=session_id,
    model=model,
    available_agents=agents,
    validation_config_path="/path/to/custom/validation.yaml",
)

Benchmark Results

Evaluated on XBOW's 104-challenge validation suite (black-box mode, January 2026):

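| Agent | Success Rate | Infrastructure | Blind SQLi |
|-------|--------------|----------------|------------|
| XBOW (proprietary) | 85% | Proprietary | ? |
| Cyber-AutoAgent | 81% | AWS Bedrock | 0% |
| Deadend CLI | 78% | Fully local | 33% |
| MAPTA | 76.9% | External APIs | 0% |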

Models tested: Claude Sonnet 4.5 (~78%), Kimi K2 Thinking (~69%)

Strong performance: XSS (91%), Business Logic (86%), SQL injection (83%), IDOR (80%)

Perfect scores: GraphQL, SSRF, NoSQL injection, HTTP method tampering (100%)

Technology Stack

  • LiteLLM: Multi-provider model abstraction (OpenAI, Anthropic, Ollama)
  • Instructor: Structured LLM outputs
  • pgvector: Vector database for context
  • Pyodide/WebAssembly: Python sandbox
  • Playwright: HTTP request generation
  • Docker: Shell command isolation

Configuration

Configuration is managed via ~/.cache/deadend/config.toml. Run deadend-cli init to set up your configuration interactively.


Current Status & Roadmap

Stable (v0.0.15)

  • ✅ New architecture
  • ✅ XBOW benchmark evaluation (78%)
  • ✅ Custom sandboxed tools
  • ✅ Multi-model support with LiteLLM
  • ✅ Two-phase execution (recon + exploitation)

In Progress (v0.1.0)

🚧 CLI Redesign with enhanced workflows:

  • Plan mode (review strategies before execution)
  • Preset configuration workflows (API testing, web apps, auth bypass)
  • Workflow automation (save/replay attack chains)

🚧 Context optimization (reduce redundant tool calls)

🚧 Secrets management improvements

Future roadmap

The current architecture proves competitive autonomous pentesting (78%) is achievable without cloud dependencies. Next challenges:

  • Open-Source Models: Achieve 75%+ with Llama/Qwen (eliminate proprietary dependencies)
  • Hybrid Testing: Add AST analysis for white-box code inspection
  • Adversarial Robustness: Train against WAFs, rate limiting, adaptive defenses
  • Multi-Target Orchestration: Test interconnected systems simultaneously
  • Context Efficiency: Better information sharing between components

Goal: Make autonomous pentesting accessible (open models), comprehensive (hybrid testing), and robust (works against real defenses).


Contributing

Contributions welcome in:

  • Context optimization algorithms
  • Vulnerability test cases
  • Open-weight model fine-tuning
  • Adversarial testing scenarios

See CONTRIBUTING.md for guidelines on how to contribute.


Citation

@software{deadend_cli_2026,
  author = {Yassine Bargach},
  title = {Deadend CLI: Feedback-Driven Autonomous Pentesting},
  year = {2026},
  url = {https://github.com/xoxruns/deadend-cli}
}

Disclaimer

For authorized security testing only. Unauthorized testing is illegal. Users are responsible for compliance with all applicable laws and obtaining proper authorization.


Links

📄 Architecture Deep Dive | 📊 Benchmark Results | 🐛 Report Issues | Star this repo
