AI agent CLI for security research

Project description

Deadend CLI

[!WARNING] Active Development: This project is undergoing active development. Current features are functional but the interface and workflows are being improved based on new architecture and features.

Autonomous pentesting agent using feedback-driven iteration Achieves ~78% on XBOW benchmarks with fully local execution and model-agnostic architecture.

📄 Read Technical Deep Dive | 📊 Benchmark Results

What is Deadend CLI?

Deadend CLI is an autonomous web application penetration testing agent that uses feedback-driven iteration to adapt exploitation strategies. When standard tools fail, it generates custom Python payloads, observes responses, and iteratively refines its approach until breakthrough.

Key features:

Fully local execution (no cloud dependencies, zero data exfiltration)
Model-agnostic design (works with any deployable LLM)
Custom sandboxed tools (Playwright, Docker, WebAssembly)
ADaPT-based architecture with supervisor-subagent hierarchy
Confidence-based decision making (fail <20%, expand 20-60%, refine 60-80%, validate >80%)

Benchmark results: 78% on XBOW validation suite (76/98 challenges), including blind SQL injection exploits where other agents achieved 0%.

Read the architecture breakdown in our technical article →

Core Analysis Capabilities

The framework focuses on intelligent security analysis through:

🔍 Taint Analysis: Automated tracking of data flow from sources to sinks
🎯 Source/Sink Detection: Intelligent identification of entry points and vulnerable functions
🔗 Contextual Tool Integration: Smart connection to specialized tools for testing complex logic patterns
🧠 AI-Driven Reasoning: Context-aware analysis that mimics expert security thinking

🔧 Custom Pentesting Tools

Webapp-Specific Tooling: Custom tools designed specifically for web application penetration testing
Authentication Handling: Built-in support for session management, cookies, and auth flows
Fine-Grained Testing: Precise control over individual requests and parameters
Payload Generation: AI-powered payload creation tailored to target context
Automated Payload Testing: Generate, inject, and validate payloads in a single workflow

Quick Start

Prerequisites

Docker (required)
Python 3.11+
uv >= 0.5.30
Playwright: playwright install

Installation

# Install via pipx (recommended)
pipx install deadend_cli

# Or build from source
git clone https://github.com/xoxruns/deadend-cli.git
cd deadend-cli
uv sync && uv build

First Run

# Initialize configuration
deadend-cli init

Commands

`deadend-cli init`

Initialize configuration and set up pgvector database

`deadend-cli eval-agent`

Run evaluation against challenge datasets

--eval-metadata-file: Challenge dataset file
--provider: AI model provider to use
--model-name: AI model to use

`deadend-cli version`

Display current version

Architecture Summary

The agent uses a two-phase approach (reconnaissance → exploitation) with a supervisor-subagent hierarchy:

Supervisor: Maintains high-level goals, delegates to specialized subagents Subagents: Focused toolsets (Requester for HTTP, Shell for commands, Python for payloads) Policy: Confidence scores (0-1.0) determine whether to fail, expand, refine, or validate

Key innovation: When standard tools fail, the agent generates custom exploitation scripts and iterates based on observed feedback—solving challenges like blind SQL injection where static toolchains achieve 0%.

Read full architecture details →

Validation Configuration

The agent uses a composable validation system to determine when the root goal of an assessment has been achieved. Configuration is driven by a YAML file at ~/.cache/deadend/validation.yaml.

How It Works

After every supervisor execution, a validation gate runs a chain of strategies in order. The first strategy that returns stop: true triggers a report and exits the loop. If no strategy stops, the ADaPT policy (expand/refine/fail) continues as normal.

Available strategies:

Strategy	Cost	What it does
`flag`	Zero (regex)	Scans proofs, summaries, and context for a token matching a configurable regex pattern
`judge`	1 LLM call	Agent that evaluates the full execution trace against the root goal. Self-throttles when no new evidence has appeared

Configuration File

Create ~/.cache/deadend/validation.yaml. A reference file with all options is at deadend_agent/src/deadend_agent/config/validation.default.yaml.

Examples

CTF with FLAG{} tokens (default if no file exists):

validation_format: "FLAG{}"
validation_type: "flag"
strategies:
  - name: flag
    pattern: "FLAG\\{[^}]+\\}"
  - name: judge

The flag strategy runs first (free regex check). If no match, the judge LLM evaluates whether the goal is done.

HackTheBox:

validation_format: "HTB{}"
validation_type: "flag"
strategies:
  - name: flag
    pattern: "HTB\\{[^}]+\\}"
  - name: judge
    validation_format: "HTB{}"

picoCTF:

validation_format: "picoCTF{}"
validation_type: "flag"
strategies:
  - name: flag
    pattern: "picoCTF\\{[^}]+\\}"
  - name: judge
    validation_format: "picoCTF{}"

Recon / security assessment (no flag to find):

validation_type: "security assessment"
strategies:
  - name: judge

No flag strategy — the LLM judge evaluates whether the recon goal (e.g., "map the attack surface") is satisfied based on accumulated evidence.

Flag-only (fastest, no LLM judge):

validation_format: "FLAG{}"
strategies:
  - name: flag

Only regex matching, no LLM call at all. Cheapest option for CTFs where the flag format is known.

Configuration Reference

Top-level fields:

Field	Type	Description
`validation_format`	`string \| null`	Token format shown in agent prompts (e.g., `"FLAG{}"`, `"HTB{}"`). Set to `null` for assessments without tokens
`validation_type`	`string \| null`	Type label for the judge prompt (e.g., `"flag"`, `"security assessment"`)
`strategies`	`list`	Ordered list of strategy configurations

Per-strategy fields:

Field	Strategy	Description
`name`	all	Strategy name: `"flag"` or `"judge"`
`pattern`	`flag`	Regex pattern (default: `FLAG\{[^}]+\}`)
`validation_type`	`judge`	Override top-level `validation_type` for this strategy
`validation_format`	`judge`	Override top-level `validation_format` for this strategy

Programmatic Override

Pass a custom config path when constructing the agent:

agent = DeadEndAgent(
    session_id=session_id,
    model=model,
    available_agents=agents,
    validation_config_path="/path/to/custom/validation.yaml",
)

Benchmark Results

Evaluated on XBOW's 104-challenge validation suite (black-box mode, January 2026):

Agent	Success Rate	Infrastructure	Blind SQLi
XBOW (proprietary)	85%	Proprietary	?
Cyber-AutoAgent	81%	AWS Bedrock	0%
Deadend CLI	78%	Fully local	33%
MAPTA	76.9%	External APIs	0%

Models tested: Claude Sonnet 4.5 (~78%), Kimi K2 Thinking (~69%)

Strong performance: XSS (91%), Business Logic (86%), SQL injection (83%), IDOR (80%) Perfect scores: GraphQL, SSRF, NoSQL injection, HTTP method tampering (100%)

Technology Stack

LiteLLM: Multi-provider model abstraction (OpenAI, Anthropic, Ollama)
Instructor: Structured LLM outputs
pgvector: Vector database for context
Pyodide/WebAssembly: Python sandbox
Playwright: HTTP request generation
Docker: Shell command isolation

Configuration

Configuration is managed via ~/.cache/deadend/config.toml. Run deadend-cli init to set up your configuration interactively.

Current Status & Roadmap

Stable (v0.0.15)

✅ New architecture ✅ XBOW benchmark evaluation (78%) ✅ Custom sandboxed tools ✅ Multi-model support with liteLLM ✅ Two-phase execution (recon + exploitation)

In Progress (v0.1.0)

🚧 CLI Redesign with enhanced workflows:

Plan mode (review strategies before execution)
Preset configuration workflows (API testing, web apps, auth bypass)
Workflow automation (save/replay attack chains)

🚧 Context optimization (reduce redundant tool calls) 🚧 Secrets management improvements

Future roadmap

The current architecture proves competitive autonomous pentesting (78%) is achievable without cloud dependencies. Next challenges:

Open-Source Models: Achieve 75%+ with Llama/Qwen (eliminate proprietary dependencies)
Hybrid Testing: Add AST analysis for white-box code inspection
Adversarial Robustness: Train against WAFs, rate limiting, adaptive defenses
Multi-Target Orchestration: Test interconnected systems simultaneously
Context Efficiency: Better information sharing between components

Goal: Make autonomous pentesting accessible (open models), comprehensive (hybrid testing), and robust (works against real defenses).

Contributing

Contributions welcome in:

Context optimization algorithms
Vulnerability test cases
Open-weight model fine-tuning
Adversarial testing scenarios

See CONTRIBUTING.md for guidelines on how to contribute.

Citation

@software{deadend_cli_2026,
  author = {Yassine Bargach},
  title = {Deadend CLI: Feedback-Driven Autonomous Pentesting},
  year = {2026},
  url = {https://github.com/xoxruns/deadend-cli}
}

Disclaimer

For authorized security testing only. Unauthorized testing is illegal. Users are responsible for compliance with all applicable laws and obtaining proper authorization.

Links

📄 Architecture Deep Dive 📊 Benchmark Results 🐛 Report Issues ⭐ Star this repo

Project details

Release history Release notifications | RSS feed

This version

0.1.15

Apr 12, 2026

0.1.4

Apr 6, 2026

0.1.1

Feb 19, 2026

0.1.0

Feb 9, 2026

0.0.15

Jan 9, 2026

0.0.14

Dec 10, 2025

0.0.13

Oct 23, 2025

0.0.10

Oct 23, 2025

0.0.9

Oct 23, 2025

0.0.7

Oct 7, 2025

0.0.6

Oct 6, 2025

0.0.5

Oct 3, 2025

0.0.4

Oct 2, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

deadend_cli-0.1.15-py3-none-any.whl (342.8 kB view details)

Uploaded Apr 12, 2026 Python 3

File details

Details for the file deadend_cli-0.1.15-py3-none-any.whl.

File metadata

Download URL: deadend_cli-0.1.15-py3-none-any.whl
Upload date: Apr 12, 2026
Size: 342.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.11.6 {"installer":{"name":"uv","version":"0.11.6","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for deadend_cli-0.1.15-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6ce8fc79c33174c1f58a9e7d2de3976dc01b3bdb534ae09f626bd21274075e12`
MD5	`5d6b5833e63a5da16966e95bac08ff8f`
BLAKE2b-256	`e45bd5d746b69ae3daad99e44ccda3cc57bb1f73874cdc241b3e35a69f443f3b`

See more details on using hashes here.

deadend_cli 0.1.15

Navigation

Verified details

Maintainers

Unverified details

Meta

Project description

Deadend CLI

What is Deadend CLI?

Core Analysis Capabilities

🔧 Custom Pentesting Tools

Quick Start

Prerequisites

Installation

First Run

Commands

deadend-cli init

deadend-cli eval-agent

deadend-cli version

Architecture Summary

Validation Configuration

How It Works

Configuration File

Examples

Configuration Reference

Programmatic Override

Benchmark Results

Technology Stack

Configuration

Current Status & Roadmap

Stable (v0.0.15)

In Progress (v0.1.0)

Future roadmap

Contributing

Citation

Disclaimer

Links

Project details

Verified details

Maintainers

Unverified details

Meta

Release history Release notifications | RSS feed

Download files

Source Distributions

Built Distribution

File details

File metadata

File hashes

`deadend-cli init`

`deadend-cli eval-agent`

`deadend-cli version`