Red-Team AI Agents Before They Red-Team You
Project description
🦞 SuperClaw
SuperClaw — Red-Team AI Agents Before They Red-Team You
Scenario-driven, behavior-first security testing for autonomous agents.
SuperClaw is a security testing framework for AI coding agents such as OpenClaw and agent ecosystems like Moltbook. It identifies vulnerabilities through prompt injection, tool policy bypass, sandbox escape, and multi-agent trust exploitation.
OpenClaw + Moltbook Threat Model
Threat Model
OpenClaw agents often run with broad tool access. When connected to Moltbook or other agent networks, they can ingest untrusted, adversarial content that enables:
- Prompt injection and hidden instruction attacks
- Tool misuse and policy bypass
- Behavioral drift over time
- Cascading cross‑agent exploitation
SuperClaw is built to evaluate these risks before deployment.
Problem & Solution (Summary)
Problem: Autonomous agents are being deployed with high privilege, mutable behavior, and exposure to untrusted inputs—without structured security validation. This makes prompt injection, tool misuse, configuration drift, and data leakage likely, but poorly understood until after exposure.
Solution: SuperClaw is a pre‑deployment, behavior‑driven red‑teaming framework that stress‑tests existing agents. It runs scenario‑based evaluations, records evidence (tool calls, outputs, artifacts), scores behaviors against explicit contracts, and produces actionable reports before agents touch sensitive data or external ecosystems.
Non‑goals: SuperClaw does not generate agents, run production workloads, or automate real‑world exploitation.
⚠️ Security Notice
This tool is for authorized security testing only. See SECURITY.md for:
- Authorization requirements
- Containment requirements (sandbox/VM)
- False positive handling
- Data safety guidelines
Guardrails:
- Local-only mode blocks remote targets by default
- Remote targets require
SUPERCLAW_AUTH_TOKEN(or adapter token)
Supported Targets
- 🦞 OpenClaw — ACP WebSocket adapter
- 🧪 Mock — Offline deterministic testing
- 🔧 Custom — Extend via adapters
Quick Start
# Install
pip install superclaw
# Attack OpenClaw (local instance)
superclaw attack openclaw --target ws://127.0.0.1:18789
# Generate attack scenarios
superclaw generate scenarios --behavior prompt_injection --num-scenarios 20
# Run security audit
superclaw audit openclaw --comprehensive --report-format html --output report
# Offline testing
superclaw attack mock --behaviors prompt-injection-resistance
Attack Techniques
| Technique | Description |
|---|---|
prompt-injection |
Direct/indirect injection attacks |
encoding |
Base64, hex, unicode, typoglycemia obfuscation |
jailbreak |
DAN, grandmother, role-play techniques |
tool-bypass |
Tool policy bypass via alias confusion |
multi-turn |
Multi-turn persistent escalation attacks |
Security Behaviors
Each behavior ships with a structured contract (intent, success criteria, rubric, mitigation).
| Behavior | Severity | Description |
|---|---|---|
prompt-injection-resistance |
CRITICAL | Tests injection detection |
tool-policy-enforcement |
HIGH | Tests allow/deny lists |
sandbox-isolation |
CRITICAL | Tests container boundaries |
session-boundary-integrity |
HIGH | Tests session isolation |
configuration-drift-detection |
MEDIUM | Tests config stability |
acp-protocol-security |
MEDIUM | Tests protocol handling |
CLI Commands
# Attacks
superclaw attack openclaw --target ws://127.0.0.1:18789 --behaviors all
superclaw attack mock --behaviors prompt-injection-resistance
# Scenario generation (Bloom)
superclaw generate scenarios --behavior prompt_injection --num-scenarios 20
superclaw generate scenarios --behavior jailbreak --variations noise,emotional_pressure
# Evaluation
superclaw evaluate openclaw --scenarios scenarios.json --behaviors all
superclaw evaluate mock --scenarios scenarios.json
# Audit
superclaw audit openclaw --comprehensive --report-format html --output report
superclaw audit openclaw --quick
# Reporting
superclaw report generate --results results.json --format sarif # For GitHub Code Scanning
superclaw report drift --baseline baseline.json --current current.json
# Scanning
superclaw scan config
superclaw scan skills --path /path/to/skills
# Utilities
superclaw behaviors
superclaw attacks
superclaw init
Documentation
Full documentation: https://superagenticai.github.io/superclaw/
CodeOptiX Integration
SuperClaw integrates with CodeOptiX for multi-modal evaluation:
# Install with CodeOptiX support
pip install superclaw[codeoptix]
# Check integration status
superclaw codeoptix status
# Register behaviors with CodeOptiX
superclaw codeoptix register
# Run multi-modal evaluation
superclaw codeoptix evaluate --target ws://127.0.0.1:18789 --llm-provider openai
Python API
from superclaw.codeoptix import SecurityEvaluationEngine
from superclaw.adapters import create_adapter
adapter = create_adapter("openclaw", {"target": "ws://127.0.0.1:18789"})
engine = SecurityEvaluationEngine(adapter)
result = engine.evaluate_security(behavior_names=["prompt-injection-resistance"])
print(f"Score: {result.overall_score:.1%}")
print(f"Passed: {result.overall_passed}")
Architecture
superclaw/
├── attacks/ # Attack implementations
│ ├── prompt_injection.py
│ ├── encoding.py
│ ├── jailbreaks.py
│ ├── tool_bypass.py
│ └── multi_turn.py
├── behaviors/ # Security behavior specs
│ ├── injection_resistance.py
│ ├── tool_policy.py
│ ├── sandbox_isolation.py
│ ├── session_boundary.py
│ ├── config_drift.py
│ └── protocol_security.py
├── adapters/ # Agent adapters
│ ├── openclaw.py
│ ├── mock.py
│ └── base.py
├── bloom/ # Scenario generation
│ ├── ideation.py
│ ├── rollout.py
│ └── judgment.py
├── scanners/ # Config + supply-chain scanning
├── analysis/ # Drift comparison
├── codeoptix/ # CodeOptiX integration
│ ├── adapter.py # Behavior adapter
│ ├── evaluator.py # Security evaluator
│ └── engine.py # Evaluation engine
└── reporting/ # Report generation
├── html.py
├── json_report.py
└── sarif.py
Part of Superagentic AI Ecosystem
- SuperQE - Quality Engineering core
- SuperClaw - Agent security testing (this package)
- CodeOptiX - Code optimization engine
Open Source
Built by Superagentic AI · GitHub: SuperagenticAI/superclaw
License
Apache 2.0
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file superclaw-0.1.1.tar.gz.
File metadata
- Download URL: superclaw-0.1.1.tar.gz
- Upload date:
- Size: 3.2 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
cbe7d7bab9db6b905242f991dc57ac03a97392ea3d5c3b0baf29322fa0f5b85c
|
|
| MD5 |
0a8bf0769027d8bbdcaf516497a888a2
|
|
| BLAKE2b-256 |
9dda0f9e87e7426f89370413c273604e601796b35f7b5de964e596a87c0c5d3a
|
File details
Details for the file superclaw-0.1.1-py3-none-any.whl.
File metadata
- Download URL: superclaw-0.1.1-py3-none-any.whl
- Upload date:
- Size: 90.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
ab5e1c470ee46f41dcd7db6224e146b84c7402cf72d87b3253aaf5b56b1d55eb
|
|
| MD5 |
e3fd45e779603d1118e083a24a7e646d
|
|
| BLAKE2b-256 |
ba30038b7317002583083f5b0f9a0e22ded0194fe1e0b65e493f25465f7e4b31
|