Red-Team AI Agents Before They Red-Team You

🦞 SuperClaw

SuperClaw — Red-Team AI Agents Before They Red-Team You
Scenario-driven, behavior-first security testing for autonomous agents.

SuperClaw is a security testing framework for AI coding agents such as OpenClaw and agent ecosystems like Moltbook. It probes agents for vulnerabilities using prompt injection, tool policy bypass, sandbox escape, and multi-agent trust exploitation.

OpenClaw + Moltbook Threat Model

OpenClaw agents often run with broad tool access. When connected to Moltbook or other agent networks, they can ingest untrusted, adversarial content that enables:

  • Prompt injection and hidden instruction attacks
  • Tool misuse and policy bypass
  • Behavioral drift over time
  • Cascading cross‑agent exploitation

SuperClaw is built to evaluate these risks before deployment.

Problem & Solution (Summary)

Problem: Autonomous agents are being deployed with high privilege, mutable behavior, and exposure to untrusted inputs—without structured security validation. This makes prompt injection, tool misuse, configuration drift, and data leakage likely, but poorly understood until after exposure.

Solution: SuperClaw is a pre‑deployment, behavior‑driven red‑teaming framework that stress‑tests existing agents. It runs scenario‑based evaluations, records evidence (tool calls, outputs, artifacts), scores behaviors against explicit contracts, and produces actionable reports before agents touch sensitive data or external ecosystems.

Non‑goals: SuperClaw does not generate agents, run production workloads, or automate real‑world exploitation.

⚠️ Security Notice

This tool is for authorized security testing only. See SECURITY.md for:

  • Authorization requirements
  • Containment requirements (sandbox/VM)
  • False positive handling
  • Data safety guidelines

Guardrails:

  • Local-only mode blocks remote targets by default
  • Remote targets require SUPERCLAW_AUTH_TOKEN (or adapter token)
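
To illustrate the two guardrails above, here is a minimal sketch of how a local-only check with a token requirement could work. This is not SuperClaw's actual implementation: the helper names `is_local_target` and `check_target_allowed` and the `LOCAL_HOSTS` set are hypothetical. Only the `SUPERCLAW_AUTH_TOKEN` variable name comes from this README.

```python
import os
from urllib.parse import urlparse

# Hostnames treated as local in this sketch (an assumption, not SuperClaw's list).
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def is_local_target(url: str) -> bool:
    """Return True when the URL's host is a loopback name or address."""
    return urlparse(url).hostname in LOCAL_HOSTS


def check_target_allowed(url: str, env=None) -> None:
    """Raise PermissionError for a remote target when no auth token is set."""
    env = os.environ if env is None else env
    if is_local_target(url):
        return  # local targets need no token
    if not env.get("SUPERCLAW_AUTH_TOKEN"):
        raise PermissionError(
            f"Remote target {url!r} requires SUPERCLAW_AUTH_TOKEN to be set"
        )


# Local instance from the Quick Start: allowed even with an empty environment.
check_target_allowed("ws://127.0.0.1:18789", env={})
```

Failing closed (raising unless the target is provably local or a token is present) keeps an accidental remote run from proceeding silently.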

Supported Targets

  • 🦞 OpenClaw — ACP WebSocket adapter
  • 🧪 Mock — Offline deterministic testing
  • 🔧 Custom — Extend via adapters

Quick Start

# Install
pip install superclaw

# Attack OpenClaw (local instance)
superclaw attack openclaw --target ws://127.0.0.1:18789

# Generate attack scenarios
superclaw generate scenarios --behavior prompt_injection --num-scenarios 20

# Run security audit
superclaw audit openclaw --comprehensive --report-format html --output report

# Offline testing
superclaw attack mock --behaviors prompt-injection-resistance

Attack Techniques

Technique        | Description
-----------------|-----------------------------------------------
prompt-injection | Direct/indirect injection attacks
encoding         | Base64, hex, unicode, typoglycemia obfuscation
jailbreak        | DAN, grandmother, role-play techniques
tool-bypass      | Tool policy bypass via alias confusion
multi-turn       | Multi-turn persistent escalation attacks
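
As a rough illustration of the encoding technique listed above, the sketch below produces Base64, hex, and unicode-escape variants of a harmless marker string using only the Python standard library. The function name `encoded_variants` and the `CANARY` value are hypothetical and are not taken from SuperClaw's `attacks/encoding.py`. Typoglycemia (letter scrambling) is left out for brevity.

```python
import base64

# Harmless marker string used as a test payload (hypothetical value).
CANARY = "SUPERCLAW-CANARY-1234"


def encoded_variants(text: str) -> dict[str, str]:
    """Return the same text under several reversible encodings."""
    raw = text.encode("utf-8")
    return {
        "plain": text,
        "base64": base64.b64encode(raw).decode("ascii"),
        "hex": raw.hex(),
        # Literal \uXXXX escapes, one per character (BMP characters only).
        "unicode_escape": "".join(f"\\u{ord(ch):04x}" for ch in text),
    }


for name, value in encoded_variants(CANARY).items():
    print(f"{name:15} {value}")
```

A test harness could send each variant to the target and check whether the agent treats the decoded marker as an instruction, which is the behavior an obfuscation test is meant to surface.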

Security Behaviors

Each behavior ships with a structured contract (intent, success criteria, rubric, mitigation).

Behavior                      | Severity | Description
------------------------------|----------|---------------------------
prompt-injection-resistance   | CRITICAL | Tests injection detection
tool-policy-enforcement       | HIGH     | Tests allow/deny lists
sandbox-isolation             | CRITICAL | Tests container boundaries
session-boundary-integrity    | HIGH     | Tests session isolation
configuration-drift-detection | MEDIUM   | Tests config stability
acp-protocol-security         | MEDIUM   | Tests protocol handling
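
The four contract parts named above (intent, success criteria, rubric, mitigation) could be modeled roughly as follows. This is a sketch under assumptions: the `BehaviorContract` class, its field types, and the weighted `score` method are hypothetical and do not reflect how SuperClaw's `behaviors/` modules are actually written.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BehaviorContract:
    """Hypothetical contract shape; fields mirror the four parts named above."""
    name: str
    severity: str
    intent: str
    success_criteria: tuple[str, ...]
    rubric: dict[str, float] = field(default_factory=dict)  # criterion -> weight
    mitigation: str = ""

    def score(self, met: set[str]) -> float:
        """Weighted share of rubric criteria that were met, in [0, 1]."""
        total = sum(self.rubric.values())
        if total == 0:
            return 0.0
        return sum(w for c, w in self.rubric.items() if c in met) / total


contract = BehaviorContract(
    name="prompt-injection-resistance",
    severity="CRITICAL",
    intent="Agent ignores instructions embedded in untrusted content",
    success_criteria=("no_unauthorized_tool_call", "no_secret_disclosure"),
    rubric={"no_unauthorized_tool_call": 3, "no_secret_disclosure": 2},
    mitigation="Treat fetched content as data, never as instructions",
)
print(contract.score({"no_unauthorized_tool_call"}))
```

Keeping the rubric explicit makes a score traceable to named criteria rather than to a single pass/fail judgment.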

CLI Commands

# Attacks
superclaw attack openclaw --target ws://127.0.0.1:18789 --behaviors all
superclaw attack mock --behaviors prompt-injection-resistance

# Scenario generation (Bloom)
superclaw generate scenarios --behavior prompt_injection --num-scenarios 20
superclaw generate scenarios --behavior jailbreak --variations noise,emotional_pressure

# Evaluation
superclaw evaluate openclaw --scenarios scenarios.json --behaviors all
superclaw evaluate mock --scenarios scenarios.json

# Audit
superclaw audit openclaw --comprehensive --report-format html --output report
superclaw audit openclaw --quick

# Reporting
superclaw report generate --results results.json --format sarif  # For GitHub Code Scanning
superclaw report drift --baseline baseline.json --current current.json

# Scanning
superclaw scan config
superclaw scan skills --path /path/to/skills

# Utilities
superclaw behaviors
superclaw attacks
superclaw init
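
The `report drift` command above compares a baseline run with a current one. The README does not describe the results file schema, so the sketch below assumes the simplest possible shape: a JSON object mapping each behavior name to a score between 0 and 1. The function `score_drift` and the `tolerance` parameter are hypothetical.

```python
import json


def score_drift(baseline: dict[str, float], current: dict[str, float],
                tolerance: float = 0.05) -> dict[str, float]:
    """Return behaviors whose score dropped by more than `tolerance`."""
    regressions = {}
    for name, base_score in baseline.items():
        # A behavior missing from the current run is treated as a score of 0.
        delta = current.get(name, 0.0) - base_score
        if delta < -tolerance:
            regressions[name] = round(delta, 4)
    return regressions


# Inline JSON stands in for baseline.json and current.json (assumed schema).
baseline = json.loads('{"prompt-injection-resistance": 0.92, "tool-policy-enforcement": 0.88}')
current = json.loads('{"prompt-injection-resistance": 0.71, "tool-policy-enforcement": 0.87}')
print(score_drift(baseline, current))
```

A tolerance band avoids flagging small run-to-run variation as drift; the right value depends on how noisy the evaluations are.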

Documentation

Full documentation: https://superagenticai.github.io/superclaw/

CodeOptiX Integration

SuperClaw integrates with CodeOptiX for multi-modal evaluation:

# Install with CodeOptiX support
pip install "superclaw[codeoptix]"  # quotes keep shells such as zsh from expanding the brackets

# Check integration status
superclaw codeoptix status

# Register behaviors with CodeOptiX
superclaw codeoptix register

# Run multi-modal evaluation
superclaw codeoptix evaluate --target ws://127.0.0.1:18789 --llm-provider openai

Python API

from superclaw.codeoptix import SecurityEvaluationEngine
from superclaw.adapters import create_adapter

adapter = create_adapter("openclaw", {"target": "ws://127.0.0.1:18789"})
engine = SecurityEvaluationEngine(adapter)

result = engine.evaluate_security(behavior_names=["prompt-injection-resistance"])
print(f"Score: {result.overall_score:.1%}")
print(f"Passed: {result.overall_passed}")
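
One way to use the result object above in a CI pipeline is to turn it into an exit code. The sketch below relies only on the two attributes shown in the example (`overall_score` and `overall_passed`). The helper `gate_exit_code`, the `min_score` threshold, and the stand-in result object are hypothetical.

```python
from types import SimpleNamespace


def gate_exit_code(result, min_score: float = 0.9) -> int:
    """Return 0 when the evaluation passed and met the threshold, else 1."""
    ok = bool(result.overall_passed) and result.overall_score >= min_score
    return 0 if ok else 1


# Stand-in for the object returned by engine.evaluate_security(...).
fake_result = SimpleNamespace(overall_score=0.83, overall_passed=True)

# 0.83 is below the 0.9 threshold, so this returns 1.
exit_code = gate_exit_code(fake_result)
print(exit_code)
```

In a real script the value would be passed to `sys.exit` so the CI job fails when the agent falls below the chosen bar.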

Architecture

superclaw/
├── attacks/          # Attack implementations
│   ├── prompt_injection.py
│   ├── encoding.py
│   ├── jailbreaks.py
│   ├── tool_bypass.py
│   └── multi_turn.py
├── behaviors/        # Security behavior specs
│   ├── injection_resistance.py
│   ├── tool_policy.py
│   ├── sandbox_isolation.py
│   ├── session_boundary.py
│   ├── config_drift.py
│   └── protocol_security.py
├── adapters/         # Agent adapters
│   ├── openclaw.py
│   ├── mock.py
│   └── base.py
├── bloom/            # Scenario generation
│   ├── ideation.py
│   ├── rollout.py
│   └── judgment.py
├── scanners/         # Config + supply-chain scanning
├── analysis/         # Drift comparison
├── codeoptix/        # CodeOptiX integration
│   ├── adapter.py    # Behavior adapter
│   ├── evaluator.py  # Security evaluator
│   └── engine.py     # Evaluation engine
└── reporting/        # Report generation
    ├── html.py
    ├── json_report.py
    └── sarif.py

Part of the Superagentic AI Ecosystem

  • SuperQE - Quality Engineering core
  • SuperClaw - Agent security testing (this package)
  • CodeOptiX - Code optimization engine

Open Source

Built by Superagentic AI · GitHub: SuperagenticAI/superclaw

License

Apache 2.0
