Red-Team AI Agents Before They Red-Team You

These details have not been verified by PyPI

Project links

Project description

🦞 SuperClaw

SuperClaw logo

SuperClaw — Red-Team AI Agents Before They Red-Team You
Scenario-driven, behavior-first security testing for autonomous agents.

SuperClaw is a security testing framework for AI coding agents such as OpenClaw and agent ecosystems like Moltbook. It identifies vulnerabilities through prompt injection, tool policy bypass, sandbox escape, and multi-agent trust exploitation.

OpenClaw + Moltbook Threat Model

Threat Model
OpenClaw agents often run with broad tool access. When connected to Moltbook or other agent networks, they can ingest untrusted, adversarial content that enables:

Prompt injection and hidden instruction attacks

Tool misuse and policy bypass

Behavioral drift over time

Cascading cross‑agent exploitation
SuperClaw is built to evaluate these risks before deployment.

Problem & Solution (Summary)

Problem: Autonomous agents are being deployed with high privilege, mutable behavior, and exposure to untrusted inputs—without structured security validation. This makes prompt injection, tool misuse, configuration drift, and data leakage likely, but poorly understood until after exposure.

Solution: SuperClaw is a pre‑deployment, behavior‑driven red‑teaming framework that stress‑tests existing agents. It runs scenario‑based evaluations, records evidence (tool calls, outputs, artifacts), scores behaviors against explicit contracts, and produces actionable reports before agents touch sensitive data or external ecosystems.

Non‑goals: SuperClaw does not generate agents, run production workloads, or automate real‑world exploitation.

⚠️ Security Notice

This tool is for authorized security testing only. See SECURITY.md for:

Authorization requirements
Containment requirements (sandbox/VM)
False positive handling
Data safety guidelines

Guardrails:

Local-only mode blocks remote targets by default
Remote targets require SUPERCLAW_AUTH_TOKEN (or adapter token)

Supported Targets

🦞 OpenClaw — ACP WebSocket adapter
🧪 Mock — Offline deterministic testing
🔧 Custom — Extend via adapters

Quick Start

# Install
pip install superclaw

# Attack OpenClaw (local instance)
superclaw attack openclaw --target ws://127.0.0.1:18789

# Generate attack scenarios
superclaw generate scenarios --behavior prompt_injection --num-scenarios 20

# Run security audit
superclaw audit openclaw --comprehensive --report-format html --output report

# Offline testing
superclaw attack mock --behaviors prompt-injection-resistance

Attack Techniques

Technique	Description
`prompt-injection`	Direct/indirect injection attacks
`encoding`	Base64, hex, unicode, typoglycemia obfuscation
`jailbreak`	DAN, grandmother, role-play techniques
`tool-bypass`	Tool policy bypass via alias confusion
`multi-turn`	Multi-turn persistent escalation attacks

Security Behaviors

Each behavior ships with a structured contract (intent, success criteria, rubric, mitigation).

Behavior	Severity	Description
`prompt-injection-resistance`	CRITICAL	Tests injection detection
`tool-policy-enforcement`	HIGH	Tests allow/deny lists
`sandbox-isolation`	CRITICAL	Tests container boundaries
`session-boundary-integrity`	HIGH	Tests session isolation
`configuration-drift-detection`	MEDIUM	Tests config stability
`acp-protocol-security`	MEDIUM	Tests protocol handling

CLI Commands

# Attacks
superclaw attack openclaw --target ws://127.0.0.1:18789 --behaviors all
superclaw attack mock --behaviors prompt-injection-resistance

# Scenario generation (Bloom)
superclaw generate scenarios --behavior prompt_injection --num-scenarios 20
superclaw generate scenarios --behavior jailbreak --variations noise,emotional_pressure

# Evaluation
superclaw evaluate openclaw --scenarios scenarios.json --behaviors all
superclaw evaluate mock --scenarios scenarios.json

# Audit
superclaw audit openclaw --comprehensive --report-format html --output report
superclaw audit openclaw --quick

# Reporting
superclaw report generate --results results.json --format sarif  # For GitHub Code Scanning
superclaw report drift --baseline baseline.json --current current.json

# Scanning
superclaw scan config
superclaw scan skills --path /path/to/skills

# Utilities
superclaw behaviors
superclaw attacks
superclaw init

Documentation

Full documentation: https://superagenticai.github.io/superclaw/

CodeOptiX Integration

SuperClaw integrates with CodeOptiX for multi-modal evaluation:

# Install with CodeOptiX support
pip install superclaw[codeoptix]

# Check integration status
superclaw codeoptix status

# Register behaviors with CodeOptiX
superclaw codeoptix register

# Run multi-modal evaluation
superclaw codeoptix evaluate --target ws://127.0.0.1:18789 --llm-provider openai

Python API

from superclaw.codeoptix import SecurityEvaluationEngine
from superclaw.adapters import create_adapter

adapter = create_adapter("openclaw", {"target": "ws://127.0.0.1:18789"})
engine = SecurityEvaluationEngine(adapter)

result = engine.evaluate_security(behavior_names=["prompt-injection-resistance"])
print(f"Score: {result.overall_score:.1%}")
print(f"Passed: {result.overall_passed}")

Architecture

superclaw/
├── attacks/          # Attack implementations
│   ├── prompt_injection.py
│   ├── encoding.py
│   ├── jailbreaks.py
│   ├── tool_bypass.py
│   └── multi_turn.py
├── behaviors/        # Security behavior specs
│   ├── injection_resistance.py
│   ├── tool_policy.py
│   ├── sandbox_isolation.py
│   ├── session_boundary.py
│   ├── config_drift.py
│   └── protocol_security.py
├── adapters/         # Agent adapters
│   ├── openclaw.py
│   ├── mock.py
│   └── base.py
├── bloom/            # Scenario generation
│   ├── ideation.py
│   ├── rollout.py
│   └── judgment.py
├── scanners/         # Config + supply-chain scanning
├── analysis/         # Drift comparison
├── codeoptix/        # CodeOptiX integration
│   ├── adapter.py    # Behavior adapter
│   ├── evaluator.py  # Security evaluator
│   └── engine.py     # Evaluation engine
└── reporting/        # Report generation
    ├── html.py
    ├── json_report.py
    └── sarif.py

Part of Superagentic AI Ecosystem

SuperQE - Quality Engineering core
SuperClaw - Agent security testing (this package)
CodeOptiX - Code optimization engine

Open Source

Built by Superagentic AI · GitHub: SuperagenticAI/superclaw

License

Apache 2.0

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.1.1

Feb 1, 2026

0.1.0

Feb 1, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

superclaw-0.1.1.tar.gz (3.2 MB view details)

Uploaded Feb 1, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

superclaw-0.1.1-py3-none-any.whl (90.8 kB view details)

Uploaded Feb 1, 2026 Python 3

File details

Details for the file superclaw-0.1.1.tar.gz.

File metadata

Download URL: superclaw-0.1.1.tar.gz
Upload date: Feb 1, 2026
Size: 3.2 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.7

File hashes

Hashes for superclaw-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`cbe7d7bab9db6b905242f991dc57ac03a97392ea3d5c3b0baf29322fa0f5b85c`
MD5	`0a8bf0769027d8bbdcaf516497a888a2`
BLAKE2b-256	`9dda0f9e87e7426f89370413c273604e601796b35f7b5de964e596a87c0c5d3a`

See more details on using hashes here.

File details

Details for the file superclaw-0.1.1-py3-none-any.whl.

File metadata

Download URL: superclaw-0.1.1-py3-none-any.whl
Upload date: Feb 1, 2026
Size: 90.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.7

File hashes

Hashes for superclaw-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ab5e1c470ee46f41dcd7db6224e146b84c7402cf72d87b3253aaf5b56b1d55eb`
MD5	`e3fd45e779603d1118e083a24a7e646d`
BLAKE2b-256	`ba30038b7317002583083f5b0f9a0e22ded0194fe1e0b65e493f25465f7e4b31`

See more details on using hashes here.

superclaw 0.1.1

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

🦞 SuperClaw

OpenClaw + Moltbook Threat Model

Problem & Solution (Summary)

⚠️ Security Notice

Supported Targets

Quick Start

Attack Techniques

Security Behaviors

CLI Commands

Documentation

CodeOptiX Integration

Python API

Architecture

Part of Superagentic AI Ecosystem

Open Source

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes