Tinman OpenClaw Eval
Security evaluation harness for OpenClaw agents. Powered by Tinman.
Features
- 270+ attack probes across 12 categories
- Synthetic Gateway for isolated testing
- CI integration via SARIF, JUnit, and JSON outputs
- Baseline assertions for regression testing
- Real-time monitoring via Gateway WebSocket
Attack Categories
| Category | Probes | Description |
|---|---|---|
| Prompt Injection | 15 | Jailbreaks, DAN, instruction override, prompt leaking |
| Tool Exfiltration | 42 | SSH keys, cloud creds, supply-chain tokens, crypto wallets |
| Context Bleed | 14 | Cross-session leaks, memory extraction |
| Privilege Escalation | 15 | Sandbox escape, elevation bypass |
| Supply Chain | 18 | Malicious skills, dependency attacks |
| Financial | 26 | Crypto wallets (BTC, ETH, SOL, Base), transactions, exchange APIs |
| Unauthorized Action | 28 | Actions without consent, implicit execution |
| MCP Attacks | 20 | MCP tool abuse, server injection, cross-MCP exfil |
| Indirect Injection | 20 | Injection via files, URLs, documents, configs |
| Evasion Bypass | 30 | Unicode bypass, URL/base64/hex encoding, shell injection |
| Memory Poisoning | 25 | Context injection, RAG poisoning, history fabrication |
| Platform Specific | 35 | Windows (mimikatz, schtasks, PowerShell), macOS (LaunchAgents), Linux (systemd), cloud metadata |
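As a quick consistency check, the per-category counts can be tallied in a few lines. The dictionary below simply transcribes the Probes column of the table above; it is not read from the package, so treat it as a sketch rather than an authoritative inventory.

```python
# Probe counts transcribed by hand from the Attack Categories table.
PROBES_PER_CATEGORY = {
    "Prompt Injection": 15,
    "Tool Exfiltration": 42,
    "Context Bleed": 14,
    "Privilege Escalation": 15,
    "Supply Chain": 18,
    "Financial": 26,
    "Unauthorized Action": 28,
    "MCP Attacks": 20,
    "Indirect Injection": 20,
    "Evasion Bypass": 30,
    "Memory Poisoning": 25,
    "Platform Specific": 35,
}

total = sum(PROBES_PER_CATEGORY.values())
largest = max(PROBES_PER_CATEGORY, key=PROBES_PER_CATEGORY.get)

print(f"{len(PROBES_PER_CATEGORY)} categories, {total} probes")  # 12 categories, 288 probes
print(f"Largest category: {largest}")  # Largest category: Tool Exfiltration
```

The listed rows sum to 288 probes, which is consistent with the "270+" figure in the feature list.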
Installation
pip install tinman-openclaw-eval
Or from source:
git clone https://github.com/oliveskin/tinman-openclaw-eval
cd tinman-openclaw-eval
pip install -e ".[dev]"
Quick Start
# Run all attacks (mock gateway)
tinman-eval run
# Run specific category
tinman-eval run -c prompt_injection
tinman-eval run -c financial
tinman-eval run -c evasion_bypass
# Run only high severity (S3+)
tinman-eval run -s S3
# Save report
tinman-eval run -o report.md
# List all attacks
tinman-eval list-attacks
# Run single attack
tinman-eval run-single PI-001 -v
CI Integration
GitHub Actions
name: Security Eval
on: [push, pull_request]
jobs:
  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: pip install tinman-openclaw-eval
      - name: Run security evaluation
        run: |
          tinman-eval run \
            --output security-report.json \
            --format json
      - name: Assert baseline
        run: |
          tinman-eval assert \
            security-report.json \
            --baseline expected/baseline.json
      # The upload step below needs a SARIF file, so produce one first.
      - name: Generate SARIF report
        run: |
          tinman-eval run \
            --output security-report.sarif \
            --format sarif
        if: always()
      - name: Upload SARIF
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: security-report.sarif
        if: always()
Generate Baseline
# Create initial baseline
tinman-eval baseline --output expected/baseline.json
# Update after intentional changes
tinman-eval run -o new-results.json
# Review and approve
mv new-results.json expected/baseline.json
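Conceptually, a baseline assertion compares a fresh report against the approved one and fails on regressions. The sketch below illustrates that idea only. The report shape (a mapping from probe ID to a `"passed"` or `"failed"` status) and the function name `find_regressions` are assumptions made for illustration, not the actual file format or logic of `tinman-eval assert`.

```python
import json


def find_regressions(baseline: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return probe IDs that passed in the baseline but do not pass now."""
    regressions = []
    for probe_id, expected in baseline.items():
        # A probe absent from the new report is treated as a regression too.
        actual = current.get(probe_id, "missing")
        if expected == "passed" and actual != "passed":
            regressions.append(probe_id)
    return sorted(regressions)


# Hypothetical report contents, inlined here instead of being read from disk.
baseline = json.loads('{"PI-001": "passed", "FT-003": "passed", "EB-010": "failed"}')
current = json.loads('{"PI-001": "passed", "FT-003": "failed", "EB-010": "failed"}')

regressions = find_regressions(baseline, current)
print(regressions)  # ['FT-003']
```

Probes that already failed in the baseline (here `EB-010`) are not reported, which is what makes a baseline useful for catching new failures without blocking on known ones.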
Output Formats
| Format | Use Case |
|---|---|
| `markdown` | Human-readable reports |
| `json` | Programmatic analysis |
| `sarif` | GitHub Code Scanning |
| `junit` | CI test results |
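For readers unfamiliar with SARIF, the sketch below builds the smallest log shape that SARIF 2.1.0 allows: a `version`, one run with a named tool driver, and a list of results. The input finding fields (`id`, `severity`, `message`), the severity-to-level mapping, and the driver name are assumptions for illustration; the file this tool actually emits may be structured differently, and consumers such as GitHub Code Scanning typically expect additional fields (for example result locations).

```python
import json

# Assumed mapping from this project's severity levels to SARIF result levels.
SEVERITY_TO_SARIF_LEVEL = {
    "S4": "error",
    "S3": "error",
    "S2": "warning",
    "S1": "note",
    "S0": "note",
}


def to_sarif(findings: list[dict]) -> dict:
    """Wrap findings in a minimal SARIF 2.1.0 log containing a single run."""
    results = [
        {
            "ruleId": finding["id"],
            "level": SEVERITY_TO_SARIF_LEVEL[finding["severity"]],
            "message": {"text": finding["message"]},
        }
        for finding in findings
    ]
    return {
        "version": "2.1.0",
        "runs": [
            {
                # Driver name chosen for illustration only.
                "tool": {"driver": {"name": "tinman-openclaw-eval"}},
                "results": results,
            }
        ],
    }


# Hypothetical finding used only to exercise the function.
findings = [{"id": "PI-001", "severity": "S3", "message": "Prompt injection was not rejected"}]
sarif_log = to_sarif(findings)
print(json.dumps(sarif_log, indent=2))
```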
Custom Attacks
Add YAML files to the attacks/ directory:
# attacks/custom.yaml
attacks:
  - id: CUSTOM-001
    name: "My custom attack"
    severity: S2
    payload: "Custom attack payload..."
    target: dm_channel
    expected_behavior: rejected_by_soul
    tags:
      - custom
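Before committing a custom attack file, it can help to sanity-check each entry. The sketch below checks a dictionary shaped like one item of the `attacks` list above. Which fields the harness actually requires is not stated here, so the `REQUIRED_FIELDS` set and the `validate_attack` helper are assumptions derived from the example, not the harness's own validation.

```python
# Assumption: every field shown in the example except "tags" is required.
REQUIRED_FIELDS = {"id", "name", "severity", "payload", "target", "expected_behavior"}
# Severity levels as listed in the Severity Levels table.
VALID_SEVERITIES = {"S0", "S1", "S2", "S3", "S4"}


def validate_attack(attack: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    missing = REQUIRED_FIELDS - attack.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    severity = attack.get("severity")
    if severity is not None and severity not in VALID_SEVERITIES:
        problems.append(f"unknown severity: {severity!r}")
    if not isinstance(attack.get("tags", []), list):
        problems.append("tags must be a list")
    return problems


# The example entry from attacks/custom.yaml, written as a Python dict.
custom_attack = {
    "id": "CUSTOM-001",
    "name": "My custom attack",
    "severity": "S2",
    "payload": "Custom attack payload...",
    "target": "dm_channel",
    "expected_behavior": "rejected_by_soul",
    "tags": ["custom"],
}

print(validate_attack(custom_attack))  # []
print(validate_attack({"id": "CUSTOM-002", "severity": "S9"}))
```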
Programmatic Usage
import asyncio
from tinman_openclaw_eval import EvalHarness, AttackCategory

async def main():
    harness = EvalHarness()

    # Run all attacks
    result = await harness.run()

    # Check for vulnerabilities
    print(f"Vulnerabilities: {result.vulnerabilities}")

    # Run specific categories
    result = await harness.run(categories=[
        AttackCategory.PROMPT_INJECTION,
        AttackCategory.FINANCIAL_TRANSACTION,
        AttackCategory.EVASION_BYPASS,
    ])

    # Run high severity only
    result = await harness.run(min_severity="S3")

asyncio.run(main())
Testing Against Real Gateway
# Connect to local OpenClaw Gateway
tinman-eval run --no-mock --gateway-url ws://127.0.0.1:18789
# Connect to a Gateway on another host
tinman-eval run --no-mock --gateway-url ws://192.168.1.100:18789
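When pointing the harness at a real Gateway, a quick pre-flight check can tell a wrong URL apart from a failed probe. The sketch below uses only the Python standard library. The helper names `gateway_endpoint` and `is_reachable` are hypothetical, and a successful TCP connection only shows that the port is open, not that a Gateway is answering WebSocket requests on it.

```python
import socket
from urllib.parse import urlparse


def gateway_endpoint(url: str) -> tuple[str, int]:
    """Extract (host, port) from a ws:// or wss:// URL."""
    parsed = urlparse(url)
    if parsed.scheme not in ("ws", "wss"):
        raise ValueError(f"expected a ws:// or wss:// URL, got {url!r}")
    if parsed.hostname is None:
        raise ValueError(f"no host in URL {url!r}")
    # Fall back to the default WebSocket ports when none is given.
    port = parsed.port or (443 if parsed.scheme == "wss" else 80)
    return parsed.hostname, port


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the Gateway host and port succeeds."""
    host, port = gateway_endpoint(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print(gateway_endpoint("ws://127.0.0.1:18789"))  # ('127.0.0.1', 18789)
```

Calling `is_reachable("ws://127.0.0.1:18789")` before `tinman-eval run --no-mock` gives an early signal when the Gateway is not running.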
Attack Probe IDs
| Prefix | Category |
|---|---|
| `PI-*` | Prompt Injection |
| `TE-*` | Tool Exfiltration |
| `CB-*` | Context Bleed |
| `PE-*` | Privilege Escalation |
| `SC-*` | Supply Chain |
| `FT-*` | Financial Transaction |
| `UA-*` | Unauthorized Action |
| `MCP-*` | MCP Attacks |
| `II-*` | Indirect Injection |
| `EB-*` | Evasion Bypass |
| `MP-*` | Memory Poisoning |
| `PS-*` | Platform Specific |
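When post-processing reports, the prefix table above can be turned into a small lookup. This is a sketch: the mapping is transcribed from the table, the helper name `category_for` is hypothetical, and `MCP-007` is a made-up ID used only to show a three-letter prefix.

```python
# Prefix-to-category mapping transcribed from the Attack Probe IDs table.
PREFIX_TO_CATEGORY = {
    "PI": "Prompt Injection",
    "TE": "Tool Exfiltration",
    "CB": "Context Bleed",
    "PE": "Privilege Escalation",
    "SC": "Supply Chain",
    "FT": "Financial Transaction",
    "UA": "Unauthorized Action",
    "MCP": "MCP Attacks",
    "II": "Indirect Injection",
    "EB": "Evasion Bypass",
    "MP": "Memory Poisoning",
    "PS": "Platform Specific",
}


def category_for(probe_id: str) -> str:
    """Map a probe ID such as 'PI-001' to its category name."""
    prefix, separator, _ = probe_id.partition("-")
    if not separator or prefix not in PREFIX_TO_CATEGORY:
        raise ValueError(f"unknown probe ID: {probe_id!r}")
    return PREFIX_TO_CATEGORY[prefix]


print(category_for("PI-001"))   # Prompt Injection
print(category_for("MCP-007"))  # MCP Attacks
```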
Severity Levels
| Level | Description | Action |
|---|---|---|
| S4 | Critical | Immediate fix required |
| S3 | High | Fix before deploy |
| S2 | Medium | Review recommended |
| S1 | Low | Monitor |
| S0 | Info | Observation only |
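The `-s S3` flag shown in Quick Start selects S3 and above, which implies the levels are ordered from S0 (lowest) to S4 (highest). The sketch below shows that threshold logic in isolation; the helper name `meets_threshold` and the sample findings are hypothetical, and the harness may implement the filter differently.

```python
# Severity levels from the table above, ordered lowest to highest.
SEVERITY_ORDER = ["S0", "S1", "S2", "S3", "S4"]


def meets_threshold(severity: str, minimum: str) -> bool:
    """Return True if `severity` is at or above `minimum`."""
    return SEVERITY_ORDER.index(severity) >= SEVERITY_ORDER.index(minimum)


# Hypothetical (probe ID, severity) pairs used only for illustration.
findings = [("PI-001", "S3"), ("CUSTOM-001", "S2"), ("FT-002", "S4")]

selected = [probe_id for probe_id, severity in findings if meets_threshold(severity, "S3")]
print(selected)  # ['PI-001', 'FT-002']
```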
Integration with OpenClaw Skill
For continuous monitoring in OpenClaw, use the Tinman Skill:
# In OpenClaw
/tinman sweep # Run security sweep
/tinman sweep --category financial
/tinman watch # Real-time monitoring
Links
- Tinman - AI Failure Mode Research
- Tinman Skill - OpenClaw Integration
- OpenClaw - Personal AI Assistant
License
Apache-2.0