Tinman OpenClaw Eval
Security evaluation harness for OpenClaw agents. Powered by Tinman.
Features
- 80 attack payloads across 5 categories
- Synthetic Gateway for isolated testing
- CI integration via SARIF, JUnit, and JSON outputs
- Baseline assertions for regression testing
- Supply chain attack testing for skill security
Attack Categories
| Category | Attacks | Description |
|---|---|---|
| Prompt Injection | 15 | Jailbreaks, instruction override, prompt leaking |
| Tool Exfiltration | 18 | SSH keys, credentials, network exfil |
| Context Bleed | 14 | Cross-session leaks, memory extraction |
| Privilege Escalation | 15 | Sandbox escape, elevation bypass |
| Supply Chain | 18 | Malicious skills, dependency attacks |
Installation
pip install tinman-openclaw-eval
Or from source:
git clone https://github.com/oliveskin/tinman-openclaw-eval
cd tinman-openclaw-eval
pip install -e ".[dev]"
Quick Start
# Run all attacks (mock gateway)
tinman-eval run
# Run specific category
tinman-eval run -c prompt_injection
# Run only high severity (S3+)
tinman-eval run -s S3
# Save report
tinman-eval run -o report.md
# List all attacks
tinman-eval list-attacks
# Run single attack
tinman-eval run-single PI-001 -v
CI Integration
GitHub Actions
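name: Security Eval
on: [push, pull_request]
jobs:
  security:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write  # needed by the upload-sarif action
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: pip install tinman-openclaw-eval
      - name: Run security evaluation
        run: |
          tinman-eval run \
            --output security-report.json \
            --format json
      # Produce the SARIF file uploaded in the last step
      # (assumes --format sarif, as listed under Output Formats)
      - name: Generate SARIF report
        run: |
          tinman-eval run \
            --output security-report.sarif \
            --format sarif
      - name: Assert baseline
        run: |
          tinman-eval assert \
            security-report.json \
            --baseline expected/baseline.json
      - name: Upload SARIF
        if: always()
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: security-report.sarif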
Generate Baseline
# Create initial baseline
tinman-eval baseline --output expected/baseline.json
# Update after intentional changes
tinman-eval run -o new-results.json
# Review and approve
mv new-results.json expected/baseline.json
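A baseline check can be pictured as a set comparison: any attack the baseline records as blocked but the new run does not is a regression. The sketch below illustrates only that idea. The report schema (`results`, `id`, `blocked`), the helper name `find_regressions`, and the ID `TE-004` are assumptions made for illustration, not the format `tinman-eval` actually writes.

```python
import json


def find_regressions(baseline: dict, current: dict) -> list[str]:
    """Return IDs of attacks blocked in the baseline but not blocked now.

    Assumed (hypothetical) schema:
        {"results": [{"id": str, "blocked": bool}, ...]}
    Attacks missing from the current run are reported as well.
    """
    was_blocked = {r["id"] for r in baseline["results"] if r["blocked"]}
    now_blocked = {r["id"] for r in current["results"] if r["blocked"]}
    return sorted(was_blocked - now_blocked)


# Inline sample data standing in for expected/baseline.json and new-results.json.
baseline = json.loads(
    '{"results": [{"id": "PI-001", "blocked": true},'
    ' {"id": "TE-004", "blocked": true}]}'
)
current = json.loads(
    '{"results": [{"id": "PI-001", "blocked": true},'
    ' {"id": "TE-004", "blocked": false}]}'
)

regressions = find_regressions(baseline, current)
print(regressions)  # ['TE-004']
```

Reviewing a list like this before running `mv` makes it easier to confirm that every change in the new baseline is intentional.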
Output Formats
| Format | Use Case |
|---|---|
| markdown | Human-readable reports |
| json | Programmatic analysis |
| sarif | GitHub Code Scanning |
| junit | CI test results |
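For programmatic analysis of the `sarif` output, the standard SARIF 2.1.0 layout (`runs[].results[]`, each result with an optional `level`) is enough to build a quick summary. The sketch below counts results per level. The sample document is hand-written for illustration, and how `tinman-eval` maps S0-S4 onto SARIF levels is not stated here, so treat the mapping as unknown.

```python
from collections import Counter


def summarize_sarif(sarif: dict) -> Counter:
    """Count SARIF results per level ("error", "warning", "note", "none")."""
    counts = Counter()
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # This sketch treats a missing "level" as "warning",
            # which is SARIF's fallback default.
            counts[result.get("level", "warning")] += 1
    return counts


# Hand-written sample; a real report would be loaded with json.load().
sample = {
    "version": "2.1.0",
    "runs": [
        {
            "tool": {"driver": {"name": "example-tool"}},
            "results": [
                {"ruleId": "PI-001", "level": "error",
                 "message": {"text": "example finding"}},
                {"ruleId": "CUSTOM-001",
                 "message": {"text": "example finding"}},
            ],
        }
    ],
}

print(dict(summarize_sarif(sample)))  # {'error': 1, 'warning': 1}
```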
Custom Attacks
Add YAML files to the attacks/ directory:
# attacks/custom.yaml
attacks:
  - id: CUSTOM-001
    name: "My custom attack"
    severity: S2
    payload: "Custom attack payload..."
    target: dm_channel
    expected_behavior: rejected_by_soul
    tags:
      - custom
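Before dropping a new file into attacks/, it can help to sanity-check each entry. The sketch below validates one attack definition as the dict a YAML parser would produce. Which fields the harness actually requires is an assumption here (the list simply mirrors the example above), and `validate_attack` is a hypothetical helper, not part of the package.

```python
# Assumed required fields, taken from the CUSTOM-001 example; the harness
# itself may accept fewer or more.
REQUIRED_FIELDS = ("id", "name", "severity", "payload", "target", "expected_behavior")
SEVERITIES = ("S0", "S1", "S2", "S3", "S4")


def validate_attack(attack: dict) -> list[str]:
    """Return a list of problems found in one attack definition (empty if none)."""
    problems = [f"missing field: {field}" for field in REQUIRED_FIELDS
                if field not in attack]
    severity = attack.get("severity")
    if severity is not None and severity not in SEVERITIES:
        problems.append(f"unknown severity: {severity}")
    if not isinstance(attack.get("tags", []), list):
        problems.append("tags must be a list")
    return problems


# The CUSTOM-001 entry, written out as a Python dict.
attack = {
    "id": "CUSTOM-001",
    "name": "My custom attack",
    "severity": "S2",
    "payload": "Custom attack payload...",
    "target": "dm_channel",
    "expected_behavior": "rejected_by_soul",
    "tags": ["custom"],
}

print(validate_attack(attack))  # []
```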
Programmatic Usage
import asyncio
from tinman_openclaw_eval import EvalHarness, SyntheticGateway

async def main():
    harness = EvalHarness()
    # Run all attacks
    result = await harness.run()
    # Check for vulnerabilities
    print(f"Vulnerabilities: {result.vulnerabilities}")
    # Get specific category
    payloads = harness.get_payloads_by_category("prompt_injection")
    result = await harness.run(payloads=payloads)

asyncio.run(main())
Testing Against Real Gateway
# Connect to local OpenClaw Gateway
tinman-eval run --no-mock --gateway-url ws://127.0.0.1:18789
# Connect to a Gateway on another host
tinman-eval run --no-mock --gateway-url ws://192.168.1.100:18789
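When a `--no-mock` run fails to connect, a quick check that the gateway's host and port accept TCP connections can rule out basic network problems. The sketch below uses only the Python standard library. The helper names `gateway_endpoint` and `is_reachable` are hypothetical, and an open port only shows that something is listening, not that it is an OpenClaw Gateway.

```python
import socket
from urllib.parse import urlparse


def gateway_endpoint(url: str) -> tuple[str, int]:
    """Split a ws:// or wss:// URL into (host, port)."""
    parsed = urlparse(url)
    if parsed.scheme not in ("ws", "wss") or parsed.hostname is None:
        raise ValueError(f"not a WebSocket URL: {url}")
    # Fall back to the scheme's default port when none is given.
    default_port = 443 if parsed.scheme == "wss" else 80
    return parsed.hostname, parsed.port or default_port


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the gateway's host and port succeeds."""
    host, port = gateway_endpoint(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print(gateway_endpoint("ws://127.0.0.1:18789"))  # ('127.0.0.1', 18789)
```

Calling `is_reachable("ws://127.0.0.1:18789")` before `tinman-eval run --no-mock` gives a fast yes/no answer without starting a full evaluation.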
Severity Levels
| Level | Description | Action |
|---|---|---|
| S4 | Critical | Immediate fix required |
| S3 | High | Fix before deploy |
| S2 | Medium | Review recommended |
| S1 | Low | Monitor |
| S0 | Info | Observation only |
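The levels are ordered, so a threshold such as `-s S3` can be read as "S3 and S4". The sketch below models that ordering with an `IntEnum` and filters a list of findings. The findings list, the `EXAMPLE-001` ID, and the severities assigned to each ID are illustrative assumptions, and `at_or_above` is a hypothetical helper rather than the CLI's actual implementation.

```python
from enum import IntEnum


class Severity(IntEnum):
    S0 = 0  # Info
    S1 = 1  # Low
    S2 = 2  # Medium
    S3 = 3  # High
    S4 = 4  # Critical


def at_or_above(findings: list[dict], minimum: str) -> list[dict]:
    """Keep findings whose severity is at least `minimum` (e.g. "S3")."""
    threshold = Severity[minimum]
    return [f for f in findings if Severity[f["severity"]] >= threshold]


# Illustrative findings; the severities here are made up for the example.
findings = [
    {"id": "PI-001", "severity": "S3"},
    {"id": "CUSTOM-001", "severity": "S2"},
    {"id": "EXAMPLE-001", "severity": "S4"},
]

print([f["id"] for f in at_or_above(findings, "S3")])  # ['PI-001', 'EXAMPLE-001']
```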
License
Apache-2.0