Security evaluation harness for OpenClaw agents - powered by Tinman

Tinman OpenClaw Eval

Security evaluation harness for OpenClaw agents. Powered by Tinman.

Features

  • 270+ attack probes across 13 categories
  • Synthetic Gateway for isolated testing
  • CI integration via SARIF, JUnit, and JSON outputs
  • Baseline assertions for regression testing
  • Real-time monitoring via Gateway WebSocket

Attack Categories

| Category | Probes | Description |
|---|---|---|
| Prompt Injection | 15 | Jailbreaks, DAN, instruction override, prompt leaking |
| Tool Exfiltration | 42 | SSH keys, cloud creds, supply-chain tokens, crypto wallets |
| Context Bleed | 14 | Cross-session leaks, memory extraction |
| Privilege Escalation | 15 | Sandbox escape, elevation bypass |
| Supply Chain | 18 | Malicious skills, dependency attacks |
| Financial | 26 | Crypto wallets (BTC, ETH, SOL, Base), transactions, exchange APIs |
| Unauthorized Action | 28 | Actions without consent, implicit execution |
| MCP Attacks | 20 | MCP tool abuse, server injection, cross-MCP exfil |
| Indirect Injection | 20 | Injection via files, URLs, documents, configs |
| Evasion Bypass | 30 | Unicode bypass, URL/base64/hex encoding, shell injection |
| Memory Poisoning | 25 | Context injection, RAG poisoning, history fabrication |
| Platform Specific | 35 | Windows (mimikatz, schtasks, PowerShell), macOS (LaunchAgents), Linux (systemd), cloud metadata |
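
As a quick sanity check on the table above, the per-category counts can be summed. This is only a sketch: the dictionary is transcribed by hand from the table, and the name `PROBE_COUNTS` is illustrative, not something the package exposes.

```python
# Probe counts copied by hand from the Attack Categories table.
# PROBE_COUNTS is an illustrative name, not part of tinman-openclaw-eval.
PROBE_COUNTS = {
    "Prompt Injection": 15,
    "Tool Exfiltration": 42,
    "Context Bleed": 14,
    "Privilege Escalation": 15,
    "Supply Chain": 18,
    "Financial": 26,
    "Unauthorized Action": 28,
    "MCP Attacks": 20,
    "Indirect Injection": 20,
    "Evasion Bypass": 30,
    "Memory Poisoning": 25,
    "Platform Specific": 35,
}

total = sum(PROBE_COUNTS.values())
largest = max(PROBE_COUNTS, key=PROBE_COUNTS.get)

print(f"{len(PROBE_COUNTS)} categories listed, {total} probes in total")
print(f"Largest category: {largest} ({PROBE_COUNTS[largest]} probes)")
```

The rows listed in the table add up to 288 probes, which is consistent with the "270+" figure under Features.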

Installation

pip install tinman-openclaw-eval

Or from source:

git clone https://github.com/oliveskin/tinman-openclaw-eval
cd tinman-openclaw-eval
pip install -e ".[dev]"

Quick Start

# Run all attacks (mock gateway)
tinman-eval run

# Run specific category
tinman-eval run -c prompt_injection
tinman-eval run -c financial
tinman-eval run -c evasion_bypass

# Run only high-severity attacks (S3 and above)
tinman-eval run -s S3

# Save report
tinman-eval run -o report.md

# List all attacks
tinman-eval list-attacks

# Run single attack
tinman-eval run-single PI-001 -v

CI Integration

GitHub Actions

name: Security Eval
on: [push, pull_request]

jobs:
  security:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write  # required by the upload-sarif action
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - run: pip install tinman-openclaw-eval

      - name: Run security evaluation
        run: |
          tinman-eval run \
            --output security-report.json \
            --format json

      # Produce the SARIF file that the upload step below expects
      - name: Generate SARIF report
        if: always()
        run: |
          tinman-eval run \
            --output security-report.sarif \
            --format sarif

      - name: Assert baseline
        run: |
          tinman-eval assert \
            security-report.json \
            --baseline expected/baseline.json

      - name: Upload SARIF
        if: always()
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: security-report.sarif

Generate Baseline

# Create initial baseline
tinman-eval baseline --output expected/baseline.json

# Update after intentional changes
tinman-eval run -o new-results.json --format json
# Review and approve
mv new-results.json expected/baseline.json
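
Conceptually, a baseline assertion is a diff between two result sets. The sketch below illustrates that idea only. The report schema is not documented here, so it assumes a hypothetical flat mapping from probe ID to an outcome string; `find_regressions` and the outcome values are invented for illustration and are not the tool's actual format or API.

```python
import json


def find_regressions(baseline: dict, current: dict) -> list:
    """Return probe IDs whose outcome changed or that are missing from the current run."""
    regressions = []
    for probe_id, expected in baseline.items():
        # A missing probe yields None, which never equals a recorded outcome.
        if current.get(probe_id) != expected:
            regressions.append(probe_id)
    return sorted(regressions)


# Hypothetical report contents; the real JSON layout may differ.
baseline = json.loads('{"PI-001": "blocked", "FT-003": "blocked"}')
current = json.loads('{"PI-001": "blocked", "FT-003": "vulnerable"}')

print(find_regressions(baseline, current))
```

Reviewing a diff like this before running `mv new-results.json expected/baseline.json` helps ensure that a new baseline only records intentional changes.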

Output Formats

| Format | Use Case |
|---|---|
| markdown | Human-readable reports |
| json | Programmatic analysis |
| sarif | GitHub Code Scanning |
| junit | CI test results |
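
For readers unfamiliar with SARIF, here is a minimal sketch of what a SARIF 2.1.0 log with one finding looks like. The mapping from S0-S4 to SARIF levels and the `to_sarif` helper are assumptions made for illustration; the report the tool actually emits may be structured differently and will likely carry more fields.

```python
import json

# Assumed mapping from S0-S4 to SARIF result levels; the mapping the tool
# really uses is not documented here.
SARIF_LEVEL = {"S4": "error", "S3": "error", "S2": "warning", "S1": "note", "S0": "note"}


def to_sarif(findings):
    """Wrap (probe_id, severity, message) tuples in a minimal SARIF 2.1.0 log."""
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [
            {
                "tool": {"driver": {"name": "tinman-openclaw-eval"}},
                "results": [
                    {"ruleId": pid, "level": SARIF_LEVEL[sev], "message": {"text": msg}}
                    for pid, sev, msg in findings
                ],
            }
        ],
    }


# Illustrative finding; the severity and message are made up.
log = to_sarif([("PI-001", "S3", "Example finding text")])
print(json.dumps(log, indent=2))
```

GitHub Code Scanning may expect further fields, such as result locations, so treat this as a shape reference rather than an upload-ready file.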

Custom Attacks

Add YAML files to the attacks/ directory:

# attacks/custom.yaml
attacks:
  - id: CUSTOM-001
    name: "My custom attack"
    severity: S2
    payload: "Custom attack payload..."
    target: dm_channel
    expected_behavior: rejected_by_soul
    tags:
      - custom
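
Before dropping a new file into attacks/, it can help to check that each entry has the fields used in the example above. The sketch below works on an already-parsed entry (a plain dict). The set of required fields is inferred from that single example and is an assumption; `validate_attack` is a hypothetical helper, and the real loader may accept more or fewer fields.

```python
# Fields inferred from the CUSTOM-001 example; assumed, not a documented schema.
REQUIRED_FIELDS = {"id", "name", "severity", "payload", "target", "expected_behavior"}
VALID_SEVERITIES = {"S0", "S1", "S2", "S3", "S4"}


def validate_attack(attack: dict) -> list:
    """Return a list of problems; an empty list means the entry looks well-formed."""
    problems = [f"missing field: {name}" for name in sorted(REQUIRED_FIELDS - attack.keys())]
    severity = attack.get("severity")
    if severity is not None and severity not in VALID_SEVERITIES:
        problems.append(f"unknown severity: {severity}")
    if not isinstance(attack.get("tags", []), list):
        problems.append("tags must be a list")
    return problems


custom = {
    "id": "CUSTOM-001",
    "name": "My custom attack",
    "severity": "S2",
    "payload": "Custom attack payload...",
    "target": "dm_channel",
    "expected_behavior": "rejected_by_soul",
    "tags": ["custom"],
}

print(validate_attack(custom))
```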

Programmatic Usage

import asyncio
from tinman_openclaw_eval import EvalHarness, AttackCategory

async def main():
    harness = EvalHarness()

    # Run all attacks
    result = await harness.run()

    # Check for vulnerabilities
    print(f"Vulnerabilities: {result.vulnerabilities}")

    # Run specific categories
    result = await harness.run(categories=[
        AttackCategory.PROMPT_INJECTION,
        AttackCategory.FINANCIAL_TRANSACTION,
        AttackCategory.EVASION_BYPASS,
    ])

    # Run high severity only
    result = await harness.run(min_severity="S3")

asyncio.run(main())

Testing Against Real Gateway

# Connect to local OpenClaw Gateway
tinman-eval run --no-mock --gateway-url ws://127.0.0.1:18789

# Connect to a Gateway on another host
tinman-eval run --no-mock --gateway-url ws://192.168.1.100:18789
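
Because `--no-mock` sends live attack probes, it may be worth checking that the target is a loopback or private address before running. The following is a sketch of such a pre-flight check using only the standard library; `is_local_gateway` is a hypothetical helper and not part of the CLI.

```python
import ipaddress
from urllib.parse import urlparse


def is_local_gateway(url: str) -> bool:
    """True if url is a ws:// or wss:// URL whose host is a loopback or private IP."""
    parsed = urlparse(url)
    if parsed.scheme not in ("ws", "wss") or parsed.hostname is None:
        return False
    try:
        addr = ipaddress.ip_address(parsed.hostname)
    except ValueError:
        # Hostnames such as "localhost" are not resolved here; treat as unknown.
        return False
    return addr.is_loopback or addr.is_private


for candidate in ("ws://127.0.0.1:18789", "ws://192.168.1.100:18789"):
    print(candidate, is_local_gateway(candidate))
```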

Attack Probe IDs

| Prefix | Category |
|---|---|
| PI-* | Prompt Injection |
| TE-* | Tool Exfiltration |
| CB-* | Context Bleed |
| PE-* | Privilege Escalation |
| SC-* | Supply Chain |
| FT-* | Financial Transaction |
| UA-* | Unauthorized Action |
| MCP-* | MCP Attacks |
| II-* | Indirect Injection |
| EB-* | Evasion Bypass |
| MP-* | Memory Poisoning |
| PS-* | Platform Specific |
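
When post-processing reports, the prefix table can be turned into a small lookup. This is a sketch under the assumption that every probe ID has the form `PREFIX-NUMBER`, as in `PI-001`; `PREFIX_TO_CATEGORY` and `category_for` are illustrative names, not part of the package.

```python
# Prefixes copied from the table above; names here are illustrative only.
PREFIX_TO_CATEGORY = {
    "PI": "Prompt Injection",
    "TE": "Tool Exfiltration",
    "CB": "Context Bleed",
    "PE": "Privilege Escalation",
    "SC": "Supply Chain",
    "FT": "Financial Transaction",
    "UA": "Unauthorized Action",
    "MCP": "MCP Attacks",
    "II": "Indirect Injection",
    "EB": "Evasion Bypass",
    "MP": "Memory Poisoning",
    "PS": "Platform Specific",
}


def category_for(probe_id: str) -> str:
    """Map a probe ID such as 'PI-001' to its category name."""
    prefix, _, number = probe_id.partition("-")
    if not number or prefix not in PREFIX_TO_CATEGORY:
        raise ValueError(f"unrecognised probe ID: {probe_id!r}")
    return PREFIX_TO_CATEGORY[prefix]


print(category_for("PI-001"))
```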

Severity Levels

| Level | Description | Action |
|---|---|---|
| S4 | Critical | Immediate fix required |
| S3 | High | Fix before deploy |
| S2 | Medium | Review recommended |
| S1 | Low | Monitor |
| S0 | Info | Observation only |
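
A minimum-severity filter such as `-s S3` amounts to an ordered comparison over these levels. The sketch below shows that comparison in isolation; `meets_min_severity` is a hypothetical helper, and the EXAMPLE-* probe IDs and their severities are made up for illustration.

```python
# Levels ordered from least to most severe, following the table above.
SEVERITY_ORDER = ["S0", "S1", "S2", "S3", "S4"]


def meets_min_severity(severity: str, minimum: str) -> bool:
    """True if severity is at or above minimum in the S0..S4 ordering."""
    return SEVERITY_ORDER.index(severity) >= SEVERITY_ORDER.index(minimum)


# CUSTOM-001 comes from the Custom Attacks example; the others are invented.
probes = [("CUSTOM-001", "S2"), ("EXAMPLE-HIGH", "S3"), ("EXAMPLE-CRIT", "S4")]
selected = [pid for pid, sev in probes if meets_min_severity(sev, "S3")]

print(selected)
```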

Integration with OpenClaw Skill

For continuous monitoring in OpenClaw, use the Tinman Skill:

# In OpenClaw
/tinman sweep                    # Run security sweep
/tinman sweep --category financial
/tinman watch                    # Real-time monitoring

License

Apache-2.0
