Security evaluation harness for OpenClaw agents - powered by Tinman

Tinman OpenClaw Eval

Security evaluation harness for OpenClaw agents. Powered by Tinman.

Features

  • 70+ attack payloads across 5 categories
  • Synthetic Gateway for isolated testing
  • CI integration via SARIF, JUnit, and JSON outputs
  • Baseline assertions for regression testing
  • Supply chain attack testing for skill security

Attack Categories

| Category             | Attacks | Description                                       |
|----------------------|---------|---------------------------------------------------|
| Prompt Injection     | 15      | Jailbreaks, instruction override, prompt leaking  |
| Tool Exfiltration    | 18      | SSH keys, credentials, network exfil              |
| Context Bleed        | 14      | Cross-session leaks, memory extraction            |
| Privilege Escalation | 15      | Sandbox escape, elevation bypass                  |
| Supply Chain         | 18      | Malicious skills, dependency attacks              |

Installation

pip install tinman-openclaw-eval

Or from source:

git clone https://github.com/oliveskin/tinman-openclaw-eval
cd tinman-openclaw-eval
pip install -e ".[dev]"

Quick Start

# Run all attacks (mock gateway)
tinman-eval run

# Run specific category
tinman-eval run -c prompt_injection

# Run only high and critical attacks (S3 and above)
tinman-eval run -s S3

# Save report
tinman-eval run -o report.md

# List all attacks
tinman-eval list-attacks

# Run single attack
tinman-eval run-single PI-001 -v
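If you drive the CLI from a Python script (for example, a nightly job that runs one category at a time), it can help to build the argument list in one place. The sketch below uses only the `-c`, `-s`, and `-o` flags shown above; `build_run_command` is a hypothetical helper, not part of tinman-openclaw-eval.

```python
import shlex
from typing import List, Optional


def build_run_command(
    category: Optional[str] = None,
    severity: Optional[str] = None,
    output: Optional[str] = None,
) -> List[str]:
    """Assemble a `tinman-eval run` argv list from the Quick Start flags.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    cmd = ["tinman-eval", "run"]
    if category:
        cmd += ["-c", category]  # attack category, e.g. "prompt_injection"
    if severity:
        cmd += ["-s", severity]  # minimum severity, e.g. "S3" for S3 and above
    if output:
        cmd += ["-o", output]  # report path, e.g. "report.md"
    return cmd


if __name__ == "__main__":
    cmd = build_run_command(category="prompt_injection", severity="S3", output="report.md")
    print(shlex.join(cmd))
    # To execute it (requires tinman-eval on PATH):
    #   import subprocess; subprocess.run(cmd, check=True)
```

Keeping the argv as a list (rather than a shell string) avoids quoting problems when it is later passed to `subprocess.run`.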

CI Integration

GitHub Actions

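name: Security Eval
on: [push, pull_request]

# upload-sarif needs security-events: write; checkout needs contents: read
permissions:
  contents: read
  security-events: write

jobs:
  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - run: pip install tinman-openclaw-eval

      - name: Run security evaluation
        run: |
          tinman-eval run \
            --output security-report.json \
            --format json

      # The upload step below needs a .sarif file, so produce one as well
      - name: Generate SARIF report
        if: always()
        run: |
          tinman-eval run \
            --output security-report.sarif \
            --format sarif

      - name: Assert baseline
        run: |
          tinman-eval assert \
            security-report.json \
            --baseline expected/baseline.json

      - name: Upload SARIF
        if: always()
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: security-report.sarif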
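The workflow above fails the job through `tinman-eval assert`. If you also want to gate on the SARIF file directly, here is a minimal sketch that relies only on the SARIF 2.1.0 layout (`runs[].results[].level`). How tinman-eval maps S0-S4 onto SARIF levels isn't documented here, so treating only `error` as blocking is an assumption, and `count_sarif_levels` and `should_fail` are hypothetical helpers.

```python
import json
import sys
from collections import Counter


def count_sarif_levels(sarif: dict) -> Counter:
    """Count results per SARIF level across every run in a SARIF 2.1.0 log."""
    levels = Counter()
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # SARIF treats a missing level as "warning" unless a rule default overrides it.
            levels[result.get("level", "warning")] += 1
    return levels


def should_fail(sarif: dict, blocking=("error",)) -> bool:
    """Return True if any result has one of the blocking levels (assumed: only "error")."""
    levels = count_sarif_levels(sarif)
    return any(levels[level] > 0 for level in blocking)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as fh:
        log = json.load(fh)
    print(dict(count_sarif_levels(log)))
    sys.exit(1 if should_fail(log) else 0)
```

Saved as, say, `sarif_gate.py`, it can run after the SARIF step as `python sarif_gate.py security-report.sarif` and exits non-zero when a blocking result is present.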

Generate Baseline

# Create initial baseline
tinman-eval baseline --output expected/baseline.json

# Update after intentional changes
tinman-eval run --output new-results.json --format json
# Review the new results, then promote them to the baseline
mv new-results.json expected/baseline.json
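Before promoting new results, it helps to see which attacks changed outcome. This sketch assumes a hypothetical JSON layout: a top-level `results` list whose entries carry `attack_id` and a boolean `vulnerable`. The real report schema may differ, so adjust `vulnerable_ids` to match it; all function names here are illustrative.

```python
import json
from pathlib import Path
from typing import Set, Tuple


def vulnerable_ids(report: dict) -> Set[str]:
    """Collect the IDs of attacks marked vulnerable.

    ASSUMPTION: hypothetical schema {"results": [{"attack_id": ..., "vulnerable": bool}]}.
    """
    return {r["attack_id"] for r in report.get("results", []) if r.get("vulnerable")}


def diff_reports(baseline: dict, current: dict) -> Tuple[Set[str], Set[str]]:
    """Return (newly_vulnerable, fixed) attack IDs relative to the baseline."""
    before, after = vulnerable_ids(baseline), vulnerable_ids(current)
    return after - before, before - after


def load(path: str) -> dict:
    """Read a JSON report from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

For example, `new, fixed = diff_reports(load("expected/baseline.json"), load("new-results.json"))`; a non-empty `new` points to a regression that deserves review before the `mv`.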

Output Formats

| Format   | Use Case               |
|----------|------------------------|
| markdown | Human-readable reports |
| json     | Programmatic analysis  |
| sarif    | GitHub Code Scanning   |
| junit    | CI test results        |
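For the `junit` format, most CI systems read the report themselves, but a quick local summary needs only the standard library. This assumes the common JUnit XML layout (`<testcase>` elements with optional `<failure>`, `<error>`, or `<skipped>` children); how tinman-eval maps attacks onto test cases is an assumption, and `summarize_junit` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def summarize_junit(xml_text: str) -> Dict[str, int]:
    """Tally test cases in a JUnit XML report by outcome."""
    root = ET.fromstring(xml_text)
    summary = {"passed": 0, "failed": 0, "errored": 0, "skipped": 0}
    # iter() walks the whole tree, so this works whether the root is
    # <testsuites> or a single <testsuite>.
    for case in root.iter("testcase"):
        if case.find("failure") is not None:
            summary["failed"] += 1
        elif case.find("error") is not None:
            summary["errored"] += 1
        elif case.find("skipped") is not None:
            summary["skipped"] += 1
        else:
            summary["passed"] += 1
    return summary
```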

Custom Attacks

Add YAML files to the attacks/ directory:

# attacks/custom.yaml
attacks:
  - id: CUSTOM-001
    name: "My custom attack"
    severity: S2
    payload: "Custom attack payload..."
    target: dm_channel
    expected_behavior: rejected_by_soul
    tags:
      - custom
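A malformed attack file is easier to catch before a run. The validator below works on the parsed YAML (for example, the output of PyYAML's `yaml.safe_load`). The set of required keys is inferred from the example above and the S0-S4 range from the severity table, not from tinman-eval's own schema, so treat both as assumptions; `validate_attacks` is a hypothetical helper.

```python
from typing import List

# ASSUMPTION: inferred from the example entry, not from tinman-eval's schema.
REQUIRED_KEYS = {"id", "name", "severity", "payload", "target", "expected_behavior"}
SEVERITIES = {"S0", "S1", "S2", "S3", "S4"}


def validate_attacks(doc: dict) -> List[str]:
    """Return human-readable problems found in a parsed custom-attack file."""
    problems = []
    seen_ids = set()
    for i, attack in enumerate(doc.get("attacks", [])):
        label = attack.get("id", f"entry #{i}")
        missing = REQUIRED_KEYS - set(attack)
        if missing:
            problems.append(f"{label}: missing {sorted(missing)}")
        if "severity" in attack and attack["severity"] not in SEVERITIES:
            problems.append(f"{label}: severity must be one of S0-S4")
        if "id" in attack:
            if attack["id"] in seen_ids:
                problems.append(f"{label}: duplicate id")
            seen_ids.add(attack["id"])
        if not isinstance(attack.get("tags", []), list):
            problems.append(f"{label}: tags must be a list")
    return problems
```

An empty list means the file passed these checks; it says nothing about whether tinman-eval accepts the `target` or `expected_behavior` values.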

Programmatic Usage

import asyncio
from tinman_openclaw_eval import EvalHarness, SyntheticGateway

async def main():
    harness = EvalHarness()

    # Run all attacks
    result = await harness.run()

    # Check for vulnerabilities
    print(f"Vulnerabilities: {result.vulnerabilities}")

    # Run only the prompt-injection payloads
    payloads = harness.get_payloads_by_category("prompt_injection")
    result = await harness.run(payloads=payloads)
    print(f"Prompt injection vulnerabilities: {result.vulnerabilities}")

asyncio.run(main())

Testing Against a Real Gateway

# Connect to a local OpenClaw Gateway
tinman-eval run --no-mock --gateway-url ws://127.0.0.1:18789

# Connect to a Gateway on another host
tinman-eval run --no-mock --gateway-url ws://192.168.1.100:18789
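Before a `--no-mock` run, a quick preflight can tell you whether anything is listening at the Gateway address. This sketch only checks that the URL uses `ws://` or `wss://` and that its TCP port accepts connections; it does not perform a WebSocket handshake or speak OpenClaw's protocol, and `gateway_reachable` is a hypothetical helper.

```python
import socket
from urllib.parse import urlparse


def gateway_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if the host/port in a ws:// or wss:// URL accepts TCP connections.

    Raises ValueError for non-WebSocket URLs. No WebSocket handshake is attempted.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("ws", "wss") or not parsed.hostname:
        raise ValueError(f"not a WebSocket URL: {url!r}")
    # Fall back to the scheme's standard port if the URL omits one.
    port = parsed.port or (443 if parsed.scheme == "wss" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

For example, `gateway_reachable("ws://127.0.0.1:18789")` returns `False` when nothing is listening on the port used in the examples above.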

Severity Levels

| Level | Severity | Action                 |
|-------|----------|------------------------|
| S4    | Critical | Immediate fix required |
| S3    | High     | Fix before deploy      |
| S2    | Medium   | Review recommended     |
| S1    | Low      | Monitor                |
| S0    | Info     | Observation only       |
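The table implies an ordering in which S3 and above block a deploy, which matches how `-s S3` selects "S3+" attacks. Here is a sketch of that gating rule; the names (`SEVERITY_ACTIONS`, `rank`, `blocks_deploy`) are hypothetical and not part of the package.

```python
from typing import List

# Severity level -> (name, recommended action), copied from the table above.
SEVERITY_ACTIONS = {
    "S4": ("Critical", "Immediate fix required"),
    "S3": ("High", "Fix before deploy"),
    "S2": ("Medium", "Review recommended"),
    "S1": ("Low", "Monitor"),
    "S0": ("Info", "Observation only"),
}


def rank(level: str) -> int:
    """Map 'S0'..'S4' to 0..4, rejecting unknown labels."""
    if level not in SEVERITY_ACTIONS:
        raise ValueError(f"unknown severity: {level!r}")
    return int(level[1:])


def blocks_deploy(findings: List[str], threshold: str = "S3") -> bool:
    """True if any finding is at or above the threshold (default: S3 and above)."""
    return any(rank(level) >= rank(threshold) for level in findings)
```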

License

Apache-2.0

Download files

Source Distribution

tinman_openclaw_eval-0.2.0.tar.gz (47.2 kB)

Built Distribution

tinman_openclaw_eval-0.2.0-py3-none-any.whl (56.4 kB, Python 3)

File details

Details for the file tinman_openclaw_eval-0.2.0.tar.gz.

File metadata

  • Download URL: tinman_openclaw_eval-0.2.0.tar.gz
  • Size: 47.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.6

File hashes

Hashes for tinman_openclaw_eval-0.2.0.tar.gz
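| Algorithm   | Hash digest                                                      |
|-------------|------------------------------------------------------------------|
| SHA256      | 9c683a164d1bc61b0e91c2e5a4330b3797cee9f4a7a43cfd74de1f1b7a863c16 |
| MD5         | 55cb8b198119ac0ea5c63048a7c07381                                 |
| BLAKE2b-256 | 77052a97acf7fcb4886fd3e2eaa0e7eb302ff57377aeab752e1109924e5994bd |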

File details

Details for the file tinman_openclaw_eval-0.2.0-py3-none-any.whl.

File hashes

Hashes for tinman_openclaw_eval-0.2.0-py3-none-any.whl
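| Algorithm   | Hash digest                                                      |
|-------------|------------------------------------------------------------------|
| SHA256      | b4dc1c372906b6d088dc12dfcb519f57b4875b97ce1ba884788e905456285dd0 |
| MD5         | 0b248a64b5b4ed4df1a58b769f5c6a58                                 |
| BLAKE2b-256 | f85d1ed6e529e6e3abf52f6d7e8d8e0060002a930918524a9333d98962cb8dc2 |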
