
Prompt Injection Workbench

A research workbench for developing and testing attacks against large language models, with a focus on prompt injection vulnerabilities and defenses.

Key Features

  • State Machine Design: Fine-grained control over agent execution for advanced attack scenarios
  • SWE-bench Support: Benchmark agents on real-world code editing tasks from SWE-bench
  • Hydra Configuration: Powerful experiment orchestration with parameter sweeps
  • Extensible Architecture: Plugin system for custom agents, attacks, and environments
  • Usage Limits: Built-in cost and resource controls
  • Experiment Tracking: Automatic caching and result organization

Quick Start

Installation

Install the core package with desired optional features:

# Full installation (all features)
uv sync --all-extras

# Or install only what you need:
uv sync --extra agentdojo      # AgentDojo benchmark support
uv sync --extra swebench       # SWE-bench support
uv sync --extra docker         # Docker sandbox manager
uv sync --extra playwright     # Web automation environment

# Combine multiple extras
uv sync --extra agentdojo --extra docker

Available optional dependencies:

Extra        Description
agentdojo    AgentDojo dataset, environment, and attacks
swebench     SWE-bench dataset for code editing benchmarks
docker       Docker sandbox manager
playwright   Web automation environment

Set up environment variables:

cp .env.example .env  # Fill in API keys

Export default configuration:

# Export to ./config (default)
uv run prompt-siren config export

# Export to custom directory
uv run prompt-siren config export ./my_config

Run experiments:

# Run benign-only evaluation
uv run prompt-siren run benign +dataset=agentdojo-workspace

# Run with attack
uv run prompt-siren run attack +dataset=agentdojo-workspace +attack=template_string

# Run SWE-bench evaluation (requires Docker)
uv run prompt-siren run benign +dataset=swebench

# Run SWE-bench with specific instances
uv run prompt-siren run benign +dataset=swebench dataset.config.instance_ids='["django__django-11179"]'

# Run SWE-bench Lite (smaller benchmark)
uv run prompt-siren run benign +dataset=swebench dataset.config.dataset_name="SWE-bench/SWE-bench_Lite"

# Override parameters
uv run prompt-siren run benign +dataset=agentdojo-workspace agent.config.model=azure:gpt-5

# Parameter sweep (multirun)
uv run prompt-siren run benign --multirun +dataset=agentdojo-workspace agent.config.model=azure:gpt-5,azure:gpt-5-nano

# Validate configuration without running
uv run prompt-siren config validate +dataset=agentdojo-workspace

# Use config file with environment/attack included (no overrides needed)
uv run prompt-siren run attack --config-dir=./my_config

Tip: Environment and attack can be specified via CLI overrides or included directly in config files. See the Configuration Guide for details.

Analyzing Results

After running experiments, use the results command to aggregate and analyze the recorded runs:

# View results with default settings (pass@1, grouped by all configs)
uv run prompt-siren results

# Specify custom output directory
uv run prompt-siren results --output-dir=./traces

# Group results by different dimensions
uv run prompt-siren results --group-by=model
uv run prompt-siren results --group-by=env
uv run prompt-siren results --group-by=agent
uv run prompt-siren results --group-by=attack

# Compute pass@k metrics (k>1)
uv run prompt-siren results --k=5
uv run prompt-siren results --k=10

# Compute multiple pass@k metrics simultaneously
uv run prompt-siren results --k=1 --k=5 --k=10

# Different output formats
uv run prompt-siren results --format=json
uv run prompt-siren results --format=csv

Understanding pass@k Metrics

  • pass@1 (default): Averages scores across all runs for each task. A continuous metric showing average performance.
  • pass@k (k>1): Binary success metric. A task "passes" if at least one of k runs achieves a perfect score (1.0). Uses statistical estimation when more than k samples are available.
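
The estimator prompt-siren itself uses is not shown here, but the description above matches the standard unbiased pass@k estimator from the code-generation evaluation literature (1 - C(n-c, k) / C(n, k)), sketched below; the function name is illustrative:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that a random size-k subset of the
    n runs contains at least one of the c perfect-score runs.
    Equals 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # too few failures: every size-k draw contains a success
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With n == k this reduces to a plain "did any run succeed" check; with n > k it averages over all size-k subsets, which is the statistical estimation mentioned above.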

Results Columns

The results table includes:

  • Configuration columns: env_type, agent_type, attack_type, model, config_hash
  • Metric columns: benign_pass@k and attack_pass@k - the pass@k scores for the benign and attack evaluations
  • Metadata columns:
    • n_tasks - Total number of tasks aggregated
    • avg_n_samples - Average number of runs per task
    • k - The k value (when computing multiple pass@k metrics)
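
To illustrate how these columns relate, a minimal pass@1 aggregation grouped by model could be sketched as follows; the record layout here is hypothetical, not the tool's actual trace schema:

```python
from collections import defaultdict

def pass_at_1_by_model(records):
    """pass@1 grouped by model: mean score per task, then mean over tasks."""
    by_task = defaultdict(lambda: defaultdict(list))
    for r in records:
        by_task[r["model"]][r["task"]].append(r["score"])
    return {
        model: sum(sum(s) / len(s) for s in tasks.values()) / len(tasks)
        for model, tasks in by_task.items()
    }

runs = [  # hypothetical per-run scores: 2 samples of t1, 1 of t2
    {"model": "azure:gpt-5", "task": "t1", "score": 1.0},
    {"model": "azure:gpt-5", "task": "t1", "score": 0.0},
    {"model": "azure:gpt-5", "task": "t2", "score": 1.0},
]
```

In this sketch, n_tasks would be 2 and avg_n_samples would be 1.5 for the single model group.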

Platform Requirements

  • Python: 3.10+
  • Package manager: uv (for development and dependency management)
  • Operating System: Linux or macOS (Windows not supported)
  • Docker: Required for SWE-bench integration and some environments
    • Must be running and accessible
    • Base images should have /bin/bash available (Alpine images need bash package)
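
The requirements above can be checked up front. This is an illustrative preflight sketch, not part of the package:

```python
import shutil
import sys

def preflight(require_docker: bool = False) -> list:
    """Return a list of unmet platform requirements (empty means ready)."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append("Python 3.10+ required")
    if sys.platform.startswith("win"):
        problems.append("Windows is not supported")
    if require_docker and shutil.which("docker") is None:
        problems.append("Docker CLI not found on PATH")
    return problems
```

Note that finding the docker binary on PATH does not guarantee the daemon is running; that still needs a live check such as running `docker info`.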

Development

# Lint and format
uv run ruff check --fix
uv run ruff format
uv run basedpyright

# Test
uv run pytest -v

License

Prompt Siren is licensed under the MIT License.

Download files

Download the file for your platform.

Source Distribution

prompt_siren-0.0.1a2.tar.gz (144.2 kB)

Built Distribution

prompt_siren-0.0.1a2-py3-none-any.whl (213.5 kB)

File details

Details for the file prompt_siren-0.0.1a2.tar.gz.

File metadata

  • Download URL: prompt_siren-0.0.1a2.tar.gz
  • Size: 144.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.9.18 (Ubuntu 24.04, CI)

File hashes

Hashes for prompt_siren-0.0.1a2.tar.gz
Algorithm    Hash digest
SHA256       3c25eca462974991bc1eab41ce3fa7fb7da558cf79805e6e936e2a3d0c1a5cf7
MD5          c9b027903d84d1a3176523afba475abf
BLAKE2b-256  c6357e31320bc38c1be0a2e284d1a23968470f4fd092ea431e185ae342970708

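
To verify a downloaded file against the published digest, any SHA-256 tool works; a minimal Python sketch:

```python
import hashlib

def sha256_hex(path, chunk_size=1 << 20):
    """Stream a file in chunks and return its SHA-256 hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Compare the returned digest against the SHA256 value in the table above before installing.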
File details

Details for the file prompt_siren-0.0.1a2-py3-none-any.whl.

File metadata

  • Download URL: prompt_siren-0.0.1a2-py3-none-any.whl
  • Size: 213.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.9.18 (Ubuntu 24.04, CI)

File hashes

Hashes for prompt_siren-0.0.1a2-py3-none-any.whl
Algorithm    Hash digest
SHA256       9cd5fbb05dca7e887fa9685080008719e7e474d62e893eaee7b6cb2c332f081b
MD5          251b5b357a80817849bcfc469a06e766
BLAKE2b-256  8cf17a996f8511d8369fc60894f1ff71226182a703b98f34c5a061edc6afc91f
