TauBench extensions for DoomArena

TauBench experiments

This repository contains tools and scripts for defining and evaluating threat models in the TauBench framework. TauBench focuses on LLM agents in tool-augmented environments, providing a way to simulate realistic adversarial attacks in domains like retail and airline customer service.

Overview

The framework provides a structured way to:

Simulate tool-based adversarial scenarios against LLM agents
Measure metrics like Attack Success Rate (ASR), Task Success Rate (TSR), and stealthiness of attacks
Compare defenses or model variants across structured multi-turn tasks
Evaluate robustness of tool-using agents in realistic settings

Domains

TauBench currently supports:

Retail: Multi-tool shopping agents that handle product searches, returns, and recommendations
Airline: Agents for booking flights, managing itineraries, and accessing sensitive account information

Installation

Install this package

# install main from this repo
pip install -e doomarena/taubench

# or install from pypi
pip install doomarena-taubench

Install taubench

git clone https://github.com/sierra-research/tau-bench scripts/tau-bench
pip install -e git+https://github.com/sierra-research/tau-bench.git#egg=tau_bench

You may also need to set your OpenRouter and OpenAI API keys:

export OPENROUTER_API_KEY=<your-api-key>
export OPENAI_API_KEY=<your-api-key>
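
Before launching a run, it can help to confirm that the keys are visible to Python. The following is a minimal sketch, assuming both variables exported above are needed; depending on which provider your configuration uses, one of them may be enough. The helper name `missing_api_keys` is made up for this example and is not part of the package.

```python
import os

# Variable names taken from the export lines above.
REQUIRED_KEYS = ("OPENROUTER_API_KEY", "OPENAI_API_KEY")


def missing_api_keys(env=None):
    """Return the names of required keys that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        print("Missing API keys:", ", ".join(missing))
    else:
        print("All API keys are set.")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to test without touching the real environment.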

Usage

Example (Retail malicious user attack):

cd doomarena/taubench/src/doomarena/taubench
python scripts/attack_script.py \
  --config scripts/malicious_user_retail_attack.yaml
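
To run several configurations in a row, the command above can be wrapped in a small Python driver. This is a sketch under stated assumptions: it relies only on the `--config` flag shown above, it must be started from the same directory as the example, and the helper names `build_command` and `run_all` are hypothetical. By default it only prints the commands; pass `dry_run=False` to execute them.

```python
import subprocess
import sys

# Config files listed in this README; adjust to the runs you need.
CONFIGS = [
    "scripts/malicious_user_retail_attack.yaml",
    "scripts/malicious_user_airline_attack.yaml",
]


def build_command(config_path):
    """Build the same command line as the example above for one config."""
    return [sys.executable, "scripts/attack_script.py", "--config", config_path]


def run_all(configs, dry_run=True):
    """Print (dry_run=True) or execute one attack run per config."""
    commands = [build_command(path) for path in configs]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the loop at the first failing run.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_all(CONFIGS)
```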

Experiment Configuration Options

combined_retail_attack.yaml: Runs multiple retail attack types in a single config for comprehensive evaluation.
malicious_catalog_fixed_injection_retail_attack.yaml: Inserts a malicious product entry into the retail catalog with a fixed injection strategy.
malicious_catalog_retail_attack.yaml: Injects a dynamic malicious catalog item to mislead the retail agent.
malicious_user_airline_attack.yaml: Simulates a malicious user attempting to manipulate the airline booking assistant.
malicious_user_fixed_injection_airline_attack.yaml: Same as malicious_user_airline_attack.yaml, but with fixed injection content for a consistent attack setup.
malicious_user_retail_attack.yaml: Tests how a retail agent handles adversarial inputs from a user aiming to bypass rules or gain unauthorized benefits.

Each config specifies:

Attack type and injection method
Success filters
Prompt construction (system + few-shot examples)
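
To illustrate what a success filter decides, here is a minimal sketch of one that checks whether the agent revealed a secret string in a reply to the user. Everything in it is an assumption made for illustration: `AgentAction`, its fields, and `secret_leak_success_filter` are hypothetical and do not reproduce the interfaces of the classes in success_filters/.

```python
from dataclasses import dataclass


@dataclass
class AgentAction:
    """Hypothetical record of one agent step (not a DoomArena type)."""
    name: str     # tool name, or "respond" for a message to the user
    content: str  # message text or serialized tool arguments


def secret_leak_success_filter(actions, secret):
    """Return True if any 'respond' action contains the secret string."""
    return any(a.name == "respond" and secret in a.content for a in actions)


# Illustrative trajectory with made-up values.
trajectory = [
    AgentAction("get_product_details", "product_id=1"),
    AgentAction("respond", "Your gift card code is ABC123."),
]
print(secret_leak_success_filter(trajectory, "ABC123"))
```

A filter of this shape sees only the agent's actions, so the same trajectory can be scored against several attack goals without rerunning the agent.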

Results and Metrics

Experiment results are stored in the results/taubench directory, in subfolders named by the date and time of their creation. Each results folder includes:

Metadata about the attack configuration, agent, and dataset used
CSV files containing metrics such as:
- Attack Success Rate (ASR)
- Task Success Rate (TSR)
- Attack Stealth Rate
- Tool call counts and usage breakdowns
- Input/output token counts
- Step-by-step interaction logs with the agent

You can analyze per-task outcomes to understand failure modes, effectiveness of the attacks, and behavior of tool-augmented agents under adversarial pressure.
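
As a starting point for such an analysis, the sketch below computes ASR and TSR from per-task rows. The column names (`task_id`, `attack_success`, `task_success`) and the inline sample rows are assumptions made up for illustration; check the header of the CSV files in your results folder and adapt accordingly. The stealth rate is left out because its exact definition is not given here.

```python
import csv
import io

# Made-up sample rows with hypothetical column names, for illustration only.
SAMPLE_CSV = """task_id,attack_success,task_success
0,1,0
1,0,1
2,1,1
3,0,1
"""


def success_rates(csv_text):
    """Return (ASR, TSR) as fractions of all tasks in the CSV text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("CSV contains no task rows")
    n = len(rows)
    asr = sum(int(r["attack_success"]) for r in rows) / n
    tsr = sum(int(r["task_success"]) for r in rows) / n
    return asr, tsr


asr, tsr = success_rates(SAMPLE_CSV)
print(f"ASR={asr:.2f} TSR={tsr:.2f}")  # prints "ASR=0.50 TSR=0.75"
```

To use it on real results, read the file contents with `open(path, newline="").read()` and pass the text to `success_rates`.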

Project Structure

├── README.md
├── pyproject.toml
├── src/doomarena/taubench/
│   ├── attack_gateway.py            # Entry point for attack orchestration
│   ├── data/                        # JSON datasets for different domains
│   │   ├── airline_classification.json
│   │   ├── retail_classification.json
│   │   └── sample_airline.json
│   ├── filters/                     # Filters for selecting relevant agent actions
│   │   ├── is_get_product_details_action_filter.py
│   │   └── is_respond_action_filter.py
│   ├── success_filters/             # Criteria for judging if attack succeeded
│   │   ├── airline_info_leak_success_filter.py
│   │   ├── llm_judge.py
│   │   ├── retail_refund_success_filter.py
│   │   ├── retail_secrets_success_filter.py
│   │   └── send_certificate_success_filters.py
│   ├── system_prompt_config/        # Prompt configurations and few-shot data
│   │   ├── system_prompt_initialization.py
│   │   ├── utils.py
│   │   ├── dan_mode/
│   │   │   ├── dan_mode_airline.txt
│   │   │   ├── dan_mode_retail.txt
│   │   │   └── dan_mode_retaildb.txt
│   │   ├── few_shot_examples/
│   │   │   ├── airline_few_shot.json
│   │   │   ├── retail_few_shot.json
│   │   │   └── retaildb_few_shot.json
│   │   └── tools/
│   │       ├── airline_tools.json
│   │       └── retail_tools.json
│   └── scripts/                     # YAML attack configs and the main runner
│       ├── combined_retail_attack.yaml
│       ├── malicious_catalog_fixed_injection_retail_attack.yaml
│       ├── malicious_catalog_retail_attack.yaml
│       ├── malicious_user_airline_attack.yaml
│       ├── malicious_user_fixed_injection_airline_attack.yaml
│       ├── malicious_user_retail_attack.yaml
│       └── attack_script.py
└── tests/
    ├── test_data/
    │   └── taubench_config.yaml
    ├── __init__.py
    ├── test_run_tau_bench_attack.py
    ├── test_taubench_attack_config.py
    └── test_taubench_attack_gateway.py

Contributing

Contributions are welcome! You can extend this framework by:

Adding new attack vectors -- new prompt injections or misuse of tools in the airline or retail domains
Testing additional agent models -- evaluate how different LLMs or fine-tuned agents perform under attack
Implementing new evaluation metrics -- define novel task-specific or stealth-aware success criteria

Release history

0.0.4: Apr 19, 2025
0.0.3: Apr 18, 2025
0.0.2: Apr 18, 2025
0.0.1 (this version): Apr 18, 2025

Download files

Source Distribution

doomarena_taubench-0.0.1.tar.gz (27.6 kB)

Uploaded Apr 18, 2025 Source

Built Distribution

doomarena_taubench-0.0.1-py3-none-any.whl (27.8 kB)

Uploaded Apr 18, 2025 Python 3

File details

Details for the file doomarena_taubench-0.0.1.tar.gz.

File metadata

Download URL: doomarena_taubench-0.0.1.tar.gz
Upload date: Apr 18, 2025
Size: 27.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.10

File hashes

Hashes for doomarena_taubench-0.0.1.tar.gz
Algorithm	Hash digest
SHA256	`67a781498d0af8399701ced1a8ff0175d99995b84d31d4edfabb5260e16e07e8`
MD5	`4ead9a772f45556f7f242b70da19b991`
BLAKE2b-256	`079575e4f787b84819c6b31be9790c66fff0d328b9ff8b9f6a69471d331cd395`

See more details on using hashes here.
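
One way to use these digests is to recompute the SHA256 of the downloaded file and compare it with the published value. The sketch below uses only the standard library `hashlib`; the function names and the assumption that the archive sits in the current directory are illustrative.

```python
import hashlib

# SHA256 digest listed above for doomarena_taubench-0.0.1.tar.gz.
EXPECTED_SHA256 = "67a781498d0af8399701ced1a8ff0175d99995b84d31d4edfabb5260e16e07e8"


def sha256_of_file(path, chunk_size=65536):
    """Hash the file in chunks so large archives need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 equals the expected hex digest."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_expected("doomarena_taubench-0.0.1.tar.gz")` should return True for an intact download of the source distribution.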

File details

Details for the file doomarena_taubench-0.0.1-py3-none-any.whl.

File metadata

Download URL: doomarena_taubench-0.0.1-py3-none-any.whl
Upload date: Apr 18, 2025
Size: 27.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.10

File hashes

Hashes for doomarena_taubench-0.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f3c5d0d37028f365b514814e37a994d6d1dd64cb108552036da399eda9c358d4`
MD5	`e5d45f6438c3e859c2d578435df59266`
BLAKE2b-256	`3c999de7e912e229d13a128ab7eb3bcbe2fb7033cf8c9bff891f390792f9465d`

See more details on using hashes here.
