DecodingTrust Agent Platform (DTap) — A controllable and interactive red-teaming platform for AI agents

Maintainers

antiquality BillChan226


DecodingTrust-Agent Platform (DTAP)

A Controllable and Interactive Red-Teaming Platform for AI Agents

14 real-world domains · 50+ simulation environments · autonomous red-teaming · verifiable judges

About

DTAP is the first controllable and interactive red-teaming platform for AI agents, spanning 14 real-world domains and 50+ simulation environments that replicate widely-used systems such as Google Workspace, PayPal, and Slack. To scale the risk assessment of agents, we further propose DTAP-RED, the first autonomous red-teaming agent that systematically explores diverse injection vectors (prompt, tool, skill, environment, and combinations) and autonomously discovers effective attack strategies tailored to varying malicious goals. Using DTAP-RED, we curate DTAP-BENCH, a large-scale red-teaming dataset of high-quality instances across domains, each paired with a verifiable judge that automatically validates attack outcomes against the corresponding security policy.

Zhaorun Chen, Xun Liu, Haibo Tong, Chengquan Guo, Yuzhou Nie, Jiawei Zhang, Mintong Kang, Chejian Xu, Qichang Liu, Xiaogeng Liu, Tianneng Shi, Chaowei Xiao, Sanmi Koyejo, Percy Liang, Wenbo Guo, Dawn Song, Bo Li

Virtue AI · University of Chicago · UIUC · UCSB · Johns Hopkins · UC Berkeley · Stanford

Website: https://decodingtrust-agent.com

Installation
Quick Start
Running the Default Benchmark
Customizing Your Own Task List
eval/evaluation.py CLI Reference
Supported Domains
Integrating a Custom Agent
Tool-Call Hooks
Citation

Installation

git clone https://github.com/AI-secure/DecodingTrust-Agent.git
cd DecodingTrust-Agent
pip install -r requirements.txt
pip install -e .

Set the API key for your backbone model (only the providers you actually use are required):

export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
export GOOGLE_API_KEY=...

Docker is required: each task spins up isolated MCP servers and Docker-based environments through TaskExecutor.
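Before a long run, a quick pre-flight check can catch a missing Docker binary or API key. The sketch below is a hypothetical helper: `preflight` and the provider names are assumptions and are not part of DTAP. It only confirms that the `docker` executable is on `PATH` and that the relevant variables are non-empty. It does not check that the Docker daemon is running or that the keys are valid.

```python
import os
import shutil

# Hypothetical provider names mapped to the environment variables listed above.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
}


def preflight(providers, env=None):
    """Return a list of problems found; an empty list means the basics are in place.

    Only the providers passed in are checked, since unused keys are optional.
    """
    env = os.environ if env is None else env
    problems = []
    # Checks only that the docker CLI is on PATH, not that the daemon is running.
    if shutil.which("docker") is None:
        problems.append("docker executable not found on PATH")
    for provider in providers:
        var = PROVIDER_KEYS[provider]
        if not env.get(var):
            problems.append(f"{var} is not set")
    return problems


if __name__ == "__main__":
    issues = preflight(["openai"])
    print("ready" if not issues else "\n".join(issues))
```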

Quick Start

Run the benign CRM task list with the OpenAI Agents SDK backbone:

python eval/evaluation.py \
  --task-list benchmark/crm/benign.jsonl \
  --agent-type openaisdk \
  --model gpt-4o \
  --max-parallel 4

Results are written to results/benchmark/<agent_type>/<model>/<domain>/<type>/<task_id>/. Set EVAL_RESULTS_ROOT to use a different root directory in place of results/.
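The output path pattern quoted above can be computed with a few lines of Python. `result_dir` is a hypothetical helper, not a DTAP function. It covers only the pattern as written here. The environment-variable table further down describes the tail as `{dataset_path}`, so malicious tasks may land in a deeper directory than this sketch produces.

```python
import os
from pathlib import Path


def result_dir(agent_type, model, domain, task_type, task_id, env=None):
    """Build {root}/benchmark/<agent_type>/<model>/<domain>/<type>/<task_id>.

    root defaults to "results" and is replaced by EVAL_RESULTS_ROOT when set.
    """
    env = os.environ if env is None else env
    root = Path(env.get("EVAL_RESULTS_ROOT", "results"))
    return root / "benchmark" / agent_type / model / domain / task_type / str(task_id)
```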

Running the Default Benchmark

DTAP-BENCH ships pre-curated task lists under benchmark/. Each domain has three files:

File	Threat model	What it contains
`benign.jsonl`	—	Benign workflow tasks (utility evaluation)
`direct.jsonl`	Direct prompt injection	Adversary controls the user prompt
`indirect.jsonl`	Indirect (tool / env / data)	Payload arrives via tool output, document, email, etc.

Each line in these files is a single task record, e.g.:

{"domain": "crm", "type": "benign", "task_id": "1"}
{"domain": "crm", "type": "malicious", "threat_model": "direct",   "risk_category": "dangerous-actions", "task_id": "1"}
{"domain": "crm", "type": "malicious", "threat_model": "indirect", "risk_category": "dangerous-actions", "task_id": "1"}

These records resolve to:

Benign → dataset/<domain>/benign/<task_id>/
Malicious → dataset/<domain>/malicious/<threat_model>/<risk_category>/<task_id>/
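The two resolution rules above translate directly into code. `resolve_task_dir` is a hypothetical name used for illustration. The path logic follows the rules exactly as stated and assumes the default `dataset/` root.

```python
from pathlib import Path


def resolve_task_dir(record, dataset_root="dataset"):
    """Map one task-list record to its task directory.

    Benign    -> dataset/<domain>/benign/<task_id>/
    Malicious -> dataset/<domain>/malicious/<threat_model>/<risk_category>/<task_id>/
    """
    base = Path(dataset_root) / record["domain"]
    if record["type"] == "benign":
        return base / "benign" / str(record["task_id"])
    if record["type"] == "malicious":
        return (base / "malicious" / record["threat_model"]
                / record["risk_category"] / str(record["task_id"]))
    raise ValueError(f"unknown task type: {record['type']!r}")
```

Calling `.is_dir()` on the returned path is a cheap way to spot a record whose task directory does not exist before starting any containers.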

A single domain

Run every benign + direct + indirect task in a domain by pointing --task-list at the domain folder:

python eval/evaluation.py \
  --task-list benchmark/finance \
  --agent-type openaisdk \
  --model gpt-4o \
  --max-parallel 8

eval/evaluation.py auto-discovers every *.jsonl under that directory and runs them in one pass with environment reuse.
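The discovery step can be pictured as follows. This is a minimal sketch, not the code in eval/evaluation.py. It assumes the search is recursive, which is consistent with pointing --task-list at the top-level benchmark/ directory, where the JSONL files sit in per-domain subfolders. Environment reuse is not modeled.

```python
import json
from pathlib import Path


def load_task_list(path):
    """Load records from one JSONL file, or from every *.jsonl below a directory."""
    path = Path(path)
    # Recursive, sorted discovery so the run order is deterministic.
    files = sorted(path.rglob("*.jsonl")) if path.is_dir() else [path]
    records = []
    for file in files:
        with file.open(encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if line:  # tolerate blank lines
                    records.append(json.loads(line))
    return records
```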

A single task list (benign / direct / indirect)

# Benign utility only
python eval/evaluation.py --task-list benchmark/crm/benign.jsonl --agent-type openaisdk --model gpt-4o

# Direct prompt injection only
python eval/evaluation.py --task-list benchmark/crm/direct.jsonl --agent-type claudesdk --model claude-sonnet-4-20250514

# Indirect injection only
python eval/evaluation.py --task-list benchmark/crm/indirect.jsonl --agent-type googleadk --model gemini-2.0-flash

You can further narrow execution with filters that operate on the loaded records:

python eval/evaluation.py \
  --task-list benchmark/crm \
  --task-type malicious \
  --threat-model indirect \
  --risk-category data-exfiltration \
  --agent-type openaisdk --model gpt-4o
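The filters can be read as "keep a record only if it matches every filter that was given". The sketch below is a hypothetical reimplementation of that reading, not DTAP's code. One consequence of this assumption is that benign records have no `threat_model` or `risk_category`, so any filter on those fields drops them.

```python
def filter_records(records, domain=None, task_type=None,
                   threat_model=None, risk_category=None):
    """Keep records that match every filter that is not None."""
    wanted = {
        "domain": domain,
        "type": task_type,  # the CLI's --task-type matches the "type" field
        "threat_model": threat_model,
        "risk_category": risk_category,
    }
    return [
        rec for rec in records
        if all(value is None or rec.get(field) == value
               for field, value in wanted.items())
    ]
```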

The entire benchmark

Point --task-list at the top-level benchmark/ directory to run all 14 domains:

python eval/evaluation.py \
  --task-list benchmark \
  --agent-type openaisdk \
  --model gpt-4o \
  --max-parallel 16 \
  --skip-existing

--skip-existing skips any task that already has a judge_result.json, so the same command resumes a partially-finished run.
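To see what a resumed run would still have to do, you can partition result directories by the same condition. This is a hypothetical helper that relies only on the documented rule: a task counts as finished when its directory already contains judge_result.json.

```python
from pathlib import Path


def split_by_judge_result(result_dirs):
    """Partition task result directories into (done, pending) lists of Paths."""
    done, pending = [], []
    for d in map(Path, result_dirs):
        if (d / "judge_result.json").is_file():
            done.append(d)
        else:
            pending.append(d)
    return done, pending
```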

Customizing Your Own Task List

Any JSONL file with the schema below is a valid --task-list. Pick a subset of tasks, mix domains, or focus on one risk category — there is no separate registration step.

Schema:

Field	Required	Description
`domain`	always	One of the supported domains (see below)
`type`	always	`"benign"` or `"malicious"`
`task_id`	always	Identifier; matches the directory name under `dataset/…/<task_id>/`
`threat_model`	malicious only	`"direct"` or `"indirect"`
`risk_category`	malicious only	e.g. `"data-exfiltration"`, `"dangerous-actions"`, `"social-scoring"`, `"manipulative-use"`
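A hand-written task list can be checked against this schema before launching a run. `validate_record` is a hypothetical helper. Its domain set is copied from the Supported Domains list. Because the schema gives only examples of `risk_category` values, the sketch checks that a category is present, not that it belongs to a fixed set.

```python
SUPPORTED_DOMAINS = {
    "browser", "code", "crm", "customer-service", "finance", "legal", "macos",
    "medical", "os-filesystem", "research", "telecom", "travel", "windows",
    "workflow",
}


def validate_record(record):
    """Return a list of schema problems for one record (empty if it looks valid)."""
    problems = []
    for field in ("domain", "type", "task_id"):
        if field not in record:
            problems.append(f"missing required field {field!r}")
    if "domain" in record and record["domain"] not in SUPPORTED_DOMAINS:
        problems.append(f"unsupported domain {record['domain']!r}")
    task_type = record.get("type")
    if "type" in record and task_type not in ("benign", "malicious"):
        problems.append(f"type must be 'benign' or 'malicious', got {task_type!r}")
    if task_type == "malicious":
        if record.get("threat_model") not in ("direct", "indirect"):
            problems.append("malicious records need threat_model 'direct' or 'indirect'")
        if not record.get("risk_category"):
            problems.append("malicious records need a risk_category")
    return problems
```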

Example — my_subset.jsonl (mix of domains and risk categories, useful for smoke tests):

{"domain": "crm",     "type": "benign",    "task_id": "1"}
{"domain": "finance", "type": "malicious", "threat_model": "direct",   "risk_category": "dangerous-actions",    "task_id": "5"}
{"domain": "workflow", "type": "malicious", "threat_model": "indirect", "risk_category": "data-exfiltration",    "task_id": "12"}
{"domain": "legal",   "type": "malicious", "threat_model": "indirect", "risk_category": "manipulative-use",     "task_id": "3"}

Run it like any built-in task list:

python eval/evaluation.py --task-list my_subset.jsonl --agent-type openaisdk --model gpt-4o --max-parallel 4

A few practical patterns:

# Curate from an existing file
grep '"risk_category": "data-exfiltration"' benchmark/crm/indirect.jsonl > my_crm_exfil.jsonl
python eval/evaluation.py --task-list my_crm_exfil.jsonl --agent-type openaisdk --model gpt-4o

# Try just one task end-to-end
echo '{"domain": "crm", "type": "benign", "task_id": "1"}' > one_task.jsonl
python eval/evaluation.py --task-list one_task.jsonl --agent-type openaisdk --model gpt-4o
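The grep pattern above matches only when the source file uses exactly that spacing around the colon. As a sketch, parsing each line as JSON avoids that dependency. `write_subset` is a hypothetical helper and not part of DTAP.

```python
import json
from pathlib import Path


def write_subset(src_files, dst, **criteria):
    """Write records whose fields equal every value in `criteria` to a new JSONL file.

    Returns the number of records written.
    """
    count = 0
    with Path(dst).open("w", encoding="utf-8") as out:
        for src in src_files:
            with Path(src).open(encoding="utf-8") as fh:
                for line in fh:
                    if not line.strip():
                        continue  # skip blank lines
                    record = json.loads(line)
                    if all(record.get(k) == v for k, v in criteria.items()):
                        out.write(json.dumps(record) + "\n")
                        count += 1
    return count
```

For example, `write_subset(["benchmark/crm/indirect.jsonl"], "my_crm_exfil.jsonl", risk_category="data-exfiltration")` mirrors the grep pattern, and the resulting file is passed to --task-list as before.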

For per-task internals (config.yaml, judge.py, setup.sh), see docs/quickstart.md.

`eval/evaluation.py` CLI Reference

Argument	Default	Description
`--task-list`	required	Path to a JSONL file or a directory containing JSONL files
`--domain`	`None`	Keep only records whose `domain` matches
`--task-type`	`None`	`benign` or `malicious`
`--threat-model`	`None`	`direct` or `indirect`
`--risk-category`	`None`	e.g. `data-exfiltration`
`--max-parallel`	`5`	Concurrent tasks (environments are reused across tasks)
`--agent-type`	`openaisdk`	`openaisdk`, `claudesdk`, `googleadk`, `langchain`, `pocketflow`, `openclaw`
`--model`	`gpt-4o`	Backbone model identifier
`--temperature`	`None`	Sampling temperature (model default if unset)
`--port-range`	`None`	Dynamic MCP port range, e.g. `"10000-12000"`
`--direct-prompt`	off	For the direct threat model, use the malicious goal as-is instead of replaying the attack turns
`--disallowed-tools`	`None`	Native tools to disable (filesystem tasks have sensible defaults)
`--skip-existing`	off	Skip tasks that already have `judge_result.json`
`--skip-mcp`	off	Skip MCP server startup (debug)
`--skip-judge`	off	Skip judge after task completion (debug)
`--keep-envs`	off	Keep Docker envs running after the run (debug)
`--debug`	off	Save extra trajectory diagnostics

Relevant environment variables:

Variable	Description
`OPENAI_API_KEY` / `ANTHROPIC_API_KEY` / `GOOGLE_API_KEY`	Provider keys
`EVAL_RESULTS_ROOT`	Output root (default `results/`); layout `{root}/benchmark/{agent_type}/{model}/{dataset_path}/`
`DT_PORT_RANGE` / `DT_PORT_DB`	Port allocation for MCP / Docker
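The port-range value passed to --port-range or DT_PORT_RANGE uses a "low-high" string such as "10000-12000". The sketch below parses that format and probes for a bindable port. It is a hypothetical illustration only. DTAP's actual allocator, and the role of DT_PORT_DB, are not described here.

```python
import socket


def parse_port_range(spec):
    """Parse "low-high" (e.g. "10000-12000") into an inclusive (low, high) tuple."""
    low_text, high_text = spec.split("-", 1)
    low, high = int(low_text), int(high_text)
    if not 0 < low <= high <= 65535:
        raise ValueError(f"invalid port range: {spec!r}")
    return low, high


def first_free_port(spec, host="127.0.0.1"):
    """Return the first port in the range that can currently be bound on `host`."""
    low, high = parse_port_range(spec)
    for port in range(low, high + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # in use or not permitted; try the next port
            return port  # the socket closes here, releasing the port again
    raise RuntimeError(f"no free port in {spec!r}")
```

Probing a port and then releasing it is inherently racy, because another process can claim the port before an MCP server binds it. A real allocator has to tolerate that.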

Supported Domains

The 14 domains shipped with DTAP-BENCH:

browser · code · crm · customer-service · finance · legal · macos · medical · os-filesystem · research · telecom · travel · windows · workflow

Each domain mounts one or more MCP-backed environments (e.g. Salesforce, Gmail, PayPal, Slack, Atlassian, BigQuery, Snowflake, Databricks, GitHub, Google Docs, Booking, hospital, telecom, terminal, Windows / macOS shells). The full mapping lives in dt_arena/config/mcp.yaml.

Integrating a Custom Agent

DTAP supports two integration modes.

Option A — Use a pre-built native agent (recommended)

If you already have an agent written in one of the supported SDKs (OpenAI Agents SDK, LangChain, Claude SDK, Google ADK), wrap it with build_agent. Your existing tools / MCP servers stay attached; DTAP only adds the benchmark MCP servers (and any tool injections defined by the task) to a clone of your agent — the original is preserved.

import asyncio
from agents import Agent as OpenAIAgent          # your SDK
from utils.agent_helpers import build_agent
from dt_arena.src.types.agent import AgentConfig, RuntimeConfig

async def main():
    # 1. Your pre-built agent (any extra tools / MCP servers are kept untouched)
    native = OpenAIAgent(
        name="MyAgent",
        instructions="You are a helpful CRM assistant.",
        model="gpt-4o",
        mcp_servers=[my_custom_server],  # placeholder: any MCP server(s) you already use
    )

    # 2. Load the benchmark task config (adds salesforce, gmail, etc.)
    agent_cfg   = AgentConfig.from_yaml("dataset/crm/benign/1/config.yaml")
    runtime_cfg = RuntimeConfig(model="gpt-4o", temperature=0.1, max_turns=200,
                                output_dir="./results")

    # 3. Wrap — auto-detects OpenAI SDK / LangChain / Claude SDK / Google ADK
    agent = build_agent(native_agent=native, agent_cfg=agent_cfg, runtime_cfg=runtime_cfg)

    async with agent:
        result = await agent.run("List all leads in my account",
                                 metadata={"task_id": "demo", "domain": "crm"})
        print(result.final_output)

    assert agent.native_agent is native  # original is untouched

asyncio.run(main())
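The "clone, then extend" behavior described for build_agent can be pictured with a toy example. The agent is copied shallowly, and the copy gets a new server list, so appending benchmark servers never touches the original's list. `ToyAgent` and `with_benchmark_servers` are made up for illustration. How build_agent actually clones each SDK's agent is not documented here.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ToyAgent:
    """Stand-in for a native SDK agent (not a real SDK class)."""
    name: str
    mcp_servers: list = field(default_factory=list)


def with_benchmark_servers(native, benchmark_servers):
    """Return a copy of `native` with extra servers appended; `native` is not mutated."""
    clone = copy.copy(native)
    # A shallow copy would share the list object, so build a fresh one.
    clone.mcp_servers = list(native.mcp_servers) + list(benchmark_servers)
    return clone
```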

Option B — Implement a new agent backend

To plug a brand-new framework into eval/evaluation.py, subclass Agent and add a directory under agent/ (mirroring agent/openaisdk/, agent/claudesdk/, etc.):

from typing import Any, Union, List
from dt_arena.src.types.agent import Agent, MCPServerConfig

class MyNewAgent(Agent):
    async def initialize(self) -> None:
        await self.load_mcp_servers()       # connects to all task-declared MCP servers

    def _create_mcp_server(self, cfg: MCPServerConfig) -> Any:
        ...                                  # return your SDK-specific MCP client

    async def run(self, user_input: Union[str, List[str]], metadata=None) -> Any:
        inputs = [user_input] if isinstance(user_input, str) else user_input
        for turn in inputs:
            ...                              # execute one turn, append to trajectory
        return self.get_result()

    async def cleanup(self) -> None:
        self.reset_conversation()

Register the new framework name (e.g. mynewagent) in agent/__init__.py and the evaluation driver will accept --agent-type mynewagent. Full interface, multi-turn semantics, and trajectory schema are documented in agent/README.md.

Tool-Call Hooks

Every MCP tool call made by every supported framework flows through a shared pre/post hook system in dt_arena. Hooks are configured in a single JSON file and auto-loaded by every agent — no framework wiring, no CLI flags.

Enable hooks by editing dt_arena/src/hooks/hooks.json:

{
  "hooks": [
    "dt_arena.src.hooks.audit_log:AuditHook"
  ]
}
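Each entry in hooks.json is a "module.path:ClassName" string. A loader for that format fits in a few lines of importlib. This is a hypothetical sketch, not DTAP's loader. It assumes hook classes take no constructor arguments, as with the hook classes shown here, and it adds no error handling.

```python
import importlib
import json


def load_hooks(config_text):
    """Instantiate every "module.path:ClassName" entry listed under "hooks"."""
    hooks = []
    for spec in json.loads(config_text).get("hooks", []):
        module_name, _, class_name = spec.partition(":")
        cls = getattr(importlib.import_module(module_name), class_name)
        hooks.append(cls())
    return hooks
```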

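Write a hook by dropping a file under dt_arena/src/hooks/ (e.g. my_hook.py) and adding its "module:Class" path to hooks.json (here, "dt_arena.src.hooks.my_hook:MyHook"):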

from dt_arena.src.types.hooks import ToolCallContext, ToolCallResult

class MyHook:
    async def on_pre_tool_call(self, ctx: ToolCallContext):
        print(f"calling {ctx.framework}/{ctx.server}/{ctx.tool_name}")

    async def on_post_tool_call(self, ctx: ToolCallContext, result: ToolCallResult):
        print(f"done in {result.duration:.3f}s, error={result.is_error}")

See dt_arena/src/hooks/audit_log.py for a complete working example, and docs/hooks.md for the protocol, parallel-dispatch behavior, and known caveats.
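To make the pre/post contract concrete, here is a toy dispatcher that wraps one async tool call. `Ctx` and `Result` are stand-ins whose fields are copied from the MyHook example; the real ToolCallContext and ToolCallResult may carry more. The sketch awaits hooks one at a time and converts an exception into `is_error=True`. Both choices are assumptions that may differ from DTAP's dispatcher, whose parallel-dispatch behavior is covered in docs/hooks.md.

```python
import asyncio
import time
from dataclasses import dataclass


@dataclass
class Ctx:  # stand-in for ToolCallContext
    framework: str
    server: str
    tool_name: str


@dataclass
class Result:  # stand-in for ToolCallResult
    value: object
    duration: float
    is_error: bool


async def call_with_hooks(hooks, ctx, tool_fn, *args, **kwargs):
    """Run all pre hooks, the tool call, then all post hooks with a timed result."""
    for hook in hooks:
        await hook.on_pre_tool_call(ctx)
    start = time.perf_counter()
    try:
        value = await tool_fn(*args, **kwargs)
        is_error = False
    except Exception as exc:  # record the failure so post hooks still run
        value = exc
        is_error = True
    result = Result(value, time.perf_counter() - start, is_error)
    for hook in hooks:
        await hook.on_post_tool_call(ctx, result)
    return result


if __name__ == "__main__":
    class PrintHook:  # same shape as MyHook above
        async def on_pre_tool_call(self, ctx):
            print(f"calling {ctx.framework}/{ctx.server}/{ctx.tool_name}")

        async def on_post_tool_call(self, ctx, result):
            print(f"done in {result.duration:.3f}s, error={result.is_error}")

    async def fake_tool(x):
        return x * 2

    ctx = Ctx("openaisdk", "salesforce", "list_leads")
    print(asyncio.run(call_with_hooks([PrintHook()], ctx, fake_tool, 21)).value)
```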

Citation

If you find DTAP useful for your research, please cite:

@article{chen2026decodingtrust,
  title={DecodingTrust-Agent Platform (DTap): A Controllable and Interactive Red-Teaming Platform for AI Agents},
  author={Chen, Zhaorun and Liu, Xun and Tong, Haibo and Guo, Chengquan and Nie, Yuzhou and Zhang, Jiawei and Kang, Mintong and Xu, Chejian and Liu, Qichang and Liu, Xiaogeng and others},
  journal={arXiv preprint arXiv:2605.04808},
  year={2026}
}

License

See LICENSE.

Release history

Version	Release date
0.2.9	Jun 13, 2026
0.2.8	Jun 13, 2026
0.2.7	Jun 13, 2026
0.2.6	Jun 10, 2026
0.2.4	Jun 4, 2026
0.2.3	Jun 2, 2026
0.2.2	Jun 2, 2026
0.2.1	May 13, 2026
0.1.1	May 13, 2026
0.1.0 (this version)	May 13, 2026

Download files

Source Distribution

decodingtrust_agent_sdk-0.1.0.tar.gz (807.5 kB)

Uploaded May 13, 2026 Source

Built Distribution

decodingtrust_agent_sdk-0.1.0-py3-none-any.whl (947.7 kB)

Uploaded May 13, 2026 Python 3

File details

Details for the file decodingtrust_agent_sdk-0.1.0.tar.gz.

File metadata

Download URL: decodingtrust_agent_sdk-0.1.0.tar.gz
Upload date: May 13, 2026
Size: 807.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for decodingtrust_agent_sdk-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`d9e293463f6c77b53ff884f10eb11cf0c1d6214348cc6e31e98dc5e9dd75ab79`
MD5	`09d35bb39b0df9c5d43685989a49f67b`
BLAKE2b-256	`e392bc881b84eca78e74c66fb31f7a2484731efc85da93f90b7e59bdb4e7814c`


Provenance

The following attestation bundles were made for decodingtrust_agent_sdk-0.1.0.tar.gz:

Publisher: publish.yml on AI-secure/DecodingTrust-Agent

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: decodingtrust_agent_sdk-0.1.0.tar.gz
- Subject digest: d9e293463f6c77b53ff884f10eb11cf0c1d6214348cc6e31e98dc5e9dd75ab79
- Sigstore transparency entry: 1521952340
- Sigstore integration time: May 13, 2026
Source repository:
- Permalink: AI-secure/DecodingTrust-Agent@590a6d42c7b9f27673f4d4869d501d6e8266814a
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/AI-secure
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@590a6d42c7b9f27673f4d4869d501d6e8266814a
- Trigger Event: push

File details

Details for the file decodingtrust_agent_sdk-0.1.0-py3-none-any.whl.

File metadata

Download URL: decodingtrust_agent_sdk-0.1.0-py3-none-any.whl
Upload date: May 13, 2026
Size: 947.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for decodingtrust_agent_sdk-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`66568eba629850dd65033dc216ccd916ac6fda2a89dce95473a92f4e59398774`
MD5	`b249a6c3b33f1c083a8eac3e49aae00f`
BLAKE2b-256	`6abae5758ed4c6d2225f69b8e5f71366791227d553dc3f18c1853bffba12e00a`


Provenance

The following attestation bundles were made for decodingtrust_agent_sdk-0.1.0-py3-none-any.whl:

Publisher: publish.yml on AI-secure/DecodingTrust-Agent

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: decodingtrust_agent_sdk-0.1.0-py3-none-any.whl
- Subject digest: 66568eba629850dd65033dc216ccd916ac6fda2a89dce95473a92f4e59398774
- Sigstore transparency entry: 1521952410
- Sigstore integration time: May 13, 2026
Source repository:
- Permalink: AI-secure/DecodingTrust-Agent@590a6d42c7b9f27673f4d4869d501d6e8266814a
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/AI-secure
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@590a6d42c7b9f27673f4d4869d501d6e8266814a
- Trigger Event: push

decodingtrust-agent-sdk 0.1.0
