A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents

AgentDojo: Benchmarking the Capabilities and Adversarial Robustness of LLM Agents

Quickstart

pip install agentdojo

If you want to use the prompt injection detector, you need to install the transformers extra:

pip install "agentdojo[transformers]"

Running the benchmark

The benchmark can be run with the benchmark script; documentation for its options is available via the --help flag.
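For example:

python -m agentdojo.scripts.benchmark --help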

For example, to run user tasks 0 and 1 of the workspace suite, with gpt-4o-2024-05-13 as the LLM, the tool filter as a defense, and the tool knowledge attacker, run the following command:

python -m agentdojo.scripts.benchmark -s workspace -ut user_task_0 \
    -ut user_task_1 --model gpt-4o-2024-05-13 \
    --defense tool_filter --attacker tool_knowledge

To run the same configuration on all suites and tasks, run the following:

python -m agentdojo.scripts.benchmark --model gpt-4o-2024-05-13 \
    --defense tool_filter --attacker tool_knowledge

Documentation of the Dojo

Take a look at our documentation.

Development set-up

Take a look at the development set-up docs.
