4 projects
prompt-siren
A research workbench for developing and testing attacks against large language models, with a focus on prompt injection vulnerabilities and defenses.
agentdojo
A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents
agentdojo-core
Core code for AgentDojo
jailbreakbench
An Open Robustness Benchmark for Jailbreaking Language Models