SpeechJailbreaker: Framework for testing SpeechLLM robustness against jailbreak attacks
SpeechJailbreaker is a research framework for evaluating how vulnerable speech/text-capable LLMs are to jailbreak attacks, and how well different defenses mitigate those attacks.
The repository provides:
- Multiple attack implementations (optimization, fuzzing, and prompt-based)
- A unified runner interface
- Optional defense wrappers/prompts
- Evaluation with `strongreject` or default scoring
Features
- Unified command interface through `Scripts/run_interface.py`
- Configurable attack, model, defense, evaluation, and task count
- Support for common jailbreak attack families:
`pgd`, `fuzzer`, `ica`, `sure`, `reasoning`, `jbc`, `boost_fuzzer`, `pair`, `tap`, `autoattack`
- Defense options including:
`None`, `guard`, `adashield`, `self-reminder`, `icd`, `smoothllm`, `spirit_bias`/`spirit_prune`/`spirit_patch` (or short names `bias`, `prune`, `patch`)
Project Layout
SpeechJailbreaker/
├── BOOST/ # Core attack implementations and utilities
├── Dataset/ # Prompt/target datasets
├── Defenses/ # Defense wrappers and implementations
├── Defense_prompt/ # Prompt-based defense templates
├── Experiments/ # Experiment scripts
├── Scripts/ # Shell entrypoints for each attack
├── speechjailbreaker/ # Python package modules
├── strongreject/ # StrongReject evaluation-related code
└── README.md
Installation
Option A) Install from PyPI
pip install speechjailbreaker
PyPI package: speechjailbreaker
After installing from PyPI, you can use the Python API directly:
from speechjailbreaker import run_attack, AttackConfig
config = AttackConfig(
    attack="ica",
    model_path="Qwen/Qwen2-Audio-7B-Instruct",
    defence="smoothllm",
    evaluation="strongreject",
    num_tasks=2,
)
exit_code = run_attack(config)
print("Exit code:", exit_code)
Option B) Install from source (recommended for development)
1) Clone
git clone https://github.com/NWULIST/SpeechJailbreaker.git
cd SpeechJailbreaker
2) Create environment
conda env create -f environment.yml
conda activate xllm_env
If `vllm` is not included in your environment file, install it manually:
pip install vllm
3) Install package in editable mode
pip install -e .
Quick Start
List supported attacks/defenses:
python Scripts/run_interface.py --list-attacks
python Scripts/run_interface.py --list-defenses
Run an attack:
python Scripts/run_interface.py \
--attack ica \
--model_path Qwen/Qwen2-Audio-7B-Instruct \
--defence smoothllm \
--evaluation strongreject \
--num_tasks 2
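Since `--seed` controls reproducibility, runs are often swept over several seeds. A minimal dry-run sketch (it only prints the commands it would launch; remove the `echo` to execute them):

```shell
#!/usr/bin/env sh
# Dry-run sweep over three seeds; each line is the exact command a run would use.
for seed in 0 1 2; do
  echo python Scripts/run_interface.py \
    --attack ica \
    --model_path Qwen/Qwen2-Audio-7B-Instruct \
    --defence smoothllm \
    --evaluation strongreject \
    --num_tasks 2 \
    --seed "$seed"
done
```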
Common arguments
- `--attack`: attack method name (required unless listing)
- `--model_path`: HuggingFace/local model path
- `--defence`: defense method (default: `None`)
- `--evaluation`: `default` or `strongreject`
- `--num_tasks`: number of tasks to run
- `--batch_size`: batch size per run
- `--guard`: guard model path (if needed by the defense)
- `--seed`: random seed for reproducibility
- `--few_shot_num`: few-shot examples (`ica` only)
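The flag surface above can be mirrored with a small `argparse` sketch. This is a hypothetical re-creation for illustration; the real `Scripts/run_interface.py` may use different defaults and extra options:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the documented flags, not the actual source.
    p = argparse.ArgumentParser(description="SpeechJailbreaker runner (sketch)")
    p.add_argument("--attack", help="attack method name")
    p.add_argument("--model_path", help="HuggingFace/local model path")
    p.add_argument("--defence", default="None", help="defense method")
    p.add_argument("--evaluation", default="default",
                   choices=["default", "strongreject"])
    p.add_argument("--num_tasks", type=int, default=1)
    p.add_argument("--batch_size", type=int, default=1)
    p.add_argument("--guard", help="guard model path, if the defense needs one")
    p.add_argument("--seed", type=int, default=0)
    p.add_argument("--few_shot_num", type=int, default=0, help="ica only")
    p.add_argument("--list-attacks", action="store_true")
    p.add_argument("--list-defenses", action="store_true")
    return p

# Parse a sample command line instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["--attack", "ica", "--defence", "smoothllm", "--num_tasks", "2"]
)
print(args.attack, args.defence, args.num_tasks)
```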
Running Specific Scripts
You can call attack-specific shell runners directly, for example:
bash Scripts/run_GCG.sh
Other attack scripts follow the same pattern in Scripts/ (for example run_ICA.sh, run_SURE.sh, run_PAIR.sh, etc.).
Datasets
Example datasets shipped with the repository include:
Dataset/harmful.csvDataset/harmful_targets.csvDataset/Advbench.csv
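The column layout of these CSVs is not documented here; assuming a `goal`/`target` style schema (an assumption to verify against the shipped files), loading them might look like this, shown with a synthetic in-memory stand-in:

```python
import csv
import io

# Synthetic stand-in for a file such as Dataset/harmful_targets.csv.
# The column names "goal" and "target" are assumptions, not confirmed.
sample = io.StringIO(
    "goal,target\n"
    '"<harmful request placeholder>","Sure, here is <affirmative prefix>"\n'
)

rows = list(csv.DictReader(sample))
prompts = [r["goal"] for r in rows]     # attack prompts
targets = [r["target"] for r in rows]   # desired affirmative prefixes
print(len(rows), prompts[0])
```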
Results and Logs
- Intermediate and final outputs are typically written under `Logs/` and/or `Results/`, depending on the script.
- For reproducibility, store the command, model version, defense setting, and random seed for each run.
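One lightweight way to record those details (a sketch, not part of the package) is to dump a small JSON manifest next to each run's outputs:

```python
import json
import sys
import tempfile
import time
from pathlib import Path

def write_manifest(out_dir: str, *, attack: str, model: str,
                   defence: str, seed: int) -> Path:
    """Record the exact configuration of a run for later reproduction."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    manifest = {
        "command": " ".join(sys.argv),
        "attack": attack,
        "model": model,
        "defence": defence,
        "seed": seed,
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
    }
    path = out / "run_manifest.json"
    path.write_text(json.dumps(manifest, indent=2))
    return path

# Demo: write the manifest into a temporary directory.
run_dir = Path(tempfile.mkdtemp()) / "demo_run"
p = write_manifest(str(run_dir), attack="ica",
                   model="Qwen/Qwen2-Audio-7B-Instruct",
                   defence="smoothllm", seed=0)
print(p)
```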
License
This project is licensed under the MIT License. See LICENSE.
Disclaimer
This project is intended strictly for safety research and red-teaming evaluation. Do not use it for malicious activity or to attack production systems.
File details
Details for the file speechjailbreaker-0.1.3.tar.gz.

File metadata
- Download URL: speechjailbreaker-0.1.3.tar.gz
- Size: 9.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `4a6eb81003122f1a15002f62c59577987551f37b9290b33faf399f6568cc88fa` |
| MD5 | `2a287bab181395a8020bba121a246a53` |
| BLAKE2b-256 | `2beda605cd8daecf950072a906cb56af1fc5ed3a8ef5c52b5d4caae7b045fdb3` |
File details
Details for the file speechjailbreaker-0.1.3-py3-none-any.whl.

File metadata
- Download URL: speechjailbreaker-0.1.3-py3-none-any.whl
- Size: 8.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `7badbc524fc7dfc25d3ab91105c915a934f85e6b17a47932f71f6041d477a782` |
| MD5 | `89031ae8ccb0774635afb68c710abfee` |
| BLAKE2b-256 | `de59b3e0fdd9562e0ea3970eae9272843ea050550994d3bdfb8884d1f1eec6a5` |