SpeechJailbreaker: Framework for testing SpeechLLM robustness against jailbreak attacks
SpeechJailbreaker is a research framework for evaluating how vulnerable speech/text-capable LLMs are to jailbreak attacks, and how well different defenses mitigate those attacks.
The repository provides:
- Multiple attack implementations (optimization, fuzzing, and prompt-based)
- A unified runner interface
- Optional defense wrappers/prompts
- Evaluation with `strongreject` or default scoring
Features
- Unified command interface through `Scripts/run_interface.py`
- Configurable attack, model, defense, evaluation, and task count
- Support for common jailbreak attack families:
`pgd`, `fuzzer`, `ica`, `sure`, `reasoning`, `jbc`, `boost_fuzzer`, `pair`, `tap`, `autoattack`
- Defense options including:
`None`, `guard`, `adashield`, `self-reminder`, `icd`, `smoothllm`, `spirit_bias`/`spirit_prune`/`spirit_patch` (or short names `bias`, `prune`, `patch`)
Project Layout
SpeechJailbreaker/
├── BOOST/ # Core attack implementations and utilities
├── Dataset/ # Prompt/target datasets
├── Defenses/ # Defense wrappers and implementations
├── Defense_prompt/ # Prompt-based defense templates
├── Experiments/ # Experiment scripts
├── Scripts/ # Shell entrypoints for each attack
├── speechjailbreaker/ # Python package modules
├── strongreject/ # StrongReject evaluation-related code
└── README.md
Installation
Option A) Install from PyPI
pip install speechjailbreaker
PyPI package: speechjailbreaker
After installing from PyPI, you can use the Python API directly:
from speechjailbreaker import run_attack, AttackConfig
config = AttackConfig(
    attack="ica",
    model_path="Qwen/Qwen2-Audio-7B-Instruct",
    defence="smoothllm",
    evaluation="strongreject",
    num_tasks=2,
)
exit_code = run_attack(config)
print("Exit code:", exit_code)
Option B) Install from source (recommended for development)
1) Clone
git clone https://github.com/NWULIST/SpeechJailbreaker.git
cd SpeechJailbreaker
2) Create environment
conda env create -f environment.yml
conda activate xllm_env
If `vllm` is not included in your environment file, install it manually:
pip install vllm
3) Install package in editable mode
pip install -e .
Quick Start
List supported attacks/defenses:
python Scripts/run_interface.py --list-attacks
python Scripts/run_interface.py --list-defenses
Run an attack:
python Scripts/run_interface.py \
--attack ica \
--model_path Qwen/Qwen2-Audio-7B-Instruct \
--defence smoothllm \
--evaluation strongreject \
--num_tasks 2
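Since `--seed` controls reproducibility, runs are often swept over several seeds. A minimal dry-run sketch (it only prints the commands it would launch; remove the `echo` to execute them):

```shell
#!/usr/bin/env sh
# Dry-run sweep over three seeds; each line is the exact command a run would use.
for seed in 0 1 2; do
  echo python Scripts/run_interface.py \
    --attack ica \
    --model_path Qwen/Qwen2-Audio-7B-Instruct \
    --defence smoothllm \
    --evaluation strongreject \
    --num_tasks 2 \
    --seed "$seed"
done
```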
Common arguments
- `--attack`: attack method name (required unless listing)
- `--model_path`: HuggingFace/local model path
- `--defence`: defense method (default: `None`)
- `--evaluation`: `default` or `strongreject`
- `--num_tasks`: number of tasks to run
- `--batch_size`: batch size per run
- `--guard`: guard model path (if needed by the defense)
- `--seed`: random seed for reproducibility
- `--few_shot_num`: few-shot examples (`ica` only)
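The flag surface above can be mirrored with a small `argparse` sketch. This is a hypothetical re-creation for illustration; the real `Scripts/run_interface.py` may use different defaults and extra options:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the documented flags, not the actual source.
    p = argparse.ArgumentParser(description="SpeechJailbreaker runner (sketch)")
    p.add_argument("--attack", help="attack method name")
    p.add_argument("--model_path", help="HuggingFace/local model path")
    p.add_argument("--defence", default="None", help="defense method")
    p.add_argument("--evaluation", default="default",
                   choices=["default", "strongreject"])
    p.add_argument("--num_tasks", type=int, default=1)
    p.add_argument("--batch_size", type=int, default=1)
    p.add_argument("--guard", help="guard model path, if the defense needs one")
    p.add_argument("--seed", type=int, default=0)
    p.add_argument("--few_shot_num", type=int, default=0, help="ica only")
    p.add_argument("--list-attacks", action="store_true")
    p.add_argument("--list-defenses", action="store_true")
    return p

# Parse a sample command line instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["--attack", "ica", "--defence", "smoothllm", "--num_tasks", "2"]
)
print(args.attack, args.defence, args.num_tasks)
```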
Running Specific Scripts
You can call attack-specific shell runners directly, for example:
bash Scripts/run_GCG.sh
Other attack scripts follow the same pattern in Scripts/ (for example run_ICA.sh, run_SURE.sh, run_PAIR.sh, etc.).
Datasets
Example datasets shipped with the repository include:
Dataset/harmful.csvDataset/harmful_targets.csvDataset/Advbench.csv
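The column layout of these CSVs is not documented here; assuming a `goal`/`target` style schema (an assumption to verify against the shipped files), loading them might look like this, shown with a synthetic in-memory stand-in:

```python
import csv
import io

# Synthetic stand-in for a file such as Dataset/harmful_targets.csv.
# The column names "goal" and "target" are assumptions, not confirmed.
sample = io.StringIO(
    "goal,target\n"
    '"<harmful request placeholder>","Sure, here is <affirmative prefix>"\n'
)

rows = list(csv.DictReader(sample))
prompts = [r["goal"] for r in rows]     # attack prompts
targets = [r["target"] for r in rows]   # desired affirmative prefixes
print(len(rows), prompts[0])
```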
Results and Logs
- Intermediate and final outputs are typically written under `Logs/` and/or `Results/`, depending on the script.
- For reproducibility, store the command, model version, defense setting, and random seed for each run.
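One lightweight way to record those details (a sketch, not part of the package) is to dump a small JSON manifest next to each run's outputs:

```python
import json
import sys
import tempfile
import time
from pathlib import Path

def write_manifest(out_dir: str, *, attack: str, model: str,
                   defence: str, seed: int) -> Path:
    """Record the exact configuration of a run for later reproduction."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    manifest = {
        "command": " ".join(sys.argv),
        "attack": attack,
        "model": model,
        "defence": defence,
        "seed": seed,
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
    }
    path = out / "run_manifest.json"
    path.write_text(json.dumps(manifest, indent=2))
    return path

# Demo: write the manifest into a temporary directory.
run_dir = Path(tempfile.mkdtemp()) / "demo_run"
p = write_manifest(str(run_dir), attack="ica",
                   model="Qwen/Qwen2-Audio-7B-Instruct",
                   defence="smoothllm", seed=0)
print(p)
```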
License
This project is licensed under the MIT License. See LICENSE.
Disclaimer
This project is intended strictly for safety research and red-teaming evaluation. Do not use it for malicious activity or to attack production systems.
File details
Details for the file speechjailbreaker-0.1.3.tar.gz.

File metadata
- Download URL: speechjailbreaker-0.1.3.tar.gz
- Size: 9.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `4a6eb81003122f1a15002f62c59577987551f37b9290b33faf399f6568cc88fa` |
| MD5 | `2a287bab181395a8020bba121a246a53` |
| BLAKE2b-256 | `2beda605cd8daecf950072a906cb56af1fc5ed3a8ef5c52b5d4caae7b045fdb3` |
File details
Details for the file speechjailbreaker-0.1.3-py3-none-any.whl.

File metadata
- Download URL: speechjailbreaker-0.1.3-py3-none-any.whl
- Size: 8.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `7badbc524fc7dfc25d3ab91105c915a934f85e6b17a47932f71f6041d477a782` |
| MD5 | `89031ae8ccb0774635afb68c710abfee` |
| BLAKE2b-256 | `de59b3e0fdd9562e0ea3970eae9272843ea050550994d3bdfb8884d1f1eec6a5` |