SpeechJailbreaker: Framework for testing SpeechLLM robustness against jailbreak attacks

SpeechJailbreaker

SpeechJailbreaker is a research framework for evaluating how vulnerable speech/text-capable LLMs are to jailbreak attacks, and how well different defenses mitigate those attacks.

The repository provides:

  • Multiple attack implementations (optimization, fuzzing, and prompt-based)
  • A unified runner interface
  • Optional defense wrappers/prompts
  • Evaluation with strongreject or default scoring

Features

  • Unified command interface through Scripts/run_interface.py
  • Configurable attack, model, defense, evaluation, and task count
  • Support for common jailbreak attack families:
    • pgd
    • fuzzer
    • ica
    • sure
    • reasoning
    • jbc
    • boost_fuzzer
    • pair
    • tap
    • autoattack
  • Defense options including:
    • None
    • guard
    • adashield
    • self-reminder
    • icd
    • smoothllm
    • spirit_bias / spirit_prune / spirit_patch (or short names bias, prune, patch)

Project Layout

SpeechJailbreaker/
├── BOOST/                  # Core attack implementations and utilities
├── Dataset/                # Prompt/target datasets
├── Defenses/               # Defense wrappers and implementations
├── Defense_prompt/         # Prompt-based defense templates
├── Experiments/            # Experiment scripts
├── Scripts/                # Shell entrypoints for each attack
├── speechjailbreaker/      # Python package modules
├── strongreject/           # StrongReject evaluation-related code
└── README.md

Installation

Option A) Install from PyPI

pip install speechjailbreaker

PyPI package: speechjailbreaker

After installing from PyPI, you can use the Python API directly:

from speechjailbreaker import run_attack, AttackConfig

config = AttackConfig(
    attack="ica",
    model_path="Qwen/Qwen2-Audio-7B-Instruct",
    defence="smoothllm",
    evaluation="strongreject",
    num_tasks=2,
)
exit_code = run_attack(config)
print("Exit code:", exit_code)

Option B) Install from source (recommended for development)

1) Clone

git clone https://github.com/NWULIST/SpeechJailbreaker.git
cd SpeechJailbreaker

2) Create environment

conda env create -f environment.yml
conda activate xllm_env

If vllm is not included in your environment file, install it manually:

pip install vllm

3) Install package in editable mode

pip install -e .

Quick Start

List supported attacks/defenses:

python Scripts/run_interface.py --list-attacks
python Scripts/run_interface.py --list-defenses

Run an attack:

python Scripts/run_interface.py \
  --attack ica \
  --model_path Qwen/Qwen2-Audio-7B-Instruct \
  --defence smoothllm \
  --evaluation strongreject \
  --num_tasks 2

Common arguments

  • --attack: attack method name (required unless listing)
  • --model_path: HuggingFace/local model path
  • --defence: defense method (default: None)
  • --evaluation: default or strongreject
  • --num_tasks: number of tasks to run
  • --batch_size: batch size per run
  • --guard: guard model path (if needed by defense)
  • --seed: random seed for reproducibility
  • --few_shot_num: few-shot examples (ica only)

Running Specific Scripts

You can call attack-specific shell runners directly, for example:

bash Scripts/run_GCG.sh

Other attack scripts follow the same pattern in Scripts/ (for example run_ICA.sh, run_SURE.sh, run_PAIR.sh, etc.).

Datasets

Example datasets shipped with the repository include:

  • Dataset/harmful.csv
  • Dataset/harmful_targets.csv
  • Dataset/Advbench.csv
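If you want to inspect these CSVs outside the framework, the standard library is enough. A minimal loading sketch; the column name "goal" is an assumption (check the file header first), and the function name is illustrative:

```python
import csv


def load_prompts(path: str, column: str = "goal") -> list[str]:
    """Read one column of prompts from a dataset CSV."""
    with open(path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        if column not in (reader.fieldnames or []):
            raise KeyError(f"column {column!r} not found in {reader.fieldnames}")
        return [row[column] for row in reader]
```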

Results and Logs

  • Intermediate and final outputs are typically written under Logs/ and/or Results/, depending on the script.
  • For reproducibility, store the command, model version, defense setting, and random seed for each run.
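One lightweight way to capture that metadata is a JSON manifest written next to each run's outputs. A hypothetical sketch, assuming nothing about the framework's own logging; the file name and field names are illustrative:

```python
import json
import sys
import time
from pathlib import Path


def write_run_manifest(out_dir: str, *, attack: str, model_path: str,
                       defence: str, evaluation: str, seed: int) -> Path:
    """Record the settings of a run alongside its outputs for reproducibility."""
    manifest = {
        "command": " ".join(sys.argv),
        "attack": attack,
        "model_path": model_path,
        "defence": defence,
        "evaluation": evaluation,
        "seed": seed,
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
    }
    path = Path(out_dir) / "run_manifest.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(manifest, indent=2))
    return path
```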

License

This project is licensed under the MIT License. See LICENSE.

Disclaimer

This project is intended strictly for safety research and red-teaming evaluation. Do not use it for malicious activity or to attack production systems.
