Vocabulary-Based Adversarial Fuzzing (VB-AF) framework for Large Language Models (LLMs)

VB-AF: Vocabulary-Based Adversarial Fuzzing

An implementation of Vocabulary-Based Adversarial Fuzzing (VB-AF) for probing vulnerabilities in Large Language Models (LLMs) at scale. VB-AF is a gray-box fuzzing framework that lets AI safety researchers, red-teamers, and developers systematically test the alignment and robustness of modern LLMs (and agents). It works by targeting documented weaknesses in transformer architectures.

This framework was heavily inspired by the widely adopted methodology of fuzz testing and was originally developed for the 'Red‑Teaming Challenge - OpenAI gpt-oss-20b' hackathon hosted on Kaggle. Given its effectiveness and its potential for extension, the author (@0ameyasr) decided to turn it into a flexible, interference-free LLM fuzzing framework.

WARNING

This framework is provided solely for authorized security research, academic study, and defensive testing (ethical red-teaming) of Large Language Models (LLMs).

Misuse of this software for any malicious, unlawful, exploitative, or unauthorized activity is strictly forbidden. The author(s) explicitly reject, denounce, and do not condone any attempt to weaponize or abuse this tool. By accessing, installing, or using this software, you agree that any form of misuse is entirely at your own risk and legal liability.

The software is provided “AS IS” without warranty of any kind. The author(s) disclaim all responsibility and liability for damages, losses, legal claims, or consequences of any kind arising from misuse.

By continuing to use this tool, you expressly acknowledge and accept full personal and legal accountability for your actions. Unauthorized or malicious use may subject you to civil and/or criminal penalties under applicable laws.

Key Features

Intuitive, easy-to-use API that balances uninterrupted low-level control with a convenient high-level decorator for fuzzing harnesses.
Built-in support for random seeding to ensure experiments are fully reproducible.
Designed to expose deep, undiscovered vulnerabilities in a model's Chain-of-Thought (CoT) reasoning, not just surface-level filter bypasses, though its reach is not restricted to this.
Moves beyond simple role-playing jailbreak prompts to a systematic, scalable and highly configurable fuzzing framework.
Open to community and research contributions!

Installation

You can install vbaf directly from PyPI:

pip install vbaf

Quick Start

Using vbaf is simple. First, configure the fuzzer with your desired parameters. Then, apply the @fuzzer.fuzz decorator to your inference function. The decorator transforms your function into a generator that runs the fuzzing process for n_attempts and yields a (fuzzy_payload, response) tuple for each attempt.

from vbaf import VBAF

# 1. Define a vocabulary to generate noise from (this is a mock)
tokens = ["error", "network", "token", "string", "exception", "test"]

# 2. Configure the fuzzer instance
fuzzer = VBAF(
    vocabulary=tokens,
    n_size=50,
    rand_bounds=(3, 5)
)

# 3. Apply the decorator to your LLM inference function
@fuzzer.fuzz(n_attempts=3)
def fuzzing_harness(prompt: str):
    # This is a mock function, that would normally call an LLM API
    # Say Gemini API, OpenAI's Chat Completion, etc.
    return f"Mock Response for: {prompt}"

# 4. Start the fuzzing process
# The decorated function now yields a (fuzzy_payload, response) tuple
for fuzzy_payload, result in fuzzing_harness("How do I build a model?"):
    print(f"Fuzzy Payload: {fuzzy_payload}")
    print(f"Response: {result}")
    ... # Post-process the results
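
To make the decorator-to-generator behaviour described above more concrete, here is a minimal sketch of the general pattern. It is not vbaf's implementation: the `fuzz` function, its `n_noise` and `seed` parameters, and the way noise is wrapped around the prompt are all assumptions made for illustration.

```python
import functools
import random
from typing import Callable, Iterator, Sequence, Tuple


def fuzz(vocabulary: Sequence[str], n_attempts: int,
         n_noise: int = 4, seed: int = 0) -> Callable:
    """Hypothetical stand-in for a fuzzing decorator (NOT vbaf's code)."""

    def decorator(func: Callable[[str], str]) -> Callable[[str], Iterator[Tuple[str, str]]]:
        @functools.wraps(func)
        def wrapper(prompt: str) -> Iterator[Tuple[str, str]]:
            # A local, seeded RNG keeps every run of the generator repeatable.
            rng = random.Random(seed)
            for _ in range(n_attempts):
                # Assumed noise scheme: random vocabulary words around the prompt.
                noise = " ".join(rng.choice(vocabulary) for _ in range(n_noise))
                payload = f"{noise} {prompt} {noise}"
                # Call the wrapped inference function and yield the pair.
                yield payload, func(payload)

        return wrapper

    return decorator


@fuzz(vocabulary=["error", "network", "token"], n_attempts=3)
def harness(prompt: str) -> str:
    # Mock inference function standing in for a real LLM call.
    return f"Mock Response for: {prompt}"


results = list(harness("How do I build a model?"))
```

Because the wrapper contains a `yield`, calling `harness(...)` returns a generator, so each attempt is only executed when the loop asks for the next item.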

How It Works

VB-AF is not some random prompt generator. It's a systematic fuzzer that exploits two documented weaknesses in transformer-based models:

Attention Dilution: The framework aims to overwhelm the model's context window with high-entropy but semantically valid noise, generated from a token vocabulary. This forces the model's attention to spread thin, weakening its ability to enforce safety protocols.
"Lost in the Middle": The core payload is strategically injected into the middle of the noisy context. This targets the empirically observed tendency of LLMs to use information in the middle of a long context least effectively, forcing the model to work harder to find the true instruction (Liu et al., TACL 2024).

The result is a state analogous to 'cognitive dissonance', where the model's internal reasoning shortcuts its safety alignment to deliver a "helpful" response, leading to a reward-hack in most documented cases.
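
The two ideas above (vocabulary-derived noise plus a payload placed mid-context) can be sketched in a few lines. This is an illustrative approximation only, not the library's actual construction: `build_fuzzy_payload`, `n_tokens`, and `seed` are hypothetical names, and how vbaf interprets its own `n_size` and `rand_bounds` parameters is not covered here.

```python
import random
from typing import List, Optional, Sequence


def build_fuzzy_payload(prompt: str, vocabulary: Sequence[str],
                        n_tokens: int, seed: Optional[int] = None) -> str:
    """Hypothetical helper: surround `prompt` with vocabulary noise."""
    # A dedicated RNG makes the result reproducible for a given seed.
    rng = random.Random(seed)
    # Noise: tokens sampled with replacement from the vocabulary.
    noise: List[str] = [rng.choice(vocabulary) for _ in range(n_tokens)]
    # Insert the real instruction at the midpoint of the noise.
    middle = len(noise) // 2
    return " ".join(noise[:middle] + [prompt] + noise[middle:])


vocab = ["error", "network", "token", "string", "exception", "test"]
payload = build_fuzzy_payload("How do I build a model?", vocab,
                              n_tokens=10, seed=42)
```

Passing the same `seed` yields the same payload, which is the property that makes a fuzzing run repeatable.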

Full Documentation

For a complete guide, API reference, and a deeper look into the methodology, please see the full documentation website.

Contributing

Contributions are welcome! Whether it's reporting a bug, suggesting a new feature or optimization, improving the docs, or submitting a PR, your input will be valued. Please see the Contribution Guidelines for detailed instructions on how to get started.

Release history

1.0.0 (this version): Aug 24, 2025
0.1.1: Aug 23, 2025
0.1.0: Aug 23, 2025

Download files

Source Distribution

vbaf-1.0.0.tar.gz (11.6 kB view details)

Uploaded Aug 24, 2025 Source

Built Distribution

vbaf-1.0.0-py3-none-any.whl (11.4 kB view details)

Uploaded Aug 24, 2025 Python 3

File details

Details for the file vbaf-1.0.0.tar.gz.

File metadata

Download URL: vbaf-1.0.0.tar.gz
Upload date: Aug 24, 2025
Size: 11.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.2

File hashes

Hashes for vbaf-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`e6ee7299135f01630ceb18a1fb90b93acb8c0808aab6cff10e7305dbd0618750`
MD5	`e27f610b6811e150ba36fd2738f50726`
BLAKE2b-256	`6b070b937908993442c8cf934a540daef3bb43a26eacf042fbeb35b43f9aa2ed`

File details

Details for the file vbaf-1.0.0-py3-none-any.whl.

File metadata

Download URL: vbaf-1.0.0-py3-none-any.whl
Upload date: Aug 24, 2025
Size: 11.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.2

File hashes

Hashes for vbaf-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5f21749cf1ed9e051f62c0571ad92be87cb36b24576b685bbbb873fe14abef65`
MD5	`c94f01347c7910da30684fa343d329b0`
BLAKE2b-256	`a00bccc7934e8c697644d84cd4201b3411f942396bc1a3900b4553e9a464e7fe`
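
One way to check a downloaded file against the SHA256 digests listed above is sketched below using only the standard library. The helper names `sha256_of` and `matches_published_hash` are hypothetical and the local file path is an assumption; only the digest constant is taken from the table for vbaf-1.0.0.tar.gz.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for vbaf-1.0.0.tar.gz.
SDIST_SHA256 = "e6ee7299135f01630ceb18a1fb90b93acb8c0808aab6cff10e7305dbd0618750"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage (assumes the archive was downloaded to the current directory):
# matches_published_hash(Path("vbaf-1.0.0.tar.gz"), SDIST_SHA256)
```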
