Evolutionary adversarial testing framework - Quality-diversity evolution for AI safety research

These details have not been verified by PyPI

Project links

Project description

rotalabs-redqueen

Evolutionary adversarial testing framework for LLMs from Rotalabs.

Quality-diversity evolution for automated red-teaming and AI safety research.

Overview

rotalabs-redqueen uses evolutionary algorithms to discover diverse, effective adversarial attacks against language models. Rather than manually crafting jailbreaks, it evolves attack strategies using:

Genetic Algorithms - Standard evolutionary optimization
MAP-Elites - Quality-diversity to find diverse successful attacks
Novelty Search - Reward novel behaviors, not just fitness

The framework operates at the semantic level - evolving attack strategies, encodings, and personas rather than raw tokens.

Installation

# Core package (includes mock target for testing)
pip install rotalabs-redqueen

# With OpenAI support
pip install rotalabs-redqueen[openai]

# With Anthropic support
pip install rotalabs-redqueen[anthropic]

# All LLM providers
pip install rotalabs-redqueen[llm]

# Development
pip install rotalabs-redqueen[dev]

Quick Start

Python API

import asyncio
from rotalabs_redqueen import (
    LLMAttackGenome,
    JailbreakFitness,
    MockTarget,
    HeuristicJudge,
    evolve,
)

async def main():
    # Create target and fitness function
    target = MockTarget()  # Use OpenAITarget or AnthropicTarget for real tests
    fitness = JailbreakFitness(target, HeuristicJudge())

    # Run evolution
    result = await evolve(
        genome_class=LLMAttackGenome,
        fitness=fitness,
        generations=50,
        population_size=20,
    )

    # Examine results
    if result.best:
        print(f"Best fitness: {result.best.fitness.value}")
        print(f"Best prompt: {result.best.genome.to_prompt()}")

asyncio.run(main())

Quality-Diversity with MAP-Elites

from rotalabs_redqueen import (
    LLMAttackGenome,
    JailbreakFitness,
    MockTarget,
    MapElitesArchive,
    BehaviorDimension,
    AttackStrategy,
    Encoding,
    evolve,
)

async def main():
    target = MockTarget()
    fitness = JailbreakFitness(target)

    # Create archive to track diverse solutions
    archive = MapElitesArchive(
        dimensions=[
            BehaviorDimension("strategy", 0.0, 1.0, len(AttackStrategy)),
            BehaviorDimension("encoding", 0.0, 1.0, len(Encoding)),
            BehaviorDimension("has_persona", 0.0, 1.0, 2),
        ]
    )

    result = await evolve(
        genome_class=LLMAttackGenome,
        fitness=fitness,
        generations=100,
        archive=archive,
    )

    # Check archive coverage
    coverage = result.archive.coverage()
    print(f"Archive coverage: {coverage.coverage_percent:.1f}%")
    print(f"Diverse solutions: {coverage.filled_cells}")

Command Line Interface

# Run a test campaign with mock target
rotalabs-redqueen run --target mock:random --generations 20

# Run against OpenAI (requires OPENAI_API_KEY)
rotalabs-redqueen run --target openai:gpt-4 --generations 50

# Use MAP-Elites for diverse attacks
rotalabs-redqueen run --target mock:random --use-archive

# Use LLM judge for more accurate evaluation
rotalabs-redqueen run --target mock:random --llm-judge anthropic:claude-sonnet-4-20250514

# Save results to file
rotalabs-redqueen run --target mock:random --output results.json

# Show available options
rotalabs-redqueen info --strategies
rotalabs-redqueen info --encodings
rotalabs-redqueen info --targets

Architecture

Core Framework

The core evolutionary framework is generic and can be used for any optimization problem:

Genome - Abstract base for evolvable representations
Fitness - Async fitness evaluation
Population - Collection of individuals with selection
Selection - Tournament, novelty, and hybrid selection
Archive - MAP-Elites quality-diversity archive
Evolution - Main evolutionary loop

LLM Domain

The LLM domain provides specialized components for adversarial testing:

LLMAttackGenome - Attack representation with strategies, encodings, personas
LLMTarget - Unified interface for OpenAI, Anthropic, Ollama, etc.
Judge - Evaluate attack success (heuristic or LLM-based)
JailbreakFitness - Fitness function combining target and judge

Attack Strategies

Strategy	Description
`ROLEPLAY`	Assume a character/persona (e.g., DAN)
`ENCODING`	Obfuscate the request (base64, rot13, etc.)
`AUTHORITY`	Claim special permissions
`HYPOTHETICAL`	Frame as fictional/educational
`MULTI_TURN`	Build up through conversation
`DIRECT`	Direct jailbreak attempt

Encodings

Encoding	Description
`NONE`	No encoding
`BASE64`	Base64 encoding
`ROT13`	ROT13 cipher
`LEETSPEAK`	L33t sp34k
`PIG_LATIN`	Pig Latin
`REVERSE`	Reversed text

Extending

Custom Genomes

from rotalabs_redqueen import Genome, BehaviorDescriptor

class MyGenome(Genome["MyGenome"]):
    @classmethod
    def random(cls, rng=None):
        # Create random genome
        ...

    def mutate(self, rng=None):
        # Return mutated copy
        ...

    def crossover(self, other, rng=None):
        # Return offspring
        ...

    def to_phenotype(self):
        # Convert to evaluable form
        ...

    def behavior(self):
        # Return behavior descriptor for QD
        return BehaviorDescriptor((dim1, dim2, ...))

Custom Fitness Functions

from rotalabs_redqueen import Fitness, FitnessResult, FitnessValue

class MyFitness(Fitness[MyGenome]):
    async def evaluate(self, genome):
        # Evaluate genome
        score = compute_score(genome.to_phenotype())
        return FitnessResult(
            fitness=FitnessValue(score),
            behavior=genome.behavior(),
        )

Custom Targets

from rotalabs_redqueen import LLMTarget, TargetResponse

class MyTarget(LLMTarget):
    @property
    def name(self):
        return "my-target"

    async def query(self, prompt):
        # Query your LLM
        response = await my_llm_api(prompt)
        return TargetResponse(
            content=response.text,
            model="my-model",
            tokens_used=response.tokens,
        )

Use Cases

Red-teaming: Discover vulnerabilities in LLM safety measures
Defense testing: Validate content filters and guardrails
Research: Study attack patterns and defenses systematically
Benchmarking: Compare robustness across models

Responsible Use

This tool is intended for defensive security research - testing and improving the safety of AI systems you own or have permission to test.

Do not use this tool to:

Attack systems without authorization
Generate harmful content for malicious purposes
Circumvent safety measures of production systems

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

1.0.0

Jan 31, 2026

This version

0.1.0

Jan 27, 2026

0.0.1

Jan 17, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

rotalabs_redqueen-0.1.0.tar.gz (28.4 kB view details)

Uploaded Jan 27, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

rotalabs_redqueen-0.1.0-py3-none-any.whl (30.0 kB view details)

Uploaded Jan 27, 2026 Python 3

File details

Details for the file rotalabs_redqueen-0.1.0.tar.gz.

File metadata

Download URL: rotalabs_redqueen-0.1.0.tar.gz
Upload date: Jan 27, 2026
Size: 28.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.7

File hashes

Hashes for rotalabs_redqueen-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`0d966b33525594d50357a6a328a21db4b9598c3b602559b3858a1b1a9c982f38`
MD5	`f70039a8f42ec1857464a80afdfbd15a`
BLAKE2b-256	`0598088b67c8e574381655d18f5ea14daa419667779c20e9ebc77eee7d2c1404`

See more details on using hashes here.

File details

Details for the file rotalabs_redqueen-0.1.0-py3-none-any.whl.

File metadata

Download URL: rotalabs_redqueen-0.1.0-py3-none-any.whl
Upload date: Jan 27, 2026
Size: 30.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.7

File hashes

Hashes for rotalabs_redqueen-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`49e919e1e7c030066668c849fba9eabe2ad433b39d233b6cb351f235adafe500`
MD5	`0bb41abf34924404c0a1134fbf551100`
BLAKE2b-256	`1218307f1e387ec0f8a04aebfa30797381fccf789a469bc46fd754e8f72de06d`

See more details on using hashes here.

rotalabs-redqueen 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

rotalabs-redqueen

Overview

Installation

Quick Start

Python API

Quality-Diversity with MAP-Elites

Command Line Interface

Architecture

Core Framework

LLM Domain

Attack Strategies

Encodings

Extending

Custom Genomes

Custom Fitness Functions

Custom Targets

Use Cases

Responsible Use

Links

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes