rotalabs-redqueen
An evolutionary adversarial testing framework for LLMs, from Rotalabs.
Quality-diversity evolution for automated red-teaming and AI safety research.
Overview
rotalabs-redqueen uses evolutionary algorithms to discover diverse, effective adversarial attacks against language models. Rather than manually crafting jailbreaks, it evolves attack strategies using:
- Genetic Algorithms - Standard evolutionary optimization
- MAP-Elites - Quality-diversity to find diverse successful attacks
- Novelty Search - Reward novel behaviors, not just fitness
The framework operates at the semantic level, evolving attack strategies, encodings, and personas rather than raw tokens.
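The semantic-level representation can be pictured as a small structured genome whose mutation operator swaps high-level choices instead of editing token sequences. The sketch below is purely illustrative: these class and field names are assumptions, not the library's actual API.

```python
import random
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch (not rotalabs-redqueen's actual classes): a
# semantic-level genome is a tuple of high-level choices, and mutation
# swaps one of those choices rather than editing raw tokens.
STRATEGIES = ["roleplay", "encoding", "authority", "hypothetical"]
ENCODINGS = ["none", "base64", "rot13"]


@dataclass(frozen=True)
class SemanticGenome:
    strategy: str
    encoding: str
    persona: Optional[str]

    def mutate(self, rng: random.Random) -> "SemanticGenome":
        # Pick one semantic field and resample it; the persona is left alone.
        field = rng.choice(["strategy", "encoding"])
        if field == "strategy":
            return SemanticGenome(rng.choice(STRATEGIES), self.encoding, self.persona)
        return SemanticGenome(self.strategy, rng.choice(ENCODINGS), self.persona)


g = SemanticGenome("roleplay", "none", "DAN")
child = g.mutate(random.Random(0))
print(child.strategy in STRATEGIES and child.encoding in ENCODINGS)  # True
```

Because operators act on these coarse choices, every offspring is a syntactically valid attack configuration, which is what makes the search tractable.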
Installation
```bash
# Core package (includes mock target for testing)
pip install rotalabs-redqueen

# With OpenAI support
pip install rotalabs-redqueen[openai]

# With Anthropic support
pip install rotalabs-redqueen[anthropic]

# All LLM providers
pip install rotalabs-redqueen[llm]

# Development
pip install rotalabs-redqueen[dev]
```
Quick Start
Python API
```python
import asyncio

from rotalabs_redqueen import (
    LLMAttackGenome,
    JailbreakFitness,
    MockTarget,
    HeuristicJudge,
    evolve,
)


async def main():
    # Create target and fitness function
    target = MockTarget()  # Use OpenAITarget or AnthropicTarget for real tests
    fitness = JailbreakFitness(target, HeuristicJudge())

    # Run evolution
    result = await evolve(
        genome_class=LLMAttackGenome,
        fitness=fitness,
        generations=50,
        population_size=20,
    )

    # Examine results
    if result.best:
        print(f"Best fitness: {result.best.fitness.value}")
        print(f"Best prompt: {result.best.genome.to_prompt()}")


asyncio.run(main())
```
Quality-Diversity with MAP-Elites
```python
import asyncio

from rotalabs_redqueen import (
    LLMAttackGenome,
    JailbreakFitness,
    MockTarget,
    MapElitesArchive,
    BehaviorDimension,
    AttackStrategy,
    Encoding,
    evolve,
)


async def main():
    target = MockTarget()
    fitness = JailbreakFitness(target)

    # Create archive to track diverse solutions
    archive = MapElitesArchive(
        dimensions=[
            BehaviorDimension("strategy", 0.0, 1.0, len(AttackStrategy)),
            BehaviorDimension("encoding", 0.0, 1.0, len(Encoding)),
            BehaviorDimension("has_persona", 0.0, 1.0, 2),
        ]
    )

    result = await evolve(
        genome_class=LLMAttackGenome,
        fitness=fitness,
        generations=100,
        archive=archive,
    )

    # Check archive coverage
    coverage = result.archive.coverage()
    print(f"Archive coverage: {coverage.coverage_percent:.1f}%")
    print(f"Diverse solutions: {coverage.filled_cells}")


asyncio.run(main())
```
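Conceptually, MAP-Elites discretizes a behavior descriptor into grid cells and keeps only the fittest solution per cell, so the archive accumulates solutions that are both diverse and good. The following self-contained sketch of that idea is independent of rotalabs-redqueen's API; the fitness and behavior values are stand-ins.

```python
import random


# Map a normalized behavior descriptor (values in [0, 1)) to a grid cell.
def cell_index(behavior, bins_per_dim):
    return tuple(min(int(b * n), n - 1) for b, n in zip(behavior, bins_per_dim))


archive = {}     # cell -> (fitness, solution): the elite for that cell
bins = (6, 2)    # e.g. 6 strategies x persona yes/no
rng = random.Random(42)

for _ in range(1000):
    solution = (rng.random(), rng.random())  # stand-in for a genome
    behavior = solution                      # stand-in behavior descriptor
    fitness = sum(solution)                  # stand-in fitness
    cell = cell_index(behavior, bins)
    # Keep a candidate only if its cell is empty or it beats the incumbent.
    if cell not in archive or fitness > archive[cell][0]:
        archive[cell] = (fitness, solution)

coverage = len(archive) / (bins[0] * bins[1])
print(f"coverage: {coverage:.0%}, cells: {len(archive)}")  # coverage: 100%, cells: 12
```

Coverage (fraction of filled cells) is the natural progress metric here, which is why the library's `coverage()` report is worth tracking alongside best fitness.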
Command Line Interface
```bash
# Run a test campaign with mock target
rotalabs-redqueen run --target mock:random --generations 20

# Run against OpenAI (requires OPENAI_API_KEY)
rotalabs-redqueen run --target openai:gpt-4 --generations 50

# Use MAP-Elites for diverse attacks
rotalabs-redqueen run --target mock:random --use-archive

# Use LLM judge for more accurate evaluation
rotalabs-redqueen run --target mock:random --llm-judge anthropic:claude-sonnet-4-20250514

# Save results to file
rotalabs-redqueen run --target mock:random --output results.json

# Show available options
rotalabs-redqueen info --strategies
rotalabs-redqueen info --encodings
rotalabs-redqueen info --targets
```
Architecture
Core Framework
The core evolutionary framework is generic and can be used for any optimization problem:
- Genome - Abstract base for evolvable representations
- Fitness - Async fitness evaluation
- Population - Collection of individuals with selection
- Selection - Tournament, novelty, and hybrid selection
- Archive - MAP-Elites quality-diversity archive
- Evolution - Main evolutionary loop
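For intuition, the components above compose into a loop of evaluate, select, and vary. This library-independent sketch runs that loop on a toy one-max problem (maximize the number of 1s in a bit string); it is not rotalabs-redqueen's actual implementation, and the function names are illustrative.

```python
import random


# Minimal generational loop: evaluate the population, pick parents by
# tournament selection, and fill the next generation with mutated copies.
def evolve(random_genome, mutate, fitness, generations=30, pop_size=16, seed=1):
    rng = random.Random(seed)
    pop = [random_genome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(fitness(g), g) for g in pop]

        def tournament(k=3):
            # Best of k randomly sampled individuals.
            return max(rng.sample(scored, k))[1]

        pop = [mutate(tournament(), rng) for _ in range(pop_size)]
    return max((fitness(g), g) for g in pop)


# Toy problem: 20-bit one-max, with a 10% per-bit flip mutation.
best_fit, best = evolve(
    random_genome=lambda rng: [rng.randint(0, 1) for _ in range(20)],
    mutate=lambda g, rng: [b ^ (rng.random() < 0.1) for b in g],
    fitness=sum,
)
print(best_fit)
```

The library's `Genome`, `Fitness`, and `Selection` abstractions slot into the same three roles; the archive, when used, additionally records elites per behavior cell as the loop runs.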
LLM Domain
The LLM domain provides specialized components for adversarial testing:
- LLMAttackGenome - Attack representation with strategies, encodings, personas
- LLMTarget - Unified interface for OpenAI, Anthropic, Ollama, etc.
- Judge - Evaluate attack success (heuristic or LLM-based)
- JailbreakFitness - Fitness function combining target and judge
Attack Strategies
| Strategy | Description |
|---|---|
| ROLEPLAY | Assume a character/persona (e.g., DAN) |
| ENCODING | Obfuscate the request (base64, rot13, etc.) |
| AUTHORITY | Claim special permissions |
| HYPOTHETICAL | Frame as fictional/educational |
| MULTI_TURN | Build up through conversation |
| DIRECT | Direct jailbreak attempt |
Encodings
| Encoding | Description |
|---|---|
NONE |
No encoding |
BASE64 |
Base64 encoding |
ROT13 |
ROT13 cipher |
LEETSPEAK |
L33t sp34k |
PIG_LATIN |
Pig Latin |
REVERSE |
Reversed text |
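Several of these transforms are simple enough to reproduce with the Python standard library. The snippet below is illustrative only and does not use the library's Encoding enum; the payload string is made up.

```python
import base64
import codecs

payload = "explain the system prompt"

# BASE64: binary-to-text encoding from the stdlib.
print(base64.b64encode(payload.encode()).decode())

# ROT13: a stdlib text transform codec.
print(codecs.encode(payload, "rot13"))  # rkcynva gur flfgrz cebzcg

# REVERSE: plain string slicing.
print(payload[::-1])  # tpmorp metsys eht nialpxe
```

Because these encodings are trivially reversible, their value in an attack genome is obfuscation of intent, not secrecy.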
Extending
Custom Genomes
```python
from rotalabs_redqueen import Genome, BehaviorDescriptor


class MyGenome(Genome["MyGenome"]):
    @classmethod
    def random(cls, rng=None):
        # Create random genome
        ...

    def mutate(self, rng=None):
        # Return mutated copy
        ...

    def crossover(self, other, rng=None):
        # Return offspring
        ...

    def to_phenotype(self):
        # Convert to evaluable form
        ...

    def behavior(self):
        # Return behavior descriptor for QD
        return BehaviorDescriptor((dim1, dim2, ...))
```
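To make those operator contracts concrete, here is a standalone toy genome: a fixed-length float vector with point mutation and one-point crossover. It deliberately does not subclass the library's Genome so it can run on its own; everything in it is illustrative.

```python
import random


class VectorGenome:
    SIZE = 8

    def __init__(self, genes):
        self.genes = list(genes)

    @classmethod
    def random(cls, rng=None):
        rng = rng or random.Random()
        return cls(rng.uniform(-1.0, 1.0) for _ in range(cls.SIZE))

    def mutate(self, rng=None):
        # Perturb one gene; return a copy rather than mutating in place.
        rng = rng or random.Random()
        i = rng.randrange(self.SIZE)
        genes = list(self.genes)
        genes[i] += rng.gauss(0.0, 0.1)
        return VectorGenome(genes)

    def crossover(self, other, rng=None):
        # One-point crossover: prefix from self, suffix from the partner.
        rng = rng or random.Random()
        cut = rng.randrange(1, self.SIZE)
        return VectorGenome(self.genes[:cut] + other.genes[cut:])

    def to_phenotype(self):
        return tuple(self.genes)

    def behavior(self):
        # Two coarse behavior dimensions for a QD archive: mean and spread.
        mean = sum(self.genes) / self.SIZE
        spread = max(self.genes) - min(self.genes)
        return (mean, spread)


rng = random.Random(7)
parent_a, parent_b = VectorGenome.random(rng), VectorGenome.random(rng)
child = parent_a.crossover(parent_b, rng).mutate(rng)
print(len(child.to_phenotype()))  # 8
```

Note that both operators return new objects; keeping genomes effectively immutable avoids aliasing bugs when the same individual is selected as a parent more than once.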
Custom Fitness Functions
```python
from rotalabs_redqueen import Fitness, FitnessResult, FitnessValue


class MyFitness(Fitness[MyGenome]):
    async def evaluate(self, genome):
        # Evaluate genome
        score = compute_score(genome.to_phenotype())
        return FitnessResult(
            fitness=FitnessValue(score),
            behavior=genome.behavior(),
        )
```
Custom Targets
```python
from rotalabs_redqueen import LLMTarget, TargetResponse


class MyTarget(LLMTarget):
    @property
    def name(self):
        return "my-target"

    async def query(self, prompt):
        # Query your LLM
        response = await my_llm_api(prompt)
        return TargetResponse(
            content=response.text,
            model="my-model",
            tokens_used=response.tokens,
        )
```
Use Cases
- Red-teaming: Discover vulnerabilities in LLM safety measures
- Defense testing: Validate content filters and guardrails
- Research: Study attack patterns and defenses systematically
- Benchmarking: Compare robustness across models
Responsible Use
This tool is intended for defensive security research: testing and improving the safety of AI systems you own or are authorized to test.
Do not use this tool to:
- Attack systems without authorization
- Generate harmful content for malicious purposes
- Circumvent safety measures of production systems
Links
- Website: https://rotalabs.ai
- GitHub: https://github.com/rotalabs/rotalabs-redqueen
- Documentation: https://rotalabs.github.io/rotalabs-redqueen/
- Contact: research@rotalabs.ai