TrustyAI Garak (trustyai_garak): Out-of-Tree Llama Stack Eval Provider for Garak Red Teaming
About
This repository implements Garak as an out-of-tree Llama Stack eval provider for security testing and red teaming of Large Language Models, with optional shield integration for evaluating guardrailed systems.
Features
- Security Vulnerability Detection: Automated testing for prompt injection, jailbreaks, toxicity, and bias
- Compliance Framework Support: Pre-built benchmarks for established standards (OWASP LLM Top 10, AVID taxonomy)
- Shield Integration: Test LLMs with and without Llama Stack shields for comparative security analysis
- Concurrency Control: Configurable limits for concurrent scans and shield operations
- Custom Probe Support: Run specific garak security probes
- Enhanced Reporting: Multiple garak output formats including HTML reports and detailed logs
Quick Start
Prerequisites
- Python 3.12+
- Access to an OpenAI-compatible model endpoint
Installation
# Clone the repository
git clone https://github.com/trustyai-explainability/llama-stack-provider-trustyai-garak.git
cd llama-stack-provider-trustyai-garak
# Create & activate venv
python3 -m venv .venv
source .venv/bin/activate
# Install dependencies
pip install -e .
Configuration
Set up your environment variables:
export VLLM_URL="http://your-model-endpoint/v1"
export INFERENCE_MODEL="your-model-name"
# Optional: Configure scan behavior
export GARAK_TIMEOUT="10800" # 3 hours default
export GARAK_MAX_CONCURRENT_JOBS="5" # Max concurrent scans
export GARAK_MAX_WORKERS="5" # Max workers for shield scanning
Run Security Scans
Basic Mode (Standard Garak Scanning)
# Start the Llama Stack server
llama stack run run.yaml --image-type venv
# The server will be available at http://localhost:8321
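Once the server is up, a quick way to confirm it is reachable is to list the registered models with the Python client (a minimal sketch; depending on your llama_stack_client version the call may return a plain list or a paginated object):
# Sanity check: confirm the server is reachable and your inference model is registered
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")
for model in client.models.list():
    print(model.identifier)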
Enhanced Mode (With Shield Integration)
# Start with safety and shield capabilities
llama stack run run-with-safety.yaml --image-type venv
# Includes safety, shields, and telemetry APIs
Demos
Interactive examples are available in the demos/ directory:
- Getting Started: Basic usage with predefined scan profiles and user-defined garak probes
- Scan Guardrailed System: Llama Stack shield integration for scanning a guardrailed LLM system
- concurrency_limit_test.ipynb: Testing concurrent scan limits
Compliance Frameworks
The following compliance framework benchmarks are pre-registered and available immediately:
Compliance Standards
| Framework | Benchmark ID | Description | Duration |
|---|---|---|---|
| OWASP LLM Top 10 | owasp_llm_top10 | OWASP Top 10 for Large Language Model Applications | ~8 hours |
| AVID Security | avid_security | AI Vulnerability Database - Security vulnerabilities | ~8 hours |
| AVID Ethics | avid_ethics | AI Vulnerability Database - Ethical concerns | ~30 minutes |
| AVID Performance | avid_performance | AI Vulnerability Database - Performance issues | ~40 minutes |
Scan Profiles for Testing
| Profile | Benchmark ID | Duration | Probes |
|---|---|---|---|
| Quick | quick | ~5 minutes | Essential security checks (3 specific probes) |
| Standard | standard | ~1 hour | Standard attack vectors (5 probe categories) |
Note: All of the above duration estimates were measured with a Qwen2.5 7B model deployed via vLLM on OpenShift.
Usage Examples
Discover Available Benchmarks
from llama_stack_client import LlamaStackClient
client = LlamaStackClient(base_url="http://localhost:8321")
# List all available benchmarks (auto-registered)
benchmarks = client.benchmarks.list()
for benchmark in benchmarks.data:
    print(f"- {benchmark.identifier}: {benchmark.metadata.get('name', 'No name')}")
Compliance Framework Testing
# Run OWASP LLM Top 10 security assessment
job = client.eval.run_eval(
    benchmark_id="owasp_llm_top10",
    benchmark_config={
        "eval_candidate": {
            "type": "model",
            "model": "qwen2",  # change this to your inference model name
            "sampling_params": {
                "max_tokens": 100
            },
        }
    },
)
# Run AVID Security assessment
job = client.eval.run_eval(
    benchmark_id="avid_security",
    benchmark_config={
        "eval_candidate": {
            "type": "model",
            "model": "qwen2",  # change this to your inference model name
            "sampling_params": {
                "max_tokens": 100
            },
        }
    },
)
Built-in Scan Profiles for Testing
# Quick security scan (5 min)
job = client.eval.run_eval(
    benchmark_id="quick",
    benchmark_config={
        "eval_candidate": {
            "type": "model",
            "model": "qwen2",  # change this to your inference model name
            "sampling_params": {
                "max_tokens": 100
            },
        }
    },
)
Custom Garak Probes
# Register custom probes
client.benchmarks.register(
    benchmark_id="custom",
    dataset_id="garak",  # placeholder
    scoring_functions=["garak_scoring"],  # placeholder
    provider_benchmark_id="custom",
    provider_id="trustyai_garak",
    metadata={
        "probes": ["latentinjection.LatentJailbreak", "snowball.GraphConnectivity"],
        "timeout": 900  # 15 minutes
    }
)
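After registration, the custom benchmark runs exactly like the built-in ones; below is a sketch reusing the run_eval pattern from earlier (the model name is a placeholder for your inference model):
# Run the newly registered custom benchmark
job = client.eval.run_eval(
    benchmark_id="custom",
    benchmark_config={
        "eval_candidate": {
            "type": "model",
            "model": "qwen2",  # change this to your inference model name
            "sampling_params": {"max_tokens": 100},
        }
    },
)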
Shield Integration (Enhanced Mode)
# Test with input shields only
client.benchmarks.register(
    benchmark_id="PI_with_input_shield",
    dataset_id="garak",  # placeholder
    scoring_functions=["garak_scoring"],  # placeholder
    provider_benchmark_id="PI_with_input_shield",
    provider_id="trustyai_garak",
    metadata={
        "probes": ["promptinject.HijackHateHumans"],
        "timeout": 600,
        "shield_ids": ["Prompt-Guard-86M"]  # Applied to input only
    }
)
# Test with separate input/output shields
client.benchmarks.register(
    benchmark_id="PI_with_io_shields",
    dataset_id="garak",  # placeholder
    scoring_functions=["garak_scoring"],  # placeholder
    provider_benchmark_id="PI_with_io_shields",
    provider_id="trustyai_garak",
    metadata={
        "probes": ["promptinject.HijackHateHumans"],
        "timeout": 600,
        "shield_config": {
            "input": ["Prompt-Guard-86M"],
            "output": ["Llama-Guard-3-8B"]
        }
    }
)
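Shielded benchmarks are run the same way as any other; comparing the scores from a shielded run against an unshielded run of the same probes gives a rough measure of shield effectiveness. A sketch, assuming the benchmark registered above:
# Run the shielded benchmark; compare its scores against an unshielded
# run of the same probes to estimate shield effectiveness
job = client.eval.run_eval(
    benchmark_id="PI_with_io_shields",
    benchmark_config={
        "eval_candidate": {
            "type": "model",
            "model": "qwen2",  # change this to your inference model name
            "sampling_params": {"max_tokens": 100},
        }
    },
)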
Job Management
# Check job status
job_status = client.eval.jobs.status(job_id=job.job_id, benchmark_id="quick")
print(f"Job status: {job_status.status}")
print(f"Running jobs: {job_status.metadata.get('running_jobs', 'N/A')}")
# Cancel a running job
client.eval.jobs.cancel(job_id=job.job_id, benchmark_id="quick")
# Get evaluation results
if job_status.status == "completed":
    results = client.eval.get_eval_job_result(job_id=job.job_id, benchmark_id="quick")
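For long-running scans, a simple polling loop can wait for completion (a minimal sketch; the exact set of terminal status values is an assumption to verify against your Llama Stack version):
import time

# Poll until the scan reaches a terminal state
while True:
    job_status = client.eval.jobs.status(job_id=job.job_id, benchmark_id="quick")
    if job_status.status in ("completed", "failed", "cancelled"):  # assumed terminal states
        break
    time.sleep(30)  # garak scans can take minutes to hours

print(f"Final status: {job_status.status}")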
Accessing Scan Reports
# Get file metadata
scan_report_id = job_status.metadata["scan_report_file_id"]
scan_log_id = job_status.metadata["scan_log_file_id"]
scan_html_id = job_status.metadata["scan_report_html_file_id"]
# Download files using Files API or direct HTTP calls
import requests
files_url = "http://localhost:8321/v1/openai/v1/files"
report_content = requests.get(f"{files_url}/{scan_report_id}/content")
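The downloaded content can then be written to disk, for example to open the HTML report in a browser (a sketch reusing files_url and the file IDs from the job metadata above):
# Save the HTML report locally for viewing in a browser
html_response = requests.get(f"{files_url}/{scan_html_id}/content")
with open("garak_scan_report.html", "wb") as f:
    f.write(html_response.content)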
Configuration Options
Provider Configuration (run.yaml / run-with-safety.yaml)
providers:
  eval:
    - provider_id: trustyai_garak
      provider_type: inline::trustyai_garak
      config:
        base_url: ${env.BASE_URL:=http://localhost:8321/v1} # llama-stack service base URL
        timeout: ${env.GARAK_TIMEOUT:=10800} # 3 hours default
        max_concurrent_jobs: ${env.GARAK_MAX_CONCURRENT_JOBS:=5} # Concurrent scan limit
        max_workers: ${env.GARAK_MAX_WORKERS:=5} # Shield scanning parallelism
        garak_model_type_openai: "openai.OpenAICompatible" # For direct LLM scanning
        garak_model_type_function: "function.Single" # For shield-integrated scanning
Environment Variables
| Variable | Default | Description |
|---|---|---|
| BASE_URL | http://localhost:8321/v1 | Llama Stack service base URL |
| GARAK_TIMEOUT | 10800 | Maximum timeout for a garak scan (seconds) |
| GARAK_MAX_CONCURRENT_JOBS | 5 | Maximum number of concurrent garak scans |
| GARAK_MAX_WORKERS | 5 | Maximum workers for shield scanning |
Results Interpretation
- Vulnerability Score: 0.0 (secure) to 1.0 (vulnerable)
- Threshold: Scores ≥ 0.5 indicate potential security concerns (see the sketch after this list)
- Shield Comparison: Compare scores with and without shields to measure effectiveness
- Detailed Reports: Available via the Llama Stack files API in multiple formats:
  - JSON report (scan_report_file_id)
  - HTML report (scan_report_html_file_id)
  - Detailed logs (scan_log_file_id)
  - Hit logs (scan_hitlog_file_id)
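As a rough post-processing step, the scores in the results object can be checked against the 0.5 threshold. This is only a sketch: the key name vulnerability_score below is hypothetical, so inspect the actual structure returned by get_eval_job_result() for your client version.
# Flag scoring results whose aggregated score crosses the 0.5 threshold.
# NOTE: "vulnerability_score" is a hypothetical key name for illustration;
# inspect `results.scores` from get_eval_job_result() for the real structure.
for scoring_fn, result in results.scores.items():
    score = result.aggregated_results.get("vulnerability_score")
    if score is not None and score >= 0.5:
        print(f"Potential concern: {scoring_fn} scored {score:.2f}")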
Deployment Modes
Basic Mode (run.yaml)
- Standard garak scanning against OpenAI-compatible endpoints
- APIs: inference, eval, files
- Best for: Basic security testing
Enhanced Mode (run-with-safety.yaml)
- Shield-integrated scanning to test guardrailed systems
- APIs: inference, eval, files, safety, shields, telemetry
- Best for: Advanced security testing with defense evaluation