Out-of-Tree Llama Stack Provider for Garak Red-Teaming

Project description

TrustyAI Garak (trustyai_garak): Out-of-Tree Llama Stack Eval Provider for Garak Red Teaming

About

This repository implements Garak as an out-of-tree Llama Stack eval provider for security testing and red-teaming of Large Language Models, with optional shield integration for comparative security analysis.

Features

  • Security Vulnerability Detection: Automated testing for prompt injection, jailbreaks, toxicity, and bias
  • Compliance Framework Support: Pre-built benchmarks for established standards (OWASP LLM Top 10, AVID taxonomy)
  • Shield Integration: Test LLMs with and without Llama Stack shields for comparative security analysis
  • Concurrency Control: Configurable limits for concurrent scans and shield operations
  • Custom Probe Support: Run specific garak security probes
  • Enhanced Reporting: Multiple garak output formats including HTML reports and detailed logs

Quick Start

Prerequisites

  • Python 3.12+
  • Access to an OpenAI-compatible model endpoint

Installation

# Clone the repository
git clone https://github.com/trustyai-explainability/llama-stack-provider-trustyai-garak.git
cd llama-stack-provider-trustyai-garak

# Create & activate venv
python3 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -e .

Configuration

Set up your environment variables:

export VLLM_URL="http://your-model-endpoint/v1"
export INFERENCE_MODEL="your-model-name"

# Optional: Configure scan behavior
export GARAK_TIMEOUT="10800"  # 3 hours default
export GARAK_MAX_CONCURRENT_JOBS="5"  # Max concurrent scans
export GARAK_MAX_WORKERS="5"  # Max workers for shield scanning

Run Security Scans

Basic Mode (Standard Garak Scanning)

# Start the Llama Stack server
llama stack run run.yaml --image-type venv

# The server will be available at http://localhost:8321

Enhanced Mode (With Shield Integration)

# Start with safety and shield capabilities
llama stack run run-with-safety.yaml --image-type venv

# Includes safety, shields, and telemetry APIs
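
Verify the Server

In either mode, confirm the server is responding before registering or running scans. A quick check in Python (this assumes the stack exposes its standard health route at /v1/health):

import requests

# Quick liveness check against the running Llama Stack server
resp = requests.get("http://localhost:8321/v1/health")
print(resp.status_code, resp.json())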

Demos

Interactive examples are available in the demos/ directory.

Compliance Frameworks

The following compliance framework benchmarks are pre-registered and available immediately:

Compliance Standards

Framework          Benchmark ID       Description                                            Duration
OWASP LLM Top 10   owasp_llm_top10    OWASP Top 10 for Large Language Model Applications     ~8 hours
AVID Security      avid_security      AI Vulnerability Database - Security vulnerabilities   ~8 hours
AVID Ethics        avid_ethics        AI Vulnerability Database - Ethical concerns           ~30 minutes
AVID Performance   avid_performance   AI Vulnerability Database - Performance issues         ~40 minutes

Scan Profiles for Testing

Profile    Benchmark ID   Duration     Probes
Quick      quick          ~5 minutes   Essential security checks (3 specific probes)
Standard   standard       ~1 hour      Standard attack vectors (5 probe categories)

Note: All duration estimates above were measured with a Qwen2.5 7B model deployed via vLLM on OpenShift.

Usage Examples

Discover Available Benchmarks

from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# List all available benchmarks (auto-registered)
benchmarks = client.benchmarks.list()
for benchmark in benchmarks.data:
    print(f"- {benchmark.identifier}: {benchmark.metadata.get('name', 'No name')}")

Compliance Framework Testing

# Run OWASP LLM Top 10 security assessment
job = client.eval.run_eval(
    benchmark_id="owasp_llm_top10",
    benchmark_config={
        "eval_candidate": {
            "type": "model",
            "model": "qwen2", # change this to your inference model name
            "sampling_params": {
                "max_tokens": 100
            },
        }
    },
)

# Run AVID Security assessment
job = client.eval.run_eval(
    benchmark_id="avid_security",
    benchmark_config={
        "eval_candidate": {
            "type": "model", 
            "model": "qwen2",
            "sampling_params": {
                "max_tokens": 100
            },
        }
    },
)

Built-in Scan Profiles for Testing

# Quick security scan (5 min)
job = client.eval.run_eval(
    benchmark_id="quick",
    benchmark_config={
        "eval_candidate": {
            "type": "model",
            "model": "qwen2", # change this to your inference model name
            "sampling_params": {
                "max_tokens": 100
            },
        }
    },
)

Custom Garak Probes

# Register custom probes
client.benchmarks.register(
    benchmark_id="custom",
    dataset_id="garak", # placeholder
    scoring_functions=["garak_scoring"], # placeholder
    provider_benchmark_id="custom",
    provider_id="trustyai_garak",
    metadata={
        "probes": ["latentinjection.LatentJailbreak", "snowball.GraphConnectivity"],
        "timeout": 900  # 15 minutes
    }
)
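
Once registered, the custom benchmark runs like any built-in one:

# Run the registered custom benchmark
job = client.eval.run_eval(
    benchmark_id="custom",
    benchmark_config={
        "eval_candidate": {
            "type": "model",
            "model": "qwen2", # change this to your inference model name
            "sampling_params": {
                "max_tokens": 100
            },
        }
    },
)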

Shield Integration (Enhanced Mode)

# Test with input shields only
client.benchmarks.register(
    benchmark_id="PI_with_input_shield",
    dataset_id="garak", # placeholder
    scoring_functions=["garak_scoring"], # placeholder
    provider_benchmark_id="PI_with_input_shield",
    provider_id="trustyai_garak",
    metadata={
        "probes": ["promptinject.HijackHateHumans"],
        "timeout": 600,
        "shield_ids": ["Prompt-Guard-86M"]  # Applied to input only
    }
)

# Test with separate input/output shields
client.benchmarks.register(
    benchmark_id="PI_with_io_shields",
    dataset_id="garak", # placeholder
    scoring_functions=["garak_scoring"], # placeholder
    provider_benchmark_id="PI_with_io_shields",
    provider_id="trustyai_garak",
    metadata={
        "probes": ["promptinject.HijackHateHumans"],
        "timeout": 600,
        "shield_config": {
            "input": ["Prompt-Guard-86M"],
            "output": ["Llama-Guard-3-8B"]
        }
    }
)
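
To quantify shield effectiveness, run the shielded benchmark and compare its scores against an unshielded run of the same probes (a sketch; it assumes you also register a benchmark with the same probes but no shield metadata):

# Run the shielded benchmark; compare against an unshielded run
# of the same probes to measure shield effectiveness
job_shielded = client.eval.run_eval(
    benchmark_id="PI_with_io_shields",
    benchmark_config={
        "eval_candidate": {
            "type": "model",
            "model": "qwen2", # change this to your inference model name
            "sampling_params": {"max_tokens": 100},
        }
    },
)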

Job Management

# Check job status
job_status = client.eval.jobs.status(job_id=job.job_id, benchmark_id="quick")
print(f"Job status: {job_status.status}")
print(f"Running jobs: {job_status.metadata.get('running_jobs', 'N/A')}")

# Cancel a running job
client.eval.jobs.cancel(job_id=job.job_id, benchmark_id="quick")

# Get evaluation results
if job_status.status == "completed":
    results = client.eval.get_eval_job_result(job_id=job.job_id, benchmark_id="quick")
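
For long-running scans, the status check above can be wrapped in a simple polling loop (a minimal sketch; terminal status names other than "completed" are assumptions, so verify them against your stack's job status values):

import time

# Poll until the scan reaches a terminal state
# (terminal status names besides "completed" are assumptions)
while True:
    job_status = client.eval.jobs.status(job_id=job.job_id, benchmark_id="quick")
    if job_status.status in ("completed", "failed", "cancelled"):
        break
    time.sleep(30)  # scans can run for hours; poll sparingly

if job_status.status == "completed":
    results = client.eval.get_eval_job_result(job_id=job.job_id, benchmark_id="quick")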

Accessing Scan Reports

# Get file metadata
scan_report_id = job_status.metadata["scan_report_file_id"]
scan_log_id = job_status.metadata["scan_log_file_id"]
scan_html_id = job_status.metadata["scan_report_html_file_id"]

# Download files using Files API or direct HTTP calls
import requests
files_url = "http://localhost:8321/v1/openai/v1/files"
report_content = requests.get(f"{files_url}/{scan_report_id}/content")
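
The response body can be written straight to disk, for example to open the HTML report in a browser (the output filename here is just an example):

# Save the HTML report locally for viewing in a browser
html_response = requests.get(f"{files_url}/{scan_html_id}/content")
with open("garak_scan_report.html", "wb") as f:  # example filename
    f.write(html_response.content)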

Configuration Options

Provider Configuration (run.yaml / run-with-safety.yaml)

providers:
  eval:
    - provider_id: trustyai_garak
      provider_type: inline::trustyai_garak
      config:
        base_url: ${env.BASE_URL:=http://localhost:8321/v1} # llama-stack service base url
        timeout: ${env.GARAK_TIMEOUT:=10800}  # 3 hours default
        max_concurrent_jobs: ${env.GARAK_MAX_CONCURRENT_JOBS:=5}  # Concurrent scan limit
        max_workers: ${env.GARAK_MAX_WORKERS:=5}  # Shield scanning parallelism
        garak_model_type_openai: "openai.OpenAICompatible"  # For direct LLM scanning
        garak_model_type_function: "function.Single"  # For shield-integrated scanning

Environment Variables

Variable                    Default                    Description
BASE_URL                    http://localhost:8321/v1   Llama Stack service base URL
GARAK_TIMEOUT               10800                      Maximum timeout for a garak scan (seconds)
GARAK_MAX_CONCURRENT_JOBS   5                          Maximum concurrent garak scans
GARAK_MAX_WORKERS           5                          Maximum workers for shield scanning

Results Interpretation

  • Vulnerability Score: 0.0 (secure) to 1.0 (vulnerable)
  • Threshold: Scores ≥ 0.5 indicate potential security concerns (see the sketch after this list)
  • Shield Comparison: Compare scores with and without shields to measure effectiveness
  • Detailed Reports: Available via Llama Stack files API in multiple formats:
    • JSON report (scan_report_file_id)
    • HTML report (scan_report_html_file_id)
    • Detailed logs (scan_log_file_id)
    • Hit logs (scan_hitlog_file_id)
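
A minimal sketch of applying that threshold programmatically (the field names on the results object are assumptions; inspect results from the eval API to confirm them):

# Flag scoring results at or above the 0.5 vulnerability threshold
# (scores/score_rows field names are assumptions; verify on your results object)
THRESHOLD = 0.5
for scoring_fn, result in results.scores.items():
    for row in result.score_rows:
        score = row.get("score", 0.0)
        if score >= THRESHOLD:
            print(f"Potential vulnerability ({scoring_fn}): score={score}")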

Deployment Modes

Basic Mode (run.yaml)

  • Standard garak scanning against OpenAI-compatible endpoints
  • APIs: inference, eval, files
  • Best for: Basic security testing

Enhanced Mode (run-with-safety.yaml)

  • Shield-integrated scanning to test guardrailed systems
  • APIs: inference, eval, files, safety, shields, telemetry
  • Best for: Advanced security testing with defense evaluation
