
Out-of-tree Llama Stack provider for Garak red-teaming


TrustyAI Garak: LLM Red Teaming for Llama Stack

Automated vulnerability scanning and red teaming for large language models using Garak. This project implements Garak as an external (out-of-tree) evaluation provider for Llama Stack.

What It Does

  • 🔍 Vulnerability Assessment: Red-team LLMs for prompt injection, jailbreaks, toxicity, bias, and other vulnerabilities
  • 📋 Compliance: Benchmarks aligned with the OWASP LLM Top 10 and the AVID taxonomy
  • 🛡️ Shield Testing: Measure guardrail effectiveness
  • ☁️ Cloud-Native: Runs on OpenShift AI / Kubernetes
  • 📊 Detailed Reports: JSON and HTML reports

Pick Your Deployment

| # | Mode | Server | Scans | Use Case | Guide |
|---|------|--------|-------|----------|-------|
| 1 | Total Remote | OpenShift AI | Data Science Pipelines | Production | → Setup |
| 2 | Partial Remote | Local laptop | Data Science Pipelines | Development | → Setup |
| 3 | Total Inline | Local laptop | Local laptop | Testing only | → Setup |

Installation

# For Deployment 1 (Total remote)
# No installation needed!

# For Deployment 2 (Partial remote)
pip install llama-stack-provider-trustyai-garak

# For Deployment 3 (Total inline) - requires the inline extra
pip install "llama-stack-provider-trustyai-garak[inline]"

Quick Example

from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Run a quick security scan (~5 minutes)
job = client.alpha.eval.run_eval(
    benchmark_id="trustyai_garak::quick",
    benchmark_config={
        "eval_candidate": {
            "type": "model",
            "model": "your-model-name",
            "sampling_params": {"max_tokens": 100}
        }
    }
)

# Check status
status = client.alpha.eval.jobs.status(job_id=job.job_id, benchmark_id="trustyai_garak::quick")
print(f"Status: {status.status}")

# Get results
if status.status == "completed":
    results = client.alpha.eval.get_eval_job_result(job_id=job.job_id, benchmark_id="trustyai_garak::quick")
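
Scans run asynchronously, so in practice you may want to poll until the job finishes rather than check once. A minimal sketch, assuming "completed" and "failed" are the terminal statuses (wait_for_job is a hypothetical helper, not part of the client API):

import time

# Hypothetical helper: poll until the eval job reaches a terminal state
def wait_for_job(client, job_id, benchmark_id, poll_seconds=30):
    while True:
        status = client.alpha.eval.jobs.status(job_id=job_id, benchmark_id=benchmark_id)
        if status.status in ("completed", "failed"):
            return status
        time.sleep(poll_seconds)

final_status = wait_for_job(client, job.job_id, "trustyai_garak::quick")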

Available Benchmarks

| Benchmark ID | Tests | Duration |
|--------------|-------|----------|
| trustyai_garak::owasp_llm_top10 | OWASP LLM Top 10 | ~2 hrs |
| trustyai_garak::avid_security | AVID Security | ~2 hrs |
| trustyai_garak::avid_ethics | AVID Ethics | ~10 min |
| trustyai_garak::avid_performance | AVID Performance | ~10 min |
| trustyai_garak::quick | 3 test probes | ~5 min |

Or register custom benchmarks with specific Garak probes.
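
For instance, a custom benchmark that runs a single jailbreak probe might be registered as follows (a sketch; dan.Dan_11_0 is just one example of a Garak probe id):

client.benchmarks.register(
    benchmark_id="custom_jailbreak",
    dataset_id="garak",
    scoring_functions=["garak_scoring"],
    provider_id="trustyai_garak_remote",  # or trustyai_garak_inline
    provider_benchmark_id="custom_jailbreak",
    metadata={
        "probes": ["dan.Dan_11_0"]  # example Garak probe id
    }
)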

Shield Testing Example

# Test how well guardrails (shields) block attacks
client.benchmarks.register(
    benchmark_id="with_shield",
    dataset_id="garak",
    scoring_functions=["garak_scoring"],
    provider_id="trustyai_garak_remote",  # or trustyai_garak_inline
    provider_benchmark_id="with_shield",
    metadata={
        "probes": ["promptinject.HijackHateHumans"],
        "shield_ids": ["Prompt-Guard-86M"]  # Shield to test
    }
)

job = client.alpha.eval.run_eval(
    benchmark_id="with_shield",
    benchmark_config={"eval_candidate": {"type": "model", "model": "your-model"}}
)

Compare results with/without shields to measure effectiveness.
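
One way to obtain the no-shield baseline is to register a second benchmark with the same probes but without shield_ids, then run it against the same model. A sketch reusing the calls above:

# Baseline benchmark: same probe, no shield_ids
client.benchmarks.register(
    benchmark_id="no_shield",
    dataset_id="garak",
    scoring_functions=["garak_scoring"],
    provider_id="trustyai_garak_remote",
    provider_benchmark_id="no_shield",
    metadata={"probes": ["promptinject.HijackHateHumans"]}
)

baseline_job = client.alpha.eval.run_eval(
    benchmark_id="no_shield",
    benchmark_config={"eval_candidate": {"type": "model", "model": "your-model"}}
)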

Understanding Results

Vulnerability Score

  • 0.0 = Secure (the model refused the attack)
  • 0.5 = Threshold (scores at or above this are flagged as concerning)
  • 1.0 = Vulnerable (the model was compromised)
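
For example, a hypothetical helper that applies this threshold when triaging scores:

def triage(score: float) -> str:
    # Hypothetical helper: bucket a vulnerability score
    # using the 0.5 threshold described above
    return "vulnerable" if score >= 0.5 else "secure"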

Reports Available

Access via job.metadata:

  • scan.log: Detailed log of the scan.
  • scan.report.jsonl: Report with an entry for each attempt (prompt) of each Garak probe.
  • scan.hitlog.jsonl: Report containing only the attempts that successfully compromised the model.
  • scan.avid.jsonl: scan.report.jsonl converted to the AVID (AI Vulnerability Database) format.
  • scan.report.html: Visual representation of the scan. In remote mode, this is also logged as an HTML artifact of the pipeline.

# Download HTML report
html_id = job.metadata[f"{job.job_id}_scan.report.html"]
content = client.files.content(html_id)
with open("report.html", "w") as f:
    f.write(content)
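
The other artifacts can be fetched the same way. A sketch, assuming job.metadata is dict-like and every artifact key follows the "<job_id>_<filename>" pattern shown above:

# Download the remaining scan artifacts
for name in ("scan.log", "scan.report.jsonl", "scan.hitlog.jsonl", "scan.avid.jsonl"):
    file_id = job.metadata.get(f"{job.job_id}_{name}")
    if file_id:
        with open(name, "w") as f:
            f.write(client.files.content(file_id))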

Support & Documentation
