
Out-Of-Tree Llama Stack provider for Garak Red-teaming


TrustyAI Garak: LLM Red Teaming for Llama Stack

Automated vulnerability scanning and red teaming for Large Language Models using Garak. This project implements garak as an external evaluation provider for Llama Stack.

What It Does

  • 🔍 Vulnerability Assessment: Red-team LLMs for prompt injection, jailbreaks, toxicity, bias, and other vulnerabilities
  • 📋 Compliance: OWASP LLM Top 10, AVID taxonomy benchmarks
  • 🛡️ Shield Testing: Measure guardrail effectiveness
  • ☁️ Cloud-Native: Runs on OpenShift AI / Kubernetes
  • 📊 Detailed Reports: JSON and HTML reports

Pick Your Deployment

#  Mode            Server        Scans                   Use Case      Guide
1  Total Remote    OpenShift AI  Data Science Pipelines  Production    → Setup
2  Partial Remote  Local laptop  Data Science Pipelines  Development   → Setup
3  Total Inline    Local laptop  Local laptop            Testing only  → Setup

Installation

# For Deployment 1 (Total remote)
# no installation needed!

# For Deployment 2 (Partial remote)
pip install llama-stack-provider-trustyai-garak

# For Deployment 3 (local scans) - requires extra
pip install "llama-stack-provider-trustyai-garak[inline]"

Quick Example

from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Run security scan (5 minutes)
job = client.alpha.eval.run_eval(
    benchmark_id="trustyai_garak::quick",
    benchmark_config={
        "eval_candidate": {
            "type": "model",
            "model": "your-model-name",
            "sampling_params": {"max_tokens": 100}
        }
    }
)

# Check status
status = client.alpha.eval.jobs.status(job_id=job.job_id, benchmark_id="trustyai_garak::quick")
print(f"Status: {status.status}")

# Get results
if status.status == "completed":
    results = client.alpha.eval.get_eval_job_result(job_id=job.job_id, benchmark_id="trustyai_garak::quick")
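Scans can take anywhere from minutes to hours (see the benchmark table below), so rather than checking the status once, a small polling helper is convenient. The sketch below is generic glue code, not part of the provider's API: it takes any zero-argument callable that returns the current status string, and the terminal status names other than "completed" are assumptions — adjust them to whatever your server actually reports.

```python
import time

def wait_for_job(get_status, poll_interval=15.0, timeout=3600.0,
                 terminal=("completed", "failed", "cancelled")):
    """Poll get_status() until it returns a terminal state or we time out.

    get_status: zero-argument callable returning the current status string.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still '{status}' after {timeout}s")
        time.sleep(poll_interval)
```

With the client from the quick example, usage would look like `wait_for_job(lambda: client.alpha.eval.jobs.status(job_id=job.job_id, benchmark_id="trustyai_garak::quick").status)`.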

Available Benchmarks

Benchmark ID                      Tests             Duration
trustyai_garak::owasp_llm_top10   OWASP LLM Top 10  ~2 hrs
trustyai_garak::avid_security     AVID Security     ~2 hrs
trustyai_garak::avid_ethics       AVID Ethics       ~10 min
trustyai_garak::avid_performance  AVID Performance  ~10 min
trustyai_garak::quick             3 test probes     ~5 min

Or register custom benchmarks with specific Garak probes.
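As a sketch of what a custom registration might look like, the hypothetical wrapper below packages a probe list into the metadata dict and mirrors the registration call from the shield-testing example. The helper name is ours, not part of the API, and the probe name in the usage line is just an example — `garak --list_probes` prints the full catalogue.

```python
# Hypothetical convenience wrapper around client.benchmarks.register();
# the function name and defaults are ours, not part of the provider's API.
def register_probe_benchmark(client, benchmark_id, probes, shield_ids=None,
                             provider_id="trustyai_garak_remote"):
    """Register a custom benchmark that runs a specific set of Garak probes."""
    metadata = {"probes": list(probes)}
    if shield_ids:
        metadata["shield_ids"] = list(shield_ids)
    client.benchmarks.register(
        benchmark_id=benchmark_id,
        dataset_id="garak",
        scoring_functions=["garak_scoring"],
        provider_id=provider_id,
        provider_benchmark_id=benchmark_id,
        metadata=metadata,
    )
    return metadata
```

For example, `register_probe_benchmark(client, "jailbreak_only", ["dan.Dan_11_0"])` would register a benchmark running a single DAN jailbreak probe.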

Shield Testing Example

# Test how well guardrails (shields) block attacks
client.benchmarks.register(
    benchmark_id="with_shield",
    dataset_id="garak",
    scoring_functions=["garak_scoring"],
    provider_id="trustyai_garak_remote",  # or trustyai_garak_inline
    provider_benchmark_id="with_shield",
    metadata={
        "probes": ["promptinject.HijackHateHumans"],
        "shield_ids": ["Prompt-Guard-86M"]  # Shield to test
    }
)

job = client.alpha.eval.run_eval(
    benchmark_id="with_shield",
    benchmark_config={"eval_candidate": {"type": "model", "model": "your-model"}}
)

Compare results with/without shields to measure effectiveness.
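One simple way to quantify that comparison, assuming you have already extracted per-attempt vulnerability scores (0.0–1.0, as described under "Understanding Results") from both runs, is the drop in mean vulnerability score. This is a hand-rolled metric for illustration, not something the provider computes for you:

```python
def shield_effectiveness(baseline_scores, shielded_scores):
    """Drop in mean vulnerability score once the shield is in place.

    Positive values mean the shield blocked attacks that previously
    succeeded; near zero means it made little difference.
    """
    if not baseline_scores or not shielded_scores:
        raise ValueError("both runs need at least one score")
    baseline = sum(baseline_scores) / len(baseline_scores)
    shielded = sum(shielded_scores) / len(shielded_scores)
    return baseline - shielded
```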

Understanding Results

Vulnerability Score

  • 0.0 = Secure (model refused attack)
  • 0.5 = Threshold (concerning)
  • 1.0 = Vulnerable (model was compromised)
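For scripting over many results, those bands can be mapped to labels with a tiny helper. The cut-offs below simply restate the scale above; the function and its label strings are ours, not an official API:

```python
def classify_score(score, threshold=0.5):
    """Map a 0.0-1.0 vulnerability score to the bands described above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score >= 1.0:
        return "vulnerable"
    if score >= threshold:
        return "concerning"
    return "secure"
```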

Reports Available

Access via job.metadata:

  • scan.log: Detailed log of the scan.
  • scan.report.jsonl: Report with information about each attempt (prompt) of each Garak probe.
  • scan.hitlog.jsonl: Report containing only the attempts to which the model was found vulnerable.
  • scan.avid.jsonl: scan.report.jsonl converted to the AVID (AI Vulnerability Database) format.
  • scan.report.html: Visual representation of the scan. In remote mode, this is logged as an HTML artifact of the pipeline.
# Download HTML report
html_id = job.metadata[f"{job.job_id}_scan.report.html"]
content = client.files.content(html_id)
with open("report.html", "w") as f:
    f.write(content)

Support & Documentation
