Out-Of-Tree Llama Stack provider for Garak Red-teaming

TrustyAI Garak: LLM Red Teaming for Llama Stack

Automated vulnerability scanning and red teaming for Large Language Models using Garak. This project implements Garak as an external (out-of-tree) evaluation provider for Llama Stack.

What It Does

  • 🔍 Vulnerability Assessment: Red-team LLMs for prompt injection, jailbreaks, toxicity, bias, and other vulnerabilities
  • 📋 Compliance: OWASP LLM Top 10, AVID taxonomy benchmarks
  • 🛡️ Shield Testing: Measure guardrail effectiveness
  • ☁️ Cloud-Native: Runs on OpenShift AI / Kubernetes
  • 📊 Detailed Reports: JSON and HTML reports

Pick Your Deployment

| Mode | Llama Stack server | Garak scans | Typical use case | Guide |
|---|---|---|---|---|
| Total Remote | OpenShift/Kubernetes | KFP pipelines | Production | → Setup |
| Partial Remote | Local machine | KFP pipelines | Development | → Setup |
| Total Inline | Local machine | Local machine | Fast local testing | → Setup |
  • Feature notebook: demos/guide.ipynb
  • Metadata reference: BENCHMARK_METADATA_REFERENCE.md

Installation

# For Deployment 1 (Total remote)
## no installation needed! 

# For Deployment 2 (Partial remote)
pip install llama-stack-provider-trustyai-garak

# For Deployment 3 (local scans) - requires extra
pip install "llama-stack-provider-trustyai-garak[inline]"

Quick Workflow

from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Discover Garak provider
garak_provider = next(
    p for p in client.providers.list()
    if p.provider_type.endswith("trustyai_garak")
)
garak_provider_id = garak_provider.provider_id

# List predefined benchmarks
benchmarks = client.alpha.benchmarks.list()
print([b.identifier for b in benchmarks if b.identifier.startswith("trustyai_garak::")])

# Run a predefined benchmark
benchmark_id = "trustyai_garak::quick"
job = client.alpha.eval.run_eval(
    benchmark_id=benchmark_id,
    benchmark_config={
        "eval_candidate": {
            "type": "model",
            "model": "your-model-id",
            "sampling_params": {"max_tokens": 100},
        }
    },
)

# Poll status
status = client.alpha.eval.jobs.status(job_id=job.job_id, benchmark_id=benchmark_id)
print(status.status)

# Retrieve final result
if status.status == "completed":
    job_result = client.alpha.eval.jobs.retrieve(job_id=job.job_id, benchmark_id=benchmark_id)

Custom Benchmark Schema

Use metadata.garak_config for Garak command configuration. Provider-level runtime parameters (for example timeout, shield_ids) stay at top-level metadata.

client.alpha.benchmarks.register(
    benchmark_id="custom_promptinject",
    dataset_id="garak",
    scoring_functions=["garak_scoring"],
    provider_id=garak_provider_id,
    provider_benchmark_id="custom_promptinject",
    metadata={
        "garak_config": {
            "plugins": {
                "probe_spec": ["promptinject"]
            },
            "reporting": {
                "taxonomy": "owasp"
            }
        },
        "timeout": 900
    }
)

Update and Deep-Merge Behavior

  • To create a tuned variant of a predefined (or existing custom) benchmark, set provider_benchmark_id to the predefined (or existing custom) benchmark ID and pass overrides in metadata.
  • Provider metadata is deep-merged, so you can override only the parts you care about.
  • Predefined benchmarks are comprehensive by design. For faster exploratory runs, lower garak_config.run.soft_probe_prompt_cap to reduce prompts per probe.
client.alpha.benchmarks.register(
    benchmark_id="quick_promptinject_tuned",
    dataset_id="garak",
    scoring_functions=["garak_scoring"],
    provider_id=garak_provider_id,
    provider_benchmark_id="trustyai_garak::quick",
    metadata={
        "garak_config": {
            "plugins": {"probe_spec": ["promptinject"]},
            "system": {"parallel_attempts": 20}
        },
        "timeout": 1200
    }
)
# Faster (less comprehensive) variant of a predefined benchmark
client.alpha.benchmarks.register(
    benchmark_id="owasp_fast",
    dataset_id="garak",
    scoring_functions=["garak_scoring"],
    provider_id=garak_provider_id,
    provider_benchmark_id="trustyai_garak::owasp_llm_top10",
    metadata={
        "garak_config": {
            "run": {"soft_probe_prompt_cap": 100}
        }
    }
)
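
The deep-merge behavior described above can be pictured with a small sketch. `deep_merge` here is an illustrative reimplementation of the documented semantics, not the provider's actual code, and the `predefined` values are made up:

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge override into base; non-dict leaves are replaced."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# Hypothetical metadata of a predefined benchmark (values for illustration only)
predefined = {
    "garak_config": {
        "plugins": {"probe_spec": ["dan", "promptinject"]},
        "run": {"soft_probe_prompt_cap": 256},
    },
    "timeout": 3600,
}

# Override only the one knob you care about; everything else survives the merge
override = {"garak_config": {"run": {"soft_probe_prompt_cap": 100}}}

merged = deep_merge(predefined, override)
```

After the merge, `soft_probe_prompt_cap` is 100 while `probe_spec` and `timeout` keep their predefined values.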

Shield Testing

Use either shield_ids (all treated as input shields) or shield_config (explicit input/output mapping).

client.alpha.benchmarks.register(
    benchmark_id="with_shields",
    dataset_id="garak",
    scoring_functions=["garak_scoring"],
    provider_id=garak_provider_id,
    provider_benchmark_id="with_shields",
    metadata={
        "garak_config": {
            "plugins": {"probe_spec": ["promptinject.HijackHateHumans"]}
        },
        "shield_config": {
            "input": ["Prompt-Guard-86M"],
            "output": ["Llama-Guard-3-8B"]
        },
        "timeout": 600
    }
)
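
If you pass `shield_ids` instead, every listed shield is treated as an input shield. A hypothetical sketch of that equivalence (`normalize_shields` is illustrative only, not the provider's code):

```python
def normalize_shields(shield_ids=None, shield_config=None):
    """Express the documented rule: bare shield_ids all become input shields,
    while an explicit shield_config is used as given."""
    if shield_config is not None:
        return shield_config
    return {"input": list(shield_ids or []), "output": []}
```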

Understanding Results (_overall and TBSA)

job_result.scores contains:

  • probe-level entries (for example promptinject.HijackHateHumans)
  • synthetic _overall aggregate entry across all probes

_overall.aggregated_results can include:

  • total_attempts
  • vulnerable_responses
  • attack_success_rate
  • probe_count
  • tbsa (Tier-Based Security Aggregate, 1.0 to 5.0, higher is better)
  • version_probe_hash
  • probe_detector_pairs_contributing

TBSA is derived from probe:detector pass rates and z-score DEFCON grades, with tier-aware aggregation and weighting, giving a more meaningful overall security posture than a plain pass/fail metric.
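
Reading these fields back might look like the following. The field names follow the list above; the numeric values are made up for illustration:

```python
# Hypothetical _overall aggregate, shaped like the fields listed above
aggregated = {
    "total_attempts": 200,
    "vulnerable_responses": 18,
    "attack_success_rate": 0.09,  # vulnerable_responses / total_attempts
    "probe_count": 4,
    "tbsa": 4.2,  # 1.0 (worst) to 5.0 (best), higher is better
}

# attack_success_rate is the fraction of attempts that produced a vulnerable response
rate = aggregated["vulnerable_responses"] / aggregated["total_attempts"]
```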

Scan Artifacts

Access scan files from job metadata:

  • scan.log
  • scan.report.jsonl
  • scan.hitlog.jsonl
  • scan.avid.jsonl
  • scan.report.html

Remote mode stores prefixed keys in metadata (for example {job_id}_scan.report.html).
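
In remote mode, a small helper can pick those prefixed keys out of the job metadata. `extract_artifacts` is illustrative only, assuming metadata is a flat dict of `{job_id}_scan.*` keys as described above:

```python
def extract_artifacts(metadata: dict, job_id: str) -> dict:
    """Map artifact names (e.g. 'scan.report.html') to their stored values,
    stripping the '{job_id}_' prefix that remote mode prepends."""
    prefix = f"{job_id}_scan."
    return {
        key[len(f"{job_id}_"):]: value
        for key, value in metadata.items()
        if key.startswith(prefix)
    }
```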

Notes on Remote Cluster Resources

  • Partial remote mode needs KFP resources only.
  • Total remote mode needs full stack resources (KFP, LlamaStackDistribution, RBAC, secrets, and Postgres manifests).
  • See lsd_remote/ for full reference manifests.

Support & Documentation
