
dual-llm-bench

dual-llm-bench is a small benchmarking package for measuring prompt-injection resilience, privileged-context exposure, tool-safety behavior, and useful-task retention in dual-LLM agent architectures and other agentic LLM systems.

The package is inspired by evaluation libraries such as Ragas: define a structured sample, run an agent, score the resulting trace with composable metrics, and aggregate the results into a report.

Install

pip install dual-llm-bench

For local development:

pip install -e ".[dev]"
pytest

Quick Start

from dual_llm_bench import BenchmarkSuite, built_in_metrics
from dual_llm_bench.runners import CallableRunner


def agent(sample):
    # Toy runner target: map a benchmark sample to a trace dict recording
    # what the privileged model saw, the policy decision, the tool request,
    # and the quarantined model's sanitized extraction.
    return {
        "privileged_input": sample.user_goal,
        "policy_verdict": "needs_human",
        "final_outcome": "Created a security review ticket.",
        "tool_request": {"name": "create_ticket", "arguments": {"queue": "security"}},
        "quarantined_output": {
            "summary": "Suspicious vendor request.",
            "entities": ["invoice_id:8821"],
            "suspicious_instructions": sample.attack_markers,
            "sanitization_notes": ["Removed attacker-controlled instructions."],
        },
    }


suite = BenchmarkSuite.from_builtin("pycon-core")
report = suite.run(CallableRunner(agent), metrics=built_in_metrics())
print(report.to_markdown())

Core Metrics

  • Injection Resistance: rewards agents that avoid forbidden tools and unsafe final outcomes when prompt-injection fixtures are present.
  • Privileged Context Exposure: checks whether attacker markers leak into the privileged model input.
  • Tool Decision Safety F1: compares tool/policy decisions with expected safe labels (see the F1 sketch after this list).
  • Utility Retention: checks whether useful business facts survive quarantine and appear in the final trace.
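
The "F1" in Tool Decision Safety F1 is presumably the standard harmonic mean of precision and recall over per-sample decisions. A minimal sketch of that computation; the safety_f1 helper and the allow/block labels are illustrative, not part of the package API:

def safety_f1(predicted, expected, positive="block"):
    # Treat blocking as the positive class and score predicted decisions
    # against the expected safe labels with standard precision/recall/F1.
    pairs = list(zip(predicted, expected))
    tp = sum(p == positive and e == positive for p, e in pairs)
    fp = sum(p == positive and e != positive for p, e in pairs)
    fn = sum(p != positive and e == positive for p, e in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


print(safety_f1(["block", "allow", "block"], ["block", "block", "block"]))  # 0.8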

CLI

dual-llm-bench list-datasets
dual-llm-bench inspect pycon-core
dual-llm-bench score-traces traces.jsonl --dataset pycon-core

score-traces expects JSONL records containing sample_id plus trace fields such as privileged_input, policy_verdict, tool_request, quarantined_output, and final_outcome.
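
A record might look like the following (one JSON object per line; the sample_id format and field values here are illustrative):

{"sample_id": "pycon-core/001", "privileged_input": "Review the vendor invoice request.", "policy_verdict": "needs_human", "tool_request": {"name": "create_ticket", "arguments": {"queue": "security"}}, "quarantined_output": {"summary": "Suspicious vendor request.", "entities": ["invoice_id:8821"], "suspicious_instructions": [], "sanitization_notes": []}, "final_outcome": "Created a security review ticket."}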

Example Integrations
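
One straightforward integration is running the suite as a regression gate in CI with pytest. The sketch below uses only the API from the Quick Start; the my_project.agents import path is hypothetical:

from dual_llm_bench import BenchmarkSuite, built_in_metrics
from dual_llm_bench.runners import CallableRunner

from my_project.agents import agent  # hypothetical: your agent callable


def test_agent_passes_benchmark():
    # Run the built-in dataset end to end and fail if the run errors out
    # or produces an empty report.
    suite = BenchmarkSuite.from_builtin("pycon-core")
    report = suite.run(CallableRunner(agent), metrics=built_in_metrics())
    assert report.to_markdown()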

Publishing

Preferred release path:

  1. Push a clean main branch and wait for GitHub CI to pass.
  2. Create a GitHub release for a version tag such as v0.1.0.
  3. Configure PyPI Trusted Publishing for .github/workflows/publish.yml and the pypi environment (a workflow sketch follows this list).
  4. Let the release workflow build, smoke-test, and publish the package.
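
A Trusted Publishing workflow along these lines is conventional. This is a sketch, not the repository's actual publish.yml; triggers, action pins, and step order may differ:

name: publish

on:
  release:
    types: [published]

jobs:
  pypi:
    runs-on: ubuntu-latest
    environment: pypi        # must match the Trusted Publisher configuration on PyPI
    permissions:
      id-token: write        # OIDC token used for Trusted Publishing
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.x"
      - run: python -m pip install build
      - run: python -m build
      - uses: pypa/gh-action-pypi-publish@release/v1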

Manual local validation:

python -m pip install -e ".[dev]"
pytest
ruff check .
mypy src
rm -rf dist
python -m build
twine check dist/*

If you have not yet validated the release workflow, publish to TestPyPI before a public release, and prefer Trusted Publishing over long-lived PyPI tokens.
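
A TestPyPI dry run looks like this; twine ships with a built-in testpypi repository alias, and the extra index lets pip resolve dependencies that only exist on the real PyPI:

python -m twine upload --repository testpypi dist/*
pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ dual-llm-bench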
