
Single-pass runtime reliability instrumentation for LLM agents using token logprobs.


AgentUQ

Single-pass runtime reliability gate for LLM agents using token logprobs.

AgentUQ turns provider-native token logprobs into localized runtime decisions for agent steps. It does not claim to know whether an output is true. Instead, it tells you where a generation looked brittle or ambiguous and whether the workflow should continue, annotate the trace, regenerate a risky span, retry the step, run a dry-run verification, ask for confirmation, or block execution.
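The raw material is a per-token logprob plus the provider's top alternatives. The sketch below shows two common per-token uncertainty signals that can be derived from them: surprise (negative logprob) and the probability margin between the two most likely candidates. It is not AgentUQ's scoring, which this page does not describe. The `TokenLogprob` type, the helper names and the thresholds are all hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class TokenLogprob:
    """Hypothetical container for one generated token and its top alternatives."""
    token: str
    logprob: float  # natural-log probability of the sampled token
    top_logprobs: dict[str, float] = field(default_factory=dict)  # candidate -> logprob


def surprise(tok: TokenLogprob) -> float:
    """Negative log-probability: 0 for a certain token, larger for unlikely ones."""
    return -tok.logprob


def top2_margin(tok: TokenLogprob) -> float:
    """Probability gap between the two best candidates (1.0 if fewer than two are known)."""
    probs = sorted((math.exp(lp) for lp in tok.top_logprobs.values()), reverse=True)
    if len(probs) < 2:
        return 1.0
    return probs[0] - probs[1]


def is_brittle(tok: TokenLogprob, max_surprise: float = 1.0, min_margin: float = 0.2) -> bool:
    """Flag a token that was unlikely or nearly tied with an alternative (illustrative thresholds)."""
    return surprise(tok) > max_surprise or top2_margin(tok) < min_margin
```

A token sampled at 0.48 with a runner-up at 0.45 is flagged by the margin check, even though its surprise alone looks moderate. Such near-ties are exactly what a single whole-response score tends to hide.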

Why teams use it

  • Catch brittle action-bearing spans before execution: SQL clauses, tool arguments, selectors, URLs, paths, shell flags, and JSON leaves
  • Localize risk to the exact span that matters instead of treating the whole response as one opaque score
  • Spend expensive verification selectively by using AgentUQ as the first-pass gate
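Localizing risk means aggregating token signals over the span that matters, such as a SQL `WHERE` clause, a tool argument or a JSON leaf, instead of over the whole response. The sketch below assumes that the tokens concatenate exactly to the output text, so that character offsets can be rebuilt from token lengths. The helper names and the choice of worst-token surprise as the aggregate are illustrative and are not AgentUQ's method.

```python
def token_offsets(tokens: list[str]) -> list[tuple[int, int]]:
    """Character [start, end) offsets of each token, assuming tokens concatenate to the text."""
    offsets, pos = [], 0
    for tok in tokens:
        offsets.append((pos, pos + len(tok)))
        pos += len(tok)
    return offsets


def span_risk(tokens: list[str], logprobs: list[float], start: int, end: int) -> float:
    """Worst-token surprise (-logprob) among tokens that overlap text[start:end]."""
    worst = 0.0
    for (t_start, t_end), lp in zip(token_offsets(tokens), logprobs):
        if t_start < end and t_end > start:  # token overlaps the requested span
            worst = max(worst, -lp)
    return worst


# Hypothetical SQL generation with an uncertain literal in the WHERE clause.
tokens = ["SELECT", " *", " FROM", " users", " WHERE", " id", " =", " 42"]
logprobs = [-0.01, -0.02, -0.01, -0.05, -0.03, -0.9, -0.01, -2.3]
text = "".join(tokens)
where = text.index(" WHERE")
print(span_risk(tokens, logprobs, where, len(text)), span_risk(tokens, logprobs, 0, where))
```

The uncertain literal dominates the `WHERE` span score, while the rest of the query stays confident. Averaging over the whole response would dilute that signal.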

Install

pip install agentuq

For the OpenAI example below, also install the provider SDK:

pip install openai

For local development and contributions:

python -m venv .venv
. .venv/bin/activate
pip install -e ".[dev]"

Examples below assume the public package and import namespace agentuq.

Integration status

OpenAI Responses API is the stable integration path in the current docs. Every other documented provider, gateway, and framework integration is preview, including OpenAI Chat Completions, OpenRouter, LiteLLM, Gemini, Fireworks, Together, LangChain, LangGraph, and the OpenAI Agents SDK.

Minimal loop

from openai import OpenAI

from agentuq import Analyzer, UQConfig
from agentuq.adapters.openai_responses import OpenAIResponsesAdapter

# Request settings shared by the API call, the adapter capture, and the capability report.
request = {
    "model": "gpt-4.1-mini",
    "include": ["message.output_text.logprobs"],  # ask for per-token logprobs
    "top_logprobs": 5,  # number of alternatives returned per token
    "temperature": 0.0,
    "top_p": 1.0,
}

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.responses.create(input="Return the single word Paris.", **request)

adapter = OpenAIResponsesAdapter()
analyzer = Analyzer(UQConfig(policy="balanced", tolerance="strict"))

# Capture the step and its capability report from the same request settings, then analyze.
record = adapter.capture(response, request)
capabilities = adapter.capability_report(response, request)
result = analyzer.analyze_step(record, capabilities)

print(result.pretty())
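The adapter handles extraction for you, but it can help to see what it consumes. The sketch below assumes the Responses API output shape, where each `message` output item holds `output_text` content parts and each part has a `logprobs` list of `{token, logprob, top_logprobs}` entries. It walks the plain dict you would get from something like `response.model_dump()`. The function name `iter_token_logprobs` is hypothetical and is not part of AgentUQ.

```python
from typing import Any, Iterator


def iter_token_logprobs(payload: dict[str, Any]) -> Iterator[tuple[str, float, dict[str, float]]]:
    """Yield (token, logprob, {alternative: logprob}) for every output_text token."""
    for item in payload.get("output", []):
        if item.get("type") != "message":
            continue  # skip reasoning items, tool calls, and other non-message output
        for part in item.get("content", []):
            if part.get("type") != "output_text":
                continue
            # logprobs may be missing or None when they were not requested
            for entry in part.get("logprobs") or []:
                alternatives = {
                    alt["token"]: alt["logprob"] for alt in entry.get("top_logprobs") or []
                }
                yield entry["token"], entry["logprob"], alternatives
```

If this yields nothing for a real response, the request most likely omitted `include=["message.output_text.logprobs"]`.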

Documentation

The web docs are built with Docusaurus from the canonical Markdown in docs/ and the site app in website/.

Repo layout

  • src/agentuq: library code
  • examples: usage examples
  • tests: offline, contract, and optional live tests
  • docs: canonical documentation content
  • website: Docusaurus site and Vercel-facing app

Testing

By default, pytest runs only the offline tests:

python -m pytest

Live smoke checks are manual and opt-in:

AGENTUQ_RUN_LIVE=1 python -m pytest -m live
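A common way to wire this kind of opt-in is a `live` marker plus a `conftest.py` hook that skips marked tests unless the environment variable is set. The sketch below shows that pytest pattern and is not necessarily how AgentUQ's own suite is configured. The `live_enabled` helper is hypothetical.

```python
# conftest.py (sketch)
import os
from collections.abc import Mapping

LIVE_ENV_VAR = "AGENTUQ_RUN_LIVE"


def live_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """True only when the opt-in variable is set to "1"."""
    return env.get(LIVE_ENV_VAR) == "1"


def pytest_configure(config):
    # Register the marker so `-m live` does not raise unknown-marker warnings.
    config.addinivalue_line("markers", "live: calls real provider APIs (opt-in)")


def pytest_collection_modifyitems(config, items):
    if live_enabled():
        return  # live runs requested: leave every collected test untouched
    import pytest  # imported lazily; always available inside a pytest run

    skip_live = pytest.mark.skip(reason=f"set {LIVE_ENV_VAR}=1 to run live tests")
    for item in items:
        if "live" in item.keywords:
            item.add_marker(skip_live)
```

With this arrangement, a plain `pytest` run collects the live tests but skips them. The two switches serve different purposes: `-m live` selects the live tests, and the environment variable allows them to actually run.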

Download files


Source Distribution

agentuq-0.1.0.tar.gz (257.0 kB)

Built Distribution


agentuq-0.1.0-py3-none-any.whl (52.0 kB)

File details

Details for the file agentuq-0.1.0.tar.gz.

File metadata

  • Download URL: agentuq-0.1.0.tar.gz
  • Upload date:
  • Size: 257.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for agentuq-0.1.0.tar.gz
Algorithm     Hash digest
SHA256        7e737a054ce4d40ecd151649e0ca058f0272cf5d81550a601c2dfaba2632bd6f
MD5           71fcb47eeee594a8052cf4e127da8be4
BLAKE2b-256   626a1485325b067b12d4786762bdd85d268cb9edb00682f05eafecbe3e11e349


File details

Details for the file agentuq-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: agentuq-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 52.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for agentuq-0.1.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        93fbbaca522f67e20f725d238d0d6a30f3139b7ba9bf7684973c4ce20b704611
MD5           17ad881093d02e13ae3ad550dda49be2
BLAKE2b-256   26550b049b7133129246d7fe367657da0d128c511b55f66f8cb72256d66fbb59
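To check a downloaded file against the published SHA256 digests, hash it locally and compare. The sketch below uses only the standard library and reads the file in chunks so memory use stays bounded. The helper names are illustrative, and the digests are copied from the tables above.

```python
import hashlib
from pathlib import Path

# Published SHA256 digests for the agentuq 0.1.0 release files.
EXPECTED = {
    "agentuq-0.1.0.tar.gz": "7e737a054ce4d40ecd151649e0ca058f0272cf5d81550a601c2dfaba2632bd6f",
    "agentuq-0.1.0-py3-none-any.whl": "93fbbaca522f67e20f725d238d0d6a30f3139b7ba9bf7684973c4ce20b704611",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    """Compare a file's digest with a published one (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()


if __name__ == "__main__":
    # Check whichever release files are present in the current directory.
    for name, digest in EXPECTED.items():
        path = Path(name)
        if path.exists():
            print(name, "OK" if verify(path, digest) else "MISMATCH")
```

pip can also enforce this at install time through its hash-checking mode. Add a requirements line such as `agentuq==0.1.0 --hash=sha256:<digest>`, then install with `--require-hashes`.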

