Single-pass runtime reliability instrumentation for LLM agents using token logprobs.
AgentUQ
Single-pass runtime reliability gate for LLM agents using token logprobs.
AgentUQ turns provider-native token logprobs into localized runtime decisions for agent steps. It does not claim to know whether an output is true. It tells you where a generation looked brittle or ambiguous and whether the workflow should continue, annotate the trace, regenerate a risky span, retry the step, dry-run verify, ask for confirmation, or block execution.
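To make the decision ladder concrete, here is a minimal sketch under stated assumptions. A span's risk is taken as the surprise (negative logprob) of its least likely token, and two made-up thresholds map that risk onto a subset of the actions listed above. `decide`, its thresholds, and the action strings are hypothetical illustrations. They are not AgentUQ's API or the behavior of its `policy`/`tolerance` settings.

```python
import math

# Hypothetical cut-offs, in nats of surprise (-logprob of the emitted token).
MEDIUM = -math.log(0.5)  # surprise at or above this: the token had at most 50% probability
HIGH = -math.log(0.1)    # surprise at or above this: the token had at most 10% probability


def decide(span_logprobs: list[float], action_bearing: bool, irreversible: bool = False) -> str:
    """Choose one action for a span from its most surprising token (hypothetical policy)."""
    worst = max(-lp for lp in span_logprobs)
    if worst < MEDIUM:
        return "continue"
    if not action_bearing:
        # Prose or reasoning text: record the risk, regenerating only when badly brittle.
        return "regenerate_span" if worst >= HIGH else "annotate"
    if worst < HIGH:
        return "dry_run_verify"
    # Brittle action-bearing span: hard-stop irreversible effects, otherwise ask a human.
    return "block" if irreversible else "confirm"


print(decide([-0.01, -0.05], action_bearing=True))            # confident span
print(decide([-0.02, -3.0], action_bearing=True, irreversible=True))  # brittle, destructive
```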
Why teams use it
- Catch brittle action-bearing spans before execution: SQL clauses, tool arguments, selectors, URLs, paths, shell flags, and JSON leaves
- Localize risk to the exact span that matters instead of treating the whole response as one opaque score
- Spend expensive verification selectively by using AgentUQ as the first-pass gate
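The sketch below shows why span-level localization matters and how it lets an expensive check run selectively. It is an illustration only, not AgentUQ's span detection. It assumes you already know the character ranges of the spans you care about. The `Token`/`Span` classes, the span names, and the 2.3-nat threshold (roughly a 10% token probability) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    logprob: float  # natural-log probability of the emitted token


@dataclass
class Span:
    name: str   # hypothetical label, e.g. "where_clause" or "tool_args.path"
    start: int  # character offset into the generated text (inclusive)
    end: int    # character offset (exclusive)


def span_risk(tokens: list[Token], spans: list[Span]) -> dict[str, float]:
    """Worst per-token surprise (-logprob, in nats) inside each span."""
    risks = {s.name: 0.0 for s in spans}
    offset = 0
    for tok in tokens:
        tok_start, tok_end = offset, offset + len(tok.text)
        offset = tok_end
        for s in spans:
            if tok_start < s.end and tok_end > s.start:  # token overlaps the span
                risks[s.name] = max(risks[s.name], -tok.logprob)
    return risks


def spans_to_verify(risks: dict[str, float], threshold: float = 2.3) -> list[str]:
    """Send only spans above the (hypothetical) threshold to an expensive verifier."""
    return sorted(name for name, r in risks.items() if r > threshold)


# Usage: a SQL step where only the literal in the WHERE clause was uncertain.
tokens = [Token(t, -0.01) for t in ["SELECT", " *", " FROM", " users", " WHERE", " id", " ="]]
tokens.append(Token(" 42", -3.0))
text = "".join(t.text for t in tokens)
spans = [
    Span("table", text.index("users"), text.index("users") + len("users")),
    Span("where_clause", text.index("WHERE"), len(text)),
]
risks = span_risk(tokens, spans)
mean_surprise = sum(-t.logprob for t in tokens) / len(tokens)  # about 0.38: the whole response looks calm
print(risks, spans_to_verify(risks))  # only "where_clause" exceeds the threshold
```

A single whole-response average dilutes the one brittle literal among confident boilerplate tokens. Taking the maximum per span keeps that literal visible and points verification at exactly that clause.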
Install
```bash
pip install agentuq
```
For the OpenAI example below, also install the provider SDK:
```bash
pip install openai
```
For local development and contributions:
```bash
python -m venv .venv
. .venv/bin/activate
pip install -e ".[dev]"  # quoted so shells such as zsh do not treat [dev] as a glob
```
Examples below assume the public package and import namespace agentuq.
Integration status
OpenAI Responses API is the stable integration path in the current docs. Every other documented provider, gateway, and framework integration is preview, including OpenAI Chat Completions, OpenRouter, LiteLLM, Gemini, Fireworks, Together, LangChain, LangGraph, and the OpenAI Agents SDK.
Minimal loop
```python
from openai import OpenAI

from agentuq import Analyzer, UQConfig
from agentuq.adapters.openai_responses import OpenAIResponsesAdapter

# Request settings shared by the API call, the adapter capture, and the capability report.
request_params = {
    "model": "gpt-4.1-mini",
    "include": ["message.output_text.logprobs"],  # ask the Responses API to return token logprobs
    "top_logprobs": 5,  # top-5 alternatives per token position
    "temperature": 0.0,
    "top_p": 1.0,
}

client = OpenAI()
response = client.responses.create(
    input="Return the single word Paris.",
    **request_params,
)

adapter = OpenAIResponsesAdapter()
analyzer = Analyzer(UQConfig(policy="balanced", tolerance="strict"))

# Capture the response into a record and build a capability report from the same request settings.
record = adapter.capture(response, request_params)
result = analyzer.analyze_step(
    record,
    adapter.capability_report(response, request_params),
)
print(result.pretty())
```
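The request above asks for `top_logprobs=5`, so each token position carries only the five most likely candidates. The following sketch, independent of AgentUQ's internals, shows what can and cannot be derived from such a truncated list. `topk_summary` and the example numbers are hypothetical. The only assumption about the provider is that logprobs are natural logarithms.

```python
import math


def topk_summary(top_logprobs: list[float]) -> dict[str, float]:
    """Summarize one token position from its top-k candidate logprobs (natural log)."""
    probs = sorted((math.exp(lp) for lp in top_logprobs), reverse=True)
    covered = sum(probs)            # probability mass visible in the top-k list
    tail = max(0.0, 1.0 - covered)  # mass shared by every token outside the list
    # Entropy of the renormalized top-k distribution; it ignores the tail, so it is an approximation.
    entropy = -sum((p / covered) * math.log(p / covered) for p in probs if p > 0)
    # Log-odds gap between the two most likely candidates; a small gap means a near coin-flip.
    margin = math.log(probs[0]) - math.log(probs[1]) if len(probs) > 1 else math.inf
    return {"covered": covered, "tail": tail, "topk_entropy": entropy, "margin": margin}


# Usage: a confident position vs. an ambiguous one (hypothetical numbers).
confident = topk_summary([math.log(p) for p in (0.97, 0.01, 0.005, 0.005, 0.002)])
ambiguous = topk_summary([math.log(p) for p in (0.48, 0.46, 0.03, 0.01, 0.01)])
print(confident["margin"], ambiguous["margin"])
```

Requesting more alternatives can only increase `covered`. The remaining `tail` is a reminder that any entropy computed from a truncated list is an estimate.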
Documentation
The web docs are built with Docusaurus from the canonical Markdown in docs/ and the site app in website/.
- Start here: docs/index.mdx
- Get started: docs/get-started/index.md
- Provider and framework quickstarts: docs/quickstarts/index.md
- Concepts: docs/concepts/index.md
- API reference: docs/concepts/public_api.md
- Maintainers: docs/maintainers/index.md
- Contributing: CONTRIBUTING.md
Repo layout
- src/agentuq: library code
- examples: usage examples
- tests: offline, contract, and optional live tests
- docs: canonical documentation content
- website: Docusaurus site and Vercel-facing app
Testing
By default, pytest runs only the offline tests:
```bash
python -m pytest
```
Live smoke checks are manual and opt-in:
```bash
AGENTUQ_RUN_LIVE=1 python -m pytest -m live
```
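The README does not say how the opt-in is wired up. One common pytest pattern is a `conftest.py` that registers a `live` marker and skips marked tests unless an environment variable is set. The hypothetical sketch below uses the `AGENTUQ_RUN_LIVE` variable from the command above. The helper name `live_enabled` is an assumption, and the project's actual configuration may differ.

```python
# conftest.py (hypothetical sketch; the project's real test configuration may differ)
import os

LIVE_ENV_VAR = "AGENTUQ_RUN_LIVE"


def live_enabled(environ=None) -> bool:
    """True only when the opt-in variable is set to "1"."""
    env = os.environ if environ is None else environ
    return env.get(LIVE_ENV_VAR) == "1"


def pytest_configure(config):
    # Register the marker so `-m live` does not trigger an unknown-marker warning.
    config.addinivalue_line("markers", "live: calls real provider APIs (opt-in)")


def pytest_collection_modifyitems(config, items):
    if live_enabled():
        return
    import pytest  # imported here only so the helper above can be used without pytest installed

    skip_live = pytest.mark.skip(reason=f"set {LIVE_ENV_VAR}=1 to run live tests")
    for item in items:
        if "live" in item.keywords:  # test carries the @pytest.mark.live marker
            item.add_marker(skip_live)
```

With this hook, a plain `python -m pytest` collects live tests but reports them as skipped, so no provider calls or API keys are needed in CI.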
Download files
File details
Details for the file agentuq-0.1.0.tar.gz.
File metadata
- Download URL: agentuq-0.1.0.tar.gz
- Upload date:
- Size: 257.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7e737a054ce4d40ecd151649e0ca058f0272cf5d81550a601c2dfaba2632bd6f |
| MD5 | 71fcb47eeee594a8052cf4e127da8be4 |
| BLAKE2b-256 | 626a1485325b067b12d4786762bdd85d268cb9edb00682f05eafecbe3e11e349 |
File details
Details for the file agentuq-0.1.0-py3-none-any.whl.
File metadata
- Download URL: agentuq-0.1.0-py3-none-any.whl
- Upload date:
- Size: 52.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 93fbbaca522f67e20f725d238d0d6a30f3139b7ba9bf7684973c4ce20b704611 |
| MD5 | 17ad881093d02e13ae3ad550dda49be2 |
| BLAKE2b-256 | 26550b049b7133129246d7fe367657da0d128c511b55f66f8cb72256d66fbb59 |
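To check a downloaded artifact against the SHA256 digests listed above, here is a short standard-library sketch. First fetch the files, for example with `pip download agentuq==0.1.0 --no-deps -d dist/`. The helper names `sha256_of` and `verify` are hypothetical. pip's own hash-checking mode (`--require-hashes`) is the built-in alternative when installing.

```python
import hashlib
from pathlib import Path

# SHA256 digests published above for agentuq 0.1.0.
EXPECTED_SHA256 = {
    "agentuq-0.1.0.tar.gz": "7e737a054ce4d40ecd151649e0ca058f0272cf5d81550a601c2dfaba2632bd6f",
    "agentuq-0.1.0-py3-none-any.whl": "93fbbaca522f67e20f725d238d0d6a30f3139b7ba9bf7684973c4ce20b704611",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large archives are never loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """True only if the file name is known and its digest matches the published one."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected


# Usage: print(verify(Path("dist/agentuq-0.1.0-py3-none-any.whl")))
```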