Measure the real-world error rate and dollar cost of an AI agent's decisions. OpenTelemetry-native.
agentloss
Your eval tool tells you your AI agent's hallucination rate. agentloss tells you what it
costs. An OpenTelemetry-native SDK that measures the real-world error rate and dollar
loss of an AI agent's decisions — by capturing its consequential actions in-process and
joining them to ground truth (real resolved outcomes, not an offline labeled set).
Every eval/observability tool scores quality proxies — LLM-judge, hallucination rate, task
completion. agentloss answers the question the market keeps asking and no tool measures:
what are my agent's mistakes costing, and is it safe to trust with more autonomy?
Part of ADMT (Automated Decision-Making Technology) — admt.ai.
Install
pip install agentloss
Quickstart
Instrument only the consequential action — the tool call that moves money or commits the business — not every LLM call.
from agentloss import decision, report_outcome, Decision

@decision  # bare decorator; the returned Decision is recorded
def approve_payment(invoice):
    action = run_matching(invoice)  # "approve" | "hold" | "reject"
    return Decision(action=action, value_at_risk_usd=invoice.total,
                    business_key=invoice.number, use_case="ap_3way_match")

# when the outcome resolves (correction, dispute, chargeback, audit, human review):
report_outcome(business_key="INV-1", ground_truth="duplicate-should-block",
               source="recovery_audit", realized_loss_usd=14200)
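Conceptually, the decision and its outcome meet on business_key. The sketch below illustrates that join in plain Python and does not use agentloss itself: RecordedDecision, ReportedOutcome, and join_outcomes are hypothetical stand-ins, and treating "action differs from ground_truth" as an error is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class RecordedDecision:  # hypothetical stand-in, not agentloss.Decision
    business_key: str
    action: str
    value_at_risk_usd: float


@dataclass
class ReportedOutcome:  # hypothetical stand-in for one report_outcome call
    business_key: str
    ground_truth: str
    realized_loss_usd: float = 0.0


def join_outcomes(decisions, outcomes):
    """Pair each outcome with the decision that shares its business_key."""
    by_key = {d.business_key: d for d in decisions}
    joined = []
    for outcome in outcomes:
        decision = by_key.get(outcome.business_key)
        if decision is not None:  # outcomes without a matching decision are skipped
            joined.append((decision, outcome))
    return joined


decisions = [
    RecordedDecision("INV-1", "approve", 14200.0),
    RecordedDecision("INV-2", "approve", 300.0),
    RecordedDecision("INV-3", "hold", 950.0),  # no outcome reported yet
]
outcomes = [
    ReportedOutcome("INV-1", "reject", 14200.0),  # agent approved, truth says reject
    ReportedOutcome("INV-2", "approve"),          # agent was right
]

joined = join_outcomes(decisions, outcomes)
# Assumption for this sketch: a decision is an error when action != ground_truth.
errors = [(d, o) for d, o in joined if d.action != o.ground_truth]
error_rate = len(errors) / len(joined)
realized_loss = sum(o.realized_loss_usd for _, o in errors)
print(f"resolved={len(joined)} errors={len(errors)} "
      f"rate={error_rate:.0%} loss=${realized_loss:,.0f}")
# resolved=2 errors=1 rate=50% loss=$14,200
```

Only resolved decisions enter the rate here; INV-3 has no outcome yet and is left out of both numerator and denominator.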
Already have the ground truth (the common case: a disputes or chargebacks table)? That's the default: each reported outcome is a census observation that counts toward the number, no flags needed. Join the whole table in one call:
from agentloss import record_outcomes

record_outcomes([
    {"business_key": "INV-1", "ground_truth": "reject", "source": "chargeback",
     "realized_loss_usd": 80.0},
    {"business_key": "INV-2", "ground_truth": "approve", "source": "dispute"},  # a CORRECT one
])
Report the outcomes that agreed with the agent too, not only the disputes — the rate's
denominator is reported approvals, so reporting only errors makes it read ~100%. source
is one of recovery_audit | dispute | chargeback | refund | human_queue | verification_agent.
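A small worked example of why the agreeing outcomes matter. This is plain Python with made-up numbers, not the agentloss implementation; the error_rate helper and the dict layout are assumptions for illustration.

```python
def error_rate(outcomes):
    """Share of reported outcomes where the agent's action disagreed with ground truth."""
    if not outcomes:
        raise ValueError("no outcomes reported")
    wrong = sum(1 for o in outcomes if o["action"] != o["ground_truth"])
    return wrong / len(outcomes)


# 100 resolved approvals: 4 were wrong, 96 were right (illustrative numbers).
all_resolved = (
    [{"action": "approve", "ground_truth": "reject"}] * 4
    + [{"action": "approve", "ground_truth": "approve"}] * 96
)
# What the rate sees if only the disputes are ever reported.
only_disputes = [o for o in all_resolved if o["action"] != o["ground_truth"]]

print(error_rate(all_resolved))   # 0.04
print(error_rate(only_disputes))  # 1.0
```

The same four mistakes read as 4% or 100% depending solely on whether the 96 correct outcomes were reported.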
agentloss computes the error rate by segment (with confidence intervals), realized and expected dollar loss, and the agent's incremental risk versus a baseline. Raw prompts and records stay inside your boundary; only derived metrics leave.
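To make "error rate by segment (with confidence intervals)" concrete, here is a minimal sketch using the Wilson score interval. The README does not say which interval agentloss uses, so the Wilson choice, the helper names, and the use of use_case as the segment key are all assumptions.

```python
import math
from collections import defaultdict


def wilson_interval(errors, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = errors / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def error_rate_by_segment(rows):
    """rows: iterable of (segment, is_error). Returns {segment: (rate, low, high)}."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [errors, total]
    for segment, is_error in rows:
        counts[segment][0] += int(is_error)
        counts[segment][1] += 1
    return {seg: (e / n, *wilson_interval(e, n)) for seg, (e, n) in counts.items()}


# Illustrative data: 5 errors in 100 for one segment, 2 in 20 for another.
rows = (
    [("ap_3way_match", True)] * 5 + [("ap_3way_match", False)] * 95
    + [("refunds", True)] * 2 + [("refunds", False)] * 18
)
report = error_rate_by_segment(rows)
for segment, (rate, low, high) in report.items():
    print(f"{segment}: {rate:.1%} (95% CI {low:.1%} to {high:.1%})")
```

The smaller segment gets a visibly wider interval, which is the reason to report intervals alongside the point estimate.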
Confirm the wiring — agentloss.doctor() inspects the store and catches the silent
failures in plain language (outcomes reported but none counted, only-errors reported, a loss
source that won't be summed). Or from a shell: agentloss doctor --json.
Works with your existing traces (Phoenix / Langfuse / Braintrust / OTel)
Already tracing your agent with OpenInference/OpenTelemetry? Don't re-instrument. Add a few
agentloss.* attributes to the consequential span, point agentloss at your spans, and it adds
the loss/outcome layer on top of what your tracer already emits:
from agentloss import ingest_spans, sample_and_verify, print_report
ingest_spans(your_spans) # OTel/OpenInference spans carrying agentloss.* attributes
sample_and_verify(verify_fn) # Tier A: get a number with no external labels wired
print_report() # error rate by segment + dollar loss
How it works
- Instrument consequential actions, not the whole agent. The costly events are the handful of tool calls that move money or commit state.
- Ground truth arrives late, from outside the agent: a correction, dispute, audit result,
or human review. Capture it via report_outcome, the human-review queue, and active sampling by a verification agent. These are real resolved outcomes, not an offline dataset.
- Honest statistics. Monetary-unit sampling with a target verifier budget; two-phase calibration corrects a fallible verifier's bias back to truth (with confidence intervals).
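The monetary-unit sampling named in the last bullet can be sketched as follows. This is a generic systematic MUS routine under stated assumptions, not agentloss's sampler: the function name, the budget parameter (the number of dollar units drawn, which caps how many decisions reach the verifier), and the integer dollar values are all illustrative.

```python
import random


def monetary_unit_sample(items, budget, seed=0):
    """Systematic monetary-unit sampling.

    items:  list of (key, value_usd) with positive values
    budget: number of dollar units to draw; at most this many items are selected
    Each dollar is equally likely to be hit, so an item's selection chance grows
    with its value, and any item worth at least one interval is always selected.
    """
    total = sum(value for _, value in items)
    interval = total / budget
    start = random.Random(seed).uniform(0, interval)
    points = iter(start + i * interval for i in range(budget))

    selected = []
    cumulative = 0
    point = next(points, None)
    for key, value in items:
        cumulative += value
        hit = False
        # Consume every sampling point that falls inside this item's dollar range.
        while point is not None and point < cumulative:
            hit = True
            point = next(points, None)
        if hit:
            selected.append(key)
    return selected


# Illustrative ledger: total $20,000, budget 4, so one draw per $5,000.
items = [("INV-1", 14200), ("INV-2", 300), ("INV-3", 950),
         ("INV-4", 80), ("INV-5", 4470)]
print(monetary_unit_sample(items, budget=4, seed=1))
```

Because INV-1 alone spans more than one $5,000 interval, it is verified under every seed, while small invoices are checked only occasionally; verifier effort follows the dollars at risk.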
See docs/SDK-SPEC.md for the full API, agentloss.* semantic conventions,
and the pack/adapter model.
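As a rough illustration of the two-phase calibration idea from "How it works", the sketch below uses the textbook double-sampling estimator: a fallible verifier labels every sampled decision (phase 1), a subset is audited against truth (phase 2), and the audited subset corrects the verifier's rate. Whether agentloss uses this exact estimator is not stated here, so treat the function and the numbers as assumptions; confidence intervals are left out for brevity.

```python
def calibrated_error_rate(verifier_flags, audited):
    """Correct a fallible verifier's error rate using an audited subset.

    verifier_flags: list of bool, the verifier's verdict for every sampled decision
    audited:        list of (verifier_flag, true_error) pairs for the audited subset
    """
    share_flagged = sum(verifier_flags) / len(verifier_flags)

    def true_rate_given(flag):
        truths = [truth for f, truth in audited if f == flag]
        if not truths:
            raise ValueError(f"no audited items with verifier_flag={flag}")
        return sum(truths) / len(truths)

    # P(error) = P(flag) * P(error | flag) + P(no flag) * P(error | no flag)
    return (share_flagged * true_rate_given(True)
            + (1 - share_flagged) * true_rate_given(False))


# Phase 1 (illustrative): the verifier flags 100 of 1,000 decisions as errors.
verifier_flags = [True] * 100 + [False] * 900
# Phase 2 (illustrative): audit 40 flagged (30 real errors) and 60 unflagged (3 real errors).
audited = (
    [(True, True)] * 30 + [(True, False)] * 10
    + [(False, True)] * 3 + [(False, False)] * 57
)

naive = sum(verifier_flags) / len(verifier_flags)
calibrated = calibrated_error_rate(verifier_flags, audited)
print(f"naive={naive:.3f} calibrated={calibrated:.3f}")
# naive=0.100 calibrated=0.120
```

In this example the verifier over-flags (only 75% of its flags are real) but also misses 5% of the errors it passes, so the calibrated rate ends up above the naive one.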
Try the demo
An oracle-validated harness that seeds an accounts-payable environment with known errors and
checks that agentloss recovers the true error rate and dollar loss:
python -m dogfood.run # deterministic mock, no deps
AGENTLOSS_VERIFIER_LLM=claude ANTHROPIC_API_KEY=... python -m dogfood.run
For AI coding agents
agentloss is built to be discovered and wired by coding agents: llms.txt, the
instrument-agent-reliability skill, the AGENTS.md rule, and an MCP server
(how_to_instrument, explain_attribute, validate_integration).
License
Apache-2.0.