Measure the real-world error rate and dollar cost of an AI agent's decisions. OpenTelemetry-native.

agentloss

Your eval tool tells you your AI agent's hallucination rate. agentloss tells you what it costs. An OpenTelemetry-native SDK that measures the real-world error rate and dollar loss of an AI agent's decisions — by capturing its consequential actions in-process and joining them to ground truth (real resolved outcomes, not an offline labeled set).

Every eval/observability tool scores quality proxies — LLM-judge, hallucination rate, task completion. agentloss answers the question the market keeps asking and no tool measures: what are my agent's mistakes costing, and is it safe to trust with more autonomy?

Part of ADMT (Automated Decision-Making Technology) — admt.ai.

Install

pip install agentloss

Quickstart

Instrument only the consequential action — the tool call that moves money or commits the business — not every LLM call.

from agentloss import decision, report_outcome, Decision

@decision
def approve_payment(invoice):
    action = run_matching(invoice)            # "approve" | "hold" | "reject"
    return Decision(action=action, value_at_risk_usd=invoice.total,
                    business_key=invoice.number, use_case="ap_3way_match")

# when the outcome resolves (correction, dispute, audit, human review):
report_outcome(business_key="INV-1", ground_truth="duplicate-should-block",
               source="recovery_audit", realized_loss_usd=14200)

agentloss computes the error rate by segment (with confidence intervals), realized and expected dollar loss, and the agent's incremental risk relative to a baseline. Raw prompts and records stay inside your boundary; only derived metrics leave.
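As a rough illustration of what "error rate by segment (with confidence intervals)" can mean, here is a minimal standalone sketch. It does not use agentloss. The Wilson score interval is one common choice and is an assumption here, since this page does not say which interval the SDK uses. The names `wilson_interval` and `summarize` are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def wilson_interval(errors, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n == 0:
        return (0.0, 1.0)
    p = errors / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def summarize(resolved):
    """Aggregate resolved decisions per segment.

    resolved: iterable of (segment, is_error, realized_loss_usd) tuples.
    Returns {segment: {"error_rate", "ci95", "realized_loss_usd"}}.
    """
    acc = defaultdict(lambda: {"n": 0, "errors": 0, "loss_usd": 0.0})
    for segment, is_error, loss in resolved:
        stats = acc[segment]
        stats["n"] += 1
        stats["errors"] += int(is_error)
        stats["loss_usd"] += loss
    report = {}
    for segment, stats in acc.items():
        report[segment] = {
            "error_rate": stats["errors"] / stats["n"],
            "ci95": wilson_interval(stats["errors"], stats["n"]),
            "realized_loss_usd": stats["loss_usd"],
        }
    return report


# Hypothetical resolved outcomes: one costly error out of four decisions.
rows = [
    ("ap_3way_match", True, 14200.0),
    ("ap_3way_match", False, 0.0),
    ("ap_3way_match", False, 0.0),
    ("ap_3way_match", False, 0.0),
]
report = summarize(rows)
```

With only four resolved decisions the interval is wide, which is why the point estimate is reported together with its interval.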

How it works

  • Instrument consequential actions, not the whole agent. The costly events are the handful of tool calls that move money or commit state.
  • Ground truth arrives late, from outside the agent: a correction, dispute, audit result, or human review. Capture it via report_outcome, the human-review queue, and active sampling plus a verification agent. These are real resolved outcomes, not an offline dataset.
  • Honest statistics. Monetary-unit sampling with a target verifier budget; two-phase calibration corrects for a fallible verifier's bias and reports the corrected estimate with confidence intervals.
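To make "monetary-unit sampling with a target verifier budget" concrete, the following is a minimal sketch of the textbook systematic variant: every dollar of value at risk is a sampling unit, so larger decisions are more likely to be picked for verification. It is written under stated assumptions and is not taken from the agentloss source. The function name `monetary_unit_sample` and its parameters are hypothetical.

```python
import random


def monetary_unit_sample(items, budget, seed=0):
    """Systematic monetary-unit sampling.

    items:  list of (key, value_usd) pairs with value_usd > 0.
    budget: number of verifications we can afford.
    Returns the keys selected, each at most once. An item worth more than
    the sampling interval is always selected.
    """
    total = sum(value for _, value in items)
    if budget <= 0 or total <= 0:
        return []
    interval = total / budget
    # Random start inside the first interval, then one target dollar per interval.
    start = random.Random(seed).random() * interval
    targets = [start + i * interval for i in range(budget)]

    selected = []
    cumulative = 0.0
    t = 0
    for key, value in items:
        cumulative += value
        hit = False
        # Consume every target dollar that falls inside this item.
        while t < len(targets) and targets[t] < cumulative:
            hit = True
            t += 1
        if hit:
            selected.append(key)
    return selected


# Hypothetical decisions keyed by invoice number, valued by dollars at risk.
decisions = [("INV-1", 100.0), ("INV-2", 900.0), ("INV-3", 10.0), ("INV-4", 5000.0)]
to_verify = monetary_unit_sample(decisions, budget=3)
```

Here INV-4 exceeds the sampling interval (6010 / 3), so it is always sent for verification, while the small invoices are picked only occasionally.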

See docs/SDK-SPEC.md for the full API, agentloss.* semantic conventions, and the pack/adapter model.
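The two-phase calibration mentioned under "How it works" can be illustrated with a classic double-sampling estimator. A fallible verifier labels every item (phase 1), a smaller audited subsample gets true labels (phase 2), and the audited error rates within each verifier verdict are reweighted by how often that verdict occurs. This is a sketch of the general technique, not the SDK's implementation. The name `calibrated_error_rate` is hypothetical, and confidence intervals are omitted for brevity.

```python
def calibrated_error_rate(verifier_flags, audited):
    """Bias-corrected error rate from two-phase (double) sampling.

    verifier_flags: verdicts of the fallible verifier for all items (phase 1),
                    True meaning "verifier thinks this decision was an error".
    audited:        (verifier_flag, true_error) pairs for the audited
                    subsample (phase 2).
    """
    n = len(verifier_flags)
    if n == 0:
        raise ValueError("phase 1 is empty")
    share_flagged = sum(bool(f) for f in verifier_flags) / n

    estimate = 0.0
    for flag, share in ((True, share_flagged), (False, 1.0 - share_flagged)):
        if share == 0.0:
            continue  # verdict never occurs, contributes nothing
        truths = [bool(t) for f, t in audited if bool(f) == flag]
        if not truths:
            raise ValueError(f"no audited items with verifier flag {flag}")
        # P(verdict) * P(true error | verdict), the latter estimated from the audit
        estimate += share * (sum(truths) / len(truths))
    return estimate


# Hypothetical data: the verifier flags 20 of 100 decisions.
flags = [True] * 20 + [False] * 80
# Audit of 10 flagged items finds 8 real errors; of 10 unflagged items, 1 real error.
audit = ([(True, True)] * 8 + [(True, False)] * 2
         + [(False, True)] * 1 + [(False, False)] * 9)
corrected = calibrated_error_rate(flags, audit)
```

In this made-up example the verifier's raw flag rate is 0.20, while the corrected estimate is 0.2 × 0.8 + 0.8 × 0.1 = 0.24, because the verifier misses some real errors.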

Try the demo

An oracle-validated harness that seeds an accounts-payable environment with known errors and checks that agentloss recovers the true error rate and dollar loss:

python -m dogfood.run                                  # deterministic mock, no deps
AGENTLOSS_VERIFIER_LLM=claude ANTHROPIC_API_KEY=... python -m dogfood.run

For AI coding agents

agentloss is built to be discovered and wired by coding agents: llms.txt, the instrument-agent-reliability skill, the AGENTS.md rule, and an MCP server (how_to_instrument, explain_attribute, validate_integration).

License

Apache-2.0.

Download files

Source Distribution

agentloss-0.0.1.tar.gz (14.2 kB)

Uploaded Source

Built Distribution

agentloss-0.0.1-py3-none-any.whl (14.2 kB)

Uploaded Python 3

File details

Details for the file agentloss-0.0.1.tar.gz.

File metadata

  • Download URL: agentloss-0.0.1.tar.gz
  • Size: 14.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for agentloss-0.0.1.tar.gz
Algorithm Hash digest
SHA256 24796591657ecb3837b144aca4e07a08c1dc88e8535dcb12dd7add7710b25fd5
MD5 e761abbdabfb2554430a30d8533e0b44
BLAKE2b-256 293484a20361a88dbc731e3b87905554c281b47905fb58e953572c828eeb4d1f
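A downloaded file can be checked against the SHA256 digest listed above with Python's standard `hashlib`. The sketch below is generic. The helper names `sha256_of` and `verify` are hypothetical, and the local path in the final comment is an assumption about where the file was saved.

```python
import hashlib

# SHA256 digest listed above for agentloss-0.0.1.tar.gz
EXPECTED_SHA256 = "24796591657ecb3837b144aca4e07a08c1dc88e8535dcb12dd7add7710b25fd5"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()


# Example (assumes the sdist was saved in the current directory):
# verify("agentloss-0.0.1.tar.gz")
```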

Provenance

The following attestation bundles were made for agentloss-0.0.1.tar.gz:

Publisher: publish.yml on ADMT-ai/agentloss

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file agentloss-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: agentloss-0.0.1-py3-none-any.whl
  • Size: 14.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for agentloss-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 30a7ee98695abafb97b49fe0e9f29f125eec446ec710b995b35d5870556da6f8
MD5 c22a6df88b3ab802c22bbe3917908537
BLAKE2b-256 6432b899d1977c7d3ead8befe57d3e391566e928b206991c92982120ad2aaed4

Provenance

The following attestation bundles were made for agentloss-0.0.1-py3-none-any.whl:

Publisher: publish.yml on ADMT-ai/agentloss
