FinOps for LLM agents: budgets, counterfactual costs, and waste tracking for OpenAI and Anthropic

agenticmeter

FinOps for LLM agents. Budget caps that actually stop runaway calls. Waste decomposition that shows where the money went. Caching analysis that tells you what to fix. Agent-loop detection for multi-agent pipelines.

Drop-in for OpenAI and Anthropic. Works with LangChain. Zero runtime dependencies in core. No backend, no account, no telemetry.

from agenticmeter import Ledger
import openai

client = openai.OpenAI()

with Ledger(budget="$2.00") as ledger:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "hi"}],
    )

print(ledger.summary())

Why this exists

Every other tool in this space — tokencost, LangSmith, Langfuse, Helicone, LiteLLM's cost tracking — tells you what you spent. None of them tell you:

That 86% of your spend was waste (retries on the same prompt, truncated outputs, failed calls)
That the same workload on gpt-4o-mini would have cost 94% less
That your invoice_agent is stuck in a tool-calling loop, eating $0.30 per run before giving up
That a 2,500-token system prompt is being sent uncached on every iteration when caching would cut that input cost by 90%

agenticmeter does. It's the layer that sits between your code and your monthly OpenAI/Anthropic bill, and it turns vague "this feels expensive" feelings into specific dollar-denominated decisions.

It also enforces budgets in-process. Other tools warn you tomorrow. agenticmeter raises BudgetExceeded on the next call, the moment you cross the line — so a runaway agent stops at $2.00 instead of $200.

Install

pip install agenticmeter              # core only, zero deps
pip install agenticmeter[openai]      # + OpenAI auto-tracking
pip install agenticmeter[anthropic]   # + Anthropic auto-tracking
pip install agenticmeter[langchain]   # + LangChain callback handler
pip install agenticmeter[all]         # everything

Core has zero runtime dependencies. SDKs are optional extras — install only what you use.

Quickstart

OpenAI

from agenticmeter import Ledger
from openai import OpenAI

client = OpenAI()

with Ledger(budget="$1.00", name="my_agent") as ledger:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello!"}],
    )

print(ledger.summary())

Anthropic

from agenticmeter import Ledger
from anthropic import Anthropic

client = Anthropic()

with Ledger(budget="$1.00") as ledger:
    response = client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=100,
        messages=[{"role": "user", "content": "Hello!"}],
    )

print(ledger.summary())

LangChain (any provider it supports)

LangChain wraps responses differently than the raw SDKs, so agenticmeter provides a dedicated callback handler for it. The handler extracts token counts from LangChain's response objects and handles per-agent attribution.

from agenticmeter import Ledger
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

with Ledger(budget="$1.00") as ledger:
    cb = ledger.as_langchain_callback()

    with ledger.tag("researcher"):
        result_a = llm.invoke("research topic A", config={"callbacks": [cb]})
    with ledger.tag("writer"):
        result_b = llm.invoke("write about it", config={"callbacks": [cb]})

print(ledger.summary())

What a report looks like

Real output from a multi-agent customer support pipeline:

Ledger Summary (customer_support_agent)
=======================================
Total spent:      $0.0807 / $5.0000
Total calls:      9  (8 ok, 1 failed)
Total tokens:     13,650 in / 4,352 out

By provider:
  anthropic     $0.0420
  openai        $0.0387

By tag:
  rag_agent     $0.0480  (4 calls)
  writer        $0.0327  (2 calls)
  classifier    $0.0000  (3 calls)

Agent loops detected:
  rag_agent     16 calls, $0.0480  (context grew 22.0x: 500 -> 11,000 tokens, stuck)

Waste:            $0.0697 (86.4% of spend)
  retries       $0.0557  (11,648 tokens)
  truncated     $0.0140  (2,524 tokens)
  Top repeated prompts:
    rag_query    x4  $0.0420
    answer       x2  $0.0382
  Errors:       RateLimitError=1

Caching:          11.7% hit rate, saved $0.0001
  Read tokens:  1,800

Missed caching opportunities: ~$0.0203 potential savings
  rag_query    x4 @ ~2,500 tokens could save $0.0184
  answer       x2 @ ~1,500 tokens could save $0.0019

Cheaper alternatives (same workload):
  openai/gpt-4o-mini                         $0.0047  (save 94%)
  anthropic/claude-3-haiku-20240307          $0.0089  (save 89%)
  openai/gpt-3.5-turbo                       $0.0134  (save 83%)

Five things you can act on immediately, all from one report.

Core features

Hard budget enforcement. BudgetExceeded is raised the moment a tracked operation crosses your cap. Runaway agents fail fast instead of burning credit.
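
As a rough illustration of what in-process enforcement means, the sketch below accumulates cost in memory and raises as soon as the running total passes the cap. The `TinyBudget` class, the locally defined `BudgetExceeded` exception, and `run_until_stopped` are hypothetical stand-ins written for this example; they are not agenticmeter's implementation.

```python
class BudgetExceeded(RuntimeError):
    """Hypothetical stand-in defined here; not imported from agenticmeter."""


class TinyBudget:
    """Accumulates cost in memory and raises once the cap is crossed."""

    def __init__(self, cap_usd: float) -> None:
        self.cap_usd = cap_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        # Record the cost first, then check the cap, so the call that
        # crosses the line is the one that raises.
        self.spent_usd += cost_usd
        if self.spent_usd > self.cap_usd:
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.4f} of ${self.cap_usd:.4f}"
            )


def run_until_stopped(budget: TinyBudget, cost_per_call: float, max_calls: int) -> int:
    """Simulate a runaway loop; return how many calls completed before the cap hit."""
    completed = 0
    try:
        for _ in range(max_calls):
            budget.charge(cost_per_call)
            completed += 1
    except BudgetExceeded:
        pass
    return completed
```

With a $2.00 cap and $0.30 per call, `run_until_stopped(TinyBudget(2.00), 0.30, 1000)` stops after a handful of calls instead of running all 1,000.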

Per-agent attribution via tags. Wrap blocks with ledger.tag("name") and the report shows where the money went per agent. Essential for multi-agent pipelines.

Waste decomposition. Retries, truncations, and failures are tracked separately with dollar amounts and token counts. Top repeated prompts and error types are surfaced automatically.
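
Since retry detection is described under Limitations as text-hash based, the idea can be sketched as counting identical prompt hashes. The helper names `prompt_hash` and `repeated_prompts`, and the choice of SHA-256, are assumptions made for this illustration and may differ from what the package does internally.

```python
import hashlib
from collections import Counter


def prompt_hash(prompt: str) -> str:
    """Hash the exact prompt text (SHA-256 is an assumption for this sketch)."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def repeated_prompts(prompts: list[str]) -> dict[str, int]:
    """Return {hash: count} for prompts that were sent more than once."""
    counts = Counter(prompt_hash(p) for p in prompts)
    return {h: n for h, n in counts.items() if n > 1}
```

Because the hash covers the exact text, a prompt sent four times is grouped together, while a reworded version of the same question hashes differently and is not counted as a retry.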

Counterfactual costs / cheaper alternatives. Every report shows what the same workload would have cost on smaller or different models, sorted by biggest savings first.
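
One way to picture a counterfactual: take the token totals the workload actually used and re-price them under another model's rates. The sketch below assumes per-million-token pricing; the function names, the model names, and the prices in the usage note are placeholders for illustration, not values from prices.json.

```python
def workload_cost(
    input_tokens: int,
    output_tokens: int,
    in_price_per_m: float,
    out_price_per_m: float,
) -> float:
    """Price a fixed token workload at the given per-million-token rates (USD)."""
    return (input_tokens * in_price_per_m + output_tokens * out_price_per_m) / 1_000_000


def counterfactuals(
    input_tokens: int,
    output_tokens: int,
    prices: dict[str, tuple[float, float]],
) -> list[tuple[str, float]]:
    """Re-price the same workload under each model, cheapest first."""
    costs = {
        model: workload_cost(input_tokens, output_tokens, in_p, out_p)
        for model, (in_p, out_p) in prices.items()
    }
    return sorted(costs.items(), key=lambda item: item[1])
```

For example, `counterfactuals(1_000_000, 500_000, {"model-a": (1.0, 2.0), "model-b": (0.5, 1.0)})` lists the placeholder `model-b` before `model-a`, because it prices the same tokens at half the cost.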

Prompt-caching analysis. Realized savings from cache hits are extracted from OpenAI and Anthropic responses. Missed opportunities — repeated long prompts that aren't using caching — are flagged with dollar estimates of what caching would save.

Agent-loop detection. When a tagged agent makes many LLM calls in one ledger run, the package flags it as a likely tool-calling loop, with context-growth ratio as severity.
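
The heuristic described above can be sketched as follows, under stated assumptions: a tag is flagged once it reaches a call-count threshold, and severity is the ratio of the last call's input tokens to the first call's. The `LoopReport` type, the `detect_loops` function, and the threshold of 10 calls are hypothetical; the package's actual thresholds and output fields may differ.

```python
from dataclasses import dataclass


@dataclass
class LoopReport:
    tag: str
    calls: int
    growth_ratio: float  # last input size divided by first input size


def detect_loops(
    input_tokens_by_tag: dict[str, list[int]],
    min_calls: int = 10,  # assumed threshold, chosen only for this sketch
) -> list[LoopReport]:
    """Flag tags with many calls; rank them by how much the context grew."""
    reports = []
    for tag, input_tokens in input_tokens_by_tag.items():
        if len(input_tokens) < min_calls or input_tokens[0] <= 0:
            continue
        ratio = input_tokens[-1] / input_tokens[0]
        reports.append(LoopReport(tag, len(input_tokens), ratio))
    return sorted(reports, key=lambda r: r.growth_ratio, reverse=True)
```

A tag whose context grows from 500 to 11,000 input tokens over 16 calls would be reported with a growth ratio of 22.0, while a tag with three small calls would not be flagged.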

Three ways to integrate — context manager (recommended), decorator (@track_budget), or manual ledger.record() for providers without auto-tracking.

Three ways to use it

Context manager (recommended)

with Ledger(budget="$5.00", name="nightly_batch") as ledger:
    run_my_agent()

print(ledger.summary())

Decorator

from agenticmeter import track_budget

@track_budget("$0.50", on_complete=lambda ledger: print(ledger.summary()))
def answer_question(q):
    ...

Manual recording

For providers without native auto-tracking yet (Bedrock, Vertex AI, Gemini, custom wrappers; Azure OpenAI is only partially covered):

with Ledger(budget="$1.00") as ledger:
    response = my_bedrock_call(...)
    ledger.record(
        provider="anthropic",
        model="claude-3-5-sonnet-20241022",
        input_tokens=response["usage"]["input_tokens"],
        output_tokens=response["usage"]["output_tokens"],
    )

Provider support

Provider	Auto-tracked	LangChain	Manual record
OpenAI	✅	✅	✅
Anthropic	✅	✅	✅
Azure OpenAI	partial*	✅	✅
AWS Bedrock	—	✅	✅
Google Gemini	—	✅	✅
Cohere, Mistral, etc.	—	✅	✅

* Azure OpenAI uses the openai SDK and works via the OpenAI auto-tracker; verified for non-streaming calls.

Native auto-trackers for Bedrock, Azure, and Gemini are planned for v0.4. Until then, either instrument these providers inside your own wrapper with ledger.record(...), or use LangChain's ChatBedrock / AzureChatOpenAI / ChatGoogleGenerativeAI, which route through our callback handler.

How it compares

	agenticmeter	tokencost	LangSmith	Langfuse	LiteLLM
Cost calculation	✅	✅	✅	✅	✅
Budget enforcement in-process	✅	—	—	—	proxy only
Counterfactual costs (savings on other models)	✅	—	—	—	—
Waste decomposition (retries, truncations, failed)	✅	—	partial	partial	—
Missed caching detection	✅	—	—	—	—
Realized cache savings	✅	—	—	—	—
Agent-loop detection	✅	—	—	—	—
Per-agent tagging	✅	—	✅	✅	✅
Works offline / no account	✅	✅	—	self-host	✅
Zero core dependencies	✅	—	—	—	—

Limitations (v0.3)

Streaming responses aren't auto-tracked yet. Non-streaming create() calls only. Streaming planned for v0.4.
Bedrock, Vertex AI, and Gemini lack native trackers. Use manual ledger.record() or route via LangChain.
Prompt retry detection is text-hash based. Semantically equivalent prompts with different wording are missed.
Threading propagation is manual. When using ThreadPoolExecutor, you need ctx = contextvars.copy_context() and executor.submit(ctx.run, fn, ...) for tags to propagate. Automatic propagation is planned for v0.4.
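
The threading workaround above uses only the standard library, so it can be shown without agenticmeter itself. In this sketch, `current_tag` is a hypothetical ContextVar standing in for whatever state ledger.tag(...) sets; `read_tag` and `run_in_worker` are illustrative names.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the tag state that ledger.tag(...) would set.
current_tag = contextvars.ContextVar("current_tag", default=None)


def read_tag():
    """Return the tag visible to whichever thread runs this function."""
    return current_tag.get()


def run_in_worker(propagate: bool):
    """Set a tag in the calling thread, then read it from a worker thread."""
    current_tag.set("researcher")
    with ThreadPoolExecutor(max_workers=1) as executor:
        if propagate:
            # Snapshot the caller's context and run the task inside it.
            ctx = contextvars.copy_context()
            future = executor.submit(ctx.run, read_tag)
        else:
            # The worker runs in its own context; on most CPython builds
            # it does not see the caller's tag.
            future = executor.submit(read_tag)
        return future.result()
```

`run_in_worker(propagate=True)` returns "researcher". When submitting several tasks, take a fresh `copy_context()` for each one, because a single Context object cannot be entered by two threads at the same time.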

Programmatic API

Everything in summary() is also available as data:

ledger.total_cost                       # float, USD
ledger.remaining_budget                 # float or None
ledger.total_calls                      # int
ledger.successful_calls                 # int
ledger.failed_calls                     # int
ledger.total_input_tokens               # int
ledger.total_output_tokens              # int

ledger.by_tag()                         # dict: tag -> cost
ledger.calls_by_tag()                   # dict: tag -> count
ledger.wasted_cost                      # float
ledger.wasted_cost_by_category()        # dict: retries/truncated/failed -> cost
ledger.wasted_tokens()                  # dict: same shape, tokens
ledger.most_retried_prompts(n=5)        # list of {prompt_hash, calls, cost}
ledger.errors_by_type()                 # dict: ErrorClass -> count

ledger.cache_hit_rate()                 # 0.0 to 1.0
ledger.realized_cache_savings()         # float, USD
ledger.caching_summary()                # dict
ledger.missed_caching_opportunities()   # list of actionable opportunities

ledger.detect_agent_loops()             # list of {tag, calls, growth_ratio, ...}
ledger.savings_opportunities()          # list of cheaper alternatives
ledger.counterfactuals()                # dict: model -> cost for this workload
ledger.cheapest_alternative()           # (provider, model, cost) tuple

ledger.to_dict()                        # full structured data
ledger.to_json()                        # JSON string

Pricing data

Prices live in src/agenticmeter/prices.json. Standard non-batch rates. Cache rates included: OpenAI cached reads at 50% of input, Anthropic at 10% read / 125% write.
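
To make the cache multipliers concrete, the sketch below prices a repeated prompt prefix with and without caching. The function name `prefix_cost` is hypothetical, and the $1.00 per million input tokens in the usage note is a placeholder rather than a real price. It also assumes the first call pays the cache-write rate and every later call pays the cache-read rate, which simplifies real provider behavior such as cache expiry.

```python
def prefix_cost(
    tokens: int,
    calls: int,
    price_per_m: float,
    read_mult: float,
    write_mult: float = 1.0,
) -> tuple[float, float]:
    """Return (uncached_cost, cached_cost) in USD for a repeated prompt prefix.

    read_mult and write_mult are fractions of the normal input price,
    e.g. 0.10 and 1.25 for the Anthropic rates quoted above.
    """
    base = tokens * price_per_m / 1_000_000  # one uncached send of the prefix
    uncached = calls * base
    # Assumption: one cache write on the first call, cache reads afterwards.
    cached = base * write_mult + (calls - 1) * base * read_mult
    return uncached, cached
```

For a 2,500-token prefix sent four times at the placeholder price, `prefix_cost(2500, 4, 1.0, read_mult=0.10, write_mult=1.25)` gives roughly $0.0100 uncached against $0.0039 cached. With the OpenAI rate, `prefix_cost(2500, 4, 1.0, read_mult=0.50)` gives roughly $0.0100 against $0.0063.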

To add a model or update a price, edit the JSON — no code changes.

License

MIT.

Status

Alpha (v0.3.0). APIs may shift. Tested on real multi-agent pipelines. If you hit rough edges, file an issue; reports are useful.

Roadmap

See CHANGELOG.md for what's shipped. Coming in v0.4:

Streaming response auto-tracking
Native Bedrock, Azure, and Gemini trackers
Custom-wrapper @track_llm_call decorator
Automatic ContextVar propagation through ThreadPoolExecutor and asyncio
CLI for retrospective analysis of saved runs

This version

0.3.0

May 28, 2026

Download files

Source Distribution

agenticmeter-0.3.0.tar.gz (22.8 kB view details)

Uploaded May 28, 2026 Source

Built Distribution

agenticmeter-0.3.0-py3-none-any.whl (25.7 kB view details)

Uploaded May 28, 2026 Python 3

File details

Details for the file agenticmeter-0.3.0.tar.gz.

File metadata

Download URL: agenticmeter-0.3.0.tar.gz
Upload date: May 28, 2026
Size: 22.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for agenticmeter-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`fa1fac8a5d56741be2e33b225a53ae7f79317076beb2ed40c331707b2019dd7e`
MD5	`abe7cc7b3462b5cf3ad9d4300c5dd52e`
BLAKE2b-256	`a4b2e9218ac48bb339dd8325e4b7bc5c1191b073b1b89cdd55a9f906f4b72fd2`

File details

Details for the file agenticmeter-0.3.0-py3-none-any.whl.

File metadata

Download URL: agenticmeter-0.3.0-py3-none-any.whl
Upload date: May 28, 2026
Size: 25.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for agenticmeter-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d9197e45fdec5515515ac1383031f267fa476d95cdddb47ccec829f78391d78e`
MD5	`1d417ad7841e791a07a6551a6113b831`
BLAKE2b-256	`cdba17dc8ae12042893badaaa7a7233e77c11a89f8d2734fc60aa7df9e94e716`
