Skip to main content

Budget evaluator for agent-control -- cumulative LLM cost and token tracking

Project description

agent-control-evaluator-budget

Budget evaluator for agent-control that tracks cumulative LLM token and cost usage per scope and time window.

Install

pip install "agent-control-evaluators[budget]"

Fallback direct wheel install:

pip install agent-control-evaluator-budget

For local development:

uv pip install -e evaluators/contrib/budget

Quickstart

from agent_control_evaluator_budget.budget import (
    BudgetEvaluatorConfig,
    BudgetLimitRule,
    ModelPricing,
)

config = BudgetEvaluatorConfig(
    budget_id="support-daily",
    limits=[
        BudgetLimitRule(
            scope={"agent": "support"},
            group_by="user_id",
            window_seconds=86_400,
            limit=500,
            limit_unit="usd_cents",
        ),
        BudgetLimitRule(
            scope={"agent": "support"},
            group_by="user_id",
            window_seconds=86_400,
            limit=50_000,
            limit_unit="tokens",
        ),
    ],
    pricing={
        "gpt-4.1-mini": ModelPricing(input_per_1k=0.04, output_per_1k=0.16),
    },
    model_path="model",
    metadata_paths={
        "agent": "metadata.agent",
        "user_id": "metadata.user_id",
    },
    unknown_model_behavior="block",
)

The evaluator reads token usage from standard fields such as usage.input_tokens and usage.output_tokens. Configure token_path only when your event shape uses a custom location.

Scope and group_by

Each BudgetLimitRule has a static scope and an optional group_by field.

scope filters which events a rule applies to. A rule with scope={"agent": "support"} only applies when extracted metadata contains agent="support". An empty scope is global.

group_by creates independent buckets per extracted metadata value. The common per-user pattern is:

BudgetLimitRule(
    scope={"agent": "support"},
    group_by="user_id",
    window_seconds=86_400,
    limit=500,
    limit_unit="usd_cents",
)

With metadata_paths={"user_id": "metadata.user_id"}, each user gets a separate daily budget inside the support scope.

Budget pools

budget_id identifies the accumulated budget pool.

Evaluators with the same budget_id share accumulated spend and token totals across all evaluator instances. Each evaluator still evaluates using its own configured rules -- the shared state is the bucket (the rolling sum), not the rule set. Evaluators with different budget_id values are fully isolated.

Use stable names such as support-daily, billing-global, or tenant-acme-monthly. Avoid generating a new budget_id per request unless each request should have an isolated budget.

Pricing

ModelPricing stores cost rates in cents per 1K tokens:

ModelPricing(input_per_1k=0.04, output_per_1k=0.16)

input_per_1k is applied to input tokens. output_per_1k is applied to output tokens.

Pricing and model_path are required when any rule uses limit_unit="usd_cents". Token-only rules can omit both. If an event uses a model that is not in the pricing table and a cost rule exists, unknown_model_behavior="block" fails closed. Use "warn" to log a warning and treat the cost as 0.

Dual Ceiling Pattern

Use two evaluators when cost and token ceilings need independent control records or different budget_id pools:

cost_config = BudgetEvaluatorConfig(
    budget_id="support-cost-daily",
    limits=[
        BudgetLimitRule(
            scope={"agent": "support"},
            group_by="user_id",
            window_seconds=86_400,
            limit=500,
            limit_unit="usd_cents",
        )
    ],
    pricing={
        "gpt-4.1-mini": ModelPricing(input_per_1k=0.04, output_per_1k=0.16),
    },
    model_path="model",
    metadata_paths={"agent": "metadata.agent", "user_id": "metadata.user_id"},
)

token_config = BudgetEvaluatorConfig(
    budget_id="support-token-daily",
    limits=[
        BudgetLimitRule(
            scope={"agent": "support"},
            group_by="user_id",
            window_seconds=86_400,
            limit=50_000,
            limit_unit="tokens",
        )
    ],
    metadata_paths={"agent": "metadata.agent", "user_id": "metadata.user_id"},
)

This pattern lets cost and token budgets reset, alert, and roll out independently. A single evaluator can also contain both rules when one shared pool and one control result are sufficient.

Limitations

InMemoryBudgetStore is single-process only. State is lost on restart and is not shared across workers or pods.

Use a distributed store for production deployments that run multiple processes, multiple workers, or multiple pods.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

agent_control_evaluator_budget-7.7.0.tar.gz (25.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

File details

Details for the file agent_control_evaluator_budget-7.7.0.tar.gz.

File metadata

File hashes

Hashes for agent_control_evaluator_budget-7.7.0.tar.gz
Algorithm Hash digest
SHA256 a368a06344d65f8910b86feff0de748977a602800c58a02109449028cb811dc6
MD5 11525f54a830b6f0aceb8a89c7583478
BLAKE2b-256 e332db46a0c0472547f9449045b79ff4103dfc5a2e3d33de4de3474d691cf487

See more details on using hashes here.

File details

Details for the file agent_control_evaluator_budget-7.7.0-py3-none-any.whl.

File metadata

File hashes

Hashes for agent_control_evaluator_budget-7.7.0-py3-none-any.whl
Algorithm Hash digest
SHA256 19a28413b5968003fb3dd7a3b10120b6deeb5986da53664cc9f356408f271b47
MD5 d8a1f784368dac56ebf5341634f80f5d
BLAKE2b-256 424fe57b6eb37a85ada11cb350942bafc820594f6b52e59c047b91e64a783c48

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page