Token usage tracking and quota enforcement middleware for AI agent pipelines

🔑 AzureAICommunity - Agent - Token Guard Middleware

Token usage tracking and quota enforcement middleware for AI agent applications built on the Agent Framework.

Track every token, enforce every limit: works with any agent-framework compatible provider, for both streaming and non-streaming calls.

Getting Started · Configuration · Usage · Contributing


Overview

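azureaicommunity-agent-token-guard is a plug-and-play token tracking and quota enforcement layer for AI agent pipelines built on agent-framework. It captures token usage per request, accumulates it against a period quota, and blocks further requests once the limit is reached. Your agent code stays the same apart from passing the middleware to agent.run. Because the quota is checked before each call, the request that crosses the limit still completes; enforcement starts with the next request.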


✨ Features

Feature
📊 Track token usage — captures input_tokens, output_tokens, total_tokens, model, and timestamp per request
🚫 Enforce quotas — blocks requests before they reach the LLM once a period limit is hit
🔔 Quota alerts — fires a callback when the limit is exceeded (log, notify, charge)
🌊 Streaming support — works with both stream=True and regular calls
📅 Period-flexible — built-in month_key, week_key, day_key or bring your own
👥 Per-user quotas — pluggable user_id_getter for multi-tenant apps
🗄️ Pluggable storage — implement QuotaStore protocol to use Redis, Postgres, etc.
🔌 Provider-agnostic — works with any agent-framework compatible LLM client

📦 Installation

pip install azureaicommunity-agent-token-guard

🚀 Quick Start

import asyncio
import json
from agent_framework import Agent
from agent_framework.ollama import OllamaChatClient
from token_guard_middleware import TokenGuardMiddleware
from token_guard_middleware.token_tracker import InMemoryQuotaStore, QuotaExceededError

def save_usage(record):
    print(json.dumps(record, indent=2))

def quota_alert(payload):
    print("QUOTA EXCEEDED:", json.dumps(payload, indent=2))

quota_store = InMemoryQuotaStore()

middleware = TokenGuardMiddleware(
    on_usage=save_usage,
    on_quota_exceeded=quota_alert,
    quota_store=quota_store,
    quota_tokens=50,          # intentionally low to show quota enforcement
)

async def main():
    client = OllamaChatClient(model="gemma3:4b")
    agent = Agent(client)

    # First call — succeeds and records ~60 tokens (exceeds quota of 50)
    try:
        result = await agent.run("Hello!", middleware=[middleware])
        print(result.text)
    except QuotaExceededError as e:
        print(f"Blocked: {e}")

    # Second call — quota already exceeded, quota_alert fires and call is blocked
    try:
        result = await agent.run("How are you?", middleware=[middleware])
        print(result.text)
    except QuotaExceededError as e:
        print(f"Blocked: {e}")

asyncio.run(main())

🧑‍💻 Usage

Usage Record

Every call to on_usage receives a dict:

{
  "user_id": "anonymous",
  "period_key": "2026-04",
  "model": "gemma3:4b",
  "input_tokens": 11,
  "output_tokens": 52,
  "total_tokens": 63,
  "quota_tokens": 50,
  "used_tokens_after_call": 63,
  "timestamp_utc": "2026-04-14T11:46:09.698893+00:00",
  "streaming": false
}
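
As an illustration of consuming this record, the sketch below keeps a running total per user and period in a plain dictionary. The `UsageLedger` class and its method names are hypothetical and not part of the package; only the record keys are taken from the example above.

```python
from collections import defaultdict


class UsageLedger:
    """Hypothetical on_usage callback that sums tokens per (user, period)."""

    def __init__(self):
        self.totals = defaultdict(int)

    def __call__(self, record: dict) -> None:
        # Key by user and period so each billing window is tracked separately.
        key = (record["user_id"], record["period_key"])
        self.totals[key] += record["total_tokens"]

    def remaining(self, record: dict) -> int:
        """Tokens left in the period after this call, floored at zero."""
        return max(record["quota_tokens"] - record["used_tokens_after_call"], 0)


ledger = UsageLedger()
sample = {
    "user_id": "anonymous",
    "period_key": "2026-04",
    "total_tokens": 63,
    "quota_tokens": 50,
    "used_tokens_after_call": 63,
}
ledger(sample)
```

Assuming the middleware accepts any callable, as the parameter type in the Configuration section suggests, an instance could be passed as `on_usage=ledger`.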

Quota Alert Payload

When the quota is exceeded, on_quota_exceeded receives a dict:

{
  "user_id": "anonymous",
  "period_key": "2026-04",
  "used_tokens": 63,
  "quota_tokens": 50,
  "reason": "quota_exceeded_before_call"
}
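
One possible handler for this payload is sketched below: it turns the dict into a single log line and reports how far over the limit the user is. The function names and the logger name are illustrative assumptions; only the payload keys come from the example above.

```python
import logging

logger = logging.getLogger("token_guard.alerts")  # hypothetical logger name


def format_quota_alert(payload: dict) -> str:
    """Render the alert payload as one log-friendly line."""
    overage = payload["used_tokens"] - payload["quota_tokens"]
    return (
        f"user={payload['user_id']} period={payload['period_key']} "
        f"used={payload['used_tokens']}/{payload['quota_tokens']} "
        f"over_by={max(overage, 0)} reason={payload['reason']}"
    )


def quota_alert(payload: dict) -> None:
    """Candidate on_quota_exceeded callback: log a warning."""
    logger.warning(format_quota_alert(payload))


message = format_quota_alert({
    "user_id": "anonymous",
    "period_key": "2026-04",
    "used_tokens": 63,
    "quota_tokens": 50,
    "reason": "quota_exceeded_before_call",
})
```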

⚙️ Configuration

TokenGuardMiddleware

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| on_usage | Callable[[dict], Any] | required | Called after every successful request with the usage record |
| quota_store | QuotaStore | required | Storage backend for accumulated token counts |
| quota_tokens | int | required | Max tokens allowed per period |
| on_quota_exceeded | Callable[[dict], Any] | None | Called when quota is hit (before raising) |
| user_id_getter | Callable[[ChatContext], str] | default_user_id_getter | Extracts user/tenant ID from context |
| period_key_fn | Callable[[], str] | month_key | Returns the current billing period key |

Period key functions

from token_guard_middleware.token_tracker import month_key, week_key, day_key

middleware = TokenGuardMiddleware(..., period_key_fn=month_key)   # Monthly (default)
middleware = TokenGuardMiddleware(..., period_key_fn=day_key)     # Daily
middleware = TokenGuardMiddleware(..., period_key_fn=week_key)    # Weekly

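# Custom, e.g. per-user-per-day.
# get_current_user_id() is a placeholder that this snippet does not define;
# supply your own helper that returns the current user's ID.
middleware = TokenGuardMiddleware(
    ...,
    period_key_fn=lambda: f"{get_current_user_id()}-{day_key()}",
)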
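
Since period_key_fn is a zero-argument callable that returns a string, a custom period can be any function of the current time. The sketch below builds an hourly key; `make_hour_key` and its key format are assumptions made for illustration, and the formats of the built-in key functions are not documented here beyond the "2026-04" monthly example.

```python
from datetime import datetime, timezone


def make_hour_key(now_fn=lambda: datetime.now(timezone.utc)):
    """Build a zero-argument key function; now_fn is injectable for tests."""

    def hour_key() -> str:
        # One bucket per UTC hour, e.g. "2026-04-14T11".
        return now_fn().strftime("%Y-%m-%dT%H")

    return hour_key


hour_key = make_hour_key()
fixed_key = make_hour_key(
    lambda: datetime(2026, 4, 14, 11, 46, tzinfo=timezone.utc)
)
```

Under these assumptions, `period_key_fn=hour_key` would reset the quota window every hour.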

Per-user quotas

def get_user_id(context):
    return context.metadata.get("user_id", "anonymous")

middleware = TokenGuardMiddleware(
    ...,
    user_id_getter=get_user_id,
)
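
A slightly more defensive getter is sketched below. It assumes, as the example above does, that the context exposes a `metadata` mapping; the `tenant_id` fallback key and the normalization step are hypothetical additions, and `SimpleNamespace` only stands in for the real context object.

```python
from types import SimpleNamespace


def get_user_id(context) -> str:
    """Return a normalized user ID, falling back to 'anonymous'."""
    metadata = getattr(context, "metadata", None) or {}
    for key in ("user_id", "tenant_id"):  # tenant_id is a hypothetical fallback
        value = metadata.get(key)
        if value:
            # Normalize so "Alice" and " alice " share one quota bucket.
            return str(value).strip().lower()
    return "anonymous"


ctx_user = SimpleNamespace(metadata={"user_id": " Alice "})
ctx_tenant = SimpleNamespace(metadata={"tenant_id": 42})
ctx_empty = SimpleNamespace(metadata=None)
```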

Custom Storage Backend

Implement the QuotaStore protocol to persist usage in Redis, Postgres, or any other store:

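import redis  # redis-py client: pip install redis
from token_guard_middleware.token_tracker import QuotaStore

class RedisQuotaStore:
    """Provides the two methods the QuotaStore protocol expects."""

    def __init__(self, client: redis.Redis):
        self.client = client

    def get_usage(self, user_id: str, period_key: str) -> int:
        # A missing key returns None, which is treated as zero usage.
        return int(self.client.get(f"{user_id}:{period_key}") or 0)

    def add_usage(self, user_id: str, period_key: str, tokens: int) -> None:
        # INCRBY is atomic, so concurrent workers do not lose updates.
        self.client.incrby(f"{user_id}:{period_key}", tokens)

store: QuotaStore = RedisQuotaStore(redis.Redis())  # defaults to localhost:6379

middleware = TokenGuardMiddleware(
    ...,
    quota_store=store,
)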
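
For a backend that needs no external service, the same two methods can be sketched on top of the standard-library sqlite3 module. `SQLiteQuotaStore`, its table name, and its schema are assumptions made for illustration; only the get_usage and add_usage signatures follow the example above.

```python
import sqlite3


class SQLiteQuotaStore:
    """Sketch of a store exposing get_usage/add_usage, backed by SQLite."""

    def __init__(self, path: str = ":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS usage ("
            "user_id TEXT, period_key TEXT, tokens INTEGER NOT NULL, "
            "PRIMARY KEY (user_id, period_key))"
        )

    def get_usage(self, user_id: str, period_key: str) -> int:
        row = self._conn.execute(
            "SELECT tokens FROM usage WHERE user_id = ? AND period_key = ?",
            (user_id, period_key),
        ).fetchone()
        return row[0] if row else 0

    def add_usage(self, user_id: str, period_key: str, tokens: int) -> None:
        # Create the row if needed, then increment, in one transaction.
        with self._conn:
            self._conn.execute(
                "INSERT OR IGNORE INTO usage VALUES (?, ?, 0)",
                (user_id, period_key),
            )
            self._conn.execute(
                "UPDATE usage SET tokens = tokens + ? "
                "WHERE user_id = ? AND period_key = ?",
                (tokens, user_id, period_key),
            )


store = SQLiteQuotaStore()
store.add_usage("anonymous", "2026-04", 11)
store.add_usage("anonymous", "2026-04", 52)
```

Passing a file path instead of the default in-memory database would let the counts survive a restart.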

⚙️ How It Works

1. Intercept  →  middleware captures the outgoing agent request
2. Check      →  quota store is queried for current period usage
3. Block      →  if quota exceeded, on_quota_exceeded fires (when set) and QuotaExceededError is raised before calling the LLM
4. Forward    →  otherwise the request proceeds to the LLM provider
5. Track      →  response token counts are extracted and written to quota store
6. Notify     →  on_usage callback fires with the full usage record
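
The steps above can be sketched as a small standalone simulation. This is not the package's implementation: `DictStore`, `fake_llm`, `guarded_call`, and the `QuotaExceeded` exception are stand-ins invented for illustration. The `used >= quota` comparison is an assumption that merely matches the Quick Start behaviour, where usage of 63 against a quota of 50 blocks the next call.

```python
import asyncio


class QuotaExceeded(Exception):
    """Stand-in for the package's QuotaExceededError."""


class DictStore:
    """Minimal in-memory store with the two quota-store methods."""

    def __init__(self):
        self.data = {}

    def get_usage(self, user_id, period_key):
        return self.data.get((user_id, period_key), 0)

    def add_usage(self, user_id, period_key, tokens):
        key = (user_id, period_key)
        self.data[key] = self.data.get(key, 0) + tokens


async def fake_llm(prompt):
    # Pretend provider: returns text plus fixed token counts.
    return {"text": "hi", "input_tokens": 11, "output_tokens": 52}


async def guarded_call(prompt, store, quota, on_usage,
                       user_id="anonymous", period="2026-04"):
    used = store.get_usage(user_id, period)            # 2. Check
    if used >= quota:                                  # 3. Block
        raise QuotaExceeded(f"{used}/{quota} tokens used")
    response = await fake_llm(prompt)                  # 4. Forward
    total = response["input_tokens"] + response["output_tokens"]
    store.add_usage(user_id, period, total)            # 5. Track
    on_usage({"user_id": user_id, "total_tokens": total,
              "used_tokens_after_call": used + total})  # 6. Notify
    return response["text"]


async def demo():
    store, records = DictStore(), []
    first = await guarded_call("Hello!", store, 50, records.append)
    try:
        await guarded_call("How are you?", store, 50, records.append)
        blocked = False
    except QuotaExceeded:
        blocked = True
    return first, blocked, records


first, blocked, records = asyncio.run(demo())
```

The first call starts at zero usage and goes through, recording 63 tokens; the second is rejected before the fake provider is reached.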

Provider Compatibility:

Works with any LLM client that implements the agent-framework ChatClient interface.


🤝 Contributing

Contributions are welcome! Please open an issue to discuss what you'd like to change before submitting a pull request.

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/my-feature)
  3. Commit your changes (git commit -m 'Add my feature')
  4. Push to the branch (git push origin feature/my-feature)
  5. Open a Pull Request

📄 License

MIT — see LICENSE for details.
