Token usage tracking and quota enforcement middleware for AI agent pipelines

🔑 AzureAICommunity - Agent - Token Guard Middleware

Token usage tracking and quota enforcement middleware for AI agent applications built on the Agent Framework.

Track every token and enforce every limit, for any agent-framework compatible provider, in both streaming and non-streaming calls.

Overview

azureaicommunity-agent-token-guard is a plug-and-play token tracking and quota enforcement layer for AI agent pipelines built on agent-framework. It captures token usage per request, accumulates it against a period quota, and blocks further requests once the limit is reached. Your agent logic stays unchanged; you only pass the middleware to the agent call.

✨ Features

| | Feature |
|---|---|
| 📊 | Track token usage: captures `input_tokens`, `output_tokens`, `total_tokens`, model, and timestamp per request |
| 🚫 | Enforce quotas: blocks requests before they reach the LLM once a period limit is hit |
| 🔔 | Quota alerts: fires a callback when the limit is exceeded (log, notify, charge) |
| 🌊 | Streaming support: works with both `stream=True` and regular calls |
| 📅 | Period-flexible: built-in `month_key`, `week_key`, `day_key`, or bring your own |
| 👥 | Per-user quotas: pluggable `user_id_getter` for multi-tenant apps |
| 🗄️ | Pluggable storage: implement the `QuotaStore` protocol to use Redis, Postgres, etc. |
| 🔌 | Provider-agnostic: works with any `agent-framework` compatible LLM client |

📦 Installation

pip install azureaicommunity-agent-token-guard

🚀 Quick Start

import asyncio
import json
from agent_framework import Agent
from agent_framework.ollama import OllamaChatClient
from token_guard_middleware import TokenGuardMiddleware
from token_guard_middleware.token_tracker import InMemoryQuotaStore, QuotaExceededError

def save_usage(record):
    print(json.dumps(record, indent=2))

def quota_alert(payload):
    print("QUOTA EXCEEDED:", json.dumps(payload, indent=2))

quota_store = InMemoryQuotaStore()

middleware = TokenGuardMiddleware(
    on_usage=save_usage,
    on_quota_exceeded=quota_alert,
    quota_store=quota_store,
    quota_tokens=50,          # intentionally low to show quota enforcement
)

async def main():
    client = OllamaChatClient(model="gemma3:4b")
    agent = Agent(client)

    # First call — succeeds and records ~60 tokens (exceeds quota of 50)
    try:
        result = await agent.run("Hello!", middleware=[middleware])
        print(result.text)
    except QuotaExceededError as e:
        print(f"Blocked: {e}")

    # Second call — quota already exceeded, quota_alert fires and call is blocked
    try:
        result = await agent.run("How are you?", middleware=[middleware])
        print(result.text)
    except QuotaExceededError as e:
        print(f"Blocked: {e}")

asyncio.run(main())

🧑‍💻 Usage

Usage Record

Every call to on_usage receives a dict:

{
  "user_id": "anonymous",
  "period_key": "2026-04",
  "model": "gemma3:4b",
  "input_tokens": 11,
  "output_tokens": 52,
  "total_tokens": 63,
  "quota_tokens": 50,
  "used_tokens_after_call": 63,
  "timestamp_utc": "2026-04-14T11:46:09.698893+00:00",
  "streaming": false
}
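
As a rough illustration of what an `on_usage` callback could do with this record, the sketch below appends each record to a JSON Lines file and sums `total_tokens` per user and period. The helper names `make_jsonl_logger` and `totals_by_user` are hypothetical and not part of the package; only the record fields shown above are relied on.

```python
import json
from collections import defaultdict
from pathlib import Path


def make_jsonl_logger(path):
    """Return an on_usage-style callback that appends each record as one JSON line.

    Hypothetical helper for illustration; not provided by the package.
    """
    log_path = Path(path)

    def on_usage(record):
        with log_path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")

    return on_usage


def totals_by_user(path):
    """Sum total_tokens per (user_id, period_key) from a JSONL usage log."""
    totals = defaultdict(int)
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            record = json.loads(line)
            totals[(record["user_id"], record["period_key"])] += record["total_tokens"]
    return dict(totals)
```

The callback returned by `make_jsonl_logger("usage.jsonl")` could then be passed as `on_usage=` in place of `save_usage` from the Quick Start.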

Quota Alert Payload

When the quota is exceeded, on_quota_exceeded receives:

{
  "user_id": "anonymous",
  "period_key": "2026-04",
  "used_tokens": 63,
  "quota_tokens": 50,
  "reason": "quota_exceeded_before_call"
}
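
A handler for this payload might turn it into a single log line. The following is a minimal sketch that assumes only the five fields shown above; the logger name and the `format_quota_alert` helper are made up for illustration.

```python
import logging

# Hypothetical logger name; use whatever fits your application.
logger = logging.getLogger("token_guard.alerts")


def format_quota_alert(payload):
    """Render the alert payload as one line, including how far over quota the user is."""
    over_by = max(payload["used_tokens"] - payload["quota_tokens"], 0)
    return (
        f"user={payload['user_id']} period={payload['period_key']} "
        f"used={payload['used_tokens']} quota={payload['quota_tokens']} "
        f"over_by={over_by} reason={payload['reason']}"
    )


def quota_alert(payload):
    """on_quota_exceeded-style callback that logs the alert at WARNING level."""
    logger.warning(format_quota_alert(payload))
```

Keeping the formatting separate from the logging call makes the message easy to reuse for email or chat notifications.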

⚙️ Configuration

`TokenGuardMiddleware`

| Parameter | Type | Default | Description |
|---|---|---|---|
| `on_usage` | `Callable[[dict], Any]` | required | Called after every successful request with the usage record |
| `quota_store` | `QuotaStore` | required | Storage backend for accumulated token counts |
| `quota_tokens` | `int` | required | Max tokens allowed per period |
| `on_quota_exceeded` | `Callable[[dict], Any]` | `None` | Called when quota is hit (before raising) |
| `user_id_getter` | `Callable[[ChatContext], str]` | `default_user_id_getter` | Extracts user/tenant ID from context |
| `period_key_fn` | `Callable[[], str]` | `month_key` | Returns the current billing period key |

Period key functions

from token_guard_middleware.token_tracker import month_key, week_key, day_key

middleware = TokenGuardMiddleware(..., period_key_fn=month_key)   # Monthly (default)
middleware = TokenGuardMiddleware(..., period_key_fn=day_key)     # Daily
middleware = TokenGuardMiddleware(..., period_key_fn=week_key)    # Weekly

# Custom, e.g. per-user-per-day.
# get_current_user_id is a placeholder for a helper your application provides.
middleware = TokenGuardMiddleware(
    ...,
    period_key_fn=lambda: f"{get_current_user_id()}-{day_key()}",
)
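
Since `period_key_fn` is any zero-argument callable returning a string, other billing windows can be written in a few lines. The sketch below shows a quarterly and an hourly key. Both function names are hypothetical, and the use of UTC is an assumption: the source does not say which time zone the built-in key functions use.

```python
from datetime import datetime, timezone


def quarter_key(now=None):
    """Return a key such as '2026-Q2' for the quarter containing `now`.

    `now` is optional so the function can still be called with no arguments,
    which is what period_key_fn expects. UTC is an assumption.
    """
    now = now or datetime.now(timezone.utc)
    quarter = (now.month - 1) // 3 + 1
    return f"{now.year}-Q{quarter}"


def hour_key(now=None):
    """Return a key such as '2026-04-14T11' for hourly quotas (UTC assumed)."""
    now = now or datetime.now(timezone.utc)
    return now.strftime("%Y-%m-%dT%H")
```

Either function could then be passed as `period_key_fn=quarter_key` or `period_key_fn=hour_key`.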

Per-user quotas

def get_user_id(context):
    return context.metadata.get("user_id", "anonymous")

middleware = TokenGuardMiddleware(
    ...,
    user_id_getter=get_user_id,
)
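
For multi-tenant apps, the getter could combine a tenant and a user identifier. This is a sketch under assumptions: it presumes the context exposes a dict-like `metadata` attribute as in the snippet above, and the `tenant_id` key is hypothetical. `SimpleNamespace` stands in for the real context object.

```python
from types import SimpleNamespace


def tenant_user_id_getter(context):
    """Build a 'tenant:user' quota key, falling back to 'anonymous'.

    Assumes context.metadata is a dict or None; 'tenant_id' is a made-up key.
    """
    metadata = getattr(context, "metadata", None) or {}
    tenant = str(metadata.get("tenant_id", "")).strip()
    user = str(metadata.get("user_id", "")).strip() or "anonymous"
    return f"{tenant}:{user}" if tenant else user


# Stand-in context for demonstration only.
fake_context = SimpleNamespace(metadata={"tenant_id": "acme", "user_id": "u1"})
print(tenant_user_id_getter(fake_context))  # acme:u1
```

Normalizing empty or missing values in one place keeps stray keys such as an empty string out of the quota store.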

Custom Storage Backend

Implement the QuotaStore protocol to persist usage in Redis, Postgres, or any other store:

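import redis
from token_guard_middleware.token_tracker import QuotaStore

class RedisQuotaStore:
    """Satisfies the QuotaStore protocol using the redis-py client."""

    def __init__(self, client=None):
        # Defaults to a Redis server on localhost:6379.
        self._client = client or redis.Redis()

    def get_usage(self, user_id: str, period_key: str) -> int:
        # A missing key means no usage has been recorded yet.
        return int(self._client.get(f"{user_id}:{period_key}") or 0)

    def add_usage(self, user_id: str, period_key: str, tokens: int) -> None:
        # INCRBY is atomic, so concurrent requests do not lose updates.
        self._client.incrby(f"{user_id}:{period_key}", tokens)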

middleware = TokenGuardMiddleware(
    ...,
    quota_store=RedisQuotaStore(),
)
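
If no external service is available, the same two methods can be backed by SQLite from the standard library. The class below is a sketch, not something the package ships: `SQLiteQuotaStore` and the `token_usage` table are hypothetical names, and it assumes the protocol consists of exactly the synchronous `get_usage` and `add_usage` methods shown above.

```python
import sqlite3
import threading


class SQLiteQuotaStore:
    """Persist accumulated token counts in a SQLite table (illustrative only)."""

    def __init__(self, path=":memory:"):
        # One shared connection guarded by a lock, so it can be used across threads.
        self._lock = threading.Lock()
        self._conn = sqlite3.connect(path, check_same_thread=False)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS token_usage ("
            "user_id TEXT NOT NULL, "
            "period_key TEXT NOT NULL, "
            "tokens INTEGER NOT NULL, "
            "PRIMARY KEY (user_id, period_key))"
        )
        self._conn.commit()

    def get_usage(self, user_id, period_key):
        with self._lock:
            row = self._conn.execute(
                "SELECT tokens FROM token_usage WHERE user_id = ? AND period_key = ?",
                (user_id, period_key),
            ).fetchone()
        # No row yet means nothing has been used in this period.
        return row[0] if row else 0

    def add_usage(self, user_id, period_key, tokens):
        with self._lock:
            # Upsert (requires SQLite 3.24 or newer): insert, or add to the existing count.
            self._conn.execute(
                "INSERT INTO token_usage (user_id, period_key, tokens) VALUES (?, ?, ?) "
                "ON CONFLICT(user_id, period_key) "
                "DO UPDATE SET tokens = tokens + excluded.tokens",
                (user_id, period_key, tokens),
            )
            self._conn.commit()
```

Passing a file path instead of the default `":memory:"` keeps the counts across process restarts.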

⚙️ How It Works

1. Intercept  →  middleware captures the outgoing agent request
2. Check      →  quota store is queried for current period usage
3. Block      →  if quota is exceeded, calls on_quota_exceeded (if set) and raises QuotaExceededError before calling the LLM
4. Forward    →  request proceeds to the LLM provider
5. Track      →  response token counts are extracted and written to quota store
6. Notify     →  on_usage callback fires with the full usage record
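
The six steps above can be sketched as a plain function, independent of agent-framework. This is not the package's implementation: `guarded_call`, `DictQuotaStore`, `QuotaExceeded`, and the `call_llm` callable are hypothetical stand-ins, and the `>=` comparison for "quota exceeded" is an assumption.

```python
class QuotaExceeded(Exception):
    """Stand-in for the package's QuotaExceededError (illustration only)."""


class DictQuotaStore:
    """Minimal in-memory store with get_usage/add_usage methods."""

    def __init__(self):
        self._usage = {}

    def get_usage(self, user_id, period_key):
        return self._usage.get((user_id, period_key), 0)

    def add_usage(self, user_id, period_key, tokens):
        key = (user_id, period_key)
        self._usage[key] = self._usage.get(key, 0) + tokens


def guarded_call(store, user_id, period_key, quota_tokens, call_llm,
                 on_usage, on_quota_exceeded=None):
    # Steps 1-2: intercept the request and look up usage for this period.
    used = store.get_usage(user_id, period_key)

    # Step 3: block before the LLM is called (">=" is an assumption).
    if used >= quota_tokens:
        if on_quota_exceeded is not None:
            on_quota_exceeded({
                "user_id": user_id,
                "period_key": period_key,
                "used_tokens": used,
                "quota_tokens": quota_tokens,
                "reason": "quota_exceeded_before_call",
            })
        raise QuotaExceeded(f"{user_id} used {used} of {quota_tokens} tokens")

    # Step 4: forward. call_llm is a hypothetical callable returning (text, total_tokens).
    text, total_tokens = call_llm()

    # Step 5: record the tokens this call consumed.
    store.add_usage(user_id, period_key, total_tokens)

    # Step 6: notify with a (shortened) usage record.
    on_usage({
        "user_id": user_id,
        "period_key": period_key,
        "total_tokens": total_tokens,
        "quota_tokens": quota_tokens,
        "used_tokens_after_call": used + total_tokens,
    })
    return text
```

Because the check happens before the call and the count is only known afterwards, a single request can push usage past the quota, which matches the Quick Start where the first call records about 60 tokens against a quota of 50 and only the second call is blocked.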

Provider Compatibility:

Works with any LLM client that implements the agent-framework ChatClient interface.

🤝 Contributing

Contributions are welcome! Please open an issue to discuss what you'd like to change before submitting a pull request.

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/my-feature`)
3. Commit your changes (`git commit -m 'Add my feature'`)
4. Push to the branch (`git push origin feature/my-feature`)
5. Open a Pull Request

📄 License

MIT — see LICENSE for details.

Project details

This version

0.1.0

Apr 14, 2026

Download files

Source Distribution

azureaicommunity_agent_token_guard-0.1.0.tar.gz (7.4 kB view details)

Uploaded Apr 14, 2026 Source

Built Distribution

azureaicommunity_agent_token_guard-0.1.0-py3-none-any.whl (9.1 kB view details)

Uploaded Apr 14, 2026 Python 3

File details

Details for the file azureaicommunity_agent_token_guard-0.1.0.tar.gz.

File metadata

Download URL: azureaicommunity_agent_token_guard-0.1.0.tar.gz
Upload date: Apr 14, 2026
Size: 7.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.13

File hashes

Hashes for azureaicommunity_agent_token_guard-0.1.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | `e63e5e40ae8ab164566ade7b2c01b8324d2617d3172b76f9f22bab2f891bc708` |
| MD5 | `dab66501bbc2e2e1a7bd025a8b6e024d` |
| BLAKE2b-256 | `b2f2789283c7e23b98c2b5ada7dcd7f8119a785e1345bec0418fd2bda1f9af1d` |

File details

Details for the file azureaicommunity_agent_token_guard-0.1.0-py3-none-any.whl.

File metadata

Download URL: azureaicommunity_agent_token_guard-0.1.0-py3-none-any.whl
Upload date: Apr 14, 2026
Size: 9.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.13

File hashes

Hashes for azureaicommunity_agent_token_guard-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | `f986a88910a5f58edcaee7a2e77ba4d239bafa3a4111089eadbc1d470bdee38e` |
| MD5 | `aeccfd8f5efee74a3918717d26a3577a` |
| BLAKE2b-256 | `3deb3cddf920de931150b0acb072a669cc2f2f1b1b02112d20a59e56aac2024e` |
