Token usage tracking and quota enforcement middleware for AI agent pipelines
🔑 AzureAICommunity - Agent - Token Guard Middleware
Token usage tracking and quota enforcement middleware for AI agent applications built on the Agent Framework.
Track every token and enforce every limit. Works with any agent-framework compatible provider, for both streaming and non-streaming calls.
Overview
azureaicommunity-agent-token-guard is a plug-and-play token tracking and quota enforcement layer for AI agent pipelines built on agent-framework. It captures token usage per request, accumulates it against a period quota, and blocks future requests once the limit is hit — with zero changes to your existing agent code.
✨ Features
| | Feature |
|---|---|
| 📊 | Track token usage — captures input_tokens, output_tokens, total_tokens, model, and timestamp per request |
| 🚫 | Enforce quotas — blocks requests before they reach the LLM once a period limit is hit |
| 🔔 | Quota alerts — fires a callback when the limit is exceeded (log, notify, charge) |
| 🌊 | Streaming support — works with both stream=True and regular calls |
| 📅 | Period-flexible — built-in month_key, week_key, day_key or bring your own |
| 👥 | Per-user quotas — pluggable user_id_getter for multi-tenant apps |
| 🗄️ | Pluggable storage — implement QuotaStore protocol to use Redis, Postgres, etc. |
| 🔌 | Provider-agnostic — works with any agent-framework compatible LLM client |
📦 Installation
pip install azureaicommunity-agent-token-guard
🚀 Quick Start
import asyncio
import json

from agent_framework import Agent
from agent_framework.ollama import OllamaChatClient
from token_guard_middleware import TokenGuardMiddleware
from token_guard_middleware.token_tracker import InMemoryQuotaStore, QuotaExceededError

def save_usage(record):
    print(json.dumps(record, indent=2))

def quota_alert(payload):
    print("QUOTA EXCEEDED:", json.dumps(payload, indent=2))

quota_store = InMemoryQuotaStore()
middleware = TokenGuardMiddleware(
    on_usage=save_usage,
    on_quota_exceeded=quota_alert,
    quota_store=quota_store,
    quota_tokens=50,  # intentionally low to show quota enforcement
)

async def main():
    client = OllamaChatClient(model="gemma3:4b")
    agent = Agent(client)

    # First call: succeeds and records ~60 tokens (exceeds quota of 50)
    try:
        result = await agent.run("Hello!", middleware=[middleware])
        print(result.text)
    except QuotaExceededError as e:
        print(f"Blocked: {e}")

    # Second call: quota already exceeded, quota_alert fires and call is blocked
    try:
        result = await agent.run("How are you?", middleware=[middleware])
        print(result.text)
    except QuotaExceededError as e:
        print(f"Blocked: {e}")

asyncio.run(main())
🧑‍💻 Usage
Usage Record
Every call to on_usage receives a dict:
{
  "user_id": "anonymous",
  "period_key": "2026-04",
  "model": "gemma3:4b",
  "input_tokens": 11,
  "output_tokens": 52,
  "total_tokens": 63,
  "quota_tokens": 50,
  "used_tokens_after_call": 63,
  "timestamp_utc": "2026-04-14T11:46:09.698893+00:00",
  "streaming": false
}
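Because on_usage is just a callable that receives this dict, any object with a matching `__call__` can act as the sink. The sketch below is a hypothetical `UsageLedger` helper (not part of the package) that sums `total_tokens` per user and period and reports what is left of the quota. It relies only on the field names shown in the record above.

```python
from collections import defaultdict


class UsageLedger:
    """Hypothetical on_usage sink: sums tokens per (user_id, period_key)."""

    def __init__(self):
        self.totals = defaultdict(int)

    def __call__(self, record: dict) -> None:
        key = (record["user_id"], record["period_key"])
        self.totals[key] += record["total_tokens"]

    @staticmethod
    def remaining(record: dict) -> int:
        # Tokens left in the period after this call, floored at zero.
        return max(record["quota_tokens"] - record["used_tokens_after_call"], 0)


ledger = UsageLedger()
sample = {
    "user_id": "anonymous",
    "period_key": "2026-04",
    "model": "gemma3:4b",
    "input_tokens": 11,
    "output_tokens": 52,
    "total_tokens": 63,
    "quota_tokens": 50,
    "used_tokens_after_call": 63,
    "timestamp_utc": "2026-04-14T11:46:09.698893+00:00",
    "streaming": False,
}
ledger(sample)
print(dict(ledger.totals), UsageLedger.remaining(sample))
```

An instance could then be passed as `on_usage=ledger`, assuming the middleware calls it with one positional dict as the configuration table suggests.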
Quota Alert Payload
When the quota is exceeded, on_quota_exceeded receives:
{
  "user_id": "anonymous",
  "period_key": "2026-04",
  "used_tokens": 63,
  "quota_tokens": 50,
  "reason": "quota_exceeded_before_call"
}
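A handler for this payload can be as small as a formatted log line. The following is a minimal sketch that uses only the payload fields shown above; the logger name `token_guard.alerts` and the helper `format_quota_alert` are illustrative choices, not names defined by the package.

```python
import logging

logger = logging.getLogger("token_guard.alerts")  # hypothetical logger name


def format_quota_alert(payload: dict) -> str:
    """Render the alert payload as a single log-friendly line."""
    overage = payload["used_tokens"] - payload["quota_tokens"]
    return (
        f"user={payload['user_id']} period={payload['period_key']} "
        f"used={payload['used_tokens']}/{payload['quota_tokens']} "
        f"over_by={overage} reason={payload['reason']}"
    )


def quota_alert(payload: dict) -> None:
    # Candidate value for on_quota_exceeded: log a warning and return.
    logger.warning(format_quota_alert(payload))


example_payload = {
    "user_id": "anonymous",
    "period_key": "2026-04",
    "used_tokens": 63,
    "quota_tokens": 50,
    "reason": "quota_exceeded_before_call",
}
quota_alert(example_payload)
```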
⚙️ Configuration
TokenGuardMiddleware
| Parameter | Type | Default | Description |
|---|---|---|---|
| on_usage | Callable[[dict], Any] | required | Called after every successful request with the usage record |
| quota_store | QuotaStore | required | Storage backend for accumulated token counts |
| quota_tokens | int | required | Max tokens allowed per period |
| on_quota_exceeded | Callable[[dict], Any] | None | Called when quota is hit (before raising) |
| user_id_getter | Callable[[ChatContext], str] | default_user_id_getter | Extracts user/tenant ID from context |
| period_key_fn | Callable[[], str] | month_key | Returns the current billing period key |
Period key functions
from token_guard_middleware.token_tracker import month_key, week_key, day_key

middleware = TokenGuardMiddleware(..., period_key_fn=month_key)  # Monthly (default)
middleware = TokenGuardMiddleware(..., period_key_fn=day_key)    # Daily
middleware = TokenGuardMiddleware(..., period_key_fn=week_key)   # Weekly

# Custom, e.g. per-user-per-day (get_current_user_id is your own helper)
middleware = TokenGuardMiddleware(
    ...,
    period_key_fn=lambda: f"{get_current_user_id()}-{day_key()}",
)
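Since period_key_fn is typed as a zero-argument callable returning a string, a custom period only needs a function that maps "now" to a stable key. Here is a sketch of a hypothetical `hour_key` for hourly quotas. The key format is an assumption modelled on the `2026-04` monthly key shown in the usage record, and the optional `now` argument exists only to make the function testable.

```python
from datetime import datetime, timezone
from typing import Optional


def hour_key(now: Optional[datetime] = None) -> str:
    """Hypothetical hourly period key in UTC, e.g. '2026-04-14T11'."""
    now = now or datetime.now(timezone.utc)
    # Normalise to UTC so every process produces the same key for the same instant.
    return now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H")


print(hour_key(datetime(2026, 4, 14, 11, 46, tzinfo=timezone.utc)))
```

It could then be supplied as `period_key_fn=hour_key`, because the middleware would call it with no arguments.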
Per-user quotas
def get_user_id(context):
    return context.metadata.get("user_id", "anonymous")

middleware = TokenGuardMiddleware(
    ...,
    user_id_getter=get_user_id,
)
Custom Storage Backend
Implement the QuotaStore protocol to persist usage in Redis, Postgres, or any other store:
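import redis

from token_guard_middleware.token_tracker import QuotaStore

class RedisQuotaStore:
    def __init__(self, client: redis.Redis):
        self._client = client

    def get_usage(self, user_id: str, period_key: str) -> int:
        # A missing key counts as zero usage.
        return int(self._client.get(f"{user_id}:{period_key}") or 0)

    def add_usage(self, user_id: str, period_key: str, tokens: int) -> None:
        # INCRBY is atomic, so concurrent requests do not lose updates.
        self._client.incrby(f"{user_id}:{period_key}", tokens)

middleware = TokenGuardMiddleware(
    ...,
    quota_store=RedisQuotaStore(redis.Redis()),
)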
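For a backend with no external service, the same two methods can sit on top of SQLite from the standard library. This is a sketch under the assumption that the QuotaStore protocol consists of exactly the synchronous `get_usage` and `add_usage` methods shown above. The class name `SQLiteQuotaStore` and the table layout are illustrative.

```python
import sqlite3


class SQLiteQuotaStore:
    """Hypothetical QuotaStore-shaped backend using the stdlib sqlite3 module."""

    def __init__(self, path: str = ":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS usage ("
            "user_id TEXT NOT NULL, period_key TEXT NOT NULL, "
            "tokens INTEGER NOT NULL DEFAULT 0, "
            "PRIMARY KEY (user_id, period_key))"
        )

    def get_usage(self, user_id: str, period_key: str) -> int:
        row = self._conn.execute(
            "SELECT tokens FROM usage WHERE user_id = ? AND period_key = ?",
            (user_id, period_key),
        ).fetchone()
        return row[0] if row else 0

    def add_usage(self, user_id: str, period_key: str, tokens: int) -> None:
        # One transaction: create the row if it is missing, then increment it.
        with self._conn:
            self._conn.execute(
                "INSERT OR IGNORE INTO usage (user_id, period_key, tokens) VALUES (?, ?, 0)",
                (user_id, period_key),
            )
            self._conn.execute(
                "UPDATE usage SET tokens = tokens + ? WHERE user_id = ? AND period_key = ?",
                (tokens, user_id, period_key),
            )


store = SQLiteQuotaStore()
store.add_usage("alice", "2026-04", 63)
store.add_usage("alice", "2026-04", 10)
print(store.get_usage("alice", "2026-04"))
```

Keying rows by `(user_id, period_key)` means a new period starts from zero without a reset job. Old rows simply stop being read.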
⚙️ How It Works
1. Intercept → middleware captures the outgoing agent request
2. Check → quota store is queried for current period usage
3. Block → if quota exceeded, raises QuotaExceededError before calling LLM
4. Forward → request proceeds to the LLM provider
5. Track → response token counts are extracted and written to quota store
6. Notify → on_usage callback fires with the full usage record
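The six steps above can be sketched as a standalone function. This is not the package's implementation: `guarded_call`, `DictStore`, `QuotaExceeded`, and `fake_llm` are stand-ins invented for illustration, the `used >= quota_tokens` comparison is an assumption about where the threshold sits, and the on_quota_exceeded callback is omitted for brevity.

```python
import asyncio


class QuotaExceeded(Exception):
    """Stand-in for the package's QuotaExceededError."""


class DictStore:
    """Stand-in quota store keyed by (user_id, period_key)."""

    def __init__(self):
        self._data = {}

    def get_usage(self, user_id, period_key):
        return self._data.get((user_id, period_key), 0)

    def add_usage(self, user_id, period_key, tokens):
        self._data[(user_id, period_key)] = self.get_usage(user_id, period_key) + tokens


async def guarded_call(call_llm, prompt, *, store, quota_tokens, on_usage,
                       user_id="anonymous", period_key="2026-04"):
    # 1. Intercept: this wrapper sits between the caller and the LLM.
    used = store.get_usage(user_id, period_key)          # 2. Check
    if used >= quota_tokens:                             # 3. Block (assumed threshold)
        raise QuotaExceeded(f"{used}/{quota_tokens} tokens used")
    text, total_tokens = await call_llm(prompt)          # 4. Forward
    store.add_usage(user_id, period_key, total_tokens)   # 5. Track
    on_usage({                                           # 6. Notify
        "user_id": user_id,
        "period_key": period_key,
        "total_tokens": total_tokens,
        "used_tokens_after_call": used + total_tokens,
    })
    return text


async def fake_llm(prompt):
    # Pretend provider: echoes the prompt and reports 63 tokens used.
    return f"echo: {prompt}", 63


async def demo():
    store, records = DictStore(), []
    kwargs = dict(store=store, quota_tokens=50, on_usage=records.append)
    first = await guarded_call(fake_llm, "Hello!", **kwargs)
    try:
        await guarded_call(fake_llm, "How are you?", **kwargs)
        blocked = False
    except QuotaExceeded:
        blocked = True
    return first, blocked, records


first, blocked, records = asyncio.run(demo())
print(first, blocked, records)
```

As in the Quick Start, the first call goes through even though it pushes usage past the limit, because the check runs before the call and token counts are only known afterwards. The second call is the one that gets blocked.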
Provider Compatibility:
Works with any LLM client that implements the agent-framework ChatClient interface.
🤝 Contributing
Contributions are welcome! Please open an issue to discuss what you'd like to change before submitting a pull request.
- Fork the repository
- Create a feature branch (git checkout -b feature/my-feature)
- Commit your changes (git commit -m 'Add my feature')
- Push to the branch (git push origin feature/my-feature)
- Open a Pull Request
📄 License
MIT — see LICENSE for details.