
Agent Token Manager (agent-atm)


agent-atm is a lightweight, privacy-first Python SDK built to monitor, measure, and cap LLM token consumption natively inside application workflows.

Designed as a high-performance observability and control utility for agentic systems, it plugs seamlessly into any model or agent framework to record precise token metrics, manage nested metadata scopes, and enforce real-time budget quotas over daily, hourly, or minute-level windows.


✨ Key Features

  • Plug-and-Play Observability: An intuitive API designed to feel as familiar as Python's standard logging library.
  • Extensible Interface Architecture: Built on clean, developer-friendly abstractions (interfaces for storage managers and tokenizers), making custom integrations simple.
  • Privacy-First Guarantee: Zero raw prompt or response text storage. All incoming text is parsed strictly in-memory to calculate token metrics and instantly discarded.
  • Flexible Storage Managers: Shipped with a thread-safe InMemoryManager for rapid local testing and a robust SqliteManager for persistent single-node deployments.
  • Centralized Telemetry Daemon: A built-in FastAPI server serving as a centralized collector to asynchronously gather token events from distributed client instances.
  • Premium Visual Analytics: A modern, interactive dark-mode dashboard to view token trends, audit active configurations, and track live budget allocations.
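
To illustrate the extensible interface idea, here is a minimal, self-contained sketch of what a custom storage-manager integration might look like. The base-class and method names below (`BaseStorageManager`, `save_event`, `query_usage`) are illustrative assumptions, not agent-atm's actual API; consult the shipped interfaces before implementing a real backend.

```python
from abc import ABC, abstractmethod
from typing import Any

# Hypothetical interface sketch -- agent-atm's real storage-manager
# abstraction may differ in names and signatures.
class BaseStorageManager(ABC):
    @abstractmethod
    def save_event(self, event: dict[str, Any]) -> None: ...

    @abstractmethod
    def query_usage(self, **filters: Any) -> list[dict[str, Any]]: ...

class ListStorageManager(BaseStorageManager):
    """Toy backend that keeps events in a plain in-process list."""

    def __init__(self) -> None:
        self._events: list[dict[str, Any]] = []

    def save_event(self, event: dict[str, Any]) -> None:
        self._events.append(event)

    def query_usage(self, **filters: Any) -> list[dict[str, Any]]:
        # Return every stored event whose fields match all given filters.
        return [e for e in self._events
                if all(e.get(k) == v for k, v in filters.items())]

mgr = ListStorageManager()
mgr.save_event({"event_type": "request", "token_count": 32})
print(len(mgr.query_usage(event_type="request")))  # 1
```

Swapping backends then reduces to implementing two methods, which is the same shape the shipped InMemoryManager and SqliteManager follow.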

🚀 Quick Setup

Get up and running in less than 60 seconds using either the local SDK or a centralized standalone telemetry server.

1. Installation

pip install agent-atm

2. Setup Storage & Record Telemetry

Perfect for Python applications running in-process. Initialize the SQLite storage engine and begin logging request and response telemetry:

import agent_atm as atm

# Initialize local SQLite persistent database
atm.init(data_manager="sqlite", db_path="agent_atm.db", default_app_id="customer-bot")

# Record a request event with token count and tags
atm.add_user_request(token_count=32, _additional_metadata_tags=["user-prompt"])

# Record a response event with token count and tags
atm.add_model_response(token_count=120, _additional_metadata_tags=["gemini-response"])

Granular Observability: For token usage analysis, atm allows recording context attributes such as model_id, username, session_id, app_id, and token_count, plus list tags (_additional_metadata_tags) and arbitrary key-value config dicts (_additional_metadata_config).

Feature: Direct LLMPayload Dataclass Logging

For advanced configurations, wrap LLM inputs in an explicit LLMPayload object:

from agent_atm.types import LLMPayload

payload = LLMPayload(
    token_count_override=45,
    model_id="example-model",
    event_type="request",  # or "response"
    _additional_metadata_tags=["dev-test"],
    _additional_metadata_config={"node_id": "emea-east-1"}
)
atm.add_user_request(payload)

Feature: Nested Context Scoping

Cascade session IDs, user attributes, and tags cleanly across deeply nested function calls without passing parameters down the stack:

with atm.context(
    session_id="session-abc-123", 
    username="vip-user", 
    _additional_metadata_tags=["production"],
    department="finance" # Custom key-value configs are dynamically captured!
):
    # Seamlessly inherits session_id, username, tags, and department configs
    atm.add_user_request("How does compound interest work?", model_id="gemini-2.5-pro")

Feature: Native Google Gemini Observability

When passing a real google-genai SDK client response, agent-atm automatically extracts precise metrics directly from the native Google usage metadata:

import os
from google import genai
import agent_atm as atm

# 1. Initialize ATM
atm.init(data_manager="sqlite", db_path="usage.db")

# 2. Initialize standard GenAI client
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

# 3. Track LLM workflow
with atm.context(session_id="sess-vip-99", username="alice@example.com"):
    prompt = "Draft a professional email response regarding refund query."
    
    # Count and log the request prompt
    atm.add_user_request(prompt, model_id="gemini-2.5-flash")
    
    # Call Gemini
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=prompt,
    )
    
    # Record native response: ATM auto-extracts exact candidate and prompt counts from usage_metadata!
    atm.add_model_response(response, model_id="gemini-2.5-flash")

3. Pure Web API Style: Central Telemetry Server & curl Commands

Perfect for enterprise microservice environments. Run the agent-atm server standalone and report events from any programming language via standard REST HTTP requests:

Launch the Standalone Telemetry Daemon:

ATM_DB_PATH=agent_atm.db uvicorn agent_atm.dashboard.server:app --host 127.0.0.1 --port 8000

Push Telemetry via curl (fully independent of Python/SDK):

# Log a User Request Event
curl -X POST http://127.0.0.1:8000/api/events \
  -H "Content-Type: application/json" \
  -d '{
    "event_type": "request",
    "token_count": 45,
    "model_id": "gemini-2.5-pro",
    "username": "alice@company.com",
    "session_id": "session-abc-999",
    "app_id": "finance-agent",
    "tags": ["api-call", "production"],
    "config": {"node_id": "aws-east-1"}
  }'

# Log a Model Response Event
curl -X POST http://127.0.0.1:8000/api/events \
  -H "Content-Type: application/json" \
  -d '{
    "event_type": "response",
    "token_count": 180,
    "model_id": "gemini-2.5-pro",
    "username": "alice@company.com",
    "session_id": "session-abc-999",
    "app_id": "finance-agent",
    "tags": ["api-response", "production"]
  }'

🤖 Native Tokenizer Integrations

For specific families like Google Gemini (google-genai SDK) and Google Gemma (Gemma3Tokenizer), agent-atm provides in-built tokenizer mappings.

Instead of calculating token counts manually, you can pass the raw string content directly and let the SDK compute and track metrics automatically:

# Pass the raw prompt: ATM automatically tokenizes and counts the metrics under the hood!
atm.add_user_request("Explain quantum computing in simple terms.", model_id="gemma-3")

[!TIP] Custom Tokenizer Extensibility: Using another provider? Check our baseline BaseTokenizerIntegration class to see how easy it is to implement a custom tokenizer module.
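
As a rough illustration of what such a custom module involves, here is a self-contained toy tokenizer. The class shape is an assumption for illustration only; a real integration would subclass agent-atm's BaseTokenizerIntegration, whose exact method signatures you should take from the library itself.

```python
# Hypothetical custom-tokenizer sketch -- not agent-atm's actual
# BaseTokenizerIntegration contract, just the general idea: map raw
# text to a token count for a given model family.
class WhitespaceTokenizer:
    """Crude fallback: one token per whitespace-separated word."""

    model_id = "whitespace-fallback"

    def count_tokens(self, text: str) -> int:
        return len(text.split())

tok = WhitespaceTokenizer()
print(tok.count_tokens("Explain quantum computing in simple terms."))  # 6
```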


🛡️ ATM Controls: Rules, Hooks & Quota Caps

Take complete control of your LLM consumption using reactive quota caps and custom event interceptors:

1. Dynamic Quota Budgeting

Enforce strict token ceilings over minute, hourly, or daily windows per user or app scope. Exceeding a blocking quota raises a TokenQuotaExceeded exception, allowing you to gracefully handle and reject further LLM calls:

# Limit free-tier users to 500 tokens per minute
atm.limits.add(
    scope=atm.Scope(user="free-tier"),
    quota=atm.Quota(minute_limit=500),
    alert_level=atm.AlertLevel.BLOCKING
)

with atm.context(username="free-tier"):
    try:
        # If minute consumption exceeds 500, this raises TokenQuotaExceeded
        atm.add_user_request("Some very long prompt text...", token_count=600)
    except atm.TokenQuotaExceeded as e:
        print(f"Request Blocked: {e}")

2. Pre & Post Hook Interceptors

Register custom hook decorators to validate contexts, mutate event scopes, or trigger asynchronous notifications (like Slack webhooks) around database writes:

@atm.hook("pre")
def pre_save_audit(event):
    # Mutate or validate event metadata BEFORE the database write
    event._additional_metadata_tags.append("audited")

@atm.hook("post")
def slack_alert(event):
    # Trigger non-blocking alerts AFTER the event is successfully written
    if event.token_count > 10000:
        trigger_slack_notification(f"Warning: High token consumption detected: {event.token_count}")

📊 Real-Time Analytics Dashboard

Start the telemetry metrics daemon to view real-time consumption trend lines, aggregate app metrics, top-consuming users, and live event telemetry logs inside a premium visual dashboard:

ATM_DB_PATH=agent_atm.db uvicorn agent_atm.dashboard.server:app --reload --host 127.0.0.1 --port 8000

Open your web browser to http://127.0.0.1:8000 to access the visual console.


📖 Additional Resources

  • GEMINI.md: Native Gemini & Gemma Tokenizer Integration Handbook.
  • CONTRIBUTING.md: Contribution Rules, Virtual Env Setup, and Automated Testing Suite Guide.
  • FUTURE.md: TimescaleDB, Distributed Redis Lock, and Remote Buffer scaling roadmaps.
