# a2a-llm-tracker
a2a-llm-tracker is a Python package that helps AI agents and applications track LLM usage and cost from a single place, across providers like OpenAI, Gemini, Anthropic, and others.
It is designed for agent-to-agent (A2A) systems where:
- multiple agents make LLM calls
- multiple providers are used
- usage and cost need to be tracked centrally
- streaming, async, and sync calls must all be supported
The package is LiteLLM-first, giving you multi-provider support with minimal integration effort.
## Why a2a-llm-tracker?
LLM providers differ in:
- SDKs and APIs
- tokenization
- pricing models
- usage reporting (especially for streaming)
a2a-llm-tracker solves this by:
- wrapping LLM calls instead of guessing usage
- normalizing provider-reported usage into a single schema
- computing cost using configurable pricing
- attaching agent / user / session context
- writing usage events to pluggable storage backends
Exact cost is recorded only when providers report usage.
This package does not fabricate billing data.
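Each call produces one normalized usage event that is written to the configured storage backend. For illustration only, such an event might carry fields like the following; the names below are assumptions, not the package's exact schema:

```python
# Illustrative only: field names are assumptions, not the package's exact schema.
example_event = {
    "provider": "openai",
    "model": "openai/gpt-4.1",
    "input_tokens": 12,
    "output_tokens": 9,
    "cost_usd": 0.000096,  # computed from your configured per-million prices
    "agent_id": "planner-agent",
    "session_id": "session-123",
}
```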
## Features
- ✅ Multi-provider support via LiteLLM
- ✅ Sync, async, and streaming calls
- ✅ Exact token usage when available
- ✅ Cost calculation with user-defined pricing
- ✅ Context propagation for agents and sessions
- ✅ JSONL and SQLite sinks
- ✅ No heavy work on import
- ✅ No vendor lock-in
## Installation

```bash
pip install "a2a-llm-tracker[litellm]"
```
## Quickstart

### 1️⃣ Set your API key

Example for OpenAI:

```bash
export OPENAI_API_KEY=sk-xxxxxxxx
```
### 2️⃣ Import

```python
from a2a_llm_tracker import Meter, PricingRegistry, meter_context
```
### 3️⃣ Create a tracker

```python
from a2a_llm_tracker import Meter, PricingRegistry
from a2a_llm_tracker.sinks.jsonl import JSONLSink

pricing = PricingRegistry()
pricing.set_price(
    provider="openai",
    model="openai/gpt-4.1",
    input_per_million=2.0,
    output_per_million=8.0,
)

meter = Meter(
    pricing=pricing,
    sinks=[JSONLSink("usage.jsonl")],
    project="my-a2a-system",
)
```
### 4️⃣ Wrap LiteLLM

```python
from a2a_llm_tracker.integrations.litellm import LiteLLM

llm = LiteLLM(meter=meter)
```
### 5️⃣ Sync calls

```python
response = llm.completion(
    model="openai/gpt-4.1",
    messages=[
        {"role": "user", "content": "Say hello in one sentence."}
    ],
)
print(response)
```
### 6️⃣ Sync streaming

```python
for chunk in llm.completion(
    model="openai/gpt-4.1",
    messages=[{"role": "user", "content": "Write a short poem."}],
    stream=True,
):
    print(chunk, end="", flush=True)
```
- Streaming output is yielded as usual.
- Usage is recorded after the stream finishes.
- If the provider does not return usage for streams, accuracy is marked as unknown.
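To see what was actually recorded for a stream, you can read the newest event back from the JSONL sink. A minimal sketch, assuming the `usage.jsonl` sink configured above and one JSON object per line:

```python
import json

# Read the most recent usage event written by JSONLSink("usage.jsonl")
# and inspect the recorded tokens, cost, and accuracy marker.
with open("usage.jsonl") as f:
    last_event = json.loads(f.readlines()[-1])

print(last_event)
```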
### 7️⃣ Async non-streaming

```python
response = await llm.acompletion(
    model="openai/gpt-4.1",
    messages=[{"role": "user", "content": "Async hello!"}],
)
```
### 8️⃣ Async streaming

```python
stream = await llm.acompletion(
    model="openai/gpt-4.1",
    messages=[{"role": "user", "content": "Stream async output"}],
    stream=True,
)
async for chunk in stream:
    print(chunk, end="", flush=True)
```
## Agent and Session Context

```python
from a2a_llm_tracker import meter_context

with meter_context(
    agent_id="planner-agent",
    session_id="session-123",
    user_id="user-456",
):
    llm.completion(
        model="openai/gpt-4.1",
        messages=[{"role": "user", "content": "Plan my day"}],
    )
```
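In an A2A system, each agent can open its own context so usage is attributed per agent. A minimal sketch reusing the `llm` wrapper from the Quickstart (the agent and session IDs are illustrative):

```python
# Each agent wraps its calls in its own context; recorded events
# carry the agent_id and session_id that were active at call time.
for agent_id in ("planner-agent", "executor-agent"):
    with meter_context(agent_id=agent_id, session_id="session-123"):
        llm.completion(
            model="openai/gpt-4.1",
            messages=[{"role": "user", "content": "Report your status."}],
        )
```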
## Pricing Model
Pricing is fully user-controlled.
```python
pricing.set_price(
    provider="openai",
    model="openai/gpt-4.1",
    input_per_million=2.0,
    output_per_million=8.0,
)
```
This supports:

- enterprise pricing
- price changes over time
- multiple vendors with different rates (see the sketch below)
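For example, a sketch registering placeholder rates for two more providers (the numbers are illustrative, not current list prices), assuming the registry applies the usual per-million formula `cost = input_tokens / 1e6 * input_per_million + output_tokens / 1e6 * output_per_million`:

```python
# Placeholder rates -- substitute your negotiated or current list prices.
pricing.set_price(
    provider="anthropic",
    model="anthropic/claude-3-5-sonnet-latest",
    input_per_million=3.0,
    output_per_million=15.0,
)
pricing.set_price(
    provider="gemini",
    model="gemini/gemini-1.5-flash",
    input_per_million=0.075,
    output_per_million=0.30,
)

# Under per-million pricing, 1,200 input + 300 output tokens on the
# Anthropic entry above would cost 1200/1e6 * 3.0 + 300/1e6 * 15.0 = $0.0081.
```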
## CCS Integration
The package supports sending usage events to mftsccs (CCS) for centralized tracking and analytics.
### Setup

Set your CCS credentials as environment variables:

```bash
export CLIENT_ID=your_client_id
export CLIENT_SECRET=your_client_secret
```
Or create a `.env` file:

```
CLIENT_ID=your_client_id
CLIENT_SECRET=your_client_secret
```
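Note that `os.getenv` does not read `.env` files by itself; if you go the `.env` route, load the file first. A minimal sketch using the third-party `python-dotenv` package (not a dependency of this package):

```python
from dotenv import load_dotenv  # pip install python-dotenv

# Populate CLIENT_ID / CLIENT_SECRET from .env into the process environment
load_dotenv()
```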
Then initialize with CCS using the async `init` function:
```python
import os
import asyncio

from a2a_llm_tracker import init, get_llm


async def setup_ccs_meter():
    """Set up a meter with the CCS sink for mftsccs integration."""
    client_id = os.getenv("CLIENT_ID")
    client_secret = os.getenv("CLIENT_SECRET")
    if not client_id or not client_secret:
        raise ValueError(
            "CLIENT_ID and CLIENT_SECRET environment variables are required.\n"
            "Set them in your .env file or export them."
        )

    # Initialize the meter with the CCS sink
    meter = await init(
        client_id=client_id,
        client_secret=client_secret,
        application_name="my-app-name",
    )

    # Get the LiteLLM wrapper
    return get_llm(meter)


# Run the async setup
async def main():
    llm = await setup_ccs_meter()

    # Now use llm for completions
    response = await llm.acompletion(
        model="openai/gpt-4.1",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response)


asyncio.run(main())
```
### Parameters

- `client_id`: Your mftsccs client ID (used for authentication and entity connection)
- `client_secret`: Your mftsccs client secret (authentication)
- `application_name`: Name of your application (creates a tracker concept in CCS)
### What Gets Tracked

The CCS sink creates and connects the following concepts:

- `the_llm_tracker` - Your application tracker
- `the_llm_usage` - Individual usage events with all metadata
- `the_llm_provider` - Provider concepts (OpenAI, Anthropic, etc.)
- `the_llm_model` - Model concepts
- `the_cost` - Cost tracking
- `the_token_count` - Token usage
## Direct Response Analysis (Without Proxy)
If you're making LLM calls directly using provider SDKs (OpenAI, Gemini, Anthropic, etc.) and want to track usage without routing through the LiteLLM wrapper, use `analyze_response`.
### Supported Providers

| Provider | ResponseType |
|---|---|
| OpenAI | `ResponseType.OPENAI` |
| Google Gemini | `ResponseType.GEMINI` |
| Anthropic | `ResponseType.ANTHROPIC` |
| Cohere | `ResponseType.COHERE` |
| Mistral | `ResponseType.MISTRAL` |
| Groq | `ResponseType.GROQ` |
| Together AI | `ResponseType.TOGETHER` |
| AWS Bedrock | `ResponseType.BEDROCK` |
| Google Vertex AI | `ResponseType.VERTEX` |
| LiteLLM | `ResponseType.LITELLM` |
### Example: Track OpenAI Direct Calls

```python
import asyncio

from openai import OpenAI
from a2a_llm_tracker import init, analyze_response, ResponseType


async def main():
    # Initialize the tracker with CCS
    meter = await init(
        client_id="your_client_id",
        client_secret="your_client_secret",
        application_name="my-app",
    )

    # Make a direct OpenAI call (not through the LiteLLM wrapper)
    openai_client = OpenAI()
    response = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Say hello!"},
        ],
    )

    # Analyze and record the response to CCS
    event = analyze_response(
        response=response,
        response_type=ResponseType.OPENAI,
        meter=meter,
        agent_id="my-agent",
    )
    print(f"Tracked: {event.total_tokens} tokens, ${event.cost_usd:.6f}")


asyncio.run(main())
```
### Example: Track Gemini Direct Calls

```python
import google.generativeai as genai

from a2a_llm_tracker import get_meter, analyze_response, ResponseType

genai.configure(api_key="your_gemini_api_key")
model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content("Hello!")

# Assumes init() was called earlier, so get_meter() returns the global meter
event = analyze_response(
    response=response,
    response_type=ResponseType.GEMINI,
    meter=get_meter(),
)
```
### Example: Track Anthropic Direct Calls

```python
from anthropic import Anthropic

from a2a_llm_tracker import get_meter, analyze_response, ResponseType

client = Anthropic()
response = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)

# Assumes init() was called earlier, so get_meter() returns the global meter
event = analyze_response(
    response=response,
    response_type=ResponseType.ANTHROPIC,
    meter=get_meter(),
)
```
### Async Version

For async code, use `analyze_response_async` for better performance:

```python
from a2a_llm_tracker import analyze_response_async, ResponseType

event = await analyze_response_async(
    response=response,
    response_type=ResponseType.OPENAI,
    meter=meter,
    agent_id="my-agent",
)
```
### Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `response` | `Any` | Yes | Raw response from the LLM provider (dict or SDK object) |
| `response_type` | `ResponseType` or `str` | Yes | Provider type (e.g., `ResponseType.OPENAI` or `"openai"`) |
| `meter` | `Meter` | Yes | The meter instance for cost calculation and recording |
| `model_override` | `str` | No | Override the model name from the response |
| `latency_ms` | `int` | No | Request latency in milliseconds |
| `agent_id` | `str` | No | Agent ID for attribution |
| `user_id` | `str` | No | User ID for attribution |
| `session_id` | `str` | No | Session ID for attribution |
| `trace_id` | `str` | No | Trace ID for attribution |
| `metadata` | `dict` | No | Additional metadata to include |
| `record` | `bool` | No | If `True` (default), record to sinks; set `False` to only analyze |
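For example, to extract tokens and compute cost without writing anything to sinks, pass `record=False`. A minimal sketch reusing the `response` and `meter` from the OpenAI example above:

```python
# Analyze only: extract tokens and compute cost, but skip the sinks.
event = analyze_response(
    response=response,
    response_type=ResponseType.OPENAI,
    meter=meter,
    record=False,
)
print(f"Would have recorded {event.total_tokens} tokens (${event.cost_usd:.6f})")
```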
### What Gets Extracted
The analyzer extracts provider-specific information:
- Tokens: input/output/total tokens
- Cost: calculated from meter's pricing registry
- Finish reason: why the generation stopped
- Request ID: provider's request identifier
- Cached tokens: (OpenAI) prompt caching info
- Safety ratings: (Gemini) content safety metadata
## Singleton Pattern (Recommended)
For applications making multiple LLM calls across different modules, use a singleton pattern to initialize the meter once and reuse it everywhere.
**Step 1:** Create a tracking module (e.g., `tracking.py` or `db.py`)
```python
# tracking.py
import os
import asyncio
import concurrent.futures

from a2a_llm_tracker import init

_meter = None


def get_meter():
    """Get or initialize the global meter singleton."""
    global _meter
    if _meter is None:
        try:
            client_id = os.getenv("CLIENT_ID", "")
            client_secret = os.getenv("CLIENT_SECRET", "")

            # Run the async init synchronously using a thread pool
            with concurrent.futures.ThreadPoolExecutor() as executor:
                future = executor.submit(
                    asyncio.run,
                    init(client_id, client_secret, "my-app"),
                )
                _meter = future.result(timeout=5)
        except Exception as e:
            print(f"LLM tracking initialization failed: {e}")
            return None
    return _meter
```
**Step 2:** Use it anywhere in your application
```python
# any_module.py
from openai import OpenAI

from a2a_llm_tracker import analyze_response, ResponseType
from tracking import get_meter


def call_openai(prompt: str):
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )

    # Track LLM usage (fails silently if tracking is not available)
    try:
        meter = get_meter()
        if meter:
            analyze_response(response, ResponseType.OPENAI, meter)
    except Exception as e:
        print(f"LLM tracking skipped: {e}")

    return response
```
This pattern:
- Initializes the CCS connection only once on first use
- Handles async initialization from sync code
- Fails gracefully if credentials are missing
- Works across multiple modules without re-initialization
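If your application is async end to end, the thread-pool workaround is unnecessary. A sketch of initializing once on the running event loop instead (module and function names here are illustrative):

```python
# async_tracking.py -- illustrative async-native alternative to tracking.py
import os

from a2a_llm_tracker import init

_meter = None


async def get_meter_async():
    """Initialize the global meter once, directly on the running event loop."""
    global _meter
    if _meter is None:
        _meter = await init(
            os.getenv("CLIENT_ID", ""),
            os.getenv("CLIENT_SECRET", ""),
            "my-app",
        )
    return _meter
```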
## What This Package Does NOT Do
❌ Guess exact billing from raw text
❌ Replace provider SDKs
❌ Upload data anywhere automatically
❌ Require a backend or SaaS
## Building This Project

Create a virtual environment and install the package in editable mode:

```bash
python -m venv .venv
source .venv/bin/activate
pip install -e .
```

Build the distributions:

```bash
python -m build
```

Publish to PyPI:

```bash
python -m twine upload dist/*
```
## File Details

### a2a_llm_tracker-0.0.8.tar.gz

- Size: 35.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.2

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | `8ceadaf1e71e14cdd35bf55e4d1cf8fa52a6fe44107bdb13e9e03ccc370931e0` |
| MD5 | `8a220a5f806b5d744a8fc456edb03501` |
| BLAKE2b-256 | `e3e66fbf6ec55f3282b947a932e1bcaa893c7a250db7df6e6a25051dd1662c34` |
### a2a_llm_tracker-0.0.8-py3-none-any.whl

- Size: 36.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.2

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | `fff2c87a1dab75ed5a6783db1fcebaa9e91c8a579ecbba4197876aaad1ab13ef` |
| MD5 | `603c4c4a08f03181065bc31aa127d761` |
| BLAKE2b-256 | `c939c0b94e3eb8a64141143dbde57b6ea1598574518248e1a87977db00439d48` |