Tokenome wrapper-client SDK for metadata-only LLM telemetry

Tokenome Python SDK

Tokenome SDK is a wrapper-client telemetry SDK for LLM applications. It captures metadata (model, tokens, latency, status) without capturing prompt or response content by default.

Design Philosophy

  • Wrapper-client only: No monkey patching, no framework middleware as primary path
  • Metadata-only telemetry: No prompt or response content capture by default
  • Fail-open: Telemetry failures never crash user application code
  • Durable by default: durable_local uses a SQLite spool; best_effort is an opt-in, memory-only alternative
  • Transparent: Same request, same response, same exception, same stream — plus telemetry

Supported Providers

OpenAI

| Operation | Usage Extraction | Notes |
|---|---|---|
| responses.create | input_tokens, output_tokens, total_tokens, cached_input_tokens, reasoning_tokens | Supports streaming |
| responses.parse | Same as responses.create | Structured output |
| chat.completions.create | input_tokens, output_tokens, total_tokens, cached_input_tokens, reasoning_tokens, audio_input_tokens, audio_output_tokens, accepted_prediction_tokens, rejected_prediction_tokens | Supports streaming |
| chat.completions.parse | Same as chat.completions.create | Structured output |
| embeddings.create | input_tokens, total_tokens | |
| images.generate | input_tokens, output_tokens, total_tokens, text_input_tokens, image_input_tokens, text_output_tokens, image_output_tokens, images_generated | |
| images.edit | Same as images.generate | |
| images.create_variation | Same as images.generate | |
| batches.create | total_tokens ← total, input_tokens ← completed, output_tokens ← failed | Proxy via request_counts |
| batches.retrieve | Same proxy | |
| batches.list | Same proxy | |
| batches.cancel | Same proxy | |
| audio.speech.create | None | No usage field |
| audio.transcriptions.create | None | |
| audio.translations.create | None | |
| moderations.create | None | |
| threads.runs.create | input_tokens, output_tokens, total_tokens | |
| threads.runs.retrieve | Same | |
| threads.runs.modify | Same | |
| threads.runs.cancel | Same (may be empty) | |
| threads.runs.submit_tool_outputs | Same | |
| threads.runs.create_and_stream | Same | Supports streaming |
| files.create | None | |
| files.retrieve | None | |
| files.content | None | Returns raw bytes |
| files.delete | None | |
| files.list | None | |
| fine_tuning.jobs.create | input_tokens, output_tokens, total_tokens | If usage present |
| fine_tuning.jobs.retrieve | Same | |
| fine_tuning.jobs.list | Same | |
| fine_tuning.jobs.cancel | Same | |
| uploads.create | None | |
| uploads.retrieve | None | |
| uploads.complete | None | |
| uploads.cancel | None | |
| vector_stores.create | None | |
| vector_stores.retrieve | None | |
| vector_stores.list | None | |
| vector_stores.delete | None | |

Version policy: official OpenAI Python SDK >=2.0,<3

Anthropic

| Operation | Usage Extraction | Notes |
|---|---|---|
| messages.create | input_tokens, output_tokens, total_tokens | Sync only |

Version policy: official Anthropic Python SDK >=0.40,<1

Install

uv add tokenome
uv add 'tokenome[openai]'

Optional extras:

uv add 'tokenome[anthropic]'
uv add 'tokenome[openai,anthropic]'
uv add 'tokenome[all]'

Quick Start

User-created OpenAI client

from openai import OpenAI
from tokenome import TokenLens

tl = TokenLens(
    api_key="tl_pk_project_...",
    project_id="proj_...",
    environment="prod",
)

client = OpenAI(api_key="sk_...")
client = tl.wrap_openai(
    client,
    route="/api/chat",
    feature="chatbot",
    user_id="user_123",
    session_id="sess_456",
    tags={"tenant_id": "tenant_abc"},
)

response = client.responses.create(
    model="gpt-4.1-mini",
    input="Hello",
)

tl.flush()
tl.close()

TokenLens-created OpenAI client

from tokenome import TokenLens

tl = TokenLens(
    api_key="tl_pk_project_...",
    project_id="proj_...",
    environment="prod",
)

client = tl.OpenAI(api_key="sk_...")
response = client.responses.create(model="gpt-4.1-mini", input="Hello")

Async OpenAI client

import asyncio

from openai import AsyncOpenAI
from tokenome import TokenLens

tl = TokenLens(api_key="tl_pk_project_...", project_id="proj_...")

client = AsyncOpenAI(api_key="sk_...")
client = tl.wrap_openai(client)

async def main():
    # await is only valid inside a coroutine, so the call lives in main()
    response = await client.responses.create(model="gpt-4.1-mini", input="Hello")

asyncio.run(main())

Environment Bootstrap

export TOKENLENS_API_KEY="***"
export TOKENLENS_PROJECT_ID="***"
export TOKENLENS_ENVIRONMENT="***"
export TOKENLENS_ENDPOINT="https://api.tokenome.ai/v1/events/batch"
export TOKENLENS_DURABILITY="durable_local"
export TOKENLENS_SPOOL_PATH="$HOME/.cache/tokenome/events.sqlite3"
export TOKENLENS_TAGS='{"service": "api", "environment": "prod"}'
export TOKENLENS_ENABLED="true"
export TOKENLENS_DEBUG="false"
export TOKENLENS_TIMEOUT_SECONDS="1.5"
export TOKENLENS_QUEUE_MAXSIZE="10000"

from tokenome import TokenLens

tl = TokenLens.init_from_env()
client = tl.OpenAI(api_key="sk_...")

Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| TOKENLENS_API_KEY | Yes | | Project API key |
| TOKENLENS_PROJECT_ID | No | | Project identifier |
| TOKENLENS_ENVIRONMENT | No | | Environment label (e.g., prod, staging) |
| TOKENLENS_ENDPOINT | No | https://api.tokenome.ai/v1/events/batch | Ingest endpoint |
| TOKENLENS_DURABILITY | No | durable_local | durable_local, best_effort, or agent (future) |
| TOKENLENS_SPOOL_PATH | No | Platform cache dir | SQLite spool file path |
| TOKENLENS_TAGS | No | | JSON object merged into wrapper tags |
| TOKENLENS_ENABLED | No | true | Enable/disable telemetry |
| TOKENLENS_DEBUG | No | false | Debug logging |
| TOKENLENS_TIMEOUT_SECONDS | No | 1.5 | HTTP request timeout |
| TOKENLENS_QUEUE_MAXSIZE | No | 10000 | In-memory queue max size |
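
To make the defaults above concrete, here is a minimal sketch of how a loader could read these variables. It is not the SDK's own bootstrap code: `EnvConfig`, `load_env_config`, and the rule that only the string "true" enables a boolean flag are assumptions made for illustration.

```python
import json
import os
from dataclasses import dataclass, field
from typing import Optional

DEFAULT_ENDPOINT = "https://api.tokenome.ai/v1/events/batch"


@dataclass
class EnvConfig:
    """Hypothetical container for the documented variables."""
    api_key: str
    project_id: Optional[str] = None
    environment: Optional[str] = None
    endpoint: str = DEFAULT_ENDPOINT
    durability: str = "durable_local"
    enabled: bool = True
    debug: bool = False
    timeout_seconds: float = 1.5
    queue_maxsize: int = 10000
    tags: dict = field(default_factory=dict)


def _as_bool(value: str) -> bool:
    # Assumption: only the literal "true" (case-insensitive) means enabled.
    return value.strip().lower() == "true"


def load_env_config(env=None) -> EnvConfig:
    """Read TOKENLENS_* variables, applying the documented defaults."""
    env = os.environ if env is None else env
    api_key = env.get("TOKENLENS_API_KEY")
    if not api_key:
        raise ValueError("TOKENLENS_API_KEY is required")
    raw_tags = env.get("TOKENLENS_TAGS", "")
    tags = json.loads(raw_tags) if raw_tags else {}
    if not isinstance(tags, dict):
        raise ValueError("TOKENLENS_TAGS must be a JSON object")
    return EnvConfig(
        api_key=api_key,
        project_id=env.get("TOKENLENS_PROJECT_ID"),
        environment=env.get("TOKENLENS_ENVIRONMENT"),
        endpoint=env.get("TOKENLENS_ENDPOINT", DEFAULT_ENDPOINT),
        durability=env.get("TOKENLENS_DURABILITY", "durable_local"),
        enabled=_as_bool(env.get("TOKENLENS_ENABLED", "true")),
        debug=_as_bool(env.get("TOKENLENS_DEBUG", "false")),
        timeout_seconds=float(env.get("TOKENLENS_TIMEOUT_SECONDS", "1.5")),
        queue_maxsize=int(env.get("TOKENLENS_QUEUE_MAXSIZE", "10000")),
        tags=tags,
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to exercise in tests.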

Public API

tl = TokenLens(...)
client = tl.OpenAI(...)
client = tl.AsyncOpenAI(...)
client = tl.wrap_openai(client)
client = tl.wrap(client)
tl.flush()
tl.close()

Legacy helpers:

tl = TokenLens.init(...)
tl = TokenLens.init_from_env()
TokenLens.is_initialized()

Telemetry Semantics

Tokenome captures metadata, not analytics:

  • Provider name and SDK version
  • Model identifier
  • Operation name
  • Request mode: sync, stream, provider_batch
  • Token usage (input, output, cached, reasoning, audio, image)
  • Latency and status
  • Error metadata
  • Project / environment / route / feature / session context
  • Safe request/response metadata (no content)

Tokenome does not:

  • Capture prompt text by default
  • Capture completion text by default
  • Calculate billing cost in the SDK
  • Monkey-patch provider modules
  • Block provider calls on telemetry failure

Event Payload Shape

Batches are sent to POST /v1/events/batch.

Batch Envelope

{
  "batch_id": "batch_abc123",
  "sdk": {
    "language": "python",
    "version": "0.1.0"
  },
  "events": []
}

Per-Event Shape

{
  "event_id": "evt_123",
  "event_type": "request",
  "provider": "openai",
  "provider_sdk": "openai-python",
  "provider_sdk_version": "2.41.1",
  "model": "gpt-4.1-mini",
  "operation": "responses.create",
  "request_mode": "sync",
  "request_started_at": "2026-05-03T00:00:00Z",
  "response_completed_at": "2026-05-03T00:00:01Z",
  "latency_ms": 812,
  "input_tokens": 1200,
  "output_tokens": 340,
  "total_tokens": 1540,
  "cached_input_tokens": 0,
  "reasoning_tokens": 0,
  "status": "success",
  "cost_status": "final",
  "route": "/api/chat",
  "feature": "chatbot",
  "user_id_hash": "<sha256>",
  "session_id": "sess_456",
  "tags": {
    "environment": "prod"
  }
}

Batching and Delivery Behavior

| Parameter | Default | Description |
|---|---|---|
| Durability | durable_local | SQLite spool or in-memory |
| Flush interval | 5s | Time-based flush trigger |
| Max events per batch | 100 | Count-based flush trigger |
| Max payload size | 256 KiB | Size-based split in sender |
| Queue max size | 10000 | Memory spool capacity |
| Request timeout | 1.5s | HTTP POST timeout |

Delivery behavior:

  • durable_local commits events to local SQLite spool before async send
  • best_effort keeps events in memory only
  • tl.flush() forces immediate send attempt (bypasses backoff)
  • tl.close() flushes and shuts down cleanly
  • HTTP 429 respects Retry-After header
  • Transient network and 5xx errors retry with exponential backoff
  • Delivery is fail-open; app path never crashes on telemetry failure

Context Helpers

from tokenome import clear_context, set_context, set_default_context

set_default_context(tags={"service": "gateway", "environment": "prod"})
set_context(
    route="/api/chat",
    feature="chatbot",
    user_id="user_123",
    session_id="sess_456",
    tags={"request_id": "req_001"},
)

Resolution rules:

  • Runtime context overrides default context for scalar fields
  • Tags merge as default_tags → wrapper tags → runtime context tags
  • If user_id_hash is absent but user_id exists, SDK emits SHA-256 hash of user_id
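
The resolution rules can be sketched in a few lines. This is an illustration, not the SDK's code: the helper names are hypothetical, and it assumes that later tag layers win on key conflicts and that the hash is the hex digest of the UTF-8 encoded user_id with no salt.

```python
import hashlib


def merge_tags(default_tags=None, wrapper_tags=None, runtime_tags=None):
    """Merge tag layers in the documented order: default, wrapper, runtime."""
    merged = {}
    for layer in (default_tags, wrapper_tags, runtime_tags):
        # Assumption: a later layer overrides an earlier one on the same key.
        merged.update(layer or {})
    return merged


def resolve_user_id_hash(user_id=None, user_id_hash=None):
    """Prefer an explicit hash; otherwise derive one from user_id."""
    if user_id_hash:
        return user_id_hash
    if user_id is None:
        return None
    # Assumption: unsalted SHA-256 over the UTF-8 bytes, hex encoded.
    return hashlib.sha256(user_id.encode("utf-8")).hexdigest()


tags = merge_tags(
    {"service": "gateway", "environment": "prod"},
    {"tenant_id": "tenant_abc"},
    {"request_id": "req_001"},
)
```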

Architecture Deep Dive

Overview

The SDK is organized into four layers:

  1. Public API (client.py): TokenLens facade, env bootstrap, provider client creation
  2. Provider Wrappers (providers/): Thin transparent wrappers around OpenAI and Anthropic clients
  3. Event Model (models.py): TelemetryEvent dataclass, normalization, serialization
  4. Delivery Core (delivery/, spool/): Background worker, HTTP sender, SQLite spool

flowchart TB
    subgraph L1["Public API (TokenLens)"]
        A1["wrap_openai(), wrap()"]
        A2["flush(), close()"]
        A3["init_from_env()"]
    end

    subgraph L2["Provider Wrappers"]
        B1["OpenAI: responses, chat, images,<br/>embeddings, batches, audio,<br/>moderations, threads, files,<br/>fine_tuning, uploads, vector_stores"]
        B2["Anthropic: messages.create"]
    end

    subgraph L3["TelemetryEvent (models.py)"]
        C1["Metadata extraction"]
        C2["Usage normalization"]
        C3["Safe metadata filtering"]
    end

    subgraph L4["Delivery Core"]
        D1["Spool: SQLite / Memory"]
        D2["Worker: lease → send → ack/release"]
        D3["Sender: HTTP / batch split / retry"]
    end

    L1 --> L2
    L2 --> L3
    L3 --> L4

Event Lifecycle

An event flows through the SDK in six stages:

flowchart LR
    A["Provider Call"] --> B["Wrap"]
    B --> C["Extract"]
    C --> D["Enqueue"]
    D --> E["Spool"]
    E --> F["Worker"]
    F --> G["Sender"]
    G --> H["Server"]

Stage 1: Provider Call

User calls client.responses.create(...). The wrapper intercepts the call before it reaches the provider SDK.

Stage 2: Wrap

The wrapper:

  1. Records request_started_at
  2. Calls the original provider method
  3. Records response_completed_at
  4. Computes latency_ms
  5. Extracts usage metadata from the response object
  6. Builds a TelemetryEvent with all metadata fields
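
The wrap steps above could look roughly like the sketch below. It is not the SDK's wrapper: `call_with_telemetry` and `emit` are hypothetical names, the event is a plain dict rather than a TelemetryEvent, and the "error" status label is an assumption (the source only shows "success").

```python
import time
from datetime import datetime, timezone


def call_with_telemetry(method, emit, operation, *args, **kwargs):
    """Call `method` unchanged and hand a metadata-only event to `emit`."""
    started_at = datetime.now(timezone.utc)
    t0 = time.perf_counter()
    response = None
    status = "success"
    try:
        response = method(*args, **kwargs)
        return response  # same response object the provider returned
    except BaseException:
        status = "error"  # assumed label
        raise  # same exception the provider raised
    finally:
        # Fail-open: nothing in the telemetry path may reach the caller.
        try:
            usage = getattr(response, "usage", None)
            emit({
                "operation": operation,
                "request_started_at": started_at.isoformat(),
                "response_completed_at": datetime.now(timezone.utc).isoformat(),
                "latency_ms": int((time.perf_counter() - t0) * 1000),
                "input_tokens": getattr(usage, "input_tokens", None),
                "output_tokens": getattr(usage, "output_tokens", None),
                "status": status,
            })
        except Exception:
            pass
```

Only attribute names are read from the response; no prompt or completion text is copied into the event.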

Stage 3: Extract

Usage extraction is provider-specific:

  • OpenAI chat.completions: response.usage.input_tokens, output_tokens, total_tokens, cached_input_tokens, reasoning_tokens, etc.
  • OpenAI batches: No usage field. Uses request_counts as a proxy: total → total_tokens, completed → input_tokens, failed → output_tokens.
  • Anthropic messages: response.usage.input_tokens, output_tokens
  • Operations without usage: audio.*, files.*, uploads.*, vector_stores.*, moderations.create — return empty UsageSnapshot
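
As an illustration of the batches proxy, the sketch below maps request_counts onto token fields. The `UsageSnapshot` name comes from the text above, but its fields and the `usage_from_batch` helper are assumptions for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UsageSnapshot:
    """Assumed shape; the real dataclass may carry more fields."""
    input_tokens: Optional[int] = None
    output_tokens: Optional[int] = None
    total_tokens: Optional[int] = None


def usage_from_batch(batch) -> UsageSnapshot:
    """Proxy usage from a batch object's request_counts."""
    counts = getattr(batch, "request_counts", None)
    if counts is None:
        # Nothing to map: return an empty snapshot instead of failing.
        return UsageSnapshot()
    return UsageSnapshot(
        input_tokens=getattr(counts, "completed", None),
        output_tokens=getattr(counts, "failed", None),
        total_tokens=getattr(counts, "total", None),
    )
```

Note that these values count requests, not tokens, so they are only a rough stand-in for usage.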

Stage 4: Enqueue

wrapper.py → client._state.enqueue(event) → spool.append(event)

  • If enabled=False, event is silently dropped
  • If spool.append() raises, exception is caught and swallowed (fail-open)
  • On successful append, worker.notify_event_available() wakes the background thread
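
A minimal sketch of these enqueue rules follows. The `enqueue` function and its boolean return value are hypothetical; the SDK's real method lives on client._state and may differ.

```python
def enqueue(event, spool, notify, enabled=True):
    """Append an event to the spool without ever raising to the caller.

    Returns True if the event was stored, False if it was dropped.
    """
    if not enabled:
        return False  # telemetry disabled: drop silently
    try:
        spool.append(event)
        notify()  # wake the background worker
    except Exception:
        return False  # fail-open: swallow spool or wake-up errors
    return True
```

Any object with an `append` method works as the spool here, which makes the fail-open path easy to test with a stub that raises.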

Stage 5: Spool

The spool is the durability boundary:

durable_local (SQLiteEventSpool):

  • Serializes event to JSON
  • Inserts into tokenome_event_spool table with status='pending'
  • WAL mode (journal_mode=WAL, synchronous=NORMAL)
  • Prunes expired events on every append
  • Enforces max_bytes capacity with drop_policy (drop_oldest, drop_newest, block)

best_effort (MemorySpool):

  • Stores event in _pending list
  • Drops new events when _max_size reached
  • No persistence across process restarts
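
The durable_local append path might be sketched with the standard sqlite3 module as below. Only the table name, the 'pending' status, and the two pragmas come from the description above; the column layout, the `open_spool` and `append_event` helpers, and the age-based pruning rule are assumptions, and capacity enforcement with drop_policy is left out.

```python
import json
import sqlite3
import time


def open_spool(path):
    """Open (or create) a spool database with the documented pragmas."""
    conn = sqlite3.connect(path, check_same_thread=False)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA synchronous=NORMAL")
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS tokenome_event_spool (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            payload TEXT NOT NULL,
            status TEXT NOT NULL DEFAULT 'pending',
            created_at REAL NOT NULL
        )
        """
    )
    conn.commit()
    return conn


def append_event(conn, event, max_age_seconds, now=None):
    """Prune expired rows, then insert the event as 'pending'."""
    now = time.time() if now is None else now
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "DELETE FROM tokenome_event_spool WHERE created_at < ?",
            (now - max_age_seconds,),
        )
        conn.execute(
            "INSERT INTO tokenome_event_spool (payload, status, created_at) "
            "VALUES (?, 'pending', ?)",
            (json.dumps(event), now),
        )
```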

Stage 6: Worker + Sender

The DeliveryWorker runs in a daemon thread named tokenome-delivery.

Worker Loop (_run()):

while not stopped:
    _loop_iteration()

Loop Iteration (_loop_iteration()):

  1. Compute sleep timeout: If timer is running, sleep until deadline; otherwise sleep indefinitely until woken.
  2. Wait on wake queue: queue.Queue(maxsize=1). notify_event_available() puts a sentinel; manual flush() and close() also put sentinels.
  3. Lease batch: spool.lease_batch(limit=100, flush_mode=...)
    • flush_mode=True (manual flush): ignores next_attempt_at backoff
    • flush_mode=False (normal): only leases events whose next_attempt_at <= now
  4. Timer management: On first event, start interval timer (5s with 80-120% jitter). Timer does NOT reset on every batch.
  5. Early return: If batch size < 100 and timer hasn't fired and not force-flush, release events back to spool and sleep.
  6. Send: Call sender.send(events).
  7. Handle result:
    • acked → spool.mark_delivered() (deletes from SQLite)
    • retryable_ids → spool.release() with exponential backoff (delay = min(2^attempt_count, 300s))
    • dropped → spool.drop() (sets status='failed' or deletes)
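
The early-return rule in step 5 reduces to a small predicate. The sketch below is one way to express it; `should_send` is a hypothetical name, and treating an empty lease as "nothing to send" is an assumption.

```python
MAX_EVENTS_PER_BATCH = 100


def should_send(batch_size, timer_fired, force_flush,
                max_events=MAX_EVENTS_PER_BATCH):
    """Decide whether a leased batch is sent now or released back."""
    if batch_size == 0:
        return False  # assumption: an empty lease is never sent
    # Send when the batch is full, the interval timer fired, or a manual
    # flush was requested; otherwise the worker releases the events.
    return batch_size >= max_events or timer_fired or force_flush
```

For example, 12 leased events are held back until either the timer fires or `flush()` is called, while a full batch of 100 goes out immediately.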

Sender (HttpSender):

  1. Payload-size splitting: _split_batches() serializes each event and splits into sub-batches that fit within 256 KiB. Single oversized events are sent anyway (server returns 413, sender marks as dropped).
  2. HTTP POST: httpx.Client with HTTP/2, max_keepalive_connections=2, max_connections=4, timeout=1.5s.
  3. Response handling:
    • 200/202 → acked
    • 429 → retryable (respects Retry-After)
    • 500/502/503/504 → retryable
    • 400/401/403/413 → dropped (fatal client error)
    • Network errors / timeout → retryable
  4. Composite result: If a batch was split into sub-batches with mixed results, SendResult combines them.
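
The response-handling table can be written as a small classifier. This is a sketch rather than HttpSender's code: the function names are hypothetical, status codes not listed above are treated as dropped by assumption, and only the delta-seconds form of Retry-After is parsed.

```python
ACKED = frozenset({200, 202})
RETRYABLE = frozenset({429, 500, 502, 503, 504})
DROPPED = frozenset({400, 401, 403, 413})


def classify_status(status_code):
    """Map an HTTP status code to 'acked', 'retryable', or 'dropped'."""
    if status_code in ACKED:
        return "acked"
    if status_code in RETRYABLE:
        return "retryable"
    # Assumption: anything not listed is handled like a fatal client error.
    return "dropped"


def retry_after_seconds(header_value, default=None):
    """Parse a Retry-After header given as a number of seconds."""
    try:
        seconds = float(header_value)
    except (TypeError, ValueError):
        return default  # missing header or HTTP-date form: use the default
    return max(seconds, 0.0)
```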

Lease Lifecycle:

stateDiagram-v2
    [*] --> pending
    pending --> sending : lease
    sending --> delivered : success
    sending --> pending : retryable (backoff)
    sending --> failed : fatal
    failed --> [*]
    delivered --> [*]

Crash Recovery (recover_inflight()):

On spool initialization, any status='sending' rows with leased_at IS NULL OR leased_at < now-5min are reset to pending. Fresh leases (within 5 minutes) are left as sending to avoid races with a still-running worker.
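
In SQL terms, the recovery rule amounts to a single UPDATE. The sketch below assumes a `leased_at` column storing epoch seconds and clears it on reset; both details, and the function signature, are illustrative rather than taken from the SDK.

```python
import sqlite3
import time

LEASE_TIMEOUT_SECONDS = 5 * 60


def recover_inflight(conn, now=None):
    """Reset stale 'sending' rows to 'pending'; return how many were reset."""
    now = time.time() if now is None else now
    cutoff = now - LEASE_TIMEOUT_SECONDS
    with conn:  # commit the reset atomically
        cursor = conn.execute(
            "UPDATE tokenome_event_spool "
            "SET status = 'pending', leased_at = NULL "
            "WHERE status = 'sending' "
            "AND (leased_at IS NULL OR leased_at < ?)",
            (cutoff,),
        )
    return cursor.rowcount
```

Rows leased within the last five minutes do not match the WHERE clause, so a worker that is still running keeps its lease.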

Durability Modes

| Mode | Persistence | Crash Recovery | Use Case |
|---|---|---|---|
| durable_local | SQLite WAL spool | Yes: events survive process restart | Production default |
| best_effort | In-memory list | No: events lost on crash | Development, low-latency requirements |
| agent | Future: remote agent | Future | Not implemented |

Threading Model

sequenceDiagram
    participant MT as Main Thread
    participant BT as Background Thread (daemon)

    MT->>BT: wrap() → enqueue()
    MT->>BT: spool.append()
    MT->>BT: wake.put(None)
    BT->>BT: _wake.get()
    BT->>BT: lease_batch()
    BT->>BT: _send_batch()

    MT->>BT: flush()
    BT->>BT: force_flush=True
    BT->>BT: lease_batch(flush_mode=True)

    MT->>BT: close()
    BT->>BT: _stop.set()
    BT->>BT: final _do_flush()
    BT->>BT: thread.join()

All spool operations are protected by a threading.Lock. The SQLite connection is created with check_same_thread=False so that it can be shared between the main thread and the background thread, with the lock serializing access.

Jitter and Thundering Herd Prevention

The flush interval (5s) is multiplied by 0.8 + random.random() * 0.4 on worker startup. This means each SDK instance flushes on a slightly different cadence, preventing synchronized spikes against the ingest server.
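
The jitter formula works out to an interval between 4 and 6 seconds. A minimal sketch, with a hypothetical helper name and an injectable random source so the result can be checked:

```python
import random

BASE_FLUSH_INTERVAL_SECONDS = 5.0


def jittered_interval(base=BASE_FLUSH_INTERVAL_SECONDS, rng=random.random):
    """Scale the base interval by a factor drawn from 80% to 120%."""
    return base * (0.8 + rng() * 0.4)


# Each SDK instance would compute this once at worker startup.
interval = jittered_interval()
```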

Payload Size Enforcement

Payload size is enforced at two levels:

  1. Sender split: HttpSender._split_batches() ensures each HTTP POST body is ≤ 256 KiB. This is the hard boundary.
  2. Worker batch size: DeliveryWorker leases up to 100 events. The worker does not enforce payload size; it relies on the sender to split oversized batches.

This separation keeps the worker simple (count-based only) while ensuring the sender never violates server limits.
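
A greedy splitter along these lines would satisfy the sender-side rule. It is a simplified sketch, not HttpSender._split_batches(): it measures only the serialized events and ignores the envelope and separators, so a real implementation would need to reserve room for those.

```python
import json

MAX_PAYLOAD_BYTES = 256 * 1024


def split_batches(events, max_bytes=MAX_PAYLOAD_BYTES):
    """Group events into sub-batches whose serialized size fits max_bytes."""
    batches = []
    current = []
    current_size = 0
    for event in events:
        size = len(json.dumps(event, separators=(",", ":")).encode("utf-8"))
        # Start a new sub-batch when adding this event would exceed the cap.
        if current and current_size + size > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        # A single oversized event still gets its own sub-batch and is sent.
        current.append(event)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

Because the worker leases by count only, a batch of 100 large events may leave the sender as several POST requests.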

Retry and Backoff

Exponential backoff is computed in SQLiteEventSpool.release():

delay = min(2 ** attempt_count, 300)  # cap at 5 minutes
next_attempt_at = now + delay

The worker's lease_batch() respects next_attempt_at unless flush_mode=True (manual flush bypasses backoff).
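
To see how quickly the cap is reached, the formula can be wrapped in a small helper and tabulated. The function name and the assumption that attempt_count starts at 0 are illustrative.

```python
def backoff_delay(attempt_count, cap_seconds=300):
    """Exponential backoff in seconds, capped at cap_seconds."""
    return min(2 ** attempt_count, cap_seconds)


# Delays for the first ten attempts.
delays = [backoff_delay(n) for n in range(10)]
# delays == [1, 2, 4, 8, 16, 32, 64, 128, 256, 300]
```

From the ninth retry onward every delay is the 5-minute cap.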

Fail-Open Guarantees

The SDK guarantees that telemetry failures never propagate to user code:

  1. Wrapper level: Exceptions during metadata extraction are caught; the original provider response is still returned.
  2. Enqueue level: spool.append() exceptions are caught and swallowed.
  3. Worker level: Loop iteration exceptions are caught, logged at debug, and the worker sleeps 0.1s before retrying.
  4. Sender level: All HTTP/network exceptions are caught and converted to retryable or dropped results.

Configuration Boundaries

Batch size (100), flush interval (5s), and payload cap (256 KiB) are not user-configurable. They are part of the SDK/server contract. Server-side enforcement is the real trust boundary; the SDK hardcodes these values to cooperate with the server, not as a security measure.

User-configurable parameters:

  • api_key, endpoint, project_id, environment
  • enabled, debug
  • timeout_seconds, queue_maxsize
  • durability, spool_path, max_spool_bytes, max_spool_age_days
  • drop_policy, default_tags

Unsupported Patterns

Not supported in this SDK version:

  • OpenAI 1.x
  • Old module-level openai==0.28 APIs
  • Monkey patch instrumentation
  • Framework middleware as primary integration path
  • Automatic prompt/content capture
  • Arbitrary provider clients without supported wrapper shape
