Tokenome wrapper-client SDK for metadata-only LLM telemetry

These details have not been verified by PyPI


Tokenome Python SDK

Tokenome SDK is a wrapper-client telemetry SDK for LLM applications. It captures metadata (model, tokens, latency, status) without capturing prompt or response content by default.

Design Philosophy

Wrapper-client only: No monkey patching, no framework middleware as primary path
Metadata-only telemetry: No prompt or response content capture by default
Fail-open: Telemetry failures never crash user application code
Durable by default: durable_local uses a SQLite spool; best_effort is an opt-in, memory-only mode
Transparent: Same request, same response, same exception, same stream — plus telemetry

Supported Providers

OpenAI

Operation	Status	Usage Extraction	Notes
`responses.create`	✅	`input_tokens`, `output_tokens`, `total_tokens`, `cached_input_tokens`, `reasoning_tokens`	Supports streaming
`responses.parse`	✅	Same as `responses.create`	Structured output
`chat.completions.create`	✅	`input_tokens`, `output_tokens`, `total_tokens`, `cached_input_tokens`, `reasoning_tokens`, `audio_input_tokens`, `audio_output_tokens`, `accepted_prediction_tokens`, `rejected_prediction_tokens`	Supports streaming
`chat.completions.parse`	✅	Same as `chat.completions.create`	Structured output
`embeddings.create`	✅	`input_tokens`, `total_tokens`
`images.generate`	✅	`input_tokens`, `output_tokens`, `total_tokens`, `text_input_tokens`, `image_input_tokens`, `text_output_tokens`, `image_output_tokens`, `images_generated`
`images.edit`	✅	Same as `images.generate`
`images.create_variation`	✅	Same as `images.generate`
`batches.create`	✅	`total_tokens`←`request_counts.total`, `input_tokens`←`request_counts.completed`, `output_tokens`←`request_counts.failed`	Proxy via `request_counts` (request counts, not token counts)
`batches.retrieve`	✅	Same proxy
`batches.list`	✅	Same proxy
`batches.cancel`	✅	Same proxy
`audio.speech.create`	✅	None	No usage field
`audio.transcriptions.create`	✅	None
`audio.translations.create`	✅	None
`moderations.create`	✅	None
`threads.runs.create`	✅	`input_tokens`, `output_tokens`, `total_tokens`
`threads.runs.retrieve`	✅	Same
`threads.runs.modify`	✅	Same
`threads.runs.cancel`	✅	Same (may be empty)
`threads.runs.submit_tool_outputs`	✅	Same
`threads.runs.create_and_stream`	✅	Same	Supports streaming
`files.create`	✅	None
`files.retrieve`	✅	None
`files.content`	✅	None	Returns raw bytes
`files.delete`	✅	None
`files.list`	✅	None
`fine_tuning.jobs.create`	✅	`input_tokens`, `output_tokens`, `total_tokens`	If usage present
`fine_tuning.jobs.retrieve`	✅	Same
`fine_tuning.jobs.list`	✅	Same
`fine_tuning.jobs.cancel`	✅	Same
`uploads.create`	✅	None
`uploads.retrieve`	✅	None
`uploads.complete`	✅	None
`uploads.cancel`	✅	None
`vector_stores.create`	✅	None
`vector_stores.retrieve`	✅	None
`vector_stores.list`	✅	None
`vector_stores.delete`	✅	None

Version policy: official OpenAI Python SDK >=2.0,<3

Anthropic

Operation	Status	Usage Extraction	Notes
`messages.create`	✅	`input_tokens`, `output_tokens`, `total_tokens`	Sync only

Version policy: official Anthropic Python SDK >=0.40,<1

Install

uv add tokenome
uv add 'tokenome[openai]'

Optional extras:

uv add 'tokenome[anthropic]'
uv add 'tokenome[openai,anthropic]'
uv add 'tokenome[all]'

Quick Start

User-created OpenAI client

from openai import OpenAI
from tokenome import TokenLens

tl = TokenLens(
    api_key="tl_pk_project_...",
    project_id="proj_...",
    environment="prod",
)

client = OpenAI(api_key="sk_...")
client = tl.wrap_openai(
    client,
    route="/api/chat",
    feature="chatbot",
    user_id="user_123",
    session_id="sess_456",
    tags={"tenant_id": "tenant_abc"},
)

response = client.responses.create(
    model="gpt-4.1-mini",
    input="Hello",
)

tl.flush()
tl.close()

TokenLens-created OpenAI client

from tokenome import TokenLens

tl = TokenLens(
    api_key="tl_pk_project_...",
    project_id="proj_...",
    environment="prod",
)

client = tl.OpenAI(api_key="sk_...")
response = client.responses.create(model="gpt-4.1-mini", input="Hello")

Async OpenAI client

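import asyncio

from openai import AsyncOpenAI
from tokenome import TokenLens

tl = TokenLens(api_key="tl_pk_project_...", project_id="proj_...")


async def main() -> None:
    # wrap_openai accepts the async client as well as the sync one
    client = tl.wrap_openai(AsyncOpenAI(api_key="sk_..."))
    response = await client.responses.create(model="gpt-4.1-mini", input="Hello")


asyncio.run(main())  # await is only valid inside a coroutine
tl.flush()
tl.close()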

Environment Bootstrap

export TOKENLENS_API_KEY="***"
export TOKENLENS_PROJECT_ID="***"
export TOKENLENS_ENVIRONMENT="***"
export TOKENLENS_ENDPOINT="https://api.tokenome.ai/v1/events/batch"
export TOKENLENS_DURABILITY="durable_local"
export TOKENLENS_SPOOL_PATH="$HOME/.cache/tokenome/events.sqlite3"
export TOKENLENS_TAGS='{"service": "api", "environment": "prod"}'
export TOKENLENS_ENABLED="true"
export TOKENLENS_DEBUG="false"
export TOKENLENS_TIMEOUT_SECONDS="1.5"
export TOKENLENS_QUEUE_MAXSIZE="10000"

Then, in Python:

from tokenome import TokenLens

tl = TokenLens.init_from_env()
client = tl.OpenAI(api_key="sk_...")

Environment Variables

Variable	Required	Default	Description
`TOKENLENS_API_KEY`	Yes	—	Project API key
`TOKENLENS_PROJECT_ID`	No	—	Project identifier
`TOKENLENS_ENVIRONMENT`	No	—	Environment label (e.g., `prod`, `staging`)
`TOKENLENS_ENDPOINT`	No	`https://api.tokenome.ai/v1/events/batch`	Ingest endpoint
`TOKENLENS_DURABILITY`	No	`durable_local`	`durable_local`, `best_effort`, or `agent` (future)
`TOKENLENS_SPOOL_PATH`	No	Platform cache dir	SQLite spool file path
`TOKENLENS_TAGS`	No	—	JSON object merged into wrapper tags
`TOKENLENS_ENABLED`	No	`true`	Enable/disable telemetry
`TOKENLENS_DEBUG`	No	`false`	Debug logging
`TOKENLENS_TIMEOUT_SECONDS`	No	`1.5`	HTTP request timeout
`TOKENLENS_QUEUE_MAXSIZE`	No	`10000`	In-memory queue max size
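The table above maps naturally onto a small typed loader. The following is a minimal sketch under stated assumptions. `TokenLensConfig` and `load_config_from_env` are hypothetical names, not the SDK's API. The set of strings treated as "true" is also an assumption, so the real `TokenLens.init_from_env()` may parse values differently.

```python
import json
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional

DEFAULT_ENDPOINT = "https://api.tokenome.ai/v1/events/batch"


@dataclass
class TokenLensConfig:
    # Hypothetical container mirroring the environment-variable table.
    api_key: str
    project_id: Optional[str] = None
    environment: Optional[str] = None
    endpoint: str = DEFAULT_ENDPOINT
    durability: str = "durable_local"
    spool_path: Optional[str] = None  # None means "use the platform cache dir"
    tags: dict = field(default_factory=dict)
    enabled: bool = True
    debug: bool = False
    timeout_seconds: float = 1.5
    queue_maxsize: int = 10000


def _parse_bool(value: Optional[str], default: bool) -> bool:
    # Assumption: "1", "true", "yes", "on" (any case) mean true.
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_config_from_env(env: Mapping[str, str] = os.environ) -> TokenLensConfig:
    api_key = env.get("TOKENLENS_API_KEY")
    if not api_key:
        raise ValueError("TOKENLENS_API_KEY is required")
    tags = json.loads(env.get("TOKENLENS_TAGS", "{}"))
    if not isinstance(tags, dict):
        raise ValueError("TOKENLENS_TAGS must be a JSON object")
    return TokenLensConfig(
        api_key=api_key,
        project_id=env.get("TOKENLENS_PROJECT_ID"),
        environment=env.get("TOKENLENS_ENVIRONMENT"),
        endpoint=env.get("TOKENLENS_ENDPOINT", DEFAULT_ENDPOINT),
        durability=env.get("TOKENLENS_DURABILITY", "durable_local"),
        spool_path=env.get("TOKENLENS_SPOOL_PATH"),
        tags=tags,
        enabled=_parse_bool(env.get("TOKENLENS_ENABLED"), True),
        debug=_parse_bool(env.get("TOKENLENS_DEBUG"), False),
        timeout_seconds=float(env.get("TOKENLENS_TIMEOUT_SECONDS", "1.5")),
        queue_maxsize=int(env.get("TOKENLENS_QUEUE_MAXSIZE", "10000")),
    )
```

The loader takes a mapping instead of reading `os.environ` directly, so it can be exercised with a plain dict in tests.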

Public API

tl = TokenLens(...)
client = tl.OpenAI(...)
client = tl.AsyncOpenAI(...)
client = tl.wrap_openai(client)
client = tl.wrap(client)
tl.flush()
tl.close()

Legacy helpers:

tl = TokenLens.init(...)
tl = TokenLens.init_from_env()
TokenLens.is_initialized()

Telemetry Semantics

Tokenome captures raw request metadata; it does not compute analytics such as cost inside the SDK. Captured fields:

Provider name and SDK version
Model identifier
Operation name
Request mode: sync, stream, provider_batch
Token usage (input, output, cached, reasoning, audio, image)
Latency and status
Error metadata
Project / environment / route / feature / session context
Safe request/response metadata (no content)

Tokenome does not:

Capture prompt text by default
Capture completion text by default
Calculate billing cost in the SDK
Monkey-patch provider modules
Block provider calls on telemetry failure

Event Payload Shape

Batches are sent as HTTP POST requests to /v1/events/batch.

Batch Envelope

{
  "batch_id": "batch_abc123",
  "sdk": {
    "language": "python",
    "version": "0.1.0"
  },
  "events": []
}

Per-Event Shape

{
  "event_id": "evt_123",
  "event_type": "request",
  "provider": "openai",
  "provider_sdk": "openai-python",
  "provider_sdk_version": "2.41.1",
  "model": "gpt-4.1-mini",
  "operation": "responses.create",
  "request_mode": "sync",
  "request_started_at": "2026-05-03T00:00:00Z",
  "response_completed_at": "2026-05-03T00:00:01Z",
  "latency_ms": 812,
  "input_tokens": 1200,
  "output_tokens": 340,
  "total_tokens": 1540,
  "cached_input_tokens": 0,
  "reasoning_tokens": 0,
  "status": "success",
  "cost_status": "final",
  "route": "/api/chat",
  "feature": "chatbot",
  "user_id_hash": "<sha256>",
  "session_id": "sess_456",
  "tags": {
    "environment": "prod"
  }
}
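The envelope and per-event shapes above can be assembled with the standard library alone. This is a sketch. `build_event`, `build_batch` and `encode_batch` are hypothetical helpers. The `evt_`/`batch_` ID format and the hex encoding of `user_id_hash` are assumptions read off the example payload, not confirmed SDK behavior.

```python
import hashlib
import json
import uuid
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, Optional


def _iso(ts: datetime) -> str:
    # UTC with a trailing "Z", matching the example payload.
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_event(*, model: str, operation: str, started: datetime, completed: datetime,
                input_tokens: int, output_tokens: int,
                user_id: Optional[str] = None, **extra: Any) -> Dict[str, Any]:
    event: Dict[str, Any] = {
        "event_id": f"evt_{uuid.uuid4().hex}",
        "event_type": "request",
        "model": model,
        "operation": operation,
        "request_started_at": _iso(started),
        "response_completed_at": _iso(completed),
        "latency_ms": round((completed - started).total_seconds() * 1000),
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": input_tokens + output_tokens,
        **extra,  # route, feature, session_id, tags, ...
    }
    if user_id is not None:
        # Only the SHA-256 hex digest of the user id is emitted, never the raw id.
        event["user_id_hash"] = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return event


def build_batch(events: Iterable[Dict[str, Any]], sdk_version: str = "0.1.0") -> Dict[str, Any]:
    return {
        "batch_id": f"batch_{uuid.uuid4().hex}",
        "sdk": {"language": "python", "version": sdk_version},
        "events": list(events),
    }


def encode_batch(batch: Dict[str, Any]) -> bytes:
    # Compact JSON body for POST /v1/events/batch.
    return json.dumps(batch, separators=(",", ":")).encode("utf-8")
```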

Batching and Delivery Behavior

Parameter	Default	Description
Durability	`durable_local`	SQLite spool or in-memory
Flush interval	`5s`	Time-based flush trigger
Max events per batch	`100`	Count-based flush trigger
Max payload size	`256 KiB`	Size-based split in sender
Queue max size	`10000`	Memory spool capacity
Request timeout	`1.5s`	HTTP POST timeout

Delivery behavior:

durable_local commits events to local SQLite spool before async send
best_effort keeps events in memory only
tl.flush() forces immediate send attempt (bypasses backoff)
tl.close() flushes and shuts down cleanly
HTTP 429 respects Retry-After header
Transient network and 5xx errors retry with exponential backoff
Delivery is fail-open; app path never crashes on telemetry failure

Context Helpers

from tokenome import clear_context, set_context, set_default_context

set_default_context(tags={"service": "gateway", "environment": "prod"})
set_context(
    route="/api/chat",
    feature="chatbot",
    user_id="user_123",
    session_id="sess_456",
    tags={"request_id": "req_001"},
)

Resolution rules:

Runtime context overrides default context for scalar fields
Tags merge as default_tags → wrapper tags → runtime context tags
If user_id_hash is absent but user_id exists, SDK emits SHA-256 hash of user_id
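The resolution rules above can be sketched as a three-layer merge. `resolve_context` is a hypothetical function. The source only states that runtime context beats default context for scalar fields. Giving wrapper-level scalars (the `route=`/`feature=` passed to `wrap_openai`) the middle priority, as tags have, is an assumption. Dropping the raw `user_id` after hashing is also an assumption, based on the event payload carrying only `user_id_hash`.

```python
import hashlib
from typing import Any, Dict

SCALAR_FIELDS = ("route", "feature", "user_id", "user_id_hash", "session_id")


def resolve_context(default: Dict[str, Any], wrapper: Dict[str, Any],
                    runtime: Dict[str, Any]) -> Dict[str, Any]:
    resolved: Dict[str, Any] = {}
    # Scalars: a later layer overrides an earlier one when it supplies a value.
    for layer in (default, wrapper, runtime):
        for key in SCALAR_FIELDS:
            if layer.get(key) is not None:
                resolved[key] = layer[key]
    # Tags: default_tags -> wrapper tags -> runtime tags; later keys win.
    tags: Dict[str, Any] = {}
    for layer in (default, wrapper, runtime):
        tags.update(layer.get("tags") or {})
    resolved["tags"] = tags
    # Derive user_id_hash only if it was not supplied explicitly.
    user_id = resolved.pop("user_id", None)
    if "user_id_hash" not in resolved and user_id is not None:
        resolved["user_id_hash"] = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return resolved
```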

Architecture Deep Dive

Overview

The SDK is organized into four layers:

Public API (client.py) — TokenLens facade, env bootstrap, provider client creation
Provider Wrappers (providers/) — Thin transparent wrappers around OpenAI and Anthropic clients
Event Model (models.py) — TelemetryEvent dataclass, normalization, serialization
Delivery Core (delivery/, spool/) — Background worker, HTTP sender, SQLite spool

flowchart TB
    subgraph L1["Public API (TokenLens)"]
        A1["wrap_openai(), wrap()"]
        A2["flush(), close()"]
        A3["init_from_env()"]
    end

    subgraph L2["Provider Wrappers"]
        B1["OpenAI: responses, chat, images,<br/>embeddings, batches, audio,<br/>moderations, threads, files,<br/>fine_tuning, uploads, vector_stores"]
        B2["Anthropic: messages.create"]
    end

    subgraph L3["TelemetryEvent (models.py)"]
        C1["Metadata extraction"]
        C2["Usage normalization"]
        C3["Safe metadata filtering"]
    end

    subgraph L4["Delivery Core"]
        D1["Spool: SQLite / Memory"]
        D2["Worker: lease → send → ack/release"]
        D3["Sender: HTTP / batch split / retry"]
    end

    L1 --> L2
    L2 --> L3
    L3 --> L4

Event Lifecycle

An event flows through the SDK in six stages, from the provider call to the ingest server (Worker and Sender together form Stage 6):

flowchart LR
    A["Provider Call"] --> B["Wrap"]
    B --> C["Extract"]
    C --> D["Enqueue"]
    D --> E["Spool"]
    E --> F["Worker"]
    F --> G["Sender"]
    G --> H["Server"]

Stage 1: Provider Call

User calls client.responses.create(...). The wrapper intercepts the call before it reaches the provider SDK.
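"Intercepting" here means composition, not monkey patching. The wrapper object stands in front of the provider client and decides which attribute lookups to instrument. The sketch below assumes a hypothetical `WrappedResource` proxy and an `instrument(operation, fn)` callback; these are not the SDK's actual wrapper classes. The provider modules and the original client object are never modified.

```python
from typing import Any, Callable, Set


class WrappedResource:
    """Hypothetical proxy around a provider client or one of its namespaces.

    Attribute access is delegated to the wrapped object. Callables at a
    whitelisted dotted path (e.g. "chat.completions.create") are wrapped by
    `instrument`; intermediate namespaces are proxied recursively.
    """

    def __init__(self, target: Any, instrument: Callable[[str, Callable], Callable],
                 operations: Set[str], path: str = "") -> None:
        self._target = target
        self._instrument = instrument
        self._operations = operations
        self._path = path

    def __getattr__(self, name: str) -> Any:
        # Only called for names not found on the proxy itself.
        attr = getattr(self._target, name)
        dotted = f"{self._path}.{name}" if self._path else name
        if dotted in self._operations and callable(attr):
            return self._instrument(dotted, attr)
        if any(op.startswith(dotted + ".") for op in self._operations):
            return WrappedResource(attr, self._instrument, self._operations, dotted)
        return attr  # everything else passes through untouched
```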

Stage 2: Wrap

The wrapper:

Records request_started_at
Calls the original provider method
Records response_completed_at
Computes latency_ms
Extracts usage metadata from the response object
Builds a TelemetryEvent with all metadata fields
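A minimal sketch of the wrap stage follows. It assumes a hypothetical `instrumented_call` helper with `extract_usage` and `emit` callbacks. The `error_type` field name is an assumption. Latency is measured with `time.perf_counter()` (a monotonic clock), while the wall-clock timestamps are recorded separately. The provider's response or exception always reaches the caller unchanged.

```python
import time
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Optional


def _safe_emit(emit: Callable[[Dict[str, Any]], None], operation: str,
               started_at: datetime, t0: float, status: str,
               usage: Dict[str, int], error_type: Optional[str]) -> None:
    try:
        emit({
            "operation": operation,
            "request_started_at": started_at.isoformat(),
            "response_completed_at": datetime.now(timezone.utc).isoformat(),
            "latency_ms": round((time.perf_counter() - t0) * 1000),
            "status": status,
            "error_type": error_type,  # hypothetical field name
            **usage,
        })
    except Exception:
        pass  # fail-open: telemetry errors are swallowed


def instrumented_call(operation: str, fn: Callable[..., Any],
                      extract_usage: Callable[[Any], Dict[str, int]],
                      emit: Callable[[Dict[str, Any]], None],
                      *args: Any, **kwargs: Any) -> Any:
    started_at = datetime.now(timezone.utc)
    t0 = time.perf_counter()
    try:
        response = fn(*args, **kwargs)  # the original provider method
    except Exception as exc:
        _safe_emit(emit, operation, started_at, t0, "error", {}, type(exc).__name__)
        raise  # same exception as without the wrapper
    try:
        usage = extract_usage(response)
    except Exception:
        usage = {}  # extraction bugs never reach the caller
    _safe_emit(emit, operation, started_at, t0, "success", usage, None)
    return response  # same response object as without the wrapper
```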

Stage 3: Extract

Usage extraction is provider-specific:

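OpenAI chat.completions: response.usage.prompt_tokens, completion_tokens, and total_tokens, plus prompt_tokens_details.cached_tokens, completion_tokens_details.reasoning_tokens, etc., normalized to input_tokens, output_tokens, total_tokens, cached_input_tokens, reasoning_tokens, etc.
OpenAI responses: response.usage.input_tokens, output_tokens, total_tokens, input_tokens_details.cached_tokens, output_tokens_details.reasoning_tokens
OpenAI batches: No usage field. Uses request_counts.total → total_tokens, request_counts.completed → input_tokens, and request_counts.failed → output_tokens as a proxy; these values are request counts, not token counts.
Anthropic messages: response.usage.input_tokens, output_tokens
Operations without usage (audio.*, files.*, uploads.*, vector_stores.*, moderations.create): return an empty UsageSnapshot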
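The per-provider mapping can be sketched as a single normalization function over response dicts, such as the output of `model_dump()`. The OpenAI and Anthropic field paths below are those providers' documented usage fields. The `kind` strings and `normalize_usage` itself are hypothetical. Computing Anthropic's `total_tokens` as the sum of input and output is an assumption, because Anthropic returns no total.

```python
from typing import Any, Dict, Mapping


def _get(d: Any, *keys: str) -> Any:
    # Walk nested mappings; return None if any level is missing.
    for key in keys:
        if not isinstance(d, Mapping):
            return None
        d = d.get(key)
    return d


def normalize_usage(kind: str, payload: Mapping[str, Any]) -> Dict[str, int]:
    usage = payload.get("usage")
    if kind == "openai.chat.completions":
        fields = {
            "input_tokens": _get(usage, "prompt_tokens"),
            "output_tokens": _get(usage, "completion_tokens"),
            "total_tokens": _get(usage, "total_tokens"),
            "cached_input_tokens": _get(usage, "prompt_tokens_details", "cached_tokens"),
            "reasoning_tokens": _get(usage, "completion_tokens_details", "reasoning_tokens"),
        }
    elif kind == "openai.responses":
        fields = {
            "input_tokens": _get(usage, "input_tokens"),
            "output_tokens": _get(usage, "output_tokens"),
            "total_tokens": _get(usage, "total_tokens"),
            "cached_input_tokens": _get(usage, "input_tokens_details", "cached_tokens"),
            "reasoning_tokens": _get(usage, "output_tokens_details", "reasoning_tokens"),
        }
    elif kind == "openai.batches":
        counts = payload.get("request_counts")
        # Proxy values: these are request counts, not token counts.
        fields = {
            "total_tokens": _get(counts, "total"),
            "input_tokens": _get(counts, "completed"),
            "output_tokens": _get(counts, "failed"),
        }
    elif kind == "anthropic.messages":
        inp, out = _get(usage, "input_tokens"), _get(usage, "output_tokens")
        total = inp + out if inp is not None and out is not None else None
        fields = {"input_tokens": inp, "output_tokens": out, "total_tokens": total}
    else:
        fields = {}  # audio.*, files.*, ...: no usage block
    return {k: v for k, v in fields.items() if v is not None}
```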

Stage 4: Enqueue

wrapper.py → client._state.enqueue(event) → spool.append(event)

If enabled=False, event is silently dropped
If spool.append() raises, exception is caught and swallowed (fail-open)
On successful append, worker.notify_event_available() wakes the background thread
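The enqueue rules above reduce to a guarded append followed by a wake-up. In this sketch `ClientState` is a hypothetical stand-in for `client._state`, and `notify` stands in for `worker.notify_event_available()`. Guarding the notify call as well is an extra assumption in the fail-open spirit.

```python
import logging
from typing import Any, Callable, Protocol

logger = logging.getLogger("tokenome.sketch")


class Spool(Protocol):
    def append(self, event: Any) -> None: ...


class ClientState:
    """Hypothetical stand-in for client._state."""

    def __init__(self, spool: Spool, notify: Callable[[], None], enabled: bool = True) -> None:
        self.spool = spool
        self.notify = notify  # e.g. worker.notify_event_available
        self.enabled = enabled

    def enqueue(self, event: Any) -> bool:
        if not self.enabled:
            return False  # telemetry disabled: drop silently
        try:
            self.spool.append(event)
        except Exception:
            # Fail-open: a broken spool must never break the caller.
            logger.debug("spool.append failed", exc_info=True)
            return False
        try:
            self.notify()  # wake the worker only after a successful append
        except Exception:
            logger.debug("worker notify failed", exc_info=True)
        return True
```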

Stage 5: Spool

The spool is the durability boundary:

durable_local (SQLiteEventSpool):

Serializes event to JSON
Inserts into tokenome_event_spool table with status='pending'
WAL mode (journal_mode=WAL, synchronous=NORMAL)
Prunes expired events on every append
Enforces max_bytes capacity with drop_policy (drop_oldest, drop_newest, block)
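A reduced sketch of the SQLite spool uses Python's built-in `sqlite3` module. The table name and PRAGMAs come from the description above. The column layout, the `SQLiteSpoolSketch` class and the 7-day default age are hypothetical. `max_bytes`/`drop_policy` enforcement is omitted, and a single process using one connection is assumed.

```python
import json
import sqlite3
import threading
import time
from typing import Any, Dict, List, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS tokenome_event_spool (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    payload TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',
    attempt_count INTEGER NOT NULL DEFAULT 0,
    created_at REAL NOT NULL,
    next_attempt_at REAL NOT NULL DEFAULT 0,
    leased_at REAL
)
"""


class SQLiteSpoolSketch:
    def __init__(self, path: str, max_age_seconds: float = 7 * 86400) -> None:
        self._lock = threading.Lock()
        # check_same_thread=False: the lock below serializes cross-thread use.
        # isolation_level=None: autocommit, each statement commits on its own.
        self._conn = sqlite3.connect(path, check_same_thread=False, isolation_level=None)
        self._conn.execute("PRAGMA journal_mode=WAL")
        self._conn.execute("PRAGMA synchronous=NORMAL")
        self._conn.execute(SCHEMA)
        self._max_age = max_age_seconds

    def append(self, event: Dict[str, Any]) -> None:
        now = time.time()
        with self._lock:
            # Prune expired rows on every append, then insert as 'pending'.
            self._conn.execute("DELETE FROM tokenome_event_spool WHERE created_at < ?",
                               (now - self._max_age,))
            self._conn.execute(
                "INSERT INTO tokenome_event_spool (payload, created_at) VALUES (?, ?)",
                (json.dumps(event), now))

    def lease_batch(self, limit: int = 100, flush_mode: bool = False) -> List[Tuple[int, Dict[str, Any]]]:
        now = time.time()
        with self._lock:
            if flush_mode:  # manual flush ignores next_attempt_at backoff
                where, params = "status = 'pending'", (limit,)
            else:
                where, params = "status = 'pending' AND next_attempt_at <= ?", (now, limit)
            rows = self._conn.execute(
                f"SELECT id, payload FROM tokenome_event_spool WHERE {where} ORDER BY id LIMIT ?",
                params).fetchall()
            self._conn.executemany(
                "UPDATE tokenome_event_spool SET status = 'sending', leased_at = ? WHERE id = ?",
                [(now, row_id) for row_id, _ in rows])
        return [(row_id, json.loads(payload)) for row_id, payload in rows]

    def mark_delivered(self, ids: List[int]) -> None:
        with self._lock:
            self._conn.executemany("DELETE FROM tokenome_event_spool WHERE id = ?",
                                   [(i,) for i in ids])
```

WAL mode needs a file-backed database. Against `":memory:"` the `journal_mode` PRAGMA simply stays in memory mode, which is convenient for tests.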

best_effort (MemorySpool):

Stores event in _pending list
Drops new events when _max_size reached
No persistence across process restarts
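The best_effort behavior above amounts to a bounded in-process list. `MemorySpoolSketch` is a hypothetical class. Counting drops and re-queueing released events at the front are illustrative choices, not documented SDK behavior.

```python
import threading
from typing import Any, List


class MemorySpoolSketch:
    """Hypothetical best_effort spool: bounded, in-process, no persistence."""

    def __init__(self, max_size: int = 10000) -> None:
        self._pending: List[Any] = []
        self._max_size = max_size
        self._lock = threading.Lock()
        self.dropped = 0  # events rejected because the spool was full

    def append(self, event: Any) -> None:
        with self._lock:
            if len(self._pending) >= self._max_size:
                self.dropped += 1  # drop the new event; keep older ones
                return
            self._pending.append(event)

    def lease_batch(self, limit: int = 100) -> List[Any]:
        with self._lock:
            batch, self._pending = self._pending[:limit], self._pending[limit:]
            return batch

    def release(self, events: List[Any]) -> None:
        with self._lock:
            # Re-queue at the front so released events are retried first.
            # (Released events may briefly push the list past max_size.)
            self._pending[:0] = events
```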

Stage 6: Worker + Sender

The DeliveryWorker runs in a daemon thread named tokenome-delivery.

Worker Loop (_run()):

while not stopped:
    _loop_iteration()

Loop Iteration (_loop_iteration()):

Compute sleep timeout: If timer is running, sleep until deadline; otherwise sleep indefinitely until woken.
Wait on wake queue: queue.Queue(maxsize=1) — notify_event_available() puts a sentinel. Manual flush() and close() also put sentinels.
Lease batch: spool.lease_batch(limit=100, flush_mode=...)
- flush_mode=True (manual flush): ignores next_attempt_at backoff
- flush_mode=False (normal): only leases events whose next_attempt_at <= now
Timer management: On first event, start interval timer (5s with 80-120% jitter). Timer does NOT reset on every batch.
Early return: If batch size < 100 and timer hasn't fired and not force-flush, release events back to spool and sleep.
Send: Call sender.send(events).
Handle result:
- acked → spool.mark_delivered() (deletes from SQLite)
- retryable_ids → spool.release() with exponential backoff (delay = min(2^attempt_count, 300s))
- dropped → spool.drop() (sets status='failed' or deletes)
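Two pieces of the loop can be isolated. The first is the coalescing wake signal built on `queue.Queue(maxsize=1)`: extra notifications while one is already pending are dropped, not queued. The second is the send-or-release decision. `WakeSignal` and `should_send` are hypothetical names for illustration. The deadline is assumed to be an absolute time on the same clock as `now`.

```python
import queue
from typing import Optional


class WakeSignal:
    """Hypothetical coalescing wake-up built on queue.Queue(maxsize=1)."""

    def __init__(self) -> None:
        self._q: "queue.Queue[None]" = queue.Queue(maxsize=1)

    def notify(self) -> None:
        try:
            self._q.put_nowait(None)
        except queue.Full:
            pass  # a sentinel is already pending; one wake-up is enough

    def wait(self, timeout: Optional[float]) -> bool:
        # timeout=None blocks until notified; returns False on timeout.
        try:
            self._q.get(timeout=timeout)
            return True
        except queue.Empty:
            return False


def should_send(batch_size: int, deadline: Optional[float], now: float,
                force: bool, max_batch: int = 100) -> bool:
    # Send a full batch, a batch whose interval timer has fired, or on flush/close.
    # Otherwise the leased events are released back to the spool.
    timer_fired = deadline is not None and now >= deadline
    return batch_size > 0 and (batch_size >= max_batch or timer_fired or force)
```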

Sender (HttpSender):

Payload-size splitting: _split_batches() serializes each event and splits into sub-batches that fit within 256 KiB. Single oversized events are sent anyway (server returns 413, sender marks as dropped).
HTTP POST: httpx.Client with HTTP/2, max_keepalive_connections=2, max_connections=4, timeout=1.5s.
Response handling:
- 200/202 → acked
- 429 → retryable (respects Retry-After)
- 500/502/503/504 → retryable
- 400/401/403/413 → dropped (fatal client error)
- Network errors / timeout → retryable
Composite result: If a batch was split into sub-batches with mixed results, SendResult combines them.
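The response-handling table can be expressed as a pure function, together with a `Retry-After` parser covering both forms HTTP allows: delta-seconds and an HTTP-date. `classify_status` and `retry_after_seconds` are hypothetical helpers. Treating statuses not listed above as dropped is an assumption.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

ACKED, RETRYABLE, DROPPED = "acked", "retryable", "dropped"


def classify_status(status: Optional[int]) -> str:
    """Map an HTTP status (None = network error or timeout) to an outcome."""
    if status is None:
        return RETRYABLE
    if status in (200, 202):
        return ACKED
    if status in (429, 500, 502, 503, 504):
        return RETRYABLE
    return DROPPED  # 400/401/403/413, and (assumption) any other status


def retry_after_seconds(value: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
    """Parse Retry-After as delta-seconds or an HTTP-date; None if unusable."""
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):  # older Pythons raise TypeError on garbage
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # "-0000" dates come back naive
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```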

Lease Lifecycle:

stateDiagram-v2
    [*] --> pending
    pending --> sending : lease
    sending --> delivered : success
    sending --> pending : retryable (backoff)
    sending --> failed : fatal
    failed --> [*]
    delivered --> [*]
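The state diagram above can be encoded as an explicit transition table, which makes illegal moves easy to catch in tests. `SpoolStatus` and `transition` are hypothetical names; the allowed edges are exactly those in the diagram.

```python
from enum import Enum


class SpoolStatus(str, Enum):
    PENDING = "pending"
    SENDING = "sending"
    DELIVERED = "delivered"
    FAILED = "failed"


# Allowed transitions, taken from the lease lifecycle diagram.
TRANSITIONS = {
    SpoolStatus.PENDING: {SpoolStatus.SENDING},                # lease
    SpoolStatus.SENDING: {SpoolStatus.DELIVERED,               # success
                          SpoolStatus.PENDING,                 # retryable (backoff)
                          SpoolStatus.FAILED},                 # fatal
    SpoolStatus.DELIVERED: set(),                              # terminal
    SpoolStatus.FAILED: set(),                                 # terminal
}


def transition(current: SpoolStatus, new: SpoolStatus) -> SpoolStatus:
    if new not in TRANSITIONS[current]:
        raise ValueError(f"illegal spool transition {current.value} -> {new.value}")
    return new
```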

Crash Recovery (recover_inflight()):

On spool initialization, any status='sending' rows with leased_at IS NULL OR leased_at < now-5min are reset to pending. Fresh leases (within 5 minutes) are left as sending to avoid races with a still-running worker.
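The recovery rule is a single `UPDATE`. This sketch assumes `leased_at` is stored as Unix epoch seconds in a REAL column; the schema and the `recover_inflight` signature here are illustrative, not the SDK's.

```python
import sqlite3
import time
from typing import Optional

LEASE_TTL_SECONDS = 5 * 60  # leases older than this are presumed abandoned


def recover_inflight(conn: sqlite3.Connection, now: Optional[float] = None) -> int:
    """Reset stale 'sending' rows to 'pending'; return how many were reset."""
    now = time.time() if now is None else now
    cur = conn.execute(
        "UPDATE tokenome_event_spool SET status = 'pending', leased_at = NULL "
        "WHERE status = 'sending' AND (leased_at IS NULL OR leased_at < ?)",
        (now - LEASE_TTL_SECONDS,),
    )
    conn.commit()
    # Fresh leases (newer than the TTL) stay 'sending' so a live worker keeps them.
    return cur.rowcount
```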

Durability Modes

Mode	Persistence	Crash Recovery	Use Case
`durable_local`	SQLite WAL spool	Yes — events survive process restart	Production default
`best_effort`	In-memory list	No — events lost on crash	Development, low-latency requirements
`agent`	Future — remote agent	Future	Not implemented

Threading Model

sequenceDiagram
    participant MT as Main Thread
    participant BT as Background Thread (daemon)

    MT->>MT: wrapped call → enqueue()
    MT->>MT: spool.append()
    MT->>BT: _wake.put(None)
    BT->>BT: _wake.get()
    BT->>BT: lease_batch()
    BT->>BT: _send_batch()

    MT->>BT: flush()
    BT->>BT: force_flush=True
    BT->>BT: lease_batch(flush_mode=True)

    MT->>BT: close() → _stop.set(), wake sentinel
    BT->>BT: final _do_flush()
    MT->>MT: thread.join()

All spool operations are protected by a threading.Lock. The SQLite connection is created with check_same_thread=False so that the calling thread and the delivery thread can share it; the lock serializes their access.

Jitter and Thundering Herd Prevention

The flush interval (5s) is multiplied by 0.8 + random.random() * 0.4 on worker startup, giving an effective interval between 4 and 6 seconds. Each SDK instance therefore flushes on a slightly different cadence, which prevents synchronized spikes against the ingest server.
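The jitter formula translates directly into code. `jittered_interval` is a hypothetical helper. Accepting an optional `random.Random` instance makes the result reproducible in tests and is a design choice of this sketch.

```python
import random
from typing import Optional

BASE_FLUSH_INTERVAL_S = 5.0


def jittered_interval(base: float = BASE_FLUSH_INTERVAL_S,
                      rng: Optional[random.Random] = None) -> float:
    r = (rng or random).random()   # uniform in [0.0, 1.0)
    return base * (0.8 + r * 0.4)  # roughly 4.0 to 6.0 seconds for base=5.0
```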

Payload Size Enforcement

Payload size is enforced at two levels:

Sender split: HttpSender._split_batches() ensures each HTTP POST body is ≤ 256 KiB. This is the hard boundary; the only exception is a single event that by itself exceeds 256 KiB, which is sent alone and rejected by the server with 413.
Worker batch size: DeliveryWorker leases up to 100 events. The worker does not enforce payload size; it relies on the sender to split oversized batches.

This separation keeps the worker simple (count-based only), while the sender keeps every multi-event request within server limits.

Retry and Backoff

Exponential backoff is computed in SQLiteEventSpool.release():

delay = min(2 ** attempt_count, 300)  # cap at 5 minutes
next_attempt_at = now + delay

The worker's lease_batch() respects next_attempt_at unless flush_mode=True (manual flush bypasses backoff).
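The backoff formula can be wrapped in a small helper. `next_attempt_at` is a hypothetical function, and `attempt_count` is taken as the value stored on the row. Letting a server `Retry-After` hint lengthen the delay is an assumption about how the 429 handling and the spool backoff combine; the source does not specify this.

```python
import time
from typing import Optional

MAX_BACKOFF_SECONDS = 300  # 5-minute cap


def next_attempt_at(attempt_count: int, now: Optional[float] = None,
                    retry_after: Optional[float] = None) -> float:
    now = time.time() if now is None else now
    delay = min(2 ** attempt_count, MAX_BACKOFF_SECONDS)
    if retry_after is not None:
        # Assumption: a longer server-requested wait takes precedence.
        delay = max(delay, retry_after)
    return now + delay
```

With `attempt_count` values 0, 1, 2, ... the delays run 1, 2, 4, 8, ... seconds. They reach the 300-second cap at the ninth attempt (2^9 = 512 > 300).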

Fail-Open Guarantees

The SDK guarantees that telemetry failures never propagate to user code:

Wrapper level: Exceptions during metadata extraction are caught; the original provider response is still returned.
Enqueue level: spool.append() exceptions are caught and swallowed.
Worker level: Loop iteration exceptions are caught, logged at debug, and the worker sleeps 0.1s before retrying.
Sender level: All HTTP/network exceptions are caught and converted to retryable or dropped results.

Configuration Boundaries

Batch size (100), flush interval (5s), and payload cap (256 KiB) are not user-configurable; they are part of the SDK/server contract. The SDK hardcodes them to cooperate with the server, not to provide security: server-side enforcement is the real trust boundary.

User-configurable parameters:

api_key, endpoint, project_id, environment
enabled, debug
timeout_seconds, queue_maxsize
durability, spool_path, max_spool_bytes, max_spool_age_days
drop_policy, default_tags

Unsupported Patterns

Not supported in this SDK version:

OpenAI 1.x
Old module-level openai==0.28 APIs
Monkey patch instrumentation
Framework middleware as primary integration path
Automatic prompt/content capture
Arbitrary provider clients without supported wrapper shape

Project details



This version: 0.1.0, released Jun 23, 2026

Download files


Source Distribution

tokenome-0.1.0.tar.gz (127.7 kB)

Uploaded Jun 23, 2026 Source

Built Distribution


tokenome-0.1.0-py3-none-any.whl (35.3 kB)

Uploaded Jun 23, 2026 Python 3

File details

Details for the file tokenome-0.1.0.tar.gz.

File metadata

Download URL: tokenome-0.1.0.tar.gz
Upload date: Jun 23, 2026
Size: 127.7 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for tokenome-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`5f9a4c63be110b6a3be5f8b0002984e0c8959f8e8b34a171a874054f165cc4aa`
MD5	`d5a935916e6ad160cf01705de30a70a1`
BLAKE2b-256	`47bf62cbf82809c0ff25ede31f6d8608bb1c7b50eac34f42d66ebbd30ef45999`


Provenance

The following attestation bundles were made for tokenome-0.1.0.tar.gz:

Publisher: publish.yml on khodex-rei/tokenome-sdk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tokenome-0.1.0.tar.gz
- Subject digest: 5f9a4c63be110b6a3be5f8b0002984e0c8959f8e8b34a171a874054f165cc4aa
- Sigstore transparency entry: 1928825044
- Sigstore integration time: Jun 23, 2026
Source repository:
- Permalink: khodex-rei/tokenome-sdk@52d1c3d5684421b4e2a89a57e09b43be0fe2995d
- Branch / Tag: refs/tags/v0.0.1
- Owner: https://github.com/khodex-rei
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@52d1c3d5684421b4e2a89a57e09b43be0fe2995d
- Trigger Event: release

File details

Details for the file tokenome-0.1.0-py3-none-any.whl.

File metadata

Download URL: tokenome-0.1.0-py3-none-any.whl
Upload date: Jun 23, 2026
Size: 35.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for tokenome-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`94afa3ce2a30f8636c5b1643e107895c61e96605abb9a5f9572635db4c4c8c8b`
MD5	`2b3280372f7aa0e472a2ae6b42f957ec`
BLAKE2b-256	`221c66cade4bc7ae1aaa2a974682a873858cbe0b05d32506fa2406307437de3e`


Provenance

The following attestation bundles were made for tokenome-0.1.0-py3-none-any.whl:

Publisher: publish.yml on khodex-rei/tokenome-sdk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tokenome-0.1.0-py3-none-any.whl
- Subject digest: 94afa3ce2a30f8636c5b1643e107895c61e96605abb9a5f9572635db4c4c8c8b
- Sigstore transparency entry: 1928825194
- Sigstore integration time: Jun 23, 2026
Source repository:
- Permalink: khodex-rei/tokenome-sdk@52d1c3d5684421b4e2a89a57e09b43be0fe2995d
- Branch / Tag: refs/tags/v0.0.1
- Owner: https://github.com/khodex-rei
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@52d1c3d5684421b4e2a89a57e09b43be0fe2995d
- Trigger Event: release
