Tokenome wrapper-client SDK for metadata-only LLM telemetry
Tokenome Python SDK
Tokenome SDK is a wrapper-client telemetry SDK for LLM applications. It captures metadata (model, tokens, latency, status) without capturing prompt or response content by default.
Design Philosophy
- Wrapper-client only: No monkey patching, no framework middleware as primary path
- Metadata-only telemetry: No prompt or response content capture by default
- Fail-open: Telemetry failures never crash user application code
- Durable by default: `durable_local` uses a SQLite spool; `best_effort` is the opt-out, memory-only mode
- Transparent: Same request, same response, same exception, same stream, plus telemetry
Supported Providers
OpenAI
| Operation | Status | Usage Extraction | Notes |
|---|---|---|---|
| `responses.create` | ✅ | input_tokens, output_tokens, total_tokens, cached_input_tokens, reasoning_tokens | Supports streaming |
| `responses.parse` | ✅ | Same as responses.create | Structured output |
| `chat.completions.create` | ✅ | input_tokens, output_tokens, total_tokens, cached_input_tokens, reasoning_tokens, audio_input_tokens, audio_output_tokens, accepted_prediction_tokens, rejected_prediction_tokens | Supports streaming |
| `chat.completions.parse` | ✅ | Same as chat.completions.create | Structured output |
| `embeddings.create` | ✅ | input_tokens, total_tokens | |
| `images.generate` | ✅ | input_tokens, output_tokens, total_tokens, text_input_tokens, image_input_tokens, text_output_tokens, image_output_tokens, images_generated | |
| `images.edit` | ✅ | Same as images.generate | |
| `images.create_variation` | ✅ | Same as images.generate | |
| `batches.create` | ✅ | total_tokens←total, input_tokens←completed, output_tokens←failed | Proxy via request_counts |
| `batches.retrieve` | ✅ | Same proxy | |
| `batches.list` | ✅ | Same proxy | |
| `batches.cancel` | ✅ | Same proxy | |
| `audio.speech.create` | ✅ | None | No usage field |
| `audio.transcriptions.create` | ✅ | None | |
| `audio.translations.create` | ✅ | None | |
| `moderations.create` | ✅ | None | |
| `threads.runs.create` | ✅ | input_tokens, output_tokens, total_tokens | |
| `threads.runs.retrieve` | ✅ | Same | |
| `threads.runs.modify` | ✅ | Same | |
| `threads.runs.cancel` | ✅ | Same (may be empty) | |
| `threads.runs.submit_tool_outputs` | ✅ | Same | |
| `threads.runs.create_and_stream` | ✅ | Same | Supports streaming |
| `files.create` | ✅ | None | |
| `files.retrieve` | ✅ | None | |
| `files.content` | ✅ | None | Returns raw bytes |
| `files.delete` | ✅ | None | |
| `files.list` | ✅ | None | |
| `fine_tuning.jobs.create` | ✅ | input_tokens, output_tokens, total_tokens | If usage present |
| `fine_tuning.jobs.retrieve` | ✅ | Same | |
| `fine_tuning.jobs.list` | ✅ | Same | |
| `fine_tuning.jobs.cancel` | ✅ | Same | |
| `uploads.create` | ✅ | None | |
| `uploads.retrieve` | ✅ | None | |
| `uploads.complete` | ✅ | None | |
| `uploads.cancel` | ✅ | None | |
| `vector_stores.create` | ✅ | None | |
| `vector_stores.retrieve` | ✅ | None | |
| `vector_stores.list` | ✅ | None | |
| `vector_stores.delete` | ✅ | None | |
Version policy: official OpenAI Python SDK >=2.0,<3
Anthropic
| Operation | Status | Usage Extraction | Notes |
|---|---|---|---|
| `messages.create` | ✅ | input_tokens, output_tokens, total_tokens | Sync only |
Version policy: official Anthropic Python SDK >=0.40,<1
Install
uv add tokenome-sdk
uv add 'tokenome-sdk[openai]'
Optional extras:
uv add 'tokenome-sdk[anthropic]'
uv add 'tokenome-sdk[openai,anthropic]'
uv add 'tokenome-sdk[all]'
Quick Start
User-created OpenAI client
from openai import OpenAI
from tokenome import TokenLens
tl = TokenLens(
api_key="tl_pk_project_...",
project_id="proj_...",
environment="prod",
)
client = OpenAI(api_key="sk_...")
client = tl.wrap_openai(
client,
route="/api/chat",
feature="chatbot",
user_id="user_123",
session_id="sess_456",
tags={"tenant_id": "tenant_abc"},
)
response = client.responses.create(
model="gpt-4.1-mini",
input="Hello",
)
tl.flush()
tl.close()
TokenLens-created OpenAI client
from tokenome import TokenLens
tl = TokenLens(
api_key="tl_pk_project_...",
project_id="proj_...",
environment="prod",
)
client = tl.OpenAI(api_key="sk_...")
response = client.responses.create(model="gpt-4.1-mini", input="Hello")
Async OpenAI client
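import asyncio
from openai import AsyncOpenAI
from tokenome import TokenLens
tl = TokenLens(api_key="tl_pk_project_...", project_id="proj_...")
client = AsyncOpenAI(api_key="sk_...")
client = tl.wrap_openai(client)
async def main():
    # `await` is only valid inside a coroutine, so the call runs from an async entry point
    response = await client.responses.create(model="gpt-4.1-mini", input="Hello")
asyncio.run(main())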
Environment Bootstrap
export TOKENLENS_API_KEY="***"
export TOKENLENS_PROJECT_ID="***"
export TOKENLENS_ENVIRONMENT="***"
export TOKENLENS_ENDPOINT="https://api.tokenome.ai/v1/events/batch"
export TOKENLENS_DURABILITY="durable_local"
export TOKENLENS_SPOOL_PATH="$HOME/.cache/tokenome/events.sqlite3"
export TOKENLENS_TAGS='{"service": "api", "environment": "prod"}'
export TOKENLENS_ENABLED="true"
export TOKENLENS_DEBUG="false"
export TOKENLENS_TIMEOUT_SECONDS="1.5"
export TOKENLENS_QUEUE_MAXSIZE="10000"
from tokenome import TokenLens
tl = TokenLens.init_from_env()
client = tl.OpenAI(api_key="sk_...")
Environment Variables
| Variable | Required | Default | Description |
|---|---|---|---|
| `TOKENLENS_API_KEY` | Yes | - | Project API key |
| `TOKENLENS_PROJECT_ID` | No | - | Project identifier |
| `TOKENLENS_ENVIRONMENT` | No | - | Environment label (e.g., prod, staging) |
| `TOKENLENS_ENDPOINT` | No | `https://api.tokenome.ai/v1/events/batch` | Ingest endpoint |
| `TOKENLENS_DURABILITY` | No | `durable_local` | `durable_local`, `best_effort`, or `agent` (future) |
| `TOKENLENS_SPOOL_PATH` | No | Platform cache dir | SQLite spool file path |
| `TOKENLENS_TAGS` | No | - | JSON object merged into wrapper tags |
| `TOKENLENS_ENABLED` | No | `true` | Enable/disable telemetry |
| `TOKENLENS_DEBUG` | No | `false` | Debug logging |
| `TOKENLENS_TIMEOUT_SECONDS` | No | `1.5` | HTTP request timeout |
| `TOKENLENS_QUEUE_MAXSIZE` | No | `10000` | In-memory queue max size |
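As a rough illustration of how the variables and defaults in this table could map onto a configuration object, here is a minimal sketch. `EnvConfig`, `load_config`, and `_as_bool` are hypothetical names, and raising on a missing API key is an assumption; `TokenLens.init_from_env()` may behave differently.

```python
import json
import os
from dataclasses import dataclass, field
from typing import Dict, Mapping, Optional

DEFAULT_ENDPOINT = "https://api.tokenome.ai/v1/events/batch"


@dataclass
class EnvConfig:
    """Hypothetical container for illustration; not the SDK's real config type."""
    api_key: str
    project_id: Optional[str] = None
    environment: Optional[str] = None
    endpoint: str = DEFAULT_ENDPOINT
    durability: str = "durable_local"
    spool_path: Optional[str] = None  # None means "use the platform cache dir"
    tags: Dict[str, str] = field(default_factory=dict)
    enabled: bool = True
    debug: bool = False
    timeout_seconds: float = 1.5
    queue_maxsize: int = 10000


def _as_bool(raw: str) -> bool:
    # Assumption: only these spellings count as "on".
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_config(env: Optional[Mapping[str, str]] = None) -> EnvConfig:
    env = os.environ if env is None else env
    api_key = env.get("TOKENLENS_API_KEY")
    if not api_key:
        # Assumption: the only required variable is rejected early when missing.
        raise ValueError("TOKENLENS_API_KEY is required")
    return EnvConfig(
        api_key=api_key,
        project_id=env.get("TOKENLENS_PROJECT_ID"),
        environment=env.get("TOKENLENS_ENVIRONMENT"),
        endpoint=env.get("TOKENLENS_ENDPOINT", DEFAULT_ENDPOINT),
        durability=env.get("TOKENLENS_DURABILITY", "durable_local"),
        spool_path=env.get("TOKENLENS_SPOOL_PATH"),
        tags=json.loads(env.get("TOKENLENS_TAGS", "{}")),
        enabled=_as_bool(env.get("TOKENLENS_ENABLED", "true")),
        debug=_as_bool(env.get("TOKENLENS_DEBUG", "false")),
        timeout_seconds=float(env.get("TOKENLENS_TIMEOUT_SECONDS", "1.5")),
        queue_maxsize=int(env.get("TOKENLENS_QUEUE_MAXSIZE", "10000")),
    )
```

Passing a plain dict instead of `os.environ` keeps the parsing easy to test in isolation.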
Public API
tl = TokenLens(...)
client = tl.OpenAI(...)
client = tl.AsyncOpenAI(...)
client = tl.wrap_openai(client)
client = tl.wrap(client)
tl.flush()
tl.close()
Legacy helpers:
tl = TokenLens.init(...)
tl = TokenLens.init_from_env()
TokenLens.is_initialized()
Telemetry Semantics
Tokenome captures metadata, not analytics:
- Provider name and SDK version
- Model identifier
- Operation name
- Request mode: `sync`, `stream`, `provider_batch`
- Token usage (input, output, cached, reasoning, audio, image)
- Latency and status
- Error metadata
- Project / environment / route / feature / session context
- Safe request/response metadata (no content)
Tokenome does not:
- Capture prompt text by default
- Capture completion text by default
- Calculate billing cost in the SDK
- Monkey-patch provider modules
- Block provider calls on telemetry failure
Event Payload Shape
Batches are sent to POST /v1/events/batch.
Batch Envelope
{
"batch_id": "batch_abc123",
"sdk": {
"language": "python",
"version": "0.1.0"
},
"events": []
}
Per-Event Shape
{
"event_id": "evt_123",
"event_type": "request",
"provider": "openai",
"provider_sdk": "openai-python",
"provider_sdk_version": "2.41.1",
"model": "gpt-4.1-mini",
"operation": "responses.create",
"request_mode": "sync",
"request_started_at": "2026-05-03T00:00:00Z",
"response_completed_at": "2026-05-03T00:00:01Z",
"latency_ms": 812,
"input_tokens": 1200,
"output_tokens": 340,
"total_tokens": 1540,
"cached_input_tokens": 0,
"reasoning_tokens": 0,
"status": "success",
"cost_status": "final",
"route": "/api/chat",
"feature": "chatbot",
"user_id_hash": "<sha256>",
"session_id": "sess_456",
"tags": {
"environment": "prod"
}
}
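To show how the envelope and per-event shapes fit together, the following sketch wraps a list of event dicts into one batch body. `build_batch` is a hypothetical helper, and the `batch_` plus random hex ID format is an assumption based only on the example value above.

```python
import json
import uuid
from typing import Any, Dict, Iterable


def build_batch(events: Iterable[Dict[str, Any]], sdk_version: str = "0.1.0") -> Dict[str, Any]:
    """Wrap already-serializable event dicts in the batch envelope."""
    return {
        # Assumption: any unique string works as a batch ID.
        "batch_id": "batch_" + uuid.uuid4().hex[:12],
        "sdk": {"language": "python", "version": sdk_version},
        "events": list(events),
    }


event = {
    "event_id": "evt_123",
    "event_type": "request",
    "provider": "openai",
    "operation": "responses.create",
    "status": "success",
}
# The encoded body is what would be sent to POST /v1/events/batch.
body = json.dumps(build_batch([event])).encode("utf-8")
```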
Batching and Delivery Behavior
| Parameter | Default | Description |
|---|---|---|
| Durability | `durable_local` | SQLite spool or in-memory |
| Flush interval | `5s` | Time-based flush trigger |
| Max events per batch | `100` | Count-based flush trigger |
| Max payload size | `256 KiB` | Size-based split in sender |
| Queue max size | `10000` | Memory spool capacity |
| Request timeout | `1.5s` | HTTP POST timeout |
Delivery behavior:
- `durable_local` commits events to the local SQLite spool before the async send
- `best_effort` keeps events in memory only
- `tl.flush()` forces an immediate send attempt (bypasses backoff)
- `tl.close()` flushes and shuts down cleanly
- HTTP `429` respects the `Retry-After` header
- Transient network and `5xx` errors retry with exponential backoff
- Delivery is fail-open; the app path never crashes on telemetry failure
Context Helpers
from tokenome import clear_context, set_context, set_default_context
set_default_context(tags={"service": "gateway", "environment": "prod"})
set_context(
route="/api/chat",
feature="chatbot",
user_id="user_123",
session_id="sess_456",
tags={"request_id": "req_001"},
)
Resolution rules:
- Runtime context overrides default context for scalar fields
- Tags merge as `default_tags → wrapper tags → runtime context tags`
- If `user_id_hash` is absent but `user_id` exists, the SDK emits a SHA-256 hash of `user_id`
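The resolution rules can be sketched as a small pure function. `resolve_context` is a hypothetical name, not the SDK's API. The sketch assumes that later layers win when tag keys collide and that the hash is an unsalted SHA-256 of the UTF-8 `user_id`; neither detail is spelled out above.

```python
import hashlib
from typing import Any, Dict, Optional


def resolve_context(
    default_ctx: Dict[str, Any],
    wrapper_tags: Optional[Dict[str, str]],
    runtime_ctx: Dict[str, Any],
) -> Dict[str, Any]:
    # Scalars: start from the defaults, then let runtime values override them.
    resolved = {k: v for k, v in default_ctx.items() if k != "tags"}
    resolved.update(
        {k: v for k, v in runtime_ctx.items() if k != "tags" and v is not None}
    )

    # Tags: default_tags -> wrapper tags -> runtime context tags.
    # Assumption: on a key collision the later layer wins.
    tags: Dict[str, str] = {}
    for layer in (default_ctx.get("tags"), wrapper_tags, runtime_ctx.get("tags")):
        tags.update(layer or {})
    resolved["tags"] = tags

    # Derive user_id_hash only when it was not supplied explicitly.
    if "user_id_hash" not in resolved and resolved.get("user_id"):
        digest = hashlib.sha256(resolved["user_id"].encode("utf-8")).hexdigest()
        resolved["user_id_hash"] = digest
    return resolved


ctx = resolve_context(
    {"route": "/default", "tags": {"service": "gateway", "environment": "prod"}},
    {"tenant_id": "tenant_abc"},
    {"route": "/api/chat", "user_id": "user_123", "tags": {"request_id": "req_001"}},
)
```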
Architecture Deep Dive
Overview
The SDK is organized into four layers:
- Public API (`client.py`): `TokenLens` facade, env bootstrap, provider client creation
- Provider Wrappers (`providers/`): Thin transparent wrappers around OpenAI and Anthropic clients
- Event Model (`models.py`): `TelemetryEvent` dataclass, normalization, serialization
- Delivery Core (`delivery/`, `spool/`): Background worker, HTTP sender, SQLite spool
flowchart TB
subgraph L1["Public API (TokenLens)"]
A1["wrap_openai(), wrap()"]
A2["flush(), close()"]
A3["init_from_env()"]
end
subgraph L2["Provider Wrappers"]
B1["OpenAI: responses, chat, images,<br/>embeddings, batches, audio,<br/>moderations, threads, files,<br/>fine_tuning, uploads, vector_stores"]
B2["Anthropic: messages.create"]
end
subgraph L3["TelemetryEvent (models.py)"]
C1["Metadata extraction"]
C2["Usage normalization"]
C3["Safe metadata filtering"]
end
subgraph L4["Delivery Core"]
D1["Spool: SQLite / Memory"]
D2["Worker: lease → send → ack/release"]
D3["Sender: HTTP / batch split / retry"]
end
L1 --> L2
L2 --> L3
L3 --> L4
Event Lifecycle
An event flows through the SDK in six stages:
flowchart LR
A["Provider Call"] --> B["Wrap"]
B --> C["Extract"]
C --> D["Enqueue"]
D --> E["Spool"]
E --> F["Worker"]
F --> G["Sender"]
G --> H["Server"]
Stage 1: Provider Call
User calls client.responses.create(...). The wrapper intercepts the call before it reaches the provider SDK.
Stage 2: Wrap
The wrapper:
- Records `request_started_at`
- Calls the original provider method
- Records `response_completed_at`
- Computes `latency_ms`
- Extracts usage metadata from the response object
- Builds a `TelemetryEvent` with all metadata fields
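The wrap steps above can be sketched as a single function that times a provider call and hands a metadata dict to an `enqueue` callback. This is an illustrative sketch, not the SDK's wrapper: `call_with_telemetry` is a hypothetical name, and a plain dict stands in for `TelemetryEvent`.

```python
import time
from datetime import datetime, timezone
from typing import Any, Callable, Dict


def call_with_telemetry(
    method: Callable[..., Any],
    enqueue: Callable[[Dict[str, Any]], None],
    operation: str,
    **kwargs: Any,
) -> Any:
    started_at = datetime.now(timezone.utc)
    t0 = time.perf_counter()
    status = "success"
    response = None
    try:
        response = method(**kwargs)  # the original provider call, unchanged
        return response
    except BaseException:
        status = "error"
        raise  # same exception reaches the caller
    finally:
        try:
            usage = getattr(response, "usage", None)
            enqueue({
                "operation": operation,
                "model": kwargs.get("model"),
                "request_started_at": started_at.isoformat(),
                "response_completed_at": datetime.now(timezone.utc).isoformat(),
                "latency_ms": int((time.perf_counter() - t0) * 1000),
                "status": status,
                "input_tokens": getattr(usage, "input_tokens", None),
                "output_tokens": getattr(usage, "output_tokens", None),
            })
        except Exception:
            pass  # fail-open: telemetry problems never surface to the caller
```

Only metadata is read from the response; the prompt (`kwargs["input"]`) and the response body are never copied into the event.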
Stage 3: Extract
Usage extraction is provider-specific:
- OpenAI chat.completions: `response.usage.input_tokens`, `output_tokens`, `total_tokens`, `cached_input_tokens`, `reasoning_tokens`, etc.
- OpenAI batches: No `usage` field. Uses `request_counts.total` → `total_tokens`, `completed` → `input_tokens`, `failed` → `output_tokens` as a proxy.
- Anthropic messages: `response.usage.input_tokens`, `output_tokens`
- Operations without usage: `audio.*`, `files.*`, `uploads.*`, `vector_stores.*`, `moderations.create` return an empty `UsageSnapshot`
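A minimal sketch of this provider-specific extraction is shown below. `UsageSnapshot` is named in the text, but its fields here are assumed, and `extract_usage` is a hypothetical helper. Summing input and output tokens when the provider reports no total is also an assumption.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class UsageSnapshot:
    # Assumed subset of fields; the SDK's dataclass likely carries more.
    input_tokens: Optional[int] = None
    output_tokens: Optional[int] = None
    total_tokens: Optional[int] = None


def extract_usage(operation: str, response: Any) -> UsageSnapshot:
    if operation.startswith("batches."):
        # Batches expose request counts rather than token usage, so the
        # counts are mapped onto the token fields as a proxy.
        counts = getattr(response, "request_counts", None)
        if counts is None:
            return UsageSnapshot()
        return UsageSnapshot(
            input_tokens=counts.completed,
            output_tokens=counts.failed,
            total_tokens=counts.total,
        )

    usage = getattr(response, "usage", None)
    if usage is None:
        return UsageSnapshot()  # audio.*, files.*, uploads.*, and similar

    inp = getattr(usage, "input_tokens", None)
    out = getattr(usage, "output_tokens", None)
    total = getattr(usage, "total_tokens", None)
    if total is None and inp is not None and out is not None:
        total = inp + out  # assumption: derive the total when it is not reported
    return UsageSnapshot(inp, out, total)
```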
Stage 4: Enqueue
wrapper.py → client._state.enqueue(event) → spool.append(event)
- If `enabled=False`, the event is silently dropped
- If `spool.append()` raises, the exception is caught and swallowed (fail-open)
- On successful append, `worker.notify_event_available()` wakes the background thread
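The enqueue rules can be sketched as follows. `TelemetryState` is a hypothetical stand-in for whatever `client._state` is, and the boolean return value is an assumption added to make the behavior observable.

```python
from typing import Any, Dict


class TelemetryState:
    """Hypothetical stand-in for the object behind client._state."""

    def __init__(self, spool: Any, worker: Any, enabled: bool = True) -> None:
        self.spool = spool
        self.worker = worker
        self.enabled = enabled

    def enqueue(self, event: Dict[str, Any]) -> bool:
        if not self.enabled:
            return False  # silently dropped
        try:
            self.spool.append(event)
            # Wake the background thread only after a successful append.
            self.worker.notify_event_available()
        except Exception:
            return False  # fail-open: swallow spool or wake-up errors
        return True
```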
Stage 5: Spool
The spool is the durability boundary:
`durable_local` (`SQLiteEventSpool`):
- Serializes the event to JSON
- Inserts into the `tokenome_event_spool` table with `status='pending'`
- WAL mode (`journal_mode=WAL`, `synchronous=NORMAL`)
- Prunes expired events on every append
- Enforces `max_bytes` capacity with `drop_policy` (`drop_oldest`, `drop_newest`, `block`)
`best_effort` (`MemorySpool`):
- Stores the event in the `_pending` list
- Drops new events when `_max_size` is reached
- No persistence across process restarts
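For the `durable_local` path, a minimal sketch using the standard `sqlite3` module might look like this. Only the table name, the `status='pending'` value, and the two pragmas come from the text; the column layout, the age-based pruning rule, and the function names are assumptions, and capacity enforcement via `drop_policy` is omitted.

```python
import json
import sqlite3
import time
from typing import Any, Dict, Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS tokenome_event_spool (
    event_id        TEXT PRIMARY KEY,
    payload         TEXT NOT NULL,
    status          TEXT NOT NULL DEFAULT 'pending',
    attempt_count   INTEGER NOT NULL DEFAULT 0,
    next_attempt_at REAL NOT NULL DEFAULT 0,
    leased_at       REAL,
    created_at      REAL NOT NULL
)
"""


def open_spool(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path, check_same_thread=False)
    conn.execute("PRAGMA journal_mode=WAL")    # in-memory databases ignore WAL
    conn.execute("PRAGMA synchronous=NORMAL")
    conn.execute(SCHEMA)
    conn.commit()
    return conn


def append(
    conn: sqlite3.Connection,
    event: Dict[str, Any],
    max_age_seconds: float = 7 * 86400,  # assumed retention window
    now: Optional[float] = None,
) -> None:
    now = time.time() if now is None else now
    with conn:  # one transaction: prune expired rows, then insert
        conn.execute(
            "DELETE FROM tokenome_event_spool WHERE created_at < ?",
            (now - max_age_seconds,),
        )
        conn.execute(
            "INSERT INTO tokenome_event_spool (event_id, payload, status, created_at) "
            "VALUES (?, ?, 'pending', ?)",
            (event["event_id"], json.dumps(event), now),
        )
```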
Stage 6: Worker + Sender
The DeliveryWorker runs in a daemon thread named tokenome-delivery.
Worker Loop (_run()):
while not stopped:
    _loop_iteration()
Loop Iteration (_loop_iteration()):
- Compute sleep timeout: If the timer is running, sleep until its deadline; otherwise sleep indefinitely until woken.
- Wait on wake queue: `queue.Queue(maxsize=1)`. `notify_event_available()` puts a sentinel; manual `flush()` and `close()` also put sentinels.
- Lease batch: `spool.lease_batch(limit=100, flush_mode=...)`
  - `flush_mode=True` (manual flush): ignores `next_attempt_at` backoff
  - `flush_mode=False` (normal): only leases events whose `next_attempt_at <= now`
- Timer management: On the first event, start the interval timer (`5s` with 80-120% jitter). The timer does NOT reset on every batch.
- Early return: If the batch size is < 100, the timer hasn't fired, and this is not a force-flush, release the events back to the spool and sleep.
- Send: Call `sender.send(events)`.
- Handle result:
  - `acked` → `spool.mark_delivered()` (deletes from SQLite)
  - `retryable_ids` → `spool.release()` with exponential backoff (`delay = min(2^attempt_count, 300s)`)
  - `dropped` → `spool.drop()` (sets `status='failed'` or deletes)
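Two pieces of this loop are easy to show in isolation: the single-slot wake queue and the early-return decision. The sketch below uses hypothetical names (`WakeSignal`, `should_send`) and assumes a non-blocking put, so repeated notifications collapse into one pending wake-up.

```python
import queue
from typing import Optional


class WakeSignal:
    """Single-slot wake-up channel built on queue.Queue(maxsize=1)."""

    def __init__(self) -> None:
        self._q: "queue.Queue[None]" = queue.Queue(maxsize=1)

    def notify(self) -> None:
        try:
            self._q.put_nowait(None)  # sentinel
        except queue.Full:
            pass  # a wake-up is already pending; nothing more to do

    def wait(self, timeout: Optional[float] = None) -> bool:
        """Return True if woken, False if the timeout elapsed first."""
        try:
            self._q.get(timeout=timeout)
            return True
        except queue.Empty:
            return False


def should_send(batch_size: int, timer_fired: bool, force_flush: bool,
                max_batch: int = 100) -> bool:
    # Send on a full batch, an expired timer, or a manual flush;
    # otherwise the worker releases the lease and goes back to sleep.
    return force_flush or timer_fired or batch_size >= max_batch
```

With `maxsize=1`, a burst of appends costs the producer at most one successful put, and the worker wakes once to drain whatever is in the spool.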
Sender (HttpSender):
- Payload-size splitting: `_split_batches()` serializes each event and splits into sub-batches that fit within `256 KiB`. Single oversized events are sent anyway (the server returns `413` and the sender marks them as dropped).
- HTTP POST: `httpx.Client` with HTTP/2, `max_keepalive_connections=2`, `max_connections=4`, `timeout=1.5s`.
- Response handling:
  - `200/202` → acked
  - `429` → retryable (respects `Retry-After`)
  - `500/502/503/504` → retryable
  - `400/401/403/413` → dropped (fatal client error)
  - Network errors / timeout → retryable
- Composite result: If a batch was split into sub-batches with mixed results, `SendResult` combines them.
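The response-handling table can be sketched as a pure classification function. `classify_response` is a hypothetical name. How the sender treats status codes that are not listed is not stated, so dropping them here is an assumption, as is parsing only the delay-in-seconds form of `Retry-After`.

```python
from typing import Mapping, Optional, Tuple

ACKED = {200, 202}
RETRYABLE = {500, 502, 503, 504}
FATAL = {400, 401, 403, 413}


def classify_response(
    status_code: int, headers: Optional[Mapping[str, str]] = None
) -> Tuple[str, Optional[float]]:
    """Return (outcome, retry_after_seconds)."""
    headers = headers or {}
    if status_code in ACKED:
        return ("acked", None)
    if status_code == 429:
        raw = headers.get("Retry-After")
        try:
            delay = float(raw) if raw is not None else None
        except ValueError:
            delay = None  # HTTP-date form is not handled in this sketch
        return ("retryable", delay)
    if status_code in RETRYABLE:
        return ("retryable", None)
    if status_code in FATAL:
        return ("dropped", None)
    # Assumption: anything unlisted is treated as non-retryable.
    return ("dropped", None)
```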
Lease Lifecycle:
stateDiagram-v2
[*] --> pending
pending --> sending : lease
sending --> delivered : success
sending --> pending : retryable (backoff)
sending --> failed : fatal
failed --> [*]
delivered --> [*]
Crash Recovery (recover_inflight()):
On spool initialization, any status='sending' rows with leased_at IS NULL OR leased_at < now-5min are reset to pending. Fresh leases (within 5 minutes) are left as sending to avoid races with a still-running worker.
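The recovery rule translates directly into one SQL `UPDATE`. The sketch below assumes `leased_at` is stored as a Unix timestamp in a `REAL` column and that recovery also clears it; the real schema may differ.

```python
import sqlite3
import time
from typing import Optional

LEASE_TIMEOUT_SECONDS = 5 * 60


def recover_inflight(conn: sqlite3.Connection, now: Optional[float] = None) -> int:
    """Reset stale 'sending' rows to 'pending' and return how many were reset."""
    now = time.time() if now is None else now
    cutoff = now - LEASE_TIMEOUT_SECONDS
    with conn:
        cur = conn.execute(
            "UPDATE tokenome_event_spool "
            "SET status = 'pending', leased_at = NULL "
            "WHERE status = 'sending' AND (leased_at IS NULL OR leased_at < ?)",
            (cutoff,),
        )
    # Rows leased within the last five minutes are left alone: another
    # worker may still be sending them.
    return cur.rowcount
```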
Durability Modes
| Mode | Persistence | Crash Recovery | Use Case |
|---|---|---|---|
| `durable_local` | SQLite WAL spool | Yes: events survive process restart | Production default |
| `best_effort` | In-memory list | No: events lost on crash | Development, low-latency requirements |
| `agent` | Future: remote agent | Future | Not implemented |
Threading Model
sequenceDiagram
participant MT as Main Thread
participant BT as Background Thread (daemon)
MT->>BT: wrap() → enqueue()
MT->>BT: spool.append()
MT->>BT: wake.put(None)
BT->>BT: _wake.get()
BT->>BT: lease_batch()
BT->>BT: _send_batch()
MT->>BT: flush()
BT->>BT: force_flush=True
BT->>BT: lease_batch(flush_mode=True)
MT->>BT: close()
BT->>BT: _stop.set()
BT->>BT: final _do_flush()
BT->>BT: thread.join()
All spool operations are protected by a `threading.Lock`. The SQLite connection is created with `check_same_thread=False` so that both the main thread and the background thread can use it, while the lock serializes access.
Jitter and Thundering Herd Prevention
The flush interval (5s) is multiplied by 0.8 + random.random() * 0.4 on worker startup. This means each SDK instance flushes on a slightly different cadence, preventing synchronized spikes against the ingest server.
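The jitter formula is a one-liner; the sketch below wraps it in a hypothetical `jittered_interval` helper so the resulting range is easy to see.

```python
import random


def jittered_interval(base_seconds: float = 5.0) -> float:
    # random.random() is in [0.0, 1.0), so the factor falls in [0.8, 1.2)
    # and a 5s base yields an interval between 4s and 6s.
    return base_seconds * (0.8 + random.random() * 0.4)
```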
Payload Size Enforcement
Payload size is enforced at two levels:
- Sender split: `HttpSender._split_batches()` ensures each HTTP POST body is ≤ `256 KiB`, except for a single oversized event, which is sent on its own. This is the hard boundary.
- Worker batch size: `DeliveryWorker` leases up to `100` events. The worker does not enforce payload size; it relies on the sender to split oversized batches.
This separation keeps the worker simple (count-based only) while ensuring the sender never violates server limits.
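A greedy size-based split along these lines could look like the sketch below. It is not the SDK's `_split_batches()`: the function name is borrowed for readability, and the size accounting is simplified to the sum of the serialized events, ignoring the envelope and separators.

```python
import json
from typing import Any, Dict, List

MAX_PAYLOAD_BYTES = 256 * 1024


def split_batches(
    events: List[Dict[str, Any]], max_bytes: int = MAX_PAYLOAD_BYTES
) -> List[List[Dict[str, Any]]]:
    batches: List[List[Dict[str, Any]]] = []
    current: List[Dict[str, Any]] = []
    current_size = 0
    for event in events:
        size = len(json.dumps(event, separators=(",", ":")).encode("utf-8"))
        # Close the current sub-batch if this event would push it over the cap.
        if current and current_size + size > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        # A single oversized event still gets a sub-batch of its own.
        current.append(event)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

Event order is preserved, and every sub-batch except a lone oversized event stays within `max_bytes` under this simplified accounting.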
Retry and Backoff
Exponential backoff is computed in SQLiteEventSpool.release():
delay = min(2 ** attempt_count, 300) # cap at 5 minutes
next_attempt_at = now + delay
The worker's lease_batch() respects next_attempt_at unless flush_mode=True (manual flush bypasses backoff).
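To make the schedule concrete, the sketch below evaluates the delay for the first ten attempts and shows the lease check. `backoff_delay` and `is_leasable` are hypothetical helper names, and whether `attempt_count` starts at 0 or 1 on the first retry is an assumption.

```python
def backoff_delay(attempt_count: int, cap_seconds: int = 300) -> int:
    """Delay in seconds before the next attempt, capped at 5 minutes."""
    return min(2 ** attempt_count, cap_seconds)


def is_leasable(next_attempt_at: float, now: float, flush_mode: bool = False) -> bool:
    # A manual flush ignores the backoff window; normal leasing waits it out.
    return flush_mode or next_attempt_at <= now


schedule = [backoff_delay(n) for n in range(10)]
# [1, 2, 4, 8, 16, 32, 64, 128, 256, 300]
```

The cap takes effect at the ninth retry, where `2 ** 9 = 512` is clamped to 300 seconds.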
Fail-Open Guarantees
The SDK guarantees that telemetry failures never propagate to user code:
- Wrapper level: Exceptions during metadata extraction are caught; the original provider response is still returned.
- Enqueue level: `spool.append()` exceptions are caught and swallowed.
- Worker level: Loop iteration exceptions are caught, logged at `debug`, and the worker sleeps `0.1s` before retrying.
- Sender level: All HTTP/network exceptions are caught and converted to `retryable` or `dropped` results.
Configuration Boundaries
Batch size (100), flush interval (5s), and payload cap (256 KiB) are not user-configurable; they are part of the SDK/server contract. Server-side enforcement is the real trust boundary: the SDK hardcodes these values to cooperate with the server, not as a security control.
User-configurable parameters:
- `api_key`, `endpoint`, `project_id`, `environment`
- `enabled`, `debug`
- `timeout_seconds`, `queue_maxsize`
- `durability`, `spool_path`, `max_spool_bytes`, `max_spool_age_days`
- `drop_policy`, `default_tags`
Unsupported Patterns
Not supported in this SDK version:
- OpenAI `1.x`
- Old module-level `openai==0.28` APIs
- Monkey patch instrumentation
- Framework middleware as primary integration path
- Automatic prompt/content capture
- Arbitrary provider clients without supported wrapper shape