Caliper Python SDK

Auto-instrumentation SDK for LLM API observability.

The Caliper SDK auto-instruments the OpenAI and Anthropic SDKs for observability.

It captures token usage, latency, time to first token (TTFT), and arbitrary custom features from LLM SDK calls. Basic metrics require no code changes beyond a single init() call; attaching custom metrics to a call takes only one additional line.

Install

pip install caliper-sdk              # auto-detects installed provider SDKs
pip install caliper-sdk[s3]          # S3 export support

Quick start

import caliper
import anthropic

caliper.init(target="dev")  # writes to caliper_records.jsonl

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=128,
    messages=[{"role": "user", "content": "Hello"}],
)

caliper.shutdown()

Configuration

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| target | str | "dev" | Export handler: "dev" (local JSONL) or "s3" (S3 bucket) |
| s3_bucket | str \| None | None | Required for the "s3" target; also read from the CALIPER_S3_BUCKET env var |
| s3_access_key | str \| None | None | S3 access key (optional if using IAM roles) |
| s3_secret_key | str \| None | None | S3 secret key (optional if using IAM roles) |
| s3_region | str | "us-east-1" | S3 region |
| s3_prefix | str | "" | Key prefix for S3 objects |
| s3_endpoint | str \| None | None | Custom S3-compatible endpoint URL |
| flush_interval | float | 2.0 | Seconds between background flushes |
| batch_size | int | 250 | Max records per export call |
| max_queue_size | int | 10_000 | Backpressure limit; the oldest records are dropped when the queue is full |
| max_retries | int | 3 | Retry count for HTTP transport |
| file_path | str | caliper_records.jsonl | File path for "dev" target records |
| annotations_file_path | str \| None | None | File path for "dev" target annotations; defaults to caliper_annotations.jsonl |
| debug | bool | False | Verbose logging |
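The drop-oldest behavior described for max_queue_size can be modeled with a bounded deque. This is an illustrative sketch of the policy, not the SDK's actual queue implementation:

```python
from collections import deque

# A bounded deque stands in for the export queue; maxlen=3 here plays
# the role of max_queue_size=10_000.
queue = deque(maxlen=3)

for record_id in range(5):
    queue.append(record_id)  # when full, the deque silently drops the oldest

print(list(queue))  # the three newest records survive: [2, 3, 4]
```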

All parameters can also be set via environment variables with CALIPER_ prefix (e.g. CALIPER_S3_BUCKET).
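A plausible sketch of that env-var fallback is shown below; the resolve helper and the exact resolution order (explicit kwargs over environment over defaults) are assumptions for illustration, not the SDK's documented internals:

```python
import os

# Hypothetical resolution helper: explicit kwarg wins, then the
# CALIPER_-prefixed environment variable, then the default.
def resolve(name, explicit=None, default=None):
    if explicit is not None:
        return explicit
    return os.environ.get(f"CALIPER_{name.upper()}", default)

os.environ["CALIPER_S3_BUCKET"] = "my-telemetry-bucket"
print(resolve("s3_bucket"))                       # picked up from the environment
print(resolve("s3_region", default="us-east-1"))  # falls back to the default
```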

Metadata

Context-scoped (block)

with caliper.features(user_id="123", feature="chat"):
    client.messages.create(...)  # gets {user_id: "123", feature: "chat"}

Per-request (kwarg)

client.messages.create(
    ...,
    caliper_metadata={"campaign": "q4"},
)

Per-request metadata takes precedence over context metadata.
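Conceptually, that precedence rule behaves like a dict merge where per-request keys overwrite context keys. This is only an illustration of the rule, not the SDK's code:

```python
# Context-scoped metadata from caliper.features(...)
context_metadata = {"user_id": "123", "feature": "chat"}
# Per-request metadata from the caliper_metadata kwarg
request_metadata = {"feature": "chat-v2", "campaign": "q4"}

# Per-request keys win on conflict ("feature" is overwritten)
merged = {**context_metadata, **request_metadata}
print(merged)  # {'user_id': '123', 'feature': 'chat-v2', 'campaign': 'q4'}
```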

Linking requests

Link LLM calls to a prior request to track multi-turn conversations, retry chains, etc.

response = client.messages.create(...)
first_id = caliper.last_request_id()

with caliper.features(previous_request=first_id, feature="followup"):
    followup = client.messages.create(...)  # record gets linked_request_id=first_id

Post-request annotations

Attach metadata to requests after they complete — user feedback, classification labels, eval scores, etc.

# Get the ID of the most recent request
request_id = caliper.last_request_id()

# Annotate implicitly (uses last request)
caliper.annotate(sentiment="positive")

# Annotate explicitly by request ID
caliper.annotate(request_id, user_feedback="thumbs_up")

# Multiple annotations per request are allowed
caliper.annotate(request_id, reviewed_by="human")

Each annotation is keyed by the caliper-generated request_id for joining with the main request record.
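Downstream, that join can be done with a simple dict keyed by request_id. The helper below is a minimal sketch assuming lists of dicts shaped like the SDK's JSON output; the function name is hypothetical:

```python
# Attach each request's annotations (possibly several) to its record.
def join_annotations(records, annotations):
    by_id = {}
    for ann in annotations:
        by_id.setdefault(ann["request_id"], []).append(ann["metadata"])
    return [
        {**rec, "annotations": by_id.get(rec["request_id"], [])}
        for rec in records
    ]

records = [{"request_id": "abc", "model": "gpt-4o"}]
annotations = [
    {"request_id": "abc", "metadata": {"sentiment": "positive"}},
    {"request_id": "abc", "metadata": {"reviewed_by": "human"}},
]
print(join_annotations(records, annotations))
```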

Example output

Record JSON

Each flush writes a JSON array of records. The following is an example of what that output could look like:

[
  {
    "request_id": "a1b1c1d1-1e1f-1a1b-1c1d-e1f1a1b1c1d1",
    "timestamp": "2025-06-15T14:32:01.482319+00:00",
    "provider": "anthropic",
    "model": "claude-sonnet-4-20250514",
    "endpoint": "/v1/messages",
    "duration_ms": 1243,
    "tokens_input": 42,
    "tokens_output": 156,
    "tokens_total": 198,
    "status": "success",
    "cached": false,
    "sdk_version": "0.1.0",
    "sdk_language": "python 3.12.4",
    "ttft_ms": 312,
    "http_status": 200,
    "metadata": {"user_id": "123", "feature": "chat", "user_tier": "Premium"}
  },
  {
    "request_id": "a1b2c3d4-5e6f-7a8b-9c0d-e1f2a3b4c5d6",
    "timestamp": "2025-06-15T14:32:05.917442+00:00",
    "provider": "openai",
    "model": "gpt-4o",
    "endpoint": "/v1/chat/completions",
    "duration_ms": 892,
    "tokens_input": 1045,
    "tokens_output": 0,
    "tokens_total": 1045,
    "status": "error",
    "cached": false,
    "sdk_version": "0.1.0",
    "sdk_language": "python 3.12.4",
    "http_status": 429,
    "error_code": "rate_limit_exceeded",
    "metadata": {"user_id": "456", "feature": "dashboard assistant", "user_tier": "Enterprise"}
  }
]
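Records in this shape are easy to aggregate once exported. The sketch below (field names taken from the example above, function name hypothetical) computes per-provider token totals and error counts:

```python
from collections import defaultdict

# Roll up request records by provider: request count, tokens, errors.
def summarize(records):
    stats = defaultdict(lambda: {"requests": 0, "tokens_total": 0, "errors": 0})
    for rec in records:
        s = stats[rec["provider"]]
        s["requests"] += 1
        s["tokens_total"] += rec["tokens_total"]
        s["errors"] += rec["status"] == "error"
    return dict(stats)

records = [
    {"provider": "anthropic", "tokens_total": 198, "status": "success"},
    {"provider": "openai", "tokens_total": 1045, "status": "error"},
]
print(summarize(records))
```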

Annotation JSON

Annotations are flushed as a separate JSON array, joined to records by request_id.

[
  {
    "request_id": "a1b2c3d4-5e6f-7a8b-9c0d-e1f2a3b4c5d6",
    "timestamp": "2025-06-15T14:32:08.115200+00:00",
    "metadata": {"sentiment": "positive", "user_feedback": "thumbs_up"}
  },
  {
    "request_id": "a1b2c3d4-5e6f-7a8b-9c0d-e1f2a3b4c5d6",
    "timestamp": "2025-06-15T14:32:12.330100+00:00",
    "metadata": {"reviewed_by": "custom-post-request-guardrails", "review_output": "may contain pii"}
  }
]

Versioning and releases

The package version is derived from git tags at build time using hatch-vcs. There is no hardcoded version in pyproject.toml.

Releasing a new version:

git tag v0.2.0
git push origin v0.2.0

CI detects the tag, builds the package (uv build), and publishes to PyPI. The v prefix is stripped automatically — tag v0.2.0 produces package version 0.2.0.

During development (no tag on HEAD), editable installs get a dev version like 0.1.0.dev3+gabc1234 based on the last tag and number of commits since.

Checking the version:

import caliper
print(caliper.__version__)

Or from the command line:

uv run python -c "import caliper; print(caliper.__version__)"

Make targets

make install                   Install production dependencies
make install-dev               Install development dependencies
make lint                      Run ruff linter
make format                    Run ruff formatter
make test                      Run all tests
make test-sample P=10          Run a random P% of the tests
make up                        Start services with docker compose
make down                      Stop services with docker compose
make reload                    Rebuild and restart containers
