Agent analytics for Amplitude
amplitude-ai
Agent analytics for Amplitude. Track every LLM call, user message, tool call, and quality signal as events in your Amplitude project — then build funnels, cohorts, and retention charts across AI and product behavior.
pip install amplitude-ai
from amplitude import Amplitude
from amplitude_ai import AmplitudeAI, OpenAI  # or AsyncOpenAI for async frameworks
from fastapi import FastAPI, Request  # assumes a FastAPI app; adapt to your framework
app = FastAPI()
ai = AmplitudeAI(amplitude=Amplitude("YOUR_API_KEY"))
openai = OpenAI(amplitude=ai, api_key="sk-...")
agent = ai.agent("my-agent")
@app.post("/chat")
async def chat(request: Request):
    data = await request.json()
    with agent.session(user_id=data["user_id"], session_id=data["session_id"]) as session:
        session.track_user_message(data["message"])
        response = openai.chat.completions.create(
            model="gpt-4o", messages=data["messages"],
        )
    return {"content": response.choices[0].message.content}
# Events: [Agent] User Message, [Agent] AI Response (with model, tokens, cost, latency),
# [Agent] Session End -- all tied to user_id and session_id
How to Get Started
Instrument with a coding agent (recommended)
pip install amplitude-ai
amplitude-ai
The CLI prints a prompt to paste into any AI coding agent (Cursor, Claude Code, Windsurf, Copilot, Codex, etc.):
Instrument this app with amplitude-ai. Follow <path-to-site-packages>/amplitude-ai.md
The agent reads the guide, scans your project, discovers your agents and LLM call sites, and instruments everything — provider wrappers, session lifecycle, multi-agent delegation, tool tracking, scoring, and a verification test. You review and approve each step.
Manual setup
Whether you use a coding agent or set up manually, the goal is the same: full instrumentation — agents + sessions + provider wrappers. This gives you every event type, per-user analytics, and server-side enrichment.
Follow the code example above to get started. The pattern is:
- Swap your LLM import: from amplitude_ai import OpenAI (or AsyncOpenAI, Anthropic, etc.)
- Create an agent: ai.agent("my-agent") to name and track your AI component
- Wrap in a session: agent.session(user_id=..., session_id=...) for per-user analytics, funnels, cohorts, and server-side enrichment
- Track user messages: session.track_user_message(...) for conversation context
- Score responses: session.score(...) for quality measurement
patch() exists for quick verification or legacy codebases where you can't modify call sites, but it only captures [Agent] AI Response without user identity: no funnels, no cohorts, no retention. Start with full instrumentation; fall back to patch() only if you can't modify call sites.
Table of Contents
- How to Get Started
- Quickstart (5 minutes)
- Current Limitations
- Is this for me?
- Why this SDK?
- What you can build
- Integration Approaches
- Multi-service / Distributed Tracing
- Developer Experience
- Model Tier Auto-Inference
- Semantic Cache Tracking
- FastAPI / Starlette Middleware
- Get Started
- Core Concepts
- Integration Patterns
- Going Deeper
- Event Schema
- Event Property Reference
- Sending Events Without the SDK
- Register Event Schema in Your Data Catalog
- Content Storage ($llm_message)
- Troubleshooting
- For AI Coding Agents
- Reporting Issues
Quickstart (5 minutes)
- Install: pip install amplitude-ai
- Get your API key: In Amplitude, go to Settings > Projects and copy the API key.
- Instrument: Run amplitude-ai and paste the printed prompt into your AI coding agent. Or follow the manual setup steps; the goal is the same: agents + sessions + provider wrappers.
- Set your API key in the generated .env file and replace the placeholder user_id/session_id.
- Run your app. You should see [Agent] User Message, [Agent] AI Response, and [Agent] Session End within 30 seconds.
To verify locally before checking Amplitude, add debug=True:
ai = AmplitudeAI(amplitude=Amplitude("YOUR_API_KEY"), config=AIConfig(debug=True))
# Prints: [amplitude-ai] [Agent] AI Response | model=gpt-4o | tokens=847 | cost=$0.0042 | latency=1,203ms
What full instrumentation gives you
Full instrumentation means agents + sessions + provider wrappers. This is the recommended setup for both coding agent and manual workflows.
| What you set | What you unlock |
|---|---|
| Provider wrapper (from amplitude_ai import OpenAI) | Model, tokens, cost, latency, TTFB, auto-captured per call |
| + user_id | Per-user funnels, cohorts, retention |
| + session_id (via agent.session(...)) | Session grouping, server-side enrichment, quality scoring, behavioral patterns |
| + score() calls | Explicit quality signals alongside automated evals |
Adding user_id is one parameter per call. Adding session context is two lines. See User Identity and Integration Approaches.
Tip: Call enable_live_price_updates() at startup so cost tracking stays accurate when new models are released. See Cache-Aware Cost Calculation.
Current Limitations
| Area | Status |
|---|---|
| Language support | Python only. JS/TS SDK is the next major investment (no public ETA yet). |
| Zero-code patching | OpenAI, Anthropic, Azure OpenAI, Gemini, Mistral. Bedrock: use wrap() or swap import. CLI wrapper available for env-var-only setup. |
| Proxy/gateway instrumentation | Use the OTEL bridge for proxy setups (LiteLLM, Portkey, custom gateways). See Path B. |
| Streaming cost tracking | Automatic for OpenAI and Anthropic. Manual token counts for other providers' streamed responses. |
Is this for me?
Yes, if you're building an AI-powered feature (chatbot, copilot, agent, RAG pipeline) and you want to measure how it impacts real user behavior. AI events land in the same Amplitude project as your product events, so you can build funnels from "user asks a question" to "user converts," create cohorts of users with low AI quality scores, and measure retention without stitching data across tools.
Already using an LLM observability tool? Keep it. The OTEL bridge adds Amplitude as a second destination in one line. Your existing traces stay, and you get product analytics on top.
This SDK is for teams who want AI session review, automated enrichment, and business impact measurement in the same place they measure product behavior. The quickstart takes under 5 minutes.
Why this SDK?
Most AI observability tools give you traces. This SDK gives you per-turn events that live in your product analytics so you can build funnels from "user opens chat" through "AI responds" to "user converts," create cohorts of users with low AI quality scores and measure their 7-day retention, and answer "is this AI feature helping or hurting?" without moving data between tools.
The structural difference is the event model. Trace-centric tools typically produce spans per LLM call. This SDK produces one event per conversation turn with 40+ properties: model, tokens, cost, latency, reasoning, implicit feedback signals (regeneration, copy, abandonment), cache breakdowns, agent hierarchy, and experiment context. Each event is independently queryable in Amplitude's charts, cohorts, funnels, and retention analysis.
Every AI event carries your product user_id. No separate identity system, no data joining required. Build a funnel from "user opens chat" to "AI responds" to "user upgrades" directly in Amplitude, using the same user properties and cohort definitions you already have.
Server-side enrichment does the evals for you. When content is available (content_mode="full"), Amplitude's enrichment pipeline runs automatically on every session after it closes. You get topic classifications, quality rubrics, behavioral flags, and session outcomes without writing or maintaining any eval code. Define your own topics and scoring rubrics; the pipeline applies them to every session automatically. Results appear as [Agent] Score events with rubric scores, [Agent] Topic Classification events with category labels, and [Agent] Session Evaluation summaries, all queryable in charts, cohorts, and funnels alongside your product events.
Quality signals from every source in one event type. User thumbs up/down (source="user"), automated rubric scores from the enrichment pipeline (source="ai"), and reviewer assessments (source="reviewer") all produce [Agent] Score events differentiated by [Agent] Evaluation Source. One chart shows all three side by side. Filter by source or view them together. Filter by [Agent] Agent ID for per-agent quality attribution.
Three content-control tiers. full sends content and Amplitude runs enrichments for you. metadata_only sends zero content (you still get cost, latency, tokens, session grouping, and everything that doesn't require text). customer_enriched sends zero content but lets you provide your own structured labels via track_session_enrichment() for the same analytics value with full data control. See Privacy & Content Control for what each tier enables.
Cache-aware cost tracking. Pass cache_read_tokens and cache_creation_tokens for accurate blended costs. With Anthropic's prompt caching, naive tokens x price overestimates by 2-5x on multi-turn sessions. The SDK uses cache-aware pricing automatically via genai-prices when you provide the token breakdown. Supported for OpenAI, Anthropic, Gemini, Azure OpenAI, and AWS Bedrock.
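To make the overestimate concrete, here is a small arithmetic sketch. It does not use the SDK or genai-prices. The base price and the cache multipliers are illustrative assumptions, and both function names are hypothetical.

```python
# Illustrative prices only; real prices vary by model and change over time.
BASE_INPUT_PRICE = 3.00 / 1_000_000  # USD per uncached input token (assumed)
CACHE_READ_MULTIPLIER = 0.10         # cache reads billed at 10% of base (assumed)
CACHE_WRITE_MULTIPLIER = 1.25        # cache writes billed at 125% of base (assumed)


def naive_input_cost(uncached, cache_read, cache_creation):
    """Price every prompt token at the base rate, ignoring the cache breakdown."""
    return (uncached + cache_read + cache_creation) * BASE_INPUT_PRICE


def cache_aware_input_cost(uncached, cache_read, cache_creation):
    """Price each token class at its own rate."""
    return (
        uncached * BASE_INPUT_PRICE
        + cache_read * BASE_INPUT_PRICE * CACHE_READ_MULTIPLIER
        + cache_creation * BASE_INPUT_PRICE * CACHE_WRITE_MULTIPLIER
    )


# A later turn in a long session: most of the prompt is served from cache.
naive = naive_input_cost(uncached=2_000, cache_read=8_000, cache_creation=0)
aware = cache_aware_input_cost(uncached=2_000, cache_read=8_000, cache_creation=0)
overestimate = naive / aware
print(f"naive=${naive:.4f} cache-aware=${aware:.4f} overestimate={overestimate:.1f}x")
```

Under these assumed numbers the naive figure is roughly 3.6 times the cache-aware one, which falls inside the 2-5x range quoted above.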
Works alongside your existing LLM tools. Add the OTEL GenAI exporter to your pipeline to send spans to Amplitude alongside Langfuse, OpenLIT, or other destinations with no changes to your existing instrumentation code. Use LangChain, LlamaIndex, OpenAI Agents SDK, Anthropic tool_use loop, or CrewAI integrations for framework-level tracking. Or swap in provider wrappers (OpenAI, Anthropic, Gemini, Azure, Bedrock, Mistral) for the richest field coverage.
Multi-agent and multi-tenant from day one. ai.agent() creates a bound handle that carries agent_id, description, agent_version, env, and optional multi-tenant fields so you never repeat them. ai.tenant() pre-fills customer_org_id and groups for platforms serving multiple customers. agent.child() auto-sets parent_agent_id and inherits agent_version. agent.session() manages lifecycle automatically and propagates context to provider wrappers and the OTEL bridge via Python's contextvars.
What you can build
Once AI events are in Amplitude alongside your product events:
Cohorts. "Users who had 3+ task failures in the last 30 days." "Users with low task completion scores." Target them with Guides, measure churn impact.
Funnels. "AI session about charts -> Chart Created." "Sign Up -> First AI Session -> Conversion." Measure whether AI drives feature adoption and onboarding.
Retention. Do users with successful AI sessions retain better than those with failures? Segment retention curves by [Agent] Overall Outcome or task completion score.
Agent analytics. Compare quality, cost, and failure rate across agents in one chart. Identify which agent in a multi-agent chain introduced a failure.
How quality measurement works
The SDK captures quality signals at three layers, from most direct to most comprehensive:
1. Explicit user feedback — Instrument thumbs up/down, star ratings, or CSAT scores via score(). Each call produces an [Agent] Score event with source="user":
ai.score(user_id="u1", name="user-feedback", value=1,
         target_id=ai_msg_id, target_type="message", source="user")
2. Implicit behavioral signals — The SDK auto-tracks behavioral proxies for quality on every turn, with zero additional instrumentation:
| Signal | Property | Event | Interpretation |
|---|---|---|---|
| Copy | [Agent] Was Copied | [Agent] AI Response | User copied the output (positive) |
| Regeneration | [Agent] Is Regeneration | [Agent] User Message | User asked for a redo (negative) |
| Edit | [Agent] Is Edit | [Agent] User Message | User refined their prompt (friction) |
| Abandonment | [Agent] Abandonment Turn | [Agent] Session End | User left after N turns (potential failure) |
3. Automated server-side evaluation — When content_mode="full", Amplitude's enrichment pipeline runs LLM-as-judge evaluators on every session after it closes. No eval code to write or maintain:
| Rubric | What it measures | Scale |
|---|---|---|
| task_completion | Did the agent accomplish what the user asked? | 0–2 |
| response_quality | Was the response clear, accurate, and helpful? | 0–2 |
| user_satisfaction | Did the user seem satisfied based on conversation signals? | 0–2 |
| agent_confusion | Did the agent misunderstand or go off track? | 0–2 |
Plus boolean detectors: negative_feedback (frustration phrases), task_failure (agent failed to deliver), data_quality_issues, and behavioral_patterns (clarification loops, topic drift). All results are emitted as [Agent] Score events with source="ai".
All three layers use the same [Agent] Score event type, differentiated by [Agent] Evaluation Source ("user", "ai", or "reviewer"). One chart shows user feedback alongside automated evals. No joins, no separate tables.
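As a rough illustration of why a single event type helps, the sketch below averages score values per evaluation source over a handful of made-up events. The [Agent] Evaluation Source key comes from the text above. The "name" and "value" keys, the sample data, and the helper function are assumptions for this example only, and in practice this grouping would be done in an Amplitude chart rather than in code.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical exported [Agent] Score events; "name" and "value" are assumed key names.
score_events = [
    {"[Agent] Evaluation Source": "user", "name": "user-feedback", "value": 1},
    {"[Agent] Evaluation Source": "user", "name": "user-feedback", "value": 0},
    {"[Agent] Evaluation Source": "ai", "name": "task_completion", "value": 2},
    {"[Agent] Evaluation Source": "ai", "name": "task_completion", "value": 1},
    {"[Agent] Evaluation Source": "reviewer", "name": "accuracy", "value": 1},
]


def mean_score_by_source(events):
    """Group score values by evaluation source and average each group."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event["[Agent] Evaluation Source"]].append(event["value"])
    return {source: mean(values) for source, values in grouped.items()}


averages = mean_score_by_source(score_events)
print(averages)
```

Because every source writes the same event shape, one pass over one list is enough. Note that the sources may use different scales (thumbs are 0/1, rubrics are 0–2), so compare trends within a source rather than raw averages across sources.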
What You Set vs What You Get
| You set | Where it comes from | What you unlock |
|---|---|---|
| API key | Amplitude project settings | Events reach Amplitude |
| user_id | Your auth layer (JWT, session cookie, API token) | Per-user analytics, cohorts, retention |
| agent_id | Your choice (e.g. 'chat-handler') | Per-agent cost, latency, quality dashboards |
| session_id | Your conversation/thread/ticket ID | Multi-turn analysis, session enrichment, quality scores |
| description | Your choice (e.g. 'Handles support queries via GPT-4o') | Human-readable agent registry from event streams |
| content_mode + redact_pii | Config (defaults work) | Server enrichment (automatic), PII scrubbing |
| model, tokens, cost | Auto-captured by provider wrappers | Cost analytics, latency monitoring |
| parent_agent_id | Auto via child()/run_as() | Multi-agent hierarchy |
| env, agent_version, context | Your deploy pipeline | Segmentation, regression detection |
Italicized rows require zero developer effort — they're automatic or have sensible defaults.
The minimum viable setup is 4 fields: API key, user_id, agent_id, session_id. Everything else is either automatic or a progressive enhancement.
Integration Approaches
Start with full instrumentation. Use agents + sessions + provider wrappers. This is the recommended approach for both coding agent and manual workflows — it gives you every event type, per-user analytics, and server-side enrichment.
| Approach | When to use | What you get |
|---|---|---|
| Full control (recommended) | Any project, new or existing | BoundAgent + Session + provider wrappers — all event types, per-user funnels, cohorts, retention, quality scoring, enrichments |
| FastAPI middleware | Web app, auto-session per request | Same as full control with automatic session lifecycle via AmplitudeAIMiddleware |
| Swap import | Existing codebase, incremental adoption | from amplitude_ai import OpenAI — auto-tracking per call, add sessions when ready |
| Wrap | You've already created a client | wrap(client, amplitude=amp) — instruments an existing client instance |
| Managed / hosted agents | Anthropic Managed Agents, OpenAI Assistants, agent-as-a-service | Manual track_user_message + track_ai_message + track_tool_call with tokens/cost from the API response, or ManagedAgentTracker adapter |
| Claude Agent SDK hooks | Claude Agent SDK (claude-code, claude-agent) | ClaudeAgentSDKTracker: PreToolUse/PostToolUse hooks for real tool latency + message stream processing |
| Zero-code / patch() | Verification or legacy codebases only | amplitude_ai.patch(amplitude=amp): [Agent] AI Response + auto-extracted [Agent] Tool Call, no user identity, no funnels |
The first four approaches all support the full event model. Choose based on how you want to integrate — the analytics capabilities are the same.
patch() is the exception: it only captures aggregate [Agent] AI Response events without user identity, useful only for verifying the SDK works or for codebases where you can't modify call sites.
New in v1.4.0: patch() now automatically extracts [Agent] Tool Call events from LLM message arrays — no manual track_tool_call() needed for basic tool tracking. It scans conversation messages for tool calls and their results across OpenAI Chat Completions (tool_calls arrays), OpenAI Responses API (function_call / function_call_output), and Anthropic Messages (tool_use / tool_result blocks). Extracted tool calls have latency_ms=0 since execution timing isn't available through message inspection; use the @tool decorator or ClaudeAgentSDKTracker hooks for real latency.
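The message-inspection idea can be sketched for the OpenAI Chat Completions format, where an assistant message carries a tool_calls array and each result arrives as a role="tool" message with a matching tool_call_id. This is a simplified illustration, not the SDK's extraction code. The function name and the output keys are hypothetical.

```python
import json


def extract_tool_calls(messages):
    """Pair each assistant tool call with its tool result, matched by call id."""
    # Index tool results by the id of the call they answer.
    results = {
        m["tool_call_id"]: m.get("content")
        for m in messages
        if m.get("role") == "tool"
    }
    extracted = []
    for message in messages:
        if message.get("role") != "assistant":
            continue
        for call in message.get("tool_calls") or []:
            function = call["function"]
            extracted.append({
                "tool_name": function["name"],
                "input": json.loads(function["arguments"]),
                "output": results.get(call["id"]),
                # Message inspection cannot observe execution time.
                "latency_ms": 0,
            })
    return extracted


messages = [
    {"role": "user", "content": "Weather in Paris?"},
    {"role": "assistant", "content": None, "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'}},
    ]},
    {"role": "tool", "tool_call_id": "call_1", "content": "18C, cloudy"},
]
tool_calls = extract_tool_calls(messages)
print(tool_calls)
```

The sketch shows why latency is reported as zero: the message array records what was called and what came back, but not when.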
Zero-code patches provider modules so existing calls are tracked without code changes:
import amplitude_ai
amplitude_ai.patch(amplitude=amp)
# All subsequent openai, anthropic, gemini, mistral calls are instrumented
amplitude_ai.unpatch() # Restore all originals -- critical for test isolation
patch() auto-detects installed providers and returns a list of what it patched (e.g. ["openai", "anthropic", "gemini"]). If you only want to patch a specific provider, use the per-provider functions:
amplitude_ai.patch_openai(amplitude=amp)
amplitude_ai.patch_async_openai(amplitude=amp)
amplitude_ai.patch_anthropic(amplitude=amp)
amplitude_ai.patch_gemini(amplitude=amp)
amplitude_ai.patch_mistral(amplitude=amp)
amplitude_ai.patch_azure_openai(amplitude=amp)
Declaring expected providers (optional). You can declare which providers your application expects to use and have the SDK log a one-time warning if the set it actually patches differs:
amplitude_ai.patch(
    amplitude=amp,
    expected_providers=["openai"],  # declared set
    app_key="support-bot",          # optional, used to dedupe warnings
)
# If the code ends up importing Anthropic too, a single warning is logged:
#   amplitude-ai: application 'support-bot'; declared providers ['openai']
#   do not match providers patched at runtime ['anthropic', 'openai'];
#   unexpected: ['anthropic']. Events will still be emitted; ...
This is warn-only — it never throws, never blocks patch(), and never
interferes with event emission. Useful for catching drift between your
declared configuration and what your code actually imports.
Zero-code patching is available for OpenAI (sync and async), Anthropic, Azure OpenAI, Gemini, and Mistral. For Bedrock, use the Swap import provider class directly (from amplitude_ai.providers.bedrock import Bedrock) because the boto3.client() factory pattern doesn't support clean monkey-patching.
amplitude-ai-instrument — run any Python application with automatic LLM instrumentation, using only environment variables and no source code changes:
pip install amplitude-ai
AMPLITUDE_AI_API_KEY=xxx AMPLITUDE_AI_AUTO_PATCH=true amplitude-ai-instrument python app.py
amplitude-ai-instrument initializes the Amplitude SDK, auto-detects installed LLM providers (OpenAI, Anthropic, Gemini, Mistral, Azure OpenAI), patches them via patch(), then executes your command. All LLM calls in your app are instrumented automatically. This is equivalent to adding amplitude_ai.patch(amplitude=amp) at the top of your code, but without modifying any source files.
Limitation: Because amplitude-ai-instrument uses patch() under the hood, it only captures [Agent] AI Response events without user identity or session context. Use it for quick verification, demos, or legacy codebases. For per-user analytics, funnels, and enrichments, use full instrumentation instead.
| Variable | Description |
|---|---|
| AMPLITUDE_AI_API_KEY | (required) Amplitude API key |
| AMPLITUDE_AI_AUTO_PATCH | Must be "true" to enable patching |
| AMPLITUDE_AI_CONTENT_MODE | "full" (default), "metadata_only", or "customer_enriched" |
| AMPLITUDE_AI_DEBUG | "true" for colored event summaries on stderr |
Doctor CLI:
Validate setup (env, provider deps, mock event capture, mock flush path):
amplitude-ai-doctor
Useful flags:
amplitude-ai-doctor --no-mock-check
Status CLI:
Print the SDK's current state (version, patched providers, active sessions):
amplitude-ai-status
MCP server:
Run the SDK-local MCP server over stdio:
amplitude-ai-mcp
MCP surface:
| Tool | Description |
|---|---|
| get_event_schema | Return the full event schema and property definitions |
| get_integration_pattern | Return canonical instrumentation code patterns |
| validate_setup | Check env vars and dependency presence |
| suggest_instrumentation | Context-aware next steps based on your framework and provider |
| validate_file | Analyze source code to detect uninstrumented LLM call sites |
Resources: amplitude-ai://event-schema, amplitude-ai://integration-patterns
Prompt: instrument_app — guided walkthrough for instrumenting an application
Examples and AI coding agent guide:
- Mock-based examples demonstrating the event model (also used as CI smoke tests):
  - examples/zero_code_example.py
  - examples/wrap_openai_example.py
  - examples/multi_agent_example.py
  - examples/framework_integration_example.py
- Real provider examples (require API keys):
  - examples/real_openai_example.py: end-to-end OpenAI integration with session tracking and flush
  - examples/real_anthropic_example.py: end-to-end Anthropic integration with session tracking and flush
- AI coding agent guide:
  - amplitude-ai.md: self-contained 4-phase instrumentation guide for any AI coding agent
Wrap instruments a client you've already created (OpenAI, Anthropic, Azure OpenAI):
from openai import OpenAI
client = OpenAI(api_key="sk-...")
wrapped = amplitude_ai.wrap(client, amplitude=amp, user_id="u1")
# wrapped is a real amplitude_ai.OpenAI instance
Move to Full control when you need multi-agent hierarchy, custom scoring, or session lifecycle management.
Multi-service / Distributed Tracing
If your LLM pipeline spans multiple services (e.g., an orchestrator calling a retrieval service that calls an LLM), enable context propagation so sessions link across service boundaries:
from amplitude_ai import AmplitudeAI
from amplitude_ai.config import AIConfig
ai = AmplitudeAI(
    amplitude=amplitude,
    config=AIConfig(propagate_context=True),
)
When enabled, provider wrappers inject W3C traceparent and x-amplitude-session-id headers on outgoing LLM calls. Downstream services running the SDK (or the AmplitudeAIMiddleware) automatically pick up this context, linking the sessions into a single distributed trace.
You can also inject/extract context manually for non-LLM HTTP calls:
from amplitude_ai.propagation import inject_context, extract_context
# Sender: inject context into outgoing headers
headers = inject_context(existing_headers)
requests.post("https://downstream-service/api", headers=headers)
# Receiver: extract context from incoming headers
ctx = extract_context(request.headers)
# ctx = {"trace_id": "...", "session_id": "...", "agent_id": "..."}
Context propagation is opt-in (default False). Injecting extra headers into LLM API calls is harmless for most providers, which ignore unknown headers, but some proxies or custom endpoints may reject them.
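For readers unfamiliar with the header format, the sketch below builds and parses a W3C traceparent value (version, 32-hex trace ID, 16-hex parent ID, 2-hex flags) together with the x-amplitude-session-id header named above. It uses only the standard library. The helper names are hypothetical and are not the SDK's inject_context / extract_context functions.

```python
import re
import secrets

# version "00", 32-hex trace id, 16-hex parent (span) id, 2-hex flags
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def inject_headers(headers, session_id, trace_id=None):
    """Return a copy of headers with trace and session context added."""
    trace_id = trace_id or secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)                # 16 hex characters
    out = dict(headers)
    out["traceparent"] = f"00-{trace_id}-{span_id}-01"
    out["x-amplitude-session-id"] = session_id
    return out


def extract_headers(headers):
    """Parse context from incoming headers; None if traceparent is absent or malformed."""
    match = TRACEPARENT_RE.match(headers.get("traceparent", ""))
    if match is None:
        return None
    return {
        "trace_id": match.group(1),
        "session_id": headers.get("x-amplitude-session-id"),
    }


outgoing = inject_headers({"Content-Type": "application/json"}, session_id="conv-123")
ctx = extract_headers(outgoing)
```

This sketch looks headers up in a plain dict, so key matching is case-sensitive. Real HTTP frameworks expose case-insensitive header mappings.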
Developer Experience
Enable debug mode to see every tracked event in your terminal. Set it on AIConfig (or pass debug=True to patch() for the zero-code fallback):
ai = AmplitudeAI(amplitude=amplitude, config=AIConfig(debug=True))
# [amplitude-ai] [Agent] AI Response | user=u1 | session=a3f8... | model=gpt-4o | tokens=1,247 | cost=$0.0089 | latency=1,203ms
Use dry-run mode in CI to validate events without sending them:
ai = AmplitudeAI(api_key="unused", config=AIConfig(dry_run=True))
Enable strict validation to catch bad inputs early (empty user_id, negative latency_ms, non-numeric scores):
ai = AmplitudeAI(api_key="...", config=AIConfig(validate=True))
# Raises ValidationError on bad inputs instead of silently continuing
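The three bad inputs named above (empty user_id, negative latency_ms, non-numeric scores) can be illustrated with a standalone check. This is a sketch of the idea only: the ValidationError class here is a local stand-in, the function name is hypothetical, and the SDK's actual rules and messages may differ.

```python
from numbers import Real


class ValidationError(ValueError):
    """Local stand-in for the SDK's error type so the sketch runs on its own."""


def validate_event(user_id, latency_ms=None, score=None):
    """Raise ValidationError on the three kinds of bad input named above."""
    if not isinstance(user_id, str) or not user_id.strip():
        raise ValidationError("user_id must be a non-empty string")
    if latency_ms is not None and latency_ms < 0:
        raise ValidationError("latency_ms must not be negative")
    if score is not None and not isinstance(score, Real):
        raise ValidationError("score must be numeric")


validate_event("user-1", latency_ms=1203, score=1.0)  # valid input passes silently

# Collect the error raised by each kind of bad input.
errors = []
bad_inputs = (
    {"user_id": ""},
    {"user_id": "u1", "latency_ms": -5},
    {"user_id": "u1", "score": "high"},
)
for bad in bad_inputs:
    try:
        validate_event(**bad)
    except ValidationError as exc:
        errors.append(str(exc))
```

Failing fast like this is most useful in CI, where a raised error stops the run before malformed events are recorded.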
Combine all three for the strictest CI configuration:
ai = AmplitudeAI(api_key="unused", config=AIConfig(debug=True, dry_run=True, validate=True))
Inspect current configuration at any time:
ai.status()
# {"content_mode": "full", "debug": False, "dry_run": False,
# "redact_pii": True, "providers_available": ["openai", "anthropic"],
# "patched_providers": ["openai"]}
Model Tier Auto-Inference
Every [Agent] AI Response event automatically includes a [Agent] Model Tier property ("fast", "standard", or "reasoning") inferred from the model name. This enables cost optimization insights like "70% of simple sessions use your most expensive model."
Override when the auto-inference is wrong:
ai.track_ai_message(..., model_tier="reasoning")
Coverage at launch: GPT-4o-mini/Haiku/Flash = fast, GPT-4o/Sonnet/Pro = standard, o1/o3/DeepSeek-R1 = reasoning.
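Name-based inference of this kind can be approximated with ordered substring rules. The sketch below encodes only the coverage listed above and is an assumption about how such a mapping might work, not the SDK's actual rule set. The function name is hypothetical, and unrecognized names return None here.

```python
def infer_model_tier(model):
    """Map a model name to "fast", "standard", or "reasoning"; None if unrecognized."""
    name = model.lower()
    # Reasoning families are checked first so a name like "o1-pro" is not
    # caught by the "-pro" rule further down.
    if name.startswith(("o1", "o3")) or "deepseek-r1" in name:
        return "reasoning"
    # "gpt-4o-mini" must be tested before "gpt-4o", which it contains.
    if any(marker in name for marker in ("gpt-4o-mini", "haiku", "flash")):
        return "fast"
    if any(marker in name for marker in ("gpt-4o", "sonnet", "-pro")):
        return "standard"
    return None


tiers = {
    model: infer_model_tier(model)
    for model in ("gpt-4o-mini", "gpt-4o", "claude-3-5-sonnet-20241022",
                  "gemini-1.5-flash", "o1-preview", "deepseek-r1")
}
print(tiers)
```

Rule order matters in any substring-based scheme, which is one reason an explicit model_tier override is useful when a new model name does not fit the existing patterns.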
Semantic Cache Tracking
Track full-response semantic cache hits (distinct from token-level prompt caching):
ai.track_ai_message(..., was_cached=True) # Served from Redis/semantic cache
Maps to [Agent] Was Cached. Enables "cache hit rate" charts and cost optimization analysis.
FastAPI / Starlette Middleware
Auto-create sessions per HTTP request with context propagation to all SDK calls within the handler:
from amplitude_ai.middleware import AmplitudeAIMiddleware
app.add_middleware(
    AmplitudeAIMiddleware,
    amplitude_ai=ai,
    user_id_resolver=lambda request: request.state.user.id,
)
Provider wrappers and @tool calls within the request handler automatically inherit the session context. No manual session_id passing needed.
Get Started
Both paths below lead to the same outcome: full instrumentation — agents + sessions + provider wrappers — giving you every event type, per-user funnels, cohorts, retention, and server-side enrichment.
Path A: You use Amplitude for product analytics
You already have amplitude-analytics sending product events. Now you're adding AI features and want those events in the same project.
Step 1: Install and create your agent
pip install "amplitude-ai[openai]" # or [anthropic], [gemini], [bedrock], [mistral]
from amplitude import Amplitude
from amplitude_ai import AmplitudeAI, OpenAI # drop-in replacement
# Share your existing Amplitude instance -- same pipeline, no duplicate queues
amplitude = Amplitude("YOUR_API_KEY")
ai = AmplitudeAI(amplitude=amplitude)
agent = ai.agent("my-agent", env="production")
# Use the wrapped client exactly like the original
client = OpenAI(amplitude=ai, api_key="sk-...")
Step 2: Wrap in a session and track
The provider wrapper call automatically inherits session_id, trace_id, agent_id, and turn_id from the active session via Python's contextvars. No extra parameters on the LLM call.
with agent.session(user_id="user-1", session_id="conv-123") as s:
    s.track_user_message(content="What is retention?")
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "What is retention?"}],
    )
    # [Agent] User Message + [Agent] AI Response with session context
    s.score(name="helpful", value=1.0, target_id="...")
# [Agent] Session End auto-emitted here, triggering enrichment immediately.
# Without the `with` block, Amplitude auto-closes after 30 min of inactivity.
You're now getting the full event model: [Agent] User Message, [Agent] AI Response (with model, tokens, cost, latency), [Agent] Score, and [Agent] Session End — all tied to user_id and session_id, all appearing alongside your existing product events.
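The contextvars-based inheritance described in Step 2 can be demonstrated with the standard library alone. In this sketch, session() and fake_llm_call() are hypothetical stand-ins for agent.session() and a wrapped provider call. It shows the mechanism, not the SDK's internals.

```python
import contextvars
from contextlib import contextmanager

# Holds the active session's context for the current thread or asyncio task.
_active_session = contextvars.ContextVar("active_session", default=None)


@contextmanager
def session(user_id, session_id):
    """Set session context for the block, then restore the previous value."""
    token = _active_session.set({"user_id": user_id, "session_id": session_id})
    try:
        yield
    finally:
        _active_session.reset(token)


def fake_llm_call(prompt):
    """Stand-in for a wrapped provider call: reads context with no extra parameters."""
    ctx = _active_session.get() or {}
    return {
        "prompt": prompt,
        "user_id": ctx.get("user_id"),
        "session_id": ctx.get("session_id"),
    }


with session(user_id="user-1", session_id="conv-123"):
    inside = fake_llm_call("What is retention?")
outside = fake_llm_call("What is retention?")
```

Inside the block the call picks up user-1 and conv-123 without being passed either value. Outside it, both are None. Because each asyncio task gets its own copy of the context, concurrent requests do not see each other's sessions.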
Step 3: Progressive enhancement (add as needed)
For sessions where gaps between messages may exceed 30 minutes (e.g., coding assistants, support agents waiting on customer replies), pass idle_timeout_minutes so Amplitude knows the session is still active:
with agent.session(idle_timeout_minutes=240) as s:  # expect up to 4-hour gaps
    ...
Without this, sessions with long idle periods may be closed and evaluated prematurely. The default is 30 minutes.
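The effect of the timeout is easy to see with a small date calculation. The function below is a hypothetical illustration of an idle check, not Amplitude's server-side logic. The timestamps are made up.

```python
from datetime import datetime, timedelta, timezone


def is_idle_expired(last_event_at, now, idle_timeout_minutes=30):
    """True when the gap since the last event exceeds the idle timeout."""
    return now - last_event_at > timedelta(minutes=idle_timeout_minutes)


last_event = datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)
reply_at = last_event + timedelta(hours=2)  # customer replies two hours later

# With the 30-minute default the session would already count as closed.
closed_with_default = is_idle_expired(last_event, reply_at)
# With a 240-minute timeout the same gap is still within the session.
closed_with_override = is_idle_expired(last_event, reply_at, idle_timeout_minutes=240)
```

A two-hour gap exceeds the default but not the 240-minute override, so only the default would trigger evaluation before the conversation resumes.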
Link to Session Replay (optional)
If your frontend uses Amplitude's Session Replay, you can link browser recordings to AI sessions. Pass the browser's device_id and session_id to agent.session() and every [Agent] event will automatically include the [Amplitude] Session Replay ID property (device_id/session_id), enabling one-click navigation from an AI session to the corresponding replay.
# The frontend sends device_id and session_id to your backend
# (e.g., via request headers, query params, or the request body).
with agent.session(
    user_id="user-1",
    device_id=request.headers["X-Amp-Device-Id"],
    browser_session_id=request.headers["X-Amp-Session-Id"],
) as s:
    s.new_trace()
    s.track_user_message(content="What is retention?")
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "What is retention?"}],
    )
    # All events now carry [Amplitude] Session Replay ID
Provider wrappers, @tool calls, and manual track_* calls all inherit the replay ID automatically when inside the session block.
What unlocks with full instrumentation:
| Capability | Events | What it enables |
|---|---|---|
| Provider wrapper (auto) | [Agent] User Message, [Agent] AI Response | Model, tokens, cost, latency, TTFB, reasoning, system prompt, implicit feedback signals |
| + agent + session (Steps 1-2 above) | + [Agent] Session End, [Agent] Score | Per-user funnels, cohorts, retention, session grouping, abandonment analysis, server-side enrichments |
| + manual track_* calls (optional) | + [Agent] Tool Call, [Agent] Embedding, [Agent] Span, [Agent] Session Enrichment | Full event graph, customer-provided enrichments, multi-agent hierarchies |
Next: Scoring | Enrichments | All providers | Privacy
Path B: You already use an OTEL LLM tool
Already using Langfuse, OpenLIT, or Datadog for tracing? Keep them. Add Amplitude as a second destination in one line. You get product analytics for AI (funnels, cohorts, retention across AI and product events) without ripping out your existing setup. The OTEL GenAI exporter consumes any OTEL GenAI semantic convention spans and maps them to Amplitude [Agent] events.
Step 1: Add the bridge
pip install "amplitude-ai[otel]"
from amplitude import Amplitude
from amplitude_ai import AmplitudeAI
from opentelemetry import trace
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from amplitude_ai.integrations.opentelemetry import AmplitudeAgentExporter
amplitude = Amplitude("YOUR_API_KEY")
ai = AmplitudeAI(amplitude=amplitude)
# Add alongside your existing TracerProvider (e.g. Langfuse, OpenLIT, etc.)
trace.get_tracer_provider().add_span_processor(
    SimpleSpanProcessor(AmplitudeAgentExporter(amplitude=amplitude, user_id="user-123"))
)
# All GenAI spans now flow to Amplitude as [Agent] events -- zero changes to your
# existing instrumentation. Your OTEL tool keeps working exactly as before.
Step 2: Add session context
Wrap your code in agent.session() and the OTEL bridge automatically inherits session/agent context via the same ContextVar mechanism:
agent = ai.agent("support-bot", env="production")
with agent.session(user_id="user-123") as s:
    s.new_trace()
    s.track_user_message(content="What is retention?")
    # Any OTEL-instrumented GenAI calls inside this block automatically get
    # session_id, trace_id, turn_id, and agent_id in Amplitude
    result = my_instrumented_function(...)  # Langfuse @observe, OpenLIT, etc.
    s.score(name="helpful", value=1.0, target_id="...")
# Session auto-ends, server-side enrichment kicks in
What unlocks at each step:
| Step | Events | Key fields | Not available from OTEL |
|---|---|---|---|
| Add bridge | [Agent] User Message, [Agent] AI Response, [Agent] Embedding, [Agent] Tool Call | model, provider, tokens (input/output/total), cache tokens (read/creation), cost (cache-aware), latency, system prompt, temperature, top_p, max_output_tokens, content (if opted-in), errors | Reasoning content/tokens, TTFB, streaming detection, implicit feedback, file attachments, event graph linking (parent_message_id) |
| + session context | + [Agent] Score | + session_id, trace_id, turn_id, agent_id, description, agent_version, env, abandonment_turn | Same field gaps, but now: session grouping, scoring, abandonment analysis, server-side enrichments. Compare agent versions. Build funnels from product events through AI sessions. |
| + selective native wrappers | Same events, richer fields on wrapped providers | + reasoning content, TTFB, streaming, implicit feedback (is_regeneration, is_edit, was_copied), file attachments for those providers | Gaps closed progressively per provider you wrap. See Provider Wrappers. |
The third row is the natural upgrade path: start with the OTEL bridge for everything, then selectively wrap your most important provider calls for full field coverage. The bridge and native wrappers coexist; you don't have to choose one or the other.
Next: OTEL Bridge details | Scoring | Privacy | Provider Wrappers
User Identity
User identity flows through the session or per-call, not at agent creation or patch time. This keeps the agent reusable across users.
Via sessions (recommended): pass user_id when opening a session:
agent = ai.agent("support-bot", env="production")
with agent.session(user_id="user-42") as s:
s.new_trace()
s.track_user_message(content="Hello")
response = client.chat.completions.create(model="gpt-4o", messages=[...])
Per-call: pass amplitude_user_id on each LLM call (useful with patch() or when not using sessions):
response = client.chat.completions.create(
model="gpt-4o",
messages=[...],
amplitude_user_id="user-42",
)
Via middleware: AmplitudeAIMiddleware extracts user identity from the request (see FastAPI / Starlette Middleware).
Initialization Options
from amplitude import Amplitude
# Recommended -- share your existing Amplitude pipeline
amplitude = Amplitude("YOUR_API_KEY")
ai = AmplitudeAI(amplitude=amplitude)
# Standalone (creates an Amplitude instance internally)
ai = AmplitudeAI(api_key="YOUR_API_KEY")
# EU Data Residency
amplitude = Amplitude("YOUR_API_KEY")
amplitude.configuration.server_zone = "EU"
ai = AmplitudeAI(amplitude=amplitude)
Core Concepts
The SDK organizes AI interactions into sessions, traces, turns, and spans:
What you instrument:
SESSION
Trace ← one per user message (new_trace())
[Agent] User Message automatic from provider wrapper
[Agent] Tool Call automatic or track_tool_call()
[Agent] AI Response automatic from provider wrapper
[Agent] Score s.score() — rate this response
...repeat per conversation turn...
[Agent] Session Enrichment track_session_enrichment()
[Agent] Score s.score() — rate the session
[Agent] Session End track_session_end()
What Amplitude adds automatically (content_mode="full" only):
[Agent] Session Evaluation outcome, flags, behavioral patterns
[Agent] Topic Classification one per topic model you define
[Agent] Score (ai) one per rubric you define
Scores can attach at message level (rate a specific response) or session level (rate the whole conversation). Enrichments attach at session level only.
| Concept | Property | Description |
|---|---|---|
| Session | session_id | A conversation between a user and the AI. All events in one conversation share the same session_id. |
| Trace | trace_id | One user-message-to-AI-response cycle. Generate a new trace_id (UUID) each time the user sends a message. All events in that cycle (the user message, any tool calls, and the AI response) share the same trace_id. Use new_trace() or pass a UUID directly. |
| Turn | turn_id | Monotonically increasing counter for event ordering. The SDK auto-increments per session when omitted. For custom ordering (e.g., per-trace numbering), pass explicit values. |
| Span | span_id | A tracked operation: tool call, embedding, vector search, or custom step. |
| Agent | agent_id | Which agent handled the interaction (for multi-agent systems). |
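To make the ID hierarchy concrete, here is a minimal standalone sketch of how session, trace, and turn IDs relate. SessionIds and stamp() are hypothetical names used only for illustration, not SDK classes, and the assumption that turn numbering starts at 1 is ours.

```python
import itertools
import uuid
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class SessionIds:
    """Toy model of the ID hierarchy (hypothetical; not the SDK's Session class)."""

    session_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    trace_id: Optional[str] = None
    # Assumption: turn numbering starts at 1 and keeps increasing within a session.
    _turns: Iterator[int] = field(default_factory=lambda: itertools.count(1), repr=False)

    def new_trace(self) -> str:
        """Start a new user-message-to-AI-response cycle."""
        self.trace_id = str(uuid.uuid4())
        return self.trace_id

    def stamp(self, event_type: str) -> dict:
        """Return the IDs an event of this type would carry."""
        return {
            "event_type": event_type,
            "session_id": self.session_id,
            "trace_id": self.trace_id,
            "turn_id": next(self._turns),
        }


ids = SessionIds()
ids.new_trace()
first_cycle = [ids.stamp("[Agent] User Message"), ids.stamp("[Agent] AI Response")]
ids.new_trace()
second_cycle = [ids.stamp("[Agent] User Message")]
```

Both events in first_cycle share one trace_id, second_cycle gets a fresh one, and turn_id keeps counting across traces because the counter belongs to the session.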
What is a BoundAgent?
A BoundAgent is a pre-configured handle that carries context (agent_id, user_id, session_id, env, description, agent_version, etc.) so you don't repeat these fields on every tracking call. You create one via ai.agent(...):
agent = ai.agent("support-bot", env="production", description="Handles support queries")
Use agent.session(user_id=..., session_id=...) for request-scoped sessions (the with block auto-manages lifecycle), or call agent.track_*() methods directly for long-lived conversations that span multiple requests.
Key capabilities:
- agent.session(...): creates a Session context manager for request-scoped conversations
- agent.child("sub-agent"): creates a child agent that inherits context and auto-sets parent_agent_id
- agent.track_user_message(...), agent.track_ai_message(...), etc.: tracking methods with all context pre-filled
- agent.track_session_end(): explicit session close for long-lived conversations
Events at a Glance
The SDK produces 8 event types. When content_mode="full", Amplitude's server adds 3 more per session automatically.
| Event | What it captures |
|---|---|
| [Agent] User Message | User's input, attachments (file uploads), regeneration/edit signals |
| [Agent] AI Response | Model output, tokens, cost, latency, reasoning, system prompt, model config, copy signal |
| [Agent] Tool Call | Function/tool invocation by the AI |
| [Agent] Embedding | Vector embedding operation |
| [Agent] Span | Any pipeline step (search, rerank, guardrails) |
| [Agent] Session End | Explicit session close, abandonment tracking |
| [Agent] Session Enrichment | Your structured labels (topics, rubrics, outcomes) |
| [Agent] Score | Quality signal on a message or session (user, automated, or annotator) |
| Server-side (automatic when content_mode="full"): | |
| [Agent] Session Evaluation | Session summary with outcome and behavioral flags |
| [Agent] Topic Classification | Category label per configured topic model |
| [Agent] Score (automated) | Rubric score per configured rubric |
See Event Schema for the full property reference. Most tracking methods return a unique ID (message_id, invocation_id, or span_id) that you can use to link related events into a graph. Session lifecycle methods (track_session_end, track_session_enrichment) and score return None. See Event Linking for the full table and code examples.
What You Actually Get
Every SDK tracking call produces a standard Amplitude event. Here's what they look like in practice. These are the events you query in charts, cohorts, and funnels. Notice that user_id is the same one your product events use.
An [Agent] AI Response event (SDK-emitted, immediate):
{
"event_type": "[Agent] AI Response",
"user_id": "user-42",
"event_properties": {
"[Agent] Session ID": "sess-abc-123",
"[Agent] Trace ID": "trace-7f3a",
"[Agent] Turn ID": 2,
"[Agent] Message ID": "msg-9e2f-4a1b",
"[Agent] Model Name": "claude-sonnet-4-20250514",
"[Agent] Provider": "anthropic",
"[Agent] Latency Ms": 1240.5,
"[Agent] TTFB Ms": 89.3,
"[Agent] Input Tokens": 4850,
"[Agent] Output Tokens": 312,
"[Agent] Cache Read Tokens": 4200,
"[Agent] Cost USD": 0.0019,
"[Agent] Finish Reason": "end_turn",
"[Agent] Is Streaming": true,
"[Agent] Temperature": 0.7,
"[Agent] Agent ID": "support-bot",
"[Agent] Agent Version": "v4.2",
"[Agent] Env": "production",
"[Agent] Context": "{\"experiment_variant\": \"prompt-v2\", \"surface\": \"chat\"}",
"[Agent] Was Copied": true,
"[Agent] Is Error": false,
"[Agent] Component Type": "llm",
"[Agent] SDK Version": "1.0.2",
"[Agent] Runtime": "python"
}
}
An [Agent] Session Evaluation event (server-generated after session close, when content_mode="full"):
{
"event_type": "[Agent] Session Evaluation",
"user_id": "user-42",
"event_properties": {
"[Agent] Session ID": "sess-abc-123",
"[Agent] Overall Outcome": "response_provided",
"[Agent] Turn Count": 4,
"[Agent] Has Task Failure": false,
"[Agent] Has Negative Feedback": false,
"[Agent] Has Technical Failure": false,
"[Agent] Behavioral Patterns": ["multi_turn_refinement"],
"[Agent] Agent Chain Depth": 1,
"[Agent] Models Used": ["claude-sonnet-4-20250514"],
"[Agent] Session Cost USD": 0.0087,
"[Agent] Evaluation Source": "ai",
"[Agent] Taxonomy Version": "2.0"
}
}
The first event is queryable immediately. The second appears within minutes of session close. Both carry the same user_id and session_id. Build a cohort from Session Evaluation properties (e.g., Has Task Failure = true) and measure that cohort's 7-day retention using your existing product events.
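The cohort idea can be sketched locally on exported events. This is only an illustration of the filter logic, not how Amplitude builds cohorts internally; the second user, the second session ID, and the helper name task_failure_cohort are made up for the example.

```python
from typing import List, Set

# Hypothetical exported events, shaped like the JSON examples above.
events: List[dict] = [
    {
        "event_type": "[Agent] Session Evaluation",
        "user_id": "user-42",
        "event_properties": {
            "[Agent] Session ID": "sess-abc-123",
            "[Agent] Has Task Failure": False,
        },
    },
    {
        "event_type": "[Agent] Session Evaluation",
        "user_id": "user-7",
        "event_properties": {
            "[Agent] Session ID": "sess-def-456",
            "[Agent] Has Task Failure": True,
        },
    },
    {
        "event_type": "[Agent] AI Response",
        "user_id": "user-7",
        "event_properties": {"[Agent] Session ID": "sess-def-456"},
    },
]


def task_failure_cohort(events: List[dict]) -> Set[str]:
    """Collect users with at least one evaluated session flagged as a task failure."""
    return {
        event["user_id"]
        for event in events
        if event["event_type"] == "[Agent] Session Evaluation"
        and event["event_properties"].get("[Agent] Has Task Failure") is True
    }


cohort = task_failure_cohort(events)
```

Because the cohort is keyed by the same user_id your product events use, it can be joined against any other event stream for those users.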
Data Flow
The SDK uses composition: it wraps an Amplitude instance rather than subclassing it. When amplitude is passed in, the SDK shares your existing event pipeline with no duplicate queues. It never opens its own network connections:
Your Code amplitude-ai SDK Amplitude
─────────────────────────────────────────────────────────────────────────────
┌─────────────────┐
ai.track_ai_message(...) ────→ │ AmplitudeAI │
│ - Apply privacy │
│ - Build event │
└────────┬─────────┘
▼
┌─────────────────┐
│ Amplitude │ ────→ Amplitude API
│ (your instance) │ (US or EU)
└─────────────────┘
Privacy controls (content mode, PII redaction) are applied before events leave your process. Content is never sent unfiltered and then redacted server-side.
Integration Patterns
The SDK supports three integration patterns. Pick the one that matches your architecture.
Pattern A: Single-Request Handler
Use when: Your session starts and ends in a single code path (Lambda functions, synchronous API endpoints, CLI tools).
agent = ai.agent("support-bot", env="production")
with agent.session(user_id="user-1") as s:
s.new_trace()
s.track_user_message(content="What is retention?")
ai_msg = s.track_ai_message(
content="Retention measures...",
model="gpt-4o",
provider="openai",
latency_ms=350.0,
input_tokens=50,
output_tokens=200,
)
s.score(name="helpful", value=1.0, target_id=ai_msg)
# Session auto-ends here -- track_session_end() called on __exit__
# Server-side enrichment kicks in (when content_mode="full")
The Session context manager auto-generates a session_id, handles track_session_end() on exit (even on exception), and publishes session context into Python's contextvars so provider wrappers and the OTEL bridge inherit session_id, trace_id, agent_id, and turn_id automatically.
Pattern B: Long-Lived Conversation
Use when: Your session spans multiple HTTP requests, WebSocket messages, or Slack interactions. This is the most common real-world pattern for chatbots and conversational agents.
Use BoundAgent directly, NOT the Session context manager, because the session outlives any single code path.
agent = ai.agent(
"support-bot",
user_id="user-1",
env="production",
session_id="thread-abc",
)
# --- Request 1: user sends a message ---
agent.track_user_message(content="What is retention?", trace_id="req-1", turn_id=1)
ai_msg = agent.track_ai_message(
content="Retention measures...",
model="gpt-4o",
provider="openai",
latency_ms=350.0,
trace_id="req-1",
turn_id=2,
)
# --- Request 2: user follows up ---
agent.track_user_message(content="Show me an example", trace_id="req-2", turn_id=1)
agent.track_ai_message(
content="Here's a retention chart...",
model="gpt-4o",
provider="openai",
latency_ms=200.0,
trace_id="req-2",
turn_id=2,
)
# --- Optional: trigger enrichment immediately when conversation ends ---
agent.track_session_end()
# Without this call, Amplitude auto-closes the session after 30 min of
# inactivity and runs enrichment at that point.
Key differences from Pattern A:
- BoundAgent carries all context (user_id, agent_id, session_id, env, etc.) without a with block
- Generate a new trace_id for each user-message-to-AI-response cycle
- Pass explicit turn_id values (or omit to auto-increment per session)
- track_session_end() is optional: it triggers enrichment immediately. Without it, sessions auto-close after 30 minutes of inactivity and enrichment runs then
Pattern C: Multi-Agent Orchestrator
Use when: Multiple agents collaborate on a task. agent.child() creates a sub-agent that inherits session context and auto-sets parent_agent_id. Use session.run_as() to automatically propagate the child agent's identity to both manual tracking calls and provider wrappers:
orchestrator = ai.agent("orchestrator", env="production")
researcher = orchestrator.child("researcher")
writer = orchestrator.child("writer")
# researcher.parent_agent_id == "orchestrator" (automatic)
with orchestrator.session(user_id="u1") as s:
s.track_user_message(content="Compare our pricing to competitors")
# Research phase — provider calls automatically tagged with agent_id='researcher'
with s.run_as(researcher) as rs:
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Research competitor pricing"}],
)
rs.track_tool_call(tool_name="web_search", latency_ms=500, success=True)
# Writing phase — provider calls automatically tagged with agent_id='writer'
with s.run_as(writer) as ws:
draft = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": f"Write summary: {response.choices[0].message.content}"}],
)
# Fan in: orchestrator synthesizes results
s.track_ai_message(
content="Based on research...",
model="gpt-4o",
provider="openai",
latency_ms=500,
)
# Events emitted:
# [Agent] User Message → agent_id='orchestrator'
# [Agent] AI Response → agent_id='researcher', parent_agent_id='orchestrator'
# [Agent] Tool Call → agent_id='researcher', parent_agent_id='orchestrator'
# [Agent] AI Response → agent_id='writer', parent_agent_id='orchestrator'
# [Agent] AI Response → agent_id='orchestrator'
# [Agent] Session End → agent_id='orchestrator' (one session end, not per-child)
How run_as works:
- Shares the parent session's session_id, trace_id, and turn counter
- Overrides agent_id and parent_agent_id in contextvars for the block's duration
- Provider wrappers automatically read the child's identity; no explicit overrides needed
- Does not emit [Agent] Session End (the child operates within the parent session)
- Restores the parent context when the block exits, even on exceptions
- Supports nesting: with s.run_as(child) as cs: with cs.run_as(grandchild) as gs: ...
- Async variant: async with s.arun_as(child) as cs: ...
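The override-and-restore behavior in the list above can be illustrated with plain contextvars. This is a minimal sketch of the general technique, not the SDK's implementation; the standalone run_as function, _agent_ctx, and current_identity are hypothetical names.

```python
from contextlib import contextmanager
from contextvars import ContextVar

# Hypothetical context slot; the SDK's real variable names are not shown in this document.
_agent_ctx: ContextVar[dict] = ContextVar(
    "agent_ctx", default={"agent_id": None, "parent_agent_id": None}
)


@contextmanager
def run_as(agent_id: str):
    """Override the active agent for the block, then restore the previous one."""
    current = _agent_ctx.get()
    token = _agent_ctx.set({"agent_id": agent_id, "parent_agent_id": current["agent_id"]})
    try:
        yield
    finally:
        # reset() runs even if the block raises, so the parent identity always comes back.
        _agent_ctx.reset(token)


def current_identity() -> dict:
    return dict(_agent_ctx.get())


seen = []
with run_as("orchestrator"):
    with run_as("researcher"):
        seen.append(current_identity())
        try:
            with run_as("executor"):
                seen.append(current_identity())
                raise RuntimeError("tool failed")
        except RuntimeError:
            pass
        seen.append(current_identity())  # back to researcher after the exception
    seen.append(current_identity())  # back to orchestrator
```

Anything that reads the context variable inside a block sees the child identity without being passed it explicitly, which is the property provider wrappers rely on.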
See Multi-Agent Patterns for more examples (linear chains, fan-out/fan-in, dynamic routing).
Which API Should I Use?
Want automatic LLM call tracking?
YES --> Provider Wrappers (OpenAI, Anthropic, etc.)
+ BoundAgent for session/agent context
NO --> Manual track_*() calls
Want to track a Python function as a tool call automatically?
YES --> @tool decorator (zero boilerplate)
NO --> track_tool_call() manually
Want to track a function as a span (pipeline step, retriever, etc.)?
YES --> @observe decorator (auto session lifecycle)
NO --> track_span() manually
Session contained in one code path (Lambda, sync handler)?
YES --> agent.session() context manager (Pattern A)
NO --> BoundAgent directly + explicit track_session_end() (Pattern B)
Multiple agents collaborating?
YES --> agent.child() for sub-agents (Pattern C)
NO --> Single BoundAgent is sufficient
Going Deeper
Privacy & Content Control
Three tiers control who does the enrichment and what data leaves your environment:
| Mode | What you send | Who enriches | Best for |
|---|---|---|---|
| full | Content + metrics | Amplitude automatically classifies every session: topic models, quality rubrics, behavioral flags, outcomes | Maximum insight, zero eval code, works out of the box |
| metadata_only | Metrics only (no content) | Nobody | Strict environments where no conversation text can leave your infrastructure |
| customer_enriched | Your labels + metrics | You run your own classifiers, send structured labels via track_session_enrichment() | Teams in regulated industries who want full analytics value and full data control |
This is a control gradient, not a quality gradient. For every dimension you label, customer_enriched gives the same analytics output as full. The difference is who runs the enrichment. In full mode, Amplitude does it for you. In customer_enriched mode, you do it yourself and send structured labels. The result in your charts, cohorts, and funnels is the same for those labels.
For teams in regulated industries or with strict data residency requirements, customer_enriched is the recommended path: you get full analytics value without sending any conversation content to Amplitude.
The table below shows what analytics patterns each tier enables:
| Analytics pattern | full | metadata_only | customer_enriched |
|---|---|---|---|
| Cohort by topic | Yes | No | Yes (your labels) |
| Cohort by task failure | Yes | No | No |
| Cohort by quality score | Yes | No | Yes (your scores) |
| Retention by AI engagement | Yes | Yes | Yes |
| Behavioral pattern detection (retry_storm, etc.) | Yes | No | No |
| Cost analytics | Yes | Yes | Yes |
In full mode, message content is stored at full length with no truncation or size limits. See Content Storage for details.
from amplitude_ai import AmplitudeAI, AIConfig, ContentMode
# Full (default) -- raw content, server enrichments enabled
ai = AmplitudeAI(amplitude=amplitude, config=AIConfig(content_mode=ContentMode.FULL))
# Metadata only -- no content at all
ai = AmplitudeAI(amplitude=amplitude, config=AIConfig(content_mode=ContentMode.METADATA_ONLY))
# Customer enriched -- you provide your own classifications
ai = AmplitudeAI(amplitude=amplitude, config=AIConfig(content_mode=ContentMode.CUSTOMER_ENRICHED))
# PII redaction (works with any mode -- strips emails, phone numbers, credit cards, SSNs, IP addresses)
ai = AmplitudeAI(amplitude=amplitude, config=AIConfig(redact_pii=True))
Upgrading to 1.5.0 with redact_pii=True? This release adds IPv4/IPv6 → [ip_address], international phone → [phone], and space-separated SSN → [ssn] placeholders. If any downstream pipeline or dashboard regex matches on raw IP or phone content in event properties, update those filters before upgrading.
AIConfig options (complete surface):
| Name | Type | Default | Description |
|---|---|---|---|
| content_mode | ContentMode | FULL | Privacy tier. See above. |
| redact_pii | bool | True | Scrub emails, phone numbers, credit cards, SSNs, IP addresses before sending. Set to False if you explicitly want raw content in tracked events. |
| custom_redaction_patterns | list[str \| tuple[str, str]] | [] | Additional regex patterns to redact. Plain strings replace with [REDACTED]; (pattern, replacement) tuples use the given label. |
| custom_redaction_fn | Callable[[str], str] \| None | None | Optional callback for custom redaction logic (e.g. Presidio NER). Called after all regex-based redaction. |
| debug | bool | False | Print colored one-line event summaries to stderr. See Developer Experience. |
| dry_run | bool | False | Validate and print events without sending to Amplitude. |
| validate | bool | False | Raise ValidationError on bad inputs instead of silently continuing. |
| on_event_callback | Callable | None | Per-event delivery callback: (event, status_code, message) -> None. |
OTEL bridge privacy (two-gate model): If you're using the OTEL GenAI Bridge, the OTEL GenAI spec marks message content as Opt-In. The SDK's content_mode setting (via AIConfig) acts as a second gate, so you control exactly what reaches Amplitude regardless of what your OTEL source captures:
| Customer intent | OTEL tool setting | Amplitude content_mode | Result |
|---|---|---|---|
| Maximum insight | Content capture ON | full | Content + server enrichments in Amplitude |
| No conversation text in Amplitude | Content capture ON | metadata_only | Amplitude receives model, tokens, cost, latency (no message text) |
| No content anywhere | Content capture OFF | any | No content in the span to begin with |
| Own classifications | Content capture OFF | customer_enriched + track_session_enrichment() | Your structured labels in Amplitude, no raw content |
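The two-gate model in the table can be read as a simple conjunction. The sketch below encodes that reading in a hypothetical helper, content_reaches_amplitude; it is not an SDK function. The table does not list "capture ON + customer_enriched", so treating that combination as "no raw content" is an assumption based on the tier description.

```python
VALID_MODES = {"full", "metadata_only", "customer_enriched"}


def content_reaches_amplitude(otel_capture_on: bool, content_mode: str) -> bool:
    """Return True only when both gates let message text through.

    Gate 1: the OTEL source must have captured content in the span.
    Gate 2: content_mode must be "full" (assumed to be the only mode that forwards text).
    """
    if content_mode not in VALID_MODES:
        raise ValueError(f"unknown content_mode: {content_mode!r}")
    return otel_capture_on and content_mode == "full"


# The rows of the table, expressed as (capture on?, content_mode) -> text reaches Amplitude?
table_rows = {
    (True, "full"): content_reaches_amplitude(True, "full"),
    (True, "metadata_only"): content_reaches_amplitude(True, "metadata_only"),
    (False, "full"): content_reaches_amplitude(False, "full"),
    (False, "customer_enriched"): content_reaches_amplitude(False, "customer_enriched"),
}
```

Either gate alone is enough to keep message text out, which is why tightening content_mode is effective even when you cannot change what the OTEL source captures.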
Built-in patterns cover emails, US/international phone numbers, credit cards, SSNs (dashed and spaced), and IPv4/IPv6 addresses. Add custom patterns for domain-specific PII:
ai = AmplitudeAI(amplitude=amplitude, config=AIConfig(
redact_pii=True,
custom_redaction_patterns=[r"\bACCT-\d{6,}\b"],
))
Named replacements — use (pattern, replacement) tuples for descriptive labels:
ai = AmplitudeAI(amplitude=amplitude, config=AIConfig(
redact_pii=True,
custom_redaction_patterns=[
(r"\bACME-\d+\b", "[ticket_id]"),
(r"\bORD-[A-Z0-9]+\b", "[order_id]"),
],
))
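To preview what a set of custom patterns would do before wiring them into AIConfig, you can apply the same rules with the standard re module. This is a standalone approximation of the documented behavior (plain strings become [REDACTED], tuples use their label); apply_custom_patterns is a hypothetical helper, and the SDK's actual ordering and matching details may differ.

```python
import re
from typing import List, Tuple, Union

PatternSpec = Union[str, Tuple[str, str]]


def apply_custom_patterns(text: str, patterns: List[PatternSpec]) -> str:
    """Apply each pattern in order; plain strings fall back to the [REDACTED] label."""
    for spec in patterns:
        if isinstance(spec, tuple):
            pattern, label = spec
        else:
            pattern, label = spec, "[REDACTED]"
        text = re.sub(pattern, label, text)
    return text


patterns: List[PatternSpec] = [
    r"\bACCT-\d{6,}\b",
    (r"\bACME-\d+\b", "[ticket_id]"),
    (r"\bORD-[A-Z0-9]+\b", "[order_id]"),
]
redacted = apply_custom_patterns("Refund ORD-9X2 on ACCT-1234567, see ACME-88", patterns)
# redacted == "Refund [order_id] on [REDACTED], see [ticket_id]"
```

Running candidate patterns against sample text like this is a cheap way to catch a regex that is too greedy or misses a variant.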
Custom redaction function — plug in any external PII engine:
ai = AmplitudeAI(amplitude=amplitude, config=AIConfig(
redact_pii=True,
custom_redaction_fn=my_custom_scrubber, # (text: str) -> str
))
The function runs after all built-in and custom-pattern redaction, receives the partially-redacted text, and must return a string. If it raises an exception, the SDK logs a warning and keeps the text from the prior tiers unchanged.
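The fallback described above can be pictured as a guarded call. This is a sketch of the pattern under the stated behavior, not the SDK's code; run_custom_fn, mask_names, and broken_scrubber are hypothetical names.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("redaction_sketch")


def run_custom_fn(text: str, fn: Optional[Callable[[str], str]]) -> str:
    """Run the last redaction tier; on failure keep the output of the earlier tiers."""
    if fn is None:
        return text
    try:
        return fn(text)
    except Exception:
        logger.warning("custom redaction function failed; keeping prior redaction")
        return text


def mask_names(text: str) -> str:
    return text.replace("Alice", "[name]")


def broken_scrubber(text: str) -> str:
    raise RuntimeError("NER service unavailable")


partially_redacted = "[email] from Alice"
ok_result = run_custom_fn(partially_redacted, mask_names)
fallback_result = run_custom_fn(partially_redacted, broken_scrubber)
```

Note the consequence: when the custom function fails, anything only it would have caught (the name in this example) still goes out, so monitor for that warning in production.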
Recipe: Presidio for name/address detection
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine
analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()
def redact_names_addresses(text: str) -> str:
results = analyzer.analyze(text=text, language="en",
entities=["PERSON", "LOCATION"])
return anonymizer.anonymize(text=text, analyzer_results=results).text
ai = AmplitudeAI(amplitude=amplitude, config=AIConfig(
redact_pii=True,
custom_redaction_fn=redact_names_addresses,
))
Bound Agents & Sessions
Bound Agents. ai.agent() creates a pre-configured handle that carries context fields so you never repeat them:
agent = ai.agent(
"support-bot",
description="Handles customer support queries via OpenAI GPT-4o",
user_id="user-1",
agent_version="v4.2",
env="production",
context={
"experiment_variant": "prompt-v2-treatment",
"prompt_revision": "abc123",
},
)
# Every call inherits user_id, agent_id, description, agent_version, env, context
msg = agent.track_user_message(content="How do I set up a funnel?", session_id="s1")
ai_msg = agent.track_ai_message(
content="To create a funnel...",
session_id="s1",
model="gpt-4o",
provider="openai",
latency_ms=450.0,
input_tokens=120,
output_tokens=340,
)
agent.score(name="user-feedback", value=1.0, target_id=ai_msg)
Explicit kwargs always override bound defaults:
# Uses "override-agent" for this call only, not "support-bot"
agent.track_ai_message(agent_id="override-agent", ...)
Child Agents. For multi-agent orchestration, child() creates a new handle that inherits env, session_id, trace_id, and groups from the parent. It automatically sets parent_agent_id:
orchestrator = ai.agent("orchestrator", env="production")
researcher = orchestrator.child("researcher")
# researcher.agent_id = "researcher"
# researcher.parent_agent_id = "orchestrator" (automatic)
# researcher.env = "production" (inherited)
executor = researcher.child("executor")
# executor.parent_agent_id = "researcher" (chains correctly)
Session lifecycle and enrichment. You do not need to call track_session_end() for sessions to work. Amplitude's server automatically closes sessions after 30 minutes of inactivity and queues them for enrichment (topic classification, quality scoring, session evaluation) at that point. The only reason to call track_session_end() is to trigger enrichment sooner — for example, if you know the conversation is over and want evaluation results immediately rather than waiting for the idle timeout.
"Closed" is a server-side concept meaning "queued for enrichment" — it does not prevent new events from flowing into the same session. If the user resumes a conversation after session end, new messages with the same session_id are still associated with that session. The SDK has no local "closed" state.
If you use a Session context manager (with agent.session(...) as s:), track_session_end() is called automatically when the with block exits. For long-lived conversations (chatbots, support agents), you can skip explicit session end entirely and let the server handle it.
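As a mental model for when a session becomes eligible for enrichment, the rule above can be written as a small predicate. This runs on Amplitude's servers, so the sketch is only an approximation: queued_for_enrichment is a hypothetical name, and whether the boundary is inclusive at exactly 30 minutes is an assumption.

```python
from datetime import datetime, timedelta, timezone

IDLE_TIMEOUT = timedelta(minutes=30)  # value stated in the text above


def queued_for_enrichment(
    last_event_at: datetime, now: datetime, ended_explicitly: bool = False
) -> bool:
    """Assumed rule: an explicit session end, or at least 30 idle minutes."""
    return ended_explicitly or (now - last_event_at) >= IDLE_TIMEOUT


last_event = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
still_open = queued_for_enrichment(last_event, last_event + timedelta(minutes=10))
timed_out = queued_for_enrichment(last_event, last_event + timedelta(minutes=31))
ended_early = queued_for_enrichment(
    last_event, last_event + timedelta(minutes=1), ended_explicitly=True
)
```

The third case is the only one track_session_end() changes: it moves enrichment earlier but does not alter what the session contains.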
Session Context Manager. agent.session() returns a context manager that auto-calls track_session_end when the block exits (even on exception):
agent = ai.agent("support-bot", env="prod")
with agent.session(user_id="u1") as s:
    s.new_trace()  # auto-generate trace_id (UUID)
    msg = s.track_user_message(content="How do I set up a funnel?")
    ai_msg = s.track_ai_message(
        content="To create a funnel...",
        model="gpt-4o",
        provider="openai",
        latency_ms=450.0,
    )
    s.score(name="user-feedback", value=1.0, target_id=ai_msg)
# session auto-ended here -- track_session_end() called automatically for the auto-generated session_id
Switch traces mid-session, set enrichments, auto-generate session IDs:
with agent.session() as s: # auto-generated UUID session_id
t1 = s.new_trace()
s.track_user_message(content="First question")
s.track_ai_message(content="Answer 1", model="gpt-4o", provider="openai", latency_ms=200)
t2 = s.new_trace() # new trace for a follow-up
s.track_user_message(content="Follow-up question")
s.track_ai_message(content="Answer 2", model="gpt-4o", provider="openai", latency_ms=150)
s.set_enrichments(SessionEnrichments(overall_outcome="response_provided"))
# session auto-ended with enrichments
Works with async too:
async with agent.session(session_id="sess-1") as s:
    s.new_trace()
    ...
How it works: Session.__enter__() publishes a ContextVar with the active session/agent context. Provider wrappers and the OTEL bridge read this ContextVar and auto-fill any missing fields. Session.__exit__() restores the previous context. This is the same pattern used by OpenTelemetry and works correctly with threads and asyncio.
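The claim about asyncio can be checked with a few lines of standard-library code. This sketch uses a hypothetical _session_id variable rather than the SDK's own context, and shows that two concurrently running tasks each see only the value they set.

```python
import asyncio
from contextvars import ContextVar

# Hypothetical stand-in for the SDK's session context.
_session_id: ContextVar[str] = ContextVar("session_id", default="")


async def handle_request(session_id: str) -> str:
    token = _session_id.set(session_id)
    try:
        # Yield to the event loop so the two handlers interleave.
        await asyncio.sleep(0)
        # A downstream wrapper would read the context here without being passed it.
        return _session_id.get()
    finally:
        _session_id.reset(token)


async def main() -> list:
    # gather() runs each coroutine as a task with its own copy of the context.
    return await asyncio.gather(handle_request("sess-a"), handle_request("sess-b"))


results = asyncio.run(main())
leaked = _session_id.get()  # the outer context never sees either value
```

Each task works on a copy of the context, so concurrent requests in an async web server do not overwrite each other's session.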
Context Dict Conventions
The context parameter on ai.agent() accepts an arbitrary dict[str, Any] that is JSON-serialized and attached to every event as [Agent] Context. This is the recommended way to add segmentation dimensions without requiring new global properties.
Recommended keys:
| Key | Example Values | Use Case |
|---|---|---|
| agent_type | "planner", "executor", "retriever", "router", "evaluator" | Filter/group analytics by agent role in multi-agent systems. Build charts like "latency by agent type" or "error rate by agent role." |
| experiment_variant | "control", "treatment-v2", "prompt-rewrite-a" | Segment AI sessions by A/B test variant. Compare quality scores, abandonment rates, or cost across experiment arms. See note below. |
| feature_flag | "new-rag-pipeline", "reasoning-model-enabled" | Track which feature flags were active during the session. Correlate flag states with quality regressions. |
| surface | "chat", "search", "copilot", "email-draft" | Identify which UI surface or product area triggered the AI interaction. Build per-surface quality dashboards. |
| prompt_revision | "v7", "abc123", "2026-02-15" | Track which prompt version was used. Detect prompt regression when combined with agent_version. |
| deployment_region | "us-east-1", "eu-west-1" | Segment by deployment region for latency analysis or compliance tracking. |
| canary_group | "canary", "stable" | Identify canary vs. stable deployments for progressive rollout monitoring. |
Example:
agent = ai.agent(
"support-bot",
user_id="u1",
agent_version="4.2.0",
context={
"agent_type": "executor",
"experiment_variant": "reasoning-enabled",
"surface": "chat",
"feature_flag": "new-rag-pipeline",
},
)
# All events from this agent (and its sessions, child agents, and provider
# wrappers) will include [Agent] Context with these keys.
Context merging in child agents:
parent = ai.agent("orchestrator", context={"experiment_variant": "treatment", "surface": "chat"})
child = parent.child("researcher", context={"agent_type": "retriever"})
# child.context == {"experiment_variant": "treatment", "surface": "chat", "agent_type": "retriever"}
# Child keys override parent keys; parent keys absent from the child are preserved.
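The merge rule stated in the comments matches a shallow dictionary merge, sketched below with plain dicts. Whether the SDK merges nested values any differently is not stated here, so treat merge_context as a hypothetical illustration of the top-level behavior only.

```python
import json


def merge_context(parent: dict, child: dict) -> dict:
    """Shallow merge: child keys win, parent-only keys are preserved."""
    return {**parent, **child}


parent_ctx = {"experiment_variant": "treatment", "surface": "chat"}

# Child adds a key and leaves the parent's keys untouched.
added = merge_context(parent_ctx, {"agent_type": "retriever"})

# Child overrides a key the parent also sets.
overridden = merge_context(parent_ctx, {"agent_type": "retriever", "surface": "search"})

# The merged dict is what would be JSON-serialized into [Agent] Context.
serialized = json.dumps(added, sort_keys=True)
```

The parent dict is never mutated, so sibling child agents can each add their own keys without affecting one another.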
Querying in Amplitude: The [Agent] Context property is a JSON string. Use Amplitude's JSON property parsing to extract individual keys for charts, cohorts, and funnels. For example, group by [Agent] Context.agent_type to see metrics by agent role.
Note on experiment_variant and server-generated events: Context keys appear on all SDK-emitted events ([Agent] User Message, [Agent] AI Response, etc.). Server-generated events ([Agent] Session Evaluation, [Agent] Score with source="ai") do not yet inherit context keys. To segment server-generated quality scores by experiment arm, use Amplitude Derived Properties to extract from [Agent] Context on SDK events. First-class support is planned.
Why a dict instead of first-class fields? Context is a dict for flexibility without schema migrations. Adding a new segmentation dimension takes one line of code, not a data catalog update. First-class properties exist for universal, stable dimensions (agent_id, description, agent_version, env). The context dict exists for customer-specific, evolving dimensions (experiment_variant, feature_flags, prompt_revision). Adding dedicated event properties for each dimension would consume global property slots, which are limited per organization. If usage patterns converge and the Amplitude product builds dedicated chart support for specific keys, they can be promoted to first-class fields later.
Multi-Agent Patterns
The SDK supports multi-agent orchestration via BoundAgent.child() and parent_agent_id. Here are common patterns:
Pattern 1: Linear delegation chain
A simple pipeline where each agent hands off to the next:
orchestrator = ai.agent("orchestrator", env="production")
with orchestrator.session(user_id="u1") as s:
# Orchestrator decides to delegate to researcher
researcher = orchestrator.child("researcher")
with researcher.session(session_id=s.session_id) as rs:
rs.track_user_message(content="Find pricing info")
rs.track_ai_message(content="Found 3 articles...", model="gpt-4o",
provider="openai", latency_ms=200)
# Researcher done, orchestrator delegates to writer
writer = orchestrator.child("writer")
with writer.session(session_id=s.session_id) as ws:
ws.track_ai_message(content="Here is a summary...", model="gpt-4o",
provider="openai", latency_ms=300)
# Events show: orchestrator -> researcher -> writer
# Each agent's events carry its own agent_id and parent_agent_id
Pattern 2: Fan-out / fan-in
An orchestrator dispatches multiple sub-agents in parallel:
orchestrator = ai.agent("orchestrator", context={"agent_type": "router"})
with orchestrator.session(user_id="u1") as s:
s.track_user_message(content="Compare our pricing to competitors")
# Fan out to parallel agents
researcher_a = orchestrator.child("researcher-web", context={"agent_type": "retriever"})
researcher_b = orchestrator.child("researcher-db", context={"agent_type": "retriever"})
# Both share the same session and parent_agent_id="orchestrator"
# Run in parallel (via asyncio, threads, etc.)
# ...
# Fan in: orchestrator synthesizes results
s.track_ai_message(content="Based on research...", model="gpt-4o",
provider="openai", latency_ms=500)
Pattern 3: Dynamic routing
A router agent selects from a pool of specialist agents at runtime:
router = ai.agent("router", context={"agent_type": "router"})
with router.session(user_id="u1") as s:
    text = "I need a refund"
    user_msg = s.track_user_message(content=text)  # returns a message_id, not the text
    # Router decides based on intent
    intent = classify_intent(text)  # classify the message text itself
    specialist = router.child(f"specialist-{intent}", context={"agent_type": "executor"})
    with specialist.session(session_id=s.session_id) as ss:
        ss.track_ai_message(content="I can help with your refund...",
                            model="gpt-4o", provider="openai", latency_ms=400)
Analytics this enables:
- Per-agent quality scores: Filter [Agent] Score by [Agent] Agent ID to see which agents produce high-quality responses and which don't, across user feedback, automated evals, and server-generated rubric scores.
- Cost attribution: Group cost by [Agent] Agent ID to see which sub-agent is expensive relative to its quality contribution. Find the agent that accounts for 60% of token spend but only 20% of task completions.
- Failure attribution: When a multi-agent chain produces a bad outcome, per-agent quality scores help identify which agent introduced the failure. Filter [Agent] Session Evaluation sessions where has_task_failure=True, then drill into individual agent scores.
- Handoff analysis: Build funnels across agent boundaries using [Agent] Parent Agent ID: "orchestrator dispatches → researcher completes → writer delivers." Measure conversion and drop-off at each handoff.
- Role-based dashboards: Use [Agent] Context.agent_type (see Context Dict Conventions) to compare latency, error rate, and cost across agent roles (router, retriever, executor).
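Cost attribution, for instance, reduces to a group-by on [Agent] Agent ID. The sketch below does that grouping locally on made-up event data; in practice you would build the equivalent chart in Amplitude, and cost_share_by_agent is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical [Agent] AI Response events with invented costs.
events: List[dict] = [
    {"event_properties": {"[Agent] Agent ID": "researcher", "[Agent] Cost USD": 0.004}},
    {"event_properties": {"[Agent] Agent ID": "researcher", "[Agent] Cost USD": 0.002}},
    {"event_properties": {"[Agent] Agent ID": "writer", "[Agent] Cost USD": 0.002}},
    {"event_properties": {"[Agent] Agent ID": "orchestrator", "[Agent] Cost USD": 0.002}},
]


def cost_share_by_agent(events: List[dict]) -> Dict[str, float]:
    """Return each agent's fraction of total cost across the given events."""
    totals: Dict[str, float] = defaultdict(float)
    for event in events:
        props = event["event_properties"]
        totals[props["[Agent] Agent ID"]] += props.get("[Agent] Cost USD", 0.0)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {agent: 0.0 for agent in totals}
    return {agent: total / grand_total for agent, total in totals.items()}


shares = cost_share_by_agent(events)
```

With this sample data the researcher accounts for roughly 60% of spend; pairing that share with per-agent quality scores shows whether the cost is justified.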
Event Linking
Message, tool, and span tracking calls return a unique ID. Use these IDs to wire events into a graph:
| Method | Returns | ID Name |
|---|---|---|
| track_user_message() | str | message_id |
| track_ai_message() | str | message_id |
| track_tool_call() | str | invocation_id |
| track_embedding() | str | span_id |
| track_span() | str | span_id |
| track_session_end() | None | - |
| track_session_enrichment() | None | - |
| score() | None | - |
Link events together:
agent = ai.agent("support-bot", env="prod")
with agent.session(user_id="u1") as s:
s.new_trace()
# 1. User asks a question
msg = s.track_user_message(content="Explain funnels")
# 2. AI decides to call a tool -- link to the user message
tool_inv = s.track_tool_call(
tool_name="search_docs",
latency_ms=85.0,
success=True,
parent_message_id=msg, # ← links tool call to the user message
)
# 3. AI responds
ai_msg = s.track_ai_message(
content="A funnel measures conversion...",
model="gpt-4o",
provider="openai",
latency_ms=450.0,
)
# 4. Score the AI response
s.score(name="user-feedback", value=1.0, target_id=ai_msg) # ← links score to AI response
# 5. Nested spans for pipeline operations
parent_span = s.track_span(span_name="rag_pipeline", latency_ms=200.0)
child_span = s.track_span(
span_name="vector_search",
latency_ms=50.0,
parent_span_id=parent_span, # ← links span to parent
)
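To make the "graph" idea concrete, here is a minimal sketch that rebuilds parent-to-child links from a list of events. The plain dicts and their `id`, `kind`, and `parent_id` keys are hypothetical stand-ins for this example; they are not the event payloads the SDK emits.

```python
from collections import defaultdict


def build_children_index(events):
    """Map each parent ID to the IDs of the events that link to it."""
    children = defaultdict(list)
    for event in events:
        parent_id = event.get("parent_id")
        if parent_id is not None:
            children[parent_id].append(event["id"])
    return dict(children)


# Hypothetical events mirroring the example above: a tool call linked to a
# user message, and a child span linked to a parent span.
events = [
    {"id": "msg-1", "kind": "user_message", "parent_id": None},
    {"id": "inv-1", "kind": "tool_call", "parent_id": "msg-1"},
    {"id": "span-1", "kind": "span", "parent_id": None},
    {"id": "span-2", "kind": "span", "parent_id": "span-1"},
]
index = build_children_index(events)
```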
Scoring
Attach quality signals to any message or session. Covers user feedback, AI evals, and human reviews. Use source to distinguish origin ("user", "ai", "reviewer").
# User feedback (thumbs up/down on a specific response)
ai.score(user_id="user-1", name="user-feedback", value=1.0,
target_id=ai_msg_id, target_type="message", source="user")
# Automated evaluation (LLM-as-judge)
ai.score(user_id="user-1", name="accuracy", value=0.92,
target_id=ai_msg_id, source="ai", comment="Matches ground truth")
# Human review (internal review queue, RLHF labeling)
ai.score(user_id="reviewer-1", name="groundedness", value=0.8,
target_id=ai_msg_id, source="reviewer", comment="Minor hallucination in step 3")
# Session-level rating
ai.score(user_id="user-1", name="csat", value=4.0,
target_id="sess-1", target_type="session", source="user")
Common scoring patterns:
| Use Case | Example |
|---|---|
| User thumbs up/down | score(name="user-feedback", value=1, target_type="message", source="user") |
| Star rating (1-5) | score(name="user-rating", value=4, target_type="message", source="user") |
| LLM-as-judge eval | score(name="accuracy", value=0.92, target_type="message", source="ai") |
| Human reviewer | score(name="quality", value=0.8, target_type="message", source="reviewer") |
| Session-level CSAT | score(name="csat", value=4, target_type="session", source="user") |
| Server rubric score | Emitted automatically by enrichment pipeline with source="ai" for each configured rubric |
Each score() produces a [Agent] Score event. The server enrichment pipeline also emits [Agent] Score events with source="ai" for each configured rubric. User feedback, AI evals, and server-generated rubric scores all share the same event type, enabling unified queries across all quality signals in a single chart.
All quality signals in one event type. User feedback (source="user"), human reviewer annotations (source="reviewer"), and automated rubric scores from the enrichment pipeline (source="ai") all produce [Agent] Score events. A single chart shows all three side by side. No joins, no separate tables. Filter by [Agent] Evaluation Source to compare signal types. Filter by [Agent] Agent ID for per-agent quality attribution.
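Because every signal shares one event type, comparing sources reduces to a group-by. The sketch below averages score values per (source, name) pair over a list of plain dicts. The dict shape and sample values are assumptions for illustration; in Amplitude you would get the same view by charting [Agent] Score grouped by [Agent] Evaluation Source.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by_source(score_events):
    """Average score values for each (source, name) pair."""
    grouped = defaultdict(list)
    for event in score_events:
        grouped[(event["source"], event["name"])].append(event["value"])
    return {key: mean(values) for key, values in grouped.items()}


# Hypothetical score rows covering the three sources described above.
score_events = [
    {"name": "user-feedback", "value": 1.0, "source": "user"},
    {"name": "user-feedback", "value": 0.0, "source": "user"},
    {"name": "accuracy", "value": 0.92, "source": "ai"},
    {"name": "groundedness", "value": 0.8, "source": "reviewer"},
]
summary = mean_score_by_source(score_events)
```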
Labeling and Tagging Messages
Attach custom key-value labels to any message event for filtering and segmentation in Amplitude. Labels are flexible; use whatever keys make sense for your product.
Common use cases:
- Routing tags: flow, surface, experiment_variant. Segment by where the message originated.
- Classifier output: intent, sentiment, toxicity. Attach ML classifier results with confidence scores.
- Business context: tier, plan, feature_area. Slice by customer attributes.
Inline Labels (at tracking time)
Pass labels when you already know the tags at tracking time:
from amplitude_ai import MessageLabel
# Custom tags -- no confidence needed
msg_id = ai.track_user_message(
user_id="user-1",
content="How do I create a funnel?",
session_id="sess-1",
labels=[
MessageLabel(key="flow", value="onboarding"),
MessageLabel(key="surface", value="chat_widget"),
MessageLabel(key="experiment", value="new_prompt_v2"),
],
)
# Classifier output -- include confidence scores
ai_msg_id = ai.track_ai_message(
user_id="user-1",
content="To create a funnel, go to...",
session_id="sess-1",
model="gpt-4o",
provider="openai",
latency_ms=300.0,
labels=[
MessageLabel(key="intent", value="how_to", confidence=0.94),
MessageLabel(key="sentiment", value="neutral", confidence=0.88),
],
)
Labels are emitted as [Agent] Message Labels on the event. In Amplitude, filter or group by label key/value to build charts like "messages by intent" or "sessions where flow=onboarding".
Retrospective Labels (after the session)
When classifier results arrive after the session ends (e.g., from a background pipeline), attach them via SessionEnrichments.message_labels, keyed by the message_id returned from tracking calls:
from amplitude_ai import SessionEnrichments, MessageLabel
enrichments = SessionEnrichments(
message_labels={
msg_id: [
MessageLabel(key="intent", value="how_to", confidence=0.94),
],
ai_msg_id: [
MessageLabel(key="quality", value="good", confidence=0.91),
],
},
)
ai.track_session_enrichment(user_id="user-1", session_id="sess-1", enrichments=enrichments)
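A background pipeline usually produces a flat list of predictions rather than a dict keyed by message_id. The sketch below shows one way to group such predictions into the shape that message_labels expects, dropping low-confidence results. The `MessageLabel` class here is a local stand-in that mirrors the SDK dataclass fields so the example runs on its own; the prediction dict shape and the 0.8 threshold are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MessageLabel:
    """Local stand-in mirroring the fields of amplitude_ai.MessageLabel."""
    key: str
    value: str
    confidence: Optional[float] = None


def labels_from_predictions(predictions, min_confidence=0.8):
    """Group classifier predictions by message_id, skipping weak ones."""
    labels = {}
    for p in predictions:
        if p["confidence"] < min_confidence:
            continue
        labels.setdefault(p["message_id"], []).append(
            MessageLabel(key=p["key"], value=p["value"], confidence=p["confidence"])
        )
    return labels


# Hypothetical classifier output; message IDs come from your tracking calls.
predictions = [
    {"message_id": "msg-1", "key": "intent", "value": "how_to", "confidence": 0.94},
    {"message_id": "msg-1", "key": "sentiment", "value": "neutral", "confidence": 0.55},
    {"message_id": "msg-2", "key": "quality", "value": "good", "confidence": 0.91},
]
message_labels = labels_from_predictions(predictions)
```

With the real MessageLabel imported from amplitude_ai, the resulting dict can be passed as SessionEnrichments(message_labels=...).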
Enrichments
Session enrichments attach structured classifications to a completed session: topic categories, rubric scores, outcome labels, and behavioral flags. They work differently depending on your privacy configuration:
When content_mode is "full", Amplitude's enrichment pipeline runs automatically on every session after it closes. You get topic classifications, rubric scores, behavioral flags, and session outcomes without writing or maintaining any eval code. The pipeline classifies sessions across configurable dimensions:
| Category | Description | Configurable |
|---|---|---|
| Quality Scores | Task completion, response quality, user satisfaction, agent confusion (0-1 scores with rationales) | Rubrics customizable per org |
| Safety | Toxicity detection, prompt injection detection, content policy violations | Custom policies per org |
| Emotions | User emotion classification with trajectory tracking | Custom emotion taxonomy per org |
| Dialog Acts | Conversation patterns: complaints, requests, apologies, completions | Default taxonomy provided |
| Behavioral Patterns | Anti-patterns: retry storms, clarification loops, early abandonment | Fixed taxonomy |
Three event types are produced per session:
| Event | What It Contains | Cardinality |
|---|---|---|
| [Agent] Session Evaluation | Session summary: outcome, turn count, boolean flags (has_task_failure, has_negative_feedback), metadata | 1 per session |
| [Agent] Topic Classification | Category label per topic model (e.g., query_intent, product_area, error_domain) | 1 per topic model per session |
| [Agent] Score (ai) | Rubric score with rationale (e.g., task_completion: 0.85), source="ai" | 1 per rubric per session |
Configurability: Topic models, rubric definitions, safety policies, and emotion taxonomies are configurable per organization. The categories in the table above are defaults, not fixed. Contact your Amplitude team to customize which dimensions are evaluated and what category values are used.
When do enrichments run? Enrichment runs asynchronously after the session closes, not inline with your SDK calls. A session closes when you call track_session_end(), or after 30 minutes of inactivity if you don't. Enrichment events typically appear within minutes of session close. Calling track_session_end() explicitly is recommended because it ensures timely enrichment and lets you attach SessionEnrichments in the same call.
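The close rule can be modeled in a few lines. This is only a sketch of the behavior as described here (explicit end, or 30 minutes of inactivity), not the server's implementation; the function name and the choice to treat exactly 30 minutes as closed are assumptions.

```python
from datetime import datetime, timedelta

# Inactivity window stated in the text above.
INACTIVITY_TIMEOUT = timedelta(minutes=30)


def is_session_closed(last_event_at, now, ended_explicitly=False):
    """Closed if track_session_end() was called or the session has gone idle."""
    return ended_explicitly or (now - last_event_at) >= INACTIVITY_TIMEOUT


last_event = datetime(2025, 1, 1, 12, 0)
still_open = is_session_closed(last_event, datetime(2025, 1, 1, 12, 10))
timed_out = is_session_closed(last_event, datetime(2025, 1, 1, 12, 45))
ended = is_session_closed(last_event, datetime(2025, 1, 1, 12, 1), ended_explicitly=True)
```

The third case is why an explicit track_session_end() leads to faster enrichment: the session closes immediately instead of waiting out the inactivity window.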
When content_mode is "metadata_only" or "customer_enriched", server-side enrichment is not available (the pipeline needs raw text to classify content). Use customer_enriched with track_session_enrichment() to bridge this gap: run your own classifier in your environment, then send structured labels (topics, rubric scores, outcomes) to Amplitude. No raw content leaves your environment, but you get the same analytics power (cohorts, funnels, retention segmented by session quality) as customers using full mode.
Defining Your Taxonomy
The topic model names, rubric names, and category values are yours to define. The examples below use values from Amplitude's internal taxonomy as a reference, but you should use whatever categories make sense for your product and agents.
from amplitude_ai import (
SessionEnrichments, TopicClassification, RubricScore,
EvidenceQuote, MessageLabel,
)
enrichments = SessionEnrichments(
# Topic models -- categorical labels for your sessions
topic_classifications={
# Single-select: what was the user trying to do?
"query_intent": TopicClassification(l1="quantitative_diagnostic"),
# Multi-select: which product areas were involved?
"product_area": TopicClassification(
values=["charts", "cohorts"], primary="charts",
topics_covered=["charts", "cohorts", "funnels"],
outcomes_by_topic={"charts": "response_provided", "funnels": "abandoned"},
),
# Subcategories for finer classification
"error_domain": TopicClassification(l1="TAX", subcategories=["WRONG_EVENT"]),
},
# Rubrics -- scored evaluation dimensions (0.0 to 1.0)
rubrics=[
RubricScore(name="task_completion", score=0.85),
RubricScore(
name="response_quality", score=0.92,
rationale="Clear and accurate",
evidence=[
EvidenceQuote(quote="Here is how to build a funnel...", turn_index=2, role="assistant"),
],
improvement_opportunities="Could include a screenshot link",
),
],
# Session outcome
overall_outcome="response_provided", # or "abandoned", "escalated", etc.
# Session-level scores
quality_score=0.88,
sentiment_score=0.75,
# Boolean flags for quick filtering
has_task_failure=False,
has_negative_feedback=False,
# Failure detail (when has_task_failure=True)
# task_failure_type="unable_to_complete",
# task_failure_reason="Data source not connected",
# Agent chain metadata (multi-agent flows)
agent_chain=["router", "analytics-agent"],
root_agent_name="router",
# Request classification
request_complexity="moderate",
# Supplementary data
error_categories=["timeout"],
behavioral_patterns=["multi_turn_refinement"],
custom_metadata={"deployment": "canary-v2"},
# Retrospective message labels (keyed by message_id)
message_labels={
"msg-uuid-1": [MessageLabel(key="intent", value="how_to", confidence=0.94)],
"msg-uuid-2": [MessageLabel(key="quality", value="good", confidence=0.91)],
},
)
# Attach when ending a session
ai.track_session_end(user_id="user-1", session_id="sess-1", enrichments=enrichments)
# Or send enrichments at any time from a background pipeline
ai.track_session_enrichment(user_id="user-1", session_id="sess-1", enrichments=enrichments)
# Note: each call creates a separate [Agent] Session Enrichment event (not an overwrite).
# Call multiple times for streaming enrichment -- e.g., topics first, then rubric scores later.
SessionEnrichments Dataclass
The SessionEnrichments dataclass uses the same vocabulary as Amplitude's enrichment taxonomy framework: topic models for categorical classification and rubrics for scored evaluation. This ensures [Agent] Session Enrichment events from the SDK have property naming consistent with the server-side [Agent] Session Evaluation, [Agent] Topic Classification, and [Agent] Score events.
@dataclass
class MessageLabel:
"""A key-value label attached to a message event."""
key: str # e.g., "intent", "flow", "sentiment"
value: str # e.g., "how_to", "onboarding", "neutral"
confidence: float | None = None # Optional 0.0-1.0
@dataclass
class EvidenceQuote:
"""A quoted excerpt from the conversation supporting a rubric score."""
quote: str # The quoted text
turn_index: int # 0-based position in conversation
role: str | None = None # "user", "assistant", "tool"
@dataclass
class TopicClassification:
"""Result of classifying a session for a single topic model."""
l1: str | None = None # Single-select mode (MECE) — e.g., "quantitative_diagnostic"
values: list[str] | None = None # Multi-select mode — e.g., ["charts", "cohorts"]
primary: str | None = None # Primary value in multi-select — e.g., "charts"
l2: str | None = None # Deprecated — use subcategories instead
subcategories: list[str] | None = None # Subcategory codes — e.g., ["WRONG_EVENT"]
topics_covered: list[str] | None = None # All topics discussed
outcomes_by_topic: dict[str, str] | None = None # Outcome per topic
@dataclass
class RubricScore:
"""Result of scoring a session on a single rubric."""
name: str # e.g., "task_completion", "response_quality"
score: float # 0.0-1.0
rationale: str | None = None # Optional explanation
evidence: list[EvidenceQuote] | None = None # Supporting quotes
improvement_opportunities: str | None = None # Suggested improvements
@dataclass
class SessionEnrichments:
# Topic models — categorical classification per topic model
topic_classifications: dict[str, TopicClassification] | None = None
# Rubrics — scored evaluation dimensions
rubrics: list[RubricScore] | None = None
# Outcome
overall_outcome: str | None = None # "response_provided", "abandoned", etc.
# Session-level scores
quality_score: float | None = None # 0.0-1.0
sentiment_score: float | None = None # 0.0-1.0
# Boolean flags
has_task_failure: bool = False
has_negative_feedback: bool = False
has_data_quality_issues: bool = False
has_technical_failure: bool = False
# Failure detail
task_failure_type: str | None = None # e.g., "unable_to_complete"
task_failure_reason: str | None = None # Free-text explanation
# Feedback and error detail
negative_feedback_phrases: list[str] | None = None
data_quality_issues: list[str] | None = None
technical_error_count: int | None = None
# Agent chain metadata
agent_chain: list[str] | None = None # Ordered agent delegation chain
root_agent_name: str | None = None # Entry-point agent
# Request classification
request_complexity: str | None = None # "simple", "moderate", "complex", "ambiguous"
# Supplementary data
error_categories: list[str] | None = None
behavioral_patterns: list[str] | None = None
custom_metadata: dict[str, Any] | None = None # Arbitrary customer-defined metadata
schema_version: str = "2.0"
# Retrospective message labels (keyed by message_id)
message_labels: dict[str, list[MessageLabel]] | None = None
TopicClassification Fields
Topics classify sessions along a dimension you define. Use l1 for single-select (one category per session) or values + primary for multi-select (session touches multiple areas):
| Field | Type | Description |
|---|---|---|
| l1 | str | Single-select category. E.g., "quantitative_diagnostic", "help_guidance", "artifact_creation". |
| values | list[str] | Multi-select categories. E.g., ["charts", "cohorts", "experiments"]. |
| primary | str | Primary value in multi-select. Must be one of values. |
| subcategories | list[str] | Subcategory codes for finer classification. E.g., ["WRONG_EVENT"], ["HALLUCINATION"]. |
| l2 | str | Deprecated. Use subcategories instead. Scalar shorthand kept for backward compatibility. |
| topics_covered | list[str] | All topics discussed in multi-topic sessions. |
| outcomes_by_topic | dict[str, str] | Outcome per topic. E.g., {"charts": "response_provided", "funnels": "abandoned"}. |
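If your classifier builds these objects programmatically, a small pre-flight check can catch mistakes such as a primary value that is not in values. The sketch below uses a trimmed local stand-in for TopicClassification so it runs without the SDK. The `check_topic` helper is hypothetical, and treating "neither l1 nor values set" as a problem is an assumption rather than a documented rule.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TopicClassification:
    """Trimmed local stand-in for amplitude_ai.TopicClassification."""
    l1: Optional[str] = None
    values: Optional[List[str]] = None
    primary: Optional[str] = None


def check_topic(topic):
    """Return a list of human-readable problems; empty means it looks fine."""
    problems = []
    # Documented constraint: primary must be one of values.
    if topic.primary is not None and topic.primary not in (topic.values or []):
        problems.append("primary must be one of values")
    # Assumed constraint: a classification should carry at least one category.
    if topic.l1 is None and not topic.values:
        problems.append("set l1 (single-select) or values (multi-select)")
    return problems


single = check_topic(TopicClassification(l1="how_to"))
multi = check_topic(TopicClassification(values=["charts", "cohorts"], primary="charts"))
mismatch = check_topic(TopicClassification(values=["charts"], primary="funnels"))
```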
RubricScore Fields
Rubrics are scored evaluation dimensions. Define whatever rubrics matter for your use case:
| Field | Type | Required | Description |
|---|---|---|---|
| name | str | Yes | Rubric name, e.g., "task_completion", "helpfulness", "safety", "groundedness". |
| score | float | Yes | 0.0 to 1.0. |
| rationale | str | No | Explanation for the score. Useful for debugging and auditing. |
| evidence | list[EvidenceQuote] | No | Quoted excerpts from the conversation supporting this score. |
| improvement_opportunities | str | No | Suggested improvements based on this evaluation. |
MessageLabel Fields
Labels are flexible key-value pairs for filtering and segmentation:
| Field | Type | Required | Description |
|---|---|---|---|
| key | str | Yes | Label key, e.g., "intent", "flow", "sentiment", "experiment". |
| value | str | Yes | Label value, e.g., "how_to", "onboarding", "neutral". |
| confidence | float | No | Confidence score (0.0 to 1.0) when the label comes from a classifier. |
EvidenceQuote Fields
Quoted excerpts that support rubric scores:
| Field | Type | Required | Description |
|---|---|---|---|
| quote | str | Yes | The quoted text from the conversation. |
| turn_index | int | Yes | 0-based position in the conversation. |
| role | str | No | Role of the speaker ("user", "assistant", "tool"). |
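The documented ranges (score between 0.0 and 1.0, turn_index as a 0-based position) are easy to check before sending. The sketch below uses local stand-ins for RubricScore and EvidenceQuote so it is self-contained. `check_rubric` and its `turn_count` parameter are hypothetical helpers for this example, not part of the SDK.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EvidenceQuote:
    """Local stand-in mirroring amplitude_ai.EvidenceQuote."""
    quote: str
    turn_index: int
    role: Optional[str] = None


@dataclass
class RubricScore:
    """Trimmed local stand-in for amplitude_ai.RubricScore."""
    name: str
    score: float
    rationale: Optional[str] = None
    evidence: Optional[List[EvidenceQuote]] = None


def check_rubric(rubric, turn_count):
    """Flag scores outside 0.0-1.0 and evidence pointing past the conversation."""
    problems = []
    if not 0.0 <= rubric.score <= 1.0:
        problems.append(f"{rubric.name}: score {rubric.score} outside 0.0-1.0")
    for quote in rubric.evidence or []:
        if not 0 <= quote.turn_index < turn_count:
            problems.append(f"{rubric.name}: turn_index {quote.turn_index} out of range")
    return problems


good = RubricScore(
    name="task_completion", score=0.85,
    evidence=[EvidenceQuote("To create a funnel...", turn_index=1, role="assistant")],
)
bad = RubricScore(
    name="response_quality", score=1.4,
    evidence=[EvidenceQuote("...", turn_index=5)],
)
good_problems = check_rubric(good, turn_count=2)
bad_problems = check_rubric(bad, turn_count=2)
```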
Complete customer_enriched Example
End-to-end example for teams running their own classifiers. No conversation content leaves your environment; only structured labels:
from amplitude import Amplitude
from amplitude_ai import (
AIConfig, AmplitudeAI, ContentMode, SessionEnrichments, TopicClassification,
RubricScore, EvidenceQuote, MessageLabel,
)
amplitude = Amplitude("YOUR_API_KEY")
ai = AmplitudeAI(amplitude=amplitude, config=AIConfig(content_mode=ContentMode.CUSTOMER_ENRICHED))
agent = ai.agent(agent_id="support-agent")
with agent.session(user_id="user-1") as s:
msg_id = s.track_user_message(content="How do I create a funnel?")
ai_msg_id = s.track_ai_message(
content="To create a funnel...",
model="gpt-4o", provider="openai", latency_ms=450.0,
)
# After session: run your classifiers, then send structured labels
enrichments = SessionEnrichments(
topic_classifications={
"query_intent": TopicClassification(l1="how_to"),
},
rubrics=[
RubricScore(
name="task_completion", score=0.85,
evidence=[EvidenceQuote(quote="To create a funnel...", turn_index=1, role="assistant")],
),
],
overall_outcome="response_provided",
quality_score=0.85,
request_complexity="simple",
message_labels={
msg_id: [MessageLabel(key="intent", value="how_to", confidence=0.94)],
ai_msg_id: [MessageLabel(key="quality", value="good", confidence=0.91)],
},
)
ai.track_session_enrichment(user_id="user-1", session_id=s.session_id, enrichments=enrichments)
How Scores and Enrichments Relate
Scores (score()) and enrichments (track_session_enrichment()) coexist and serve different purposes:
| Concern | score() | track_session_enrichment() |
|---|---|---|
| Purpose | Rate a specific message or session | Classify a session holistically |
| Granularity | Message-level or session-level | Session-level only |
| Data shape | Single name/value pair per call | Structured batch: topics + rubrics + outcomes + flags |
| Source tracking | Yes (user, ai, reviewer) | No (assumed system/customer) |
| Primary use | User feedback, automated evals, human annotations | content_mode="customer_enriched" flow; background pipelines |
| Categorical data | No (numeric only) | Yes (topic classifications, outcomes, behavioral patterns) |
Use enrichments for comprehensive session classification (topic models + rubrics + outcomes in one batch). Use score() for individual quality signals, especially from end-users or at the message level.
Provider Wrappers
Drop-in replacements that automatically track every LLM call, including reasoning content from thinking models, system prompts, and model configuration (temperature, top_p, max_tokens, streaming mode).
Feature coverage by provider:
| Feature | OpenAI | AsyncOpenAI | Anthropic | Gemini | AzureOpenAI | Bedrock | Mistral |
|---|---|---|---|---|---|---|---|
| Streaming | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Tool call tracking | Yes | Yes | Yes | No | Yes | Yes | No |
| TTFB measurement | Yes | Yes | Yes | No | Yes | No | No |
| Cache token stats | Yes | Yes | Yes | No | No | No | No |
| Reasoning content | Yes | Yes | Yes | No | Yes | No | No |
| System prompt capture | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Cost estimation | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
OpenAI
from amplitude_ai import OpenAI
client = OpenAI(amplitude=amplitude, api_key="sk-...")
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "What is retention?"}],
amplitude_user_id="user-1",
)
# [Agent] User Message + [Agent] AI Response tracked automatically
AsyncOpenAI
For async frameworks (FastAPI, pydantic_ai, etc.) that use openai.AsyncOpenAI:
from amplitude_ai import AsyncOpenAI
client = AsyncOpenAI(amplitude=amplitude, api_key="sk-...")
response = await client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "What is retention?"}],
amplitude_user_id="user-1",
)
Same feature coverage as the synchronous OpenAI wrapper — streaming, tool calls, reasoning, TTFB, cost estimation.
Anthropic
from amplitude_ai import Anthropic
client = Anthropic(amplitude=amplitude, api_key="sk-ant-...")
response = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=1024,
messages=[{"role": "user", "content": "Explain funnels"}],
amplitude_user_id="user-1",
)
Google Gemini
from amplitude_ai import Gemini
client = Gemini(amplitude=amplitude, api_key="...", model_name="gemini-2.0-flash")
response = client.generate_content("What are cohorts?", amplitude_user_id="user-1")
Azure OpenAI
from amplitude_ai import AzureOpenAI
client = AzureOpenAI(
amplitude=amplitude,
azure_endpoint="https://your-resource.openai.azure.com",
api_key="...",
api_version="2024-02-01",
)
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "What is retention?"}],
amplitude_user_id="user-1",
)
AWS Bedrock
from amplitude_ai import Bedrock
client = Bedrock(amplitude=amplitude, region_name="us-east-1")
response = client.converse(
modelId="anthropic.claude-3-sonnet-20240229-v1:0",
messages=[{"role": "user", "content": [{"text": "Explain funnels"}]}],
amplitude_user_id="user-1",
)
Mistral
from amplitude_ai import Mistral
client = Mistral(amplitude=amplitude, api_key="...")
response = client.chat.complete(
model="mistral-large-latest",
messages=[{"role": "user", "content": "What are cohorts?"}],
amplitude_user_id="user-1",
)
LangChain
from amplitude_ai import create_amplitude_callback
callback = create_amplitude_callback(amplitude=amplitude, user_id="user-1")
# Pass as callback to any LangChain chain or agent
LlamaIndex
from amplitude_ai import AmplitudeLlamaIndexHandler
handler = AmplitudeLlamaIndexHandler(amplitude=amplitude, user_id="user-1")
# Set as the global callback handler or pass to individual components
OpenAI Agents SDK
Tracing processor that plugs into the OpenAI Agents SDK's tracing system. Maps GenerationSpanData to [Agent] AI Response, FunctionSpanData to [Agent] Tool Call, and agent/handoff/guardrail spans to [Agent] Span.
pip install "amplitude-ai[openai-agents]"
from agents import Agent, Runner, RunConfig
from amplitude import Amplitude
from amplitude_ai.integrations.openai_agents import AmplitudeTracingProcessor
amplitude = Amplitude("YOUR_API_KEY")
processor = AmplitudeTracingProcessor(
amplitude=amplitude,
user_id="user-1",
agent_id="my-agent",
env="production",
)
agent = Agent(name="support-bot", instructions="You are a helpful assistant.")
result = Runner.run_sync(
agent,
"What is retention?",
run_config=RunConfig(tracing_processors=[processor]),
)
# All generations, tool calls, handoffs, and guardrail checks tracked automatically
Anthropic Tool Use Loop
Managed multi-turn tool_use loop that handles the Anthropic agentic pattern of repeated tool_use -> tool_result cycles. Tracks every turn automatically.
from anthropic import Anthropic
from amplitude import Amplitude
from amplitude_ai.integrations.anthropic_tools import AmplitudeToolLoop
amplitude = Amplitude("YOUR_API_KEY")
client = Anthropic()
loop = AmplitudeToolLoop(
amplitude=amplitude,
client=client,
user_id="user-1",
tool_handlers={
"get_weather": lambda city: f"72°F in {city}",
"search": lambda query: f"Results for {query}",
},
)
result = loop.run(
model="claude-sonnet-4-20250514",
messages=[{"role": "user", "content": "What's the weather in SF?"}],
tools=[{"name": "get_weather", "description": "Get weather", "input_schema": {...}}],
)
# Each turn emits [Agent] AI Response + [Agent] Tool Call events
# Loop stops when model returns stop_reason != "tool_use"
Supports async via await loop.arun(...) with anthropic.AsyncAnthropic.
CrewAI
Event listener hooks that capture CrewAI's LLM calls and tool usage across all agents in a crew.
pip install "amplitude-ai[crewai]"
from crewai import Crew, Agent, Task
from amplitude import Amplitude
from amplitude_ai.integrations.crewai import AmplitudeCrewAIHooks
amplitude = Amplitude("YOUR_API_KEY")
with AmplitudeCrewAIHooks(amplitude=amplitude, user_id="user-1") as hooks:
researcher = Agent(role="Researcher", goal="Find information", ...)
writer = Agent(role="Writer", goal="Write content", ...)
crew = Crew(agents=[researcher, writer], tasks=[...])
result = crew.kickoff()
# All LLM calls and tool invocations across all agents tracked automatically
# Agent roles are captured as agent_id when no explicit agent_id is set
Reasoning Extraction
Provider wrappers auto-extract reasoning content from each provider's native response format:
| Provider | Extraction Method |
|---|---|
| OpenAI (o1, o3, etc.) | response.choices[0].message.reasoning_content |
| Anthropic (extended thinking) | Filter response.content for blocks with type == "thinking", concatenate text |
| Google Gemini | Extract thinking parts from response |
| Mistral | choice.message.reasoning_content or typed content blocks |
| AWS Bedrock | Reasoning blocks from Bedrock Converse API response |
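To illustrate the Anthropic row, the sketch below filters content blocks by type and concatenates their text. It is not the wrapper's actual code: plain dicts stand in for SDK response blocks, the `thinking` field name follows the shape of Anthropic's Messages API thinking blocks as an assumption, and joining with newlines is a choice made for this example.

```python
def extract_thinking(content_blocks):
    """Concatenate the text of all blocks whose type is "thinking"."""
    parts = []
    for block in content_blocks:
        if block.get("type") == "thinking":
            parts.append(block.get("thinking", ""))
    return "\n".join(parts)


# Hypothetical response content: two thinking blocks around a text block.
blocks = [
    {"type": "thinking", "thinking": "User wants a definition."},
    {"type": "text", "text": "A funnel measures conversion."},
    {"type": "thinking", "thinking": "Keep it short."},
]
reasoning = extract_thinking(blocks)
```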
OTEL GenAI Bridge Reference
The AmplitudeAgentExporter consumes any OTEL GenAI semantic convention spans and maps them to Amplitude [Agent] events. Works with any tool that emits standard OTEL GenAI spans:
- OpenLIT
- Traceloop / OpenLLMetry
- OpenAI Python SDK (with OTEL instrumentation enabled)
- Any manual OpenTelemetry instrumentation following the GenAI semantic conventions
Note on Langfuse: Langfuse v3+ uses OTEL internally for transport and can receive OTEL traces as a backend. However, Langfuse's own SDK integrations (100+) use proprietary APIs, not standard OTEL GenAI spans. If you use an OTEL-native instrumentation library (OpenLIT, Traceloop) alongside Langfuse, the same GenAI spans can flow to both Langfuse and Amplitude simultaneously.
Attribute mapping: OTEL GenAI semantic conventions to Amplitude properties:
| OTEL GenAI Attribute | Amplitude Property | Notes |
|---|---|---|
| gen_ai.operation.name | (event routing) | chat/text_completion/generate_content -> User Message + AI Response; embeddings -> Embedding; execute_tool -> Tool Call; invoke_agent/create_agent -> AI Response with agent metadata |
| gen_ai.response.model | [Agent] Model Name | Preferred; often contains the versioned name (e.g., gpt-4o-2024-11-20) |
| gen_ai.request.model | [Agent] Model Name | Fallback when response model is absent |
| gen_ai.provider.name | [Agent] Provider | |
| gen_ai.usage.input_tokens | [Agent] Input Tokens | |
| gen_ai.usage.output_tokens | [Agent] Output Tokens | |
| (computed) input + output | [Agent] Total Tokens | Auto-summed when both present |
| gen_ai.response.finish_reasons | [Agent] Finish Reason | First element of array |
| gen_ai.input.messages | User message content | Last role=user message; respects privacy_config |
| gen_ai.output.messages | AI response content | First output message; respects privacy_config |
| gen_ai.system_instructions | [Agent] System Prompt | Respects privacy_config |
| gen_ai.request.temperature | [Agent] Temperature | |
| gen_ai.request.max_tokens | [Agent] Max Output Tokens | |
| gen_ai.request.top_p | [Agent] Top P | |
| gen_ai.conversation.id | [Agent] Session ID | Fallback when no ContextVar session active |
| gen_ai.agent.id | [Agent] Agent ID | invoke_agent/create_agent operations |
| gen_ai.agent.name | [Agent] Agent Name | invoke_agent/create_agent operations |
| gen_ai.tool.name | Tool name | execute_tool operations; falls back to span name parsing |
| gen_ai.embeddings.dimension.count | [Agent] Embedding Dimensions | embeddings operations only |
| error.type | [Agent] Is Error + [Agent] Error Message | |
| Span trace_id (hex) | [Agent] Trace ID | Fallback when no ContextVar trace_id active |
| Span duration (ns) | [Agent] Latency Ms | Computed from span timestamps |
| (computed) model + tokens | [Agent] Cost USD | Auto-calculated via built-in genai-prices database |
| enduser.id | user_id | Fallback when no SessionContext or default user_id |
Not mapped by the bridge (use native SDK provider wrappers for these): cache tokens, reasoning content/tokens, TTFB, streaming detection, event graph linking (parent_message_id).
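A few of the mapping rules (response model preferred over request model, total tokens summed only when both counts exist, first finish reason wins) can be sketched as a plain function over a span's attribute dict. This is an illustration of the documented mapping, not the exporter's code; `map_genai_attributes` is a hypothetical name and only a subset of attributes is handled.

```python
def map_genai_attributes(attrs):
    """Translate a subset of OTEL GenAI span attributes to Amplitude properties."""
    props = {}

    # Prefer the response model; fall back to the request model.
    model = attrs.get("gen_ai.response.model") or attrs.get("gen_ai.request.model")
    if model is not None:
        props["[Agent] Model Name"] = model

    input_tokens = attrs.get("gen_ai.usage.input_tokens")
    output_tokens = attrs.get("gen_ai.usage.output_tokens")
    if input_tokens is not None:
        props["[Agent] Input Tokens"] = input_tokens
    if output_tokens is not None:
        props["[Agent] Output Tokens"] = output_tokens
    # Total is summed only when both counts are present.
    if input_tokens is not None and output_tokens is not None:
        props["[Agent] Total Tokens"] = input_tokens + output_tokens

    # finish_reasons is an array; the first element is used.
    finish_reasons = attrs.get("gen_ai.response.finish_reasons")
    if finish_reasons:
        props["[Agent] Finish Reason"] = finish_reasons[0]
    return props


span_attrs = {
    "gen_ai.request.model": "gpt-4o",
    "gen_ai.response.model": "gpt-4o-2024-11-20",
    "gen_ai.usage.input_tokens": 120,
    "gen_ai.usage.output_tokens": 340,
    "gen_ai.response.finish_reasons": ["stop"],
}
mapped = map_genai_attributes(span_attrs)
```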
Scope filtering: control which spans reach Amplitude:
# Only process spans from specific instrumentation scopes
exporter = AmplitudeAgentExporter(
amplitude=amplitude,
user_id="user-123",
allowed_scopes={"langfuse-sdk", "openai"},
)
# Or block infrastructure scopes
exporter = AmplitudeAgentExporter(
amplitude=amplitude,
user_id="user-123",
blocked_scopes={"fastapi", "sqlalchemy", "psycopg"},
)
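Conceptually, scope filtering is an allow-list or block-list check on each span's instrumentation scope name. The sketch below models that decision in isolation. How the exporter behaves when both sets are supplied is not stated here, so giving the allow-list precedence is an assumption, and `should_export` is a hypothetical helper rather than an SDK function.

```python
def should_export(scope_name, allowed_scopes=None, blocked_scopes=None):
    """Decide whether a span from the given instrumentation scope is processed."""
    if allowed_scopes is not None:
        # Allow-list mode: only listed scopes pass (assumed to take precedence).
        return scope_name in allowed_scopes
    if blocked_scopes is not None:
        # Block-list mode: everything passes except listed scopes.
        return scope_name not in blocked_scopes
    return True


allow = {"langfuse-sdk", "openai"}
block = {"fastapi", "sqlalchemy", "psycopg"}
decisions = {
    "openai_allowed": should_export("openai", allowed_scopes=allow),
    "fastapi_not_allowed": should_export("fastapi", allowed_scopes=allow),
    "fastapi_blocked": should_export("fastapi", blocked_scopes=block),
    "openai_not_blocked": should_export("openai", blocked_scopes=block),
}
```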
OTEL bridge vs. native wrappers, field-level comparison:
| Capability | OTEL Bridge | Native Wrapper |
|---|---|---|
| Model, provider, tokens, latency, errors | Yes | Yes |
| Cost calculation | Yes (basic) | Yes (cache-aware, 2-5x more accurate for prompt-cached workloads) |
| System prompt, temperature, top_p | Yes | Yes |
| Message content (when opted-in) | Yes | Yes |
| Cache read / creation tokens | No (not in OTEL spec) | Yes |
| Reasoning content and tokens | No (not in OTEL spec) | Yes |
| Time to first byte (TTFB) | No | Yes |
| Streaming detection | No | Yes |
| Event graph linking (parent_message_id) | No | Yes |
| Session / agent context | With agent.session() | With agent.session() |
API Reference
Message, tool, and span tracking methods return a UUID string (message_id, invocation_id, or span_id) for event linking. Session lifecycle methods and score() return None.
| Category | Methods |
|---|---|
| Messages | track_user_message(user_id, content, session_id, labels=..., ...) -> str |
| | track_ai_message(user_id, content, session_id, model, provider, latency_ms, labels=..., ...) -> str |
| Operations | track_tool_call(user_id, tool_name, latency_ms, success, ...) -> str |
| | track_embedding(user_id, model, provider, latency_ms, ...) -> str |
| | track_span(user_id, span_name, trace_id, latency_ms, ...) -> str |
| Sessions | track_session_end(user_id, session_id, ...) |
| | track_session_enrichment(user_id, session_id, enrichments) |
| Scoring | score(user_id, name, value, target_id, ...) |
| Types | MessageLabel(key, value, confidence=...) (inline message labels) |
| | EvidenceQuote(quote, turn_index, role=...) (rubric evidence) |
| | SessionEnrichments(...) (structured session classifications) |
| Agents | agent(agent_id, ...) -> BoundAgent: pre-configured handle with inherited context |
| | agent.child(agent_id, ...) -> BoundAgent: inherits parent context, sets parent_agent_id |
| | agent.session(session_id=None) -> Session: context manager, auto-closes on exit |
| Utilities | status() -> dict: config, available providers, patched providers |
| | tenant(customer_org_id, ...) -> TenantHandle: multi-tenant factory for BoundAgent |
| Lifecycle | flush(), shutdown() |
Usage Examples
Messages
msg_id = ai.track_user_message(
user_id="user-1",
content="How do I set up a funnel?",
session_id="sess-1",
trace_id="trace-1",
turn_id=1,
agent_id="support-agent",
env="production",
)
ai_msg_id = ai.track_ai_message(
user_id="user-1",
content="To create a funnel chart...",
session_id="sess-1",
trace_id="trace-1",
model="gpt-4o",
provider="openai",
latency_ms=450.0,
input_tokens=120,
output_tokens=340,
total_tokens=460,
total_cost_usd=0.0023,
turn_id=2,
agent_id="support-agent",
env="production",
ttfb_ms=85.0,
)
Cache-Aware Cost Calculation
LLM providers cache repeated token prefixes (system prompts, tool definitions) at reduced rates. Pass cache breakdowns for accurate cost tracking. When total_cost_usd is omitted, the SDK auto-calculates with cache-aware pricing:
ai_msg_id = ai.track_ai_message(
user_id="user-1",
content="Here's how to configure...",
session_id="sess-1",
model="claude-sonnet-4-20250514",
provider="anthropic",
latency_ms=800.0,
input_tokens=5000,
output_tokens=200,
cache_read_tokens=4500, # ~10% cost (Anthropic), ~50% (OpenAI)
cache_creation_tokens=500, # ~125% cost (Anthropic)
)
Note: pricing data freshness. Cost calculation relies on pricing data bundled in the installed genai-prices package. Newly released models may return $0 until the package is updated. To get the latest pricing between package releases, opt in to live updates at startup:
from amplitude_ai import enable_live_price_updates
enable_live_price_updates()  # fetches latest prices from the genai-prices GitHub repo hourly
This makes periodic HTTPS requests to raw.githubusercontent.com (~26 KB each). Only enable in environments where outbound network access is permitted.
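To make the cache arithmetic concrete, the sketch below estimates the cost of the call above by hand. It is not the SDK's pricing code: estimate_cost_usd is a hypothetical helper, the per-million-token prices are placeholder values, the multipliers are the approximate Anthropic figures from the comments above, and it assumes input_tokens already includes the cached tokens (the 5000 = 4500 + 500 split in the example suggests this but does not confirm it).

```python
def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    cache_read_tokens: int = 0,
    cache_creation_tokens: int = 0,
    input_price_per_mtok: float = 3.00,    # placeholder price, not real pricing data
    output_price_per_mtok: float = 15.00,  # placeholder price, not real pricing data
    cache_read_multiplier: float = 0.10,   # ~10% of the input rate
    cache_write_multiplier: float = 1.25,  # ~125% of the input rate
) -> float:
    """Hypothetical helper: cache-aware cost under the assumptions stated above."""
    # Assumption: input_tokens already counts cached tokens, so subtract them
    # to get the tokens billed at the plain input rate.
    uncached = max(input_tokens - cache_read_tokens - cache_creation_tokens, 0)
    input_rate = input_price_per_mtok / 1_000_000
    output_rate = output_price_per_mtok / 1_000_000
    return (
        uncached * input_rate
        + cache_read_tokens * input_rate * cache_read_multiplier
        + cache_creation_tokens * input_rate * cache_write_multiplier
        + output_tokens * output_rate
    )


with_cache = estimate_cost_usd(5000, 200, cache_read_tokens=4500, cache_creation_tokens=500)
without_cache = estimate_cost_usd(5000, 200)
print(f"cache-aware: ${with_cache:.6f}, ignoring cache: ${without_cache:.6f}")
```

Under these placeholder prices, ignoring the cache breakdown would overstate the cost of this call by almost 3x, which is why passing cache_read_tokens and cache_creation_tokens matters for prompt-heavy agents.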
Implicit Feedback
Track behavioral signals that indicate whether a response met the user's need, without requiring explicit ratings:
# User asks a question
msg1 = ai.track_user_message(
user_id="u1", content="How do I create a funnel?", session_id="s1",
)
# AI responds -- user copies the answer (positive signal)
ai_msg = ai.track_ai_message(
user_id="u1", content="To create a funnel, go to...",
session_id="s1", model="gpt-4o", provider="openai", latency_ms=300.0,
was_copied=True,
)
# User regenerates (negative signal -- first response wasn't good enough)
msg2 = ai.track_user_message(
user_id="u1", content="How do I create a funnel?",
session_id="s1", is_regeneration=True,
)
# User edits their question (refining intent)
msg3 = ai.track_user_message(
user_id="u1", content="How do I create a conversion funnel for signups?",
session_id="s1", is_edit=True, edited_message_id=msg1,
)
# Session where user abandoned after the first exchange
ai.track_session_end(user_id="u1", session_id="s1", abandonment_turn=1)
File Attachments
Track rich media (images, PDFs, audio, video) without sending file content through the SDK. Include a url pointing to the resource on your own infrastructure (CDN, S3, internal docs system) and the LLM session viewer renders it on-the-fly in the reviewer's browser. Amplitude never stores or proxies the file; the browser fetches directly from your URL using your existing network access and auth.
If the URL is live when someone reviews the session, they see the full resource inline. If it has expired or is unreachable, the viewer falls back to the filename and type.
# Image -- session viewer renders it inline from your CDN
s.track_user_message(
content="What's wrong with this error?",
attachments=[{
"type": "image",
"name": "error_screenshot.png",
"url": "https://cdn.example.com/uploads/error_screenshot.png",
"mime_type": "image/png",
}],
)
# PDF -- session viewer opens it in an embedded viewer from your docs system
s.track_user_message(
content="Summarize the key risks in this contract",
attachments=[{
"type": "pdf",
"name": "vendor_agreement_v3.pdf",
"url": "https://docs.internal.example.com/contracts/vendor_agreement_v3.pdf",
"mime_type": "application/pdf",
"page_count": 23,
"department": "legal",
}],
)
# Multiple attachments, mixed types
s.track_user_message(
content="Compare these datasets and explain the chart",
attachments=[
{"type": "csv", "name": "sales_2025.csv", "url": "https://s3.example.com/data/sales_2025.csv"},
{"type": "csv", "name": "sales_2024.csv", "url": "https://s3.example.com/data/sales_2024.csv"},
{"type": "image", "name": "revenue_chart.png", "url": "https://cdn.example.com/charts/revenue.png"},
],
)
# AI-generated attachment (works on track_ai_message too)
s.track_ai_message(
content="Here's the visualization you requested",
model="gpt-4o", provider="openai", latency_ms=3200,
attachments=[{
"type": "image",
"name": "forecast_chart.png",
"url": "https://cdn.example.com/generated/forecast_chart.png",
"mime_type": "image/png",
}],
)
The attachment dict is free-form. Add any extra keys you need (size_bytes, page_count, duration_seconds, department, internal_doc_id). The SDK extracts type for aggregate analytics; everything else is serialized as-is into the event and available to viewers and downstream consumers.
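As a rough illustration of how free-form attachment dicts could map onto the aggregate attachment properties listed in the Event Property Reference, here is a minimal sketch. summarize_attachments is a hypothetical helper, not an SDK function, and the exact fields the SDK derives may differ.

```python
import json
from typing import Any


def summarize_attachments(attachments: list[dict[str, Any]]) -> dict[str, Any]:
    """Hypothetical helper: derive aggregate fields from free-form attachment dicts."""
    types = sorted({a["type"] for a in attachments if "type" in a})
    sizes = [a["size_bytes"] for a in attachments if isinstance(a.get("size_bytes"), int)]
    return {
        "has_attachments": bool(attachments),
        "attachment_types": types,
        "attachment_count": len(attachments),
        # Only summed when at least one dict carries size_bytes.
        "total_attachment_size_bytes": sum(sizes) if sizes else None,
        # Extra keys (page_count, department, ...) pass through untouched.
        "attachments": json.dumps(attachments),
    }


summary = summarize_attachments([
    {"type": "csv", "name": "sales_2025.csv", "size_bytes": 2048},
    {"type": "csv", "name": "sales_2024.csv", "size_bytes": 1024},
    {"type": "image", "name": "revenue_chart.png", "department": "finance"},
])
print(summary["attachment_types"], summary["attachment_count"])
```

Only type feeds the aggregate fields here; custom keys such as department survive in the serialized payload for viewers and downstream consumers.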
Multi-Agent & Multi-Tenant
With BoundAgent (recommended):
orchestrator = ai.agent("orchestrator", agent_version="v4.2", env="prod", customer_org_id="acme-123")
billing = orchestrator.child("billing-agent") # inherits agent_version="v4.2"
with billing.session(user_id="u1") as s:
    s.new_trace()
    s.track_user_message(content="Check my billing status")
    s.track_ai_message(content="Your balance is...", model="gpt-4o", provider="openai", latency_ms=200)
With ai.tenant() (multi-tenant shorthand):
For platforms serving multiple customers, ai.tenant() pre-fills customer_org_id and groups on every agent:
tenant = ai.tenant("acme-corp", groups={"company": "acme-corp"}, env="production")
support_bot = tenant.agent("support-bot", user_id="u1")
billing_bot = tenant.agent("billing-bot", user_id="u1")
# Both agents inherit customer_org_id="acme-corp" and groups automatically
Explicit kwargs on tenant.agent() override the defaults.
Without BoundAgent (manual, same result):
msg_id = ai.track_user_message(
user_id="user-1",
content="Check my billing status",
session_id="sess-1",
trace_id="trace-1",
agent_id="billing-agent",
parent_agent_id="orchestrator",
customer_org_id="cust-acme-123",
env="production",
)
A/B Testing with Context
The context dict lets you attach experiment variants, feature flags, prompt revisions, and any other segmentation dimension to every event:
# Variant assigned at session start (e.g., from your experiment framework)
variant = get_experiment_variant(user_id, "prompt-rewrite-v2")
agent = ai.agent(
"support-bot",
user_id="u1",
env="production",
context={
"experiment_variant": variant, # "control" or "treatment"
"feature_flags": {"rag_v2": True},
"prompt_revision": "abc123",
},
)
# All events in this session carry the context -- segment quality metrics
# by experiment_variant in Amplitude charts and cohorts
with agent.session() as s:
    s.track_user_message(content="How do I set up billing?")
    s.track_ai_message(content="...", model="gpt-4o", provider="openai", latency_ms=300)
Child agents merge context (child keys override parent keys):
orchestrator = ai.agent("orchestrator",
context={"experiment_variant": "treatment"})
researcher = orchestrator.child("researcher",
context={"sub_experiment": "rag-rerank"})
# researcher.context == {"experiment_variant": "treatment", "sub_experiment": "rag-rerank"}
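The merge rule can be pictured as a plain dict merge in which the child's keys win on conflict. The sketch below is only an illustration: merge_context is a hypothetical helper, and it assumes a shallow merge (a nested dict such as feature_flags would be replaced as a whole), which the text above does not specify.

```python
def merge_context(parent: dict, child: dict) -> dict:
    """Hypothetical helper: shallow merge where child keys win on conflict."""
    return {**parent, **child}


parent_ctx = {"experiment_variant": "treatment", "prompt_revision": "abc123"}
child_ctx = {"sub_experiment": "rag-rerank", "prompt_revision": "def456"}

merged = merge_context(parent_ctx, child_ctx)
print(merged)
```

The parent dict is left unchanged, so sibling agents created from the same parent do not see each other's overrides.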
Tool Calls
inv_id = ai.track_tool_call(
user_id="user-1",
tool_name="search_docs",
latency_ms=85.0,
success=True,
session_id="sess-1",
trace_id="trace-1",
turn_id=3,
input={"query": "funnel setup"},
output="Found 3 matching docs...",
parent_message_id=ai_msg_id, # links this tool call to the AI response
agent_id="support-agent",
env="production",
)
Embeddings
span_id = ai.track_embedding(
user_id="user-1",
model="text-embedding-3-small",
provider="openai",
latency_ms=25.0,
input_tokens=45,
dimensions=1536,
total_cost_usd=0.00001,
session_id="sess-1",
)
Generic Spans (Custom Events in Agent Analytics)
Track any pipeline operation (vector search, rerank, guardrails, retrieval, etc.):
span_id = ai.track_span(
user_id="user-1",
span_name="vector_search",
trace_id="trace-1",
latency_ms=120.0,
input_state={"query": "funnel setup", "top_k": 10},
output_state={"results_count": 3},
session_id="sess-1",
)
@tool Decorator
Automatically track function calls as [Agent] Tool Call events:
from amplitude_ai import tool
@tool(amplitude=amplitude)
def search_knowledge_base(query: str) -> str:
    """Search the knowledge base for relevant articles."""
    return "Found 3 results..."
# Every call tracked with latency, input, output, and success status
result = search_knowledge_base(query="retention", amplitude_user_id="user-1")
Works with async functions too. The decorator detects coroutines automatically:
import httpx
@tool(amplitude=amplitude)
async def fetch_user_profile(user_id: str) -> dict:
    """Fetch user profile from the API."""
    # base_url is a placeholder; httpx cannot resolve a relative URL without one
    async with httpx.AsyncClient(base_url="https://api.example.com") as client:
        resp = await client.get(f"/users/{user_id}")
        return resp.json()
# Tracked identically to sync: latency, input, output, success
profile = await fetch_user_profile(user_id="u-123", amplitude_user_id="user-1")
@observe Decorator
Track any function as an [Agent] Span event with automatic latency measurement, error capture, and session lifecycle management:
from amplitude_ai import observe
@observe
def summarize_document(text: str) -> str:
    """Summarize a document using an LLM pipeline."""
    chunks = chunk_text(text)
    summaries = [call_llm(chunk) for chunk in chunks]
    return combine_summaries(summaries)
result = summarize_document(long_text)
# Tracked: span_name="summarize_document", latency, input/output state
Sessions are handled automatically: @observe joins an active session if one exists, or creates and closes its own. Nested calls share the outer session. Use @observe(name="custom-span-name") to override the function name. Async functions are detected automatically. In metadata_only mode, only function name, latency, and error status are captured.
@observe
def pipeline(query):
    step1(query)  # @observe: attaches to pipeline's session, not a new one
    step2(query)  # @observe: same session, same trace
Common Recipes
Custom events in Agent Analytics. track_span() is the catch-all for any operation not covered by track_user_message, track_ai_message, track_tool_call, or track_embedding. It emits an [Agent] Span event with full session context (session ID, agent ID, trace ID, SDK version) so custom events appear in Agent Analytics alongside auto-tracked events:
# Track a custom business event that shows up in Agent Analytics
span_id = s.track_span(
span_name="subscription_check",
latency_ms=45.0,
output_state="active",
event_properties={"plan": "enterprise", "seats": 50},
)
Tracking agent actions and side effects. When your agent takes real-world actions (issuing refunds, sending emails, creating tickets), use track_span(). The span_name is the action type, output_state carries the result, and is_error captures failures:
# Agent issues a refund via Stripe
span_id = s.track_span(
span_name="issue_refund",
latency_ms=340.0,
input_state={"order_id": "ord-789", "amount": 49.99},
output_state={"transaction_id": "txn_abc", "success": True},
)
# Agent sends an email via SendGrid
span_id = s.track_span(
span_name="send_email",
latency_ms=120.0,
input_state={"template": "refund_confirmation", "recipient": "user@example.com"},
output_state={"message_id": "sg-456", "success": True},
)
# Failed action
span_id = s.track_span(
span_name="create_ticket",
latency_ms=2100.0,
is_error=True,
error_message="Zendesk API rate limited",
input_state={"subject": "Refund follow-up"},
)
Filter by [Agent] Span Name in Amplitude to build dashboards for action success rates, latency by target system, and error attribution.
Tracking guardrails and safety checks. Content filters, injection detection, and policy checks are spans too:
span_id = s.track_span(
span_name="content_filter",
latency_ms=15.0,
input_state={"check": "prompt_injection"},
output_state={"blocked": True, "reason": "injection_detected"},
is_error=True,
error_message="Prompt injection detected -- blocked",
)
Tracking RAG pipelines. Use nested spans to capture the full retrieval pipeline (embed, search, rerank) as a single traceable unit:
rag_span = s.track_span(span_name="rag_pipeline", latency_ms=280.0)
embed_id = s.track_embedding(
model="text-embedding-3-small",
provider="openai",
input_tokens=8,
latency_ms=45.0,
)
search_span = s.track_span(
span_name="vector_search",
latency_ms=90.0,
parent_span_id=rag_span,
input_state={"query": "billing setup", "top_k": 10},
output_state={"results_count": 5, "best_score": 0.94},
)
rerank_span = s.track_span(
span_name="rerank",
latency_ms=60.0,
parent_span_id=rag_span,
input_state={"candidates": 5},
output_state={"kept": 3},
)
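Once spans carry parent_span_id, the pipeline hierarchy can be rebuilt from exported events. The sketch below works on hand-written dicts rather than real SDK output: the span names and latencies come from the example above, while the ids, the dict layout, and render_tree are placeholders for illustration.

```python
from collections import defaultdict

# Hypothetical exported span events; ids are placeholders.
spans = [
    {"span_id": "s1", "span_name": "rag_pipeline", "parent_span_id": None, "latency_ms": 280.0},
    {"span_id": "s2", "span_name": "vector_search", "parent_span_id": "s1", "latency_ms": 90.0},
    {"span_id": "s3", "span_name": "rerank", "parent_span_id": "s1", "latency_ms": 60.0},
]


def render_tree(spans):
    """Group spans by parent id, then walk depth-first from the roots."""
    children = defaultdict(list)
    for span in spans:
        children[span["parent_span_id"]].append(span)

    lines = []

    def walk(parent_id, depth):
        for span in children.get(parent_id, []):
            lines.append(f"{'  ' * depth}{span['span_name']} ({span['latency_ms']:.0f} ms)")
            walk(span["span_id"], depth + 1)

    walk(None, 0)
    return lines


tree = render_tree(spans)
print("\n".join(tree))
```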
Connecting AI events to business outcomes. Your existing product events already track business outcomes. Because AI events share the same user_id, you can build cross-product funnels directly in Amplitude, with no dedicated "goal" event needed:
[Agent] User Message -> [Agent] AI Response -> Purchase Completed
Build a cohort of users whose AI sessions scored above 0.8 on task_completion and compare their conversion rate to everyone else. The funnel builder connects AI sessions to any downstream product event.
Event Schema
The SDK produces 8 event types, all prefixed with [Agent].
SDK Events
| Event | Method | Description |
|---|---|---|
| [Agent] User Message | track_user_message() | Session/trace/turn IDs, message content, agent IDs, implicit feedback (regeneration, edit), file attachments |
| [Agent] AI Response | track_ai_message() | Model, provider, latency, tokens (including reasoning), cost, finish reason, reasoning content, system prompt, model config, copy signal |
| [Agent] Tool Call | track_tool_call() | Tool name, input/output, latency, success status |
| [Agent] Embedding | track_embedding() | Model, provider, latency, tokens, vector dimensions, cost |
| [Agent] Span | track_span() | Generic operation tracking (name, input/output state, latency, parent span hierarchy) |
| [Agent] Session End | track_session_end() | Explicit session close with optional enrichments, abandonment tracking |
| [Agent] Session Enrichment | track_session_enrichment() | Customer-provided session classifications. Distinct from server-side [Agent] Session Evaluation. |
| [Agent] Score | score() | Quality signal attached to a message or session (user feedback, automated evals, human annotations) |
Server-Side Events (automatic)
| Event | Description |
|---|---|
| [Agent] Session Evaluation | Session-level summary: outcome, turn count, flags (has_task_failure, has_negative_feedback), metadata |
| [Agent] Topic Classification | One event per configured topic model per session: model_name, primary, subcategories, values |
| [Agent] Score (reused) | One event per configured rubric per session, with [Agent] Evaluation Source = "ai" |
How Behavioral Signals Become Analytics
The SDK captures behavioral facts at the application layer, and when content_mode="full", server-side enrichment detects patterns across the full session. Both converge into the same charts, cohorts, and funnels:
| SDK call | What appears in Amplitude | What you can build |
|---|---|---|
| track_user_message(is_regeneration=True) | [Agent] User Message with is_regeneration=True + [Agent] Session Evaluation with behavioral_patterns=["retry_storm"] | Cohort of frustrated users -> target with Guide -> measure churn delta |
| track_ai_message(was_copied=True) | [Agent] AI Response with was_copied=True | Copy rate as positive quality signal, no explicit rating required |
| score(source="user", value=1.0) | [Agent] Score with source="user" | Single chart: user feedback, LLM-as-judge, and human annotations side by side |
All content properties respect the configured content_mode. See Privacy & Content Control for tier details.
Event Property Reference
All event properties are prefixed with [Agent] (except [Amplitude] Session Replay ID). This reference is auto-generated and matches what gets registered in Amplitude's data catalog via the amplitude-ai-register-catalog CLI.
Common Properties (present on all SDK events)
| Property | Type | Required | Description |
|---|---|---|---|
| [Agent] Session ID | string | Yes | Unique session identifier. All events in one conversation share the same session ID. |
| [Agent] Trace ID | string | No | Identifies one user-message-to-AI-response cycle within a session. |
| [Agent] Turn ID | number | No | Monotonically increasing counter for event ordering within a session. |
| [Agent] Agent ID | string | No | Identifies which AI agent handled the interaction (e.g., 'support-bot', 'houston'). |
| [Agent] Parent Agent ID | string | No | For multi-agent orchestration: the agent that delegated to this agent. |
| [Agent] Customer Org ID | string | No | Organization ID for multi-tenant platforms. Enables account-level group analytics. |
| [Agent] Agent Version | string | No | Agent code version (e.g., 'v4.2'). Enables version-over-version quality comparison. |
| [Agent] Agent Description | string | No | Human-readable description of the agent's purpose (e.g., 'Handles user chat requests via OpenAI GPT-4o'). Enables observability-driven agent registry from event streams. |
| [Agent] Context | string | No | Serialized JSON dict of arbitrary segmentation dimensions (experiment_variant, surface, feature_flag, prompt_revision, etc.). |
| [Agent] Env | string | No | Deployment environment: 'production', 'staging', or 'dev'. |
| [Agent] SDK Version | string | Yes | Version of the amplitude-ai SDK that produced this event. |
| [Agent] Runtime | string | Yes | SDK runtime: 'python' or 'node'. |
User Message Properties
Event-specific properties for [Agent] User Message (in addition to common properties above).
| Property | Type | Required | Description |
|---|---|---|---|
| [Agent] Message ID | string | Yes | Unique identifier for this message event (UUID). Used to link scores and tool calls back to specific messages. |
| [Agent] Component Type | string | Yes | Type of component that produced this event: 'user_input', 'llm', 'tool', 'embedding'. |
| [Agent] Locale | string | No | User locale (e.g., 'en-US'). |
| [Amplitude] Session Replay ID | string | No | Links to Amplitude Session Replay (format: device_id/session_id). Enables one-click navigation from AI session to browser replay. |
| [Agent] Is Regeneration | boolean | No | Whether the user requested the AI regenerate a previous response. |
| [Agent] Is Edit | boolean | No | Whether the user edited a previous message and resubmitted. |
| [Agent] Edited Message ID | string | No | The message_id of the original message that was edited (links the edit to the original). |
| [Agent] Has Attachments | boolean | No | Whether this message includes file attachments (uploads, images, etc.). |
| [Agent] Attachment Types | string[] | No | Distinct attachment types (e.g., 'pdf', 'image', 'csv'). Serialized JSON array. |
| [Agent] Attachment Count | number | No | Number of file attachments included with this message. |
| [Agent] Total Attachment Size Bytes | number | No | Total size of all attachments in bytes. |
| [Agent] Attachments | string | No | Serialized JSON array of attachment metadata (type, name, size_bytes, mime_type). Only metadata, never file content. |
| [Agent] Message Labels | string | No | Serialized JSON array of MessageLabel objects (key-value pairs with optional confidence). Used for routing tags, classifier output, business context. |
| [Agent] Message Source | string | No | Origin of the user message: 'user' for real end-user input, 'agent' for inter-agent delegation (parent agent sending instructions to a child agent). Automatically set by provider wrappers based on parent_agent_id context. |
AI Response Properties
Event-specific properties for [Agent] AI Response (in addition to common properties above).
| Property | Type | Required | Description |
|---|---|---|---|
| [Agent] Message ID | string | Yes | Unique identifier for this message event (UUID). Used to link scores and tool calls back to specific messages. |
| [Agent] Component Type | string | Yes | Type of component that produced this event: 'user_input', 'llm', 'tool', 'embedding'. |
| [Agent] Model Name | string | Yes | LLM model identifier (e.g., 'gpt-4o', 'claude-sonnet-4-20250514'). |
| [Agent] Provider | string | Yes | LLM provider name (e.g., 'openai', 'anthropic', 'google', 'mistral', 'bedrock'). |
| [Agent] Latency Ms | number | Yes | Total wall-clock latency in milliseconds for this operation. |
| [Agent] Is Error | boolean | Yes | Whether this event represents an error condition. |
| [Agent] Error Message | string | No | Error message text when Is Error is true. |
| [Agent] Locale | string | No | User locale (e.g., 'en-US'). |
| [Agent] Span Kind | string | No | Classification of the span type for OTEL bridge compatibility. |
| [Amplitude] Session Replay ID | string | No | Links to Amplitude Session Replay (format: device_id/session_id). Enables one-click navigation from AI session to browser replay. |
| [Agent] TTFB Ms | number | No | Time to first byte/token in milliseconds. Measures perceived responsiveness for streaming. |
| [Agent] Input Tokens | number | No | Number of input/prompt tokens consumed by this LLM call. |
| [Agent] Output Tokens | number | No | Number of output/completion tokens generated by this LLM call. |
| [Agent] Total Tokens | number | No | Total tokens consumed (input + output). |
| [Agent] Reasoning Tokens | number | No | Tokens consumed by reasoning/thinking (o1, o3, extended thinking models). |
| [Agent] Cache Read Tokens | number | No | Input tokens served from the provider's prompt cache (cheaper rate). Used for cache-aware cost calculation. |
| [Agent] Cache Creation Tokens | number | No | Input tokens that created new prompt cache entries. |
| [Agent] Cost USD | number | No | Estimated cost in USD for this LLM call. Cache-aware when cache token counts are provided. |
| [Agent] Finish Reason | string | No | Why the model stopped generating: 'stop', 'end_turn', 'tool_use', 'length', 'content_filter', etc. |
| [Agent] Tool Calls | string | No | Serialized JSON array of tool call requests made by the AI in this response. |
| [Agent] Has Reasoning | boolean | No | Whether the AI response included reasoning/thinking content. |
| [Agent] Reasoning Content | string | No | The AI's reasoning/thinking content (when available and content_mode permits). |
| [Agent] System Prompt | string | No | The system prompt used for this LLM call (when content_mode permits). Chunked for long prompts. |
| [Agent] System Prompt Length | number | No | Character length of the system prompt. |
| [Agent] Tool Definitions | string | No | Normalized JSON array of tool definitions sent to the LLM (when content_mode permits). Each entry contains name, description, and parameters schema. |
| [Agent] Tool Definitions Count | number | No | Number of tool definitions in the LLM request. |
| [Agent] Tool Definitions Hash | string | No | Stable SHA-256 hash of the normalized tool definitions. Always present regardless of content_mode; enables toolset change detection without exposing schemas. |
| [Agent] Temperature | number | No | Temperature parameter used for this LLM call. |
| [Agent] Max Output Tokens | number | No | Maximum output tokens configured for this LLM call. |
| [Agent] Top P | number | No | Top-p (nucleus sampling) parameter used for this LLM call. |
| [Agent] Is Streaming | boolean | No | Whether this response was generated via streaming. |
| [Agent] Prompt ID | string | No | Identifier for the prompt template or version used. |
| [Agent] Was Copied | boolean | No | Whether the user copied this AI response content. An implicit positive quality signal. |
| [Agent] Was Cached | boolean | No | Whether this response was served from a semantic/full-response cache (distinct from token-level prompt caching). |
| [Agent] Model Tier | string | No | Model tier classification: 'fast' (GPT-4o-mini, Haiku, Flash), 'standard' (GPT-4o, Sonnet, Pro), or 'reasoning' (o1, o3, DeepSeek-R1). Auto-inferred from model name. |
| [Agent] Has Attachments | boolean | No | Whether this AI response includes generated attachments (images, charts, files). |
| [Agent] Attachment Types | string[] | No | Distinct attachment types in this AI response. Serialized JSON array. |
| [Agent] Attachment Count | number | No | Number of attachments generated by the AI in this response. |
| [Agent] Total Attachment Size Bytes | number | No | Total size of all AI-generated attachments in bytes. |
| [Agent] Attachments | string | No | Serialized JSON array of AI-generated attachment metadata. |
| [Agent] Message Labels | string | No | Serialized JSON array of MessageLabel objects attached to this AI response. |
| [Agent] Message Label Map | string | No | Serialized JSON map of label key to value for quick lookup. |
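The [Agent] Tool Definitions Hash property in the table above is described as a stable SHA-256 over normalized tool definitions. One way such a hash could be computed is sketched below. The normalization shown (sort tools by name, sort JSON keys, strip whitespace) is an assumption for illustration, and tool_definitions_hash is a hypothetical helper; the SDK's actual normalization is not documented here and its hash values may differ.

```python
import hashlib
import json


def tool_definitions_hash(tools: list[dict]) -> str:
    """Hypothetical helper: order-insensitive SHA-256 over canonical JSON."""
    # Assumed normalization: sort tools by name, sort keys, drop whitespace.
    normalized = sorted(tools, key=lambda t: t.get("name", ""))
    canonical = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


tools_a = [
    {"name": "search_docs", "description": "Search docs", "parameters": {"type": "object"}},
    {"name": "web_search", "description": "Search the web", "parameters": {"type": "object"}},
]
tools_b = list(reversed(tools_a))  # same toolset, different order
tools_c = [dict(tools_a[0], description="Search internal docs"), tools_a[1]]

hash_a = tool_definitions_hash(tools_a)
hash_b = tool_definitions_hash(tools_b)
hash_c = tool_definitions_hash(tools_c)
print(hash_a == hash_b, hash_a == hash_c)
```

Because only the digest is emitted, a change in the toolset shows up as a new hash value even when content_mode hides the schemas themselves.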
Tool Call Properties
Event-specific properties for [Agent] Tool Call (in addition to common properties above).
| Property | Type | Required | Description |
|---|---|---|---|
| [Agent] Component Type | string | Yes | Type of component that produced this event: 'user_input', 'llm', 'tool', 'embedding'. |
| [Agent] Latency Ms | number | Yes | Total wall-clock latency in milliseconds for this operation. |
| [Agent] Is Error | boolean | Yes | Whether this event represents an error condition. |
| [Agent] Error Message | string | No | Error message text when Is Error is true. |
| [Agent] Locale | string | No | User locale (e.g., 'en-US'). |
| [Agent] Span Kind | string | No | Classification of the span type for OTEL bridge compatibility. |
| [Amplitude] Session Replay ID | string | No | Links to Amplitude Session Replay (format: device_id/session_id). Enables one-click navigation from AI session to browser replay. |
| [Agent] Invocation ID | string | Yes | Unique identifier for this tool invocation (UUID). Used to link tool calls to parent messages. |
| [Agent] Tool Name | string | Yes | Name of the tool/function that was invoked (e.g., 'search_docs', 'web_search'). |
| [Agent] Tool Success | boolean | Yes | Whether the tool call completed successfully. |
| [Agent] Tool Input | string | No | Serialized JSON of the tool's input arguments. Only sent when content_mode='full'. |
| [Agent] Tool Output | string | No | Serialized JSON of the tool's output/return value. Only sent when content_mode='full'. |
| [Agent] Parent Message ID | string | No | The message_id of the user message that triggered this tool call. Links the tool call into the event graph. |
Embedding Properties
Event-specific properties for [Agent] Embedding (in addition to common properties above).
| Property | Type | Required | Description |
|---|---|---|---|
| [Agent] Component Type | string | Yes | Type of component that produced this event: 'user_input', 'llm', 'tool', 'embedding'. |
| [Agent] Model Name | string | Yes | LLM model identifier (e.g., 'gpt-4o', 'claude-sonnet-4-20250514'). |
| [Agent] Provider | string | Yes | LLM provider name (e.g., 'openai', 'anthropic', 'google', 'mistral', 'bedrock'). |
| [Agent] Latency Ms | number | Yes | Total wall-clock latency in milliseconds for this operation. |
| [Agent] Span ID | string | Yes | Unique identifier for this embedding operation (UUID). |
| [Agent] Input Tokens | number | No | Number of input tokens processed by the embedding model. |
| [Agent] Embedding Dimensions | number | No | Dimensionality of the output embedding vector. |
| [Agent] Cost USD | number | No | Estimated cost in USD for this embedding operation. |
Span Properties
Event-specific properties for [Agent] Span (in addition to common properties above).
| Property | Type | Required | Description |
|---|---|---|---|
| [Agent] Latency Ms | number | Yes | Total wall-clock latency in milliseconds for this operation. |
| [Agent] Is Error | boolean | Yes | Whether this event represents an error condition. |
| [Agent] Error Message | string | No | Error message text when Is Error is true. |
| [Agent] Span ID | string | Yes | Unique identifier for this span (UUID). |
| [Agent] Span Name | string | Yes | Name of the operation (e.g., 'rag_pipeline', 'vector_search', 'rerank'). |
| [Agent] Parent Span ID | string | No | Span ID of the parent span for nested pipeline steps. |
| [Agent] Input State | string | No | Serialized JSON of the span's input state. Only sent when content_mode='full'. |
| [Agent] Output State | string | No | Serialized JSON of the span's output state. Only sent when content_mode='full'. |
Session End Properties
Event-specific properties for [Agent] Session End (in addition to common properties above).
| Property | Type | Required | Description |
|---|---|---|---|
| [Agent] Enrichments | string | No | Serialized JSON of SessionEnrichments (topic classifications, rubric scores, outcome, flags). Attached when enrichments are provided at session close. |
| [Agent] Abandonment Turn | number | No | Turn ID of the last user message that received an AI response before the user left. Low values (e.g., 1) strongly signal first-response dissatisfaction. |
| [Agent] Session Idle Timeout Minutes | number | No | Custom idle timeout for this session (default 30 min). Tells the server how long to wait before auto-closing. |
Session Enrichment Properties
Event-specific properties for [Agent] Session Enrichment (in addition to common properties above).
| Property | Type | Required | Description |
|---|---|---|---|
| [Agent] Enrichments | string | Yes | Serialized JSON of SessionEnrichments: topic_classifications, rubrics, overall_outcome, quality_score, sentiment_score, boolean flags, agent chain metadata, and message labels. |
Score Properties
Event-specific properties for [Agent] Score (in addition to common properties above).
| Property | Type | Required | Description |
|---|---|---|---|
| [Agent] Score Name | string | Yes | Name of the score (e.g., 'user-feedback', 'task_completion', 'accuracy', 'groundedness'). |
| [Agent] Score Value | number | Yes | Numeric score value. Binary (0/1), continuous (0.0-1.0), or rating scale (1-5). |
| [Agent] Target ID | string | Yes | The message_id or session_id being scored. |
| [Agent] Target Type | string | Yes | What is being scored: 'message' or 'session'. |
| [Agent] Evaluation Source | string | Yes | Source of the evaluation: 'user' (end-user feedback), 'ai' (automated/server pipeline), or 'reviewer' (human expert). |
| [Agent] Comment | string | No | Optional text explanation for the score (respects content_mode). |
| [Agent] Taxonomy Version | string | No | Which taxonomy config version produced this enrichment (from ai_category_config.config_version_id). |
| [Agent] Evaluated At | number | No | Epoch milliseconds when this enrichment/evaluation was computed. |
| [Agent] Score Label | string | No | Direction-neutral magnitude label derived from score value. Default 5-tier: very_high (>=0.8), high (>=0.6), moderate (>=0.4), low (>=0.2), very_low (>=0.0). Server-side only. |
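The default 5-tier thresholds for [Agent] Score Label listed above can be read as a simple top-down lookup. The sketch below only restates those thresholds for scores on a 0.0-1.0 scale; score_label is a hypothetical helper, and since the label is assigned server-side, the real logic (including how 1-5 rating scales are handled) may differ.

```python
def score_label(value: float) -> str:
    """Hypothetical helper mirroring the default 5-tier thresholds listed above."""
    tiers = [
        (0.8, "very_high"),
        (0.6, "high"),
        (0.4, "moderate"),
        (0.2, "low"),
        (0.0, "very_low"),
    ]
    # Tiers are ordered from highest to lowest, so the first match wins.
    for threshold, label in tiers:
        if value >= threshold:
            return label
    raise ValueError(f"score value {value} is below 0.0")


labels = {v: score_label(v) for v in (0.95, 0.8, 0.79, 0.4, 0.2, 0.0)}
print(labels)
```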
Server-Side: Session Evaluation Properties
[Agent] Session Evaluation is emitted automatically by the server-side enrichment pipeline — do not send this event from your code.
| Property | Type | Required | Description |
|---|---|---|---|
| [Agent] Session ID | string | Yes | Unique session identifier. All events in one conversation share the same session ID. |
| [Agent] Agent ID | string | Yes | Identifies which AI agent handled the interaction (e.g., 'support-bot', 'houston'). |
| [Agent] Customer Org ID | string | Yes | Organization ID for multi-tenant platforms. Enables account-level group analytics. |
| [Agent] Evaluation Source | string | Yes | Source of the evaluation: 'user' (end-user feedback), 'ai' (automated/server pipeline), or 'reviewer' (human expert). |
| [Agent] Taxonomy Version | string | Yes | Which taxonomy config version produced this enrichment (from ai_category_config.config_version_id). |
| [Agent] Evaluated At | number | Yes | Epoch milliseconds when this enrichment/evaluation was computed. |
| [Agent] Overall Outcome | string | Yes | Session outcome classification: 'success', 'partial_success', 'failure', 'abandoned', 'response_provided', etc. |
| [Agent] Turn Count | number | Yes | Number of conversation turns in this session. |
| [Agent] Session Total Tokens | number | No | Total LLM tokens consumed across all turns in this session. |
| [Agent] Session Avg Latency Ms | number | No | Average AI response latency in milliseconds across the session. |
| [Agent] Request Complexity | string | No | Complexity classification of the user's request: 'simple', 'moderate', 'complex', or 'ambiguous'. |
| [Agent] Has Task Failure | boolean | Yes | Whether the agent failed to complete the user's request. |
| [Agent] Has Negative Feedback | boolean | Yes | Whether the user expressed dissatisfaction during the session. |
| [Agent] Has Technical Failure | boolean | Yes | Whether technical errors occurred (tool timeouts, API failures, etc.). |
| [Agent] Has Data Quality Issues | boolean | Yes | Whether the AI output had data quality problems (wrong data, hallucinations, etc.). |
| [Agent] Models Used | string[] | No | LLM models used in this session. JSON array of strings. |
| [Agent] Root Agent Name | string | No | Entry-point agent in multi-agent flows. |
| [Agent] Agent Chain Depth | number | No | Number of agents in the delegation chain. |
| [Agent] Task Failure Type | string | No | Specific failure type when has_task_failure is true (e.g., 'wrong_answer', 'unable_to_complete'). |
| [Agent] Technical Error Count | number | No | Count of technical errors that occurred during the session. |
| [Agent] Error Categories | string[] | No | Categorized error types (e.g., 'chart_not_found', 'timeout'). JSON array of strings. |
| [Agent] Behavioral Patterns | string[] | No | Detected behavioral anti-patterns (e.g., 'retry_storm', 'clarification_loop', 'early_abandonment'). JSON array of strings. |
| [Agent] Session Cost USD | number | No | Total LLM cost in USD for this AI session (aggregated from per-message costs). |
| [Agent] Enrichment Cost USD | number | No | Cost in USD of running the enrichment pipeline's LLM inference for this session. Distinct from the session's own LLM cost. |
| [Agent] Quality Score | number | No | Overall quality score (0.0-1.0) computed by the enrichment pipeline for this session. |
| [Agent] Sentiment Score | number | No | User sentiment score (0.0-1.0) inferred from the conversation by the enrichment pipeline. |
| [Agent] Task Failure Reason | string | No | Explanation of why the task failed when has_task_failure is true (e.g., 'chart data source unavailable'). |
| [Agent] Agent Chain | string[] | No | Serialized JSON array of agent IDs representing the delegation chain in multi-agent flows. |
| [Agent] Project ID | string | No | Amplitude project ID that owns the AI session being evaluated. |
| [Agent] Has User Feedback | boolean | Yes | Whether the session received explicit user feedback (thumbs up/down, rating). |
| [Agent] User Score | number | No | Aggregate user feedback score for the session (0.0-1.0). Present only when has_user_feedback is true. |
| [Agent] Agent Version | string | No | Agent code version (e.g., 'v4.2'). Enables version-over-version quality comparison. |
| [Agent] Agent Description | string | No | Human-readable description of the agent's purpose (e.g., 'Handles user chat requests via OpenAI GPT-4o'). Enables observability-driven agent registry from event streams. |
Server-Side: Topic Classification Properties
[Agent] Topic Classification is emitted automatically by the server-side enrichment pipeline — do not send this event from your code.
| Property | Type | Required | Description |
|---|---|---|---|
| [Agent] Session ID | string | Yes | Unique session identifier. All events in one conversation share the same session ID. |
| [Agent] Agent ID | string | Yes | Identifies which AI agent handled the interaction (e.g., 'support-bot', 'houston'). |
| [Agent] Customer Org ID | string | Yes | Organization ID for multi-tenant platforms. Enables account-level group analytics. |
| [Agent] Evaluation Source | string | Yes | Source of the evaluation: 'user' (end-user feedback), 'ai' (automated/server pipeline), or 'reviewer' (human expert). |
| [Agent] Taxonomy Version | string | Yes | Which taxonomy config version produced this enrichment (from ai_category_config.config_version_id). |
| [Agent] Evaluated At | number | Yes | Epoch milliseconds when this enrichment/evaluation was computed. |
| [Agent] Topic | string | Yes | Which topic model this classification is for (e.g., 'product_area', 'query_intent', 'error_domain'). |
| [Agent] Selection Mode | string | Yes | Whether this topic model uses 'single' (MECE) or 'multiple' (multi-label) selection. |
| [Agent] Primary | string | No | Primary classification value (e.g., 'charts', 'billing_issues'). |
| [Agent] Secondary | string[] | No | Secondary classifications for multi-label topics. JSON array of strings. |
| [Agent] Subcategories | string[] | No | Subcategories for finer classification within the primary topic (e.g., 'TREND_ANALYSIS', 'WRONG_EVENT'). JSON array of strings. |
Sending Events Without the SDK
The [Agent] event schema is not tied to this SDK. If your stack is Go, Java, Rust, or any language without an Amplitude AI SDK, you can send the same events directly via Amplitude's ingestion APIs.
What the SDK handles for you
When you use this SDK, the following are managed automatically. If you send events directly, you are responsible for these:
| Concern | SDK behavior | DIY equivalent |
|---|---|---|
| Session ID | Generated once per Session and propagated to every event | Generate a UUID per conversation and include it as [Agent] Session ID on every event |
| Deduplication | Automatic insert_id on each event | Set a unique insert_id per event to prevent duplicates on retry |
| Property prefixing | All properties are prefixed with [Agent] | You must include the [Agent] prefix in every property name |
| Cost / token calculation | Auto-computed from model and token counts | Compute and send [Agent] Cost USD, [Agent] Input Tokens, etc. yourself |
| Server-side enrichment | [Agent] Session Evaluation, [Agent] Topic Classification, and [Agent] Score events are emitted automatically by the enrichment pipeline after [Agent] Session End | These fire automatically; you do not need to send them. Just send the SDK-level events and close the session with [Agent] Session End. |
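When sending events directly, cost has to be computed before the event is built. The following is a minimal sketch, assuming you maintain your own per-million-token prices. The `PRICES_PER_MILLION` table, the `example-model` name, and its rates are placeholders for illustration, not real pricing and not the SDK's pricing database.

```python
# Hypothetical price table: USD per one million tokens. Replace with the
# rates your provider actually charges; these numbers are placeholders.
PRICES_PER_MILLION = {
    "example-model": {"input": 2.50, "output": 10.00},
}


def cost_properties(model: str, input_tokens: int, output_tokens: int) -> dict:
    """Return the token and cost properties for one AI response event."""
    price = PRICES_PER_MILLION[model]
    cost = (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000
    return {
        "[Agent] Model Name": model,
        "[Agent] Input Tokens": input_tokens,
        "[Agent] Output Tokens": output_tokens,
        "[Agent] Cost USD": round(cost, 6),
    }


props = cost_properties("example-model", input_tokens=150, output_tokens=420)
```

The returned dictionary can be merged into the event_properties of an [Agent] AI Response event.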
Ingestion methods
| Method | Best for | Docs |
|---|---|---|
| HTTP V2 API | Real-time, low-to-medium volume | HTTP V2 API docs |
| Batch Event Upload API | High volume, backfills | Batch API docs |
| Amazon S3 Import | Bulk historical import, warehouse-first workflows | S3 Import docs |
Minimal HTTP API example
curl -X POST https://api2.amplitude.com/2/httpapi \
-H 'Content-Type: application/json' \
-d '{
"api_key": "YOUR_API_KEY",
"events": [
{
"event_type": "[Agent] User Message",
"user_id": "user-42",
"insert_id": "evt-unique-id-1",
"event_properties": {
"[Agent] Session ID": "sess-abc123",
"[Agent] Trace ID": "trace-def456",
"[Agent] Turn ID": 1,
"[Agent] Agent ID": "support-bot",
"[Agent] Message ID": "msg-001"
}
},
{
"event_type": "[Agent] AI Response",
"user_id": "user-42",
"insert_id": "evt-unique-id-2",
"event_properties": {
"[Agent] Session ID": "sess-abc123",
"[Agent] Trace ID": "trace-def456",
"[Agent] Turn ID": 1,
"[Agent] Message ID": "msg-002",
"[Agent] Agent ID": "support-bot",
"[Agent] Model Name": "gpt-4o",
"[Agent] Provider": "openai",
"[Agent] Latency Ms": 1203,
"[Agent] Input Tokens": 150,
"[Agent] Output Tokens": 420,
"[Agent] Cost USD": 0.0042
}
}
]
}'
Refer to the Event Schema tables above for required and optional properties per event type.
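The same request can be assembled from a non-SDK service in a few lines. This sketch uses only the Python standard library and covers three of the DIY responsibilities listed earlier: one session ID per conversation, a unique insert_id per event, and the [Agent] prefix on every property. The helper names `agent_event` and `build_request` are hypothetical, and the request is prepared but not sent.

```python
import json
import urllib.request
import uuid

HTTP_V2_URL = "https://api2.amplitude.com/2/httpapi"  # endpoint from the curl example


def agent_event(event_type: str, user_id: str, session_id: str, props: dict) -> dict:
    """Build one event with a fresh insert_id and [Agent]-prefixed properties."""
    merged = {"Session ID": session_id, **props}
    return {
        "event_type": f"[Agent] {event_type}",
        "user_id": user_id,
        "insert_id": str(uuid.uuid4()),  # keep this value when retrying the same event
        "event_properties": {f"[Agent] {name}": value for name, value in merged.items()},
    }


def build_request(api_key: str, events: list) -> urllib.request.Request:
    """Prepare (but do not send) the POST request for the HTTP V2 API."""
    body = json.dumps({"api_key": api_key, "events": events}).encode("utf-8")
    return urllib.request.Request(
        HTTP_V2_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


session_id = str(uuid.uuid4())  # one ID shared by every event in the conversation
events = [
    agent_event("User Message", "user-42", session_id,
                {"Turn ID": 1, "Agent ID": "support-bot", "Message ID": "msg-001"}),
    agent_event("AI Response", "user-42", session_id,
                {"Turn ID": 1, "Agent ID": "support-bot", "Message ID": "msg-002",
                 "Model Name": "gpt-4o", "Provider": "openai", "Latency Ms": 1203}),
]
request = build_request("YOUR_API_KEY", events)
# urllib.request.urlopen(request) would send it; omitted so the sketch has no side effects.
```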
Register Event Schema in Your Data Catalog
Amplitude's data catalog is per-project. When your application sends [Agent] events, Amplitude auto-discovers the event types and property names — but they appear without descriptions, types, or required flags. The SDK ships a CLI tool that populates your project's data catalog with the full [Agent] event schema.
Prerequisites
- Amplitude Enterprise plan (required for Taxonomy API access)
- API Key and Secret Key from Settings > Projects in Amplitude
Usage
# Register the full [Agent] event schema in your project
amplitude-ai-register-catalog --api-key YOUR_API_KEY --secret-key YOUR_SECRET_KEY
# Preview what would be registered (no API calls)
amplitude-ai-register-catalog --dry-run
# EU residency endpoint
amplitude-ai-register-catalog --api-key KEY --secret-key SECRET --eu
The command is idempotent — safe to re-run. It creates events and properties that don't exist yet and updates descriptions for those that do. Run it once after installing the SDK, and again after SDK upgrades to pick up any new events or properties.
What gets registered
All 10 [Agent] event types and their 225+ properties, including:
- Event descriptions explaining what each event captures
- Property types (string, number, boolean)
- Required flags for critical properties
- Array type annotations for list-valued properties
After registration, your team can browse the full schema in Amplitude > Data > Events with descriptions visible inline.
Testing
MockAmplitudeAI is a drop-in replacement that captures events in-memory instead of sending them over the network. It supports all SDK features including agent() and session():
from amplitude_ai import MockAmplitudeAI
mock = MockAmplitudeAI()
agent = mock.agent("test-bot", user_id="u1")
with agent.session("s1") as s:
    s.new_trace()
    s.track_user_message(content="Hello")
    s.track_ai_message(content="Hi!", model="gpt-4o", provider="openai", latency_ms=100.0)
assert len(mock.events) == 3 # user msg + ai msg + session end
mock.assert_event_tracked("[Agent] User Message", user_id="u1")
mock.assert_event_tracked("[Agent] AI Response", **{"[Agent] Model Name": "gpt-4o"})
mock.reset()
Filter and assert by session or agent:
mock.events_for_session("s1") # list of events for that session
mock.events_for_agent("test-bot") # list of events for that agent
mock.assert_session_closed("s1") # assert [Agent] Session End exists
Disabling Tracking in Tests
If you don't need to assert on events and just want tracking to be a no-op, use MockAmplitudeAI without inspecting events. It never makes network calls. Alternatively, skip SDK initialization entirely in your test config.
Serverless Environments
The SDK auto-detects serverless environments (AWS Lambda, Vercel, Netlify, Google Cloud Functions, Azure Functions, Cloudflare Pages). When detected, agent.session() context managers automatically flush all pending events on exit — no explicit ai.flush() needed. You can also control this explicitly via the auto_flush parameter:
# Auto-detected: flushes automatically in serverless, skips in long-running servers
with agent.session(user_id=uid, session_id=sid) as s:
    ...
# Explicit control:
with agent.session(user_id=uid, session_id=sid, auto_flush=True) as s:  # always flush
    ...
with agent.session(user_id=uid, session_id=sid, auto_flush=False) as s:  # never flush
    ...
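How the SDK detects these platforms is not described here. Purely as an illustration of the idea, a check of this kind can be written against marker environment variables. The variable list below is an assumption, the function names are hypothetical, and treating an unset `auto_flush` as `None` is also an assumption; none of this is the SDK's actual logic.

```python
import os

# Assumed marker variables per platform. Illustrative only; not the SDK's table.
SERVERLESS_ENV_MARKERS = {
    "AWS Lambda": ("AWS_LAMBDA_FUNCTION_NAME",),
    "Vercel": ("VERCEL",),
    "Netlify": ("NETLIFY",),
    "Google Cloud Functions": ("FUNCTION_TARGET", "K_SERVICE"),
    "Azure Functions": ("FUNCTIONS_WORKER_RUNTIME",),
    "Cloudflare Pages": ("CF_PAGES",),
}


def detect_serverless(env=None):
    """Return the first platform whose marker variable is set, else None."""
    env = os.environ if env is None else env
    for platform, markers in SERVERLESS_ENV_MARKERS.items():
        if any(env.get(marker) for marker in markers):
            return platform
    return None


def resolve_auto_flush(auto_flush, env=None):
    """An explicit True/False wins; None falls back to environment detection."""
    if auto_flush is not None:
        return auto_flush
    return detect_serverless(env) is not None


in_lambda = resolve_auto_flush(None, {"AWS_LAMBDA_FUNCTION_NAME": "my-fn"})
forced_off = resolve_auto_flush(False, {"VERCEL": "1"})
```

The point of the design is the precedence: an explicit setting always overrides detection, so a misdetected environment can be corrected at the call site.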
If you track events outside of agent.session(), you still need ai.flush() before your handler returns:
def handler(event, context):
    agent = ai.agent("lambda-bot")
    with agent.session(user_id=event["user_id"]) as s:
        s.new_trace()
        s.track_user_message(content=event["message"])
        ai_msg = s.track_ai_message(
            content=generate_response(event["message"]),
            model="gpt-4o", provider="openai", latency_ms=500.0,
        )
    ai.flush()  # block until all events are delivered
    return {"statusCode": 200}
flush() returns a list of Future objects. Call .result() on each to block until delivery completes. For long-running servers, the Amplitude SDK flushes automatically on a timer; explicit flush() is only needed for short-lived processes.
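Waiting on a list of futures with an upper time bound is a standard-library pattern. The sketch below shows it with `concurrent.futures.wait`; `fake_flush` is a hypothetical stand-in for `ai.flush()`, and the helper name `wait_for_delivery` is not part of the SDK.

```python
from concurrent.futures import Future, ThreadPoolExecutor, wait


def wait_for_delivery(futures, timeout=5.0):
    """Block until every future finishes or the timeout passes.

    Returns True when all futures finished within the timeout.
    """
    pending = [f for f in futures if isinstance(f, Future)]
    done, not_done = wait(pending, timeout=timeout)
    for future in done:
        future.result()  # re-raises any exception raised in the worker
    return not not_done


def fake_flush(pool):
    """Hypothetical stand-in for ai.flush(): returns a list of Future objects."""
    return [pool.submit(lambda n=n: n) for n in range(3)]


with ThreadPoolExecutor(max_workers=2) as pool:
    all_delivered = wait_for_delivery(fake_flush(pool), timeout=2.0)
```

A timeout matters in short-lived handlers: it bounds how long the function stays alive when the network is slow, at the cost of possibly dropping events that are still in flight.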
Streaming Patterns
When using streaming LLM responses (e.g. with the Vercel AI SDK's streamText or streamObject), the SDK's session.run() / with session() pattern doesn't fit because the response completes asynchronously after the context manager exits.
Use explicit event tracking with manual flush instead:
import asyncio
async def handle_stream(agent, user_id, session_id):
    # Track the user message
    agent.track_user_message(
        user_id=user_id,
        content="Summarize this document",
        session_id=session_id,
    )
    # ... stream the LLM response, accumulate tokens/content ...
    # Track the AI response after streaming completes
    agent.track_ai_message(
        user_id=user_id,
        content=accumulated_content,
        model="gpt-4o",
        provider="openai",
        latency_ms=total_latency,
        session_id=session_id,
        usage={"input_tokens": input_tokens, "output_tokens": output_tokens},
    )
    # Explicitly flush before the serverless handler returns
    await asyncio.wrap_future(ai.flush())
Key points:
- with session() auto-flushes on exit, but streaming responses may not be complete yet
- Track the AI response in the streaming callback (e.g. onFinish) after content is fully accumulated
- Always call ai.flush() explicitly in serverless environments when not using with session()
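The accumulate-then-track step can be sketched with the standard library alone. Here `fake_stream` is a hypothetical stand-in for a provider's streaming response, and `on_finish` is a placeholder for whatever records the AI message (for example a call to agent.track_ai_message) once the content is complete.

```python
import asyncio
import time


async def fake_stream():
    """Hypothetical stand-in for a provider's streaming response."""
    for chunk in ["Hello", ", ", "world"]:
        await asyncio.sleep(0)
        yield chunk


async def consume_stream(stream, on_finish):
    """Accumulate chunks, then report content and latency exactly once."""
    started = time.perf_counter()
    parts = []
    async for chunk in stream:
        parts.append(chunk)
    latency_ms = (time.perf_counter() - started) * 1000.0
    content = "".join(parts)
    on_finish(content, latency_ms)  # tracking happens only after the stream ends
    return content


recorded = []
result = asyncio.run(
    consume_stream(fake_stream(), lambda content, ms: recorded.append((content, ms)))
)
```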
Error Handling and Reliability
- Delivery failures are caught. All track_* methods catch amplitude.track() exceptions internally and log errors. Your application code won't break if Amplitude is unreachable. Events that fail to send are logged at ERROR level but do not propagate exceptions to your code. Note: malformed inputs may still raise during preprocessing (e.g., validation or privacy sanitization) before the event is sent.
- Events are buffered and retried automatically. The SDK delegates to the Amplitude Python SDK's event pipeline, which buffers events in memory and retries failed deliveries with exponential backoff. You don't need to implement retry logic.
- Delivery status callback. Use on_event_callback in AIConfig to monitor delivery status per event:
import logging

def on_event(event, code, message):
    if code != 200:
        logging.warning(f"Event delivery failed: {code} {message}")

ai = AmplitudeAI(amplitude=amplitude, config=AIConfig(on_event_callback=on_event))
- flush() blocks until delivery. Returns Future objects; call .result() to block until all buffered events are sent. Required in serverless environments; optional in long-running processes where the SDK flushes automatically on a timer.
- shutdown() for clean exit. Call ai.shutdown() when your application exits to flush remaining events and release resources. Only necessary if the SDK created the Amplitude instance internally (via api_key=); if you passed in your own amplitude= instance, manage its lifecycle yourself.
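A delivery callback can also keep running counts, which is handy for health checks or alerting. The sketch below is a callable class with the same (event, code, message) signature as the on_event function shown in the list; the `DeliveryMonitor` class and the logger name are hypothetical, and the deliveries are simulated by calling it directly.

```python
import logging
from collections import Counter

logger = logging.getLogger("delivery-monitor")  # hypothetical logger name


class DeliveryMonitor:
    """Callable that tallies delivery status codes per event."""

    def __init__(self):
        self.codes = Counter()

    def __call__(self, event, code, message):
        self.codes[code] += 1
        if code != 200:
            logger.warning("Event delivery failed: %s %s", code, message)

    @property
    def failure_count(self):
        """Number of callbacks that reported a non-200 status."""
        return sum(count for code, count in self.codes.items() if code != 200)


monitor = DeliveryMonitor()
# Simulated deliveries; in real use, pass `monitor` as on_event_callback.
monitor({"event_type": "[Agent] User Message"}, 200, "OK")
monitor({"event_type": "[Agent] AI Response"}, 429, "Too Many Requests")
```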
Under the Hood
Built on the official Amplitude-Python SDK (Amplitude, BaseEvent, track(), flush(), shutdown()).
Content Storage ($llm_message)
Message content (user messages and AI responses) is stored in a $llm_message nested property inside the event as {"text": "..."}. Content is preserved at full length with no truncation or size limits — the SDK bypasses the base Amplitude SDK's per-property string truncation for this property, and the server-side ingestion pipeline whitelists it as well.
This applies only in full content mode. In metadata_only mode, no content is sent.
Legacy note: SDK versions prior to this change split long content into c0..c7 chunk sub-properties. The LLM session viewer and enrichment pipeline continue to handle both formats transparently.
Context Propagation
The SDK uses Python's contextvars module to propagate session context (session ID, trace ID, agent ID, description, turn counter) across function calls. This is how provider wrappers, @tool, @observe, and the FastAPI middleware all share the same session without explicit parameter threading.
How it works:
- Sync code: context flows naturally through the call stack.
- asyncio.create_task(): Python 3.10+ automatically copies the parent context into the new task. No action needed; the child task inherits the active session.
- ThreadPoolExecutor: threads do not inherit ContextVar state. If you offload work to a thread pool within a session, you must explicitly copy the context:
import contextvars
from concurrent.futures import ThreadPoolExecutor
with agent.session() as s:
    s.track_user_message(content="hello")
    ctx = contextvars.copy_context()
    with ThreadPoolExecutor() as pool:
        # ctx.run() ensures the thread sees the active session
        future = pool.submit(ctx.run, my_blocking_function, arg1, arg2)
Without ctx.run(), the thread sees no active session and provider wrappers fall back to the global ToolCallTracker config (if set).
Nesting: sessions nest correctly. An inner with agent.session() or @observe call saves and restores the outer context on exit. No session leaks.
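The thread-pool and nesting behavior can be observed with `contextvars` alone, without the SDK. In this sketch `current_session` is a hypothetical variable standing in for the SDK's session context. It compares a worker thread that runs without the copied context against one that runs through `ctx.run()`, then shows how a token restores the outer value after a nested scope.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Hypothetical variable standing in for the SDK's session context.
current_session = contextvars.ContextVar("current_session", default=None)


def read_session():
    return current_session.get()


def demo():
    token = current_session.set("outer")
    try:
        ctx = contextvars.copy_context()
        with ThreadPoolExecutor(max_workers=1) as pool:
            bare = pool.submit(read_session).result()             # context not copied
            copied = pool.submit(ctx.run, read_session).result()  # context copied

        # Nesting: set() returns a token, reset() restores the outer value.
        inner_token = current_session.set("inner")
        during = current_session.get()
        current_session.reset(inner_token)
        after = current_session.get()
        return bare, copied, during, after
    finally:
        current_session.reset(token)


bare, copied, during, after = demo()
```

On a standard CPython build the bare submission sees the default value, while the `ctx.run()` submission sees the value set by the caller.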
Debug Logging
The SDK uses the Amplitude SDK's built-in logger. Enable verbose output to see every event as it's tracked:
import logging
# Option 1: Enable via Amplitude's configuration
amplitude.configuration.logger = logging.getLogger("amplitude")
amplitude.configuration.logger.setLevel(logging.DEBUG)
# Option 2: Set min_id_length to bypass validation in dev
amplitude.configuration.min_id_length = 1
Supported Integrations
The SDK provides broad coverage across LLM providers, agent frameworks, and observability standards:
Provider Wrappers (drop-in replacements with full field coverage):
| Provider | Class | Install Extra | Key Capabilities |
|---|---|---|---|
| OpenAI | OpenAI | [openai] | Chat, streaming, reasoning (o1/o3/o4), function calling, prompt caching |
| OpenAI (async) | AsyncOpenAI | [openai] | Same as OpenAI, for openai.AsyncOpenAI users (FastAPI, pydantic_ai, etc.) |
| Anthropic | Anthropic | [anthropic] | Messages, streaming, extended thinking, tool_use, prompt caching |
| Google Gemini | Gemini | [gemini] | Generate content, streaming, thinking models |
| Azure OpenAI | AzureOpenAI | [azure] | Same as OpenAI, Azure-hosted |
| AWS Bedrock | Bedrock | [bedrock] | Converse API, streaming, cross-provider (Claude, Titan, etc.) |
| Mistral | Mistral | [mistral] | Chat, streaming, function calling, reasoning |
Framework Integrations (callbacks and processors for popular frameworks):
| Framework | Class | Install Extra | Integration Pattern |
|---|---|---|---|
| LangChain | AmplitudeCallbackHandler | [langchain] | Callback handler for chains and agents |
| LlamaIndex | AmplitudeLlamaIndexHandler | [llamaindex] | Callback handler for queries and retrievals |
| OpenTelemetry | AmplitudeAgentExporter | [otel] | OTEL span exporter (GenAI semantic conventions) |
| OpenAI Agents SDK | AmplitudeTracingProcessor | [openai-agents] | Tracing processor for multi-agent workflows |
| Anthropic tool_use | AmplitudeToolLoop | [anthropic] | Managed multi-turn tool_use loop |
| CrewAI | AmplitudeCrewAIHooks | [crewai] | Event listener hooks for LLM and tool calls |
| Claude Agent SDK | ClaudeAgentSDKTracker | [anthropic] | PreToolUse/PostToolUse hooks + message stream processing with real tool latency |
Managed / Hosted Agent Adapters (for architectures where LLM calls happen server-side):
| Platform | Class | Install Extra | Integration Pattern |
|---|---|---|---|
| Anthropic Managed Agents | ManagedAgentTracker | [anthropic] | Polls session events, maps to track_user_message / track_ai_message / track_tool_call |
See examples/anthropic_managed_agents_example.py and the coding agent guide (amplitude-ai.md, Step 3f) for full usage.
Claude Agent SDK
Track tool calls with execution latency and AI messages from Claude Agent SDK.
Essential fields: agentId (on ai.agent()) identifies which AI feature produced the events — it maps to the LLM Usage Application Registry. user_id + session_id (on agent.session()) tie all events into a single user conversation, powering funnels, retention, and conversation views. The session automatically emits [Agent] Session End when the context manager exits.
from amplitude import Amplitude
from amplitude_ai import AmplitudeAI
from amplitude_ai.integrations.claude_agent_sdk import ClaudeAgentSDKTracker
ai = AmplitudeAI(amplitude=Amplitude("YOUR_API_KEY"))
agent = ai.agent("code-reviewer")
tracker = ClaudeAgentSDKTracker()
with agent.session(user_id="u1", session_id="s1") as session:
    for message in query(
        prompt="Analyze this codebase",
        options=ClaudeAgentOptions(hooks=tracker.hooks(session)),
    ):
        tracker.process(session, message)
tracker.hooks(session) returns PreToolUse/PostToolUse hooks that track each tool execution with precise latency. tracker.process(session, message) processes streamed messages to track AI responses and user messages. See examples/claude_agent_sdk_example.py for a complete example.
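The general idea behind measuring tool latency with a pre/post hook pair can be sketched independently of any SDK: record a start time when the tool call begins, keyed by its call ID, and compute the difference when it ends. The class and method names below are hypothetical and do not reproduce ClaudeAgentSDKTracker's internals or the Claude Agent SDK's hook signatures.

```python
import time


class ToolLatencyRecorder:
    """Pair a pre-tool hook with a post-tool hook to measure execution time."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._started = {}   # tool call ID -> (tool name, start time)
        self.completed = []  # (tool name, latency in ms)

    def pre_tool_use(self, call_id, tool_name):
        self._started[call_id] = (tool_name, self._clock())

    def post_tool_use(self, call_id):
        tool_name, started = self._started.pop(call_id)
        latency_ms = (self._clock() - started) * 1000.0
        self.completed.append((tool_name, latency_ms))
        return latency_ms


# Deterministic fake clock so the example is reproducible.
ticks = iter([10.0, 10.25])
recorder = ToolLatencyRecorder(clock=lambda: next(ticks))
recorder.pre_tool_use("call-1", "read_file")
latency = recorder.post_tool_use("call-1")
```

Keying by call ID rather than tool name keeps measurements separate when the same tool runs more than once in a turn.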
Coverage summary: The OTEL GenAI bridge provides baseline coverage for any OTEL-instrumented provider or framework. The 6 dedicated provider wrappers add full field coverage (cache tokens, reasoning, TTFB, streaming). The framework integrations capture agent-level structure (multi-agent, tool loops, handoffs). Together, this covers the vast majority of production LLM deployments.
Package Structure
amplitude-ai/
├── amplitude_ai/
│ ├── __init__.py # Public API
│ ├── client.py # AmplitudeAI, BoundAgent, TenantHandle, Session
│ ├── config.py # AIConfig, ContentMode
│ ├── context.py # SessionContext, ContextVar propagation
│ ├── exceptions.py # AmplitudeAIError, ValidationError, etc.
│ ├── middleware.py # FastAPI/Starlette AmplitudeAIMiddleware
│ ├── patching.py # Zero-code monkey-patching (patch_openai, etc.)
│ ├── testing.py # MockAmplitudeAI
│ ├── wrappers.py # wrap() convenience function
│ ├── core/
│ │ ├── tracking.py # Event tracking functions
│ │ ├── privacy.py # Privacy/redaction
│ │ ├── enrichments.py # SessionEnrichments, TopicClassification, RubricScore
│ │ └── decorators.py # @tool and @observe decorators
│ ├── providers/ # OpenAI, Anthropic, Gemini, Azure, Bedrock, Mistral
│ ├── integrations/ # LangChain, LlamaIndex, OTEL, OpenAI Agents, Anthropic tools, CrewAI
│ └── utils/ # Cost (genai-prices), tokens, streaming
└── README.md
Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| No events in Amplitude | API key not set or incorrect | Run amplitude-ai-doctor — it checks AMPLITUDE_AI_API_KEY and reports a fix command |
| Events tracked but [Agent] Cost USD is $0 | Model not in the pricing database, or total_cost_usd not passed | Pass total_cost_usd explicitly, or install genai-prices: pip install genai-prices |
| patch() doesn't instrument calls | patch() called after the provider client was created | Call patch() before importing or instantiating provider clients |
| Session context missing on events | LLM calls made outside a with agent.session() block | Wrap your LLM calls inside with agent.session() as s: |
| flush() hangs or times out in serverless | Process exits before flush completes | Call ai.flush() before returning from your Lambda/Cloud Function handler |
| @tool / @observe not emitting events | No active session context | Ensure the decorated function is called within with agent.session() or another @observe scope |
| Import error for provider wrapper | Optional dependency not installed | Install the provider extra: pip install 'amplitude-ai[openai]' (or [anthropic], [gemini], etc.) |
Run amplitude-ai-doctor for automated environment diagnostics with fix suggestions.
For AI Coding Agents
This SDK is designed to be discovered and used by AI coding agents (Cursor, Claude Code, Windsurf, Cline, Copilot, or any MCP-compatible assistant). The following files ship with the package to help agents understand and integrate the SDK without reading the full README:
| File | Purpose |
|---|---|
| amplitude-ai.md | Primary guide: self-contained 4-phase instrumentation workflow and full API reference |
| AGENTS.md | Machine-readable decision tree, canonical patterns, MCP surface, gotchas, and CLI reference |
| llms.txt | Compact discovery file listing tools, resources, and event names |
| llms-full.txt | Extended reference with full API signatures, provider coverage matrix, and common error resolutions |
| mcp.schema.json | Structured JSON describing the MCP server's tools, resources, and prompt |
Run amplitude-ai-mcp to start the MCP server (standard stdio protocol). Any MCP-compatible AI coding agent can call tools like scan_project to analyze your codebase, instrument_file to transform source files, validate_file to detect uninstrumented LLM call sites, and generate_verify_test to produce CI tests.
CLI commands for coding agent workflows:
amplitude-ai # Print setup prompt for your AI coding agent
amplitude-ai --print-guide # Print the full amplitude-ai.md guide to stdout
amplitude-ai mcp # Start the MCP server
amplitude-ai doctor # Validate environment and event pipeline
amplitude-ai status # Show SDK version, installed providers, and env config
amplitude-ai-register-catalog # Push event catalog to Amplitude Data
Requirements
- Python >= 3.10
- amplitude-analytics >= 1.0.0
Optional provider extras:
pip install "amplitude-ai[openai]" # OpenAI / Azure OpenAI
pip install "amplitude-ai[anthropic]" # Anthropic Claude (provider + tool_use loop)
pip install "amplitude-ai[gemini]" # Google Gemini
pip install "amplitude-ai[bedrock]" # AWS Bedrock
pip install "amplitude-ai[mistral]" # Mistral
pip install "amplitude-ai[langchain]" # LangChain
pip install "amplitude-ai[llamaindex]" # LlamaIndex
pip install "amplitude-ai[otel]" # OpenTelemetry
pip install "amplitude-ai[openai-agents]" # OpenAI Agents SDK
pip install "amplitude-ai[crewai]" # CrewAI
pip install "amplitude-ai[tokens]" # tiktoken (accurate token counting; falls back to estimation if absent)
pip install "amplitude-ai[all]" # Everything (includes tiktoken)
Reporting Issues
Found a bug or have a feature request? File an issue on the public tracker:
- Python SDK issues: github.com/amplitude/Amplitude-Python/issues (use the amplitude-ai label)
- General questions: Amplitude Support