
Last9 observability attributes for OpenTelemetry GenAI spans - track costs, workflows, and conversations in LLM applications

Project description

Last9 GenAI - Python SDK

OpenTelemetry extension for LLM observability: track conversations, workflows, and costs


Overview

Track conversations and workflows in your LLM applications with automatic context propagation. Built on OpenTelemetry for seamless integration with your existing observability stack.

Not a replacement for OTel auto-instrumentation — works alongside it or standalone.

Key Features:

  • 🎯 Conversation Tracking: Automatic multi-turn conversation tracking with conversation_context
  • 🤖 Agent Tracking: First-class agent identity with agent_context (OTel gen_ai.agent.* semantic conventions)
  • 🔄 Workflow Management: Track complex multi-step AI workflows with workflow_context
  • 🎨 Zero-Touch Instrumentation: @observe() decorator for automatic tracking
  • 📊 Context Propagation: Thread-safe attribute tracking across nested operations
  • 💰 Optional Cost Tracking: Bring your own pricing for cost monitoring
  • 🏷️ Span Classification: Filter by type (llm/tool/chain/agent/prompt)

Features

Core Tracking

  • 🎯 Conversation Tracking: Multi-turn conversations with gen_ai.conversation.id and turn numbers
  • 🤖 Agent Identity: Track agents with gen_ai.agent.id, gen_ai.agent.name, gen_ai.agent.version (OTel semantic conventions)
  • 🔄 Workflow Management: Track multi-step AI operations across LLM calls, tools, and retrievals
  • 📊 Auto-Context Propagation: Thread-safe context managers that automatically tag all nested operations
  • 🎨 Decorator Pattern: @observe() for zero-touch instrumentation with full input/output/latency tracking
  • 🔧 SpanProcessor: Automatic context enrichment for all spans in your application

Enhanced Observability

  • 🏷️ Span Classification: gen_ai.l9.span.kind for filtering (llm/tool/chain/agent/prompt)
  • 🛠️ Tool/Function Tracking: Enhanced attributes for function calls and tool usage
  • 📈 Performance Metrics: Response times, token counts, and quality scores
  • 🌐 Provider Agnostic: Works with OpenAI, Anthropic, Google, Cohere, etc.
  • 📏 Standard Attributes: Full OpenTelemetry gen_ai.* semantic conventions

Optional Features

  • 💰 Cost Tracking: Bring your own model pricing for cost monitoring
  • 💸 Workflow Costing: Aggregate costs across multi-step operations

Relationship to OpenTelemetry GenAI

This is an EXTENSION, not a replacement:

| Package | Purpose | Approach |
| --- | --- | --- |
| OTel GenAI (opentelemetry-instrumentation-openai-v2) | Auto-instrument LLM SDKs | Automatic (monkey-patching) |
| Last9 GenAI (last9-genai) | Add conversation/workflow tracking | Context-based enrichment |

You can use:

  1. Last9 GenAI alone - Full conversation and workflow tracking
  2. Both together - OTel auto-traces + Last9 adds conversation/workflow context (recommended!)

See Working with OTel Auto-Instrumentation for combined usage.

Installation

Basic:

pip install last9-genai

With OTLP export (recommended):

pip install last9-genai[otlp]

Requirements:

  • Python 3.10+
  • opentelemetry-api>=1.20.0
  • opentelemetry-sdk>=1.20.0

Quick Start

Note: The examples below use client to represent your LLM client. Initialize your preferred provider:

# OpenAI
from openai import OpenAI
client = OpenAI()

# Or Anthropic
from anthropic import Anthropic
anthropic_client = Anthropic()

# Or any other provider (Google, Cohere, etc.)

The SDK works with any LLM provider - just use your client normally!

Track Conversations (Recommended)

Automatically track multi-turn conversations with zero manual instrumentation:

from last9_genai import conversation_context, Last9SpanProcessor
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider

# Setup tracing with Last9 processor
provider = TracerProvider()
trace.set_tracer_provider(provider)
provider.add_span_processor(Last9SpanProcessor())

# Track conversations automatically - works with any LLM provider
with conversation_context(conversation_id="session_123", user_id="user_456"):
    # OpenAI
    response1 = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}]
    )

    # Anthropic (same context!)
    response2 = anthropic_client.messages.create(
        model="claude-sonnet-4",
        messages=[{"role": "user", "content": "How are you?"}]
    )
    # Both calls automatically have conversation_id = "session_123"!

Track Workflows

Track complex multi-step AI operations:

from last9_genai import workflow_context

# Track entire workflow with automatic tagging
with workflow_context(workflow_id="rag_search", workflow_type="retrieval"):
    # All operations automatically tagged with workflow_id
    docs = retrieve_documents(query)  # Tagged
    context = rerank_documents(docs)   # Tagged
    response = generate_answer(context) # Tagged
    # Full workflow visibility with zero manual instrumentation!

# Nest workflows and conversations
with conversation_context(conversation_id="support_123"):
    with workflow_context(workflow_id="order_lookup"):
        # Both conversation AND workflow tracked automatically
        result = lookup_and_respond()

Track Agents

Track agent identity using OTel GenAI semantic conventions (gen_ai.agent.*):

from last9_genai import agent_context

# Track agent identity — all child spans get gen_ai.agent.* attributes
with agent_context(agent_id="support_bot_v2", agent_name="Support Bot", agent_version="2.0"):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Help me with my order"}]
    )
    # Span automatically has gen_ai.agent.id, gen_ai.agent.name, gen_ai.agent.version

# Nest with conversations for full context
with conversation_context(conversation_id="session_123", user_id="user_456"):
    with agent_context(agent_id="router_agent", agent_name="Router"):
        route = classify_intent(query)

    with agent_context(agent_id="support_agent", agent_name="Support"):
        response = handle_support(query)
    # Each agent's spans are tagged separately, both share the conversation

Decorator Pattern (Zero-Touch)

Use @observe() for automatic tracking of everything:

from last9_genai import observe

@observe()  # That's it!
def call_llm(prompt: str):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    return response

# Automatically tracks:
# - Input (prompt)
# - Output (response)
# - Latency (span duration)
# - Context (conversation_id, workflow_id if set)

# Works seamlessly with context managers
with conversation_context(conversation_id="session_456"):
    response = call_llm("Explain quantum computing")
    # Span automatically has conversation_id!

Optional: Cost Tracking

Add cost monitoring by providing model pricing:

from last9_genai import ModelPricing

# Add pricing when creating processor
processor = Last9SpanProcessor(custom_pricing={
    "gpt-4o": ModelPricing(input=2.50, output=10.0),
    "claude-sonnet-4-5": ModelPricing(input=3.0, output=15.0),
})

# Or with decorator
pricing = {"gpt-4o": ModelPricing(input=2.50, output=10.0)}

@observe(pricing=pricing)
def call_llm(prompt: str):
    # Now also tracks cost automatically
    return client.chat.completions.create(...)


Tags and Categories

Add tags and categories for better filtering and organization in your observability platform:

from last9_genai import observe

@observe(
    tags=["production", "customer_support"],
    metadata={
        "category": "customer_support",  # Appears in Last9 dashboard Category column
        "version": "1.0.0",
        "priority": "high"
    }
)
def handle_support_query(query: str):
    """Categorized LLM call with metadata"""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": query}]
    )
    return response

# Categories automatically appear in Last9 dashboard:
# - Category column in traces table
# - Category filter dropdown
# - Enhanced trace details

# Use underscores for multi-word categories:
@observe(metadata={"category": "data_analysis"})  # Shows as "data analysis"
def analyze_data(data: str):
    return client.chat.completions.create(...)

Common categories:

  • customer_support, conversational_ai, code_assistant
  • data_analysis, content_generation, summarization
  • translation, research, qa_automation

Working with OTel Auto-Instrumentation

Recommended: Combine OTel auto-instrumentation with Last9 extensions:

# Step 1: Auto-instrument with OpenTelemetry (standard attributes)
from opentelemetry.instrumentation.openai_v2 import OpenAIInstrumentor
OpenAIInstrumentor().instrument()

# Step 2: Add Last9 extensions (cost, workflows)
from last9_genai import Last9GenAI, ModelPricing

l9 = Last9GenAI(custom_pricing={
    "gpt-4o": ModelPricing(input=2.50, output=10.0),
})

# Now make LLM calls
from openai import OpenAI
client = OpenAI()

# OTel automatically traces this call (standard attributes)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Last9 adds cost on top of auto-traced span
from opentelemetry import trace
span = trace.get_current_span()
usage = {
    "input_tokens": response.usage.prompt_tokens,
    "output_tokens": response.usage.completion_tokens,
}
cost = l9.add_llm_cost_attributes(span, "gpt-4o", usage)
print(f"Cost: ${cost.total:.6f}")

Result: You get standard OTel attributes (automatic) + Last9 cost/workflow (manual).

Capturing Prompts, Completions, and Tool Calls

opentelemetry-instrumentation-openai-v2 (v2.x) follows the new OpenTelemetry GenAI semantic conventions and emits message content, tool calls, and completions as OTel log events, not as span attributes. The Last9 LLM dashboard reads span attributes / events, so without a bridge those payloads never reach the dashboard.

Last9LogToSpanProcessor listens to those log events and promotes their payloads onto the currently active span:

  • gen_ai.prompt (JSON array of prompt messages)
  • gen_ai.completion (JSON array of completion choices)
  • span events gen_ai.content.prompt / gen_ai.content.completion
  • indexed gen_ai.prompt.{i}.* / gen_ai.completion.{i}.* (AgentOps / Traceloop compatible)

Wire the bridge into both the tracer and logger providers:

from opentelemetry import trace, _logs
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk._logs import LoggerProvider
from opentelemetry.instrumentation.openai_v2 import OpenAIInstrumentor

from last9_genai import Last9SpanProcessor, Last9LogToSpanProcessor

log_bridge = Last9LogToSpanProcessor()

tracer_provider = TracerProvider()
tracer_provider.add_span_processor(Last9SpanProcessor(log_processor=log_bridge))
trace.set_tracer_provider(tracer_provider)

logger_provider = LoggerProvider()
logger_provider.add_log_record_processor(log_bridge)
_logs.set_logger_provider(logger_provider)

import os
os.environ["OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT"] = "true"
OpenAIInstrumentor().instrument(logger_provider=logger_provider)

After this, every LLM call instrumented by openai-v2 has its full prompt and completion content available on the span.

Python 3.14 users: pin wrapt<2. opentelemetry-instrumentation-openai-v2 2.3b0 calls wrap_function_wrapper(module=..., name=..., wrapper=...) and wrapt 2.0 renamed the first kwarg to target=. Without the pin, instrumentation fails silently and no log events are emitted.

Usage Examples

Multi-Turn Conversations

Track conversations across multiple turns automatically:

from last9_genai import conversation_context

# Track a complete conversation session
with conversation_context(conversation_id="support_session_456", user_id="user_456"):
    # Turn 1
    response1 = client.chat.completions.create(
        messages=[{"role": "user", "content": "I need help with my order"}]
    )

    # Turn 2
    response2 = client.chat.completions.create(
        messages=[
            {"role": "user", "content": "I need help with my order"},
            {"role": "assistant", "content": response1.choices[0].message.content},
            {"role": "user", "content": "Order #12345"}
        ]
    )

    # Both calls automatically tagged with:
    # - conversation_id = "support_session_456"
    # - user_id = "user_456"
    # All turns linked together for analysis!

Complex Workflows

Track multi-step AI workflows with automatic tagging:

from last9_genai import workflow_context

# RAG workflow example
with workflow_context(workflow_id="rag_pipeline", workflow_type="retrieval"):
    # Step 1: Query expansion (automatically tagged)
    expanded_query = expand_query(user_question)

    # Step 2: Retrieval (automatically tagged)
    documents = vector_search(expanded_query)

    # Step 3: Reranking (automatically tagged)
    relevant_docs = rerank(documents, user_question)

    # Step 4: Generation (automatically tagged)
    response = generate_answer(relevant_docs, user_question)

# All 4 steps automatically have:
# - workflow_id = "rag_pipeline"
# - workflow_type = "retrieval"
# Perfect for analyzing bottlenecks and performance!

Nested Workflows and Conversations

Combine conversation and workflow tracking:

# Track conversation
with conversation_context(conversation_id="user_session_789", user_id="user_789"):

    # Inside conversation, track a specific workflow
    with workflow_context(workflow_id="product_search", workflow_type="search"):
        # Search workflow steps
        results = search_products(query)
        recommendations = rank_results(results)

    # Outside workflow, still in conversation
    followup = handle_followup_question()

# Result:
# - search_products and rank_results: both conversation_id AND workflow_id
# - handle_followup_question: only conversation_id
# Perfect granularity for analysis!

Tool/Function Tracking

Track tool calls (this uses the `l9` instance from the combined-usage example above and a tracer from `trace.get_tracer(__name__)`):

with tracer.start_span("gen_ai.tool.search") as span:
    l9.add_tool_attributes(
        span,
        tool_name="web_search",
        tool_type="search",
        arguments={"query": "weather"},
        result={"temp": 72},
        duration_ms=150
    )

OpenTelemetry Integration

Export to Last9

Set the OTLP endpoint and credentials in your environment:

export OTEL_EXPORTER_OTLP_ENDPOINT="https://otlp.last9.io:443"
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic YOUR_KEY"

Then configure the exporter in Python:

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Setup
trace.set_tracer_provider(TracerProvider())
otlp_exporter = OTLPSpanExporter()
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(otlp_exporter)
)

Export to Console (Development)

from opentelemetry.sdk.trace.export import ConsoleSpanExporter

console_exporter = ConsoleSpanExporter()
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(console_exporter)
)

Configuration

Disable Cost Tracking

# Track tokens only, skip cost calculation
l9 = Last9GenAI(enable_cost_tracking=False)

Custom Workflow Tracker

from last9_genai import WorkflowCostTracker

tracker = WorkflowCostTracker()
l9 = Last9GenAI(workflow_tracker=tracker)

Attributes Reference

Standard OpenTelemetry (Always Set)

gen_ai.system = "openai"
gen_ai.request.model = "gpt-4o"
gen_ai.usage.input_tokens = 150
gen_ai.usage.output_tokens = 250

Last9 Extensions (Optional)

# Cost (when pricing provided)
gen_ai.usage.cost_usd = 0.002875
gen_ai.usage.cost_input_usd = 0.000375
gen_ai.usage.cost_output_usd = 0.0025

# Classification
gen_ai.l9.span.kind = "llm"  # or "tool", "chain", "agent", "prompt"

# Workflow
workflow.id = "customer_support"
workflow.total_cost_usd = 0.015
workflow.llm_calls = 3

# Conversation
gen_ai.conversation.id = "session_123"
gen_ai.conversation.turn_number = 2

# Agent (OTel GenAI semantic conventions)
gen_ai.agent.id = "support_bot_v2"
gen_ai.agent.name = "Support Bot"
gen_ai.agent.version = "2.0"
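
As a sanity check on the cost attributes, per-million pricing converts to span cost as tokens × rate ÷ 1,000,000. For example, 150 input and 250 output tokens at the gpt-4o rates used earlier ($2.50 / $10.00 per 1M):

```python
def cost_usd(tokens: int, usd_per_million: float) -> float:
    """Cost in USD for a token count at a USD-per-1M-tokens rate."""
    return tokens * usd_per_million / 1_000_000

input_cost = cost_usd(150, 2.50)        # 0.000375
output_cost = cost_usd(250, 10.0)       # 0.0025
total_cost = input_cost + output_cost   # ≈ 0.002875
```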

Model Pricing

No default pricing included. You provide pricing for models you use.

Finding Pricing

Pricing Format

All prices in USD per million tokens:

ModelPricing(
    input=3.0,   # $3 per 1M input tokens
    output=15.0  # $15 per 1M output tokens
)

Conversion (for input=3.0, i.e. $3 per 1M input tokens):

  • Per-token: $0.000003
  • Per-1K tokens: $0.003
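
Going the other way: if a provider quotes per-token or per-1K prices, scale up to per-million before building ModelPricing. `per_million` here is a throwaway helper for illustration, not part of the SDK:

```python
def per_million(price_usd: float, unit_tokens: int) -> float:
    """Scale a price quoted per `unit_tokens` tokens to per-1M tokens."""
    return price_usd * 1_000_000 / unit_tokens

rate_from_per_token = per_million(0.000003, 1)  # ≈ 3.0
rate_from_per_1k = per_million(0.003, 1_000)    # ≈ 3.0
```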

Common Models (February 2026)

custom_pricing = {
    # Anthropic
    "claude-opus-4-6": ModelPricing(input=15.0, output=75.0),
    "claude-sonnet-4-5": ModelPricing(input=3.0, output=15.0),
    "claude-haiku-4-5": ModelPricing(input=0.8, output=4.0),

    # OpenAI
    "gpt-4o": ModelPricing(input=2.50, output=10.0),
    "gpt-4o-mini": ModelPricing(input=0.15, output=0.60),
    "o1": ModelPricing(input=15.0, output=60.0),

    # Google
    "gemini-1.5-pro": ModelPricing(input=1.25, output=10.0),
    "gemini-2.0-flash": ModelPricing(input=0.075, output=0.30),
}

Special Cases

Azure OpenAI:

custom_pricing = {
    "azure/gpt-4o": ModelPricing(input=2.50, output=10.0),
}

Self-hosted (free):

custom_pricing = {
    "ollama/llama3.1": ModelPricing(input=0.0, output=0.0),
}

Fine-tuned:

custom_pricing = {
    "ft:gpt-3.5-turbo:org:model:id": ModelPricing(input=12.0, output=16.0),
}
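
Pricing lookups match the model string exactly, so keys must match what your spans report. If model names arrive with varying provider prefixes, a small fallback helper (hypothetical, not part of the SDK) can try the exact key first and then the unprefixed name:

```python
def lookup_pricing(custom_pricing: dict, model: str):
    """Exact-match lookup, falling back to the name after 'provider/'."""
    if model in custom_pricing:
        return custom_pricing[model]
    return custom_pricing.get(model.split("/", 1)[-1])

# Stand-in values; in practice these would be ModelPricing instances.
pricing = {"azure/gpt-4o": "azure_rate", "gpt-4o": "openai_rate"}
```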

Examples

See the examples/ directory in the repository for basic usage, auto-tracking (recommended), and advanced examples.

Contributing

Contributions welcome! Please:

  1. Fork the repo
  2. Create a feature branch
  3. Add tests
  4. Submit a PR

License

MIT License - see LICENSE

Support


Built with ❤️ by Last9



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

last9_genai-1.2.0.tar.gz (78.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

last9_genai-1.2.0-py3-none-any.whl (30.4 kB view details)

Uploaded Python 3

File details

Details for the file last9_genai-1.2.0.tar.gz.

File metadata

  • Download URL: last9_genai-1.2.0.tar.gz
  • Upload date:
  • Size: 78.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for last9_genai-1.2.0.tar.gz
Algorithm Hash digest
SHA256 e7a7c006931ef2282ec47d7110a35dd76e1d01e6801f19e8893f4668f8c11710
MD5 4107c5bcb801e6cbd2de1b6e374cf6e3
BLAKE2b-256 7b441e88dde3b16a74cc1336a232148f90bf1bcc81cb39856181f14322333a46

See more details on using hashes here.

File details

Details for the file last9_genai-1.2.0-py3-none-any.whl.

File metadata

  • Download URL: last9_genai-1.2.0-py3-none-any.whl
  • Upload date:
  • Size: 30.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for last9_genai-1.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 8cf4492b675701fd32c34df7510cdeda9791a849901af2e581ee3b1a69e38347
MD5 29c215bcc98739002a134821ad0b62ef
BLAKE2b-256 f0d31ac924dc1f3fe21eb81cbf15654560815703c3e28feba1daf86386e052d8

See more details on using hashes here.
