OpenSearch AI Observability SDK — OTel-native tracing and scoring for LLM applications

Project description

OpenSearch GenAI SDK

OTel-native tracing and scoring for LLM applications. Instrument your AI workflows with standard OpenTelemetry spans and submit evaluation scores — all routed to OpenSearch through a single OTLP pipeline.

Features

- One-line setup: register() configures the full OTel pipeline (TracerProvider, exporter, auto-instrumentation)
- observe(): single decorator / context manager that creates OTel spans with GenAI semantic convention attributes
- enrich(): add model, token usage, and other GenAI attributes to the active span from anywhere in your code
- Auto-instrumentation: automatically discovers and activates installed instrumentor packages (OpenAI, Anthropic, Bedrock, LangChain, etc.)
- Scoring: score() emits evaluation metrics as OTel spans at span, trace, or session level
- Benchmarks: evaluate() runs your agent against a dataset with scorers; Benchmark uploads results from any eval framework (RAGAS, DeepEval, pytest)
- AWS SigV4: built-in SigV4 signing for AWS-hosted OpenSearch and Data Prepper endpoints
- Zero lock-in: remove a decorator and your code still works; everything is standard OTel

Requirements

Python: 3.10, 3.11, 3.12, or 3.13
OpenTelemetry SDK: >=1.20.0, <2

Installation

pip install opensearch-genai-observability-sdk-py

The core package includes the OTel SDK and exporters. Auto-instrumentation of LLM libraries is opt-in — install only the providers you use:

# Quote the package spec so shells such as zsh do not expand the brackets

# Single provider
pip install "opensearch-genai-observability-sdk-py[openai]"
pip install "opensearch-genai-observability-sdk-py[anthropic]"
pip install "opensearch-genai-observability-sdk-py[bedrock]"
pip install "opensearch-genai-observability-sdk-py[langchain]"

# Multiple providers
pip install "opensearch-genai-observability-sdk-py[openai,anthropic]"

# All instrumentors at once
pip install "opensearch-genai-observability-sdk-py[otel-instrumentors]"

# Everything
pip install "opensearch-genai-observability-sdk-py[all]"

Available extras: openai, anthropic, bedrock, google, langchain, llamaindex, opensearch (for OpenSearchTraceRetriever), otel-instrumentors (all instrumentors), all

Quick Start

from opensearch_genai_observability_sdk_py import register, observe, Op, enrich, score

# 1. Initialize tracing (one line)
register(endpoint="http://localhost:21890/opentelemetry/v1/traces")

# 2. Trace your functions
@observe(name="web_search", op=Op.EXECUTE_TOOL)
def search(query: str) -> list[dict]:
    return [{"title": f"Result for: {query}"}]

@observe(name="research_agent", op=Op.INVOKE_AGENT)
def research(query: str) -> str:
    results = search(query)
    enrich(model="gpt-4.1", provider="openai", input_tokens=150, output_tokens=50)
    return f"Summary of: {results}"

# 3. Use context managers for inline blocks
@observe(name="qa_pipeline", op=Op.INVOKE_AGENT)
def run(question: str) -> str:
    answer = research(question)
    with observe("safety_check", op="guardrail"):
        enrich(safe=True)
    return answer

result = run("What is OpenSearch?")

# 4. Submit scores (after workflow completes)
score(name="relevance", value=0.95, trace_id="...")

This produces the following span tree:

invoke_agent qa_pipeline
├── invoke_agent research_agent
│   └── execute_tool web_search
└── guardrail safety_check

Architecture

┌──────────────────────────────────────────────────────┐
│                   Your Application                    │
│                                                       │
│  @observe(op=Op.INVOKE_AGENT)   enrich()   score()   │
│  with observe("step", op=...)                         │
│                     │                                 │
│         opensearch-genai-observability-sdk-py         │
├──────────────────────────────────────────────────────┤
│  register()                                           │
│  ┌──────────────────────────────────────────────┐    │
│  │  TracerProvider                               │    │
│  │  ├── Resource (service.name)                  │    │
│  │  ├── BatchSpanProcessor                       │    │
│  │  │   └── OTLPSpanExporter (HTTP or gRPC)      │    │
│  │  │       └── SigV4 signing (AWS endpoints)    │    │
│  │  └── Auto-instrumentation                     │    │
│  │      ├── openai, anthropic, bedrock, ...      │    │
│  │      ├── langchain, llamaindex, haystack      │    │
│  │      └── chromadb, pinecone, qdrant, ...      │    │
│  └──────────────────────────────────────────────┘    │
└───────────────────────┬──────────────────────────────┘
                        │ OTLP (HTTP/gRPC)
                        ▼
               ┌─────────────────┐
               │  Data Prepper /  │
               │  OTel Collector  │
               └────────┬────────┘
                        │
                        ▼
               ┌─────────────────┐
               │   OpenSearch     │
               │  ├── traces      │
               │  └── scores      │
               └─────────────────┘

API Reference

`register()`

Configures the OTel tracing pipeline. Call once at startup.

register(
    endpoint="http://my-collector:4318/v1/traces",  # or use env vars
    service_name="my-app",
    batch=True,            # BatchSpanProcessor (True) or Simple (False)
    auto_instrument=True,  # discover installed instrumentor packages
)

Endpoint resolution (priority order):

1. endpoint= parameter: full URL, used as-is
2. OTEL_EXPORTER_OTLP_TRACES_ENDPOINT env var: full URL, used as-is
3. OTEL_EXPORTER_OTLP_ENDPOINT env var: base URL, /v1/traces appended automatically
4. http://localhost:21890/opentelemetry/v1/traces: Data Prepper default
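
The priority order above can be sketched in plain Python. This is an illustration of the documented rules, not the SDK's actual code; the function name resolve_endpoint, the env argument, and the trailing-slash handling are assumptions made for the example.

```python
import os

# Documented Data Prepper default
DEFAULT_ENDPOINT = "http://localhost:21890/opentelemetry/v1/traces"


def resolve_endpoint(endpoint=None, env=None):
    """Hypothetical helper that mirrors the documented priority order."""
    env = os.environ if env is None else env

    # 1. Explicit parameter: full URL, used as-is
    if endpoint:
        return endpoint

    # 2. Traces-specific env var: full URL, used as-is
    traces_url = env.get("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT")
    if traces_url:
        return traces_url

    # 3. Base env var: /v1/traces is appended
    #    (stripping a trailing slash first is an assumption of this sketch)
    base_url = env.get("OTEL_EXPORTER_OTLP_ENDPOINT")
    if base_url:
        return base_url.rstrip("/") + "/v1/traces"

    # 4. Fall back to the default
    return DEFAULT_ENDPOINT
```

Passing env explicitly keeps the helper easy to test without touching the real process environment.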

Protocol resolution (priority order):

1. protocol= parameter: "http" or "grpc"
2. OTEL_EXPORTER_OTLP_TRACES_PROTOCOL env var
3. OTEL_EXPORTER_OTLP_PROTOCOL env var
4. Inferred from URL scheme

URL schemes:

URL scheme	Transport
`http://` / `https://`	HTTP OTLP (protobuf)
`grpc://`	gRPC (insecure)
`grpcs://`	gRPC (TLS)

http/json is not supported. A ValueError is raised if the protocol contradicts a grpc:// or grpcs:// URL scheme.
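
To make the protocol rules concrete, here is a minimal sketch of the resolution order and the scheme check. It is not the SDK's implementation: resolve_protocol is a hypothetical name, and treating "http/protobuf" and "http" as the same transport is an assumption based on the tables in this README.

```python
import os
from urllib.parse import urlsplit


def resolve_protocol(endpoint, protocol=None, env=None):
    """Hypothetical helper: returns "http" or "grpc" for an endpoint URL."""
    env = os.environ if env is None else env
    scheme = urlsplit(endpoint).scheme.lower()
    grpc_scheme = scheme in ("grpc", "grpcs")

    # Priority: parameter, traces-specific env var, generic env var
    requested = (
        protocol
        or env.get("OTEL_EXPORTER_OTLP_TRACES_PROTOCOL")
        or env.get("OTEL_EXPORTER_OTLP_PROTOCOL")
    )

    # Nothing requested: infer from the URL scheme
    if requested is None:
        return "grpc" if grpc_scheme else "http"

    requested = requested.lower()
    if requested in ("http", "http/protobuf"):
        resolved = "http"
    elif requested == "grpc":
        resolved = "grpc"
    else:
        # Covers http/json, which the README says is not supported
        raise ValueError(f"unsupported protocol: {requested}")

    # A grpc:// or grpcs:// URL cannot be combined with an HTTP protocol
    if grpc_scheme and resolved != "grpc":
        raise ValueError("protocol contradicts the gRPC URL scheme")
    return resolved
```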

Authenticated endpoints (e.g. AWS OSIS): pass a custom exporter via exporter=:

from opensearch_genai_observability_sdk_py.exporters import AWSSigV4OTLPExporter

register(
    exporter=AWSSigV4OTLPExporter(
        endpoint="https://pipeline.us-east-1.osis.amazonaws.com/v1/traces",
        service="osis",
    )
)

AWSSigV4OTLPExporter is HTTP-only. AWS OSIS does not expose a gRPC endpoint.

`observe()`

Single tracing primitive — works as both a decorator and a context manager. Creates an OTel span with GenAI semantic convention attributes.

As a decorator:

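from opensearch_genai_observability_sdk_py import observe, enrich, Op

@observe(name="planner", op=Op.INVOKE_AGENT)
def plan(query: str) -> str:
    enrich(model="gpt-4.1")
    return call_llm(query)  # call_llm is a placeholder for your own LLM client call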

# Without parentheses (uses function name, no op)
@observe
def my_function():
    ...

As a context manager:

with observe("thinking", op=Op.CHAT) as span:
    enrich(model="gpt-4.1", input_tokens=1500)
    result = call_llm(prompt)
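
For readers wondering how one object can serve as both a decorator and a context manager, the standard library's contextlib.ContextDecorator is one common way to get that behavior. The sketch below only illustrates the pattern; the traced class and the events list are hypothetical, and observe() may well be implemented differently (for example, it also supports the bare @observe form and async functions, which this sketch does not).

```python
from contextlib import ContextDecorator

# Stand-in for a span exporter: records start/end events in order
events = []


class traced(ContextDecorator):
    """Hypothetical tracing primitive usable as decorator or context manager."""

    def __init__(self, name):
        self.name = name

    def __enter__(self):
        events.append(f"start {self.name}")
        return self

    def __exit__(self, exc_type, exc, tb):
        events.append(f"end {self.name}")
        return False  # never swallow exceptions


@traced("planner")
def plan(query):
    # Nested block: recorded inside the enclosing "planner" entry
    with traced("thinking"):
        return query.upper()


result = plan("hello")
```

After the call, events holds the nested start/end pairs in the same order a span tree would show them.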

Parameters:

Parameter	Type	Default	Description
`name`	`str`	Function `__qualname__` (decorator) or `"unnamed"` (context manager)	Span name
`op`	`str`	`None`	`gen_ai.operation.name` value. Use `Op` constants or any custom string
`kind`	`SpanKind`	`INTERNAL`	OTel span kind. Use `SpanKind.CLIENT` for external service calls

Span naming: when op is set, the span name is "{op} {name}" (e.g. "invoke_agent planner"). This applies to both the well-known Op constants and custom op strings.

Attributes set automatically:

Attribute	When set
`gen_ai.operation.name`	When `op` is provided
`gen_ai.agent.name`	All ops except `execute_tool`
`gen_ai.tool.name`	When `op=Op.EXECUTE_TOOL`
`gen_ai.input.messages` / `gen_ai.output.messages`	All ops except `execute_tool` (decorator only)
`gen_ai.tool.call.arguments` / `gen_ai.tool.call.result`	When `op=Op.EXECUTE_TOOL` (decorator only)
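
The naming rule and the table above can be restated as a small lookup. This is a reading aid, not SDK code: both function names are hypothetical, and two behaviors are assumptions of the sketch because the README does not spell them out: a span with no op uses the bare name, and op=None is treated like any other non-tool op.

```python
def span_name(name, op=None):
    """Span name per the documented "{op} {name}" pattern (bare name if no op: assumed)."""
    return f"{op} {name}" if op else name


def automatic_attribute_keys(op=None, decorator=True):
    """Attribute keys the table says are set automatically, in table order."""
    keys = []
    if op:
        keys.append("gen_ai.operation.name")
    if op == "execute_tool":
        keys.append("gen_ai.tool.name")
        if decorator:  # arguments/result are captured by the decorator form only
            keys += ["gen_ai.tool.call.arguments", "gen_ai.tool.call.result"]
    else:
        keys.append("gen_ai.agent.name")
        if decorator:  # messages are captured by the decorator form only
            keys += ["gen_ai.input.messages", "gen_ai.output.messages"]
    return keys
```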

Supported function types: sync, async, generators, async generators. Errors are captured as span status + exception events.

`Op`

Constants for well-known gen_ai.operation.name values. Any custom string is also accepted.

Constant	Value	Use for
`Op.CHAT`	`"chat"`	LLM chat completions
`Op.INVOKE_AGENT`	`"invoke_agent"`	Agent invocations
`Op.CREATE_AGENT`	`"create_agent"`	Agent creation/setup
`Op.EXECUTE_TOOL`	`"execute_tool"`	Tool/function calls
`Op.RETRIEVAL`	`"retrieval"`	RAG retrieval steps
`Op.EMBEDDINGS`	`"embeddings"`	Embedding generation
`Op.GENERATE_CONTENT`	`"generate_content"`	Content generation
`Op.TEXT_COMPLETION`	`"text_completion"`	Text completions

Custom strings work too: @observe(name="check", op="guardrail").

`enrich()`

Add GenAI semantic convention attributes to the currently active span. Call from inside an @observe-decorated function or a with observe(...) block.

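from opensearch_genai_observability_sdk_py import observe, enrich, Op

@observe(name="chat", op=Op.CHAT)
def chat(prompt: str) -> str:
    result = call_llm(prompt)  # call_llm is a placeholder for your own LLM client call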
    enrich(
        model="gpt-4.1",
        provider="openai",
        input_tokens=150,
        output_tokens=50,
        temperature=0.7,
    )
    return result

Parameters:

Parameter	Attribute	Description
`model`	`gen_ai.request.model`	Model name
`provider`	`gen_ai.provider.name`	Provider name (openai, anthropic, etc.)
`input_tokens`	`gen_ai.usage.input_tokens`	Input token count
`output_tokens`	`gen_ai.usage.output_tokens`	Output token count
`total_tokens`	`gen_ai.usage.total_tokens`	Total token count
`response_id`	`gen_ai.response.id`	Response/completion ID
`finish_reason`	`gen_ai.response.finish_reasons`	Finish reason(s)
`temperature`	`gen_ai.request.temperature`	Temperature setting
`max_tokens`	`gen_ai.request.max_tokens`	Max tokens setting
`session_id`	`gen_ai.session.id`	Session/conversation ID
`**extra`	As provided	Any additional key-value attributes
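
The parameter table amounts to a keyword-to-attribute mapping, which the following sketch writes out. It is illustrative only: to_span_attributes is a hypothetical helper, and skipping None values is an assumption rather than documented behavior.

```python
# Mapping copied from the parameter table above
ENRICH_ATTRIBUTES = {
    "model": "gen_ai.request.model",
    "provider": "gen_ai.provider.name",
    "input_tokens": "gen_ai.usage.input_tokens",
    "output_tokens": "gen_ai.usage.output_tokens",
    "total_tokens": "gen_ai.usage.total_tokens",
    "response_id": "gen_ai.response.id",
    "finish_reason": "gen_ai.response.finish_reasons",
    "temperature": "gen_ai.request.temperature",
    "max_tokens": "gen_ai.request.max_tokens",
    "session_id": "gen_ai.session.id",
}


def to_span_attributes(**kwargs):
    """Translate enrich()-style keyword arguments into span attribute keys."""
    attributes = {}
    for key, value in kwargs.items():
        if value is None:
            continue  # assumption: unset values are not written to the span
        # Known parameters are renamed; extra keys pass through as provided
        attributes[ENRICH_ATTRIBUTES.get(key, key)] = value
    return attributes
```

For example, to_span_attributes(model="gpt-4.1", safe=True) yields one renamed key and one pass-through key.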

`score()`

Submits evaluation scores as OTel spans. Use any evaluation framework you prefer (autoevals, RAGAS, custom) and submit the results through score().

The score span is attached to the evaluated trace so it appears in the same trace waterfall as the spans it evaluates.

Two scoring levels:

from opensearch_genai_observability_sdk_py import score

# Span-level: score a specific span (score becomes a child of that span)
score(
    name="accuracy",
    value=0.95,
    trace_id="6ebb9835f43af1552f2cebb9f5165e39",
    span_id="89829115c2128845",
    explanation="Weather data matches ground truth",
)

# Trace-level: score the entire trace (score attaches to the root span)
score(
    name="relevance",
    value=0.92,
    trace_id="6ebb9835f43af1552f2cebb9f5165e39",
    explanation="Response addresses the user's query",
    attributes={
        "test.suite.name": "nightly_eval",
        "test.case.result.status": "pass",
    },
)

Parameters:

Parameter	Type	Description
`name`	`str`	Metric name (e.g., `"relevance"`, `"factuality"`)
`value`	`float`	Numeric score
`trace_id`	`str`	Hex trace ID of the trace being scored
`span_id`	`str`	Hex span ID for span-level scoring. When omitted, attaches to root span
`label`	`str`	Human-readable label (`"pass"`, `"relevant"`, `"correct"`)
`explanation`	`str`	Evaluator justification (truncated to 500 chars)
`response_id`	`str`	LLM completion ID for correlation
`attributes`	`dict`	Additional span attributes (keys used as-is, e.g. `test.*` from semantic-conventions#3398)

Scores follow the OTel GenAI semantic conventions with gen_ai.evaluation.* attributes. Each score span also emits a gen_ai.evaluation.result event per the OTel GenAI event spec.
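
Before calling score(), it can help to validate IDs and see roughly what ends up on the span. The sketch below is an assumption-laden illustration: build_score_attributes is hypothetical, the ID lengths follow the W3C Trace Context format (32 hex characters for a trace ID, 16 for a span ID), and the gen_ai.evaluation.* key names are taken from the OTel GenAI evaluation event convention as understood here, so the SDK's exact keys may differ. The 500-character truncation comes from the table above.

```python
import re

# W3C Trace Context: 32 hex chars for a trace ID, 16 for a span ID
_TRACE_ID = re.compile(r"[0-9a-f]{32}")
_SPAN_ID = re.compile(r"[0-9a-f]{16}")


def build_score_attributes(name, value, trace_id, span_id=None,
                           label=None, explanation=None):
    """Hypothetical helper: validate IDs and assemble evaluation attributes."""
    if not _TRACE_ID.fullmatch(trace_id.lower()):
        raise ValueError("trace_id must be 32 hex characters")
    if span_id is not None and not _SPAN_ID.fullmatch(span_id.lower()):
        raise ValueError("span_id must be 16 hex characters")

    # Attribute keys are assumed, see the note above
    attributes = {
        "gen_ai.evaluation.name": name,
        "gen_ai.evaluation.score.value": float(value),
    }
    if label is not None:
        attributes["gen_ai.evaluation.score.label"] = label
    if explanation is not None:
        # Documented behavior: explanations are truncated to 500 characters
        attributes["gen_ai.evaluation.explanation"] = explanation[:500]
    return attributes
```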

`evaluate()`

Run a task against a dataset, score the outputs, and record the results as OTel spans. Agent execution spans are children of each case span, giving a full trace waterfall per case.

from opensearch_genai_observability_sdk_py import evaluate, EvalScore, observe, Op

@observe(op=Op.INVOKE_AGENT)
def my_agent(question: str) -> str:
    return call_llm(question)

def accuracy(input, output, expected) -> EvalScore:
    return EvalScore(name="accuracy", value=1.0 if expected in output else 0.0)

result = evaluate(
    name="rag-agent",
    task=my_agent,
    data=[
        {"input": "What is Python?", "expected": "programming language"},
        {"input": "What causes rain?", "expected": "water vapor"},
    ],
    scores=[accuracy],
    metadata={"agent_version": "v2"},
    record_io=True,
)

Produces:

test_suite_run rag-agent
 └── test_case [case: What is Python?]
      └── invoke_agent my_agent

Parameters:

Parameter	Type	Description
`name`	`str`	Benchmark name (`test.suite.name`). Stable across runs
`task`	`Callable`	Function that takes input and returns output. Decorate with `@observe()` for tracing
`data`	`list[dict]`	Dicts with `"input"` and optionally `"expected"`, `"case_id"`
`scores`	`list[Callable]`	Scorer functions. Each receives `(input, output, expected)` and returns `EvalScore`, `list[EvalScore]`, or `float`
`metadata`	`dict`	Attached to root span. Reserved keys (`test.`, `gen_ai.`) are filtered
`record_io`	`bool`	Record input/output/expected as span attributes (default `False`)

Returns BenchmarkResult with .summary (aggregate stats) and .cases (per-case results).
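
The evaluation loop described above (run the task per case, call each scorer, aggregate) can be sketched without the SDK. Everything here is a stand-in: Score plays the role of EvalScore, run_eval and normalize are hypothetical, the plain dict result only loosely mirrors .summary and .cases, and naming a bare float after its scorer function is an assumption. No spans are emitted.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Score:
    """Hypothetical stand-in for EvalScore."""
    name: str
    value: float


def normalize(result, scorer_name):
    """Accept a Score, a list of Scores, or a bare float from a scorer."""
    if isinstance(result, Score):
        return [result]
    if isinstance(result, list):
        return result
    return [Score(name=scorer_name, value=float(result))]


def run_eval(task, data, scorers):
    """Run task on every case, score the output, and average per metric."""
    cases = []
    for row in data:
        output = task(row["input"])
        scores = []
        for scorer in scorers:
            raw = scorer(row["input"], output, row.get("expected"))
            scores += normalize(raw, scorer.__name__)
        cases.append({"input": row["input"], "output": output, "scores": scores})

    names = sorted({s.name for case in cases for s in case["scores"]})
    summary = {
        n: mean(s.value for case in cases for s in case["scores"] if s.name == n)
        for n in names
    }
    return {"cases": cases, "summary": summary}


def answer(question):
    return "Python is a programming language"


def contains_expected(input, output, expected):
    return 1.0 if expected in output else 0.0


result = run_eval(
    task=answer,
    data=[
        {"input": "What is Python?", "expected": "programming language"},
        {"input": "What causes rain?", "expected": "water vapor"},
    ],
    scorers=[contains_expected],
)
```

With the canned answer above, the first case scores 1.0 and the second 0.0, so the summary for contains_expected averages to 0.5.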

`Benchmark`

Upload pre-computed evaluation results from any framework (RAGAS, DeepEval, pytest, custom) as OTel spans. Use when you already have results and want to visualize them in agent-health or OpenSearch Dashboards.

from opensearch_genai_observability_sdk_py import Benchmark

# Upload results from your eval pipeline
with Benchmark(name="nightly-eval", metadata={"model": "gpt-4o"}, record_io=True) as b:
    b.log(input="What is Python?", output="A language", scores={"accuracy": 1.0})
    b.log(input="Capital of France?", output="Paris", scores={"accuracy": 1.0})

# Link to existing agent traces (click through from failed case → agent trace)
with Benchmark(name="ci-eval") as b:
    b.log(
        input="query",
        output="answer",
        scores={"accuracy": 0.9},
        trace_id="6ebb9835f43af1552f2cebb9f5165e39",
        span_id="89829115c2128845",
    )

`OpenSearchTraceRetriever`

Retrieves GenAI trace spans from OpenSearch. Works with any agent library that emits OTel GenAI semantic convention spans indexed by Data Prepper into otel-v1-apm-span-*.

from opensearch_genai_observability_sdk_py import OpenSearchTraceRetriever

# Option 1: Basic auth (local / docker-compose)
retriever = OpenSearchTraceRetriever(
    host="https://localhost:9200",
    auth=("admin", "admin"),
    verify_certs=False,
)

# Option 2: AWS OpenSearch Service (SigV4) — use this OR Option 1, not both
import boto3
from opensearchpy import RequestsAWSV4SignerAuth

credentials = boto3.Session().get_credentials()
auth = RequestsAWSV4SignerAuth(credentials, "us-west-2", "es")
retriever = OpenSearchTraceRetriever(
    host="https://search-my-domain.us-west-2.es.amazonaws.com",
    auth=auth,
)

# Retrieve all spans for a session or trace
session = retriever.get_traces("my-conversation-id")
for trace in session.traces:
    for span in trace.spans:
        print(f"{span.operation_name}: {span.name} ({span.model})")

# List recent root spans (for discovering traces to evaluate)
roots = retriever.list_root_spans(services=["my-agent"], max_results=10)

# Filter by time
from datetime import datetime
roots = retriever.list_root_spans(services=["my-agent"], since=datetime(2026, 3, 16))

# Check which traces already have evaluation spans
evaluated = retriever.find_evaluated_trace_ids(["trace-id-1", "trace-id-2"])

Constructor:

Parameter	Type	Default	Description
`host`	`str`	`"https://localhost:9200"`	OpenSearch endpoint
`index`	`str`	`"otel-v1-apm-span-*"`	Index pattern for span data
`auth`	`tuple | RequestsAWSV4SignerAuth`	`None`	Basic auth tuple or SigV4 auth
`verify_certs`	`bool`	`True`	Verify TLS certificates

Methods:

Method	Returns	Description
`get_traces(identifier, max_spans=10000)`	`SessionRecord`	Fetch spans by conversation ID or trace ID
`list_root_spans(services=None, since=None, max_results=50)`	`list[SpanRecord]`	List recent root spans, optionally filtered by service
`find_evaluated_trace_ids(trace_ids)`	`set[str]`	Return subset of trace IDs that already have evaluation spans
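
To give a feel for what a call like list_root_spans has to ask OpenSearch, here is a sketch that builds a query body for root spans. It is not the retriever's actual query: root_span_query is hypothetical, and the field names (parentSpanId, serviceName, startTime) and the convention that root spans have an empty parentSpanId are assumptions about the Data Prepper span index mapping that should be checked against your own index.

```python
from datetime import datetime


def root_span_query(services=None, since=None, max_results=50):
    """Hypothetical OpenSearch query body for recent root spans."""
    # Assumption: root spans are stored with an empty parentSpanId
    filters = [{"term": {"parentSpanId": ""}}]
    if services:
        filters.append({"terms": {"serviceName": list(services)}})
    if since is not None:
        filters.append({"range": {"startTime": {"gte": since.isoformat()}}})
    return {
        "size": max_results,
        "sort": [{"startTime": {"order": "desc"}}],  # newest first
        "query": {"bool": {"filter": filters}},
    }


body = root_span_query(services=["my-agent"], since=datetime(2026, 3, 16), max_results=10)
```

A body like this could be passed to an OpenSearch client's search call against the otel-v1-apm-span-* index pattern.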

Requires the [opensearch] extra: pip install "opensearch-genai-observability-sdk-py[opensearch]"

Auto-Instrumented Libraries

register() automatically discovers and activates any installed instrumentor packages via OTel entry points. No code changes needed — install the extras for the providers you use and their calls are traced automatically.

Category	Extras / packages
LLM providers	`[openai]`, `[anthropic]`
Cloud AI	`[bedrock]`, `[google]` (Vertex AI + Generative AI)
Frameworks	`[langchain]`, `[llamaindex]`
All of the above + more	`[otel-instrumentors]`

[otel-instrumentors] includes all of the above plus Cohere, Mistral, Groq, Ollama, Together, Replicate, Writer, Voyage AI, Aleph Alpha, SageMaker, watsonx, Haystack, CrewAI, Agno, MCP, Transformers, ChromaDB, Pinecone, Qdrant, Weaviate, Milvus, LanceDB, and Marqo.

Configuration

Environment Variable	Description	Default
`OTEL_EXPORTER_OTLP_TRACES_ENDPOINT`	Full OTLP traces endpoint URL	—
`OTEL_EXPORTER_OTLP_ENDPOINT`	Base OTLP endpoint URL (`/v1/traces` appended)	—
`OTEL_EXPORTER_OTLP_TRACES_PROTOCOL`	Protocol for traces (`http/protobuf`, `grpc`)	—
`OTEL_EXPORTER_OTLP_PROTOCOL`	Protocol for all signals (`http/protobuf`, `grpc`)	—
`OTEL_SERVICE_NAME`	Service name for spans	`"default"`
`OPENSEARCH_PROJECT`	Project/service name (fallback)	`"default"`
`AWS_DEFAULT_REGION`	AWS region for SigV4 signing	auto-detected

When no endpoint env var is set, register() defaults to the Data Prepper endpoint: http://localhost:21890/opentelemetry/v1/traces.
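
The service-name rows in the table suggest a simple fallback chain, sketched below. This is a guess at the order, not confirmed SDK behavior: resolve_service_name is hypothetical, and the sketch assumes the service_name parameter wins over OTEL_SERVICE_NAME, which in turn wins over OPENSEARCH_PROJECT (described above as the fallback).

```python
import os


def resolve_service_name(service_name=None, env=None):
    """Hypothetical helper: pick the service name for emitted spans."""
    env = os.environ if env is None else env
    return (
        service_name                        # explicit parameter (assumed to win)
        or env.get("OTEL_SERVICE_NAME")     # standard OTel variable
        or env.get("OPENSEARCH_PROJECT")    # documented fallback
        or "default"                        # documented default
    )
```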

Examples

See the examples/ directory:

Example	Description
`01_tracing_basics.py`	`@observe` decorator, context manager, `enrich()`
`02_scoring.py`	Span-level, trace-level, and session-level scoring
`03_aws_sigv4.py`	AWS SigV4 authentication with `AWSSigV4OTLPExporter`
`04_async_tracing.py`	Async function tracing with `@observe`
`05_openai_auto_instrument.py`	OpenAI auto-instrumentation via `register()`
`06_retrieval_and_eval.py`	Retrieve traces from OpenSearch, evaluate, write scores back
`07_benchmarks.py`	`evaluate()` with scorers, compare agent versions
`08_upload_benchmark_results.py`	`Benchmark.log()` — upload results from RAGAS, DeepEval, custom, with trace links

License

Apache-2.0

Project details

Release history

This version

0.2.8

Mar 30, 2026

0.2.7

Mar 17, 2026

0.2.6

Mar 12, 2026

0.2.5

Mar 5, 2026

Download files

Source Distribution

opensearch_genai_observability_sdk_py-0.2.8.tar.gz (48.1 kB view details)

Uploaded Mar 30, 2026 Source

Built Distribution

opensearch_genai_observability_sdk_py-0.2.8-py3-none-any.whl (40.1 kB view details)

Uploaded Mar 30, 2026 Python 3

File details

Details for the file opensearch_genai_observability_sdk_py-0.2.8.tar.gz.

File metadata

Download URL: opensearch_genai_observability_sdk_py-0.2.8.tar.gz
Upload date: Mar 30, 2026
Size: 48.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for opensearch_genai_observability_sdk_py-0.2.8.tar.gz
Algorithm	Hash digest
SHA256	`9fbca1fcabf7d4da743821fad211f099e4a7811a7b2be5e4322fb107e2f089da`
MD5	`63c633a5dab6da046276f8cfce4cc6d0`
BLAKE2b-256	`0c5347ee59a0086658d892a93021f1a68b3ccbf9b769c3fb415ac181563cc913`

Provenance

The following attestation bundles were made for opensearch_genai_observability_sdk_py-0.2.8.tar.gz:

Publisher: release-drafter.yml on opensearch-project/genai-observability-sdk-py

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: opensearch_genai_observability_sdk_py-0.2.8.tar.gz
- Subject digest: 9fbca1fcabf7d4da743821fad211f099e4a7811a7b2be5e4322fb107e2f089da
- Sigstore transparency entry: 1200657000
- Sigstore integration time: Mar 30, 2026
Source repository:
- Permalink: opensearch-project/genai-observability-sdk-py@2f0b8fc40b7af3cc6b218bcac49da69ec5778fd2
- Branch / Tag: refs/tags/0.2.8
- Owner: https://github.com/opensearch-project
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-drafter.yml@2f0b8fc40b7af3cc6b218bcac49da69ec5778fd2
- Trigger Event: push

File details

Details for the file opensearch_genai_observability_sdk_py-0.2.8-py3-none-any.whl.

File metadata

Download URL: opensearch_genai_observability_sdk_py-0.2.8-py3-none-any.whl
Upload date: Mar 30, 2026
Size: 40.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for opensearch_genai_observability_sdk_py-0.2.8-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4da2864b7a3f9e2635c26596866f6c4d35f3bb9ee736cb846f097f63afbe1528`
MD5	`ef37a1cbfb7a6be5e7b81c74dadb63b0`
BLAKE2b-256	`e2950da1db854936b957af90dc79dfa161a965a80ab9407369105e8c47a91f5d`

Provenance

The following attestation bundles were made for opensearch_genai_observability_sdk_py-0.2.8-py3-none-any.whl:

Publisher: release-drafter.yml on opensearch-project/genai-observability-sdk-py

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: opensearch_genai_observability_sdk_py-0.2.8-py3-none-any.whl
- Subject digest: 4da2864b7a3f9e2635c26596866f6c4d35f3bb9ee736cb846f097f63afbe1528
- Sigstore transparency entry: 1200657137
- Sigstore integration time: Mar 30, 2026
Source repository:
- Permalink: opensearch-project/genai-observability-sdk-py@2f0b8fc40b7af3cc6b218bcac49da69ec5778fd2
- Branch / Tag: refs/tags/0.2.8
- Owner: https://github.com/opensearch-project
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-drafter.yml@2f0b8fc40b7af3cc6b218bcac49da69ec5778fd2
- Trigger Event: push
