
Enterprise-grade Observability and Evaluation SDK for Voice Agents


VoiceEval SDK (Python)


VoiceEval is an enterprise-grade observability and evaluation SDK for Voice Agents and LLM-powered applications. Built on OpenTelemetry, it provides zero-config auto-instrumentation with detailed tracing, latency breakdown, and cost analysis.

Key Features

  • Zero-Config Auto-Instrumentation: Automatically traces calls from major LLM providers (OpenAI, Anthropic, Google Gemini) and LiveKit Agents — no code changes needed.
  • LiveKit Native: Automatically integrates with LiveKit's tracing infrastructure. Just initialize the Client and all agent spans are captured.
  • Selective Monitoring: Control which calls are traced with auto_monitor, sample_rate, monitor_call(), and skip_call().
  • High Performance: Built on OpenTelemetry with async batch exports (OTLP/HTTP), ensuring negligible runtime overhead.
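
The async batch-export model behind the last bullet can be sketched in plain Python. This is an illustration of the general pattern (spans queued on the hot path, a background thread shipping them in batches), not VoiceEval's actual exporter; the `BatchExporter` class and its parameters are invented for this sketch:

```python
import queue
import threading

class BatchExporter:
    """Sketch of the batch-export idea behind OTLP exporters: spans go into
    an in-memory queue on the hot path, and a background worker thread ships
    them in batches, so traced code never blocks on network I/O."""

    def __init__(self, batch_size=32, flush_interval=0.05):
        self.q = queue.Queue()
        self.exported = []          # stand-in for spans sent over OTLP/HTTP
        self.batch_size = batch_size
        self.flush_interval = flush_interval
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def record(self, span):
        # Called on the hot path: O(1) enqueue, no I/O.
        self.q.put(span)

    def _run(self):
        batch = []
        # Keep draining until stopped AND the queue and batch are empty.
        while not self._stop.is_set() or not self.q.empty() or batch:
            try:
                batch.append(self.q.get(timeout=self.flush_interval))
            except queue.Empty:
                pass
            if batch and (len(batch) >= self.batch_size or self.q.empty()):
                self.exported.append(list(batch))  # stand-in for an HTTP POST
                batch.clear()

    def shutdown(self):
        # Flushes everything still queued before returning.
        self._stop.set()
        self._worker.join()

exporter = BatchExporter(batch_size=2, flush_interval=0.01)
for i in range(5):
    exporter.record({"span_id": i})
exporter.shutdown()
```

The key property is that `record()` does no I/O, which is why this style of export adds negligible runtime overhead.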

Installation

pip install voiceeval-sdk
# or
uv add voiceeval-sdk

Quickstart

1. Initialize the Client

Add a single Client(...) call at the top of your agent file. This sets up OpenTelemetry tracing and auto-instruments any installed LLM libraries and LiveKit.

from voiceeval import Client

client = Client(
    api_key="your_voiceeval_api_key",   # or set VOICE_EVAL_API_KEY env var
    agent_name="my-booking-agent",      # identifies this agent in the dashboard
)

2. LiveKit Agent Example

from livekit.agents import Agent, AgentSession, JobContext
from voiceeval import Client

# Initialize VoiceEval — auto-instruments all LLM calls and LiveKit spans
client = Client(
    api_key="your_voiceeval_api_key",
    agent_name="my-booking-agent",
)

class MyAgent(Agent):
    def __init__(self):
        super().__init__(instructions="You are a helpful voice assistant.")

# Note: `server` is assumed to be your LiveKit agent server instance,
# created elsewhere in your application.
@server.rtc_session(agent_name="my-agent")
async def entrypoint(ctx: JobContext):
    session = AgentSession(
        stt=...,
        llm=...,
        tts=...,
    )
    await session.start(agent=MyAgent(), room=ctx.room)
    await ctx.connect()

3. Standalone LLM Example

Works without LiveKit too — any OpenAI/Anthropic/Gemini calls are automatically traced:

from voiceeval import Client
from openai import OpenAI

client = Client(api_key="your_voiceeval_api_key")

openai_client = OpenAI()
response = openai_client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello world"}]
)
# Trace is automatically captured and exported
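
Under the hood, auto-instrumenting a provider client generally amounts to wrapping its request method so existing call sites produce spans without code changes. The sketch below shows that wrapping pattern on a stand-in class (`FakeCompletions` and `instrument` are invented for illustration; this is not VoiceEval's internal code):

```python
import functools

class FakeCompletions:
    """Stand-in for a provider client method (e.g. chat.completions.create)."""
    def create(self, *, model, messages):
        return {"model": model, "reply": "Hello!"}

def instrument(cls, method_name, spans):
    """Replace a method with a wrapper that records a span around each call."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def traced(self, *args, **kwargs):
        result = original(self, *args, **kwargs)
        # Record what a real instrumentation would attach to a span.
        spans.append({"method": method_name, "model": kwargs.get("model")})
        return result

    setattr(cls, method_name, traced)

spans = []
instrument(FakeCompletions, "create", spans)

# Call sites are unchanged, but every call now produces a span record:
resp = FakeCompletions().create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello world"}],
)
```

Because the wrapper is installed on the class, every client instance created afterwards is traced automatically.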

Client Options

Parameter             Type   Default                              Description
api_key               str    VOICE_EVAL_API_KEY env var           Your VoiceEval API key
base_url              str    https://api.voiceeval.com/v1/traces  VoiceEval ingestion endpoint
agent_name            str    None                                 Agent identifier shown in the dashboard
auto_monitor          bool   True                                 Monitor all calls automatically
sample_rate           float  1.0                                  Fraction of calls to monitor (0.0 to 1.0)
span_post_processors  list   None                                 Custom span post-processing functions
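
The interaction between auto_monitor and sample_rate can be read as a single per-call decision. The sketch below is one interpretation of the table's semantics in plain Python (`should_monitor` is invented for illustration, not an SDK function):

```python
import random

def should_monitor(auto_monitor: bool = True, sample_rate: float = 1.0) -> bool:
    """Monitor a call only when auto-monitoring is on and a uniform random
    draw lands under the configured fraction."""
    if not auto_monitor:
        return False
    return random.random() < sample_rate

# random.random() is in [0.0, 1.0), so sample_rate=1.0 monitors every call
# and sample_rate=0.0 monitors none.
```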

Selective Monitoring

By default, every call is monitored (auto_monitor=True). You can control this at the client level or per-call.

Sample a fraction of calls

client = Client(
    api_key="your_voiceeval_api_key",
    agent_name="my-booking-agent",
    sample_rate=0.1,  # Randomly monitor 10% of calls
)

Skip specific calls

With the default auto_monitor=True, all calls are monitored. Use skip_call() inside your session handler to opt out a specific call:

from voiceeval import Client, skip_call

client = Client(
    api_key="your_voiceeval_api_key",
    agent_name="my-booking-agent",
)

@server.rtc_session(agent_name="my-agent")
async def entrypoint(ctx: JobContext):
    # Decide based on room metadata, participant info, etc.
    if ctx.room.name.startswith("internal-"):
        skip_call()  # This call won't be monitored or evaluated

    session = AgentSession(stt=..., llm=..., tts=...)
    await session.start(agent=MyAgent(), room=ctx.room)
    await ctx.connect()

Monitor only specific calls

Set auto_monitor=False so no calls are monitored by default, then use monitor_call() to opt in:

from voiceeval import Client, monitor_call

client = Client(
    api_key="your_voiceeval_api_key",
    agent_name="my-booking-agent",
    auto_monitor=False,
)

@server.rtc_session(agent_name="my-agent")
async def entrypoint(ctx: JobContext):
    # Only monitor production calls, not test rooms
    if not ctx.room.name.startswith("test-"):
        monitor_call()  # This call will be traced and evaluated

    session = AgentSession(stt=..., llm=..., tts=...)
    await session.start(agent=MyAgent(), room=ctx.room)
    await ctx.connect()

When a call is skipped (or not opted in), spans still flow to Langfuse for the dashboard but won't create backend records or trigger evaluations.
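
Conceptually, monitor_call() and skip_call() set a per-call override on top of the auto_monitor default. A minimal stdlib sketch of that semantics using contextvars (an illustration of the behavior described above, not the SDK's implementation; `is_monitored` is an invented helper):

```python
import contextvars

# Per-call override: None = no explicit decision, True/False = opt-in/opt-out.
_override = contextvars.ContextVar("voiceeval_override", default=None)

def monitor_call() -> None:
    """Explicitly opt the current call in to monitoring."""
    _override.set(True)

def skip_call() -> None:
    """Explicitly opt the current call out of monitoring."""
    _override.set(False)

def is_monitored(auto_monitor: bool = True) -> bool:
    """The explicit override wins; otherwise fall back to the client default."""
    override = _override.get()
    return auto_monitor if override is None else override
```

A ContextVar is a natural fit here because each async session handler runs in its own context, so an override set in one call never leaks into another.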

Manual Tracing (Optional)

For non-LLM functions like business logic or RAG pipelines, use the @observe decorator:

from voiceeval import observe

@observe(name_override="rag_retrieval")
def retrieve_documents(query: str):
    # Your logic here
    return docs
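
An @observe-style decorator is essentially a span opened around the function body. The hedged sketch below mimics that behavior with a stdlib timing wrapper, printing the span instead of exporting it (this is not the SDK's implementation of @observe):

```python
import functools
import time

def observe(name_override=None):
    """Wrap a function so each call records a named span with its duration."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            name = name_override or fn.__name__
            start = time.monotonic()
            try:
                return fn(*args, **kwargs)
            finally:
                # The span closes even if the function raises.
                duration = time.monotonic() - start
                print(f"span={name} duration={duration:.4f}s")
        return wrapper
    return decorate

@observe(name_override="rag_retrieval")
def retrieve_documents(query: str):
    return [f"doc matching {query!r}"]
```

Note the try/finally: a real tracing decorator must close the span on the error path too, so failed calls still show up with their latency.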

License

MIT



Download files


Source Distribution

voiceeval_sdk-0.1.9.tar.gz (67.5 kB)

Built Distribution


voiceeval_sdk-0.1.9-py3-none-any.whl (17.8 kB)

File details

Details for the file voiceeval_sdk-0.1.9.tar.gz.

File metadata

  • Download URL: voiceeval_sdk-0.1.9.tar.gz
  • Upload date:
  • Size: 67.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for voiceeval_sdk-0.1.9.tar.gz
Algorithm Hash digest
SHA256 4f5ada785fbc87cfa117cbfaf5896dffd0be38b49287bbc72304418db56a3736
MD5 0266a7739cca6a5d55548967c00eff36
BLAKE2b-256 fae431cb93b77eba29ea93042c11deeae0816c2cb7fb7944d1a833911302bea9


File details

Details for the file voiceeval_sdk-0.1.9-py3-none-any.whl.

File metadata

  • Download URL: voiceeval_sdk-0.1.9-py3-none-any.whl
  • Upload date:
  • Size: 17.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for voiceeval_sdk-0.1.9-py3-none-any.whl
Algorithm Hash digest
SHA256 4ebd146169c2b1e46fc13879502aeba271d1f8b046575962f67568621e606e0d
MD5 d0a4899e62bf0833e13fc9a128cf15a2
BLAKE2b-256 c911951e9351250d9798125890de125aedb2b2061640be0a88ca5a03da890a3f

