Microsoft Corporation Azure AI VoiceLive Client Library for Python

Azure AI VoiceLive client library for Python

This package provides a real-time, speech-to-speech client for Azure AI VoiceLive. It opens a WebSocket session to stream microphone audio to the service and receive typed server events (including audio) for responsive, interruptible conversations.

Status: General Availability (GA). This is a stable release suitable for production use.

Important: As of version 1.0.0, this SDK is async-only. The synchronous API has been removed to focus exclusively on async patterns. All examples and samples use async/await syntax.


Getting started

Prerequisites

  • Python 3.9+
  • An Azure subscription
  • A VoiceLive resource and endpoint
  • A working microphone and speakers/headphones if you run the voice samples

Install

Install the stable GA version:

# Base install (core client only)
python -m pip install azure-ai-voicelive

# For asynchronous streaming (uses aiohttp)
python -m pip install "azure-ai-voicelive[aiohttp]"

# For voice samples (includes audio processing)
# First install PyAudio dependencies for your platform:
#   Linux: sudo apt-get install -y portaudio19-dev libasound2-dev
#   macOS: brew install portaudio
python -m pip install "azure-ai-voicelive[aiohttp]" pyaudio python-dotenv

The SDK provides async-only WebSocket connections using aiohttp for optimal performance and reliability.

Authenticate

You can authenticate with an API key or a Microsoft Entra ID token (formerly Azure Active Directory, AAD).

API Key Authentication (Quick Start)

Set environment variables in a .env file or directly in your environment:

# In your .env file or environment variables
AZURE_VOICELIVE_API_KEY="your-api-key"
AZURE_VOICELIVE_ENDPOINT="your-endpoint"

Then, use the key in your code:

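import asyncio
import os

from azure.core.credentials import AzureKeyCredential
from azure.ai.voicelive.aio import connect

async def main():
    # Read the values set above instead of hard-coding secrets.
    # If they live in a .env file, load it first (for example with python-dotenv).
    async with connect(
        endpoint=os.environ["AZURE_VOICELIVE_ENDPOINT"],
        credential=AzureKeyCredential(os.environ["AZURE_VOICELIVE_API_KEY"]),
        model="gpt-4o-realtime-preview",
    ) as connection:
        # Your async code here
        pass

asyncio.run(main())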
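
Because a missing or empty variable otherwise only shows up as a connection or auth failure, it can help to check both values at startup. A minimal sketch using only the standard library; `load_voicelive_settings` is a hypothetical helper for illustration and is not part of the SDK:

```python
import os


def load_voicelive_settings(env=None):
    """Return (endpoint, api_key) or raise with the names of missing variables.

    Hypothetical helper for illustration; it is not part of azure-ai-voicelive.
    """
    env = os.environ if env is None else env
    names = ("AZURE_VOICELIVE_ENDPOINT", "AZURE_VOICELIVE_API_KEY")
    # Treat empty or whitespace-only values as missing.
    missing = [name for name in names if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env[names[0]].strip(), env[names[1]].strip()
```

The returned values can be passed to connect() as the endpoint and as AzureKeyCredential(api_key).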

AAD Token Authentication

For production applications, AAD authentication is recommended:

import asyncio
from azure.identity.aio import DefaultAzureCredential
from azure.ai.voicelive.aio import connect

async def main():
    # The async credential holds network resources; the context manager closes it
    async with DefaultAzureCredential() as credential:
        async with connect(
            endpoint="your-endpoint",
            credential=credential,
            model="gpt-4o-realtime-preview",
        ) as connection:
            # Your async code here
            pass

asyncio.run(main())

Key concepts

  • VoiceLiveConnection – Manages an active async WebSocket connection to the service
  • Session Management – Configure conversation parameters:
    • SessionResource – Update session parameters (voice, formats, VAD) with async methods
    • RequestSession – Strongly-typed session configuration
    • ServerVad – Configure voice activity detection
    • AzureStandardVoice – Configure voice settings
  • Audio Handling:
    • InputAudioBufferResource – Manage audio input to the service with async methods
    • OutputAudioBufferResource – Control audio output from the service with async methods
  • Conversation Management:
    • ResponseResource – Create or cancel model responses with async methods
    • ConversationResource – Manage conversation items with async methods
  • Error Handling:
    • ConnectionError – Base exception for WebSocket connection errors
    • ConnectionClosed – Raised when WebSocket connection is closed
  • Strongly-Typed Events – Process service events with type safety:
    • SESSION_UPDATED, RESPONSE_AUDIO_DELTA, RESPONSE_DONE
    • INPUT_AUDIO_BUFFER_SPEECH_STARTED, INPUT_AUDIO_BUFFER_SPEECH_STOPPED
    • ERROR, and more

Examples

Basic Voice Assistant (Featured Sample)

The Basic Voice Assistant sample demonstrates full-featured voice interaction with:

  • Real-time speech streaming
  • Server-side voice activity detection
  • Interruption handling
  • High-quality audio processing

# Run the basic voice assistant sample
# Requires [aiohttp] for async
python samples/basic_voice_assistant_async.py

# With custom parameters
python samples/basic_voice_assistant_async.py --model gpt-4o-realtime-preview --voice alloy --instructions "You're a helpful assistant"

Minimal example

import asyncio
from azure.core.credentials import AzureKeyCredential
from azure.ai.voicelive.aio import connect
from azure.ai.voicelive.models import (
    RequestSession, Modality, InputAudioFormat, OutputAudioFormat, ServerVad, ServerEventType
)

API_KEY = "your-api-key"
ENDPOINT = "wss://your-endpoint.com/openai/realtime"
MODEL = "gpt-4o-realtime-preview"

async def main():
    async with connect(
        endpoint=ENDPOINT,
        credential=AzureKeyCredential(API_KEY),
        model=MODEL,
    ) as conn:
        session = RequestSession(
            modalities=[Modality.TEXT, Modality.AUDIO],
            instructions="You are a helpful assistant.",
            input_audio_format=InputAudioFormat.PCM16,
            output_audio_format=OutputAudioFormat.PCM16,
            turn_detection=ServerVad(
                threshold=0.5, 
                prefix_padding_ms=300, 
                silence_duration_ms=500
            ),
        )
        await conn.session.update(session=session)

        # Process events
        async for evt in conn:
            print(f"Event: {evt.type}")
            if evt.type == ServerEventType.RESPONSE_DONE:
                break

asyncio.run(main())
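
The session above is configured for PCM16 audio in both directions. The sketch below shows what one chunk of such audio looks like: 16-bit signed little-endian samples, base64-encoded so they can be carried inside a JSON message. The 24 kHz mono sample rate, the 20 ms chunk duration, and the helper names are assumptions for illustration; check the service documentation for the rates your session actually uses.

```python
import base64
import math
import struct

SAMPLE_RATE_HZ = 24_000   # assumption: 24 kHz mono
BYTES_PER_SAMPLE = 2      # PCM16 means 16-bit signed samples


def chunk_size_bytes(duration_ms, sample_rate_hz=SAMPLE_RATE_HZ):
    """Number of bytes in a mono PCM16 chunk of the given duration."""
    return sample_rate_hz * duration_ms // 1000 * BYTES_PER_SAMPLE


def sine_chunk(freq_hz, duration_ms, sample_rate_hz=SAMPLE_RATE_HZ):
    """Generate a mono PCM16 test tone as little-endian bytes."""
    n = sample_rate_hz * duration_ms // 1000
    samples = (
        int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * i / sample_rate_hz))
        for i in range(n)
    )
    return struct.pack(f"<{n}h", *samples)


def encode_chunk(pcm_bytes):
    """Base64-encode raw PCM bytes so they can travel inside a JSON message."""
    return base64.b64encode(pcm_bytes).decode("ascii")


chunk = sine_chunk(440, 20)      # 20 ms test tone
payload = encode_chunk(chunk)
print(len(chunk), len(payload))
```

Small chunks keep latency low at the cost of more messages; the base64 step adds roughly one third to the payload size.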

Available Voice Options

Azure Neural Voices

# Use Azure Neural voices
from azure.ai.voicelive.models import AzureStandardVoice

voice_config = AzureStandardVoice(
    name="en-US-AvaNeural",  # Or another voice name
    type="azure-standard"
)

Popular voices include:

  • en-US-AvaNeural - Female, natural and professional
  • en-US-JennyNeural - Female, conversational
  • en-US-GuyNeural - Male, professional

OpenAI Voices

# Use OpenAI voices (as string)
voice_config = "alloy"  # Or another OpenAI voice

Available OpenAI voices:

  • alloy - Versatile, neutral
  • echo - Precise, clear
  • fable - Animated, expressive
  • onyx - Deep, authoritative
  • nova - Warm, conversational
  • shimmer - Optimistic, friendly

Handling Events

from azure.ai.voicelive.models import ServerEventType

# Run this loop inside an `async with connect(...) as connection:` block
async for event in connection:
    if event.type == ServerEventType.SESSION_UPDATED:
        print(f"Session ready: {event.session.id}")
        # Start audio capture

    elif event.type == ServerEventType.INPUT_AUDIO_BUFFER_SPEECH_STARTED:
        print("User started speaking")
        # Stop playback and cancel any current response

    elif event.type == ServerEventType.RESPONSE_AUDIO_DELTA:
        # Play the audio chunk
        audio_bytes = event.delta

    elif event.type == ServerEventType.ERROR:
        print(f"Error: {event.error.message}")
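
As the number of handled event types grows, the if/elif chain can be replaced with a dispatch table. The sketch below shows the pattern with stand-in events so that it runs without a service connection; `fake_connection`, the handler names, and the event-type strings are illustrative assumptions. With the SDK you would key the table on ServerEventType members and iterate over the real connection instead.

```python
import asyncio
from types import SimpleNamespace


async def fake_connection():
    """Stand-in for a connection: yields a fixed sequence of events."""
    for event_type, delta in [
        ("session.updated", None),
        ("response.audio.delta", b"\x00\x01"),
        ("input_audio_buffer.speech_started", None),
        ("response.done", None),
    ]:
        yield SimpleNamespace(type=event_type, delta=delta)


async def run(events):
    playback = []   # audio chunks waiting to be played
    log = []        # order in which event types were seen

    def on_audio(evt):
        playback.append(evt.delta)

    def on_speech_started(evt):
        # Barge-in: drop audio that has not been played yet.
        playback.clear()

    handlers = {
        "response.audio.delta": on_audio,
        "input_audio_buffer.speech_started": on_speech_started,
    }

    async for evt in events:
        log.append(evt.type)
        handler = handlers.get(evt.type)
        if handler is not None:
            handler(evt)
        if evt.type == "response.done":
            break
    return log, playback


log, playback = asyncio.run(run(fake_connection()))
print(log, playback)
```

Unknown event types fall through without error, which keeps the loop tolerant of event types added in later service versions.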

Troubleshooting

Connection Issues

  • WebSocket connection errors (1006/timeout):
    Verify AZURE_VOICELIVE_ENDPOINT, network rules, and that your credential has access.

  • Missing WebSocket dependencies:
    If you see import errors, make sure you have installed the package: pip install "azure-ai-voicelive[aiohttp]"

  • Auth failures:
    For API key, double-check AZURE_VOICELIVE_API_KEY. For AAD, ensure the identity is authorized.
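
Transient network failures are often worth retrying before surfacing an error. A minimal sketch of exponential backoff around a hypothetical `open_session` coroutine; the helper names are assumptions, and the built-in ConnectionError is used here only as a stand-in for whichever connection exception your code actually catches.

```python
import asyncio


async def with_retries(open_session, attempts=3, base_delay=0.5, sleep=asyncio.sleep):
    """Call open_session() until it succeeds or attempts run out.

    Waits base_delay, then 2x, 4x, ... between attempts.
    """
    for attempt in range(attempts):
        try:
            return await open_session()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the error
            await sleep(base_delay * 2 ** attempt)


# Demo: a fake opener that fails twice, then succeeds.
calls = {"n": 0}


async def flaky_open():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated failure")
    return "connected"


delays = []


async def record_sleep(seconds):
    delays.append(seconds)  # record the delay instead of actually waiting


result = asyncio.run(with_retries(flaky_open, sleep=record_sleep))
print(result, delays)
```

Retrying only helps with transient failures; a wrong endpoint or an unauthorized credential needs the checks listed above.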

Audio Device Issues

  • No microphone/speaker detected:
    Check device connections and permissions. On headless CI environments, audio samples can't run.

  • Audio library installation problems:
    On Linux/macOS you may need PortAudio:

    # Debian/Ubuntu
    sudo apt-get install -y portaudio19-dev libasound2-dev
    # macOS (Homebrew)
    brew install portaudio
    

Enable Verbose Logging

import logging
logging.basicConfig(level=logging.DEBUG)
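
Setting the root logger to DEBUG turns on debug output for every library in the process. To limit the noise, you can raise the level only for the `azure` logger hierarchy, which Azure SDK libraries conventionally log under. A sketch, assuming this package follows that convention:

```python
import logging
import sys

# Keep other libraries at WARNING ...
logging.basicConfig(level=logging.WARNING, stream=sys.stderr)

# ... but emit DEBUG records from loggers under the "azure" namespace.
azure_logger = logging.getLogger("azure")
azure_logger.setLevel(logging.DEBUG)
```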

Next steps

  1. Run the featured sample:

    • Try samples/basic_voice_assistant_async.py for a complete voice assistant implementation
  2. Customize your implementation:

    • Experiment with different voices and parameters
    • Add custom instructions for specialized assistants
    • Integrate with your own audio capture/playback systems
  3. Advanced scenarios:

    • Add function calling support
    • Implement tool usage
    • Create multi-turn conversations with history
  4. Explore other samples:

    • Check the samples/ directory for specialized examples
    • See samples/README.md for a full list of samples

Contributing

This project follows the Azure SDK guidelines. If you'd like to contribute:

  1. Fork the repo and create a feature branch
  2. Run linters and tests locally
  3. Submit a pull request with a clear description of the change

Release notes

Changelogs are available in the package directory.


License

This project is released under the MIT License.

Release History

1.2.0 (2026-05-22)

Features Added

  • Web Search & File Search: Added support for built-in web search and file search tools:
    • New item types: ResponseWebSearchCallItem, ResponseFileSearchCallItem
    • New server events for web/file search lifecycle (searching, in_progress, completed)
    • New models: ActionFind, ActionOpenPage, ActionSearch, ActionSearchSource, FileSearchResult
    • New enum values: ItemType.WEB_SEARCH_CALL, ItemType.FILE_SEARCH_CALL
    • New SessionIncludeOption enum for controlling what data is included in session responses
  • MCP (Model Context Protocol) Support: Added comprehensive support for Model Context Protocol integration:
    • MCPServer tool type for defining MCP server configurations with authorization, headers, and approval requirements
    • MCPTool model for representing MCP tool definitions with input schemas and annotations
    • MCPApprovalType enum for controlling approval workflows (never, always, or tool-specific)
    • New item types for MCP approval and call workflows
    • New server events for MCP tool listing, call lifecycle, and approval flows
  • Avatar Enhancements:
    • Added AzureAvatarVoiceSyncVoice for avatar voice sync configuration
    • Added ServerEventSessionAvatarSwitchToIdle and ServerEventSessionAvatarSwitchToSpeaking events
    • Added ServerEventResponseVideoDelta for avatar video frame streaming
    • Added ClientEventOutputAudioBufferClear and ServerEventOutputAudioBufferCleared for output buffer management
    • Added AvatarConfigTypes enum with support for video-avatar and photo-avatar types
    • Added AvatarOutputProtocol enum for avatar streaming protocols (webrtc, websocket)
    • Added Scene model for controlling avatar zoom, position, rotation, and movement amplitude
    • Added output_audit_audio field to AvatarConfig
  • OpenTelemetry Tracing Support: Added VoiceLiveInstrumentor for opt-in OpenTelemetry-based tracing of VoiceLive WebSocket connections, following Azure SDK and GenAI semantic conventions.
    • Enable via AZURE_EXPERIMENTAL_ENABLE_GENAI_TRACING=true environment variable
    • Content recording controlled by OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT
    • Comprehensive session-level telemetry: session ID, audio format, first-token latency, turn count, interruption count, audio bytes sent/received, message size
    • Response & function call ID tracking for end-to-end tracing
    • Agent v2 telemetry with agent identity and configuration tracking
    • MCP telemetry with tool call and approval flow tracking
  • Agent Session Configuration: Added flattened connect() keyword arguments for configuring Azure AI Foundry agents at connection time with agent_name, project_name, agent_version, conversation_id, and more
  • Transcription Improvements:
    • Added TranscriptionPhrase and TranscriptionWord models for detailed transcription data
    • Added ServerEventResponseAudioTranscriptAnnotationAdded event
    • Added gpt-4o-transcribe-diarize and mai-transcribe-1 transcription model support
  • Interim Response Configuration: Added StaticInterimResponseConfig and LlmInterimResponseConfig for generating interim responses during latency or tool calls
  • Image Content Support: Added RequestImageContentPart for image inputs in conversations
  • Reasoning Effort Control: Added reasoning_effort field with ReasoningEffort enum
  • Response Metadata: Added metadata field to Response and ResponseCreateParams
  • Server Warning Events: Added ServerEventWarning for handling non-fatal warnings
  • Personal Voice Models: Added DragonHDOmniLatestNeural and MAI-Voice-1 model options
  • Enhanced OpenAI Voices: Added marin and cedar voices to OpenAIVoiceName enum
  • Enhanced Azure Personal Voice: Added custom_lexicon_url, prefer_locales, locale, style, pitch, rate, and volume properties
  • Pre-generated Assistant Messages: Added pre_generated_assistant_message in ResponseCreateParams
  • Explicit Null Values: Enhanced RequestSession to properly serialize explicitly set None values
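
The distinction between an unset field and a field explicitly set to None matters on the wire: the first is omitted from the message, the second is sent as null. The sketch below illustrates that idea with a sentinel object and plain json; `_UNSET`, `build_session_update`, and the payload shape are hypothetical, and this is not how the SDK is implemented internally.

```python
import json

_UNSET = object()  # sentinel meaning "caller did not pass this field"


def build_session_update(instructions=_UNSET, turn_detection=_UNSET):
    """Build a payload that omits unset fields but keeps explicit None."""
    fields = {"instructions": instructions, "turn_detection": turn_detection}
    session = {k: v for k, v in fields.items() if v is not _UNSET}
    return json.dumps({"type": "session.update", "session": session})


# Field omitted entirely: the payload has no "turn_detection" key.
print(build_session_update(instructions="Be brief."))
# Field explicitly cleared: the payload carries "turn_detection": null.
print(build_session_update(turn_detection=None))
```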

Breaking Changes

  • Removed Foundry Agent Tool classes (FoundryAgentTool, ResponseFoundryAgentCallItem, etc.) — use flattened Azure AI Foundry keyword arguments with connect() instead
  • Audio Format Values: Changed OutputAudioFormat enum values to use underscore format (pcm16_8000hz, pcm16_16000hz) instead of the previous hyphenated values. This is a breaking change for code that compares, persists, or serializes the raw enum values. Legacy hyphenated values continue to deserialize for backward compatibility.
  • Renamed AvatarConfig.type field to avatar_type to avoid conflict with Python's built-in type
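
If your application persisted the old raw audio format values (for example in a config file or a database), a small normalization step at load time avoids mismatches after upgrading. A sketch, assuming the legacy values differ from the new ones only in using hyphens (pcm16-8000hz, pcm16-16000hz); `normalize_audio_format` is a hypothetical helper:

```python
LEGACY_TO_CURRENT = {
    "pcm16-8000hz": "pcm16_8000hz",    # assumed legacy spelling
    "pcm16-16000hz": "pcm16_16000hz",  # assumed legacy spelling
}


def normalize_audio_format(value):
    """Map a stored audio-format string to the underscore form."""
    key = value.strip().lower()
    return LEGACY_TO_CURRENT.get(key, key)
```

Values that are already in the underscore form, or that are not in the table, pass through unchanged.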

Other Changes

  • Updated default API version to 2026-04-10

1.2.0b5 (2026-04-06)

Features Added

  • OpenTelemetry Tracing Support: Added VoiceLiveInstrumentor for opt-in OpenTelemetry-based tracing of VoiceLive WebSocket connections, following Azure SDK and GenAI semantic conventions (v1.34.0). Instrumentation covers connection lifecycle (connect, close), message send/receive, and captures voice-specific attributes (gen_ai.voice.session_id, gen_ai.voice.event_type).
    • Enable via AZURE_EXPERIMENTAL_ENABLE_GENAI_TRACING=true environment variable.
    • Content recording controlled by OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT.
    • Aligned with azure-ai-agents / azure-ai-projects tracing model.
  • Enhanced Telemetry Tracking: Added comprehensive session-level and per-message telemetry:
    • Session ID: Automatically captured from session.created/session.updated events and set on the parent connect span (gen_ai.voice.session_id).
    • Audio format/codec: Input and output audio formats extracted from session.update sends (gen_ai.voice.input_audio_format, gen_ai.voice.output_audio_format).
    • First-token latency: Time from response.create to first response.audio.delta or response.text.delta, recorded as gen_ai.voice.first_token_latency_ms. response.text.delta is used for latency detection only and is not tracked as a normal recv event.
    • Turn count: Number of completed responses (response.done) per session (gen_ai.voice.turn_count).
    • Interruption count: Number of response.cancel sends per session (gen_ai.voice.interruption_count).
    • Audio bytes sent/received: Total audio payload bytes transferred (gen_ai.voice.audio_bytes_sent, gen_ai.voice.audio_bytes_received).
    • Message size: WebSocket message size on each send/recv span (gen_ai.voice.message_size).
    • Rate limit / error events: Server error and rate_limits.updated events recorded as span events with error codes and rate limit details.
  • Response & Function Call ID Tracking: All recv and send spans now carry correlation IDs for end-to-end tracing across events:
    • gen_ai.response.id, gen_ai.conversation.id, gen_ai.voice.call_id, gen_ai.voice.item_id, gen_ai.voice.previous_item_id, gen_ai.voice.output_index extracted from top-level and nested fields on every event span.
    • gen_ai.response.finish_reasons from response.done events (also propagated to the connect span).
  • Agent v2 Telemetry: Added agent identity and configuration tracking on the connect span:
    • gen_ai.agent.id and gen_ai.agent.thread_id extracted from session.created/session.updated server events.
    • gen_ai.agent.version and gen_ai.agent.project_name from Azure AI Foundry connect() keyword arguments at connect time.
  • MCP (Model Context Protocol) Telemetry: Added tracking for MCP tool calls and approval flows:
    • Per-event: gen_ai.voice.mcp.server_label, gen_ai.voice.mcp.tool_name, gen_ai.voice.mcp.approval_request_id, gen_ai.voice.mcp.approve on recv/send spans.
    • Session-level: gen_ai.voice.mcp.call_count and gen_ai.voice.mcp.list_tools_count counters flushed on session close.
    • Nested item extraction is guarded by event type to prevent forward-compatibility issues.
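
To make the session-level metrics above concrete, the sketch below computes first-token latency, turn count, and interruption count from a recorded list of (timestamp, direction, event type) tuples. It illustrates the definitions given in this list and is not the instrumentor's implementation; the tuple layout and the function name are assumptions.

```python
def summarize(events):
    """events: iterable of (timestamp_seconds, "send" | "recv", event_type)."""
    pending_create = None          # time of a response.create awaiting its first token
    first_token_latency_ms = None  # latency of the most recent response
    turn_count = 0
    interruption_count = 0

    for ts, direction, event_type in events:
        if direction == "send" and event_type == "response.create":
            pending_create = ts
        elif direction == "send" and event_type == "response.cancel":
            interruption_count += 1
        elif direction == "recv" and event_type in (
            "response.audio.delta",
            "response.text.delta",
        ):
            # Only the first delta after a response.create counts.
            if pending_create is not None:
                first_token_latency_ms = round((ts - pending_create) * 1000)
                pending_create = None
        elif direction == "recv" and event_type == "response.done":
            turn_count += 1

    return {
        "first_token_latency_ms": first_token_latency_ms,
        "turn_count": turn_count,
        "interruption_count": interruption_count,
    }


recorded = [
    (0.00, "send", "response.create"),
    (0.25, "recv", "response.audio.delta"),
    (0.30, "recv", "response.audio.delta"),
    (0.90, "send", "response.cancel"),
    (1.00, "recv", "response.done"),
]
print(summarize(recorded))
```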

Other Changes

  • Updated default API version to 2026-01-01-preview.

1.2.0b4 (2026-02-12)

Features Added

  • Agent Session Configuration: Added flattened connect() keyword arguments for configuring Azure AI Foundry agents at connection time:
    • agent_name: The name of the agent (required)
    • project_name: The Foundry project containing the agent (required)
    • agent_version: Optional version specification
    • conversation_id: Optional existing conversation ID to continue
    • authentication_identity_client_id: Optional client ID for authentication
    • foundry_resource_override: Optional Foundry resource override
  • Server Warning Events: Added ServerEventWarning and ServerEventWarningDetails for handling non-fatal warnings from the service
  • New Event Type: Added ServerEventType.WARNING for warning event handling

Breaking Changes

  • Removed Foundry Agent Tools: The following classes and enums related to Foundry agent tools have been removed:
    • FoundryAgentTool - Use flattened Azure AI Foundry keyword arguments with connect() instead
    • ResponseFoundryAgentCallItem
    • FoundryAgentContextType enum
    • ToolType.FOUNDRY_AGENT enum value
    • ItemType.FOUNDRY_AGENT_CALL enum value
    • ServerEventResponseFoundryAgentCallArgumentsDelta
    • ServerEventResponseFoundryAgentCallArgumentsDone
    • ServerEventResponseFoundryAgentCallInProgress
    • ServerEventResponseFoundryAgentCallCompleted
    • ServerEventResponseFoundryAgentCallFailed
    • Related ServerEventType enum values for Foundry agent events

1.2.0b3 (2026-02-02)

Features Added

  • Support for Explicit Null Values: Enhanced RequestSession to properly serialize explicitly set None values (e.g., turn_detection=None now correctly sends "turn_detection": null in the WebSocket message)
  • Interim Response Configuration: Added support for interim response generation during latency or tool calls:
    • StaticInterimResponseConfig for static interim response texts that are randomly selected
    • LlmInterimResponseConfig for LLM-generated context-aware interim responses
    • InterimResponseTrigger enum with latency and tool triggers
    • interim_response field in RequestSession and ResponseSession
  • Foundry Agent Integration: Added support for Azure AI Foundry agents:
    • FoundryAgentTool for defining Foundry agent configurations
    • ResponseFoundryAgentCallItem for Foundry agent call responses
    • FoundryAgentContextType enum for context management (no_context, agent_context)
    • Server events for Foundry agent call lifecycle: ServerEventResponseFoundryAgentCallArgumentsDelta, ServerEventResponseFoundryAgentCallArgumentsDone, ServerEventResponseFoundryAgentCallInProgress, ServerEventResponseFoundryAgentCallCompleted, ServerEventResponseFoundryAgentCallFailed
  • Reasoning Effort Control: Added reasoning_effort field to RequestSession, ResponseSession, and ResponseCreateParams for controlling the effort level of reasoning models, with a ReasoningEffort enum (none, minimal, low, medium, high, xhigh)
  • Response Metadata: Added metadata field to Response and ResponseCreateParams for attaching up to 16 key-value pairs (max 64 chars for keys, 512 chars for values)
  • Array Encoding Support: Enhanced serialization to support pipe, space, comma, and newline-delimited array encoding formats
  • Custom Text Normalization: Added custom_text_normalization_url field to AzureStandardVoice, AzureCustomVoice, and AzurePersonalVoice for custom text normalization configurations
  • Avatar Scene Configuration: Added Scene model for controlling avatar's zoom level, position (x/y), rotation (x/y/z pitch/yaw/roll), and movement amplitude in the video frame
  • Enhanced Avatar Configuration: Added scene and output_audit_audio fields to AvatarConfig for scene control and audit audio forwarding via WebSocket
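
The response metadata limits listed above (at most 16 pairs, keys up to 64 characters, values up to 512 characters) can be checked on the client before sending. A sketch; `validate_metadata` is a hypothetical helper, and requiring both keys and values to be strings is an assumption:

```python
MAX_PAIRS, MAX_KEY_LEN, MAX_VALUE_LEN = 16, 64, 512


def validate_metadata(metadata):
    """Return a list of problems; an empty list means the metadata fits the limits."""
    problems = []
    if len(metadata) > MAX_PAIRS:
        problems.append(f"{len(metadata)} pairs exceeds the limit of {MAX_PAIRS}")
    for key, value in metadata.items():
        if not isinstance(key, str) or not isinstance(value, str):
            problems.append(f"{key!r}: keys and values must be strings")
            continue
        if len(key) > MAX_KEY_LEN:
            problems.append(f"key {key[:20]!r}... is longer than {MAX_KEY_LEN} characters")
        if len(value) > MAX_VALUE_LEN:
            problems.append(f"value for {key!r} is longer than {MAX_VALUE_LEN} characters")
    return problems
```

Returning a list instead of raising on the first problem lets the caller report every violation at once.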

Other Changes

  • Dependency Update: Updated minimum azure-core version from 1.36.0 to 1.37.0
  • Security Enhancement: Removed eval() usage in serialization utilities, replaced with explicit type checking for improved security
  • Serialization Improvements: Enhanced model_base deserialization for mutable types and array-encoded strings

Bug Fixes

  • Audio Format Values: Fixed OutputAudioFormat enum values to use underscore format (pcm16_8000hz, pcm16_16000hz) instead of hyphenated format for consistency with wire protocol and backward compatibility

1.2.0b2 (2025-11-20)

Features Added

  • Enhanced Avatar Configuration: Expanded avatar functionality with new configuration options:
    • Added AvatarConfigTypes enum with support for video-avatar and photo-avatar types
    • Added PhotoAvatarBaseModes enum for photo avatar base models (e.g., vasa-1)
    • Added AvatarOutputProtocol enum for avatar streaming protocols (webrtc, websocket)
    • Enhanced AvatarConfig model with new properties: type, model, and output_protocol
  • Image Content Support: Added support for image inputs in conversations:
    • New RequestImageContentPart model for including images in requests
    • New RequestImageContentPartDetail enum for controlling image detail levels (auto, low, high)
    • Added INPUT_IMAGE to ContentPartType enum
    • Enhanced token details models (InputTokenDetails, CachedTokenDetails) with image_tokens tracking
  • Enhanced OpenAI Voices: Added new OpenAI voice options:
    • Added marin and cedar voices to OpenAIVoiceName enum
  • Extended Azure Personal Voice Configuration: Enhanced AzurePersonalVoice with additional customization options:
    • Added support for custom lexicon via custom_lexicon_url
    • Added prefer_locales for locale preferences
    • Added locale, style, pitch, rate, and volume properties for fine-tuned voice control
  • Enhanced MCP Server Events: Added completion status events for MCP tool calls:
    • ServerEventResponseMcpCallInProgress for tracking in-progress MCP calls
    • ServerEventResponseMcpCallCompleted for successful MCP call completion
    • ServerEventResponseMcpCallFailed for failed MCP calls
  • Pre-generated Assistant Messages: Added support for pre-generated assistant messages in ResponseCreateParams via the pre_generated_assistant_message property

1.2.0b1 (2025-11-14)

Features Added

  • MCP (Model Context Protocol) Support: Added comprehensive support for Model Context Protocol integration:
    • MCPServer tool type for defining MCP server configurations with authorization, headers, and approval requirements
    • MCPTool model for representing MCP tool definitions with input schemas and annotations
    • MCPApprovalType enum for controlling approval workflows (never, always, or tool-specific)
    • New item types: MCPApprovalResponseRequestItem, ResponseMCPApprovalRequestItem, ResponseMCPApprovalResponseItem, ResponseMCPCallItem, and ResponseMCPListToolItem
    • New server events: ServerEventMcpListToolsInProgress, ServerEventMcpListToolsCompleted, ServerEventMcpListToolsFailed, ServerEventResponseMcpCallArgumentsDelta, and ServerEventResponseMcpCallArgumentsDone
    • Client event MCP_APPROVAL_RESPONSE for responding to approval requests
    • Enhanced ItemType enum with MCP-related types: mcp_list_tools, mcp_call, mcp_approval_request, and mcp_approval_response

1.1.0 (2025-11-03)

Features Added

  • Added support for Agent configuration through the new AgentConfig model
  • Added agent field to ResponseSession model to support agent-based conversations
  • The AgentConfig model includes properties for agent type, name, description, agent_id, and thread_id

1.1.0b1 (2025-10-06)

Features Added

  • AgentConfig Support: Re-introduced AgentConfig functionality with enhanced capabilities:
    • AgentConfig model added back to public API with full import and export support
    • agent field re-added to ResponseSession model for session-level agent configuration
    • Updated cross-language package mappings to include AgentConfig support
    • Provides foundation for advanced agent configuration scenarios

1.0.0 (2025-10-01)

Features Added

  • Enhanced WebSocket Connection Options: Significantly improved WebSocket connection configuration with transport-agnostic design:
    • Added new timeout configuration options: receive_timeout, close_timeout, and handshake_timeout for fine-grained control
    • Enhanced compression parameter to support both boolean and integer types for advanced zlib window configuration
    • Added vendor_options parameter for implementation-specific options passthrough (escape hatch for advanced users)
    • Improved documentation with clearer descriptions for all connection parameters
    • Better support for common aliases from other WebSocket ecosystems (max_size, ping_interval, etc.)
    • More robust option mapping with proper type conversion and safety checks
  • Enhanced Type Safety: Improved type safety for content parts with proper enum usage:
    • InputAudioContentPart, InputTextContentPart, and OutputTextContentPart now use ContentPartType enum values instead of string literals
    • Better IntelliSense support and static type checking for content part discriminators

Breaking Changes

  • Improved Naming Conventions: Updated model and enum names for better clarity and consistency:
    • OAIVoice enum renamed to OpenAIVoiceName for more descriptive naming
    • ToolChoiceObject model renamed to ToolChoiceSelection for better semantic meaning
    • ToolChoiceFunctionObject model renamed to ToolChoiceFunctionSelection for consistency
    • Updated type unions and imports to reflect the new naming conventions
    • Cross-language package mappings updated to maintain compatibility across SDKs
  • Session Model Architecture: Separated ResponseSession and RequestSession models for better design clarity:
    • ResponseSession no longer inherits from RequestSession and now inherits directly from _Model
    • All session configuration fields are now explicitly defined in ResponseSession instead of being inherited
    • This provides clearer separation of concerns between request and response session configurations
    • May affect type checking and code that relied on the previous inheritance relationship
  • Model Cleanup: Removed unused AgentConfig model and related fields from the public API:
    • AgentConfig class has been completely removed from imports and exports
    • agent field removed from ResponseSession model (including constructor parameter)
    • Updated cross-language package mappings to reflect the removal
  • Model Naming Convention Update: Renamed EOUDetection to EouDetection for better naming consistency:
    • Class name changed from EOUDetection to EouDetection
    • All inheritance relationships updated: AzureSemanticDetection, AzureSemanticDetectionEn, and AzureSemanticDetectionMultilingual now inherit from EouDetection
    • Type annotations updated in AzureSemanticVad, AzureSemanticVadEn, AzureSemanticVadMultilingual, and ServerVad classes
    • Import statements and exports updated to reflect the new naming
  • Enhanced Content Part Type Safety: Content part discriminators now use enum values instead of string literals:
    • InputAudioContentPart.type now uses ContentPartType.INPUT_AUDIO instead of "input_audio"
    • InputTextContentPart.type now uses ContentPartType.INPUT_TEXT instead of "input_text"
    • OutputTextContentPart.type now uses ContentPartType.TEXT instead of "text"
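
Whether existing comparisons against the old string literals keep working depends on how the enum is defined. The sketch below uses a stand-in enum that mixes in str, a common pattern in Azure SDK models; that ContentPartType is defined this way is an assumption, so verify against the installed package before relying on it.

```python
from enum import Enum


class ContentPartType(str, Enum):
    """Stand-in for illustration; not imported from the SDK."""

    INPUT_TEXT = "input_text"
    INPUT_AUDIO = "input_audio"
    TEXT = "text"


part_type = ContentPartType.INPUT_AUDIO

# A str-mixin enum member compares equal to its raw value ...
print(part_type == "input_audio")
# ... and .value yields the plain string for serialization or logging.
print(part_type.value)
```

With a plain Enum (no str mixin) the first comparison would be False, which is why code that compares against literals should be checked when upgrading.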

Other Changes

  • Initial GA release

1.0.0b5 (2025-09-26)

Features Added

  • Enhanced Semantic Detection Type Safety: Added new EouThresholdLevel enum for better type safety in end-of-utterance detection:
    • LOW for low sensitivity threshold level
    • MEDIUM for medium sensitivity threshold level
    • HIGH for high sensitivity threshold level
    • DEFAULT for default sensitivity threshold level
  • Improved Semantic Detection Configuration: Enhanced semantic detection classes with better type annotations:
    • threshold_level parameter now supports both string values and EouThresholdLevel enum
    • Cleaner type definitions for AzureSemanticDetection, AzureSemanticDetectionEn, and AzureSemanticDetectionMultilingual
    • Improved documentation for threshold level parameters
  • Comprehensive Unit Test Suite: Added extensive unit test coverage with 200+ test cases covering:
    • All enum types and their functionality
    • Model creation, validation, and serialization
    • Async connection functionality with proper mocking
    • Client event handling and workflows
    • Voice configuration across all supported types
    • Message handling with content part hierarchy
    • Integration scenarios and real-world usage patterns
    • Recent changes validation and backwards compatibility
  • API Version Update: Updated to API version 2025-10-01 (from 2025-05-01-preview)
  • Enhanced Type Safety: Added new AzureVoiceType enum with values for better Azure voice type categorization:
    • AZURE_CUSTOM for custom voice configurations
    • AZURE_STANDARD for standard voice configurations
    • AZURE_PERSONAL for personal voice configurations
  • Improved Message Handling: Added MessageRole enum for better role type safety in message items
  • Enhanced Model Documentation: Comprehensive documentation improvements across all models:
    • Added detailed docstrings for model classes and their parameters
    • Enhanced enum value documentation with descriptions
    • Improved type annotations and parameter descriptions
  • Enhanced Semantic Detection: Added improved configuration options for all semantic detection classes:
    • Added threshold_level parameter with options: "low", "medium", "high", "default" (recommended over deprecated threshold)
    • Added timeout_ms parameter for timeout configuration in milliseconds (recommended over deprecated timeout)
  • Video Background Support: Added new Background model for video background customization:
    • Support for solid color backgrounds in hex format (e.g., #00FF00FF)
    • Support for image URL backgrounds
    • Mutually exclusive color and image URL options
  • Enhanced Video Parameters: Extended VideoParams model with:
    • background parameter for configuring video backgrounds using the new Background model
    • gop_size parameter for Group of Pictures (GOP) size control, affecting compression efficiency and seeking performance
  • Improved Type Safety: Added TurnDetectionType enum for better type safety and IntelliSense support
  • Package Structure Modernization: Simplified package initialization with namespace package support
  • Enhanced Error Handling: Added ConnectionError and ConnectionClosed exception classes to the async API for better WebSocket error management

Breaking Changes

  • Cross-Language Package Identity Update: Updated package ID from VoiceLive to VoiceLive.WebSocket for better cross-language consistency
  • Model Refactoring:
    • Renamed UserContentPart to MessageContentPart for clearer content part hierarchy
    • All message items now require a content field with list of MessageContentPart objects
    • OutputTextContentPart now inherits from MessageContentPart instead of being standalone
  • Enhanced Type Safety:
    • Azure voice classes now use AzureVoiceType enum discriminators instead of string literals
    • Message role discriminators now use MessageRole enum values for better type safety
  • Removed Deprecated Parameters: Completely removed deprecated parameters from semantic detection classes:
    • Removed threshold parameter from all semantic detection classes (AzureSemanticDetection, AzureSemanticDetectionEn, AzureSemanticDetectionMultilingual)
    • Removed timeout parameter from all semantic detection classes
    • Users must now use threshold_level and timeout_ms parameters respectively
  • Removed Synchronous API: Completely removed synchronous WebSocket operations to focus exclusively on async patterns:
    • Removed the sync connect() function and the sync VoiceLiveConnection class from the main patch implementation
    • Removed the sync basic_voice_assistant.py sample (only the async version remains)
    • Simplified the sync patch to a minimal structure with empty exports
    • All functionality now available only through async patterns
  • Updated Dependencies: Modified package dependencies to reflect async-only architecture:
    • Moved aiohttp>=3.9.0,<4.0.0 from optional to required dependency
    • Removed websockets optional dependency as sync API no longer exists
    • Removed optional dependency groups websockets, aiohttp, and all-websockets
  • Model Rename:
    • Renamed AudioInputTranscriptionSettings to AudioInputTranscriptionOptions for consistency with naming conventions
    • Renamed AzureMultilingualSemanticVad to AzureSemanticVadMultilingual for naming consistency with other multilingual variants
  • Enhanced Type Safety: Turn detection discriminator types now use enum values instead of string literals for better type safety
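
When upgrading, it can help to scan existing configuration for the removed `threshold` and `timeout` parameter names before constructing the semantic detection classes. The helper below is a hypothetical, SDK-independent sketch: it works on a plain dict of keyword arguments and only reports the two renames listed above. It does not convert values, because the mapping from old values to new ones is not described here.

```python
from typing import Dict, List

# Renames taken from the breaking-changes list above.
REMOVED_PARAMS: Dict[str, str] = {
    "threshold": "threshold_level",
    "timeout": "timeout_ms",
}


def find_removed_params(kwargs: dict) -> List[str]:
    """Return one message per removed parameter name found in kwargs."""
    return [
        f"'{old}' was removed; use '{new}' instead"
        for old, new in REMOVED_PARAMS.items()
        if old in kwargs
    ]


# Hypothetical pre-1.0.0 configuration; the values are placeholders for illustration.
legacy_config = {"threshold": 0.5, "timeout": 2000}
for message in find_removed_params(legacy_config):
    print(message)
```

A check like this could run in a test or at startup so that leftover keyword arguments are caught with a clear message.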

Bugs Fixed

  • Serialization Improvements: Fixed type casting issue in serialization utilities for better enum handling and type safety

Other Changes

  • Testing Infrastructure: Added comprehensive unit test suite with extensive coverage:
    • 8 main test files with 200+ individual test methods
    • Tests for all enums, models, async operations, client events, voice configurations, and message handling
    • Integration tests covering real-world scenarios and recent changes
    • Proper mocking for async WebSocket connections
    • Backwards compatibility validation
    • Test coverage for all recent changes and enhancements
  • API Documentation: Updated API view properties to reflect model structure changes, new enums, and cross-language package identity
  • Documentation Updates: Comprehensive updates to all markdown documentation:
    • Updated README.md to reflect async-only nature with updated examples and installation instructions
    • Updated samples README.md to remove sync sample references
    • Enhanced BASIC_VOICE_ASSISTANT.md with comprehensive async implementation guide
    • Added MIGRATION_GUIDE.md for users upgrading from previous versions

1.0.0b4 (2025-09-19)

Features Added

  • Personal Voice Models: Added PersonalVoiceModels enum with support for DragonLatestNeural, PhoenixLatestNeural, and PhoenixV2Neural models
  • Enhanced Animation Support: Added comprehensive server event classes for animation blendshapes and viseme handling:
    • ServerEventResponseAnimationBlendshapeDelta and ServerEventResponseAnimationBlendshapeDone
    • ServerEventResponseAnimationVisemeDelta and ServerEventResponseAnimationVisemeDone
  • Audio Timestamp Events: Added ServerEventResponseAudioTimestampDelta and ServerEventResponseAudioTimestampDone for better audio timing control
  • Improved Error Handling: Added ErrorResponse class for better error management
  • Enhanced Base Classes: Added ConversationItemBase and SessionBase for better code organization and inheritance
  • Token Usage Improvements: Renamed Usage to TokenUsage for better clarity
  • Audio Format Improvements: Reorganized audio format enums with separate InputAudioFormat and OutputAudioFormat enums for better clarity
  • Enhanced Output Audio Format Support: Added more granular output audio format options including specific sampling rates (8kHz, 16kHz) for PCM16
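
As a rough guide to what the 8 kHz and 16 kHz PCM16 options mean for buffer sizes, the arithmetic can be sketched as follows. The helper name and the mono default are assumptions made for illustration and are not taken from the SDK. The only fixed fact used is that PCM16 stores each sample in two bytes.

```python
BYTES_PER_SAMPLE = 2  # PCM16 stores each sample as a 16-bit (2-byte) integer


def pcm16_bytes(sample_rate_hz: int, duration_ms: int, channels: int = 1) -> int:
    """Size in bytes of a PCM16 chunk. Mono is an assumption, not an SDK default."""
    return sample_rate_hz * channels * BYTES_PER_SAMPLE * duration_ms // 1000


# Compare the size of a 20 ms chunk at the two sampling rates mentioned above.
for rate in (8000, 16000):
    print(rate, pcm16_bytes(rate, duration_ms=20))
```

Halving the sampling rate halves the bytes per chunk, which is the trade-off to weigh against audio quality when picking an output format.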

Breaking Changes

  • Model Cleanup: Removed experimental classes AzurePlatformVoice, LLMVoice, AzureSemanticVadServer, InputAudio, NoTurnDetection, and ToolChoiceFunctionObjectFunction
  • Class Rename: Renamed Usage class to TokenUsage for better clarity
  • Enum Reorganization:
    • Replaced AudioFormat enum with separate InputAudioFormat and OutputAudioFormat enums
    • Removed Phi4mmVoice enum
    • Removed EMOTION value from AnimationOutputType enum
    • Removed IN_PROGRESS value from ItemParamStatus enum
  • Server Events: Removed RESPONSE_EMOTION_HYPOTHESIS from ServerEventType enum

Other Changes

  • Package Structure: Simplified package initialization with namespace package support
  • Sample Updates: Improved basic voice assistant samples
  • Code Optimization: Streamlined model definitions with significant code reduction
  • API Configuration: Updated API view properties for better tooling support

1.0.0b3 (2025-09-17)

Features Added

  • Transcription Improvement: Added phrase list support
  • New Voice Types: Added AzurePlatformVoice and LLMVoice classes
  • Enhanced Speech Detection: Added AzureSemanticVadServer class
  • Improved Function Calling: Enhanced async function calling sample with better error handling
  • English-Specific Detection: Added AzureSemanticDetectionEn class for optimized English-only semantic end-of-utterance detection
  • English-Specific Voice Activity Detection: Added AzureSemanticVadEn class for enhanced English-only voice activity detection

Breaking Changes

  • Transcription: Removed custom_model and enabled from AudioInputTranscriptionSettings.
  • Async Authentication: Fixed credential handling for async scenarios
  • Model Serialization: Improved error handling and deserialization

Other Changes

  • Code Modernization: Updated type annotations throughout

1.0.0b2 (2025-09-10)

Features Added

  • Added async function call support

Bugs Fixed

  • Fixed function calling: ensure FunctionCallOutputItem.output is properly serialized as a JSON string before sending to the service.
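
The serialization step this fix refers to can be sketched independently of the SDK. The helper below is a hypothetical illustration: it turns a Python result into a JSON string, which is the form the entry above says `FunctionCallOutputItem.output` must take. How the string is then attached to the item is left out, since that depends on the SDK version in use.

```python
import json
from typing import Any


def to_output_string(result: Any) -> str:
    """Serialize a function result to a JSON string."""
    return json.dumps(result)


# Hypothetical function result, used only for illustration.
weather = {"city": "Seattle", "temp_c": 18}
output = to_output_string(weather)
print(output)
```

Passing a dict or list directly instead of a string is the kind of mistake this guards against.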

1.0.0b1 (2025-08-28)

Features Added

  • Added WebSocket connection support through connect().
  • Added VoiceLiveConnection for managing WebSocket connections.
  • Added models for the Voice Live preview.
  • Added WebSocket-based examples in the samples directory.

Other Changes

  • Initial preview release.
