Microsoft Corporation Azure AI VoiceLive Client Library for Python

Azure AI VoiceLive client library for Python

This package provides a real-time, speech-to-speech client for Azure AI VoiceLive. It opens a WebSocket session to stream microphone audio to the service and receive typed server events (including audio) for responsive, interruptible conversations.

Status: General Availability (GA). This is a stable release suitable for production use.

Important: As of version 1.0.0, this SDK is async-only. The synchronous API has been removed to focus exclusively on async patterns. All examples and samples use async/await syntax.

Getting started

Prerequisites

Python 3.9+
An Azure subscription
A VoiceLive resource and endpoint
A working microphone and speakers/headphones if you run the voice samples

Install

Install the stable GA version:

# Base install (core client only)
python -m pip install azure-ai-voicelive

# For asynchronous streaming (uses aiohttp)
python -m pip install "azure-ai-voicelive[aiohttp]"

# For voice samples (includes audio processing)
# First install PyAudio dependencies for your platform:
#   Linux: sudo apt-get install -y portaudio19-dev libasound2-dev
#   macOS: brew install portaudio
python -m pip install "azure-ai-voicelive[aiohttp]" pyaudio python-dotenv

The SDK provides async-only WebSocket connections using aiohttp for optimal performance and reliability.
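
To confirm which release is actually installed (and therefore whether the async-only API applies), one option is to query the package metadata through the standard library. This is a minimal sketch; the helper name `installed_version` is ours and is not part of the SDK.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("azure-ai-voicelive")
    print(found or "azure-ai-voicelive is not installed in this environment")
```

Returning None instead of raising keeps the check usable in setup scripts that want to print a friendly hint rather than a traceback.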

Authenticate

You can authenticate with an API key or a Microsoft Entra ID token (formerly Azure Active Directory, AAD).

API Key Authentication (Quick Start)

Set environment variables in a .env file or directly in your environment:

# In your .env file or environment variables
AZURE_VOICELIVE_API_KEY="your-api-key"
AZURE_VOICELIVE_ENDPOINT="your-endpoint"

Then, use the key in your code:

import asyncio
from azure.core.credentials import AzureKeyCredential
from azure.ai.voicelive.aio import connect

async def main():
    async with connect(
        endpoint="your-endpoint",
        credential=AzureKeyCredential("your-api-key"),
        model="gpt-4o-realtime-preview"
    ) as connection:
        # Your async code here
        pass

asyncio.run(main())
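
Rather than hardcoding the endpoint and key, the two environment variables named above can be read at startup. The sketch below uses only the standard library; `VoiceLiveSettings` and `load_settings` are hypothetical helper names, not part of the SDK.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceLiveSettings:
    endpoint: str
    api_key: str


def load_settings(env=os.environ) -> VoiceLiveSettings:
    """Read the endpoint and key, failing early if either is missing or empty."""
    required = ("AZURE_VOICELIVE_ENDPOINT", "AZURE_VOICELIVE_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return VoiceLiveSettings(
        endpoint=env["AZURE_VOICELIVE_ENDPOINT"],
        api_key=env["AZURE_VOICELIVE_API_KEY"],
    )
```

The result would then feed `connect(endpoint=settings.endpoint, credential=AzureKeyCredential(settings.api_key), ...)`. If the values live in a .env file, calling `load_dotenv()` from python-dotenv before `load_settings()` copies them into the process environment.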

AAD Token Authentication

For production applications, AAD authentication is recommended:

import asyncio
from azure.identity.aio import DefaultAzureCredential
from azure.ai.voicelive.aio import connect

async def main():
    # Close the async credential when done so its HTTP session is released
    async with DefaultAzureCredential() as credential:
        async with connect(
            endpoint="your-endpoint",
            credential=credential,
            model="gpt-4o-realtime-preview"
        ) as connection:
            # Your async code here
            pass

asyncio.run(main())

Key concepts

VoiceLiveConnection – Manages an active async WebSocket connection to the service
Session Management – Configure conversation parameters:
- SessionResource – Update session parameters (voice, formats, VAD) with async methods
- RequestSession – Strongly-typed session configuration
- ServerVad – Configure voice activity detection
- AzureStandardVoice – Configure voice settings
Audio Handling:
- InputAudioBufferResource – Manage audio input to the service with async methods
- OutputAudioBufferResource – Control audio output from the service with async methods
Conversation Management:
- ResponseResource – Create or cancel model responses with async methods
- ConversationResource – Manage conversation items with async methods
Error Handling:
- ConnectionError – Base exception for WebSocket connection errors
- ConnectionClosed – Raised when WebSocket connection is closed
Strongly-Typed Events – Process service events with type safety:
- SESSION_UPDATED, RESPONSE_AUDIO_DELTA, RESPONSE_DONE
- INPUT_AUDIO_BUFFER_SPEECH_STARTED, INPUT_AUDIO_BUFFER_SPEECH_STOPPED
- ERROR, and more

Examples

Basic Voice Assistant (Featured Sample)

The Basic Voice Assistant sample demonstrates full-featured voice interaction with:

Real-time speech streaming
Server-side voice activity detection
Interruption handling
High-quality audio processing

# Run the basic voice assistant sample
# Requires [aiohttp] for async
python samples/basic_voice_assistant_async.py

# With custom parameters
python samples/basic_voice_assistant_async.py --model gpt-4o-realtime-preview --voice alloy --instructions "You're a helpful assistant"

Minimal example

import asyncio
from azure.core.credentials import AzureKeyCredential
from azure.ai.voicelive.aio import connect
from azure.ai.voicelive.models import (
    RequestSession, Modality, InputAudioFormat, OutputAudioFormat, ServerVad, ServerEventType
)

API_KEY = "your-api-key"
ENDPOINT = "wss://your-endpoint.com/openai/realtime"
MODEL = "gpt-4o-realtime-preview"

async def main():
    async with connect(
        endpoint=ENDPOINT,
        credential=AzureKeyCredential(API_KEY),
        model=MODEL,
    ) as conn:
        session = RequestSession(
            modalities=[Modality.TEXT, Modality.AUDIO],
            instructions="You are a helpful assistant.",
            input_audio_format=InputAudioFormat.PCM16,
            output_audio_format=OutputAudioFormat.PCM16,
            turn_detection=ServerVad(
                threshold=0.5, 
                prefix_padding_ms=300, 
                silence_duration_ms=500
            ),
        )
        await conn.session.update(session=session)

        # Process events
        async for evt in conn:
            print(f"Event: {evt.type}")
            if evt.type == ServerEventType.RESPONSE_DONE:
                break

asyncio.run(main())
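
The ServerVad settings above are durations in milliseconds, while the audio being streamed is raw PCM16, so it can help to convert between the two. The sketch below is plain arithmetic and assumes 16-bit mono audio at 24 kHz; the sample rate is an assumption for illustration, so substitute whatever rate your session actually uses. The helper names are ours.

```python
from typing import List

BYTES_PER_SAMPLE = 2             # PCM16 means 16 bits (2 bytes) per sample
ASSUMED_SAMPLE_RATE_HZ = 24_000  # assumption for this sketch, not read from the SDK


def pcm16_bytes_for(duration_ms: int, sample_rate_hz: int = ASSUMED_SAMPLE_RATE_HZ) -> int:
    """Number of bytes of mono PCM16 audio that cover duration_ms."""
    samples = sample_rate_hz * duration_ms // 1000
    return samples * BYTES_PER_SAMPLE


def split_into_chunks(
    pcm: bytes, chunk_ms: int = 20, sample_rate_hz: int = ASSUMED_SAMPLE_RATE_HZ
) -> List[bytes]:
    """Cut a PCM16 buffer into fixed-duration chunks (the last one may be shorter)."""
    size = pcm16_bytes_for(chunk_ms, sample_rate_hz)
    return [pcm[i:i + size] for i in range(0, len(pcm), size)]


# prefix_padding_ms=300 and silence_duration_ms=500 expressed as byte counts
prefix_padding_bytes = pcm16_bytes_for(300)
silence_window_bytes = pcm16_bytes_for(500)

one_second_of_silence = bytes(pcm16_bytes_for(1000))
chunks = split_into_chunks(one_second_of_silence)
```

Under these assumptions, 300 ms of padding corresponds to 14,400 bytes and a 20 ms capture chunk to 960 bytes, which is a convenient size for a microphone read loop.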

Available Voice Options

Azure Neural Voices

# Use Azure Neural voices
from azure.ai.voicelive.models import AzureStandardVoice

voice_config = AzureStandardVoice(
    name="en-US-AvaNeural",  # Or another voice name
    type="azure-standard"
)

Popular voices include:

en-US-AvaNeural - Female, natural and professional
en-US-JennyNeural - Female, conversational
en-US-GuyNeural - Male, professional

OpenAI Voices

# Use OpenAI voices (as string)
voice_config = "alloy"  # Or another OpenAI voice

Available OpenAI voices:

alloy - Versatile, neutral
echo - Precise, clear
fable - Animated, expressive
onyx - Deep, authoritative
nova - Warm, conversational
shimmer - Optimistic, friendly

Handling Events

from azure.ai.voicelive.models import ServerEventType

# `connection` is the object yielded by `async with connect(...)`
async for event in connection:
    if event.type == ServerEventType.SESSION_UPDATED:
        print(f"Session ready: {event.session.id}")
        # Start audio capture

    elif event.type == ServerEventType.INPUT_AUDIO_BUFFER_SPEECH_STARTED:
        print("User started speaking")
        # Stop playback and cancel any current response

    elif event.type == ServerEventType.RESPONSE_AUDIO_DELTA:
        # Play the audio chunk
        audio_bytes = event.delta

    elif event.type == ServerEventType.ERROR:
        print(f"Error: {event.error.message}")
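
As the number of handled event types grows, the if/elif chain above can be swapped for a small dispatch table. The following is a sketch of that pattern only: `EventRouter` is a hypothetical helper, and it is exercised with stand-in event objects and plain strings so it runs without the SDK or a live connection.

```python
import asyncio
from types import SimpleNamespace
from typing import Awaitable, Callable, Dict, List

Handler = Callable[[object], Awaitable[None]]


class EventRouter:
    """Map an event type to a coroutine handler; unknown types are skipped."""

    def __init__(self) -> None:
        self._handlers: Dict[object, Handler] = {}

    def on(self, event_type: object) -> Callable[[Handler], Handler]:
        def register(handler: Handler) -> Handler:
            self._handlers[event_type] = handler
            return handler
        return register

    async def dispatch(self, event: object) -> bool:
        """Run the matching handler; return False when none is registered."""
        handler = self._handlers.get(getattr(event, "type", None))
        if handler is None:
            return False
        await handler(event)
        return True


router = EventRouter()
played: List[bytes] = []


@router.on("response.audio.delta")  # stand-in key; real code would register an enum member
async def on_audio(event) -> None:
    played.append(event.delta)


async def demo() -> List[bool]:
    fake_events = [
        SimpleNamespace(type="response.audio.delta", delta=b"\x00\x01"),
        SimpleNamespace(type="something.unhandled"),
    ]
    return [await router.dispatch(e) for e in fake_events]


results = asyncio.run(demo())
```

In a real session the loop body would become `await router.dispatch(event)`, with handlers registered under the ServerEventType members rather than the stand-in strings used here.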

Troubleshooting

Connection Issues

WebSocket connection errors (1006/timeout):
- Verify AZURE_VOICELIVE_ENDPOINT, network rules, and that your credential has access.
Missing WebSocket dependencies:
- If you see import errors, make sure you have installed the package: pip install "azure-ai-voicelive[aiohttp]"
Auth failures:
- For API key, double-check AZURE_VOICELIVE_API_KEY. For AAD, ensure the identity is authorized.
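
A quick local check of the endpoint value can rule out the most common typos before any network call is made. The sketch below is only a sanity check with the standard library; the set of accepted schemes (https and wss) is an assumption, and `check_endpoint` is a hypothetical helper name.

```python
from typing import List
from urllib.parse import urlparse

ASSUMED_SCHEMES = ("https", "wss")  # assumption: adjust to what your resource expects


def check_endpoint(endpoint: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems: List[str] = []
    if endpoint != endpoint.strip():
        problems.append("leading or trailing whitespace")
    parsed = urlparse(endpoint.strip())
    if parsed.scheme not in ASSUMED_SCHEMES:
        problems.append(f"unexpected scheme {parsed.scheme!r}")
    if not parsed.hostname:
        problems.append("no host name found")
    return problems
```

For instance, the literal placeholder "your-endpoint" is reported as having neither a scheme nor a host, which is the usual symptom of a .env file that was never filled in.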

Audio Device Issues

No microphone/speaker detected:
- Check device connections and permissions. Audio samples can't run in headless CI environments.
Audio library installation problems:
- On Linux/macOS you may need PortAudio:

# Debian/Ubuntu
sudo apt-get install -y portaudio19-dev libasound2-dev
# macOS (Homebrew)
brew install portaudio

Enable Verbose Logging

import logging
logging.basicConfig(level=logging.DEBUG)
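
Calling basicConfig at DEBUG level turns on debug output for every library in the process. Azure SDK packages log under the "azure" logger namespace, so a narrower alternative is to raise the level on that logger only. A minimal sketch, where the function name and log format are our own choices:

```python
import logging
import sys


def enable_azure_debug_logging(stream=sys.stderr) -> logging.Logger:
    """Send DEBUG output from loggers under the 'azure' namespace to the given stream."""
    logger = logging.getLogger("azure")
    logger.setLevel(logging.DEBUG)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger


azure_logger = enable_azure_debug_logging()
```

Child loggers inherit the level, so anything logged under a name that starts with "azure." is covered while third-party libraries keep their default verbosity.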

Next steps

Run the featured sample:
- Try samples/basic_voice_assistant_async.py for a complete voice assistant implementation
Customize your implementation:
- Experiment with different voices and parameters
- Add custom instructions for specialized assistants
- Integrate with your own audio capture/playback systems
Advanced scenarios:
- Add function calling support
- Implement tool usage
- Create multi-turn conversations with history
Explore other samples:
- Check the samples/ directory for specialized examples
- See samples/README.md for a full list of samples

Contributing

This project follows the Azure SDK guidelines. If you'd like to contribute:

Fork the repo and create a feature branch
Run linters and tests locally
Submit a pull request with a clear description of the change

Release notes

Changelogs are available in the package directory.

License

This project is released under the MIT License.

Release History

1.2.0 (2026-05-22)

Features Added

Web Search & File Search: Added support for built-in web search and file search tools:
- New item types: ResponseWebSearchCallItem, ResponseFileSearchCallItem
- New server events for web/file search lifecycle (searching, in_progress, completed)
- New models: ActionFind, ActionOpenPage, ActionSearch, ActionSearchSource, FileSearchResult
- New enum values: ItemType.WEB_SEARCH_CALL, ItemType.FILE_SEARCH_CALL
- New SessionIncludeOption enum for controlling what data is included in session responses
MCP (Model Context Protocol) Support: Added comprehensive support for Model Context Protocol integration:
- MCPServer tool type for defining MCP server configurations with authorization, headers, and approval requirements
- MCPTool model for representing MCP tool definitions with input schemas and annotations
- MCPApprovalType enum for controlling approval workflows (never, always, or tool-specific)
- New item types for MCP approval and call workflows
- New server events for MCP tool listing, call lifecycle, and approval flows
Avatar Enhancements:
- Added AzureAvatarVoiceSyncVoice for avatar voice sync configuration
- Added ServerEventSessionAvatarSwitchToIdle and ServerEventSessionAvatarSwitchToSpeaking events
- Added ServerEventResponseVideoDelta for avatar video frame streaming
- Added ClientEventOutputAudioBufferClear and ServerEventOutputAudioBufferCleared for output buffer management
- Added AvatarConfigTypes enum with support for video-avatar and photo-avatar types
- Added AvatarOutputProtocol enum for avatar streaming protocols (webrtc, websocket)
- Added Scene model for controlling avatar zoom, position, rotation, and movement amplitude
- Added output_audit_audio field to AvatarConfig
OpenTelemetry Tracing Support: Added VoiceLiveInstrumentor for opt-in OpenTelemetry-based tracing of VoiceLive WebSocket connections, following Azure SDK and GenAI semantic conventions.
- Enable via AZURE_EXPERIMENTAL_ENABLE_GENAI_TRACING=true environment variable
- Content recording controlled by OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT
- Comprehensive session-level telemetry: session ID, audio format, first-token latency, turn count, interruption count, audio bytes sent/received, message size
- Response & function call ID tracking for end-to-end tracing
- Agent v2 telemetry with agent identity and configuration tracking
- MCP telemetry with tool call and approval flow tracking
Agent Session Configuration: Added flattened connect() keyword arguments for configuring Azure AI Foundry agents at connection time with agent_name, project_name, agent_version, conversation_id, and more
Transcription Improvements:
- Added TranscriptionPhrase and TranscriptionWord models for detailed transcription data
- Added ServerEventResponseAudioTranscriptAnnotationAdded event
- Added gpt-4o-transcribe-diarize and mai-transcribe-1 transcription model support
Interim Response Configuration: Added StaticInterimResponseConfig and LlmInterimResponseConfig for generating interim responses during latency or tool calls
Image Content Support: Added RequestImageContentPart for image inputs in conversations
Reasoning Effort Control: Added reasoning_effort field with ReasoningEffort enum
Response Metadata: Added metadata field to Response and ResponseCreateParams
Server Warning Events: Added ServerEventWarning for handling non-fatal warnings
Personal Voice Models: Added DragonHDOmniLatestNeural and MAI-Voice-1 model options
Enhanced OpenAI Voices: Added marin and cedar voices to OpenAIVoiceName enum
Enhanced Azure Personal Voice: Added custom_lexicon_url, prefer_locales, locale, style, pitch, rate, and volume properties
Pre-generated Assistant Messages: Added pre_generated_assistant_message in ResponseCreateParams
Explicit Null Values: Enhanced RequestSession to properly serialize explicitly set None values

Breaking Changes

Removed Foundry Agent Tool classes (FoundryAgentTool, ResponseFoundryAgentCallItem, etc.) — use flattened Azure AI Foundry keyword arguments with connect() instead
Audio Format Values: Changed OutputAudioFormat enum values to use underscore format (pcm16_8000hz, pcm16_16000hz) instead of the previous hyphenated values. This is a breaking change for code that compares, persists, or serializes the raw enum values. Legacy hyphenated values continue to deserialize for backward compatibility.
Renamed AvatarConfig.type field to avatar_type to avoid conflict with Python's built-in type
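
For code that persisted the raw audio format strings before this change, a small normalization step can bring stored values in line with the new spelling. This sketch is not part of the SDK: the two underscore values come from the note above, and the hyphenated spellings are inferred from it.

```python
# Legacy spellings are inferred from the release note; current ones are quoted from it.
LEGACY_TO_CURRENT = {
    "pcm16-8000hz": "pcm16_8000hz",
    "pcm16-16000hz": "pcm16_16000hz",
}


def normalize_output_audio_format(raw: str) -> str:
    """Map a legacy hyphenated value to the underscore form; pass other values through."""
    return LEGACY_TO_CURRENT.get(raw, raw)
```

An explicit table is used instead of a blanket character replacement so that unrelated values are never rewritten by accident.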

Other Changes

Updated default API version to 2026-04-10

1.2.0b5 (2026-04-06)

Features Added

OpenTelemetry Tracing Support: Added VoiceLiveInstrumentor for opt-in OpenTelemetry-based tracing of VoiceLive WebSocket connections, following Azure SDK and GenAI semantic conventions (v1.34.0). Instrumentation covers connection lifecycle (connect, close), message send/receive, and captures voice-specific attributes (gen_ai.voice.session_id, gen_ai.voice.event_type).
- Enable via AZURE_EXPERIMENTAL_ENABLE_GENAI_TRACING=true environment variable.
- Content recording controlled by OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT.
- Aligned with azure-ai-agents / azure-ai-projects tracing model.
Enhanced Telemetry Tracking: Added comprehensive session-level and per-message telemetry:
- Session ID: Automatically captured from session.created/session.updated events and set on the parent connect span (gen_ai.voice.session_id).
- Audio format/codec: Input and output audio formats extracted from session.update sends (gen_ai.voice.input_audio_format, gen_ai.voice.output_audio_format).
- First-token latency: Time from response.create to first response.audio.delta or response.text.delta, recorded as gen_ai.voice.first_token_latency_ms. response.text.delta is used for latency detection only and is not tracked as a normal recv event.
- Turn count: Number of completed responses (response.done) per session (gen_ai.voice.turn_count).
- Interruption count: Number of response.cancel sends per session (gen_ai.voice.interruption_count).
- Audio bytes sent/received: Total audio payload bytes transferred (gen_ai.voice.audio_bytes_sent, gen_ai.voice.audio_bytes_received).
- Message size: WebSocket message size on each send/recv span (gen_ai.voice.message_size).
- Rate limit / error events: Server error and rate_limits.updated events recorded as span events with error codes and rate limit details.
Response & Function Call ID Tracking: All recv and send spans now carry correlation IDs for end-to-end tracing across events:
- gen_ai.response.id, gen_ai.conversation.id, gen_ai.voice.call_id, gen_ai.voice.item_id, gen_ai.voice.previous_item_id, gen_ai.voice.output_index extracted from top-level and nested fields on every event span.
- gen_ai.response.finish_reasons from response.done events (also propagated to the connect span).
Agent v2 Telemetry: Added agent identity and configuration tracking on the connect span:
- gen_ai.agent.id and gen_ai.agent.thread_id extracted from session.created/session.updated server events.
- gen_ai.agent.version and gen_ai.agent.project_name from Azure AI Foundry connect() keyword arguments at connect time.
MCP (Model Context Protocol) Telemetry: Added tracking for MCP tool calls and approval flows:
- Per-event: gen_ai.voice.mcp.server_label, gen_ai.voice.mcp.tool_name, gen_ai.voice.mcp.approval_request_id, gen_ai.voice.mcp.approve on recv/send spans.
- Session-level: gen_ai.voice.mcp.call_count and gen_ai.voice.mcp.list_tools_count counters flushed on session close.
- Nested item extraction is guarded by event type to prevent forward-compatibility issues.
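
To make the session-level metrics above concrete, the sketch below derives first-token latency, turn count, and interruption count from a timestamped list of event names, following the definitions given in the list. It illustrates the arithmetic only and is not the instrumentor's implementation; the tuple layout and names are assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

# (timestamp in seconds, "send" or "recv", wire event name)
LoggedEvent = Tuple[float, str, str]
FIRST_TOKEN_EVENTS = ("response.audio.delta", "response.text.delta")


@dataclass
class SessionStats:
    first_token_latency_ms: Optional[float] = None  # latency of the most recent response
    turn_count: int = 0
    interruption_count: int = 0


def summarize(events: Iterable[LoggedEvent]) -> SessionStats:
    stats = SessionStats()
    pending_create: Optional[float] = None
    for timestamp, direction, name in events:
        if direction == "send" and name == "response.create":
            pending_create = timestamp
        elif direction == "send" and name == "response.cancel":
            stats.interruption_count += 1
        elif direction == "recv" and name in FIRST_TOKEN_EVENTS:
            # Only the first delta after a response.create counts toward latency
            if pending_create is not None:
                stats.first_token_latency_ms = (timestamp - pending_create) * 1000.0
                pending_create = None
        elif direction == "recv" and name == "response.done":
            stats.turn_count += 1
    return stats


example = summarize([
    (0.0, "send", "response.create"),
    (0.25, "recv", "response.audio.delta"),
    (0.5, "recv", "response.audio.delta"),
    (1.0, "recv", "response.done"),
    (2.0, "send", "response.create"),
    (2.5, "send", "response.cancel"),
])
```

In this example the first audio delta arrives 250 ms after response.create, one response completes, and one cancel is sent.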

Other Changes

Updated default API version to 2026-01-01-preview.

1.2.0b4 (2026-02-12)

Features Added

Agent Session Configuration: Added flattened connect() keyword arguments for configuring Azure AI Foundry agents at connection time:
- agent_name: The name of the agent (required)
- project_name: The Foundry project containing the agent (required)
- agent_version: Optional version specification
- conversation_id: Optional existing conversation ID to continue
- authentication_identity_client_id: Optional client ID for authentication
- foundry_resource_override: Optional Foundry resource override
Server Warning Events: Added ServerEventWarning and ServerEventWarningDetails for handling non-fatal warnings from the service
New Event Type: Added ServerEventType.WARNING for warning event handling
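
One way to keep the agent-related keyword arguments together is a small options object that yields only the values that were set. The keyword names below are the ones listed above; the `AgentSessionOptions` class, its method, and the string types are assumptions made for this sketch.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class AgentSessionOptions:
    agent_name: str                  # required
    project_name: str                # required
    agent_version: Optional[str] = None
    conversation_id: Optional[str] = None
    authentication_identity_client_id: Optional[str] = None
    foundry_resource_override: Optional[str] = None

    def to_connect_kwargs(self) -> Dict[str, Any]:
        """Return only the keyword arguments that were actually provided."""
        if not self.agent_name or not self.project_name:
            raise ValueError("agent_name and project_name are required")
        return {key: value for key, value in asdict(self).items() if value is not None}


options = AgentSessionOptions(
    agent_name="support-agent",      # hypothetical agent name
    project_name="contoso-project",  # hypothetical project name
    conversation_id="conv-123",      # hypothetical conversation ID
)
connect_kwargs = options.to_connect_kwargs()
```

The resulting dictionary could then be unpacked into the call, for example `connect(endpoint=..., credential=..., **connect_kwargs)`.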

Breaking Changes

Removed Foundry Agent Tools: The following classes and enums related to Foundry agent tools have been removed:
- FoundryAgentTool - Use flattened Azure AI Foundry keyword arguments with connect() instead
- ResponseFoundryAgentCallItem
- FoundryAgentContextType enum
- ToolType.FOUNDRY_AGENT enum value
- ItemType.FOUNDRY_AGENT_CALL enum value
- ServerEventResponseFoundryAgentCallArgumentsDelta
- ServerEventResponseFoundryAgentCallArgumentsDone
- ServerEventResponseFoundryAgentCallInProgress
- ServerEventResponseFoundryAgentCallCompleted
- ServerEventResponseFoundryAgentCallFailed
- Related ServerEventType enum values for Foundry agent events

1.2.0b3 (2026-02-02)

Features Added

Support for Explicit Null Values: Enhanced RequestSession to properly serialize explicitly set None values (e.g., turn_detection=None now correctly sends "turn_detection": null in the WebSocket message)
Interim Response Configuration: Added support for interim response generation during latency or tool calls:
- StaticInterimResponseConfig for static interim response texts that are randomly selected
- LlmInterimResponseConfig for LLM-generated context-aware interim responses
- InterimResponseTrigger enum with latency and tool triggers
- interim_response field in RequestSession and ResponseSession
Foundry Agent Integration: Added support for Azure AI Foundry agents:
- FoundryAgentTool for defining Foundry agent configurations
- ResponseFoundryAgentCallItem for Foundry agent call responses
- FoundryAgentContextType enum for context management (no_context, agent_context)
- Server events for Foundry agent call lifecycle: ServerEventResponseFoundryAgentCallArgumentsDelta, ServerEventResponseFoundryAgentCallArgumentsDone, ServerEventResponseFoundryAgentCallInProgress, ServerEventResponseFoundryAgentCallCompleted, ServerEventResponseFoundryAgentCallFailed
Reasoning Effort Control: Added reasoning_effort field to RequestSession, ResponseSession, and ResponseCreateParams for controlling the effort level of reasoning models with the ReasoningEffort enum (none, minimal, low, medium, high, xhigh)
Response Metadata: Added metadata field to Response and ResponseCreateParams for attaching up to 16 key-value pairs (max 64 chars for keys, 512 chars for values)
Array Encoding Support: Enhanced serialization to support pipe, space, comma, and newline-delimited array encoding formats
Custom Text Normalization: Added custom_text_normalization_url field to AzureStandardVoice, AzureCustomVoice, and AzurePersonalVoice for custom text normalization configurations
Avatar Scene Configuration: Added Scene model for controlling the avatar's zoom level, position (x/y), rotation (x/y/z pitch/yaw/roll), and movement amplitude in the video frame
Enhanced Avatar Configuration: Added scene and output_audit_audio fields to AvatarConfig for scene control and audit audio forwarding via WebSocket
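
The response metadata limits noted above (up to 16 pairs, keys of at most 64 characters, values of at most 512) are easy to check on the client before sending. The following is a sketch of such a pre-flight check; `validate_metadata` is a hypothetical helper, and whether the service enforces the limits in exactly this way is not confirmed here.

```python
from typing import Dict, List

MAX_PAIRS = 16
MAX_KEY_CHARS = 64
MAX_VALUE_CHARS = 512


def validate_metadata(metadata: Dict[str, str]) -> List[str]:
    """Return a list of limit violations; an empty list means the metadata fits."""
    problems: List[str] = []
    if len(metadata) > MAX_PAIRS:
        problems.append(f"{len(metadata)} pairs exceeds the limit of {MAX_PAIRS}")
    for key, value in metadata.items():
        if len(key) > MAX_KEY_CHARS:
            problems.append(f"key {key[:20]!r}... is longer than {MAX_KEY_CHARS} characters")
        if len(value) > MAX_VALUE_CHARS:
            problems.append(f"value for {key[:20]!r} is longer than {MAX_VALUE_CHARS} characters")
    return problems
```

Collecting every violation, rather than stopping at the first, makes it easier to fix an oversized payload in one pass.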

Other Changes

Dependency Update: Updated minimum azure-core version from 1.36.0 to 1.37.0
Security Enhancement: Removed eval() usage in serialization utilities, replaced with explicit type checking for improved security
Serialization Improvements: Enhanced model_base deserialization for mutable types and array-encoded strings

Bug Fixes

Audio Format Values: Fixed OutputAudioFormat enum values to use underscore format (pcm16_8000hz, pcm16_16000hz) instead of hyphenated format for consistency with wire protocol and backward compatibility

1.2.0b2 (2025-11-20)

Features Added

Enhanced Avatar Configuration: Expanded avatar functionality with new configuration options:
- Added AvatarConfigTypes enum with support for video-avatar and photo-avatar types
- Added PhotoAvatarBaseModes enum for photo avatar base models (e.g., vasa-1)
- Added AvatarOutputProtocol enum for avatar streaming protocols (webrtc, websocket)
- Enhanced AvatarConfig model with new properties: type, model, and output_protocol
Image Content Support: Added support for image inputs in conversations:
- New RequestImageContentPart model for including images in requests
- New RequestImageContentPartDetail enum for controlling image detail levels (auto, low, high)
- Added INPUT_IMAGE to ContentPartType enum
- Enhanced token details models (InputTokenDetails, CachedTokenDetails) with image_tokens tracking
Enhanced OpenAI Voices: Added new OpenAI voice options:
- Added marin and cedar voices to OpenAIVoiceName enum
Extended Azure Personal Voice Configuration: Enhanced AzurePersonalVoice with additional customization options:
- Added support for custom lexicon via custom_lexicon_url
- Added prefer_locales for locale preferences
- Added locale, style, pitch, rate, and volume properties for fine-tuned voice control
Enhanced MCP Server Events: Added completion status events for MCP tool calls:
- ServerEventResponseMcpCallInProgress for tracking in-progress MCP calls
- ServerEventResponseMcpCallCompleted for successful MCP call completion
- ServerEventResponseMcpCallFailed for failed MCP calls
Pre-generated Assistant Messages: Added support for pre-generated assistant messages in ResponseCreateParams via the pre_generated_assistant_message property

1.2.0b1 (2025-11-14)

Features Added

MCP (Model Context Protocol) Support: Added comprehensive support for Model Context Protocol integration:
- MCPServer tool type for defining MCP server configurations with authorization, headers, and approval requirements
- MCPTool model for representing MCP tool definitions with input schemas and annotations
- MCPApprovalType enum for controlling approval workflows (never, always, or tool-specific)
- New item types: MCPApprovalResponseRequestItem, ResponseMCPApprovalRequestItem, ResponseMCPApprovalResponseItem, ResponseMCPCallItem, and ResponseMCPListToolItem
- New server events: ServerEventMcpListToolsInProgress, ServerEventMcpListToolsCompleted, ServerEventMcpListToolsFailed, ServerEventResponseMcpCallArgumentsDelta, and ServerEventResponseMcpCallArgumentsDone
- Client event MCP_APPROVAL_RESPONSE for responding to approval requests
- Enhanced ItemType enum with MCP-related types: mcp_list_tools, mcp_call, mcp_approval_request, and mcp_approval_response

1.1.0 (2025-11-03)

Features Added

Added support for Agent configuration through the new AgentConfig model
Added agent field to ResponseSession model to support agent-based conversations
The AgentConfig model includes properties for agent type, name, description, agent_id, and thread_id

1.1.0b1 (2025-10-06)

Features Added

AgentConfig Support: Re-introduced AgentConfig functionality with enhanced capabilities:
- AgentConfig model added back to public API with full import and export support
- agent field re-added to ResponseSession model for session-level agent configuration
- Updated cross-language package mappings to include AgentConfig support
- Provides foundation for advanced agent configuration scenarios

1.0.0 (2025-10-01)

Features Added

Enhanced WebSocket Connection Options: Significantly improved WebSocket connection configuration with transport-agnostic design:
- Added new timeout configuration options: receive_timeout, close_timeout, and handshake_timeout for fine-grained control
- Enhanced compression parameter to support both boolean and integer types for advanced zlib window configuration
- Added vendor_options parameter for implementation-specific options passthrough (escape hatch for advanced users)
- Improved documentation with clearer descriptions for all connection parameters
- Better support for common aliases from other WebSocket ecosystems (max_size, ping_interval, etc.)
- More robust option mapping with proper type conversion and safety checks
Enhanced Type Safety: Improved type safety for content parts with proper enum usage:
- InputAudioContentPart, InputTextContentPart, and OutputTextContentPart now use ContentPartType enum values instead of string literals
- Better IntelliSense support and compile-time type checking for content part discriminators

Breaking Changes

Improved Naming Conventions: Updated model and enum names for better clarity and consistency:
- OAIVoice enum renamed to OpenAIVoiceName for more descriptive naming
- ToolChoiceObject model renamed to ToolChoiceSelection for better semantic meaning
- ToolChoiceFunctionObject model renamed to ToolChoiceFunctionSelection for consistency
- Updated type unions and imports to reflect the new naming conventions
- Cross-language package mappings updated to maintain compatibility across SDKs
Session Model Architecture: Separated ResponseSession and RequestSession models for better design clarity:
- ResponseSession no longer inherits from RequestSession and now inherits directly from _Model
- All session configuration fields are now explicitly defined in ResponseSession instead of being inherited
- This provides clearer separation of concerns between request and response session configurations
- May affect type checking and code that relied on the previous inheritance relationship
Model Cleanup: Removed unused AgentConfig model and related fields from the public API:
- AgentConfig class has been completely removed from imports and exports
- agent field removed from ResponseSession model (including constructor parameter)
- Updated cross-language package mappings to reflect the removal
Model Naming Convention Update: Renamed EOUDetection to EouDetection for better naming consistency:
- Class name changed from EOUDetection to EouDetection
- All inheritance relationships updated: AzureSemanticDetection, AzureSemanticDetectionEn, and AzureSemanticDetectionMultilingual now inherit from EouDetection
- Type annotations updated in AzureSemanticVad, AzureSemanticVadEn, AzureSemanticVadMultilingual, and ServerVad classes
- Import statements and exports updated to reflect the new naming
Enhanced Content Part Type Safety: Content part discriminators now use enum values instead of string literals:
- InputAudioContentPart.type now uses ContentPartType.INPUT_AUDIO instead of "input_audio"
- InputTextContentPart.type now uses ContentPartType.INPUT_TEXT instead of "input_text"
- OutputTextContentPart.type now uses ContentPartType.TEXT instead of "text"

Other Changes

Initial GA release

1.0.0b5 (2025-09-26)

Features Added

Enhanced Semantic Detection Type Safety: Added new EouThresholdLevel enum for better type safety in end-of-utterance detection:
- LOW for low sensitivity threshold level
- MEDIUM for medium sensitivity threshold level
- HIGH for high sensitivity threshold level
- DEFAULT for default sensitivity threshold level
Improved Semantic Detection Configuration: Enhanced semantic detection classes with better type annotations:
- threshold_level parameter now supports both string values and EouThresholdLevel enum
- Cleaner type definitions for AzureSemanticDetection, AzureSemanticDetectionEn, and AzureSemanticDetectionMultilingual
- Improved documentation for threshold level parameters
Comprehensive Unit Test Suite: Added extensive unit test coverage with 200+ test cases covering:
- All enum types and their functionality
- Model creation, validation, and serialization
- Async connection functionality with proper mocking
- Client event handling and workflows
- Voice configuration across all supported types
- Message handling with content part hierarchy
- Integration scenarios and real-world usage patterns
- Recent changes validation and backwards compatibility
API Version Update: Updated to API version 2025-10-01 (from 2025-05-01-preview)
Enhanced Type Safety: Added new AzureVoiceType enum with values for better Azure voice type categorization:
- AZURE_CUSTOM for custom voice configurations
- AZURE_STANDARD for standard voice configurations
- AZURE_PERSONAL for personal voice configurations
Improved Message Handling: Added MessageRole enum for better role type safety in message items
Enhanced Model Documentation: Comprehensive documentation improvements across all models:
- Added detailed docstrings for model classes and their parameters
- Enhanced enum value documentation with descriptions
- Improved type annotations and parameter descriptions
Enhanced Semantic Detection: Added improved configuration options for all semantic detection classes:
- Added threshold_level parameter with options: "low", "medium", "high", "default" (recommended over deprecated threshold)
- Added timeout_ms parameter for timeout configuration in milliseconds (recommended over deprecated timeout)
Video Background Support: Added new Background model for video background customization:
- Support for solid color backgrounds in hex format (e.g., #00FF00FF)
- Support for image URL backgrounds
- Mutually exclusive color and image URL options
Enhanced Video Parameters: Extended VideoParams model with:
- background parameter for configuring video backgrounds using the new Background model
- gop_size parameter for Group of Pictures (GOP) size control, affecting compression efficiency and seeking performance
Improved Type Safety: Added TurnDetectionType enum for better type safety and IntelliSense support
Package Structure Modernization: Simplified package initialization with namespace package support
Enhanced Error Handling: Added ConnectionError and ConnectionClosed exception classes to the async API for better WebSocket error management

Breaking Changes

Cross-Language Package Identity Update: Updated package ID from VoiceLive to VoiceLive.WebSocket for better cross-language consistency
Model Refactoring:
- Renamed UserContentPart to MessageContentPart for clearer content part hierarchy
- All message items now require a content field with list of MessageContentPart objects
- OutputTextContentPart now inherits from MessageContentPart instead of being standalone
Enhanced Type Safety:
- Azure voice classes now use AzureVoiceType enum discriminators instead of string literals
- Message role discriminators now use MessageRole enum values for better type safety
Removed Deprecated Parameters: Completely removed deprecated parameters from semantic detection classes:
- Removed threshold parameter from all semantic detection classes (AzureSemanticDetection, AzureSemanticDetectionEn, AzureSemanticDetectionMultilingual)
- Removed timeout parameter from all semantic detection classes
- Users must now use threshold_level and timeout_ms parameters respectively
Removed Synchronous API: Completely removed synchronous WebSocket operations to focus exclusively on async patterns:
- Removed sync connect() function and sync VoiceLiveConnection class from main patch implementation
- Removed sync basic_voice_assistant.py sample (only async version remains)
- Simplified sync patch to minimal structure with empty exports
- All functionality now available only through async patterns
Updated Dependencies: Modified package dependencies to reflect async-only architecture:
- Moved aiohttp>=3.9.0,<4.0.0 from optional to required dependency
- Removed websockets optional dependency as sync API no longer exists
- Removed optional dependency groups websockets, aiohttp, and all-websockets
Model Rename:
- Renamed AudioInputTranscriptionSettings to AudioInputTranscriptionOptions for consistency with naming conventions
- Renamed AzureMultilingualSemanticVad to AzureSemanticVadMultilingual for naming consistency with other multilingual variants
Enhanced Type Safety: Turn detection discriminator types now use enum values instead of string literals for better type safety

Bug Fixes

Serialization Improvements: Fixed type casting issue in serialization utilities for better enum handling and type safety

Other Changes

Testing Infrastructure: Added comprehensive unit test suite with extensive coverage:
- 8 main test files with 200+ individual test methods
- Tests for all enums, models, async operations, client events, voice configurations, and message handling
- Integration tests covering real-world scenarios and recent changes
- Proper mocking for async WebSocket connections
- Backwards compatibility validation
- Test coverage for all recent changes and enhancements
API Documentation: Updated API view properties to reflect model structure changes, new enums, and cross-language package identity
Documentation Updates: Comprehensive updates to all markdown documentation:
- Updated README.md to reflect async-only nature with updated examples and installation instructions
- Updated samples README.md to remove sync sample references
- Enhanced BASIC_VOICE_ASSISTANT.md with comprehensive async implementation guide
- Added MIGRATION_GUIDE.md for users upgrading from previous versions

1.0.0b4 (2025-09-19)

Features Added

Personal Voice Models: Added PersonalVoiceModels enum with support for DragonLatestNeural, PhoenixLatestNeural, and PhoenixV2Neural models
Enhanced Animation Support: Added comprehensive server event classes for animation blendshapes and viseme handling:
- ServerEventResponseAnimationBlendshapeDelta and ServerEventResponseAnimationBlendshapeDone
- ServerEventResponseAnimationVisemeDelta and ServerEventResponseAnimationVisemeDone
Audio Timestamp Events: Added ServerEventResponseAudioTimestampDelta and ServerEventResponseAudioTimestampDone for better audio timing control
Improved Error Handling: Added ErrorResponse class for better error management
Enhanced Base Classes: Added ConversationItemBase and SessionBase for better code organization and inheritance
Token Usage Improvements: Renamed Usage to TokenUsage for better clarity
Audio Format Improvements: Reorganized audio format enums with separate InputAudioFormat and OutputAudioFormat enums for better clarity
Enhanced Output Audio Format Support: Added more granular output audio format options including specific sampling rates (8kHz, 16kHz) for PCM16
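
The sampling rate chosen for PCM16 output directly determines buffer sizes on the playback side. The following sketch works through the arithmetic, assuming mono audio and an illustrative 20 ms chunk; neither assumption comes from these notes, and `pcm16_bytes` is a hypothetical helper.

```python
BYTES_PER_SAMPLE_PCM16 = 2  # 16-bit samples occupy two bytes each


def pcm16_bytes(sample_rate_hz: int, duration_ms: int, channels: int = 1) -> int:
    """Size in bytes of raw PCM16 audio for the given duration (mono by default)."""
    return sample_rate_hz * channels * BYTES_PER_SAMPLE_PCM16 * duration_ms // 1000


# Bytes needed for one 20 ms chunk at each of the rates mentioned above.
sizes_for_20ms = {rate: pcm16_bytes(rate, 20) for rate in (8_000, 16_000)}
```

Under these assumptions a 20 ms chunk is 320 bytes at 8 kHz and 640 bytes at 16 kHz, so halving the rate halves the bandwidth and the buffer size.
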

Breaking Changes

Model Cleanup: Removed experimental classes AzurePlatformVoice, LLMVoice, AzureSemanticVadServer, InputAudio, NoTurnDetection, and ToolChoiceFunctionObjectFunction
Class Rename: Renamed Usage class to TokenUsage for better clarity
Enum Reorganization:
- Replaced AudioFormat enum with separate InputAudioFormat and OutputAudioFormat enums
- Removed Phi4mmVoice enum
- Removed EMOTION value from AnimationOutputType enum
- Removed IN_PROGRESS value from ItemParamStatus enum
Server Events: Removed RESPONSE_EMOTION_HYPOTHESIS from ServerEventType enum

Other Changes

Package Structure: Simplified package initialization with namespace package support
Sample Updates: Improved basic voice assistant samples
Code Optimization: Streamlined model definitions with significant code reduction
API Configuration: Updated API view properties for better tooling support

1.0.0b3 (2025-09-17)

Features Added

Transcription Improvement: Added phrase list support
New Voice Types: Added AzurePlatformVoice and LLMVoice classes
Enhanced Speech Detection: Added AzureSemanticVadServer class
Improved Function Calling: Enhanced async function calling sample with better error handling
English-Specific Detection: Added AzureSemanticDetectionEn class for optimized English-only semantic end-of-utterance detection
English-Specific Voice Activity Detection: Added AzureSemanticVadEn class for enhanced English-only voice activity detection

Breaking Changes

Transcription: Removed custom_model and enabled from AudioInputTranscriptionSettings.
Async Authentication: Fixed credential handling for async scenarios
Model Serialization: Improved error handling and deserialization

Other Changes

Code Modernization: Updated type annotations throughout

1.0.0b2 (2025-09-10)

Features Added

Added async function calling.

Bugs Fixed

Fixed function calling: ensure FunctionCallOutputItem.output is properly serialized as a JSON string before sending to the service.
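
To show what "serialized as a JSON string" means for a function result, here is a small standalone sketch of the equivalent step in application code. `to_output_string` and the `weather` payload are hypothetical and do not use the SDK; whether the SDK itself accepts non-string values is not stated in these notes.

```python
import json
from typing import Any


def to_output_string(result: Any) -> str:
    """Return a JSON string for a tool result; pass through values that are already strings."""
    if isinstance(result, str):
        return result
    return json.dumps(result, ensure_ascii=False)


# Hypothetical tool result produced by application code.
weather = {"city": "Seattle", "temp_c": 18}
output_payload = to_output_string(weather)
```

Passing strings through unchanged avoids double-encoding a result that a tool already returned as JSON text.
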

1.0.0b1 (2025-08-28)

Features Added

Added WebSocket connection support through connect().
Added VoiceLiveConnection for managing WebSocket connections.
Added models for the Voice Live preview.
Added WebSocket-based examples in the samples directory.

Other Changes

Initial preview release.

Release history

1.2.0 (this version): May 22, 2026
1.2.0b5 (pre-release): Apr 6, 2026
1.2.0b4 (pre-release): Feb 13, 2026
1.2.0b3 (pre-release): Feb 3, 2026
1.2.0b2 (pre-release): Nov 21, 2025
1.2.0b1 (pre-release): Nov 14, 2025
1.1.0: Nov 4, 2025
1.1.0b1 (pre-release): Oct 7, 2025
1.0.0: Oct 2, 2025
1.0.0b5 (pre-release): Sep 30, 2025
1.0.0b4 (pre-release): Sep 19, 2025
1.0.0b3 (pre-release): Sep 19, 2025
1.0.0b2 (pre-release): Sep 11, 2025
1.0.0b1 (pre-release): Aug 29, 2025

Download files

Download the file for your platform.

Source Distribution

azure_ai_voicelive-1.2.0.tar.gz (238.8 kB)

Uploaded May 22, 2026 Source

Built Distribution

azure_ai_voicelive-1.2.0-py3-none-any.whl (153.0 kB)

Uploaded May 22, 2026 Python 3

File details

Details for the file azure_ai_voicelive-1.2.0.tar.gz.

File metadata

Download URL: azure_ai_voicelive-1.2.0.tar.gz
Upload date: May 22, 2026
Size: 238.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: RestSharp/106.13.0.0

File hashes

Hashes for azure_ai_voicelive-1.2.0.tar.gz
Algorithm	Hash digest
SHA256	`578a388d2f4bae4543bd153316b469d5dcd781c43875fa546f9ca42e35ebb504`
MD5	`fe5a8012f4d3924982118dd69a43c5e3`
BLAKE2b-256	`1a1749f08c4dffc53e5acbb79db2d4c911adc751b584f1176101f4b7fc445349`

File details

Details for the file azure_ai_voicelive-1.2.0-py3-none-any.whl.

File metadata

Download URL: azure_ai_voicelive-1.2.0-py3-none-any.whl
Upload date: May 22, 2026
Size: 153.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: RestSharp/106.13.0.0

File hashes

Hashes for azure_ai_voicelive-1.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`97184058da439d808bc1c916e06d38f30b167790812a74def2ffacdb1bebcd54`
MD5	`6c2a9b4c424300e1256a9feae0865211`
BLAKE2b-256	`4b5864a1b9df628eb1860866f1fe402f86b35c746f2d1a9b4966a9a01f945675`
