Skip to main content

A Python SDK for voice processing and real-time audio communication

Project description

Vocals SDK Python

PyPI version License: MIT GitHub issues

A Python SDK for voice processing and real-time audio communication with AI assistants. Stream microphone input or audio files to receive live transcription, AI responses, and text-to-speech audio.

Features both class-based and functional interfaces for maximum flexibility and ease of use.

Features

  • ๐ŸŽค Real-time microphone streaming with voice activity detection
  • ๐Ÿ“ Audio file playback support (WAV format)
  • โœจ Live transcription with partial and final results
  • ๐Ÿค– Streaming AI responses with real-time text display
  • ๐Ÿ”Š Text-to-speech playback with automatic audio queueing
  • ๐Ÿ“Š Conversation tracking and session statistics
  • ๐Ÿš€ Easy setup with minimal configuration required
  • ๐Ÿ”„ Auto-reconnection and robust error handling
  • ๐ŸŽ›๏ธ Class-based API with modern Python patterns
  • ๐Ÿ”€ Context manager support for automatic cleanup

Table of Contents

Installation

pip install vocals

Quick Setup

After installation, use the built-in setup wizard to configure your environment:

vocals setup

Or test your installation:

vocals test

Run a quick demo:

vocals demo

๐ŸŒ Web UI Demo

NEW! Launch an interactive web interface to try the voice assistant:

vocals demo --ui

This will:

  • โœ… Automatically install Gradio (if not already installed)
  • ๐Ÿš€ Launch a web interface in your browser
  • ๐ŸŽค Real-time voice interaction with visual feedback
  • ๐Ÿ“ฑ Easy-to-use interface with buttons and live updates
  • ๐Ÿ”Š Live transcription and AI responses in the browser

Perfect for:

  • ๐ŸŽฏ Quick demonstrations and testing
  • ๐Ÿ‘ฅ Showing to others without command line
  • ๐Ÿ–ฅ๏ธ Visual feedback and status indicators
  • ๐Ÿ“Š Real-time conversation tracking

The web UI provides the same functionality as the command line demo but with an intuitive graphical interface that's perfect for demonstrations and interactive testing.

System Requirements

  • Python 3.8 or higher
  • Working microphone (for microphone streaming)
  • Audio output device (for TTS playback)

Additional Dependencies

The SDK automatically installs all required Python dependencies including pyaudio, sounddevice, numpy, websockets, and others.

On some Linux systems, you may need to install system-level audio libraries:

Ubuntu/Debian:

sudo apt-get install portaudio19-dev

Other Linux distributions:

# Install portaudio development headers using your package manager
# For example, on CentOS/RHEL: sudo yum install portaudio-devel

Quick Start

1. Get Your API Key

Set up your Vocals API key as an environment variable:

export VOCALS_DEV_API_KEY="your_api_key_here"

Or create a .env file in your project:

VOCALS_DEV_API_KEY=your_api_key_here

2. Basic Usage

The Vocals SDK provides a modern class-based API as the primary interface

Microphone Streaming (Minimal Example)

import asyncio
from vocals import VocalsClient

async def main():
    # Create client instance
    client = VocalsClient()

    # Stream microphone for 10 seconds
    await client.stream_microphone(duration=10.0)

    # Clean up
    await client.disconnect()
    client.cleanup()

if __name__ == "__main__":
    asyncio.run(main())

Audio File Playback (Minimal Example)

import asyncio
from vocals import VocalsClient

async def main():
    # Create client instance
    client = VocalsClient()

    # Stream audio file
    await client.stream_audio_file("path/to/your/audio.wav")

    # Clean up
    await client.disconnect()
    client.cleanup()

if __name__ == "__main__":
    asyncio.run(main())

Context Manager Usage (Recommended)

import asyncio
from vocals import VocalsClient

async def main():
    # Use context manager for automatic cleanup
    async with VocalsClient() as client:
        await client.stream_microphone(duration=10.0)

if __name__ == "__main__":
    asyncio.run(main())

SDK Modes

The Vocals SDK supports two usage patterns:

Default Experience (No Modes)

When you create the client without specifying modes, you get a full auto-contained experience:

# Full experience with automatic handlers, playback, and beautiful console output
client = VocalsClient()

Features:

  • โœ… Automatic transcription display with partial updates
  • โœ… Streaming AI response display in real-time
  • โœ… Automatic TTS audio playback
  • โœ… Speech interruption handling
  • โœ… Beautiful console output with emojis
  • โœ… Perfect for getting started quickly

Controlled Experience (With Modes)

When you specify modes, the client becomes passive and you control everything:

# Controlled experience - you handle all logic
client = VocalsClient(modes=['transcription', 'voice_assistant'])

Available Modes:

  • 'transcription': Enables transcription-related internal processing
  • 'voice_assistant': Enables AI response handling and speech interruption

Features:

  • โœ… No automatic handlers attached
  • โœ… No automatic playback
  • โœ… You attach your own message handlers
  • โœ… You control when to play audio
  • โœ… Perfect for custom applications

Example: Controlled Experience

import asyncio
from vocals import VocalsClient

async def main():
    # Create client with controlled experience
    client = VocalsClient(modes=['transcription', 'voice_assistant'])

    # Custom message handler
    def handle_messages(message):
        if message.type == "transcription" and message.data:
            text = message.data.get("text", "")
            is_partial = message.data.get("is_partial", False)
            if not is_partial:
                print(f"You said: {text}")

        elif message.type == "tts_audio" and message.data:
            text = message.data.get("text", "")
            print(f"AI speaking: {text}")
            # Manually start playback
            asyncio.create_task(client.play_audio())

    # Register your handler
    client.on_message(handle_messages)

    # Stream microphone with context manager
    async with client:
        await client.stream_microphone(
            duration=30.0,
            auto_playback=False  # We control playback
        )

if __name__ == "__main__":
    asyncio.run(main())

Advanced Usage

Enhanced Microphone Streaming

import asyncio
import logging
from vocals import (
    VocalsClient,
    create_enhanced_message_handler,
    create_default_connection_handler,
    create_default_error_handler,
)

async def main():
    # Configure logging for cleaner output
    logging.getLogger("vocals").setLevel(logging.WARNING)

    # Create client with default full experience
    client = VocalsClient()

    try:
        print("๐ŸŽค Starting microphone streaming...")
        print("Speak into your microphone!")

        # Stream microphone with enhanced features
        async with client:
            stats = await client.stream_microphone(
                duration=30.0,            # Record for 30 seconds
                auto_connect=True,        # Auto-connect if needed
                auto_playback=True,       # Auto-play received audio
                verbose=False,            # Client handles display automatically
                stats_tracking=True,      # Track session statistics
                amplitude_threshold=0.01, # Voice activity detection threshold
            )

        # Print session statistics
        print(f"\n๐Ÿ“Š Session Statistics:")
        print(f"   โ€ข Transcriptions: {stats.get('transcriptions', 0)}")
        print(f"   โ€ข AI Responses: {stats.get('responses', 0)}")
        print(f"   โ€ข TTS Segments: {stats.get('tts_segments_received', 0)}")

    except Exception as e:
        print(f"Error: {e}")
        await client.disconnect()
        client.cleanup()

if __name__ == "__main__":
    asyncio.run(main())

Conversation Tracking Example

import asyncio
from vocals import (
    VocalsClient,
    create_conversation_tracker,
    create_enhanced_message_handler,
)

async def main():
    # Create client with controlled experience for custom tracking
    client = VocalsClient(modes=['transcription', 'voice_assistant'])
    conversation_tracker = create_conversation_tracker()

    # Custom message handler with conversation tracking
    def tracking_handler(message):
        # Custom display logic
        if message.type == "transcription" and message.data:
            text = message.data.get("text", "")
            is_partial = message.data.get("is_partial", False)
            if not is_partial and text:
                print(f"๐ŸŽค You: {text}")

        elif message.type == "llm_response" and message.data:
            response = message.data.get("response", "")
            if response:
                print(f"๐Ÿค– AI: {response}")

        elif message.type == "tts_audio" and message.data:
            text = message.data.get("text", "")
            if text:
                print(f"๐Ÿ”Š Playing: {text}")
                # Manually start playback since we're in controlled mode
                asyncio.create_task(client.play_audio())

        # Track conversation based on message type
        if message.type == "transcription" and message.data:
            text = message.data.get("text", "")
            is_partial = message.data.get("is_partial", False)
            if text and not is_partial:
                conversation_tracker["add_transcription"](text, is_partial)

        elif message.type == "llm_response" and message.data:
            response = message.data.get("response", "")
            if response:
                conversation_tracker["add_response"](response)

    # Set up handler
    client.on_message(tracking_handler)

    try:
        # Stream microphone with context manager
        async with client:
            await client.stream_microphone(
                duration=15.0,
                auto_playback=False  # We handle playback manually
            )

        # Print conversation history
        print("\n" + "="*50)
        print("๐Ÿ“œ CONVERSATION HISTORY")
        print("="*50)
        conversation_tracker["print_conversation"]()

        # Print conversation statistics
        stats = conversation_tracker["get_stats"]()
        print(f"\n๐Ÿ“ˆ Session lasted {stats['duration']:.1f} seconds")

    except Exception as e:
        print(f"Error: {e}")

if __name__ == "__main__":
    asyncio.run(main())

Infinite Streaming with Signal Handling

import asyncio
import signal
from vocals import VocalsClient

# Global shutdown event
shutdown_event = asyncio.Event()

def setup_signal_handlers():
    """Setup signal handlers for graceful shutdown."""
    def signal_handler(signum, frame):
        if not shutdown_event.is_set():
            print(f"\n๐Ÿ“ก Received signal {signum}, shutting down...")
            shutdown_event.set()

    signal.signal(signal.SIGINT, signal_handler)
    signal.signal(signal.SIGTERM, signal_handler)

async def main():
    setup_signal_handlers()

    # Create client
    client = VocalsClient()

    try:
        print("๐ŸŽค Starting infinite streaming...")
        print("Press Ctrl+C to stop")

        # Connect to service
        await client.connect()

        # Create streaming task
        async def stream_task():
            await client.stream_microphone(
                duration=0,  # 0 = infinite streaming
                auto_connect=True,
                auto_playback=True,
                verbose=False,
                stats_tracking=True,
            )

        # Run streaming and wait for shutdown
        streaming_task = asyncio.create_task(stream_task())
        shutdown_task = asyncio.create_task(shutdown_event.wait())

        # Wait for shutdown signal
        await shutdown_task

        # Stop recording gracefully
        await client.stop_recording()

    finally:
        # Cancel streaming task
        if 'streaming_task' in locals():
            streaming_task.cancel()
        await client.disconnect()
        client.cleanup()

if __name__ == "__main__":
    asyncio.run(main())

Custom Audio Processing (Alternative to Local Playback)

Instead of playing audio locally, you can process audio segments with custom handlers - perfect for saving audio files, sending to external players, or implementing custom audio processing:

import asyncio
import base64
from vocals import VocalsClient

async def main():
    """Advanced voice assistant with custom audio processing"""

    # Create client with controlled mode for manual audio handling
    client = VocalsClient(modes=["transcription", "voice_assistant"])

    # Custom state tracking
    conversation_state = {"listening": False, "processing": False, "speaking": False}

    def handle_messages(message):
        """Custom message handler with audio processing control"""

        if message.type == "transcription" and message.data:
            text = message.data.get("text", "")
            is_partial = message.data.get("is_partial", False)

            if is_partial:
                print(f"\r๐ŸŽค Listening: {text}...", end="", flush=True)
            else:
                print(f"\nโœ… You said: {text}")

        elif message.type == "llm_response_streaming" and message.data:
            token = message.data.get("token", "")
            is_complete = message.data.get("is_complete", False)

            if token:
                print(token, end="", flush=True)
            if is_complete:
                print()  # New line

        elif message.type == "tts_audio" and message.data:
            text = message.data.get("text", "")
            if text and not conversation_state["speaking"]:
                print(f"๐Ÿ”Š AI speaking: {text}")
                conversation_state["speaking"] = True

                # Custom audio processing instead of local playback
                def custom_audio_handler(segment):
                    """Process each audio segment with custom logic"""
                    print(f"๐ŸŽต Processing audio: {segment.text}")

                    # Option 1: Save to file
                    audio_data = base64.b64decode(segment.audio_data)
                    filename = f"audio_{segment.segment_id}.wav"
                    with open(filename, "wb") as f:
                        f.write(audio_data)
                    print(f"๐Ÿ’พ Saved audio to: {filename}")

                    # Option 2: Send to external audio player
                    # subprocess.run(["ffplay", "-nodisp", "-autoexit", filename])

                    # Option 3: Stream to audio device
                    # your_audio_device.play(audio_data)

                    # Option 4: Convert format
                    # converted_audio = convert_audio_format(audio_data, target_format)

                    # Option 5: Process with AI/ML
                    # audio_features = extract_audio_features(audio_data)
                    # emotion_score = analyze_emotion(audio_features)

                # Process all available audio segments
                processed_count = client.process_audio_queue(
                    custom_audio_handler,
                    consume_all=True
                )
                print(f"โœ… Processed {processed_count} audio segments")

        elif message.type == "speech_interruption":
            print("\n๐Ÿ›‘ Speech interrupted")
            conversation_state["speaking"] = False

    # Register message handler
    client.on_message(handle_messages)

    # Connection handler
    def handle_connection(state):
        if state.name == "CONNECTED":
            print("โœ… Connected to voice assistant")
        elif state.name == "DISCONNECTED":
            print("โŒ Disconnected from voice assistant")

    client.on_connection_change(handle_connection)

    try:
        print("๐ŸŽค Voice Assistant with Custom Audio Processing")
        print("Audio will be saved to files instead of played locally")
        print("Speak into your microphone...")
        print("Press Ctrl+C to stop")

        # Stream microphone with custom audio handling
        async with client:
            await client.stream_microphone(
                duration=0,           # Infinite recording
                auto_connect=True,    # Auto-connect to service
                auto_playback=False,  # Disable automatic playback - we handle it
                verbose=False,        # Clean output
            )

    except KeyboardInterrupt:
        print("\n๐Ÿ‘‹ Custom audio processing stopped")
    finally:
        await client.disconnect()
        client.cleanup()

if __name__ == "__main__":
    asyncio.run(main())

Key Features of Custom Audio Processing:

  • ๐ŸŽ›๏ธ Full Control: Complete control over audio handling instead of automatic playback
  • ๐Ÿ’พ Save to Files: Save audio segments as individual WAV files
  • ๐Ÿ”„ Format Conversion: Convert audio to different formats before processing
  • ๐ŸŽต External Players: Send audio to external audio players or devices
  • ๐Ÿค– AI Processing: Analyze audio with machine learning models
  • ๐Ÿ“Š Audio Analytics: Extract features, analyze emotion, or process speech patterns
  • ๐Ÿ”Œ Integration: Easily integrate with existing audio pipelines

Use Cases:

  • Recording conversations for later playback
  • Building custom audio players with UI controls
  • Streaming audio to multiple devices simultaneously
  • Processing audio with AI/ML models for analysis
  • Converting audio formats for different platforms
  • Creating audio archives or transcription systems

Configuration

Environment Variables

# Required: Your Vocals API key
export VOCALS_DEV_API_KEY="vdev_your_api_key_here"

Audio Configuration

from vocals import VocalsClient, AudioConfig

# Create custom audio configuration
audio_config = AudioConfig(
    sample_rate=24000,    # Sample rate in Hz
    channels=1,           # Number of audio channels
    format="pcm_f32le",   # Audio format
    buffer_size=1024,     # Audio buffer size
)

# Use with client
client = VocalsClient(audio_config=audio_config)

SDK Configuration

from vocals import VocalsClient, get_default_config

# Get default configuration
config = get_default_config()

# Customize configuration
config.max_reconnect_attempts = 5
config.reconnect_delay = 2.0
config.auto_connect = True
config.token_refresh_buffer = 60.0

# Use with client
client = VocalsClient(config=config)

Complete API Reference

The Vocals SDK provides comprehensive control over voice processing, connection management, audio playback, and event handling. Here's a complete reference of all available controls:

๐ŸŽ›๏ธ Main Control Categories:

  • SDK Creation & Configuration - Initialize and configure the SDK
  • Stream Methods - Control microphone and file streaming
  • Connection Management - Connect, disconnect, and manage WebSocket connections
  • Audio Playback - Control TTS audio playback, queueing, and timing
  • Event Handling - Register handlers for messages, connections, errors, and audio data
  • State Management - Access real-time state information
  • Device Management - Manage and test audio devices

๐Ÿ“‹ Quick Reference:

Control Category Key Methods Purpose
Streaming stream_microphone(), stream_audio_file() Start voice/audio processing
Connection connect(), disconnect(), reconnect() Manage WebSocket connection
Recording start_recording(), stop_recording() Control audio input
Playback play_audio(), pause_audio(), stop_audio() Control TTS audio output
Queue clear_queue(), add_to_queue(), get_audio_queue() Manage audio queue
Events on_message(), on_connection_change(), on_error() Handle events
State get_is_connected(), get_is_playing(), get_recording_state() Check current state

Core Functions

  • VocalsClient(config?, audio_config?, user_id?, modes?) - Create client instance
  • get_default_config() - Get default configuration
  • AudioConfig(...) - Audio configuration class

VocalsClient() Constructor

VocalsClient(
    config: Optional[VocalsConfig] = None,
    audio_config: Optional[AudioConfig] = None,
    user_id: Optional[str] = None,
    modes: List[str] = []  # Controls client behavior
)

Parameters:

  • config: Client configuration options (connection, logging, etc.)
  • audio_config: Audio processing configuration (sample rate, channels, etc.)
  • user_id: Optional user ID for token generation
  • modes: List of modes to control client behavior

Modes:

  • [] (empty list): Default Experience - Full auto-contained behavior with automatic handlers
  • ['transcription']: Controlled - Only transcription-related internal processing
  • ['voice_assistant']: Controlled - Only AI response handling and speech interruption
  • ['transcription', 'voice_assistant']: Controlled - Both features, but no automatic handlers

Audio Configuration

AudioConfig(
    sample_rate: int = 24000,     # Sample rate in Hz
    channels: int = 1,            # Number of audio channels
    format: str = "pcm_f32le",    # Audio format
    buffer_size: int = 1024,      # Audio buffer size
)

Stream Methods

stream_microphone() Parameters

await client.stream_microphone(
    duration: float = 30.0,           # Recording duration in seconds (0 for infinite)
    auto_connect: bool = True,        # Whether to automatically connect if not connected
    auto_playback: bool = True,       # Whether to automatically play received audio
    verbose: bool = True,             # Whether to log detailed progress
    stats_tracking: bool = True,      # Whether to track and return statistics
    amplitude_threshold: float = 0.01 # Minimum amplitude to consider as speech
)

Important: In Controlled Experience (with modes), TTS audio is always added to the queue, but auto_playback=False prevents automatic playback. You must manually call client.play_audio() to play queued audio.

stream_audio_file() Parameters

await client.stream_audio_file(
    file_path: str,                   # Path to the audio file to stream
    chunk_size: int = 1024,           # Size of each chunk to send
    verbose: bool = True,             # Whether to log detailed progress
    auto_connect: bool = True         # Whether to automatically connect if not connected
)

Connection & Recording Methods

await client.connect()                # Connect to WebSocket
await client.disconnect()             # Disconnect from WebSocket
await client.reconnect()              # Reconnect to WebSocket
await client.start_recording()        # Start recording
await client.stop_recording()         # Stop recording

Audio Playback Methods

await client.play_audio()             # Start/resume audio playback
await client.pause_audio()            # Pause audio playback
await client.stop_audio()             # Stop audio playback
await client.fade_out_audio(duration) # Fade out audio over specified duration
client.clear_queue()                  # Clear the audio playback queue
client.add_to_queue(segment)          # Add audio segment to queue

Event Handlers

client.on_message(handler)            # Handle incoming messages
client.on_connection_change(handler)  # Handle connection state changes
client.on_error(handler)              # Handle errors
client.on_audio_data(handler)         # Handle audio data

Handler Functions:

  • handler(message) - Message handler receives WebSocket messages
  • handler(connection_state) - Connection handler receives connection state changes
  • handler(error) - Error handler receives error objects
  • handler(audio_data) - Audio data handler receives real-time audio data

Properties

# Connection properties
client.connection_state               # Get current connection state
client.is_connected                   # Check if connected
client.is_connecting                  # Check if connecting

# Recording properties
client.recording_state                # Get current recording state
client.is_recording                   # Check if recording

# Playback properties
client.playback_state                 # Get current playback state
client.is_playing                     # Check if playing audio
client.audio_queue                    # Get current audio queue
client.current_segment                # Get currently playing segment
client.current_amplitude              # Get current audio amplitude

# Token properties
client.token                          # Get current token
client.token_expires_at               # Get token expiration timestamp

Utility Methods

client.set_user_id(user_id)           # Set user ID for token generation
client.cleanup()                      # Clean up resources
client.process_audio_queue(handler)   # Process audio queue with custom handler

Utility Functions

These utility functions work with both the class-based and functional APIs:

# Message handlers
create_enhanced_message_handler(
    verbose: bool = True,
    show_transcription: bool = True,
    show_responses: bool = True,
    show_streaming: bool = True,
    show_detection: bool = False
)

# Conversation tracking
create_conversation_tracker()

# Statistics tracking
create_microphone_stats_tracker(verbose: bool = True)

# Connection handlers
create_default_connection_handler(verbose: bool = True)
create_default_error_handler(verbose: bool = True)

Audio Device Management

# Device management
list_audio_devices()                  # List available audio devices
get_default_audio_device()            # Get default audio device
test_audio_device(device_id, duration) # Test audio device
validate_audio_device(device_id)      # Validate audio device
get_audio_device_info(device_id)      # Get device information
print_audio_devices()                 # Print formatted device list
create_audio_device_selector()        # Interactive device selector

Auto-Playback Behavior

Default Experience (no modes):

  • auto_playback=True (default): TTS audio plays automatically
  • auto_playback=False: TTS audio is added to queue but doesn't play automatically

Controlled Experience (with modes):

  • auto_playback=True: TTS audio is added to queue and plays automatically
  • auto_playback=False: TTS audio is added to queue but requires manual client.play_audio() call

Key Point: In controlled mode, TTS audio is always added to the queue regardless of auto_playback setting. The auto_playback parameter only controls whether playback starts automatically.

Message Types

Common message types you'll receive in handlers:

# Transcription messages
{
    "type": "transcription",
    "data": {
        "text": "Hello world",
        "is_partial": False,
        "segment_id": "abc123"
    }
}

# LLM streaming response
{
    "type": "llm_response_streaming",
    "data": {
        "token": "Hello",
        "accumulated_response": "Hello",
        "is_complete": False,
        "segment_id": "def456"
    }
}

# TTS audio
{
    "type": "tts_audio",
    "data": {
        "text": "Hello there",
        "audio_data": "base64_encoded_wav_data",
        "sample_rate": 24000,
        "segment_id": "ghi789",
        "duration_seconds": 1.5
    }
}

# Speech interruption
{
    "type": "speech_interruption",
    "data": {}
}

Testing Your Setup

After setting up the SDK, you can test all the controls to ensure everything is working properly:

1. Test Basic Audio Setup

# List available audio devices
vocals devices

# Test your microphone
vocals test-device

# Run system diagnostics
vocals diagnose

2. Test Default Experience

import asyncio
from vocals import VocalsClient

async def test_default():
    """Test default experience with automatic handlers"""
    client = VocalsClient()  # No modes = full automatic experience

    print("๐ŸŽค Testing default experience...")
    print("Speak and listen for AI responses...")

    # Test with automatic playback
    async with client:
        await client.stream_microphone(
            duration=15.0,
            auto_playback=True,  # Should auto-play TTS
            verbose=False
        )

    print("โœ… Default experience test completed")

asyncio.run(test_default())

3. Test Controlled Experience

import asyncio
from vocals import VocalsClient

async def test_controlled():
    """Test controlled experience with manual handlers"""
    client = VocalsClient(modes=['transcription', 'voice_assistant'])

    # Track what we receive
    received_messages = []

    def test_handler(message):
        received_messages.append(message.type)
        print(f"โœ… Received: {message.type}")

        # Test manual playback control
        if message.type == "tts_audio":
            print("๐Ÿ”Š Manually triggering playback...")
            asyncio.create_task(client.play_audio())

    # Register handler
    client.on_message(test_handler)

    print("๐ŸŽค Testing controlled experience...")
    print("Should receive transcription and TTS messages...")

    # Test with manual playback control
    async with client:
        await client.stream_microphone(
            duration=15.0,
            auto_playback=False,  # We control playback manually
            verbose=False
        )

    print(f"๐Ÿ“Š Received message types: {set(received_messages)}")

    # Verify we got the expected message types
    expected_types = ["transcription", "tts_audio"]
    for msg_type in expected_types:
        if msg_type in received_messages:
            print(f"โœ… {msg_type} messages working")
        else:
            print(f"โŒ {msg_type} messages not received")

    print("โœ… Controlled experience test completed")

asyncio.run(test_controlled())

4. Test Audio Playback Controls

import asyncio
from vocals import VocalsClient

async def test_playback_controls():
    """Test all audio playback controls"""
    client = VocalsClient(modes=['transcription', 'voice_assistant'])

    # Test queue management
    print("๐ŸŽต Testing audio playback controls...")

    # Check initial state
    print(f"Initial queue size: {len(client.audio_queue)}")
    print(f"Is playing: {client.is_playing}")

    def audio_handler(message):
        if message.type == "tts_audio":
            print(f"๐ŸŽต Audio received: {message.data.get('text', '')}")
            print(f"Queue size: {len(client.audio_queue)}")

    client.on_message(audio_handler)

    # Stream and collect audio
    async with client:
        await client.stream_microphone(
            duration=10.0,
            auto_playback=False,  # Don't auto-play
            verbose=False
        )

    # Test manual controls
    queue_size = len(client.audio_queue)
    if queue_size > 0:
        print(f"โœ… {queue_size} audio segments in queue")

        print("๐ŸŽต Testing play_audio()...")
        await client.play_audio()

        # Wait a moment then test pause
        await asyncio.sleep(1)
        print("โธ๏ธ Testing pause_audio()...")
        await client.pause_audio()

        print("โ–ถ๏ธ Testing play_audio() again...")
        await client.play_audio()

        # Test stop
        await asyncio.sleep(1)
        print("โน๏ธ Testing stop_audio()...")
        await client.stop_audio()

        print("๐Ÿ—‘๏ธ Testing clear_queue()...")
        client.clear_queue()
        print(f"Queue size after clear: {len(client.audio_queue)}")

        print("โœ… All playback controls working!")
    else:
        print("โŒ No audio received to test playback controls")

    await client.disconnect()
    client.cleanup()

asyncio.run(test_playback_controls())

5. Test All Event Handlers

import asyncio
from vocals import VocalsClient

async def test_event_handlers():
    """Test all event handler types"""
    client = VocalsClient(modes=['transcription', 'voice_assistant'])

    # Track events
    events_received = {
        'messages': 0,
        'connections': 0,
        'errors': 0,
        'audio_data': 0
    }

    def message_handler(message):
        events_received['messages'] += 1
        print(f"๐Ÿ“ฉ Message: {message.type}")

    def connection_handler(state):
        events_received['connections'] += 1
        print(f"๐Ÿ”Œ Connection: {state.name}")

    def error_handler(error):
        events_received['errors'] += 1
        print(f"โŒ Error: {error.message}")

    def audio_data_handler(audio_data):
        events_received['audio_data'] += 1
        if events_received['audio_data'] % 100 == 0:  # Log every 100th
            print(f"๐ŸŽค Audio data chunks: {events_received['audio_data']}")

    # Register all handlers
    client.on_message(message_handler)
    client.on_connection_change(connection_handler)
    client.on_error(error_handler)
    client.on_audio_data(audio_data_handler)

    print("๐Ÿงช Testing all event handlers...")

    async with client:
        await client.stream_microphone(
            duration=10.0,
            auto_playback=False,
            verbose=False
        )

    # Report results
    print("\n๐Ÿ“Š Event Handler Test Results:")
    for event_type, count in events_received.items():
        status = "โœ…" if count > 0 else "โŒ"
        print(f"   {status} {event_type}: {count}")

asyncio.run(test_event_handlers())

6. Validate All Controls Are Working

Run this comprehensive test to verify everything:

# Create a test script
cat > test_all_controls.py << 'EOF'
import asyncio
from vocals import VocalsClient

async def comprehensive_test():
    """Comprehensive test of all client controls"""
    print("๐Ÿงช Comprehensive Client Control Test")
    print("=" * 50)

    # Test 1: Default mode
    print("\n1๏ธโƒฃ Testing Default Mode...")
    client1 = VocalsClient()
    async with client1:
        await client1.stream_microphone(duration=5.0, verbose=False)
    print("โœ… Default mode test completed")

    # Test 2: Controlled mode
    print("\n2๏ธโƒฃ Testing Controlled Mode...")
    client2 = VocalsClient(modes=['transcription', 'voice_assistant'])

    message_count = 0
    def counter(message):
        nonlocal message_count
        message_count += 1
        if message.type == "tts_audio":
            asyncio.create_task(client2.play_audio())

    client2.on_message(counter)
    async with client2:
        await client2.stream_microphone(duration=5.0, auto_playback=False, verbose=False)
    print(f"โœ… Controlled mode test completed - {message_count} messages")

    # Test 3: All controls
    print("\n3๏ธโƒฃ Testing Individual Controls...")
    client3 = VocalsClient()

    # Test properties
    print(f"   Connection state: {client3.connection_state.name}")
    print(f"   Is connected: {client3.is_connected}")
    print(f"   Recording state: {client3.recording_state.name}")
    print(f"   Is recording: {client3.is_recording}")
    print(f"   Playback state: {client3.playback_state.name}")
    print(f"   Is playing: {client3.is_playing}")
    print(f"   Queue length: {len(client3.audio_queue)}")
    print(f"   Current amplitude: {client3.current_amplitude}")

    await client3.disconnect()
    client3.cleanup()
    print("โœ… All controls test completed")

    print("\n๐ŸŽ‰ All tests completed successfully!")

if __name__ == "__main__":
    asyncio.run(comprehensive_test())
EOF

# Run the test
python test_all_controls.py

This comprehensive testing suite will validate that all your controls are working properly after our recent fixes!

CLI Tools

The SDK includes powerful command-line tools for setup, testing, and debugging:

Setup & Configuration

# Interactive setup wizard
vocals setup

# List available audio devices
vocals devices

# Test a specific audio device
vocals test-device 1 --duration 5

# Generate diagnostic report
vocals diagnose

Development Tools

# Run all tests
vocals test

# Run a demo session
vocals demo --duration 30 --verbose

# Create project templates
vocals create-template voice_assistant
vocals create-template file_processor
vocals create-template conversation_tracker
vocals create-template advanced_voice_assistant

Available Templates:

  • voice_assistant: Simple voice assistant (Default Experience)
  • file_processor: Process audio files (Default Experience)
  • conversation_tracker: Track conversations (Controlled Experience)
  • advanced_voice_assistant: Full control voice assistant (Controlled Experience)

All templates use the modern class-based API with VocalsClient.

Advanced Features

# Performance monitoring
vocals demo --duration 60 --stats

# Custom audio device
vocals demo --device 2

# Debug mode
VOCALS_DEBUG_LEVEL=DEBUG vocals demo

Error Handling

The client provides comprehensive error handling:

from vocals import VocalsClient, VocalsError

async def main():
    client = VocalsClient()

    try:
        async with client:
            await client.stream_microphone(duration=10.0)
    except VocalsError as e:
        print(f"Vocals client error: {e}")
    except Exception as e:
        print(f"Unexpected error: {e}")
        # Manual cleanup if context manager fails
        await client.disconnect()
        client.cleanup()

# Alternative without context manager
async def main_manual():
    client = VocalsClient()

    try:
        await client.stream_microphone(duration=10.0)
    except VocalsError as e:
        print(f"Vocals client error: {e}")
    except Exception as e:
        print(f"Unexpected error: {e}")
    finally:
        await client.disconnect()
        client.cleanup()

Troubleshooting

Common Issues

  1. "API key not found"

    • Set environment variable: export VOCALS_DEV_API_KEY="your_key"
    • Or create .env file with the key
    • Ensure the .env file is loaded (e.g., using python-dotenv if needed)
  2. "Connection failed"

    • Check your internet connection
    • Verify API key is valid
    • Check WebSocket endpoint is accessible
    • Try increasing reconnect attempts in config
  3. "No audio input detected"

    • Check microphone permissions
    • Verify microphone is working (use vocals devices to list devices)
    • Adjust amplitude_threshold parameter lower (e.g., 0.005)
    • Test with vocals test-device <id>
  4. Audio playback issues

    • Ensure speakers/headphones are connected
    • Check system audio settings
    • Try different audio formats or sample rates in AudioConfig
  5. High latency

    • Check network speed
    • Reduce buffer_size in AudioConfig
    • Ensure no other apps are using high bandwidth
  6. Dependency errors

    • Run pip install -r requirements.txt again
    • For Linux: Ensure portaudio is installed
    • Try creating a fresh virtual environment

If issues persist, run vocals diagnose and share the output when reporting bugs.

Debug Mode

Enable debug logging to troubleshoot issues:

import logging

# Enable debug logging
logging.basicConfig(level=logging.DEBUG)

# Or for specific modules
logging.getLogger("vocals").setLevel(logging.DEBUG)

Examples

Check out the included examples:

Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

For major changes, please open an issue first to discuss what you would like to change.

See CONTRIBUTING.md for more details (feel free to create one if it doesn't exist).

Support

For support, documentation, and updates:

License

MIT License - see LICENSE file for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vocals-1.0.984.tar.gz (101.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

vocals-1.0.984-py3-none-any.whl (67.5 kB view details)

Uploaded Python 3

File details

Details for the file vocals-1.0.984.tar.gz.

File metadata

  • Download URL: vocals-1.0.984.tar.gz
  • Upload date:
  • Size: 101.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.7

File hashes

Hashes for vocals-1.0.984.tar.gz
Algorithm Hash digest
SHA256 f74ad21ba0a68ff80e39da723260141a9e3cbd7e7253af54454bef6ae58e70d9
MD5 783c54db7beda48fef269b79d547b87c
BLAKE2b-256 db19580616ee1899cb2156601a9bef458be863b999cbc7d8d5fce8867e5e4445

See more details on using hashes here.

File details

Details for the file vocals-1.0.984-py3-none-any.whl.

File metadata

  • Download URL: vocals-1.0.984-py3-none-any.whl
  • Upload date:
  • Size: 67.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.7

File hashes

Hashes for vocals-1.0.984-py3-none-any.whl
Algorithm Hash digest
SHA256 c351a2c8283242298318f5f568d72184b747cd09d7098db75eb658cf6672dcf6
MD5 cabdc160f95b90df69f07dadbd7375c5
BLAKE2b-256 90a95daae33861ca141730e048ac59acc4ad66f5fd6ba6cde5239f2a33664453

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page