High-performance asynchronous parallel audio & VAD processing library

Cascade - Production-Ready, High-Performance, Asynchronous VAD Library

Cascade is a production-ready, high-performance, low-latency audio stream processing library for Voice Activity Detection (VAD). Built on the Silero VAD model, Cascade reduces VAD processing latency while preserving the model's accuracy. It does so with a 1:1:1:1 binding architecture, in which each processor instance owns its own model, and with asynchronous streaming.

📊 Performance Benchmarks

Based on our latest streaming VAD performance tests with different chunk sizes:

Streaming Performance by Chunk Size

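| Chunk Size (bytes) | Processing Time (ms) | Throughput (chunks/sec) | Total Test Time (s) | Speech Segments |
|--------------------|----------------------|-------------------------|---------------------|-----------------|
| 1024               | 0.66                 | 92.2                    | 3.15                | 2               |
| 4096               | 1.66                 | 82.4                    | 0.89                | 2               |
| 8192               | 2.95                 | 72.7                    | 0.51                | 2               |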

Key Performance Metrics

| Metric                | Value           | Description                                |
|-----------------------|-----------------|--------------------------------------------|
| Best Processing Speed | 0.66 ms/chunk   | Optimal performance with 1024-byte chunks  |
| Peak Throughput       | 92.2 chunks/sec | Maximum processing throughput              |
| Success Rate          | 100%            | Processing success rate across all tests   |
| Accuracy              | High            | Guaranteed by the Silero VAD model         |
| Architecture          | 1:1:1:1         | Independent model per processor instance   |

Performance Characteristics

  • Consistent performance across chunk sizes: Per-chunk processing time stays below 3 ms for chunks from 1024 to 8192 bytes
  • Real-time capability: Sub-millisecond processing of 1024-byte chunks (0.66 ms) leaves ample headroom for real-time applications
  • Scalability: Linear performance scaling with independent processor instances
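
To put the processing times above in context, each one can be compared with the amount of audio a chunk holds. The sketch below assumes 16 kHz, 16-bit mono PCM; the benchmark's audio format is not stated here, so treat both constants as assumptions. `chunk_duration_ms` and `real_time_factor` are illustrative helpers, not part of Cascade.

```python
# Assumed audio format (not stated in the benchmark): 16 kHz, 16-bit mono PCM.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2

# Processing time per chunk in milliseconds, copied from the benchmark table.
BENCHMARKS_MS = {1024: 0.66, 4096: 1.66, 8192: 2.95}


def chunk_duration_ms(chunk_bytes: int) -> float:
    """Return how many milliseconds of audio a chunk of this size holds."""
    samples = chunk_bytes / BYTES_PER_SAMPLE
    return samples / SAMPLE_RATE_HZ * 1000.0


def real_time_factor(processing_ms: float, chunk_bytes: int) -> float:
    """Return processing time divided by audio duration (below 1.0 is faster than real time)."""
    return processing_ms / chunk_duration_ms(chunk_bytes)


for size, processing_ms in BENCHMARKS_MS.items():
    duration = chunk_duration_ms(size)
    rtf = real_time_factor(processing_ms, size)
    print(f"{size:>5} bytes: {duration:6.1f} ms of audio, real-time factor {rtf:.4f}")
```

Under these assumptions, every chunk size is processed in a small fraction of the time it takes to play, which is what makes real-time use feasible.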

✨ Core Features

🚀 High-Performance Engineering

  • Lock-Free Design: The 1:1:1:1 binding architecture gives each instance its own resources, which eliminates lock contention between instances.
  • Frame-Aligned Buffer: A highly efficient buffer optimized for 512-sample frames.
  • Asynchronous Streaming: Non-blocking audio stream processing based on asyncio.
  • Memory Optimization: Zero-copy design, object pooling, and cache alignment.
  • Concurrency Optimization: Dedicated threads, asynchronous queues, and batch processing.
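
The idea behind a frame-aligned buffer can be sketched as follows: incoming chunks of arbitrary size are accumulated, and only complete 512-sample frames are handed on. This is a minimal illustration, not Cascade's implementation. The class name `FrameBuffer`, its methods, and the 16-bit sample width are assumptions, and this simple version copies data rather than being zero-copy.

```python
from typing import Iterator


class FrameBuffer:
    """Accumulate arbitrary-sized PCM chunks and emit fixed-size frames."""

    def __init__(self, frame_samples: int = 512, bytes_per_sample: int = 2) -> None:
        # 512 samples of 16-bit audio (an assumption) is 1024 bytes per frame.
        self.frame_bytes = frame_samples * bytes_per_sample
        self._buf = bytearray()

    def write(self, chunk: bytes) -> None:
        """Append a chunk of any length to the buffer."""
        self._buf.extend(chunk)

    def read_frames(self) -> Iterator[bytes]:
        """Yield every complete frame currently buffered, oldest first."""
        while len(self._buf) >= self.frame_bytes:
            frame = bytes(self._buf[: self.frame_bytes])
            del self._buf[: self.frame_bytes]
            yield frame

    @property
    def pending_bytes(self) -> int:
        """Number of bytes waiting for the rest of their frame."""
        return len(self._buf)


buffer = FrameBuffer()
buffer.write(bytes(1500))               # more than one frame, less than two
frames = list(buffer.read_frames())
print(len(frames), buffer.pending_bytes)
```

Leftover bytes stay in the buffer until a later chunk completes the frame, so the model always receives input of the size it expects.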

🎯 Intelligent Interaction

  • Real-time Interruption Detection: VAD-based interruption detection lets users interrupt system responses at any time.
  • State Synchronization Guarantee: A two-way guard mechanism keeps the physical and logical layers strongly consistent.
  • Automatic State Management: VAD automatically manages the speech collection state, while external services control the processing state.
  • Anti-false-trigger Design: Minimum interval checking and state mutex locks prevent false triggers.
  • Low-latency Response: Interruption detection latency is below 50ms, for a natural conversation experience.
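
Minimum interval checking can be pictured as a small gate that rejects any interruption arriving too soon after the previous accepted one. The sketch below is an illustration only. `InterruptionGate` and its `allow` method are hypothetical names, and timestamps are passed in explicitly to keep the example deterministic.

```python
from typing import Optional


class InterruptionGate:
    """Accept an interruption only if enough time has passed since the last one."""

    def __init__(self, min_interval_ms: float = 500.0) -> None:
        self.min_interval_ms = min_interval_ms
        self._last_accepted_ms: Optional[float] = None

    def allow(self, now_ms: float) -> bool:
        """Return True and record the time if the interruption is accepted."""
        last = self._last_accepted_ms
        if last is not None and now_ms - last < self.min_interval_ms:
            return False  # too close to the previous interruption
        self._last_accepted_ms = now_ms
        return True


gate = InterruptionGate(min_interval_ms=500)
for timestamp in (0, 200, 500, 900, 1000):
    print(timestamp, gate.allow(timestamp))
```

Rejected events do not reset the timer, so a burst of spurious triggers cannot postpone the next legitimate interruption indefinitely.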

🔧 Robust Software Engineering

  • Modular Design: A component architecture with high cohesion and low coupling.
  • Interface Abstraction: Dependency inversion through interface-based design.
  • Type System: Data validation and type checking using Pydantic.
  • Comprehensive Testing: Unit, integration, and performance tests.
  • Code Standards: Adherence to PEP 8 style guidelines.

🛡️ Production-Ready Reliability

  • Error Handling: Robust error handling and recovery mechanisms.
  • Resource Management: Automatic cleanup and graceful shutdown.
  • Monitoring Metrics: Real-time performance monitoring and statistics.
  • Scalability: Horizontal scaling by increasing the number of instances.
  • Stability Assurance: Handles boundary conditions and exceptional cases gracefully.

🏗️ Architecture

Cascade employs a 1:1:1:1 independent architecture to ensure optimal performance and thread safety.

graph TD
    Client --> StreamProcessor
    
    subgraph "1:1:1:1 Independent Architecture"
        StreamProcessor --> |per connection| IndependentProcessor[Independent Processor Instance]
        IndependentProcessor --> |independent loading| VADModel[Silero VAD Model]
        IndependentProcessor --> |independent management| VADIterator[VAD Iterator]
        IndependentProcessor --> |independent buffering| FrameBuffer[Frame-Aligned Buffer]
        IndependentProcessor --> |independent state| StateMachine[State Machine]
    end
    
    subgraph "Asynchronous Processing Flow"
        VADModel --> |asyncio.to_thread| VADInference[VAD Inference]
        VADInference --> StateMachine
        StateMachine --> |None| SingleFrame[Single Frame Output]
        StateMachine --> |start| Collecting[Start Collecting]
        StateMachine --> |end| SpeechSegment[Speech Segment Output]
    end
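
The two ideas in the diagram, one independent processor per connection and blocking inference offloaded with `asyncio.to_thread`, can be sketched with a stand-in model. Nothing below is Cascade's code: `DummyModel`, `ConnectionProcessor`, and `handle_connection` are hypothetical names, and the "model" simply flags any non-silent frame as speech.

```python
import asyncio
from typing import List


class DummyModel:
    """Stand-in for a VAD model: a frame counts as speech if any byte is non-zero."""

    def predict(self, frame: bytes) -> bool:
        return any(frame)


class ConnectionProcessor:
    """Owns its own model and counters, so nothing is shared between connections."""

    def __init__(self) -> None:
        self.model = DummyModel()
        self.frames_seen = 0

    async def process(self, frame: bytes) -> bool:
        self.frames_seen += 1
        # Run the blocking call in a worker thread so the event loop stays responsive.
        return await asyncio.to_thread(self.model.predict, frame)


async def handle_connection(frames: List[bytes]) -> List[bool]:
    processor = ConnectionProcessor()  # one fresh instance per connection
    return [await processor.process(frame) for frame in frames]


async def main() -> List[List[bool]]:
    silence = bytes(1024)
    speech = b"\x01" * 1024
    # Two simulated connections run concurrently without sharing any state.
    return await asyncio.gather(
        handle_connection([silence, speech]),
        handle_connection([speech, speech, silence]),
    )


results = asyncio.run(main())
print(results)
```

Because every connection builds its own processor, no locks are needed around the model or its state. The trade-off is memory, since each instance carries a full copy of the model.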

🚀 Quick Start

Installation

pip install cascade-vad

OR

# Using uv is recommended
uv venv -p 3.12

source .venv/bin/activate

# Install from PyPI (recommended)
pip install cascade-vad

# Or install from source
git clone https://github.com/xucailiang/cascade.git
cd cascade
pip install -e .

Basic Usage

import cascade
import asyncio

async def basic_example():
    """A basic usage example."""
    
    # Method 1: Simple file processing
    async for result in cascade.process_audio_file("audio.wav"):
        if result.result_type == "segment":
            segment = result.segment
            print(f"🎤 Speech Segment: {segment.start_timestamp_ms:.0f}ms - {segment.end_timestamp_ms:.0f}ms")
        else:
            frame = result.frame
            print(f"🔇 Single Frame: {frame.timestamp_ms:.0f}ms")
    
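    # Method 2: Stream processing
    # NOTE: audio_stream is not defined in this snippet; supply your own
    # audio source (for example, an async iterator of audio chunks).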
    async with cascade.StreamProcessor() as processor:
        async for result in processor.process_stream(audio_stream):
            if result.result_type == "segment":
                segment = result.segment
                print(f"🎤 Speech Segment: {segment.start_timestamp_ms:.0f}ms - {segment.end_timestamp_ms:.0f}ms")
            else:
                frame = result.frame
                print(f"🔇 Single Frame: {frame.timestamp_ms:.0f}ms")

asyncio.run(basic_example())
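
The two result kinds handled above mirror the state machine in the architecture diagram: frames outside speech are emitted one by one, while frames between a `start` and an `end` event are collected into a single segment. The following is a simplified sketch of that behavior under stated assumptions. `Collector` and its `feed` method are hypothetical and not Cascade's API.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Collector:
    """Turn per-frame VAD events into single-frame or segment outputs."""

    collecting: bool = False
    frames: List[bytes] = field(default_factory=list)

    def feed(self, frame: bytes, event: Optional[str]) -> Optional[Tuple[str, bytes]]:
        """Return ("frame", data), ("segment", data), or None while collecting."""
        if event == "start":
            # Speech begins: start a new segment with this frame.
            self.collecting = True
            self.frames = [frame]
            return None
        if self.collecting:
            self.frames.append(frame)
            if event == "end":
                # Speech ends: emit everything collected as one segment.
                self.collecting = False
                segment = b"".join(self.frames)
                self.frames = []
                return ("segment", segment)
            return None
        # Not collecting and no event: pass the frame through on its own.
        return ("frame", frame)


collector = Collector()
stream = [(b"a", None), (b"b", "start"), (b"c", None), (b"d", "end"), (b"e", None)]
outputs = [collector.feed(frame, event) for frame, event in stream]
print(outputs)
```

In this sketch a consumer sees nothing while speech is in progress and receives the whole utterance once the `end` event arrives.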

Advanced Configuration

import asyncio

import cascade

async def advanced_example():
    """An advanced configuration example."""
    
    # Custom configuration
    config = cascade.Config(
        vad_threshold=0.7,          # Higher detection threshold
        min_silence_duration_ms=100,
        speech_pad_ms=100
    )
    
    # Use the custom config
    async with cascade.StreamProcessor(config) as processor:
        # Process audio stream (audio_stream must be supplied by your application)
        async for result in processor.process_stream(audio_stream):
            # Process results...
            pass
        
        # Get performance statistics
        stats = processor.get_stats()
        print(f"Throughput: {stats.throughput_chunks_per_second:.1f} chunks/sec")

asyncio.run(advanced_example())
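
To build intuition for how a detection threshold and a minimum silence duration interact, the sketch below segments a list of per-frame speech probabilities. It is a rough illustration, not the algorithm used by Cascade or Silero VAD. `find_segments` is a hypothetical helper, silence is measured in frames rather than milliseconds, and padding (`speech_pad_ms`) is left out.

```python
from typing import List, Tuple


def find_segments(
    probs: List[float],
    threshold: float = 0.7,
    min_silence_frames: int = 3,
) -> List[Tuple[int, int]]:
    """Return (start, end) frame index pairs; end is exclusive."""
    segments: List[Tuple[int, int]] = []
    start = None       # index of the first frame of the open segment, if any
    silence_run = 0    # consecutive below-threshold frames inside the segment
    for i, p in enumerate(probs):
        if p >= threshold:
            if start is None:
                start = i
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run >= min_silence_frames:
                # Enough silence: close the segment at the last speech frame.
                segments.append((start, i - silence_run + 1))
                start = None
                silence_run = 0
    if start is not None:
        # The input ended while a segment was still open.
        segments.append((start, len(probs) - silence_run))
    return segments


probs = [0.1, 0.9, 0.8, 0.2, 0.9, 0.1, 0.1, 0.1, 0.9]
print(find_segments(probs, threshold=0.7, min_silence_frames=3))
print(find_segments(probs, threshold=0.7, min_silence_frames=1))
```

A longer minimum silence bridges short pauses and yields fewer, longer segments. A shorter one splits speech at every dip below the threshold.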

Interruption Detection

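import asyncio

import cascade

# NOTE: audio_stream, asr_service, llm_service, and tts_service are placeholders
# for your application's own components; they are not defined in this snippet.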

async def interruption_example():
    """Interruption detection example"""
    
    # Configure interruption detection
    config = cascade.Config(
        vad_threshold=0.5,
        interruption_config=cascade.InterruptionConfig(
            enable_interruption=True,  # Enable interruption detection
            min_interval_ms=500        # Minimum interruption interval 500ms
        )
    )
    
    async with cascade.StreamProcessor(config) as processor:
        async for result in processor.process_stream(audio_stream):
            
            # Detect interruption events
            if result.is_interruption:
                print(f"🛑 Interruption detected! Interrupted state: {result.interruption.system_state.value}")
                # Stop current TTS playback
                await tts_service.stop()
                # Cancel LLM request
                await llm_service.cancel()
            
            # Process speech segments
            elif result.is_speech_segment:
                # ASR recognition
                text = await asr_service.recognize(result.segment.audio_data)
                
                # Set to processing
                processor.set_system_state(cascade.SystemState.PROCESSING)
                
                # LLM generation
                response = await llm_service.generate(text)
                
                # Set to responding
                processor.set_system_state(cascade.SystemState.RESPONDING)
                
                # TTS playback
                await tts_service.play(response)
                
                # Reset to idle after completion
                processor.set_system_state(cascade.SystemState.IDLE)

asyncio.run(interruption_example())

For detailed documentation, see: Interruption Implementation Summary
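
One design point worth noting: if the response pipeline is awaited directly inside the result loop, the loop cannot observe an interruption until that pipeline finishes. A common asyncio pattern is to run the response as a background task and cancel it when an interruption arrives. The sketch below shows that pattern with stand-ins only. `respond`, `consume`, and the event tuples are hypothetical, and no Cascade API is used.

```python
import asyncio
from typing import List, Optional, Tuple


async def respond(text: str, log: List[str]) -> None:
    """Stand-in for ASR -> LLM -> TTS; the sleep simulates a long playback."""
    log.append(f"start:{text}")
    try:
        await asyncio.sleep(10)
        log.append(f"done:{text}")
    except asyncio.CancelledError:
        log.append(f"cancelled:{text}")
        raise


async def consume(events: List[Tuple[str, Optional[str]]], log: List[str]) -> None:
    """Handle events without ever blocking on a running response."""
    current: Optional[asyncio.Task] = None
    for kind, payload in events:
        if kind == "interruption" and current is not None:
            current.cancel()              # stop the in-flight response
            try:
                await current
            except asyncio.CancelledError:
                pass
            current = None
        elif kind == "segment":
            current = asyncio.create_task(respond(payload or "", log))
            await asyncio.sleep(0)        # yield once so the task can start
    if current is not None:
        await current                     # let the final response finish


log: List[str] = []
asyncio.run(consume([("segment", "hi"), ("interruption", None)], log))
print(log)
```

With this structure, cancellation reaches the response at its next `await`, so the time to stop depends on how often the stand-in services yield control.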

🧪 Testing

# Run basic integration tests
python tests/test_simple_vad.py -v

# Run simulated audio stream tests
python tests/test_stream_vad.py -v

# Run performance benchmark tests
python tests/benchmark_performance.py

Test Coverage:

  • ✅ Basic API Usage
  • ✅ Stream Processing
  • ✅ File Processing
  • ✅ Real Audio VAD
  • ✅ Automatic Speech Segment Saving
  • ✅ 1:1:1:1 Architecture Validation
  • ✅ Performance Benchmarks
  • ✅ FrameAlignedBuffer Tests

🌐 Web Demo

We provide a complete WebSocket-based web demonstration that showcases Cascade's real-time VAD capabilities with multiple client support.

Features

  • Real-time Audio Processing: Capture audio from browser microphone and process with VAD
  • Live VAD Visualization: Real-time display of VAD detection results
  • Speech Segment Management: Display detected speech segments with playback support
  • Dynamic VAD Configuration: Adjust VAD parameters in real-time
  • Multi-client Support: Independent Cascade instances for each WebSocket connection

Quick Start

# Start backend server
cd web_demo
python server.py

# Start frontend (in another terminal)
cd web_demo/frontend
pnpm install && pnpm dev

For detailed setup instructions, see Web Demo Documentation.

🔧 Production Deployment

Best Practices

  1. Resource Allocation

    • Each instance uses approximately 50MB of memory.
    • Recommended: 2-3 instances per CPU core.
    • Monitor memory usage to prevent Out-of-Memory (OOM) errors.
  2. Performance Tuning

    • Adjust max_instances to match server CPU cores.
    • Increase buffer_size_frames for higher throughput.
    • Tune vad_threshold to balance accuracy and sensitivity.
  3. Error Handling

    • Implement retry mechanisms for transient errors.
    • Use health checks to monitor service status.
    • Log detailed information for troubleshooting.
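
The resource figures above translate into a simple capacity estimate: take the smaller of the CPU-based and memory-based limits. The helper below is a back-of-the-envelope sketch. `estimate_max_instances` is a hypothetical name, and the defaults simply reuse the approximate 50MB per instance and the lower end of the 2-3 instances per core guideline.

```python
def estimate_max_instances(
    cpu_cores: int,
    available_memory_mb: float,
    per_instance_mb: float = 50.0,
    instances_per_core: int = 2,
) -> int:
    """Return the smaller of the CPU-bound and memory-bound instance counts."""
    by_cpu = cpu_cores * instances_per_core
    by_memory = int(available_memory_mb // per_instance_mb)
    return min(by_cpu, by_memory)


# 4 cores with 1 GB free: CPU allows 8, memory allows 20, so CPU is the limit.
print(estimate_max_instances(cpu_cores=4, available_memory_mb=1024))

# 8 cores with 512 MB free at 3 per core: CPU allows 24, memory allows 10.
print(estimate_max_instances(cpu_cores=8, available_memory_mb=512, instances_per_core=3))
```

Treat the result as a starting point and confirm it against measured memory usage under real load.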

Monitoring Metrics

# Get performance monitoring metrics
stats = processor.get_stats()

# Key monitoring metrics
print(f"Total Chunks Processed: {stats.total_chunks_processed}")
print(f"Average Processing Time: {stats.average_processing_time_ms:.2f}ms")
print(f"Throughput: {stats.throughput_chunks_per_second:.1f} chunks/sec")
print(f"Speech Segments: {stats.speech_segments}")
print(f"Error Rate: {stats.error_rate:.2%}")
print(f"Memory Usage: {stats.memory_usage_mb:.1f}MB")
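
These metrics can feed a simple health check. The sketch below takes any object exposing the `error_rate`, `average_processing_time_ms`, and `memory_usage_mb` fields shown above and returns a list of warnings. The function name `check_health` and all threshold values are assumptions chosen for illustration, and `SimpleNamespace` stands in for a real stats object.

```python
from types import SimpleNamespace
from typing import List


def check_health(
    stats,
    max_error_rate: float = 0.01,
    max_avg_ms: float = 5.0,
    max_memory_mb: float = 100.0,
) -> List[str]:
    """Return human-readable warnings; an empty list means all checks passed."""
    warnings: List[str] = []
    if stats.error_rate > max_error_rate:
        warnings.append(f"error rate {stats.error_rate:.2%} exceeds {max_error_rate:.2%}")
    if stats.average_processing_time_ms > max_avg_ms:
        warnings.append(
            f"average processing time {stats.average_processing_time_ms:.2f}ms exceeds {max_avg_ms:.2f}ms"
        )
    if stats.memory_usage_mb > max_memory_mb:
        warnings.append(f"memory usage {stats.memory_usage_mb:.1f}MB exceeds {max_memory_mb:.1f}MB")
    return warnings


# Stand-in stats object with illustrative values.
sample = SimpleNamespace(error_rate=0.0, average_processing_time_ms=0.66, memory_usage_mb=50.0)
print(check_health(sample))
```

A function like this could back a health-check endpoint, with thresholds tuned to the deployment.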

🔧 Requirements

Core Dependencies

  • Python: 3.12 (recommended)
  • pydantic: 2.4.0+ (Data validation)
  • numpy: 1.24.0+ (Numerical computation)
  • scipy: 1.11.0+ (Signal processing)
  • silero-vad: 5.1.2+ (VAD model)
  • onnxruntime: 1.22.1+ (ONNX inference)
  • torchaudio: 2.7.1+ (Audio processing)

Development Dependencies

  • pytest: Testing framework
  • black: Code formatter
  • ruff: Linter
  • mypy: Type checker
  • pre-commit: Git hooks

🤝 Contribution Guide

We welcome community contributions! Please follow these steps:

  1. Fork the project and create a feature branch.
  2. Install development dependencies: pip install -e ".[dev]"
  3. Run tests: pytest
  4. Lint your code: ruff check . && black --check .
  5. Type check: mypy cascade
  6. Submit a Pull Request with a clear description of your changes.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Silero Team: For their excellent VAD model.
  • PyTorch Team: For the deep learning framework.
  • Pydantic Team: For the type validation system.
  • Python Community: For the rich ecosystem.


⭐ If you find this project helpful, please give it a star!
