High-performance asynchronous parallel audio & VAD processing library

Cascade - Production-Ready, High-Performance, Asynchronous VAD Library

Cascade is a production-ready, low-latency audio stream processing library for Voice Activity Detection (VAD). Built on the Silero VAD model, it reduces VAD processing latency while preserving the model's accuracy, using a 1:1:1:1 binding architecture and asyncio-based streaming.

📊 Performance Benchmarks

Based on our latest streaming VAD performance tests with different chunk sizes:

Streaming Performance by Chunk Size

Chunk Size (bytes)	Processing Time (ms)	Throughput (chunks/sec)	Total Test Time (s)	Speech Segments
1024	0.66	92.2	3.15	2
4096	1.66	82.4	0.89	2
8192	2.95	72.7	0.51	2

Key Performance Metrics

Metric	Value	Description
Best Processing Speed	0.66ms/chunk	Optimal performance with 1024-byte chunks
Peak Throughput	92.2 chunks/sec	Maximum processing throughput
Success Rate	100%	Processing success rate across all tests
Accuracy	High	Determined by the underlying Silero VAD model
Architecture	1:1:1:1	Independent model per processor instance

Performance Characteristics

Consistent performance across chunk sizes: Per-chunk processing time stays below 3 ms for chunk sizes from 1024 to 8192 bytes
Real-time capability: Sub-millisecond processing (0.66 ms) with 1024-byte chunks enables real-time applications
Scalability: Linear performance scaling with independent processor instances
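
To put the benchmark table in context, the following sketch converts each chunk size to an audio duration and divides the measured processing time by it, giving a real-time factor. It assumes 16 kHz, 16-bit mono PCM, which the benchmark description does not state, so adjust the two constants if your audio format differs. The function names are illustrative and not part of Cascade.

```python
# Assumptions (not stated in the benchmark description):
BYTES_PER_SAMPLE = 2       # 16-bit PCM
SAMPLE_RATE_HZ = 16_000    # 16 kHz mono

# (chunk size in bytes, measured processing time in ms), copied from the table
MEASUREMENTS = [(1024, 0.66), (4096, 1.66), (8192, 2.95)]


def chunk_duration_ms(chunk_bytes: int) -> float:
    """Audio duration covered by one chunk, in milliseconds."""
    samples = chunk_bytes // BYTES_PER_SAMPLE
    return samples * 1000 / SAMPLE_RATE_HZ


def real_time_factor(chunk_bytes: int, processing_ms: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    return processing_ms / chunk_duration_ms(chunk_bytes)


for chunk_bytes, processing_ms in MEASUREMENTS:
    duration = chunk_duration_ms(chunk_bytes)
    rtf = real_time_factor(chunk_bytes, processing_ms)
    print(f"{chunk_bytes} bytes = {duration:.0f} ms of audio, real-time factor {rtf:.4f}")
```

Under these assumptions, a 1024-byte chunk holds exactly one 512-sample frame.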

✨ Core Features

🚀 High-Performance Engineering

Lock-Free Design: The 1:1:1:1 binding architecture gives each processor instance its own resources, which eliminates lock contention.
Frame-Aligned Buffer: A highly efficient buffer optimized for 512-sample frames.
Asynchronous Streaming: Non-blocking audio stream processing based on asyncio.
Memory Optimization: Zero-copy design, object pooling, and cache alignment.
Concurrency Optimization: Dedicated threads, asynchronous queues, and batch processing.
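
The idea behind a frame-aligned buffer can be illustrated with a minimal sketch: incoming chunks of arbitrary size are accumulated, and only complete 512-sample frames are handed to the model. `FrameBufferSketch` is a hypothetical name, 16-bit samples are an assumption, and this simplified version copies data, so it does not reproduce the zero-copy or pooling optimizations listed above.

```python
class FrameBufferSketch:
    """Accumulates raw PCM bytes and releases fixed-size frames (illustrative only)."""

    def __init__(self, frame_samples: int = 512, bytes_per_sample: int = 2) -> None:
        # 512 samples per frame comes from the README; 2 bytes/sample is an assumption.
        self._frame_bytes = frame_samples * bytes_per_sample
        self._buffer = bytearray()

    def write(self, chunk: bytes) -> None:
        """Append a chunk of any length."""
        self._buffer.extend(chunk)

    def read_frames(self) -> list:
        """Return every complete frame currently available, oldest first."""
        frames = []
        while len(self._buffer) >= self._frame_bytes:
            frames.append(bytes(self._buffer[: self._frame_bytes]))
            del self._buffer[: self._frame_bytes]
        return frames

    @property
    def pending_bytes(self) -> int:
        """Bytes waiting for the rest of their frame to arrive."""
        return len(self._buffer)


buf = FrameBufferSketch()
buf.write(b"\x00" * 1500)          # more than one frame, less than two
print(len(buf.read_frames()), buf.pending_bytes)
```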

🎯 Intelligent Interaction

Real-time Interruption Detection: VAD-based intelligent interruption detection, allowing users to interrupt system responses at any time
State Synchronization Guarantee: Two-way guard mechanism ensures strong consistency between physical and logical layers
Automatic State Management: VAD automatically manages the speech collection state, while external services control the processing state
Anti-false-trigger Design: Minimum interval checking and state mutex locks effectively prevent false triggers
Low-latency Response: Interruption detection latency < 50ms for natural conversation experience
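
The minimum-interval check described above could look roughly like the sketch below. It is not Cascade's implementation: `InterruptionGate` is a hypothetical name, the local `SystemState` enum only mirrors the state names used later in this README (its values are assumptions), and timestamps are passed in explicitly to keep the example deterministic.

```python
from enum import Enum
from typing import Optional


class SystemState(Enum):
    # Local stand-in; the names mirror cascade.SystemState, the values are assumed.
    IDLE = "idle"
    PROCESSING = "processing"
    RESPONDING = "responding"


class InterruptionGate:
    """Accepts an interruption only if the system is busy and enough time has passed."""

    def __init__(self, min_interval_ms: int = 500) -> None:
        self.min_interval_ms = min_interval_ms
        self._last_accepted_ms: Optional[float] = None

    def should_interrupt(self, state: SystemState, now_ms: float) -> bool:
        # Nothing to interrupt while the system is idle.
        if state is SystemState.IDLE:
            return False
        # Reject triggers that arrive too soon after the previous accepted one.
        if (
            self._last_accepted_ms is not None
            and now_ms - self._last_accepted_ms < self.min_interval_ms
        ):
            return False
        self._last_accepted_ms = now_ms
        return True


gate = InterruptionGate(min_interval_ms=500)
print(gate.should_interrupt(SystemState.RESPONDING, now_ms=100))
```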

🔧 Robust Software Engineering

Modular Design: A component architecture with high cohesion and low coupling.
Interface Abstraction: Dependency inversion through interface-based design.
Type System: Data validation and type checking using Pydantic.
Comprehensive Testing: Unit, integration, and performance tests.
Code Standards: Adherence to PEP 8 style guidelines.

🛡️ Production-Ready Reliability

Error Handling: Robust error handling and recovery mechanisms.
Resource Management: Automatic cleanup and graceful shutdown.
Monitoring Metrics: Real-time performance monitoring and statistics.
Scalability: Horizontal scaling by increasing the number of instances.
Stability Assurance: Handles boundary conditions and exceptional cases gracefully.
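
Automatic cleanup is typically exposed through an async context manager, which is how `StreamProcessor` is used in the examples in this README. The sketch below shows the general pattern with a hypothetical stand-in class; it makes no claim about what Cascade releases internally.

```python
import asyncio


class ManagedProcessorSketch:
    """Hypothetical stand-in that records whether setup and cleanup ran."""

    def __init__(self) -> None:
        self.started = False
        self.closed = False

    async def __aenter__(self) -> "ManagedProcessorSketch":
        self.started = True      # e.g. load the model, allocate buffers
        return self

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        self.closed = True       # cleanup runs even when the body raised
        return False             # do not swallow the exception


async def demo() -> tuple:
    processor = ManagedProcessorSketch()
    try:
        async with processor:
            raise RuntimeError("simulated failure while processing")
    except RuntimeError:
        pass
    return processor.started, processor.closed


print(asyncio.run(demo()))
```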

🏗️ Architecture

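Cascade employs a 1:1:1:1 independent architecture: each processor instance owns its own Silero VAD model, VAD iterator, frame-aligned buffer, and state machine, so no state is shared between connections. This keeps processing thread-safe without locks.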

graph TD
    Client --> StreamProcessor
    
    subgraph "1:1:1:1 Independent Architecture"
        StreamProcessor --> |per connection| IndependentProcessor[Independent Processor Instance]
        IndependentProcessor --> |independent loading| VADModel[Silero VAD Model]
        IndependentProcessor --> |independent management| VADIterator[VAD Iterator]
        IndependentProcessor --> |independent buffering| FrameBuffer[Frame-Aligned Buffer]
        IndependentProcessor --> |independent state| StateMachine[State Machine]
    end
    
    subgraph "Asynchronous Processing Flow"
        VADModel --> |asyncio.to_thread| VADInference[VAD Inference]
        VADInference --> StateMachine
        StateMachine --> |None| SingleFrame[Single Frame Output]
        StateMachine --> |start| Collecting[Start Collecting]
        StateMachine --> |end| SpeechSegment[Speech Segment Output]
    end
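
The asynchronous flow in the diagram can be sketched as follows: blocking inference is moved off the event loop with `asyncio.to_thread`, and a small per-instance state machine decides whether a frame is passed through, starts a segment, or ends one. `fake_vad_inference` and `ProcessorSketch` are hypothetical stand-ins, and the event labels are illustrative (the diagram's `None` case is reported as `"frame"` here).

```python
import asyncio
from typing import Optional, Tuple


def fake_vad_inference(frame: bytes) -> bool:
    """Stand-in for blocking model inference: 'speech' if any byte is non-zero."""
    return any(frame)


class ProcessorSketch:
    """One instance per connection; all state below is private to the instance."""

    def __init__(self) -> None:
        self._collecting = False
        self._frames = []

    async def process_frame(self, frame: bytes) -> Tuple[str, Optional[bytes]]:
        # Run the blocking call in a worker thread so the event loop stays responsive.
        is_speech = await asyncio.to_thread(fake_vad_inference, frame)
        if is_speech and not self._collecting:
            self._collecting = True
            self._frames = [frame]
            return "start", None
        if is_speech:
            self._frames.append(frame)
            return "collecting", None
        if self._collecting:
            self._collecting = False
            segment = b"".join(self._frames)
            self._frames = []
            return "end", segment
        return "frame", frame


async def demo() -> list:
    processor = ProcessorSketch()
    frames = [b"\x00" * 4, b"\x01" * 4, b"\x02" * 4, b"\x00" * 4]
    events = []
    for frame in frames:
        event, _payload = await processor.process_frame(frame)
        events.append(event)
    return events


print(asyncio.run(demo()))
```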

🚀 Quick Start

Installation

# Using uv is recommended (--seed installs pip into the new environment)
uv venv -p 3.12 --seed

source .venv/bin/activate

# Install from PyPI (recommended)
pip install cascade-vad

# Or install from source
git clone https://github.com/xucailiang/cascade.git
cd cascade
pip install -e .

Basic Usage

import cascade
import asyncio

async def basic_example():
    """A basic usage example."""
    
    # Method 1: Simple file processing
    async for result in cascade.process_audio_file("audio.wav"):
        if result.result_type == "segment":
            segment = result.segment
            print(f"🎤 Speech Segment: {segment.start_timestamp_ms:.0f}ms - {segment.end_timestamp_ms:.0f}ms")
        else:
            frame = result.frame
            print(f"🔇 Single Frame: {frame.timestamp_ms:.0f}ms")
    
    # Method 2: Stream processing
    # NOTE: `audio_stream` is not defined in this snippet; supply your own audio source.
    async with cascade.StreamProcessor() as processor:
        async for result in processor.process_stream(audio_stream):
            if result.result_type == "segment":
                segment = result.segment
                print(f"🎤 Speech Segment: {segment.start_timestamp_ms:.0f}ms - {segment.end_timestamp_ms:.0f}ms")
            else:
                frame = result.frame
                print(f"🔇 Single Frame: {frame.timestamp_ms:.0f}ms")

asyncio.run(basic_example())
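
The example above leaves `audio_stream` undefined. As a rough sketch, an audio source can be modelled as an async generator that yields fixed-size byte chunks. Whether `process_stream` accepts exactly this kind of object is an assumption, since the README does not document the expected type, and `pcm_chunks` is a hypothetical helper.

```python
import asyncio
from typing import AsyncIterator


async def pcm_chunks(pcm: bytes, chunk_bytes: int = 1024) -> AsyncIterator[bytes]:
    """Yield raw PCM data in chunks of at most `chunk_bytes` bytes."""
    for offset in range(0, len(pcm), chunk_bytes):
        yield pcm[offset : offset + chunk_bytes]
        await asyncio.sleep(0)   # hand control back to the event loop between chunks


async def collect_sizes(pcm: bytes) -> list:
    """Helper for the demo: consume the stream and record each chunk's length."""
    return [len(chunk) async for chunk in pcm_chunks(pcm)]


print(asyncio.run(collect_sizes(b"\x00" * 2500)))
```

In a real application the generator would read from a microphone, a socket, or a file instead of an in-memory buffer.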

Advanced Configuration

import asyncio

import cascade

async def advanced_example():
    """An advanced configuration example."""
    
    # Custom configuration
    config = cascade.Config(
        vad_threshold=0.7,          # Higher detection threshold
        min_silence_duration_ms=100,
        speech_pad_ms=100
    )
    
    # Use the custom config
    async with cascade.StreamProcessor(config) as processor:
        # Process audio stream (`audio_stream` must be supplied by the caller)
        async for result in processor.process_stream(audio_stream):
            # Process results...
            pass
        
        # Get performance statistics
        stats = processor.get_stats()
        print(f"Throughput: {stats.throughput_chunks_per_second:.1f} chunks/sec")

asyncio.run(advanced_example())
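
To build intuition for how `vad_threshold`, `min_silence_duration_ms`, and `speech_pad_ms` interact, the sketch below applies one common interpretation of such parameters to a list of per-frame speech probabilities. It is an illustration only, not the logic used by Cascade or Silero VAD; the function name and the 32 ms frame duration are assumptions.

```python
def segments_from_probs(probs, frame_ms=32, threshold=0.7,
                        min_silence_ms=100, speech_pad_ms=0):
    """Turn per-frame speech probabilities into (start_ms, end_ms) segments."""
    segments = []
    start = None           # start time of the current speech segment
    silence_start = None   # time at which the current run of silence began
    for index, prob in enumerate(probs):
        t = index * frame_ms
        if prob >= threshold:
            if start is None:
                start = t
            silence_start = None            # speech resumed, forget the short pause
        elif start is not None:
            if silence_start is None:
                silence_start = t
            # Close the segment only after enough continuous silence.
            if t + frame_ms - silence_start >= min_silence_ms:
                segments.append((max(0, start - speech_pad_ms),
                                 silence_start + speech_pad_ms))
                start = None
                silence_start = None
    if start is not None:                   # audio ended while still in a segment
        end = silence_start if silence_start is not None else len(probs) * frame_ms
        segments.append((max(0, start - speech_pad_ms), end + speech_pad_ms))
    return segments


probs = [0.1, 0.9, 0.9, 0.2, 0.9, 0.1, 0.1, 0.1, 0.1]
print(segments_from_probs(probs))
```

In this interpretation a higher threshold rejects more borderline frames, while a longer minimum silence keeps short pauses inside a single segment.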

Interruption Detection

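import asyncio

import cascade

# `audio_stream`, `asr_service`, `llm_service`, and `tts_service` are placeholders
# for your own audio source and services; they are not provided by this snippet.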

async def interruption_example():
    """Interruption detection example"""
    
    # Configure interruption detection
    config = cascade.Config(
        vad_threshold=0.5,
        interruption_config=cascade.InterruptionConfig(
            enable_interruption=True,  # Enable interruption detection
            min_interval_ms=500        # Minimum interruption interval 500ms
        )
    )
    
    async with cascade.StreamProcessor(config) as processor:
        async for result in processor.process_stream(audio_stream):
            
            # Detect interruption events
            if result.is_interruption:
                print(f"🛑 Interruption detected! Interrupted state: {result.interruption.system_state.value}")
                # Stop current TTS playback
                await tts_service.stop()
                # Cancel LLM request
                await llm_service.cancel()
            
            # Process speech segments
            elif result.is_speech_segment:
                # ASR recognition
                text = await asr_service.recognize(result.segment.audio_data)
                
                # Set to processing
                processor.set_system_state(cascade.SystemState.PROCESSING)
                
                # LLM generation
                response = await llm_service.generate(text)
                
                # Set to responding
                processor.set_system_state(cascade.SystemState.RESPONDING)
                
                # TTS playback
                await tts_service.play(response)
                
                # Reset to idle after completion
                processor.set_system_state(cascade.SystemState.IDLE)

asyncio.run(interruption_example())
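
How `tts_service.stop()` and `llm_service.cancel()` are implemented is left to the application. One possible approach, sketched here with plain asyncio and no Cascade APIs, is to run the response as a task and cancel it when an interruption arrives. The `respond` coroutine is a hypothetical stand-in for LLM generation and TTS playback.

```python
import asyncio


async def respond(log: list) -> None:
    """Stand-in for a long-running response (LLM generation plus TTS playback)."""
    try:
        await asyncio.sleep(10)
        log.append("finished")
    except asyncio.CancelledError:
        log.append("cancelled")   # release audio devices, close requests, etc.
        raise                     # re-raise so the task is marked as cancelled


async def demo() -> list:
    log = []
    response_task = asyncio.create_task(respond(log))
    await asyncio.sleep(0.01)     # the result loop keeps running in the meantime
    response_task.cancel()        # an interruption was detected
    try:
        await response_task
    except asyncio.CancelledError:
        pass
    return log


print(asyncio.run(demo()))
```

Running the response as a task means the loop that reads VAD results is not blocked while the response is in flight.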

For detailed documentation, see: Interruption Implementation Summary

🧪 Testing

# Run basic integration tests
python tests/test_simple_vad.py -v

# Run simulated audio stream tests
python tests/test_stream_vad.py -v

# Run performance benchmark tests
python tests/benchmark_performance.py

Test Coverage:

✅ Basic API Usage
✅ Stream Processing
✅ File Processing
✅ Real Audio VAD
✅ Automatic Speech Segment Saving
✅ 1:1:1:1 Architecture Validation
✅ Performance Benchmarks
✅ FrameAlignedBuffer Tests

🌐 Web Demo

We provide a complete WebSocket-based web demonstration that showcases Cascade's real-time VAD capabilities with multiple client support.

Web Demo Screenshot

Features

Real-time Audio Processing: Capture audio from browser microphone and process with VAD
Live VAD Visualization: Real-time display of VAD detection results
Speech Segment Management: Display detected speech segments with playback support
Dynamic VAD Configuration: Adjust VAD parameters in real-time
Multi-client Support: Independent Cascade instances for each WebSocket connection

Quick Start

# Start backend server
cd web_demo
python server.py

# Start frontend (in another terminal)
cd web_demo/frontend
pnpm install && pnpm dev

For detailed setup instructions, see Web Demo Documentation.

🔧 Production Deployment

Best Practices

Resource Allocation
- Each instance uses approximately 50MB of memory.
- Recommended: 2-3 instances per CPU core.
- Monitor memory usage to prevent Out-of-Memory (OOM) errors.
Performance Tuning
- Adjust max_instances to match server CPU cores.
- Increase buffer_size_frames for higher throughput.
- Tune vad_threshold to balance accuracy and sensitivity.
Error Handling
- Implement retry mechanisms for transient errors.
- Use health checks to monitor service status.
- Log detailed information for troubleshooting.
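
The resource guidance above (about 50MB per instance, 2-3 instances per CPU core) can be turned into a rough capacity estimate. `plan_max_instances` is a hypothetical helper, and the result is only a starting point for choosing `max_instances`; measure real memory use before relying on it.

```python
def plan_max_instances(cpu_cores: int, memory_budget_mb: int,
                       instances_per_core: int = 2,
                       mb_per_instance: int = 50) -> int:
    """Return the smaller of the CPU-bound and memory-bound instance limits."""
    by_cpu = cpu_cores * instances_per_core
    by_memory = memory_budget_mb // mb_per_instance
    return min(by_cpu, by_memory)


# 8 cores with 1 GiB reserved for VAD: CPU is the limiting factor.
print(plan_max_instances(cpu_cores=8, memory_budget_mb=1024))
# 8 cores with only 512 MB reserved: memory is the limiting factor.
print(plan_max_instances(cpu_cores=8, memory_budget_mb=512))
```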

Monitoring Metrics

# Get performance monitoring metrics
stats = processor.get_stats()

# Key monitoring metrics
print(f"Total Chunks Processed: {stats.total_chunks_processed}")
print(f"Average Processing Time: {stats.average_processing_time_ms:.2f}ms")
print(f"Throughput: {stats.throughput_chunks_per_second:.1f} chunks/sec")
print(f"Speech Segments: {stats.speech_segments}")
print(f"Error Rate: {stats.error_rate:.2%}")
print(f"Memory Usage: {stats.memory_usage_mb:.1f}MB")
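
These metrics can feed a simple health check. The sketch below uses a local dataclass with the same field names as the statistics printed above; the thresholds are arbitrary examples, and `StatsSnapshot` and `health_issues` are hypothetical names rather than Cascade APIs.

```python
from dataclasses import dataclass


@dataclass
class StatsSnapshot:
    # Field names mirror the attributes printed above; the class itself is a stand-in.
    average_processing_time_ms: float
    error_rate: float
    memory_usage_mb: float


def health_issues(stats: StatsSnapshot,
                  max_latency_ms: float = 5.0,
                  max_error_rate: float = 0.01,
                  max_memory_mb: float = 100.0) -> list:
    """Return the names of all metrics that exceed their (example) thresholds."""
    issues = []
    if stats.average_processing_time_ms > max_latency_ms:
        issues.append("latency")
    if stats.error_rate > max_error_rate:
        issues.append("error_rate")
    if stats.memory_usage_mb > max_memory_mb:
        issues.append("memory")
    return issues


print(health_issues(StatsSnapshot(0.66, 0.0, 50.0)))
print(health_issues(StatsSnapshot(7.0, 0.02, 50.0)))
```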

🔧 Requirements

Core Dependencies

Python: 3.12 (recommended)
pydantic: 2.4.0+ (Data validation)
numpy: 1.24.0+ (Numerical computation)
scipy: 1.11.0+ (Signal processing)
silero-vad: 5.1.2+ (VAD model)
onnxruntime: 1.22.1+ (ONNX inference)
torchaudio: 2.7.1+ (Audio processing)

Development Dependencies

pytest: Testing framework
black: Code formatter
ruff: Linter
mypy: Type checker
pre-commit: Git hooks

🤝 Contribution Guide

We welcome community contributions! Please follow these steps:

Fork the project and create a feature branch.
Install development dependencies: pip install -e ".[dev]"
Run tests: pytest
Lint your code: ruff check . && black --check .
Type check: mypy cascade
Submit a Pull Request with a clear description of your changes.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Silero Team: For their excellent VAD model.
PyTorch Team: For the deep learning framework.
Pydantic Team: For the type validation system.
Python Community: For the rich ecosystem.

📞 Contact

Author: Xucailiang
Email: xucailiang.ai@gmail.com
Project Homepage: https://github.com/xucailiang/cascade
Issue Tracker: https://github.com/xucailiang/cascade/issues
Documentation: https://cascade-vad.readthedocs.io/

⭐ If you find this project helpful, please give it a star!

Release history

2.0.0 (this version): Dec 22, 2025
1.0.0: Oct 22, 2025
0.2.0: Aug 31, 2025
0.1.1: Aug 27, 2025

Download files

Source Distribution
cascade_vad-2.0.0.tar.gz (41.9 kB), uploaded Dec 22, 2025

Built Distribution
cascade_vad-2.0.0-py3-none-any.whl (27.1 kB), uploaded Dec 22, 2025, Python 3

File details

Details for the file cascade_vad-2.0.0.tar.gz.

File metadata

Download URL: cascade_vad-2.0.0.tar.gz
Upload date: Dec 22, 2025
Size: 41.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for cascade_vad-2.0.0.tar.gz
Algorithm	Hash digest
SHA256	`f40cd1f767ba6c196d9e20a240ea7f4c82211cae3b5069cab2c984f1b08c45cb`
MD5	`3afef9190127a96227e9befbfa85b740`
BLAKE2b-256	`5495ab08d29d5771640cc9ffd7579cd504f7aabcc04049fd2c28f70f53c4074e`

File details

Details for the file cascade_vad-2.0.0-py3-none-any.whl.

File metadata

Download URL: cascade_vad-2.0.0-py3-none-any.whl
Upload date: Dec 22, 2025
Size: 27.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for cascade_vad-2.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0c9be60d9a786b00b4dea71f712c2f579144ee7d1baa7be139d22575c6abf245`
MD5	`9458e831d88fa6e7856833a5c376e553`
BLAKE2b-256	`090d4706c6957b5617521fac21e73b2c36f268e351424b6254785836e1b81943`
