High-Performance Asynchronous Parallel Audio & VAD Processing Library
Cascade - Production-Ready, High-Performance, Asynchronous VAD Library
Cascade is a production-ready, high-performance, low-latency audio stream processing library designed for Voice Activity Detection (VAD). Built on the Silero VAD model, Cascade reduces VAD processing latency while maintaining high accuracy through its 1:1:1:1 binding architecture and asynchronous streaming.
📊 Performance Benchmarks
Based on our latest streaming VAD performance tests with different chunk sizes:
Streaming Performance by Chunk Size
| Chunk Size (bytes) | Processing Time (ms) | Throughput (chunks/sec) | Total Test Time (s) | Speech Segments |
|---|---|---|---|---|
| 1024 | 0.66 | 92.2 | 3.15 | 2 |
| 4096 | 1.66 | 82.4 | 0.89 | 2 |
| 8192 | 2.95 | 72.7 | 0.51 | 2 |
Key Performance Metrics
| Metric | Value | Description |
|---|---|---|
| Best Processing Speed | 0.66ms/chunk | Optimal performance with 1024-byte chunks |
| Peak Throughput | 92.2 chunks/sec | Maximum processing throughput |
| Success Rate | 100% | Processing success rate across all tests |
| Accuracy | High | Determined by the underlying Silero VAD model |
| Architecture | 1:1:1:1 | Independent model per processor instance |
Performance Characteristics
- Excellent performance across chunk sizes: High throughput and low latency with various chunk sizes
- Real-time capability: Sub-millisecond processing for 1024-byte chunks, and under 3 ms for 8192-byte chunks, leaves ample headroom for real-time applications
- Scalability: Linear performance scaling with independent processor instances
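To put the processing times in context, they can be compared with the amount of audio each chunk represents. The sketch below assumes 16 kHz, 16-bit mono PCM; this is an assumption, since the audio format used in the benchmark is not stated here. Under that assumption a 1024-byte chunk holds 512 samples, or 32 ms of audio. The helper names are illustrative.

```python
SAMPLE_RATE_HZ = 16_000   # assumed sample rate
BYTES_PER_SAMPLE = 2      # assumed 16-bit mono PCM


def chunk_duration_ms(chunk_bytes: int) -> float:
    """Audio duration represented by one chunk, in milliseconds."""
    samples = chunk_bytes / BYTES_PER_SAMPLE
    return samples / SAMPLE_RATE_HZ * 1000.0


def real_time_factor(chunk_bytes: int, processing_ms: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    return processing_ms / chunk_duration_ms(chunk_bytes)


# Processing times taken from the "Streaming Performance by Chunk Size" table.
benchmarks = {1024: 0.66, 4096: 1.66, 8192: 2.95}

for size, ms in benchmarks.items():
    print(
        f"{size:>5} bytes = {chunk_duration_ms(size):6.1f} ms of audio, "
        f"real-time factor = {real_time_factor(size, ms):.4f}"
    )
```

If the assumption holds, every chunk size is processed in a small fraction of its playback time, and larger chunks amortize the per-call overhead slightly better.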
✨ Core Features
🚀 High-Performance Engineering
- Lock-Free Design: The 1:1:1:1 architecture gives each processor instance its own model, iterator, buffer, and state machine, which avoids lock contention.
- Frame-Aligned Buffer: A highly efficient buffer optimized for 512-sample frames.
- Asynchronous Streaming: Non-blocking audio stream processing based on asyncio.
- Memory Optimization: Zero-copy design, object pooling, and cache alignment.
- Concurrency Optimization: Dedicated threads, asynchronous queues, and batch processing.
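The idea behind a frame-aligned buffer can be sketched as follows. This is a minimal illustration, not Cascade's own implementation: the class name `FrameBuffer` and its methods are hypothetical, 16-bit samples are assumed, and the sketch copies bytes for simplicity rather than using a zero-copy design.

```python
class FrameBuffer:
    """Accumulates arbitrarily sized PCM chunks and emits fixed-size frames."""

    def __init__(self, frame_samples: int = 512, bytes_per_sample: int = 2) -> None:
        # 512 samples of 16-bit audio = 1024 bytes per frame (assumed format).
        self.frame_bytes = frame_samples * bytes_per_sample
        self._buf = bytearray()

    def write(self, chunk: bytes) -> None:
        """Append incoming audio of any length."""
        self._buf.extend(chunk)

    def frames(self):
        """Yield every complete frame; leftover bytes stay buffered."""
        while len(self._buf) >= self.frame_bytes:
            frame = bytes(self._buf[: self.frame_bytes])
            del self._buf[: self.frame_bytes]
            yield frame

    @property
    def pending_bytes(self) -> int:
        """Bytes waiting for the next frame to fill up."""
        return len(self._buf)


buf = FrameBuffer()
buf.write(b"\x00" * 1500)            # more than one frame, less than two
first = list(buf.frames())           # one complete frame is emitted
buf.write(b"\x00" * 600)             # completes the second frame
second = list(buf.frames())
print(len(first), len(second), buf.pending_bytes)
```

Aligning on the frame size means the model always receives inputs of the length it expects, regardless of how the network or microphone happens to split the stream.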
🎯 Intelligent Interaction
- Real-time Interruption Detection: VAD-based intelligent interruption detection, allowing users to interrupt system responses at any time
- State Synchronization Guarantee: Two-way guard mechanism ensures strong consistency between physical and logical layers
- Automatic State Management: VAD automatically manages speech collection state, external services control processing state
- Anti-false-trigger Design: Minimum interval checking and state mutex locks effectively prevent false triggers
- Low-latency Response: Interruption detection latency < 50ms for natural conversation experience
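Minimum interval checking can be illustrated with a small gate that rejects interruption events arriving too soon after the previous accepted one. This is a sketch under assumptions: `InterruptionGate` is a hypothetical name, and Cascade's actual anti-false-trigger logic may differ.

```python
from typing import Optional


class InterruptionGate:
    """Accepts an interruption only if enough time has passed since the last one."""

    def __init__(self, min_interval_ms: float = 500.0) -> None:
        self.min_interval_ms = min_interval_ms
        self._last_accepted_ms: Optional[float] = None

    def accept(self, now_ms: float) -> bool:
        """Return True and record the time if the event is far enough from the last one."""
        if (
            self._last_accepted_ms is not None
            and now_ms - self._last_accepted_ms < self.min_interval_ms
        ):
            return False  # too close to the previous interruption: treat as a false trigger
        self._last_accepted_ms = now_ms
        return True


gate = InterruptionGate(min_interval_ms=500)
decisions = [gate.accept(t) for t in (0, 300, 500, 900, 1000)]
print(decisions)
```

Timestamps are passed in explicitly so the gate stays deterministic and easy to test; in a live system they would come from the audio timeline or a monotonic clock.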
🔧 Robust Software Engineering
- Modular Design: A component architecture with high cohesion and low coupling.
- Interface Abstraction: Dependency inversion through interface-based design.
- Type System: Data validation and type checking using Pydantic.
- Comprehensive Testing: Unit, integration, and performance tests.
- Code Standards: Adherence to PEP 8 style guidelines.
🛡️ Production-Ready Reliability
- Error Handling: Robust error handling and recovery mechanisms.
- Resource Management: Automatic cleanup and graceful shutdown.
- Monitoring Metrics: Real-time performance monitoring and statistics.
- Scalability: Horizontal scaling by increasing the number of instances.
- Stability Assurance: Handles boundary conditions and exceptional cases gracefully.
🏗️ Architecture
Cascade employs a 1:1:1:1 independent architecture to ensure optimal performance and thread safety.
graph TD
Client --> StreamProcessor
subgraph "1:1:1:1 Independent Architecture"
StreamProcessor --> |per connection| IndependentProcessor[Independent Processor Instance]
IndependentProcessor --> |independent loading| VADModel[Silero VAD Model]
IndependentProcessor --> |independent management| VADIterator[VAD Iterator]
IndependentProcessor --> |independent buffering| FrameBuffer[Frame-Aligned Buffer]
IndependentProcessor --> |independent state| StateMachine[State Machine]
end
subgraph "Asynchronous Processing Flow"
VADModel --> |asyncio.to_thread| VADInference[VAD Inference]
VADInference --> StateMachine
StateMachine --> |None| SingleFrame[Single Frame Output]
StateMachine --> |start| Collecting[Start Collecting]
StateMachine --> |end| SpeechSegment[Speech Segment Output]
end
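The asynchronous flow in the diagram can be sketched in a few lines. This is an illustration only: `fake_vad` is a hypothetical stand-in for blocking model inference, `SegmentCollector` is a hypothetical state machine, and the choice to buffer frames while collecting is an assumption rather than Cascade's documented behavior.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SegmentCollector:
    """Turns per-frame VAD events (None, "start", "end") into frame or segment outputs."""

    collecting: bool = False
    _frames: List[bytes] = field(default_factory=list)

    def feed(self, frame: bytes, event: Optional[str]) -> Optional[Tuple[str, bytes]]:
        if event == "start":
            self.collecting = True
            self._frames = [frame]
            return None
        if event == "end" and self.collecting:
            self._frames.append(frame)
            segment = b"".join(self._frames)
            self.collecting = False
            self._frames = []
            return ("segment", segment)
        if self.collecting:
            self._frames.append(frame)   # inside speech: keep buffering
            return None
        return ("frame", frame)          # outside speech: pass the single frame through


def fake_vad(frame: bytes) -> Optional[str]:
    """Hypothetical blocking inference: b"S" marks speech start, b"E" marks speech end."""
    return {b"S": "start", b"E": "end"}.get(frame)


async def run(frames: List[bytes]) -> List[Tuple[str, bytes]]:
    collector = SegmentCollector()       # one collector per connection, never shared
    outputs = []
    for frame in frames:
        # Run the blocking call in a worker thread so the event loop stays responsive.
        event = await asyncio.to_thread(fake_vad, frame)
        result = collector.feed(frame, event)
        if result is not None:
            outputs.append(result)
    return outputs


outputs = asyncio.run(run([b"a", b"S", b"b", b"E", b"c"]))
print(outputs)
```

Because each connection owns its collector and model, no state is shared between connections and no locks are needed around the state machine.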
🚀 Quick Start
Installation
pip install cascade-vad
OR
# Using uv is recommended (--seed installs pip into the new environment)
uv venv -p 3.12 --seed
source .venv/bin/activate
# Install from PyPI (recommended)
pip install cascade-vad
# Or install from source
git clone https://github.com/xucailiang/cascade.git
cd cascade
pip install -e .
Basic Usage
import asyncio
import cascade
async def basic_example():
    """A basic usage example."""
    # Method 1: Simple file processing
    async for result in cascade.process_audio_file("audio.wav"):
        if result.result_type == "segment":
            segment = result.segment
            print(f"🎤 Speech Segment: {segment.start_timestamp_ms:.0f}ms - {segment.end_timestamp_ms:.0f}ms")
        else:
            frame = result.frame
            print(f"🔇 Single Frame: {frame.timestamp_ms:.0f}ms")
    # Method 2: Stream processing
    # Note: audio_stream is not defined in this snippet; supply your own audio source.
    async with cascade.StreamProcessor() as processor:
        async for result in processor.process_stream(audio_stream):
            if result.result_type == "segment":
                segment = result.segment
                print(f"🎤 Speech Segment: {segment.start_timestamp_ms:.0f}ms - {segment.end_timestamp_ms:.0f}ms")
            else:
                frame = result.frame
                print(f"🔇 Single Frame: {frame.timestamp_ms:.0f}ms")
asyncio.run(basic_example())
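The stream examples in this README use an `audio_stream` that is never defined. One way to build such a source for local experiments is an async generator that reads a WAV file in fixed-size chunks with the standard-library `wave` module. This sketch assumes that `process_stream` accepts an async iterator of raw PCM bytes, which is not confirmed here; `wav_chunks` is a hypothetical helper name.

```python
import asyncio
import wave
from typing import AsyncIterator


async def wav_chunks(path: str, chunk_bytes: int = 1024) -> AsyncIterator[bytes]:
    """Yield raw PCM bytes from a WAV file in chunks of roughly chunk_bytes."""
    with wave.open(path, "rb") as wav:
        # One WAV "frame" is one sample for every channel.
        frame_size = wav.getsampwidth() * wav.getnchannels()
        frames_per_chunk = max(1, chunk_bytes // frame_size)
        while True:
            data = wav.readframes(frames_per_chunk)
            if not data:
                break
            yield data
            await asyncio.sleep(0)  # hand control back to the event loop between chunks
```

Under the stated assumption, the generator would be passed in place of `audio_stream`, for example `processor.process_stream(wav_chunks("audio.wav"))`.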
Advanced Configuration
import asyncio
import cascade
async def advanced_example():
    """An advanced configuration example."""
    # Custom configuration
    config = cascade.Config(
        vad_threshold=0.7,  # Higher detection threshold
        min_silence_duration_ms=100,
        speech_pad_ms=100
    )
    # Use the custom config
    # Note: audio_stream is not defined in this snippet; supply your own audio source.
    async with cascade.StreamProcessor(config) as processor:
        # Process audio stream
        async for result in processor.process_stream(audio_stream):
            # Process results...
            pass
        # Get performance statistics
        stats = processor.get_stats()
        print(f"Throughput: {stats.throughput_chunks_per_second:.1f} chunks/sec")
asyncio.run(advanced_example())
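To build intuition for how a detection threshold and a minimum silence duration interact, the sketch below segments a list of per-frame speech probabilities. It is a deliberately simplified model, not the logic used by Silero VAD or Cascade, which may also apply hysteresis and padding. The function name and the 32 ms frame length (512 samples at an assumed 16 kHz) are illustrative.

```python
from typing import List, Tuple


def segment_frames(
    probs: List[float],
    threshold: float = 0.5,
    min_silence_ms: int = 100,
    frame_ms: int = 32,
) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices of speech segments, end exclusive."""
    segments = []
    start = None          # index of the first frame of the current segment
    last_speech = -1      # index of the most recent frame at or above the threshold
    silent_ms = 0         # silence accumulated since last_speech
    for i, p in enumerate(probs):
        if p >= threshold:
            if start is None:
                start = i
            last_speech = i
            silent_ms = 0
        elif start is not None:
            silent_ms += frame_ms
            if silent_ms >= min_silence_ms:
                # Silence lasted long enough: close the segment at the last speech frame.
                segments.append((start, last_speech + 1))
                start = None
                silent_ms = 0
    if start is not None:
        segments.append((start, last_speech + 1))
    return segments


probs = [0.1, 0.9, 0.8, 0.2, 0.9, 0.1, 0.1, 0.1, 0.1, 0.9]
print(segment_frames(probs, threshold=0.7, min_silence_ms=100))
print(segment_frames(probs, threshold=0.7, min_silence_ms=30))
```

In this model a longer minimum silence bridges short dips in probability and produces fewer, longer segments, while a shorter value splits speech at every brief pause.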
Interruption Detection
import asyncio
import cascade
# Note: audio_stream, asr_service, llm_service, and tts_service are not defined in
# this snippet; they stand for your own audio source and external services.
async def interruption_example():
    """Interruption detection example"""
    # Configure interruption detection
    config = cascade.Config(
        vad_threshold=0.5,
        interruption_config=cascade.InterruptionConfig(
            enable_interruption=True,  # Enable interruption detection
            min_interval_ms=500  # Minimum interruption interval 500ms
        )
    )
    async with cascade.StreamProcessor(config) as processor:
        async for result in processor.process_stream(audio_stream):
            # Detect interruption events
            if result.is_interruption:
                print(f"🛑 Interruption detected! Interrupted state: {result.interruption.system_state.value}")
                # Stop current TTS playback
                await tts_service.stop()
                # Cancel LLM request
                await llm_service.cancel()
            # Process speech segments
            elif result.is_speech_segment:
                # ASR recognition
                text = await asr_service.recognize(result.segment.audio_data)
                # Set to processing
                processor.set_system_state(cascade.SystemState.PROCESSING)
                # LLM generation
                response = await llm_service.generate(text)
                # Set to responding
                processor.set_system_state(cascade.SystemState.RESPONDING)
                # TTS playback
                await tts_service.play(response)
                # Reset to idle after completion
                processor.set_system_state(cascade.SystemState.IDLE)
asyncio.run(interruption_example())
For detailed documentation, see: Interruption Implementation Summary
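For an interruption to take effect while a response is still being generated or played, the response typically has to run as a background task that the stream loop can cancel. The sketch below shows that pattern with plain asyncio. `ResponseRunner` is a hypothetical helper and is not part of Cascade; the `asyncio.sleep` call stands in for LLM generation and TTS playback.

```python
import asyncio
from typing import Coroutine, Optional


class ResponseRunner:
    """Runs one response at a time as a background task (hypothetical helper)."""

    def __init__(self) -> None:
        self._task: Optional[asyncio.Task] = None

    async def start(self, coro: Coroutine) -> None:
        await self.cancel()                      # never run two responses at once
        self._task = asyncio.create_task(coro)

    async def cancel(self) -> bool:
        """Cancel the running response, if any. Returns True if one was cancelled."""
        task, self._task = self._task, None
        if task is None or task.done():
            return False
        task.cancel()
        try:
            await task                           # wait until cancellation has finished
        except asyncio.CancelledError:
            pass
        return True


async def demo() -> list:
    log = []

    async def respond() -> None:
        try:
            await asyncio.sleep(10)              # stands in for LLM generation + TTS playback
            log.append("finished")
        except asyncio.CancelledError:
            log.append("cancelled")              # stop playback or abort the request here
            raise

    runner = ResponseRunner()
    await runner.start(respond())
    await asyncio.sleep(0.01)                    # the VAD loop keeps consuming audio meanwhile
    log.append(await runner.cancel())            # an interruption result arrives
    return log


result = asyncio.run(demo())
print(result)
```

With this structure the loop that consumes VAD results is never blocked by a long response, so an interruption can be acted on as soon as it is reported.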
🧪 Testing
# Run basic integration tests
python tests/test_simple_vad.py -v
# Run simulated audio stream tests
python tests/test_stream_vad.py -v
# Run performance benchmark tests
python tests/benchmark_performance.py
Test Coverage:
- ✅ Basic API Usage
- ✅ Stream Processing
- ✅ File Processing
- ✅ Real Audio VAD
- ✅ Automatic Speech Segment Saving
- ✅ 1:1:1:1 Architecture Validation
- ✅ Performance Benchmarks
- ✅ FrameAlignedBuffer Tests
🌐 Web Demo
We provide a complete WebSocket-based web demonstration that showcases Cascade's real-time VAD capabilities with multiple client support.
Features
- Real-time Audio Processing: Capture audio from browser microphone and process with VAD
- Live VAD Visualization: Real-time display of VAD detection results
- Speech Segment Management: Display detected speech segments with playback support
- Dynamic VAD Configuration: Adjust VAD parameters in real-time
- Multi-client Support: Independent Cascade instances for each WebSocket connection
Quick Start
# Start backend server
cd web_demo
python server.py
# Start frontend (in another terminal)
cd web_demo/frontend
pnpm install && pnpm dev
For detailed setup instructions, see Web Demo Documentation.
🔧 Production Deployment
Best Practices
- Resource Allocation
  - Each instance uses approximately 50MB of memory.
  - Recommended: 2-3 instances per CPU core.
  - Monitor memory usage to prevent Out-of-Memory (OOM) errors.
- Performance Tuning
  - Adjust max_instances to match server CPU cores.
  - Increase buffer_size_frames for higher throughput.
  - Tune vad_threshold to balance accuracy and sensitivity.
- Error Handling
  - Implement retry mechanisms for transient errors.
  - Use health checks to monitor service status.
  - Log detailed information for troubleshooting.
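The resource guidance above can be turned into a quick capacity estimate. The sketch below takes the figures quoted in this section (about 50MB per instance, 2-3 instances per CPU core) as inputs; the function name and the idea of capping by a memory budget are illustrative assumptions, not a Cascade API.

```python
def plan_instances(
    cpu_cores: int,
    memory_budget_mb: float,
    per_instance_mb: float = 50.0,
    instances_per_core: int = 2,
) -> int:
    """Estimate how many processor instances a host can run.

    The result is the smaller of the CPU-based and memory-based limits.
    """
    by_cpu = cpu_cores * instances_per_core
    by_memory = int(memory_budget_mb // per_instance_mb)
    return max(0, min(by_cpu, by_memory))


# 4 cores with 1 GB reserved for Cascade: CPU is the limiting factor.
print(plan_instances(cpu_cores=4, memory_budget_mb=1024))
# 8 cores with only 500 MB reserved: memory is the limiting factor.
print(plan_instances(cpu_cores=8, memory_budget_mb=500))
```

The estimate is only a starting point; measured memory usage under real traffic should drive the final value.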
Monitoring Metrics
# Get performance monitoring metrics
stats = processor.get_stats()
# Key monitoring metrics
print(f"Total Chunks Processed: {stats.total_chunks_processed}")
print(f"Average Processing Time: {stats.average_processing_time_ms:.2f}ms")
print(f"Throughput: {stats.throughput_chunks_per_second:.1f} chunks/sec")
print(f"Speech Segments: {stats.speech_segments}")
print(f"Error Rate: {stats.error_rate:.2%}")
print(f"Memory Usage: {stats.memory_usage_mb:.1f}MB")
🔧 Requirements
Core Dependencies
- Python: 3.12 (recommended)
- pydantic: 2.4.0+ (Data validation)
- numpy: 1.24.0+ (Numerical computation)
- scipy: 1.11.0+ (Signal processing)
- silero-vad: 5.1.2+ (VAD model)
- onnxruntime: 1.22.1+ (ONNX inference)
- torchaudio: 2.7.1+ (Audio processing)
Development Dependencies
- pytest: Testing framework
- black: Code formatter
- ruff: Linter
- mypy: Type checker
- pre-commit: Git hooks
🤝 Contribution Guide
We welcome community contributions! Please follow these steps:
- Fork the project and create a feature branch.
- Install development dependencies: pip install -e .[dev]
- Run tests: pytest
- Lint your code: ruff check . && black --check .
- Type check: mypy cascade
- Submit a Pull Request with a clear description of your changes.
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- Silero Team: For their excellent VAD model.
- PyTorch Team: For the deep learning framework.
- Pydantic Team: For the type validation system.
- Python Community: For the rich ecosystem.
📞 Contact
- Author: Xucailiang
- Email: xucailiang.ai@gmail.com
- Project Homepage: https://github.com/xucailiang/cascade
- Issue Tracker: https://github.com/xucailiang/cascade/issues
- Documentation: https://cascade-vad.readthedocs.io/
⭐ If you find this project helpful, please give it a star!