A pluggable, async-first Python framework for real-time audio-to-audio conversational AI
Audio Engine
A pluggable audio-to-audio conversational engine with real-time streaming support.
Features
- Pluggable Architecture: Swap ASR, LLM, and TTS providers easily
- Real-time Streaming: WebSocket server for low-latency conversations
- GeneFace++ Integration: Optional face animation from audio
- Simple API: Get started with just a few lines of code
Installation
pip install atom-audio-engine
For development with all optional dependencies:
pip install "atom-audio-engine[all,dev]"
Quick Start
Basic Usage
from audio_engine import Pipeline
from audio_engine.asr import CartesiaASR
from audio_engine.llm import GroqLLM
from audio_engine.tts import CartesiaTTS

# Create pipeline with your providers
pipeline = Pipeline(
    asr=CartesiaASR(api_key="your-cartesia-key"),
    llm=GroqLLM(api_key="your-groq-key", model="mixtral-8x7b-32768"),
    tts=CartesiaTTS(api_key="your-cartesia-key", voice_id="your-voice-id"),
    system_prompt="You are a helpful assistant.",
)

# Run the following inside an async function. input_audio_bytes,
# audio_stream, and play_audio are placeholders you supply.
async with pipeline:
    # Simple: process complete audio
    response_audio = await pipeline.process(input_audio_bytes)

    # Streaming: lower latency
    async for chunk in pipeline.stream(audio_stream):
        play_audio(chunk)
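The streaming call above needs an audio stream to consume. The sketch below shows one way to build such a stream from a buffer of raw PCM, assuming that `pipeline.stream()` accepts an async iterator of byte chunks and that the input is 16-bit, 16 kHz mono (the format listed under WebSocket Protocol). The names `chunk_audio` and `collect` are hypothetical helpers, not part of the package.

```python
import asyncio
from typing import AsyncIterator, List

SAMPLE_RATE = 16_000   # Hz; assumed input format
BYTES_PER_SAMPLE = 2   # 16-bit mono PCM


async def chunk_audio(pcm: bytes, chunk_ms: int = 100) -> AsyncIterator[bytes]:
    """Yield fixed-duration slices of a PCM buffer as an async stream."""
    chunk_size = SAMPLE_RATE * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(pcm), chunk_size):
        yield pcm[start:start + chunk_size]
        # Yield control so other tasks can run between chunks.
        await asyncio.sleep(0)


async def collect(stream: AsyncIterator[bytes]) -> List[bytes]:
    """Drain an async stream into a list (useful for checking the chunker)."""
    return [chunk async for chunk in stream]


one_second = bytes(SAMPLE_RATE * BYTES_PER_SAMPLE)  # one second of silence
chunks = asyncio.run(collect(chunk_audio(one_second)))
```

If the assumption about the stream type holds, `chunk_audio(...)` could be passed where `audio_stream` appears above. Smaller `chunk_ms` values trade more messages for lower latency.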
WebSocket Server
from audio_engine import Pipeline
from audio_engine.streaming import WebSocketServer
pipeline = Pipeline(asr=..., llm=..., tts=...)
server = WebSocketServer(pipeline, host="0.0.0.0", port=8765)
await server.start()
With GeneFace++ Face Animation
from audio_engine.integrations.geneface import GeneFacePipelineWrapper, GeneFaceConfig

wrapped = GeneFacePipelineWrapper(
    pipeline=pipeline,
    geneface_config=GeneFaceConfig(
        geneface_path="/path/to/ai-geneface-realtime"
    )
)
audio, video_path = await wrapped.process_with_video(input_audio)
Architecture
User Audio → ASR → LLM → TTS → Response Audio
                                     ↓
                           GeneFace++ (optional)
                                     ↓
                            Animated Face Video
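The ASR → LLM → TTS chain in the diagram can be illustrated with stand-in providers. This is a minimal sketch of the data flow only, not the package's `Pipeline` implementation: `EchoASR`, `UpperLLM`, `BytesTTS`, and `run_turn` are hypothetical names, and the real providers call external services.

```python
import asyncio


class EchoASR:
    """Stand-in ASR: pretends the audio bytes are UTF-8 text."""

    async def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class UpperLLM:
    """Stand-in LLM: replies with the prompt in upper case."""

    async def generate(self, prompt: str) -> str:
        return prompt.upper()


class BytesTTS:
    """Stand-in TTS: encodes the reply text back into bytes."""

    async def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


async def run_turn(asr, llm, tts, audio: bytes) -> bytes:
    """One conversational turn: audio in, audio out."""
    transcript = await asr.transcribe(audio)   # User Audio -> text
    reply = await llm.generate(transcript)     # text -> response text
    return await tts.synthesize(reply)         # response text -> audio


result = asyncio.run(run_turn(EchoASR(), UpperLLM(), BytesTTS(), b"hello"))
```

Because each stage only depends on the method it calls on the previous one, any of the three objects can be swapped without touching the others, which is the idea behind the pluggable design.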
Directory Structure
audio_engine/
├── core/ # Pipeline and configuration
├── asr/ # Speech-to-Text providers
├── llm/ # LLM providers
├── tts/ # Text-to-Speech providers
├── streaming/ # WebSocket server
├── integrations/ # GeneFace++ integration
├── utils/ # Audio utilities
└── examples/ # Example scripts
Implementing a Provider
Custom ASR
from audio_engine.asr.base import BaseASR

class MyASR(BaseASR):
    @property
    def name(self) -> str:
        return "my-asr"

    async def transcribe(self, audio: bytes, sample_rate: int = 16000) -> str:
        # Your implementation
        pass

    async def transcribe_stream(self, audio_stream):
        # Your streaming implementation
        pass
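A `transcribe` implementation receives raw bytes, while many speech models expect floating-point samples. The helper below is a sketch of that conversion, assuming the bytes are 16-bit little-endian mono PCM; `pcm16_to_floats` is a hypothetical name and not part of the package.

```python
import array
import sys
from typing import List


def pcm16_to_floats(audio: bytes) -> List[float]:
    """Convert 16-bit little-endian PCM bytes to floats in [-1.0, 1.0)."""
    samples = array.array("h")       # signed 16-bit integers
    samples.frombytes(audio)         # raises ValueError on an odd byte count
    if sys.byteorder == "big":
        samples.byteswap()           # input is assumed little-endian
    return [s / 32768.0 for s in samples]


# Three samples: 0, the largest positive value, the most negative value.
floats = pcm16_to_floats(b"\x00\x00\xff\x7f\x00\x80")
```

Inside a custom `transcribe`, the resulting list could be handed to whatever model or client you are wrapping.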
Custom LLM
from audio_engine.llm.base import BaseLLM

class MyLLM(BaseLLM):
    @property
    def name(self) -> str:
        return "my-llm"

    async def generate(self, prompt: str, context=None) -> str:
        # Your implementation
        pass

    async def generate_stream(self, prompt: str, context=None):
        # Your streaming implementation
        pass
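A streaming LLM typically emits small text fragments. One common way to feed such a stream to a TTS stage is to buffer fragments until a sentence boundary. The sketch below shows that buffering under the assumption that `generate_stream` is an async generator of strings; whether the package's pipeline does this internally is not stated here, and `fake_tokens` and `sentences` are hypothetical names.

```python
import asyncio
from typing import AsyncIterator, List


async def fake_tokens() -> AsyncIterator[str]:
    """Stand-in for an LLM token stream."""
    for token in ["Hello", " there", ".", " How", " are", " you", "?", " Bye"]:
        yield token


async def sentences(tokens: AsyncIterator[str]) -> AsyncIterator[str]:
    """Group streamed tokens into sentences ending in '.', '!' or '?'."""
    buffer = ""
    async for token in tokens:
        buffer += token
        if buffer.rstrip().endswith((".", "!", "?")):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():
        # Flush trailing text that has no closing punctuation.
        yield buffer.strip()


async def collect() -> List[str]:
    return [s async for s in sentences(fake_tokens())]


result = asyncio.run(collect())
```

Emitting whole sentences lets speech synthesis begin before the full response has been generated.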
Custom TTS
from audio_engine.tts.base import BaseTTS

class MyTTS(BaseTTS):
    @property
    def name(self) -> str:
        return "my-tts"

    async def synthesize(self, text: str) -> bytes:
        # Your implementation
        pass

    async def synthesize_stream(self, text: str):
        # Your streaming implementation
        pass
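To show how `synthesize` and `synthesize_stream` relate, here is a toy provider that emits a short tone per character instead of speech. It is a sketch only: it does not subclass `BaseTTS` so that it runs without the package installed, the 16-bit 16 kHz mono output format is an assumption, and `ToneTTS` and `tone` are hypothetical names.

```python
import asyncio
import math
import struct
from typing import AsyncIterator, List

SAMPLE_RATE = 16_000                   # Hz; assumed output format
SAMPLES_PER_CHAR = SAMPLE_RATE // 20   # 50 ms of tone per character


def tone(n_samples: int, freq_hz: float = 440.0) -> bytes:
    """Return n_samples of a sine tone as 16-bit little-endian PCM."""
    samples = (
        int(0.3 * 32767 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
        for i in range(n_samples)
    )
    return struct.pack(f"<{n_samples}h", *samples)


class ToneTTS:
    @property
    def name(self) -> str:
        return "tone-tts"

    async def synthesize(self, text: str) -> bytes:
        # Whole utterance in one buffer.
        return tone(SAMPLES_PER_CHAR * len(text))

    async def synthesize_stream(self, text: str) -> AsyncIterator[bytes]:
        # One chunk per word, so playback can start before the end.
        for word in text.split():
            yield await self.synthesize(word)


async def demo() -> List[bytes]:
    return [chunk async for chunk in ToneTTS().synthesize_stream("hi there")]


stream_chunks = asyncio.run(demo())
full_audio = asyncio.run(ToneTTS().synthesize("hi"))
```

A real provider would replace `tone` with a call to its synthesis backend while keeping the same two method shapes.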
WebSocket Protocol
Client → Server
- Binary: Raw audio chunks (PCM 16-bit, 16kHz mono)
- JSON: {"type": "end_of_speech"} or {"type": "reset"}
Server → Client
- Binary: Response audio chunks
- JSON Events:
  - {"type": "connected", "client_id": "..."}
  - {"type": "transcript", "text": "..."}
  - {"type": "response_text", "text": "..."}
  - {"type": "response_start"}
  - {"type": "response_end"}
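A client has to tell binary audio frames apart from JSON events. The sketch below handles the message types listed above using only the standard library; it assumes a WebSocket client that delivers binary frames as `bytes` and text frames as `str`, and `describe_frame` and `control_message` are hypothetical helper names.

```python
import json
from typing import Union


def control_message(kind: str) -> str:
    """Build a client-to-server JSON message such as end_of_speech or reset."""
    if kind not in ("end_of_speech", "reset"):
        raise ValueError(f"unsupported control message: {kind}")
    return json.dumps({"type": kind})


def describe_frame(frame: Union[bytes, str]) -> str:
    """Summarize one server-to-client frame."""
    if isinstance(frame, bytes):
        return f"audio chunk ({len(frame)} bytes)"
    event = json.loads(frame)
    kind = event.get("type")
    if kind == "connected":
        return f"connected as {event['client_id']}"
    if kind in ("transcript", "response_text"):
        return f"{kind}: {event['text']}"
    if kind in ("response_start", "response_end"):
        return kind
    return f"unknown event: {kind}"
```

In a real client, `describe_frame` would sit inside the receive loop, with audio chunks routed to playback rather than summarized.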
Environment Variables
# ASR
ASR_PROVIDER=whisper
ASR_API_KEY=your-key
# LLM
LLM_PROVIDER=anthropic
LLM_API_KEY=your-key
LLM_MODEL=claude-sonnet-4-20250514
# TTS
TTS_PROVIDER=cartesia
TTS_API_KEY=your-key
TTS_VOICE_ID=your-voice-id
# Debug
DEBUG=true
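These variables can be read with the standard library. The sketch below is illustrative only: the package may ship its own configuration loader, and `ProviderSettings` and `load_settings` are hypothetical names that exist only in this example.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class ProviderSettings:
    provider: str
    api_key: str
    model: Optional[str] = None
    voice_id: Optional[str] = None


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect provider settings from environment-style key/value pairs."""
    return {
        "asr": ProviderSettings(env["ASR_PROVIDER"], env["ASR_API_KEY"]),
        "llm": ProviderSettings(
            env["LLM_PROVIDER"], env["LLM_API_KEY"], model=env.get("LLM_MODEL")
        ),
        "tts": ProviderSettings(
            env["TTS_PROVIDER"], env["TTS_API_KEY"], voice_id=env.get("TTS_VOICE_ID")
        ),
        # Anything other than "true" (case-insensitive) disables debug.
        "debug": env.get("DEBUG", "false").lower() == "true",
    }


settings = load_settings({
    "ASR_PROVIDER": "whisper", "ASR_API_KEY": "your-key",
    "LLM_PROVIDER": "anthropic", "LLM_API_KEY": "your-key",
    "LLM_MODEL": "claude-sonnet-4-20250514",
    "TTS_PROVIDER": "cartesia", "TTS_API_KEY": "your-key",
    "TTS_VOICE_ID": "your-voice-id",
    "DEBUG": "true",
})
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise without touching the real environment.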
License
MIT