A pluggable, async-first Python framework for real-time audio-to-audio conversational AI
Audio Engine
A pluggable audio-to-audio conversational engine with real-time streaming support.
Features
- Pluggable Architecture: Swap ASR, LLM, and TTS providers easily
- Real-time Streaming: WebSocket server for low-latency conversations
- GeneFace++ Integration: Optional face animation from audio
- Simple API: Get started with just a few lines of code
Installation
pip install atom-audio-engine
For development with all optional dependencies:
pip install "atom-audio-engine[all,dev]"
Quick Start
Basic Usage
import asyncio

from audio_engine import Pipeline
from audio_engine.asr import CartesiaASR
from audio_engine.llm import GroqLLM
from audio_engine.tts import CartesiaTTS

# Create pipeline with your providers
pipeline = Pipeline(
    asr=CartesiaASR(api_key="your-cartesia-key"),
    llm=GroqLLM(api_key="your-groq-key", model="mixtral-8x7b-32768"),
    tts=CartesiaTTS(api_key="your-cartesia-key", voice_id="your-voice-id"),
    system_prompt="You are a helpful assistant.",
)

async def main():
    # input_audio_bytes, audio_stream and play_audio are placeholders you supply
    async with pipeline:
        # Simple: process complete audio
        response_audio = await pipeline.process(input_audio_bytes)
        # Streaming: lower latency
        async for chunk in pipeline.stream(audio_stream):
            play_audio(chunk)

asyncio.run(main())
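`pipeline.stream` above consumes `audio_stream`, which is a placeholder in the example. A minimal sketch of one way to build such a stream from an in-memory buffer, assuming the pipeline accepts any async iterator of raw byte chunks (this is an assumption; the helper names `chunked_audio_stream` and `collect` and the 3200-byte chunk size are illustrative, not part of the package):

```python
import asyncio
from typing import AsyncIterator


async def chunked_audio_stream(audio: bytes, chunk_size: int = 3200) -> AsyncIterator[bytes]:
    """Yield `audio` in fixed-size chunks (hypothetical helper).

    3200 bytes is 100 ms of 16-bit, 16 kHz mono PCM.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]
        # Give other tasks a chance to run between chunks.
        await asyncio.sleep(0)


async def collect(stream: AsyncIterator[bytes]) -> list[bytes]:
    """Drain an async byte stream into a list (for inspection or tests)."""
    return [chunk async for chunk in stream]
```

The last chunk may be shorter than `chunk_size`; whether a provider tolerates that is something to check against the ASR you use.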
WebSocket Server
import asyncio

from audio_engine import Pipeline
from audio_engine.streaming import WebSocketServer

async def main():
    pipeline = Pipeline(asr=..., llm=..., tts=...)  # fill in your providers
    server = WebSocketServer(pipeline, host="0.0.0.0", port=8765)
    await server.start()

asyncio.run(main())
With GeneFace++ Face Animation
from audio_engine.integrations.geneface import GeneFacePipelineWrapper, GeneFaceConfig

wrapped = GeneFacePipelineWrapper(
    pipeline=pipeline,
    geneface_config=GeneFaceConfig(
        geneface_path="/path/to/ai-geneface-realtime"
    ),
)

# Run inside an async function; `pipeline` is the Pipeline created earlier
audio, video_path = await wrapped.process_with_video(input_audio)
Architecture
User Audio → ASR → LLM → TTS → Response Audio
                                     ↓
                           GeneFace++ (optional)
                                     ↓
                            Animated Face Video
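The main ASR → LLM → TTS path above can be sketched with plain coroutines. The three stage functions below are stand-ins that only illustrate how data moves from stage to stage; they are not the package's providers, and `run_turn` is a hypothetical name.

```python
import asyncio


async def fake_asr(audio: bytes) -> str:
    # Stand-in for speech-to-text: pretend the audio bytes are UTF-8 text.
    return audio.decode("utf-8")


async def fake_llm(prompt: str) -> str:
    # Stand-in for the language model: echo the prompt back.
    return f"You said: {prompt}"


async def fake_tts(text: str) -> bytes:
    # Stand-in for text-to-speech: encode the reply as bytes.
    return text.encode("utf-8")


async def run_turn(audio: bytes) -> bytes:
    """One conversational turn: user audio, transcript, reply text, response audio."""
    transcript = await fake_asr(audio)
    reply = await fake_llm(transcript)
    return await fake_tts(reply)


if __name__ == "__main__":
    print(asyncio.run(run_turn(b"hello")))
```

Each stage only depends on the output type of the previous one, which is what makes the providers swappable.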
Directory Structure
audio_engine/
├── core/ # Pipeline and configuration
├── asr/ # Speech-to-Text providers
├── llm/ # LLM providers
├── tts/ # Text-to-Speech providers
├── streaming/ # WebSocket server
├── integrations/ # GeneFace++ integration
├── utils/ # Audio utilities
└── examples/ # Example scripts
Implementing a Provider
Custom ASR
from audio_engine.asr.base import BaseASR

class MyASR(BaseASR):
    @property
    def name(self) -> str:
        return "my-asr"

    async def transcribe(self, audio: bytes, sample_rate: int = 16000) -> str:
        # Your implementation
        pass

    async def transcribe_stream(self, audio_stream):
        # Your streaming implementation
        pass
Custom LLM
from audio_engine.llm.base import BaseLLM

class MyLLM(BaseLLM):
    @property
    def name(self) -> str:
        return "my-llm"

    async def generate(self, prompt: str, context=None) -> str:
        # Your implementation
        pass

    async def generate_stream(self, prompt: str, context=None):
        # Your streaming implementation
        pass
Custom TTS
from audio_engine.tts.base import BaseTTS

class MyTTS(BaseTTS):
    @property
    def name(self) -> str:
        return "my-tts"

    async def synthesize(self, text: str) -> bytes:
        # Your implementation
        pass

    async def synthesize_stream(self, text: str):
        # Your streaming implementation
        pass
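The `*_stream` stubs above do not show what a streaming method might look like. One plausible shape is an async generator that yields results as they become available. The sketch below uses a locally defined stand-in base class, `SketchTTS`, so that it runs without the package; whether `BaseTTS` expects an async generator, and what chunk format it expects, are assumptions to check against `audio_engine.tts.base`.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import AsyncIterator


class SketchTTS(ABC):
    """Local stand-in mirroring the method names shown above (not the real BaseTTS)."""

    @property
    @abstractmethod
    def name(self) -> str: ...

    @abstractmethod
    async def synthesize(self, text: str) -> bytes: ...

    @abstractmethod
    def synthesize_stream(self, text: str) -> AsyncIterator[bytes]: ...


class SilenceTTS(SketchTTS):
    """Toy provider: emits 10 ms of 16-bit, 16 kHz silence per character."""

    BYTES_PER_CHAR = 320  # 160 samples * 2 bytes per sample

    @property
    def name(self) -> str:
        return "silence-tts"

    async def synthesize(self, text: str) -> bytes:
        return b"\x00" * (self.BYTES_PER_CHAR * len(text))

    async def synthesize_stream(self, text: str) -> AsyncIterator[bytes]:
        # Yield one chunk per word so a caller can start playback early.
        for word in text.split():
            yield await self.synthesize(word)


async def demo() -> list[int]:
    """Collect the chunk sizes produced for a short sentence."""
    return [len(chunk) async for chunk in SilenceTTS().synthesize_stream("hi there")]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A real provider would replace the silence with audio returned by its backend, yielding each chunk as soon as it arrives.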
WebSocket Protocol
Client → Server
- Binary: Raw audio chunks (PCM 16-bit, 16kHz mono)
- JSON: {"type": "end_of_speech"} or {"type": "reset"}
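Taken together, the two client message kinds suggest that a turn is a run of binary audio frames followed by an `end_of_speech` text message. A minimal sketch that builds that sequence; the 20 ms frame length and the names `frame_size` and `client_messages` are illustrative assumptions, and the protocol description does not state a required chunk size:

```python
import json
from typing import Iterator, Union

SAMPLE_RATE = 16_000   # Hz, per the protocol description
BYTES_PER_SAMPLE = 2   # 16-bit PCM, mono


def frame_size(frame_ms: int) -> int:
    """Number of bytes in `frame_ms` milliseconds of audio."""
    return SAMPLE_RATE * BYTES_PER_SAMPLE * frame_ms // 1000


def client_messages(pcm: bytes, frame_ms: int = 20) -> Iterator[Union[bytes, str]]:
    """Yield binary audio frames, then the end-of-speech control message.

    bytes items would be sent as binary WebSocket frames, str items as text frames.
    """
    size = frame_size(frame_ms)
    if size <= 0:
        raise ValueError("frame_ms must be positive")
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]
    yield json.dumps({"type": "end_of_speech"})
```

With any WebSocket client library, each yielded item can be passed to its send call in order.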
Server → Client
- Binary: Response audio chunks
- JSON Events:
  - {"type": "connected", "client_id": "..."}
  - {"type": "transcript", "text": "..."}
  - {"type": "response_text", "text": "..."}
  - {"type": "response_start"}
  - {"type": "response_end"}
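On the receiving side, a client has to tell audio apart from events. A minimal sketch of a dispatcher, assuming binary frames arrive as `bytes` and JSON events as `str`; `handle_frame` and the `ClientState` fields are hypothetical names. It also assumes each `transcript` and `response_text` event carries the full text rather than an increment, which the protocol description does not specify.

```python
import json
from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class ClientState:
    """Accumulates what the server has sent during a session."""
    client_id: Optional[str] = None
    transcript: str = ""
    response_text: str = ""
    audio: bytearray = field(default_factory=bytearray)
    responding: bool = False


def handle_frame(state: ClientState, frame: Union[bytes, str]) -> None:
    """Update `state` from one server frame."""
    if isinstance(frame, (bytes, bytearray)):
        state.audio.extend(frame)  # binary: response audio chunk
        return
    event = json.loads(frame)
    kind = event.get("type")
    if kind == "connected":
        state.client_id = event.get("client_id")
    elif kind == "transcript":
        state.transcript = event.get("text", "")
    elif kind == "response_text":
        state.response_text = event.get("text", "")
    elif kind == "response_start":
        state.responding = True
    elif kind == "response_end":
        state.responding = False
    # Unknown event types are ignored rather than treated as errors.
```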
Environment Variables
# ASR
ASR_PROVIDER=whisper
ASR_API_KEY=your-key
# LLM
LLM_PROVIDER=anthropic
LLM_API_KEY=your-key
LLM_MODEL=claude-sonnet-4-20250514
# TTS
TTS_PROVIDER=cartesia
TTS_API_KEY=your-key
TTS_VOICE_ID=your-voice-id
# Debug
DEBUG=true
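How the package reads these variables is not shown here. As an illustration only, a loader along these lines could collect them into a settings object; `load_settings` and `ProviderSettings` are hypothetical names, not part of the package's API.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ProviderSettings:
    provider: str
    api_key: str
    extra: Optional[str] = None  # LLM_MODEL or TTS_VOICE_ID, where applicable


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read the variables listed above from `env` (defaults to the process environment)."""

    def section(prefix: str, extra_key: Optional[str] = None) -> ProviderSettings:
        return ProviderSettings(
            provider=env[f"{prefix}_PROVIDER"],  # raises KeyError if missing
            api_key=env[f"{prefix}_API_KEY"],
            extra=env.get(extra_key) if extra_key else None,
        )

    return {
        "asr": section("ASR"),
        "llm": section("LLM", "LLM_MODEL"),
        "tts": section("TTS", "TTS_VOICE_ID"),
        "debug": env.get("DEBUG", "false").lower() == "true",
    }
```

Passing a plain dict instead of `os.environ` keeps the loader easy to test without touching the real environment.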
License
MIT
Download files
File details
Details for the file atom_audio_engine-0.1.1.tar.gz.
File metadata
- Download URL: atom_audio_engine-0.1.1.tar.gz
- Upload date:
- Size: 4.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cead44a16494d1c6d2de85f76455f8c2eb02bc9193850a9957d8ecbdbbbe853b |
| MD5 | 86011765855d1bd96f302eba98a994d8 |
| BLAKE2b-256 | 3205ffa7d122043161b9e9fb2343f4bef86b718a752f051a08edbd5e02f8706f |
File details
Details for the file atom_audio_engine-0.1.1-py3-none-any.whl.
File metadata
- Download URL: atom_audio_engine-0.1.1-py3-none-any.whl
- Upload date:
- Size: 4.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b1c3fc3bdc4ccb4e5a23a07b151bee32543b004138c070eeb5a5bf97172b5aa4 |
| MD5 | 38dd8e09fe60447c9c95534393618dc5 |
| BLAKE2b-256 | 9919ca865a2dd41b695cf7e473275fcef785f796f59adb4c9493a1cb20e4b5ef |