Skip to main content

A pluggable, async-first Python framework for real-time audio-to-audio conversational AI

Project description

Audio Engine

A pluggable audio-to-audio conversational engine with real-time streaming support.

Features

  • Pluggable Architecture: Swap ASR, LLM, and TTS providers easily
  • Real-time Streaming: WebSocket server for low-latency conversations
  • GeneFace++ Integration: Optional face animation from audio
  • Simple API: Get started with just a few lines of code

Installation

pip install atom-audio-engine

For development with all optional dependencies:

pip install atom-audio-engine[all,dev]

Quick Start

Basic Usage

from audio_engine import Pipeline
from audio_engine.asr import WhisperASR
from audio_engine.llm import AnthropicLLM
from audio_engine.tts import CartesiaTTS

# Create pipeline with your providers
pipeline = Pipeline(
    asr=CartesiaASR(api_key="your-cartesia-key"),
    llm=GroqLLM(api_key="your-groq-key", model="mixtral-8x7b-32768"),
    tts=CartesiaTTS(api_key="your-cartesia-key", voice_id="your-voice-id"),
    system_prompt="You are a helpful assistant.",
)

async with pipeline:
    # Simple: process complete audio
    response_audio = await pipeline.process(input_audio_bytes)

    # Streaming: lower latency
    async for chunk in pipeline.stream(audio_stream):
        play_audio(chunk)

WebSocket Server

from audio_engine import Pipeline
from audio_engine.streaming import WebSocketServer

pipeline = Pipeline(asr=..., llm=..., tts=...)
server = WebSocketServer(pipeline, host="0.0.0.0", port=8765)

await server.start()

With GeneFace++ Face Animation

from audio_engine.integrations.geneface import GeneFacePipelineWrapper, GeneFaceConfig

wrapped = GeneFacePipelineWrapper(
    pipeline=pipeline,
    geneface_config=GeneFaceConfig(
        geneface_path="/path/to/ai-geneface-realtime"
    )
)

audio, video_path = await wrapped.process_with_video(input_audio)

Architecture

User Audio → ASR → LLM → TTS → Response Audio
                           ↓
                    GeneFace++ (optional)
                           ↓
                    Animated Face Video

Directory Structure

audio_engine/
├── core/           # Pipeline and configuration
├── asr/            # Speech-to-Text providers
├── llm/            # LLM providers
├── tts/            # Text-to-Speech providers
├── streaming/      # WebSocket server
├── integrations/   # GeneFace++ integration
├── utils/          # Audio utilities
└── examples/       # Example scripts

Implementing a Provider

Custom ASR

from audio_engine.asr.base import BaseASR

class MyASR(BaseASR):
    @property
    def name(self) -> str:
        return "my-asr"

    async def transcribe(self, audio: bytes, sample_rate: int = 16000) -> str:
        # Your implementation
        pass

    async def transcribe_stream(self, audio_stream):
        # Your streaming implementation
        pass

Custom LLM

from audio_engine.llm.base import BaseLLM

class MyLLM(BaseLLM):
    @property
    def name(self) -> str:
        return "my-llm"

    async def generate(self, prompt: str, context=None) -> str:
        # Your implementation
        pass

    async def generate_stream(self, prompt: str, context=None):
        # Your streaming implementation
        pass

Custom TTS

from audio_engine.tts.base import BaseTTS

class MyTTS(BaseTTS):
    @property
    def name(self) -> str:
        return "my-tts"

    async def synthesize(self, text: str) -> bytes:
        # Your implementation
        pass

    async def synthesize_stream(self, text: str):
        # Your streaming implementation
        pass

WebSocket Protocol

Client → Server

  • Binary: Raw audio chunks (PCM 16-bit, 16kHz mono)
  • JSON: {"type": "end_of_speech"} or {"type": "reset"}

Server → Client

  • Binary: Response audio chunks
  • JSON Events:
    • {"type": "connected", "client_id": "..."}
    • {"type": "transcript", "text": "..."}
    • {"type": "response_text", "text": "..."}
    • {"type": "response_start"}
    • {"type": "response_end"}

Environment Variables

# ASR
ASR_PROVIDER=whisper
ASR_API_KEY=your-key

# LLM
LLM_PROVIDER=anthropic
LLM_API_KEY=your-key
LLM_MODEL=claude-sonnet-4-20250514

# TTS
TTS_PROVIDER=cartesia
TTS_API_KEY=your-key
TTS_VOICE_ID=your-voice-id

# Debug
DEBUG=true

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

atom_audio_engine-0.1.4.tar.gz (4.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

atom_audio_engine-0.1.4-py3-none-any.whl (4.0 kB view details)

Uploaded Python 3

File details

Details for the file atom_audio_engine-0.1.4.tar.gz.

File metadata

  • Download URL: atom_audio_engine-0.1.4.tar.gz
  • Upload date:
  • Size: 4.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for atom_audio_engine-0.1.4.tar.gz
Algorithm Hash digest
SHA256 a6ce019eb3a263994545ba0ab2b3b574086e6ba4d0a1188f3c210e1afec975a7
MD5 252bd6340161ea983a4f21dfc27b7604
BLAKE2b-256 0a1ece60eefc7c1f7283ef9e8db0aae7039e94c7e98ca22c8213a711f667269e

See more details on using hashes here.

File details

Details for the file atom_audio_engine-0.1.4-py3-none-any.whl.

File metadata

File hashes

Hashes for atom_audio_engine-0.1.4-py3-none-any.whl
Algorithm Hash digest
SHA256 199fad0e08e3a4e92cbef790300178e962ef8adebd9e2018f0296b0aee7b024e
MD5 6b763a803338778be5a9148475ee3c1d
BLAKE2b-256 b3e9a0d97d1182bd08456079876b9747dc0ee6bf0d4a170909a9f86b6934846e

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page