A pluggable, async-first Python framework for real-time audio-to-audio conversational AI

Audio Engine

A pluggable audio-to-audio conversational engine with real-time streaming support.

Features

  • Pluggable Architecture: Swap ASR, LLM, and TTS providers easily
  • Real-time Streaming: WebSocket server for low-latency conversations
  • GeneFace++ Integration: Optional face animation from audio
  • Simple API: Get started with just a few lines of code

Installation

pip install atom-audio-engine

For development with all optional dependencies:

pip install "atom-audio-engine[all,dev]"

Quick Start

Basic Usage

import asyncio

from audio_engine import Pipeline
from audio_engine.asr import CartesiaASR
from audio_engine.llm import GroqLLM
from audio_engine.tts import CartesiaTTS

# Create pipeline with your providers
pipeline = Pipeline(
    asr=CartesiaASR(api_key="your-cartesia-key"),
    llm=GroqLLM(api_key="your-groq-key", model="mixtral-8x7b-32768"),
    tts=CartesiaTTS(api_key="your-cartesia-key", voice_id="your-voice-id"),
    system_prompt="You are a helpful assistant.",
)


async def main():
    # input_audio_bytes, audio_stream, and play_audio are placeholders
    # that your application supplies.
    async with pipeline:
        # Simple: process complete audio
        response_audio = await pipeline.process(input_audio_bytes)

        # Streaming: lower latency
        async for chunk in pipeline.stream(audio_stream):
            play_audio(chunk)


asyncio.run(main())

WebSocket Server

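import asyncio

from audio_engine import Pipeline
from audio_engine.streaming import WebSocketServer

# Replace each ... with a configured provider instance
pipeline = Pipeline(asr=..., llm=..., tts=...)
server = WebSocketServer(pipeline, host="0.0.0.0", port=8765)

# server.start() is awaited, so it needs a running event loop
asyncio.run(server.start())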

With GeneFace++ Face Animation

from audio_engine.integrations.geneface import GeneFacePipelineWrapper, GeneFaceConfig

# pipeline is a Pipeline instance, created as in Basic Usage
wrapped = GeneFacePipelineWrapper(
    pipeline=pipeline,
    geneface_config=GeneFaceConfig(
        geneface_path="/path/to/ai-geneface-realtime"
    )
)

# Call this from inside an async function; input_audio is the audio to process
audio, video_path = await wrapped.process_with_video(input_audio)

Architecture

User Audio → ASR → LLM → TTS → Response Audio
                           ↓
                    GeneFace++ (optional)
                           ↓
                    Animated Face Video
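
The ASR → LLM → TTS flow above can be illustrated with a minimal sketch. The three stand-in classes and the run_turn helper below are hypothetical and are not part of audio_engine; they only mirror the method names shown under "Implementing a Provider", and the real Pipeline may chain the stages differently (for example, by streaming between them).

```python
import asyncio


class EchoASR:
    """Stand-in ASR: treats the incoming bytes as UTF-8 text."""

    async def transcribe(self, audio: bytes, sample_rate: int = 16000) -> str:
        return audio.decode("utf-8")


class UpperLLM:
    """Stand-in LLM: replies with the prompt in upper case."""

    async def generate(self, prompt: str, context=None) -> str:
        return prompt.upper()


class BytesTTS:
    """Stand-in TTS: returns the text encoded as bytes instead of real audio."""

    async def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


async def run_turn(asr, llm, tts, audio: bytes) -> bytes:
    # One conversational turn: audio in, text through the LLM, audio out.
    transcript = await asr.transcribe(audio)
    reply = await llm.generate(transcript)
    return await tts.synthesize(reply)


result = asyncio.run(run_turn(EchoASR(), UpperLLM(), BytesTTS(), b"hello"))
print(result)  # b'HELLO'
```

Because each stage only depends on the method it calls on the previous one, any of the three objects can be swapped without touching the others.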

Directory Structure

audio_engine/
├── core/           # Pipeline and configuration
├── asr/            # Speech-to-Text providers
├── llm/            # LLM providers
├── tts/            # Text-to-Speech providers
├── streaming/      # WebSocket server
├── integrations/   # GeneFace++ integration
├── utils/          # Audio utilities
└── examples/       # Example scripts

Implementing a Provider

Custom ASR

from audio_engine.asr.base import BaseASR

class MyASR(BaseASR):
    @property
    def name(self) -> str:
        return "my-asr"

    async def transcribe(self, audio: bytes, sample_rate: int = 16000) -> str:
        # Your implementation
        pass

    async def transcribe_stream(self, audio_stream):
        # Your streaming implementation
        pass

Custom LLM

from audio_engine.llm.base import BaseLLM

class MyLLM(BaseLLM):
    @property
    def name(self) -> str:
        return "my-llm"

    async def generate(self, prompt: str, context=None) -> str:
        # Your implementation
        pass

    async def generate_stream(self, prompt: str, context=None):
        # Your streaming implementation
        pass

Custom TTS

from audio_engine.tts.base import BaseTTS

class MyTTS(BaseTTS):
    @property
    def name(self) -> str:
        return "my-tts"

    async def synthesize(self, text: str) -> bytes:
        # Your implementation
        pass

    async def synthesize_stream(self, text: str):
        # Your streaming implementation
        pass
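
The streaming methods in the templates above are left empty. One plausible way to fill them in, assuming the pipeline consumes them with "async for", is to write them as async generators. The sketch below uses a hypothetical ChunkedTTS class that does not subclass BaseTTS so that it runs without the package installed; a real provider would inherit from BaseTTS and return real audio.

```python
import asyncio
from typing import AsyncIterator


class ChunkedTTS:
    """Hypothetical provider; a real one would subclass BaseTTS."""

    def __init__(self, chunk_size: int = 4):
        self.chunk_size = chunk_size

    @property
    def name(self) -> str:
        return "chunked-tts"

    async def synthesize(self, text: str) -> bytes:
        # Placeholder "audio": the UTF-8 bytes of the text.
        return text.encode("utf-8")

    async def synthesize_stream(self, text: str) -> AsyncIterator[bytes]:
        # Async generator: yield the audio in fixed-size chunks.
        audio = await self.synthesize(text)
        for start in range(0, len(audio), self.chunk_size):
            yield audio[start:start + self.chunk_size]


async def collect(tts, text: str) -> list:
    # Consume the stream the way a caller using "async for" would.
    return [chunk async for chunk in tts.synthesize_stream(text)]


chunks = asyncio.run(collect(ChunkedTTS(chunk_size=4), "hello world"))
print(chunks)  # [b'hell', b'o wo', b'rld']
```

A provider backed by a real streaming API would yield chunks as they arrive rather than synthesizing everything first, which is where the latency benefit comes from.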

WebSocket Protocol

Client → Server

  • Binary: Raw audio chunks (16-bit PCM, 16 kHz, mono)
  • JSON: {"type": "end_of_speech"} or {"type": "reset"}
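
As a rough illustration of the client side, the sketch below splits raw PCM into frames and builds the two JSON control messages. The 20 ms frame length and the helper names are assumptions; the protocol description does not state a required chunk size. At 16 kHz, 16-bit, mono, one second of audio is 32,000 bytes, so a 20 ms frame is 640 bytes.

```python
import json

SAMPLE_RATE = 16000    # Hz, from the protocol description
BYTES_PER_SAMPLE = 2   # 16-bit PCM
CHANNELS = 1           # mono


def frame_size_bytes(frame_ms: int) -> int:
    # Bytes needed to hold frame_ms milliseconds of audio.
    return SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS * frame_ms // 1000


def split_pcm(pcm: bytes, frame_ms: int = 20):
    # Yield consecutive frames; the last one may be shorter.
    size = frame_size_bytes(frame_ms)
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


def control_message(kind: str) -> str:
    # Only the two client-to-server message types listed above are allowed.
    if kind not in ("end_of_speech", "reset"):
        raise ValueError(f"unknown control message: {kind}")
    return json.dumps({"type": kind})


one_second = bytes(SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS)  # 1 s of silence
frames = list(split_pcm(one_second))
print(frame_size_bytes(20), len(frames))  # 640 50
print(control_message("end_of_speech"))   # {"type": "end_of_speech"}
```

A client would send each frame as a binary WebSocket message and follow the last one with the end_of_speech text message.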

Server → Client

  • Binary: Response audio chunks
  • JSON Events:
    • {"type": "connected", "client_id": "..."}
    • {"type": "transcript", "text": "..."}
    • {"type": "response_text", "text": "..."}
    • {"type": "response_start"}
    • {"type": "response_end"}
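
On the receiving side, a client has to tell binary audio apart from JSON events. The sketch below shows one way to dispatch on the message types listed above; the state dictionary, the function names, and the order of events in the sample sequence are illustrative assumptions rather than documented behavior.

```python
import json


def new_state() -> dict:
    return {
        "client_id": None,
        "transcript": None,
        "response_text": None,
        "responding": False,
        "audio": bytearray(),
    }


def handle_message(message, state: dict) -> str:
    """Update state from one incoming message and return its kind."""
    if isinstance(message, (bytes, bytearray)):
        # Binary messages carry response audio.
        state["audio"].extend(message)
        return "audio"

    event = json.loads(message)
    kind = event["type"]
    if kind == "connected":
        state["client_id"] = event["client_id"]
    elif kind in ("transcript", "response_text"):
        state[kind] = event["text"]
    elif kind == "response_start":
        state["responding"] = True
    elif kind == "response_end":
        state["responding"] = False
    return kind


state = new_state()
incoming = [
    '{"type": "connected", "client_id": "abc"}',
    '{"type": "transcript", "text": "hi"}',
    '{"type": "response_start"}',
    b"\x00\x01",
    b"\x02\x03",
    '{"type": "response_end"}',
]
kinds = [handle_message(m, state) for m in incoming]
print(kinds)
```

In a real client the collected audio would usually be played as it arrives instead of being buffered until response_end.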

Environment Variables

# ASR
ASR_PROVIDER=whisper
ASR_API_KEY=your-key

# LLM
LLM_PROVIDER=anthropic
LLM_API_KEY=your-key
LLM_MODEL=claude-sonnet-4-20250514

# TTS
TTS_PROVIDER=cartesia
TTS_API_KEY=your-key
TTS_VOICE_ID=your-voice-id

# Debug
DEBUG=true
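
How the package itself reads these variables is not described here. As a sketch only, an application could collect them into a plain dictionary like this; the read_env helper and the rule that DEBUG is on only when set to "true" (case-insensitive) are assumptions.

```python
import os


def read_env(env=None) -> dict:
    # Accept a mapping for testing; default to the process environment.
    env = os.environ if env is None else env
    return {
        "asr": {
            "provider": env.get("ASR_PROVIDER"),
            "api_key": env.get("ASR_API_KEY"),
        },
        "llm": {
            "provider": env.get("LLM_PROVIDER"),
            "api_key": env.get("LLM_API_KEY"),
            "model": env.get("LLM_MODEL"),
        },
        "tts": {
            "provider": env.get("TTS_PROVIDER"),
            "api_key": env.get("TTS_API_KEY"),
            "voice_id": env.get("TTS_VOICE_ID"),
        },
        "debug": env.get("DEBUG", "false").strip().lower() == "true",
    }


sample = {
    "ASR_PROVIDER": "whisper",
    "LLM_PROVIDER": "anthropic",
    "LLM_MODEL": "claude-sonnet-4-20250514",
    "TTS_PROVIDER": "cartesia",
    "DEBUG": "true",
}
settings = read_env(sample)
print(settings["llm"]["provider"], settings["debug"])  # anthropic True
```

Missing variables come back as None, so the caller can decide whether to fail early or fall back to a default provider.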

License

MIT
