A pluggable, async-first Python framework for real-time audio-to-audio conversational AI

Audio Engine

A pluggable audio-to-audio conversational engine with real-time streaming support.

Features

  • Pluggable Architecture: Swap ASR (speech-to-text), LLM, and TTS (text-to-speech) providers by subclassing a small base class
  • Real-time Streaming: WebSocket server for low-latency conversations
  • GeneFace++ Integration: Optional face animation from audio
  • Simple API: Get started with just a few lines of code

Installation

pip install atom-audio-engine

For development with all optional dependencies:

pip install "atom-audio-engine[all,dev]"

Quick Start

Basic Usage

import asyncio

from audio_engine import Pipeline
from audio_engine.asr import CartesiaASR
from audio_engine.llm import GroqLLM
from audio_engine.tts import CartesiaTTS

# Create pipeline with your providers
pipeline = Pipeline(
    asr=CartesiaASR(api_key="your-cartesia-key"),
    llm=GroqLLM(api_key="your-groq-key", model="mixtral-8x7b-32768"),
    tts=CartesiaTTS(api_key="your-cartesia-key", voice_id="your-voice-id"),
    system_prompt="You are a helpful assistant.",
)

async def main():
    # input_audio_bytes, audio_stream and play_audio are supplied by your application
    async with pipeline:
        # Simple: process a complete utterance in one call
        response_audio = await pipeline.process(input_audio_bytes)

        # Streaming: play response chunks as they arrive, for lower latency
        async for chunk in pipeline.stream(audio_stream):
            play_audio(chunk)

asyncio.run(main())
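The Basic Usage example leaves `input_audio_bytes` to your application. A minimal sketch for producing it from a WAV file with the standard library `wave` module, assuming the pipeline expects raw 16-bit, 16 kHz mono PCM (the format the WebSocket protocol below specifies; this README does not state the format for `process()`). `load_pcm16` and `make_test_wav` are hypothetical helpers, not part of `audio_engine`.

```python
import io
import wave

EXPECTED_RATE = 16000   # assumption: 16 kHz, matching the WebSocket protocol
EXPECTED_WIDTH = 2      # 16-bit samples = 2 bytes each
EXPECTED_CHANNELS = 1   # mono


def load_pcm16(source) -> bytes:
    """Return the raw PCM frames of a WAV file after checking its format.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE:
            raise ValueError(f"expected {EXPECTED_RATE} Hz, got {wav.getframerate()} Hz")
        if wav.getsampwidth() != EXPECTED_WIDTH:
            raise ValueError(f"expected 16-bit samples, got {8 * wav.getsampwidth()}-bit")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            raise ValueError(f"expected mono, got {wav.getnchannels()} channels")
        return wav.readframes(wav.getnframes())


def make_test_wav(seconds: float = 0.5) -> io.BytesIO:
    """Build an in-memory WAV of silence in the expected format, for experimenting."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(EXPECTED_CHANNELS)
        wav.setsampwidth(EXPECTED_WIDTH)
        wav.setframerate(EXPECTED_RATE)
        wav.writeframes(b"\x00\x00" * int(EXPECTED_RATE * seconds))
    buf.seek(0)
    return buf


input_audio_bytes = load_pcm16(make_test_wav())
print(len(input_audio_bytes))  # 0.5 s * 16000 samples/s * 2 bytes = 16000
```

Failing fast on a mismatched sample rate is cheaper than debugging a garbled transcript later; resampling is left out to keep the sketch dependency-free.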

WebSocket Server

import asyncio

from audio_engine import Pipeline
from audio_engine.streaming import WebSocketServer

async def main():
    pipeline = Pipeline(asr=..., llm=..., tts=...)  # fill in your providers
    server = WebSocketServer(pipeline, host="0.0.0.0", port=8765)
    await server.start()

asyncio.run(main())

With GeneFace++ Face Animation

from audio_engine.integrations.geneface import GeneFacePipelineWrapper, GeneFaceConfig

wrapped = GeneFacePipelineWrapper(
    pipeline=pipeline,
    geneface_config=GeneFaceConfig(
        geneface_path="/path/to/ai-geneface-realtime"
    )
)

# Run inside a coroutine; `pipeline` is a configured Pipeline and input_audio your input bytes
audio, video_path = await wrapped.process_with_video(input_audio)

Architecture

User Audio → ASR → LLM → TTS → Response Audio
                           ↓
                    GeneFace++ (optional)
                           ↓
                    Animated Face Video
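The flow in the diagram can be illustrated with a toy, self-contained pipeline. This is a sketch with stub providers: `EchoASR`, `CannedLLM`, `SilenceTTS` and `ToyPipeline` are hypothetical names, not `audio_engine` classes, and the optional GeneFace++ step is omitted. It only shows how each stage's output becomes the next stage's input.

```python
import asyncio


class EchoASR:
    """Stub ASR: pretends every clip says the same thing."""
    async def transcribe(self, audio: bytes, sample_rate: int = 16000) -> str:
        return "hello there"


class CannedLLM:
    """Stub LLM: builds a reply from the transcript."""
    async def generate(self, prompt: str, context=None) -> str:
        return f"You said: {prompt}"


class SilenceTTS:
    """Stub TTS: 10 ms of 16 kHz, 16-bit silence (320 bytes) per character."""
    async def synthesize(self, text: str) -> bytes:
        return b"\x00\x00" * 160 * len(text)


class ToyPipeline:
    def __init__(self, asr, llm, tts):
        self.asr, self.llm, self.tts = asr, llm, tts

    async def process(self, audio: bytes) -> bytes:
        text = await self.asr.transcribe(audio)   # User Audio -> ASR
        reply = await self.llm.generate(text)     # transcript -> LLM
        return await self.tts.synthesize(reply)   # reply text -> TTS -> Response Audio


async def main() -> None:
    pipeline = ToyPipeline(EchoASR(), CannedLLM(), SilenceTTS())
    audio = await pipeline.process(b"\x00\x00" * 1600)
    print(len(audio))  # 21 characters * 320 bytes = 6720


asyncio.run(main())
```

Because each stage only depends on the previous stage's output type (bytes, then text, then bytes), any stub can be swapped for a real provider without touching the others.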

Directory Structure

audio_engine/
├── core/           # Pipeline and configuration
├── asr/            # Speech-to-Text providers
├── llm/            # LLM providers
├── tts/            # Text-to-Speech providers
├── streaming/      # WebSocket server
├── integrations/   # GeneFace++ integration
├── utils/          # Audio utilities
└── examples/       # Example scripts

Implementing a Provider

Custom ASR

from audio_engine.asr.base import BaseASR

class MyASR(BaseASR):
    @property
    def name(self) -> str:
        return "my-asr"

    async def transcribe(self, audio: bytes, sample_rate: int = 16000) -> str:
        # Your implementation
        pass

    async def transcribe_stream(self, audio_stream):
        # Your streaming implementation
        pass
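If a backend only offers batch transcription, one option is to implement `transcribe_stream` by buffering incoming chunks and transcribing once the stream ends. This standalone sketch does not subclass `BaseASR`, and it assumes that `audio_stream` is an async iterator of PCM `bytes` and that `transcribe_stream` should yield transcript strings; neither is confirmed by this README. `BufferingASR` and `fake_mic` are hypothetical.

```python
import asyncio
from typing import AsyncIterator


class BufferingASR:
    """Stand-in for a BaseASR subclass whose backend only handles whole clips."""

    @property
    def name(self) -> str:
        return "buffering-asr"

    async def transcribe(self, audio: bytes, sample_rate: int = 16000) -> str:
        # Placeholder for a real batch API call: report the clip duration instead.
        seconds = len(audio) / (2 * sample_rate)  # 2 bytes per 16-bit mono sample
        return f"<{seconds:.2f}s of audio>"

    async def transcribe_stream(self, audio_stream: AsyncIterator[bytes]) -> AsyncIterator[str]:
        # Collect every chunk, then run a single batch transcription.
        buffer = bytearray()
        async for chunk in audio_stream:
            buffer.extend(chunk)
        if buffer:
            yield await self.transcribe(bytes(buffer))


async def fake_mic(n_chunks: int = 5) -> AsyncIterator[bytes]:
    """Yield 20 ms chunks of 16 kHz, 16-bit mono silence (640 bytes each)."""
    for _ in range(n_chunks):
        yield b"\x00" * 640
        await asyncio.sleep(0)


async def main() -> list:
    asr = BufferingASR()
    return [text async for text in asr.transcribe_stream(fake_mic())]


print(asyncio.run(main()))  # ['<0.10s of audio>']
```

The trade-off is latency: nothing is emitted until the input stream closes, so a provider with native streaming recognition is preferable when one is available.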

Custom LLM

from audio_engine.llm.base import BaseLLM

class MyLLM(BaseLLM):
    @property
    def name(self) -> str:
        return "my-llm"

    async def generate(self, prompt: str, context=None) -> str:
        # Your implementation
        pass

    async def generate_stream(self, prompt: str, context=None):
        # Your streaming implementation
        pass
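One way to keep `generate` and `generate_stream` consistent is to implement streaming once and build the non-streaming method on top of it. A standalone sketch (it does not subclass `BaseLLM`); `WordStreamLLM` is hypothetical, and it assumes `generate_stream` should yield text fragments whose concatenation is the full reply.

```python
import asyncio
from typing import AsyncIterator


class WordStreamLLM:
    """Stand-in for a BaseLLM subclass; streams a canned reply word by word."""

    @property
    def name(self) -> str:
        return "word-stream-llm"

    async def generate_stream(self, prompt: str, context=None) -> AsyncIterator[str]:
        # A real provider would forward tokens from its streaming API here.
        reply = f"You asked about {prompt}."
        words = reply.split(" ")
        for i, word in enumerate(words):
            # Re-attach the separating space so fragments concatenate cleanly.
            yield word if i == len(words) - 1 else word + " "
            await asyncio.sleep(0)

    async def generate(self, prompt: str, context=None) -> str:
        # Non-streaming result = concatenation of the streamed fragments.
        parts = [part async for part in self.generate_stream(prompt, context)]
        return "".join(parts)


async def main() -> None:
    llm = WordStreamLLM()
    async for fragment in llm.generate_stream("the weather"):
        print(repr(fragment))
    print(await llm.generate("the weather"))  # You asked about the weather.


asyncio.run(main())
```

With a single source of truth for the reply text, the two methods cannot drift apart when the provider logic changes.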

Custom TTS

from audio_engine.tts.base import BaseTTS

class MyTTS(BaseTTS):
    @property
    def name(self) -> str:
        return "my-tts"

    async def synthesize(self, text: str) -> bytes:
        # Your implementation
        pass

    async def synthesize_stream(self, text: str):
        # Your streaming implementation
        pass
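For streaming TTS, a common approach is to split the text at sentence boundaries and synthesize each sentence separately, so the first audio chunk is ready before the whole reply has been rendered. The sketch is standalone and hypothetical: `SentenceChunkTTS` emits silence in place of speech, the 16 kHz 16-bit mono output format is an assumption, and the regex is a simple heuristic rather than a full sentence splitter.

```python
import asyncio
import re
from typing import AsyncIterator

SAMPLE_RATE = 16000  # assumption: 16 kHz, 16-bit mono output


class SentenceChunkTTS:
    """Stand-in for a BaseTTS subclass that streams one chunk per sentence."""

    @property
    def name(self) -> str:
        return "sentence-chunk-tts"

    async def synthesize(self, text: str) -> bytes:
        # Placeholder for a real TTS call: 50 ms of silence (1600 bytes) per word.
        n_words = len(text.split())
        return b"\x00\x00" * (SAMPLE_RATE // 20) * n_words

    async def synthesize_stream(self, text: str) -> AsyncIterator[bytes]:
        # Split after ., ! or ? followed by whitespace; drop empty pieces.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
        for sentence in sentences:
            yield await self.synthesize(sentence)


async def main() -> list:
    tts = SentenceChunkTTS()
    return [len(chunk) async for chunk in tts.synthesize_stream("Hi there. How are you? Fine!")]


print(asyncio.run(main()))  # [3200, 4800, 1600]
```

Sentence-sized chunks are a compromise: smaller pieces start playback sooner, but many TTS engines produce more natural prosody when given a complete sentence.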

WebSocket Protocol

Client → Server

  • Binary: Raw audio chunks (PCM 16-bit, 16kHz mono)
  • JSON: {"type": "end_of_speech"} or {"type": "reset"}

Server → Client

  • Binary: Response audio chunks
  • JSON Events:
    • {"type": "connected", "client_id": "..."}
    • {"type": "transcript", "text": "..."}
    • {"type": "response_text", "text": "..."}
    • {"type": "response_start"}
    • {"type": "response_end"}
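The message formats above can be handled with a few small helpers. This standalone sketch only builds and parses messages and opens no socket; the 20 ms chunk duration is an assumption, and `pcm_chunks`, `control` and `handle_server_message` are hypothetical names. A real client would send the frames and control messages through a WebSocket library of your choice.

```python
import json
from typing import Iterator, Union

SAMPLE_RATE = 16000   # 16 kHz mono, per the protocol
BYTES_PER_SAMPLE = 2  # 16-bit PCM


def pcm_chunks(pcm: bytes, chunk_ms: int = 20) -> Iterator[bytes]:
    """Split raw PCM into fixed-duration binary frames (20 ms = 640 bytes)."""
    size = SAMPLE_RATE * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


def control(msg_type: str) -> str:
    """Build a client JSON control message: 'end_of_speech' or 'reset'."""
    if msg_type not in ("end_of_speech", "reset"):
        raise ValueError(f"unknown client message type: {msg_type}")
    return json.dumps({"type": msg_type})


def handle_server_message(message: Union[bytes, str], audio_out: bytearray) -> str:
    """Route one server message and return a short description of it."""
    if isinstance(message, bytes):
        audio_out.extend(message)  # binary frame = response audio chunk
        return f"audio ({len(message)} bytes)"
    event = json.loads(message)
    kind = event.get("type")
    if kind in ("transcript", "response_text"):
        return f"{kind}: {event['text']}"
    if kind == "connected":
        return f"connected as {event['client_id']}"
    return kind  # response_start / response_end, or an unrecognised type


audio = bytearray()
print([len(f) for f in pcm_chunks(b"\x00" * 1600)])  # [640, 640, 320]
print(control("end_of_speech"))                      # {"type": "end_of_speech"}
print(handle_server_message('{"type": "transcript", "text": "hi"}', audio))  # transcript: hi
```

Keeping binary frames for audio and JSON text frames for events lets a client dispatch on the frame type alone, without inspecting payload contents.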

Environment Variables

# ASR
ASR_PROVIDER=whisper
ASR_API_KEY=your-key

# LLM
LLM_PROVIDER=anthropic
LLM_API_KEY=your-key
LLM_MODEL=claude-sonnet-4-20250514

# TTS
TTS_PROVIDER=cartesia
TTS_API_KEY=your-key
TTS_VOICE_ID=your-voice-id

# Debug
DEBUG=true
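This README does not show how the library itself consumes these variables. As an illustration only, a sketch that reads them into a typed settings object with the standard library; `Settings` and `load_settings` are hypothetical, and both the provider defaults (taken from the example values above) and the truthy strings accepted for `DEBUG` are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    asr_provider: str
    asr_api_key: Optional[str]
    llm_provider: str
    llm_api_key: Optional[str]
    llm_model: Optional[str]
    tts_provider: str
    tts_api_key: Optional[str]
    tts_voice_id: Optional[str]
    debug: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read provider settings from environment variables (hypothetical helper)."""
    return Settings(
        asr_provider=env.get("ASR_PROVIDER", "whisper"),
        asr_api_key=env.get("ASR_API_KEY"),
        llm_provider=env.get("LLM_PROVIDER", "anthropic"),
        llm_api_key=env.get("LLM_API_KEY"),
        llm_model=env.get("LLM_MODEL"),
        tts_provider=env.get("TTS_PROVIDER", "cartesia"),
        tts_api_key=env.get("TTS_API_KEY"),
        tts_voice_id=env.get("TTS_VOICE_ID"),
        # Treat "1", "true" or "yes" (any case) as enabled.
        debug=env.get("DEBUG", "false").strip().lower() in ("1", "true", "yes"),
    )


settings = load_settings({"LLM_PROVIDER": "anthropic", "DEBUG": "true"})
print(settings.llm_provider, settings.debug)  # anthropic True
```

Passing the mapping explicitly, rather than always reading `os.environ`, makes the loader easy to test.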

License

MIT

Download files

Source Distribution

atom_audio_engine-0.1.5.tar.gz (33.7 kB)

Uploaded Source

Built Distribution

atom_audio_engine-0.1.5-py3-none-any.whl (45.3 kB)

Uploaded Python 3

File details

Details for the file atom_audio_engine-0.1.5.tar.gz.

File metadata

  • Download URL: atom_audio_engine-0.1.5.tar.gz
  • Size: 33.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for atom_audio_engine-0.1.5.tar.gz
Algorithm Hash digest
SHA256 79457f0a47907943e21e580c789038dec4baf249d71d96e2a7e9bf565580b0ed
MD5 a4ffa9f30b5a2535ffb08718664823b7
BLAKE2b-256 15d5ae1b94d6b3609499da515f20d48757591dbc3845c45d61716322f4d85b12

File details

Details for the file atom_audio_engine-0.1.5-py3-none-any.whl.

File hashes

Hashes for atom_audio_engine-0.1.5-py3-none-any.whl
Algorithm Hash digest
SHA256 e2026c69704791d87dc67990de547523f63d8541639c6d229f720342a0ec4158
MD5 83ff73dd48c801123e296c5db8b27d80
BLAKE2b-256 ad3deaa31510a31150d661203b0ca4a1e2c8546d3f14bee64c2eb8e054a6aee8
