A pluggable, async-first Python framework for real-time audio-to-audio conversational AI

Audio Engine

A pluggable audio-to-audio conversational engine with real-time streaming support.

Features

  • Pluggable Architecture: Swap ASR, LLM, and TTS providers easily
  • Real-time Streaming: WebSocket server for low-latency conversations
  • GeneFace++ Integration: Optional face animation from audio
  • Simple API: Get started with just a few lines of code

Installation

pip install atom-audio-engine

For development with all optional dependencies:

pip install "atom-audio-engine[all,dev]"

Quick Start

Basic Usage

import asyncio

from audio_engine import Pipeline
from audio_engine.asr import CartesiaASR
from audio_engine.llm import GroqLLM
from audio_engine.tts import CartesiaTTS

# Create pipeline with your providers
pipeline = Pipeline(
    asr=CartesiaASR(api_key="your-cartesia-key"),
    llm=GroqLLM(api_key="your-groq-key", model="mixtral-8x7b-32768"),
    tts=CartesiaTTS(api_key="your-cartesia-key", voice_id="your-voice-id"),
    system_prompt="You are a helpful assistant.",
)


async def main():
    # input_audio_bytes, audio_stream and play_audio are placeholders you supply
    async with pipeline:
        # Simple: process complete audio
        response_audio = await pipeline.process(input_audio_bytes)

        # Streaming: lower latency
        async for chunk in pipeline.stream(audio_stream):
            play_audio(chunk)


asyncio.run(main())

WebSocket Server

from audio_engine import Pipeline
from audio_engine.streaming import WebSocketServer

pipeline = Pipeline(asr=..., llm=..., tts=...)
server = WebSocketServer(pipeline, host="0.0.0.0", port=8765)

# Call from inside an async function (for example one run with asyncio.run)
await server.start()

With GeneFace++ Face Animation

from audio_engine.integrations.geneface import GeneFacePipelineWrapper, GeneFaceConfig

wrapped = GeneFacePipelineWrapper(
    pipeline=pipeline,
    geneface_config=GeneFaceConfig(
        geneface_path="/path/to/ai-geneface-realtime"
    )
)

# Call from inside an async function; input_audio is the user's audio as bytes
audio, video_path = await wrapped.process_with_video(input_audio)

Architecture

User Audio → ASR → LLM → TTS → Response Audio
                           ↓
                    GeneFace++ (optional)
                           ↓
                    Animated Face Video
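
The flow in the diagram can be sketched with three stub stages chained as async generators. This is only an illustration of the data flow, not the engine's implementation: the functions asr, llm, tts and microphone below are hypothetical stand-ins that do no real speech processing, and the optional GeneFace++ branch is left out.

```python
import asyncio
from typing import AsyncIterator


async def asr(audio_chunks: AsyncIterator[bytes]) -> str:
    """Stub ASR: consume all audio, then return a fake transcript."""
    total = 0
    async for chunk in audio_chunks:
        total += len(chunk)
    return f"heard {total} bytes"


async def llm(prompt: str) -> AsyncIterator[str]:
    """Stub LLM: stream the reply one word at a time."""
    for word in f"you said: {prompt}".split():
        yield word


async def tts(words: AsyncIterator[str]) -> AsyncIterator[bytes]:
    """Stub TTS: emit one 'audio' chunk per word as soon as it arrives."""
    async for word in words:
        yield word.encode("utf-8")


async def microphone() -> AsyncIterator[bytes]:
    """Hypothetical audio source: three chunks of silence."""
    for _ in range(3):
        yield bytes(320)


async def run_once() -> list:
    # User Audio -> ASR -> LLM -> TTS -> Response Audio
    transcript = await asr(microphone())
    return [chunk async for chunk in tts(llm(transcript))]


response_chunks = asyncio.run(run_once())
print(response_chunks)
```

Because the LLM and TTS stages are generators, the first response chunk is available before the full reply has been generated, which is the idea behind the lower-latency streaming path.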

Directory Structure

audio_engine/
├── core/           # Pipeline and configuration
├── asr/            # Speech-to-Text providers
├── llm/            # LLM providers
├── tts/            # Text-to-Speech providers
├── streaming/      # WebSocket server
├── integrations/   # GeneFace++ integration
├── utils/          # Audio utilities
└── examples/       # Example scripts

Implementing a Provider

Custom ASR

from audio_engine.asr.base import BaseASR

class MyASR(BaseASR):
    @property
    def name(self) -> str:
        return "my-asr"

    async def transcribe(self, audio: bytes, sample_rate: int = 16000) -> str:
        # Your implementation
        pass

    async def transcribe_stream(self, audio_stream):
        # Your streaming implementation
        pass

Custom LLM

from audio_engine.llm.base import BaseLLM

class MyLLM(BaseLLM):
    @property
    def name(self) -> str:
        return "my-llm"

    async def generate(self, prompt: str, context=None) -> str:
        # Your implementation
        pass

    async def generate_stream(self, prompt: str, context=None):
        # Your streaming implementation
        pass

Custom TTS

from audio_engine.tts.base import BaseTTS

class MyTTS(BaseTTS):
    @property
    def name(self) -> str:
        return "my-tts"

    async def synthesize(self, text: str) -> bytes:
        # Your implementation
        pass

    async def synthesize_stream(self, text: str):
        # Your streaming implementation
        pass
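
As a filled-in illustration of the template above, here is a toy TTS provider that returns a tone instead of speech. So that it runs without the package installed, it subclasses TTSInterface, a local stand-in that only mirrors the method names shown above; it is not the real BaseTTS, which may require more. Two further assumptions: the streaming method is written as an async generator yielding byte chunks, and the output is 16-bit PCM at 16 kHz. The README does not state either.

```python
import asyncio
import math
import struct
from abc import ABC, abstractmethod
from typing import AsyncIterator


class TTSInterface(ABC):
    """Local stand-in mirroring the BaseTTS methods shown above (not the real class)."""

    @property
    @abstractmethod
    def name(self) -> str: ...

    @abstractmethod
    async def synthesize(self, text: str) -> bytes: ...

    @abstractmethod
    def synthesize_stream(self, text: str) -> AsyncIterator[bytes]: ...


class BeepTTS(TTSInterface):
    """Toy provider: a 440 Hz tone lasting 50 ms per input character."""

    sample_rate = 16_000  # assumed output rate

    @property
    def name(self) -> str:
        return "beep-tts"

    async def synthesize(self, text: str) -> bytes:
        n_samples = self.sample_rate * 50 * len(text) // 1000
        samples = (
            int(12_000 * math.sin(2 * math.pi * 440 * i / self.sample_rate))
            for i in range(n_samples)
        )
        # Little-endian signed 16-bit samples
        return struct.pack(f"<{n_samples}h", *samples)

    async def synthesize_stream(self, text: str) -> AsyncIterator[bytes]:
        # Assumption: one chunk per word, yielded as soon as it is ready
        for word in text.split():
            yield await self.synthesize(word)


async def demo() -> list:
    tts = BeepTTS()
    return [len(chunk) async for chunk in tts.synthesize_stream("hi there")]


chunk_sizes = asyncio.run(demo())
print(chunk_sizes)  # [3200, 8000]
```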

WebSocket Protocol

Client → Server

  • Binary: Raw audio chunks (PCM 16-bit, 16 kHz mono)
  • JSON: {"type": "end_of_speech"} or {"type": "reset"}
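
A client might prepare these messages as sketched below. The helper names chunk_pcm and control_message are hypothetical, and the 20 ms chunk duration is an assumption, since the protocol above does not prescribe a chunk size. Only the audio format and the two JSON message types come from the protocol.

```python
import json
from typing import Iterator

SAMPLE_RATE = 16_000   # Hz, per the protocol
BYTES_PER_SAMPLE = 2   # 16-bit PCM
CHANNELS = 1           # mono


def chunk_pcm(audio: bytes, chunk_ms: int = 20) -> Iterator[bytes]:
    """Yield fixed-duration chunks of raw PCM; the last chunk may be shorter."""
    chunk_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS * chunk_ms // 1000
    for start in range(0, len(audio), chunk_bytes):
        yield audio[start:start + chunk_bytes]


def control_message(kind: str) -> str:
    """Serialize one of the two client control messages as JSON text."""
    if kind not in ("end_of_speech", "reset"):
        raise ValueError(f"unknown control message: {kind!r}")
    return json.dumps({"type": kind})


one_second = bytes(SAMPLE_RATE * BYTES_PER_SAMPLE)  # 1 s of silence
chunks = list(chunk_pcm(one_second))
print(len(chunks), len(chunks[0]))      # 50 640
print(control_message("end_of_speech"))  # {"type": "end_of_speech"}
```

Each chunk would be sent as a binary frame, followed by the end_of_speech message as a text frame once the user stops talking.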

Server → Client

  • Binary: Response audio chunks
  • JSON Events:
    • {"type": "connected", "client_id": "..."}
    • {"type": "transcript", "text": "..."}
    • {"type": "response_text", "text": "..."}
    • {"type": "response_start"}
    • {"type": "response_end"}
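
On the receiving side, a client has to tell audio from events. The sketch below assumes binary frames arrive as bytes and JSON events arrive as text, which is how common Python WebSocket clients surface the two frame types. ServerEvent and decode_frame are hypothetical names, not part of the package.

```python
import json
from dataclasses import dataclass
from typing import Optional, Union

KNOWN_EVENTS = {
    "connected",
    "transcript",
    "response_text",
    "response_start",
    "response_end",
}


@dataclass
class ServerEvent:
    type: str
    text: Optional[str] = None       # set for transcript / response_text
    client_id: Optional[str] = None  # set for connected


def decode_frame(frame: Union[bytes, str]) -> Union[bytes, ServerEvent]:
    """Return raw audio for binary frames, or a ServerEvent for JSON frames."""
    if isinstance(frame, (bytes, bytearray)):
        return bytes(frame)
    payload = json.loads(frame)
    kind = payload.get("type")
    if kind not in KNOWN_EVENTS:
        raise ValueError(f"unexpected event type: {kind!r}")
    return ServerEvent(
        type=kind,
        text=payload.get("text"),
        client_id=payload.get("client_id"),
    )


print(decode_frame('{"type": "transcript", "text": "hello"}'))
print(decode_frame(b"\x00\x01"))
```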

Environment Variables

# ASR
ASR_PROVIDER=whisper
ASR_API_KEY=your-key

# LLM
LLM_PROVIDER=anthropic
LLM_API_KEY=your-key
LLM_MODEL=claude-sonnet-4-20250514

# TTS
TTS_PROVIDER=cartesia
TTS_API_KEY=your-key
TTS_VOICE_ID=your-voice-id

# Debug
DEBUG=true
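
How the package itself reads these variables is not shown here. As a rough sketch, an application could collect them as below, where ProviderSettings and load_settings are hypothetical names. Treating the provider and key variables as required and DEBUG as defaulting to false is also an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ProviderSettings:
    provider: str
    api_key: str
    model: Optional[str] = None
    voice_id: Optional[str] = None


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect the variables listed above into one settings dict."""

    def required(name: str) -> str:
        value = env.get(name)
        if not value:
            raise KeyError(f"missing environment variable: {name}")
        return value

    return {
        "asr": ProviderSettings(required("ASR_PROVIDER"), required("ASR_API_KEY")),
        "llm": ProviderSettings(
            required("LLM_PROVIDER"),
            required("LLM_API_KEY"),
            model=env.get("LLM_MODEL"),
        ),
        "tts": ProviderSettings(
            required("TTS_PROVIDER"),
            required("TTS_API_KEY"),
            voice_id=env.get("TTS_VOICE_ID"),
        ),
        "debug": env.get("DEBUG", "false").lower() == "true",
    }


example_env = {
    "ASR_PROVIDER": "whisper", "ASR_API_KEY": "your-key",
    "LLM_PROVIDER": "anthropic", "LLM_API_KEY": "your-key",
    "LLM_MODEL": "claude-sonnet-4-20250514",
    "TTS_PROVIDER": "cartesia", "TTS_API_KEY": "your-key",
    "TTS_VOICE_ID": "your-voice-id",
    "DEBUG": "true",
}
settings = load_settings(example_env)
print(settings["llm"].model, settings["debug"])
```

Passing a plain dict instead of os.environ keeps the loader easy to test.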

License

MIT
