A pluggable, async-first Python framework for real-time audio-to-audio conversational AI

Audio Engine

A pluggable audio-to-audio conversational engine with real-time streaming support.

Features

  • Pluggable Architecture: Swap ASR, LLM, and TTS providers easily
  • Real-time Streaming: WebSocket server for low-latency conversations
  • GeneFace++ Integration: Optional face animation from audio
  • Simple API: Get started with just a few lines of code

Installation

pip install atom-audio-engine

For development with all optional dependencies:

pip install "atom-audio-engine[all,dev]"

Quick Start

Basic Usage

import asyncio

from audio_engine import Pipeline
from audio_engine.asr import CartesiaASR
from audio_engine.llm import GroqLLM
from audio_engine.tts import CartesiaTTS

# Create a pipeline from your chosen providers
pipeline = Pipeline(
    asr=CartesiaASR(api_key="your-cartesia-key"),
    llm=GroqLLM(api_key="your-groq-key", model="mixtral-8x7b-32768"),
    tts=CartesiaTTS(api_key="your-cartesia-key", voice_id="your-voice-id"),
    system_prompt="You are a helpful assistant.",
)


async def main():
    # input_audio_bytes, audio_stream and play_audio stand in for your own audio I/O
    async with pipeline:
        # Simple: process a complete utterance, get the complete response audio
        response_audio = await pipeline.process(input_audio_bytes)

        # Streaming: handle response audio chunk by chunk for lower latency
        async for chunk in pipeline.stream(audio_stream):
            play_audio(chunk)


asyncio.run(main())
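The snippet above leaves `input_audio_bytes` and `audio_stream` to you. A minimal sketch for producing both from a WAV file with the standard-library `wave` module, assuming the pipeline accepts the same raw 16-bit, 16 kHz mono PCM that the WebSocket protocol below specifies and that `pipeline.stream()` accepts any async iterator of byte chunks. `read_pcm` and `pcm_chunks` are hypothetical helper names, not part of the package.

```python
import asyncio
import wave
from typing import AsyncIterator

SAMPLE_RATE = 16_000  # Hz, as in the WebSocket protocol
SAMPLE_WIDTH = 2      # bytes per sample (16-bit PCM)
CHANNELS = 1          # mono


def read_pcm(wav_file) -> bytes:
    """Return the raw PCM frames of a 16-bit, 16 kHz mono WAV (path or file object)."""
    with wave.open(wav_file, "rb") as wf:
        fmt = (wf.getframerate(), wf.getsampwidth(), wf.getnchannels())
        if fmt != (SAMPLE_RATE, SAMPLE_WIDTH, CHANNELS):
            raise ValueError(f"expected 16-bit, 16 kHz mono WAV, got {fmt}; resample first")
        return wf.readframes(wf.getnframes())


async def pcm_chunks(
    pcm: bytes, chunk_ms: int = 20, realtime: bool = True
) -> AsyncIterator[bytes]:
    """Yield PCM in fixed-duration chunks; with realtime=True, pace them like a microphone."""
    chunk_bytes = SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS * chunk_ms // 1000  # 640 for 20 ms
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]
        if realtime:
            await asyncio.sleep(chunk_ms / 1000)
```

Usage: `input_audio_bytes = read_pcm("question.wav")` for `pipeline.process()`, and `audio_stream = pcm_chunks(input_audio_bytes)` for `pipeline.stream()`. Pacing the chunks in real time makes a file-based test behave more like live microphone input.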

WebSocket Server

import asyncio
from audio_engine import Pipeline
from audio_engine.streaming import WebSocketServer

pipeline = Pipeline(asr=..., llm=..., tts=...)
# host="0.0.0.0" listens on all network interfaces
server = WebSocketServer(pipeline, host="0.0.0.0", port=8765)

async def main():
    await server.start()

asyncio.run(main())

With GeneFace++ Face Animation

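import asyncio
from audio_engine.integrations.geneface import GeneFacePipelineWrapper, GeneFaceConfig

# `pipeline` is the Pipeline built in Quick Start
wrapped = GeneFacePipelineWrapper(
    pipeline=pipeline,
    geneface_config=GeneFaceConfig(
        geneface_path="/path/to/ai-geneface-realtime"
    ),
)

async def main():
    # input_audio is your input audio; returns response audio and a video path
    audio, video_path = await wrapped.process_with_video(input_audio)

asyncio.run(main())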

Architecture

User Audio → ASR → LLM → TTS → Response Audio
                           ↓
                    GeneFace++ (optional)
                           ↓
                    Animated Face Video
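The diagram describes one conversational turn as a chain of stages. A minimal sketch of that data flow using plain async functions as stand-ins (not the package's provider classes), assuming each stage consumes the previous stage's complete output; `run_turn` and the type aliases are hypothetical names. The `pipeline.stream()` mode shown in Quick Start is the lower-latency path.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Each stage modelled as an async function (illustrative stand-ins only)
Transcribe = Callable[[bytes], Awaitable[str]]   # ASR: audio -> text
Generate = Callable[[str], Awaitable[str]]       # LLM: text -> reply text
Synthesize = Callable[[str], Awaitable[bytes]]   # TTS: reply text -> audio
Animate = Callable[[bytes], Awaitable[str]]      # optional: audio -> video path


async def run_turn(
    user_audio: bytes,
    asr: Transcribe,
    llm: Generate,
    tts: Synthesize,
    animate: Optional[Animate] = None,
) -> tuple[bytes, Optional[str]]:
    """User Audio -> ASR -> LLM -> TTS -> Response Audio (-> optional face video)."""
    text = await asr(user_audio)
    reply = await llm(text)
    response_audio = await tts(reply)
    # The face-animation step only sees the synthesized audio, as in the diagram
    video_path = await animate(response_audio) if animate else None
    return response_audio, video_path
```

Keeping each stage behind a narrow audio-or-text interface is what lets any one provider be swapped without touching the others.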

Directory Structure

audio_engine/
├── core/           # Pipeline and configuration
├── asr/            # Speech-to-Text providers
├── llm/            # LLM providers
├── tts/            # Text-to-Speech providers
├── streaming/      # WebSocket server
├── integrations/   # GeneFace++ integration
├── utils/          # Audio utilities
└── examples/       # Example scripts

Implementing a Provider

Custom ASR

from audio_engine.asr.base import BaseASR

class MyASR(BaseASR):
    @property
    def name(self) -> str:
        return "my-asr"

    async def transcribe(self, audio: bytes, sample_rate: int = 16000) -> str:
        # Your implementation
        pass

    async def transcribe_stream(self, audio_stream):
        # Your streaming implementation
        pass

Custom LLM

from audio_engine.llm.base import BaseLLM

class MyLLM(BaseLLM):
    @property
    def name(self) -> str:
        return "my-llm"

    async def generate(self, prompt: str, context=None) -> str:
        # Your implementation
        pass

    async def generate_stream(self, prompt: str, context=None):
        # Your streaming implementation
        pass

Custom TTS

from audio_engine.tts.base import BaseTTS

class MyTTS(BaseTTS):
    @property
    def name(self) -> str:
        return "my-tts"

    async def synthesize(self, text: str) -> bytes:
        # Your implementation
        pass

    async def synthesize_stream(self, text: str):
        # Your streaming implementation
        pass
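For offline tests it can help to have a provider that needs no API key. A minimal sketch of a TTS stand-in that returns silence, assuming `BaseTTS` needs only the members shown in the templates above (no required constructor arguments or extra abstract methods), that `synthesize_stream` is consumed as an async iterator of byte chunks, and that output should be 16-bit, 16 kHz mono PCM. `SilentTTS` and its duration heuristic are hypothetical; the fallback to `object` exists only so the sketch runs without the package installed.

```python
from typing import AsyncIterator

try:
    from audio_engine.tts.base import BaseTTS
except ImportError:  # lets the sketch run without the package installed
    BaseTTS = object

SAMPLE_RATE = 16_000   # assumed output rate (Hz)
BYTES_PER_SAMPLE = 2   # assumed 16-bit mono PCM


class SilentTTS(BaseTTS):
    """Hypothetical test double: emits silence whose length scales with the text."""

    def __init__(self, ms_per_char: int = 60, chunk_ms: int = 20):
        self.ms_per_char = ms_per_char
        self.chunk_ms = chunk_ms

    @property
    def name(self) -> str:
        return "silent-tts"

    async def synthesize(self, text: str) -> bytes:
        duration_ms = len(text) * self.ms_per_char
        n_samples = SAMPLE_RATE * duration_ms // 1000
        return bytes(n_samples * BYTES_PER_SAMPLE)  # all-zero samples are silence

    async def synthesize_stream(self, text: str) -> AsyncIterator[bytes]:
        # Async generator: yields fixed-duration chunks of the full result
        audio = await self.synthesize(text)
        step = SAMPLE_RATE * BYTES_PER_SAMPLE * self.chunk_ms // 1000
        for start in range(0, len(audio), step):
            yield audio[start:start + step]
```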

WebSocket Protocol

Client → Server

  • Binary: Raw audio chunks (PCM 16-bit, 16kHz mono)
  • JSON: {"type": "end_of_speech"} or {"type": "reset"}
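On the client side, binary frames carry raw samples. At 16 kHz, 16-bit mono, one second of audio is 16,000 × 2 = 32,000 bytes, so a 20 ms frame is 640 bytes. A minimal sketch that converts floating-point samples in [-1.0, 1.0] to 16-bit PCM, slices it into frames, and builds the two JSON control messages; little-endian byte order is an assumption (the protocol does not state it), and the helper names are hypothetical.

```python
import json
import struct
from typing import Iterator, Sequence

SAMPLE_RATE = 16_000  # Hz
BYTES_PER_SAMPLE = 2  # 16-bit PCM, mono


def float_to_pcm16(samples: Sequence[float]) -> bytes:
    """Clip floats to [-1, 1] and pack them as little-endian signed 16-bit integers."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def pcm_frames(pcm: bytes, frame_ms: int = 20) -> Iterator[bytes]:
    """Split PCM into frame_ms-long binary frames (the last one may be shorter)."""
    frame_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * frame_ms // 1000
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]


def control_message(kind: str) -> str:
    """Build one of the documented JSON control messages as a text frame."""
    if kind not in ("end_of_speech", "reset"):
        raise ValueError(f"unknown control message: {kind}")
    return json.dumps({"type": kind})
```

With any WebSocket client library, each item from `pcm_frames()` would go out as a binary message, followed by `control_message("end_of_speech")` as a text message once the user stops talking.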

Server → Client

  • Binary: Response audio chunks
  • JSON Events:
    • {"type": "connected", "client_id": "..."}
    • {"type": "transcript", "text": "..."}
    • {"type": "response_text", "text": "..."}
    • {"type": "response_start"}
    • {"type": "response_end"}
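A client has to tell the two kinds of server message apart: binary frames carry response audio and text frames carry the JSON events listed above. A minimal dispatcher sketch, assuming each text frame holds exactly one JSON object; `ClientState` and `handle_message` are hypothetical names, and what a real client does on each event (playback, captions) is up to you.

```python
import json
from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class ClientState:
    """Hypothetical client-side view of one conversation."""
    client_id: Optional[str] = None
    transcripts: list[str] = field(default_factory=list)
    response_text: list[str] = field(default_factory=list)
    audio: bytearray = field(default_factory=bytearray)
    responding: bool = False


def handle_message(state: ClientState, message: Union[bytes, str]) -> None:
    """Route one server message: binary = audio chunk, text = JSON event."""
    if isinstance(message, (bytes, bytearray)):
        state.audio.extend(message)  # response audio; feed a player in practice
        return
    event = json.loads(message)
    kind = event.get("type")
    if kind == "connected":
        state.client_id = event["client_id"]
    elif kind == "transcript":
        state.transcripts.append(event["text"])
    elif kind == "response_text":
        state.response_text.append(event["text"])
    elif kind == "response_start":
        state.responding = True
    elif kind == "response_end":
        state.responding = False
    # Unknown event types are ignored so newer servers don't break older clients
```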

Environment Variables

# ASR
ASR_PROVIDER=whisper
ASR_API_KEY=your-key

# LLM
LLM_PROVIDER=anthropic
LLM_API_KEY=your-key
LLM_MODEL=claude-sonnet-4-20250514

# TTS
TTS_PROVIDER=cartesia
TTS_API_KEY=your-key
TTS_VOICE_ID=your-voice-id

# Debug
DEBUG=true
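A minimal sketch of reading these variables into a typed settings object with the standard library. Whether and how `audio_engine` itself reads them is not documented here, so `EngineSettings` and `load_settings` are hypothetical names. The provider defaults mirror the example values above, keys default to `None` so they stay out of source code, and treating `DEBUG` as true for `1`, `true`, or `yes` is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class EngineSettings:
    asr_provider: str
    asr_api_key: Optional[str]
    llm_provider: str
    llm_api_key: Optional[str]
    llm_model: Optional[str]
    tts_provider: str
    tts_api_key: Optional[str]
    tts_voice_id: Optional[str]
    debug: bool


def load_settings(env: Mapping[str, str] = os.environ) -> EngineSettings:
    """Read provider settings from environment variables (hypothetical helper)."""
    return EngineSettings(
        asr_provider=env.get("ASR_PROVIDER", "whisper"),
        asr_api_key=env.get("ASR_API_KEY"),
        llm_provider=env.get("LLM_PROVIDER", "anthropic"),
        llm_api_key=env.get("LLM_API_KEY"),
        llm_model=env.get("LLM_MODEL"),
        tts_provider=env.get("TTS_PROVIDER", "cartesia"),
        tts_api_key=env.get("TTS_API_KEY"),
        tts_voice_id=env.get("TTS_VOICE_ID"),
        debug=env.get("DEBUG", "").strip().lower() in {"1", "true", "yes"},
    )
```

Passing a plain dict instead of `os.environ` keeps tests independent of the shell environment.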

License

MIT

Download files

Source Distribution

atom_audio_engine-0.1.2.tar.gz (53.5 kB)

Built Distribution

atom_audio_engine-0.1.2-py3-none-any.whl (74.6 kB)

File details

Details for the file atom_audio_engine-0.1.2.tar.gz.

File metadata

  • Download URL: atom_audio_engine-0.1.2.tar.gz
  • Size: 53.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for atom_audio_engine-0.1.2.tar.gz
Algorithm Hash digest
SHA256 f9ae918e21031bf0cc76ba3f9368f309cc0ffb1e046373da250f30212bec9b9a
MD5 c13ec843c1bf4ea5ed9c8f211c40fcc6
BLAKE2b-256 05db6bdd4196167335c85af93773d42ed65aab8b3c1cbb8352cdda5d5830d238

File details

Details for the file atom_audio_engine-0.1.2-py3-none-any.whl.

File hashes

Hashes for atom_audio_engine-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 6e3e42d33ab39e63e3276aae3eb1ae22b875ebbf7bb2e09b280bf45434089a50
MD5 8570c49e55bf2c04f9db5c9032017d8c
BLAKE2b-256 d4b7e06b33da5af46b8f0fecad6d3501aab7252500eb140f3a23dac115d8e7bd
