A pluggable, async-first Python framework for real-time audio-to-audio conversational AI

Audio Engine

A pluggable audio-to-audio conversational engine with real-time streaming support.

Features

Pluggable Architecture: Swap ASR, LLM, and TTS providers easily
Real-time Streaming: WebSocket server for low-latency conversations
GeneFace++ Integration: Optional face animation from audio
Simple API: Get started with just a few lines of code

Installation

pip install atom-audio-engine

For development with all optional dependencies:

pip install atom-audio-engine[all,dev]

Quick Start

Basic Usage

import asyncio

from audio_engine import Pipeline
from audio_engine.asr import CartesiaASR
from audio_engine.llm import GroqLLM
from audio_engine.tts import CartesiaTTS

# Create pipeline with your providers
pipeline = Pipeline(
    asr=CartesiaASR(api_key="your-cartesia-key"),
    llm=GroqLLM(api_key="your-groq-key", model="mixtral-8x7b-32768"),
    tts=CartesiaTTS(api_key="your-cartesia-key", voice_id="your-voice-id"),
    system_prompt="You are a helpful assistant.",
)

async def main():
    # input_audio_bytes, audio_stream and play_audio are placeholders you supply
    async with pipeline:
        # Simple: process complete audio
        response_audio = await pipeline.process(input_audio_bytes)

        # Streaming: lower latency
        async for chunk in pipeline.stream(audio_stream):
            play_audio(chunk)

asyncio.run(main())
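
The streaming call above takes an `audio_stream` that this README does not define. As a minimal sketch, assuming `pipeline.stream` accepts an async iterator of raw 16-bit mono PCM chunks (an assumption, not confirmed here), a source could look like the following. `pcm_chunks` and `collect` are hypothetical helpers, not part of the package.

```python
import asyncio
from typing import AsyncIterator


async def pcm_chunks(audio: bytes, chunk_ms: int = 20,
                     sample_rate: int = 16000) -> AsyncIterator[bytes]:
    """Yield fixed-duration chunks of 16-bit mono PCM (hypothetical helper)."""
    bytes_per_chunk = sample_rate * 2 * chunk_ms // 1000  # 2 bytes per sample
    for start in range(0, len(audio), bytes_per_chunk):
        yield audio[start:start + bytes_per_chunk]
        await asyncio.sleep(0)  # let other tasks run between chunks


async def collect(stream: AsyncIterator[bytes]) -> list[bytes]:
    """Drain an async stream into a list (handy for tests)."""
    return [chunk async for chunk in stream]
```

With such a helper, `pipeline.stream(pcm_chunks(recorded_bytes))` would feed a pre-recorded buffer through the streaming path. A live microphone source would yield chunks as they are captured instead.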

WebSocket Server

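from audio_engine import Pipeline
from audio_engine.streaming import WebSocketServer

pipeline = Pipeline(asr=..., llm=..., tts=...)  # pass your provider instances
# host="0.0.0.0" listens on all network interfaces
server = WebSocketServer(pipeline, host="0.0.0.0", port=8765)

# Call from within a coroutine, e.g. one started with asyncio.run()
await server.start()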

With GeneFace++ Face Animation

from audio_engine.integrations.geneface import GeneFacePipelineWrapper, GeneFaceConfig

# `pipeline` is a Pipeline instance, created as in Basic Usage
wrapped = GeneFacePipelineWrapper(
    pipeline=pipeline,
    geneface_config=GeneFaceConfig(
        geneface_path="/path/to/ai-geneface-realtime"
    )
)

# Call from within a coroutine; `input_audio` is the audio you supply
audio, video_path = await wrapped.process_with_video(input_audio)

Architecture

User Audio → ASR → LLM → TTS → Response Audio
                           ↓
                    GeneFace++ (optional)
                           ↓
                    Animated Face Video
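
The ASR → LLM → TTS flow in the diagram can be illustrated as three awaited stages chained in order. This is only a sketch of the idea, not the package's `Pipeline` implementation. The `Fake*` classes and `run_turn` are hypothetical stand-ins whose method names mirror the provider interfaces described under Implementing a Provider.

```python
import asyncio


async def run_turn(asr, llm, tts, audio: bytes) -> bytes:
    """One conversational turn: audio in, audio out."""
    transcript = await asr.transcribe(audio)   # speech -> text
    reply = await llm.generate(transcript)     # text -> text
    return await tts.synthesize(reply)         # text -> speech


# Stand-in stages so the sketch runs without any provider or network access.
class FakeASR:
    async def transcribe(self, audio: bytes, sample_rate: int = 16000) -> str:
        return f"{len(audio)} bytes heard"


class FakeLLM:
    async def generate(self, prompt: str, context=None) -> str:
        return prompt.upper()


class FakeTTS:
    async def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


result = asyncio.run(run_turn(FakeASR(), FakeLLM(), FakeTTS(), b"\x00" * 4))
print(result)  # b'4 BYTES HEARD'
```

Because each stage only depends on the previous stage's output type, any one of them can be swapped without touching the others, which is the point of the pluggable design.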

Directory Structure

audio_engine/
├── core/           # Pipeline and configuration
├── asr/            # Speech-to-Text providers
├── llm/            # LLM providers
├── tts/            # Text-to-Speech providers
├── streaming/      # WebSocket server
├── integrations/   # GeneFace++ integration
├── utils/          # Audio utilities
└── examples/       # Example scripts

Implementing a Provider

Custom ASR

from audio_engine.asr.base import BaseASR

class MyASR(BaseASR):
    @property
    def name(self) -> str:
        return "my-asr"

    async def transcribe(self, audio: bytes, sample_rate: int = 16000) -> str:
        # Your implementation
        pass

    async def transcribe_stream(self, audio_stream):
        # Your streaming implementation
        pass
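
To make the ASR template above more concrete, here is a minimal sketch of one way to fill in both methods. It assumes `transcribe_stream` receives an async iterator of PCM chunks and is itself an async generator of text, which the template does not confirm. `BufferingASR` is a hypothetical stand-in that does not inherit from the real `BaseASR`, so it runs without the package installed.

```python
import asyncio
from typing import AsyncIterator


class BufferingASR:
    """Stand-in for a BaseASR subclass; not tied to the real base class."""

    @property
    def name(self) -> str:
        return "buffering-asr"

    async def transcribe(self, audio: bytes, sample_rate: int = 16000) -> str:
        # Placeholder "recognition": report the clip duration (16-bit mono PCM).
        seconds = len(audio) / (sample_rate * 2)
        return f"<{seconds:.2f}s of audio>"

    async def transcribe_stream(
        self, audio_stream: AsyncIterator[bytes]
    ) -> AsyncIterator[str]:
        # Simplest possible strategy: buffer everything, then transcribe once.
        buffer = bytearray()
        async for chunk in audio_stream:
            buffer.extend(chunk)
        yield await self.transcribe(bytes(buffer))


async def demo() -> list[str]:
    async def mic():
        for _ in range(5):
            yield b"\x00" * 6400  # 0.2 s per chunk at 16 kHz, 16-bit mono

    return [text async for text in BufferingASR().transcribe_stream(mic())]
```

A real streaming provider would usually emit partial transcripts as audio arrives rather than waiting for the end of the stream. Buffering is just the easiest correct starting point.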

Custom LLM

from audio_engine.llm.base import BaseLLM

class MyLLM(BaseLLM):
    @property
    def name(self) -> str:
        return "my-llm"

    async def generate(self, prompt: str, context=None) -> str:
        # Your implementation
        pass

    async def generate_stream(self, prompt: str, context=None):
        # Your streaming implementation
        pass
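
As an illustration of how the two LLM methods can relate, the sketch below implements `generate` on top of `generate_stream`. It assumes `generate_stream` is an async generator of text pieces, which the template does not state. `EchoLLM` is a hypothetical stand-in that does not inherit from the real `BaseLLM`.

```python
import asyncio
from typing import AsyncIterator


class EchoLLM:
    """Stand-in for a BaseLLM subclass that echoes the prompt back."""

    @property
    def name(self) -> str:
        return "echo-llm"

    async def generate_stream(self, prompt: str, context=None) -> AsyncIterator[str]:
        # Emit one word at a time, as a token-streaming API might.
        for word in f"You said: {prompt}".split():
            await asyncio.sleep(0)  # stands in for network latency
            yield word + " "

    async def generate(self, prompt: str, context=None) -> str:
        # Non-streaming call implemented on top of the streaming one.
        parts = [part async for part in self.generate_stream(prompt, context)]
        return "".join(parts).strip()
```

Writing the streaming method first and deriving the blocking one from it keeps the two code paths from drifting apart.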

Custom TTS

from audio_engine.tts.base import BaseTTS

class MyTTS(BaseTTS):
    @property
    def name(self) -> str:
        return "my-tts"

    async def synthesize(self, text: str) -> bytes:
        # Your implementation
        pass

    async def synthesize_stream(self, text: str):
        # Your streaming implementation
        pass
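
The TTS template can be exercised offline with a provider that produces silence. The sketch below is a hypothetical stand-in, not a `BaseTTS` subclass, and it assumes 16 kHz 16-bit mono output and that `synthesize_stream` is an async generator of byte chunks. Neither is confirmed by the template. The `chunk_bytes` parameter is an addition for illustration.

```python
import asyncio
from typing import AsyncIterator

SAMPLE_RATE = 16000      # assumed output rate
BYTES_PER_SAMPLE = 2     # 16-bit PCM


class SilenceTTS:
    """Stand-in for a BaseTTS subclass that "speaks" silence."""

    @property
    def name(self) -> str:
        return "silence-tts"

    async def synthesize(self, text: str) -> bytes:
        # 50 ms of silence per character, purely as a placeholder.
        samples = len(text) * SAMPLE_RATE * 50 // 1000
        return b"\x00" * (samples * BYTES_PER_SAMPLE)

    async def synthesize_stream(
        self, text: str, chunk_bytes: int = 3200
    ) -> AsyncIterator[bytes]:
        # Synthesize once, then hand the audio out in fixed-size chunks.
        audio = await self.synthesize(text)
        for start in range(0, len(audio), chunk_bytes):
            yield audio[start:start + chunk_bytes]


async def demo() -> list[int]:
    return [len(c) async for c in SilenceTTS().synthesize_stream("hey")]
```

A provider like this is useful for testing the rest of a pipeline without API keys or network access.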

WebSocket Protocol

Client → Server

Binary: Raw audio chunks (PCM 16-bit, 16 kHz mono)
JSON: {"type": "end_of_speech"} or {"type": "reset"}
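
The client-side frames can be sketched as follows. The example assumes a client sends an utterance as a run of binary PCM frames followed by an `end_of_speech` text frame. The ordering and the 100 ms frame size are assumptions, and `control_message` and `utterance_frames` are hypothetical helpers.

```python
import json
from typing import Iterator, Union

SAMPLE_RATE = 16000     # Hz, from the protocol description
BYTES_PER_SAMPLE = 2    # 16-bit PCM
CONTROL_TYPES = {"end_of_speech", "reset"}


def control_message(kind: str) -> str:
    """Build the JSON text frame for a control event."""
    if kind not in CONTROL_TYPES:
        raise ValueError(f"unknown control type: {kind!r}")
    return json.dumps({"type": kind})


def utterance_frames(pcm: bytes, frame_ms: int = 100) -> Iterator[Union[bytes, str]]:
    """Yield the frames for one utterance: binary audio, then end_of_speech."""
    frame_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * frame_ms // 1000
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]      # sent as a binary frame
    yield control_message("end_of_speech")        # sent as a text frame
```

Each yielded `bytes` value would go out as a binary WebSocket message and each `str` as a text message, using whichever WebSocket client library you prefer.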

Server → Client

Binary: Response audio chunks
JSON Events:
- {"type": "connected", "client_id": "..."}
- {"type": "transcript", "text": "..."}
- {"type": "response_text", "text": "..."}
- {"type": "response_start"}
- {"type": "response_end"}
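
On the receiving side, a client has to tell binary audio apart from JSON events. A minimal sketch of such a dispatcher follows. `TurnState` and `handle_frame` are hypothetical names. The sketch assumes `response_text` may arrive in several pieces, so it concatenates them, and it ignores event types it does not recognize.

```python
import json
from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class TurnState:
    """Accumulates what the server sends during one response."""
    client_id: Optional[str] = None
    transcript: str = ""
    response_text: str = ""
    audio: bytearray = field(default_factory=bytearray)
    done: bool = False


def handle_frame(state: TurnState, frame: Union[bytes, str]) -> None:
    """Update state from one incoming WebSocket frame."""
    if isinstance(frame, (bytes, bytearray)):
        state.audio.extend(frame)          # binary frame: response audio
        return
    event = json.loads(frame)              # text frame: JSON event
    kind = event.get("type")
    if kind == "connected":
        state.client_id = event["client_id"]
    elif kind == "transcript":
        state.transcript = event["text"]
    elif kind == "response_text":
        state.response_text += event["text"]
    elif kind == "response_start":
        state.done = False
    elif kind == "response_end":
        state.done = True
    # Any other event type is ignored.
```

Keeping the state in one small object makes it easy to reset between turns and to assert on in tests.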

Environment Variables

# ASR
ASR_PROVIDER=whisper
ASR_API_KEY=your-key

# LLM
LLM_PROVIDER=anthropic
LLM_API_KEY=your-key
LLM_MODEL=claude-sonnet-4-20250514

# TTS
TTS_PROVIDER=cartesia
TTS_API_KEY=your-key
TTS_VOICE_ID=your-voice-id

# Debug
DEBUG=true
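
How the package itself reads these variables is not shown here. As a sketch of one way an application could load them, the example below collects them into a dataclass. `Settings` and `load_settings` are hypothetical, and which variables are required versus optional is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    asr_provider: str
    asr_api_key: str
    llm_provider: str
    llm_api_key: str
    llm_model: Optional[str]
    tts_provider: str
    tts_api_key: str
    tts_voice_id: Optional[str]
    debug: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the variables listed above; raise KeyError if a required one is missing."""
    return Settings(
        asr_provider=env["ASR_PROVIDER"],
        asr_api_key=env["ASR_API_KEY"],
        llm_provider=env["LLM_PROVIDER"],
        llm_api_key=env["LLM_API_KEY"],
        llm_model=env.get("LLM_MODEL"),          # assumed optional
        tts_provider=env["TTS_PROVIDER"],
        tts_api_key=env["TTS_API_KEY"],
        tts_voice_id=env.get("TTS_VOICE_ID"),    # assumed optional
        debug=env.get("DEBUG", "false").strip().lower() in {"1", "true", "yes"},
    )
```

Passing the mapping as a parameter keeps the loader testable: a plain dict can stand in for `os.environ`.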

License

MIT

Release history

0.1.6 (Feb 6, 2026)
0.1.5 (Feb 6, 2026), this version
0.1.4 (Feb 6, 2026)
0.1.2 (Feb 6, 2026)
0.1.1 (Feb 6, 2026)
0.1.0 (Feb 6, 2026)

Download files

Source Distribution

atom_audio_engine-0.1.5.tar.gz (33.7 kB)

Uploaded Feb 6, 2026 Source

Built Distribution

atom_audio_engine-0.1.5-py3-none-any.whl (45.3 kB)

Uploaded Feb 6, 2026 Python 3

File details

Details for the file atom_audio_engine-0.1.5.tar.gz.

File metadata

Download URL: atom_audio_engine-0.1.5.tar.gz
Upload date: Feb 6, 2026
Size: 33.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for atom_audio_engine-0.1.5.tar.gz
Algorithm	Hash digest
SHA256	`79457f0a47907943e21e580c789038dec4baf249d71d96e2a7e9bf565580b0ed`
MD5	`a4ffa9f30b5a2535ffb08718664823b7`
BLAKE2b-256	`15d5ae1b94d6b3609499da515f20d48757591dbc3845c45d61716322f4d85b12`

File details

Details for the file atom_audio_engine-0.1.5-py3-none-any.whl.

File metadata

Download URL: atom_audio_engine-0.1.5-py3-none-any.whl
Upload date: Feb 6, 2026
Size: 45.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for atom_audio_engine-0.1.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e2026c69704791d87dc67990de547523f63d8541639c6d229f720342a0ec4158`
MD5	`83ff73dd48c801123e296c5db8b27d80`
BLAKE2b-256	`ad3deaa31510a31150d661203b0ca4a1e2c8546d3f14bee64c2eb8e054a6aee8`
