Converse Framework

Provider-agnostic speech stack for speech-to-speech applications.

Install

pip install converse-framework

The base install pulls in only numpy. Real VAD / ASR / LLM / TTS providers live behind optional extras:

pip install converse-framework[silero]          # Silero VAD
pip install converse-framework[faster-whisper]  # faster-whisper ASR
pip install converse-framework[whisper-cpp]     # whisper.cpp HTTP ASR
pip install converse-framework[llamacpp]        # llama.cpp HTTP LLM
pip install converse-framework[kokoro]          # Kokoro ONNX TTS
pip install converse-framework[pocket-tts]      # Pocket TTS
pip install converse-framework[all]             # everything

Missing dependency behavior

If a config requests a provider whose heavy backend is not installed, build_provider (and therefore build_provider_bundle) returns an UnavailableProvider sentinel for that slot instead of raising a bare ImportError. The sentinel's status.message always names the provider that was missing and includes the pip install extra to fix it. The mapping is owned by converse_framework.providers.unavailable.EXTRA_HINTS and exposed as extra_hint_for(kind, name), which returns the extra name (e.g. "converse-framework[silero]") when one is known and None otherwise.

from converse_framework import extra_hint_for
from converse_framework.providers.unavailable import UnavailableProvider

print(extra_hint_for("vad", "silero"))          # converse-framework[silero]
print(extra_hint_for("asr", "faster-whisper"))  # converse-framework[faster-whisper]
print(extra_hint_for("vad", "made-up"))         # None

p = UnavailableProvider("vad", "silero")
print(p.status.message)
# Provider 'silero' (vad) is not available. Install the required extra
# with `pip install converse-framework[silero]`.

is_provider_available(kind, name) is the companion check: it returns True only when the provider's heavy dependency is importable, so you can fail fast before handing the config to a pipeline. UnavailableProvider is a real implementation of all four provider protocols, so the rest of the pipeline keeps running (turns fail with a clear RuntimeError when the broken provider is actually invoked) and the consumer can decide whether to prompt for the install or fall back to a different provider.

Python version compatibility

The base package supports Python 3.11 and newer. Each extra has its own constraints (the table below mirrors the markers in pyproject.toml):

Extra | Python | Notes
--- | --- | ---
base | 3.11+ | numpy>=2.0 is the only required runtime dependency.
silero | 3.11+ | silero-vad + onnxruntime. No known upper bound.
faster-whisper | 3.11+ | The nvidia-cublas-cu12 wheel pins Windows.
llamacpp | 3.11+ | httpx itself supports 3.9+, so 3.11+ is the only constraint.
whisper-cpp | 3.11+ | Only needs httpx, which supports 3.9+.
kokoro | 3.11 to <3.14 | kokoro-onnx 0.5.0 requires Python <3.14. The wheel build fails fast on 3.14+.
pocket-tts | 3.11+ | No known upper bound.

The kokoro extra is the only one with an upper-bound marker today. If you are on Python 3.14+ and need a TTS provider, use pocket-tts or a mock provider. New providers should add their own python_version markers in pyproject.toml when their backend has a known limit.
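
An application that wants a sensible default TTS provider could encode the version rule above in a few lines. This is a sketch only; `default_tts_provider` is a hypothetical helper, not part of the framework.

```python
import sys


def default_tts_provider(version: tuple[int, int] = sys.version_info[:2]) -> str:
    """Pick a TTS provider name that is installable on the given Python version."""
    if version < (3, 11):
        raise RuntimeError("converse-framework requires Python 3.11 or newer")
    # kokoro is limited to Python <3.14; pocket-tts has no known upper bound.
    return "kokoro" if version < (3, 14) else "pocket-tts"


print(default_tts_provider((3, 12)))  # kokoro
print(default_tts_provider((3, 14)))  # pocket-tts
```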

Quick Start

from converse_framework import build_provider_bundle

config = {
    "vad": {"provider": "mock"},
    "asr": {"provider": "mock"},
    "llm": {"provider": "mock"},
    "tts": {"provider": "mock"},
}

bundle = build_provider_bundle(config)
print(bundle.statuses())

import converse_framework only needs numpy to be installed — heavy provider backends are loaded lazily through the registry.

Provider status semantics

Every provider exposes a status property (cached state, no I/O), a lightweight probe_status() method (import checks, HTTP reachability — does not load models), and a load_status() method (may load or initialise heavy resources before returning).

Call probe_status() to check readiness without side effects — it is safe for status screens and health checks:

import asyncio

# Probe without loading models
results = asyncio.run(bundle.probe_statuses())
for kind, status in results.items():
    print(f"{kind}: ready={status.ready} level={status.status_level}")
    if status.voices:
        print(f"  voices={[v.id for v in status.voices]}")

Call load_status() when you need the definitive picture — it may trigger model downloads or initialise GPU resources:

results = asyncio.run(bundle.load_statuses())

The status_level field distinguishes "ready", "configured", "loading", "error", and "unavailable". The old check_status() is kept for backward compatibility and behaves the same as probe_status() for providers that implement it.
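
A status screen typically needs to reduce these levels to a short list of problems. The sketch below uses a stand-in dataclass rather than the framework's status type, and it assumes that only "error" and "unavailable" should block a session; `FakeStatus` and `blocking_kinds` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class FakeStatus:
    """Stand-in carrying the two fields read by the probe loop above."""
    ready: bool
    status_level: str


def blocking_kinds(results: dict[str, FakeStatus]) -> list[str]:
    """Return the provider kinds whose level means they cannot be used."""
    blocking = {"error", "unavailable"}  # assumption: other levels may still recover
    return sorted(kind for kind, status in results.items() if status.status_level in blocking)


results = {
    "vad": FakeStatus(True, "ready"),
    "asr": FakeStatus(False, "configured"),
    "tts": FakeStatus(False, "unavailable"),
}
print(blocking_kinds(results))  # ['tts']
```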

Recipes

The recipes below are short, self-contained scripts that exercise the public API. They all run with the base install (numpy + the framework) unless a snippet is explicitly marked as requiring an extra.

Minimal mock text pipeline

build_provider_bundle returns a fully-mock provider bundle and SpeechPipeline runs an end-to-end text turn against it. QueueEventSink captures every event the pipeline emits so the script can assert or print them.

import asyncio

from converse_framework import (
    PipelineConfig,
    QueueEventSink,
    SpeechPipeline,
    build_provider_bundle,
)


async def main():
    queue: asyncio.Queue = asyncio.Queue()
    sink = QueueEventSink(queue)
    pipeline = SpeechPipeline(
        providers=build_provider_bundle(
            {
                "vad": {"provider": "mock"},
                "asr": {"provider": "mock"},
                "llm": {"provider": "mock"},
                "tts": {"provider": "mock"},
            }
        ),
        sink=sink,
        config=PipelineConfig(tts_chunk_chars=80),
    )

    await pipeline.handle_text_turn("Hello, mock pipeline.")
    # Let the TTS streaming task finish, then drain the captured events.
    await asyncio.sleep(0.5)
    types = [queue.get_nowait()["type"] for _ in range(queue.qsize())]
    print(types)


asyncio.run(main())

Audio frame to utterance collector to pipeline

parse_audio_frame validates a wire payload and turns it into an AudioFrame. AudioUtteranceCollector runs VAD on the frame, applies the rejection gates, and on vad.speech_end hands the assembled PCM bytes to its utterance_callback. The recipe wires that callback into SpeechPipeline.handle_audio_turn. The in-process VAD below fires vad.speech_start on the first frame and vad.speech_end on the third so the collector has something to dispatch — the framework's own MockVADProvider returns no events and is not useful for this path.

import asyncio
import base64

from converse_framework.audio_utils import AudioFrameStats, parse_audio_frame
from converse_framework.events import QueueEventSink
from converse_framework.pipeline import PipelineConfig, SpeechPipeline
from converse_framework.protocols import (
    ProviderCapabilities,
    ProviderStatus,
    VADEvent,
)
from converse_framework.registry import build_provider_bundle
from converse_framework.utterance_collector import (
    AudioUtteranceCollector,
    UtteranceCollectorConfig,
)


class ScriptedVAD:
    """A tiny in-process VAD: start on frame 0, end on frame 2."""

    def __init__(self) -> None:
        self._count = 0

    @property
    def status(self) -> ProviderStatus:
        return ProviderStatus(
            name="scripted",
            kind="vad",
            ready=True,
            message="Scripted VAD fires start at frame 0 and end at frame 2.",
            capabilities=ProviderCapabilities(),
        )

    async def check_status(self) -> ProviderStatus:
        return self.status

    async def process_frame(self, frame):
        self._count += 1
        events: list[VADEvent] = []
        if self._count == 1:
            events.append(VADEvent(type="vad.speech_start", probability=1.0, audio_ms=30))
        if self._count == 3:
            events.append(VADEvent(type="vad.speech_end", probability=1.0, audio_ms=90))
        return events


async def main():
    queue: asyncio.Queue = asyncio.Queue()
    sink = QueueEventSink(queue)
    bundle = build_provider_bundle(
        {
            "vad": {"provider": "mock"},
            "asr": {"provider": "mock"},
            "llm": {"provider": "mock"},
            "tts": {"provider": "mock"},
        }
    )
    pipeline = SpeechPipeline(providers=bundle, sink=sink, config=PipelineConfig(tts_chunk_chars=80))

    cfg = UtteranceCollectorConfig(
        sample_rate=16000,
        channels=1,
        frame_ms=30,
        # Disable the rejection gates -- this recipe shows the wiring
        # from frame to pipeline, not the collector's silence handling.
        min_speech_duration_ms=0,
        reject_low_energy_rms=0,
        reject_utterance_rms=0,
        trim_silence_rms=0,
    )
    stats = AudioFrameStats(
        expected_sample_rate=16000,
        expected_channels=1,
        expected_frame_ms=30,
    )

    async def on_utterance(pcm: bytes, sample_rate: int, mode: str) -> None:
        await pipeline.handle_audio_turn(pcm, sample_rate, mode=mode)

    collector = AudioUtteranceCollector(
        vad_provider=ScriptedVAD(),
        event_sink=sink,
        utterance_callback=on_utterance,
        config=cfg,
    )

    # Three 30 ms frames of silence (16 kHz mono -> 480 samples -> 960 bytes).
    silence = base64.b64encode(b"\x00\x00" * 480).decode("ascii")
    for seq in range(3):
        frame = parse_audio_frame(
            {
                "data": silence,
                "sample_rate": 16000,
                "channels": 1,
                "frame_ms": 30,
                "sequence": seq,
                "encoding": "pcm_s16le",
            },
            stats,
        )
        await collector.ingest_frame(frame)

    await pipeline.cancel_tts("done")
    await asyncio.sleep(0.3)
    types = [queue.get_nowait()["type"] for _ in range(queue.qsize())]
    print(types)


asyncio.run(main())
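
The frame size used in the recipe (480 samples, 960 bytes) follows from the sample rate, channel count, and frame duration. The helper below is a hypothetical illustration of that arithmetic and does not call the framework.

```python
import base64


def pcm_s16le_frame_bytes(sample_rate: int, channels: int, frame_ms: int) -> int:
    """Size in bytes of one PCM s16le frame of the given duration."""
    samples_per_channel = sample_rate * frame_ms // 1000
    return samples_per_channel * channels * 2  # 2 bytes per 16-bit sample


size = pcm_s16le_frame_bytes(16000, 1, 30)
payload = base64.b64encode(b"\x00" * size).decode("ascii")
print(size)          # 960
print(len(payload))  # 1280
```

Base64 expands every 3 bytes to 4 characters, which is why the 960-byte frame becomes a 1280-character payload on the wire.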

Custom provider registration

register_provider adds a new (kind, name) pair to the registry by import string. build_provider_bundle then resolves the name on demand and instantiates the class. is_provider_available is the companion probe — it returns True only when the underlying module can be imported, which is the safe check before handing the config to a pipeline. The recipe points the new name at the framework's own mock VAD so it runs against the base install; replace the import string with your own my_pkg.providers:MyVADProvider to register a real implementation.

from converse_framework.registry import (
    build_provider_bundle,
    is_provider_available,
    register_provider,
)

# Register a custom VAD name. Replace the import string with your own
# `my_pkg.providers:MyVADProvider` to wire up a real implementation.
register_provider(
    "vad",
    "my-vad",
    "converse_framework.providers.mock:MockVADProvider",
)

bundle = build_provider_bundle(
    {
        "vad": {"provider": "my-vad"},
        "asr": {"provider": "mock"},
        "llm": {"provider": "mock"},
        "tts": {"provider": "mock"},
    }
)
print(bundle.vad.status.provider_id)        # "mock" (the registered class)
print(is_provider_available("vad", "my-vad"))  # True

Custom event sink

SpeechPipeline accepts any EventSink subclass. The recipe prints each event as it fires, which is handy when you are wiring up a new transport and want to see the wire shape without standing up a queue.

import asyncio

from converse_framework import (
    EventSink,
    PipelineConfig,
    SpeechPipeline,
    build_provider_bundle,
)


class PrintSink(EventSink):
    """Minimal sink that prints each event as it fires."""

    async def emit(self, event_type, **payload):
        keys = ", ".join(payload) or "-"
        print(f"[event] {event_type} ({keys})")


async def main():
    sink = PrintSink()
    pipeline = SpeechPipeline(
        providers=build_provider_bundle(
            {
                "vad": {"provider": "mock"},
                "asr": {"provider": "mock"},
                "llm": {"provider": "mock"},
                "tts": {"provider": "mock"},
            }
        ),
        sink=sink,
        config=PipelineConfig(tts_chunk_chars=80),
    )
    await pipeline.handle_text_turn("Hello, custom sink.")
    # Let the TTS streaming task finish before the loop exits.
    await asyncio.sleep(0.5)


asyncio.run(main())

Browser playback (JS reference client)

The framework ships a vanilla JavaScript / Web Audio reference client at converse_framework/js/tts-audio-player.js that turns the framework's tts.audio events into sound without bundling a build step. It builds AudioBuffers directly from PCM s16le bytes (avoiding decodeAudioData on tiny chunks) and coalesces consecutive events within a short window before scheduling, which is the same fix that resolved Pocket TTS choppiness in the reference harness.

<script src="converse_framework/js/tts-audio-player.js"></script>
<script>
  const player = new TtsAudioPlayer({ coalesceMs: 80 });
  ws.addEventListener('message', (ev) => {
    const event = JSON.parse(ev.data);
    if (event.type === 'tts.audio') player.onEvent(event);
  });
  // when the conversation ends:
  player.close();
</script>

The reference client handles the most common case (mono / stereo PCM s16le with explicit sample rate, channels, and final flag) and ignores anything that is not pcm_s16le with a console warning. Drop the file into your static assets directory; no npm / bundler required.
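
For readers who want to inspect tts.audio payloads outside the browser, the same PCM s16le decoding can be sketched in Python. Web Audio buffers hold floating-point samples in the range -1 to 1; the scaling by 32768 below is a common convention and an assumption here, not a statement about what tts-audio-player.js does internally.

```python
import struct


def pcm_s16le_to_floats(pcm: bytes) -> list[float]:
    """Decode little-endian signed 16-bit PCM into floats in [-1.0, 1.0)."""
    if len(pcm) % 2:
        raise ValueError("PCM s16le data must contain an even number of bytes")
    count = len(pcm) // 2
    samples = struct.unpack(f"<{count}h", pcm)
    return [sample / 32768.0 for sample in samples]


print(pcm_s16le_to_floats(b"\x00\x00\x00\x40\x00\x80"))  # [0.0, 0.5, -1.0]
```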

Browser microphone capture (JS reference client)

The framework ships a vanilla JavaScript microphone capture class at converse_framework/js/mic-frame-sender.js. It uses getUserMedia and an AudioWorkletNode (with inline blob-URL processor, falling back to ScriptProcessorNode) to deliver 16-bit PCM s16le frames at a configurable interval:

<script src="converse_framework/js/mic-frame-sender.js"></script>
<script>
  const ws = new WebSocket("ws://localhost:8000/ws");
  const mic = new MicFrameSender({
    webSocket: ws,
    sampleRate: 16000,
    channels: 1,
    frameMs: 30,
    onLevel: (db) => console.log("mic level", db.toFixed(1)),
  });
  mic.start(); // begins capture after user gesture
</script>

A composed client at converse_framework/js/browser-voice-client.js combines MicFrameSender, TtsAudioPlayer, and an optional SpeakerEchoGuard (see converse_framework/js/speaker-echo-guard.js) into a single class with automatic WebSocket event dispatch.

Mobile microphone access requires additional HTTPS / tunnel setup (see next section).

Mobile Browser Microphone Testing

Browser microphone capture (via getUserMedia) requires a secure context — HTTPS, localhost, or 127.0.0.1. This is not a framework limitation; it is a browser security requirement.

Local desktop development: localhost is always considered secure. A plain ws://localhost:8000/ws works with no extra setup.

Same-LAN testing (desktop): also works. Desktop browsers accept ws://<lan-ip>/ws for WebSocket.send(), because it is the getUserMedia call that checks the page context, not the WebSocket itself. Serve the HTML page over HTTPS to keep mobile browsers happy (see below).

Mobile device on same LAN: mobile browsers reject getUserMedia calls from a plain http://<lan-ip> page. You need either a tunnel that provides HTTPS or a locally trusted certificate.


Option 1 — Cloudflare Tunnel (recommended for testing)

  1. Install cloudflared (winget install cloudflare.cloudflared on Windows, brew install cloudflare/cloudflare/cloudflared on macOS, or download from the Cloudflare Zero Trust dashboard).
  2. Start your server on port 8000:
    uvicorn converse_framework.examples.websocket_voice_chat:create_app --factory
    
  3. Run the tunnel:
    cloudflared tunnel --url http://localhost:8000
    
  4. Cloudflare prints a public https://<random>.trycloudflare.com URL.
  5. Open that URL on your mobile device. Change the WebSocket URL in your client to wss://<random>.trycloudflare.com/ws.

Option 2 — ngrok

  1. Install ngrok from https://ngrok.com/download.
  2. Start your server on port 8000.
  3. Tunnel:
    ngrok http 8000
    
  4. Use the generated https://<random>.ngrok-free.app URL.
  5. WebSocket URL: wss://<random>.ngrok-free.app/ws.

Option 3 — Local trusted certificate (advanced)

Use mkcert to create a trusted CA-signed cert for your LAN IP:

# Install mkcert once
brew install mkcert  # macOS
winget install mkcert  # Windows (or scoop install mkcert)
mkcert -install

# Create a cert for your LAN IP, e.g. 192.168.1.42
mkcert 192.168.1.42 localhost 127.0.0.1

# Run uvicorn with the generated key/cert files
uvicorn converse_framework.examples.websocket_voice_chat:create_app --factory \
    --ssl-keyfile ./192.168.1.42-key.pem \
    --ssl-certfile ./192.168.1.42.pem

The page and WebSocket are now served over https://192.168.1.42:8000 and wss://192.168.1.42:8000/ws respectively. The mkcert root CA must be installed on the mobile device (see the mkcert docs for Android / iOS instructions).


Summary of WebSocket URL forms

Scenario | Page URL | WebSocket URL
--- | --- | ---
Desktop localhost | http://localhost:8000 | ws://localhost:8000/ws
Desktop same LAN | http://<lan-ip>:8000 | ws://<lan-ip>:8000/ws
Mobile via tunnel | https://<tunnel>/ | wss://<tunnel>/ws
Mobile via local cert | https://<lan-ip>:8000 | wss://<lan-ip>:8000/ws
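
Every row follows the same rule: http maps to ws, https maps to wss, and the path is /ws. A small helper can derive the WebSocket URL from the page URL. `websocket_url_for` is a hypothetical name, and the /ws path is the one used by the examples in this document.

```python
from urllib.parse import urlsplit, urlunsplit


def websocket_url_for(page_url: str, path: str = "/ws") -> str:
    """Map an http(s) page URL to the matching ws(s) URL on the same host."""
    parts = urlsplit(page_url)
    scheme = {"http": "ws", "https": "wss"}[parts.scheme]  # KeyError on other schemes
    return urlunsplit((scheme, parts.netloc, path, "", ""))


print(websocket_url_for("http://localhost:8000"))      # ws://localhost:8000/ws
print(websocket_url_for("https://192.168.1.42:8000"))  # wss://192.168.1.42:8000/ws
```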

Wrap an external CLI as a provider

When the engine you want to use is only available as a CLI binary (whisper-cli, whisper.cpp/main, the Vosk CLI, …), the framework's converse_framework.examples.subprocess_provider shows the pattern. The class shells out to a configured binary, writes a WAV header followed by the caller's PCM s16le body to the subprocess's stdin, and yields the subprocess's stdout as a single final transcript event.

from converse_framework.examples.subprocess_provider import (
    SubprocessASRProvider,
)
from converse_framework.registry import build_provider_bundle, register_provider

# Construct the provider directly with its own config.
provider = SubprocessASRProvider({
    "binary": "whisper-cli",
    "model": "ggml-small.en.bin",
    "command_template": ["-m", "{model}", "-f", "-"],
    "timeout_s": 120,
})

# To resolve it by name from a bundle config, register it first.
# It is not registered by default (see the note below).
register_provider(
    "asr",
    "subprocess",
    "converse_framework.examples.subprocess_provider:SubprocessASRProvider",
)
bundle = build_provider_bundle(
    {
        "vad": {"provider": "mock"},
        "asr": {"provider": "subprocess"},
        "llm": {"provider": "mock"},
        "tts": {"provider": "mock"},
    },
)

SubprocessASRProvider is shipped as a recipe (not a registered provider) because it is generic: copy the class, point it at your binary of choice, and register it with register_provider("asr", "my-name", "my.module:MySubprocessProvider"). The example also ships a fake-echo script (--use-fake-echo) that lets the driver run end-to-end in CI without installing any real ASR.
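
The core of the pattern (WAV header plus PCM body on stdin, transcript on stdout) can be tried with the standard library alone. The sketch below is not the shipped SubprocessASRProvider: `pcm_to_wav_bytes`, `transcribe_via_cli`, and `FAKE_CLI` are hypothetical names, and the child process is a stand-in that only reports what it received.

```python
import io
import subprocess
import sys
import wave


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int, channels: int = 1) -> bytes:
    """Prefix raw PCM s16le with a WAV header."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()


def transcribe_via_cli(argv: list[str], pcm: bytes, sample_rate: int,
                       timeout_s: float = 120.0) -> str:
    """Pipe WAV bytes to the command's stdin and return its stdout as text."""
    result = subprocess.run(
        argv,
        input=pcm_to_wav_bytes(pcm, sample_rate),
        capture_output=True,
        timeout=timeout_s,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace").strip()


# Stand-in for a real ASR binary: prints the header magic and the byte count.
FAKE_CLI = [
    sys.executable,
    "-c",
    "import sys; data = sys.stdin.buffer.read(); print(data[:4].decode('ascii'), len(data))",
]
transcript = transcribe_via_cli(FAKE_CLI, b"\x00\x00" * 480, 16000)
print(transcript)  # RIFF 1004
```

The 1004 bytes are the 44-byte header written by the `wave` module plus the 960-byte frame. A real provider would additionally need to handle non-zero exit codes and timeouts as transcript errors rather than letting the exceptions propagate.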

Pocket TTS voice listing and configuration

Pocket TTS supports listing available voices and changing the voice or other options at runtime via TTSProvider.configure (introduced in protocol v0.2). All variants return a ProviderConfigResult with changed and requires_reload flags.

List voices without importing the heavy ONNX backend:

from converse_framework.providers.pocket_tts import PocketTTSProvider

provider = PocketTTSProvider({"voice": "azelma"})
voices = provider.list_voices()
for v in voices:
    print(f"{v.id}: {v.name} ({v.gender}, {v.language})")
    # e.g. "azelma: Azelma (Female, en)"

Change voice (clears only the voice cache, preserves the loaded model):

result = provider.configure(voice="anna")
print(result.changed, result.requires_reload)
# True, False — model stays, voice state reloaded

Change quantization or temperature (clears both model and voice, requiring a full reload on next synthesis):

result = provider.configure(quantize=True)
print(result.requires_reload)
# True — both _model and _voice_state cleared

Change max_tokens or coalesce_ms without unloading:

result = provider.configure(max_tokens=250, coalesce_ms=120)
print(result.requires_reload)
# False — values stored, no cache invalidated
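
The three cases above amount to grouping options by which cache they invalidate. The toy class below illustrates that grouping only; it is not PocketTTSProvider, and `ToyTTS`, `ConfigResult`, `MODEL_KEYS`, and `VOICE_KEYS` are hypothetical names chosen for the sketch.

```python
from dataclasses import dataclass, field

MODEL_KEYS = {"quantize", "temperature"}  # assumed: clear model and voice state
VOICE_KEYS = {"voice"}                    # assumed: clear voice state only


@dataclass
class ConfigResult:
    changed: bool
    requires_reload: bool


@dataclass
class ToyTTS:
    options: dict = field(default_factory=dict)
    model_loaded: bool = True
    voice_loaded: bool = True

    def configure(self, **updates) -> ConfigResult:
        """Store new option values and drop only the caches they affect."""
        changed = {key for key, value in updates.items() if self.options.get(key) != value}
        self.options.update(updates)
        if changed & MODEL_KEYS:
            self.model_loaded = self.voice_loaded = False
        elif changed & VOICE_KEYS:
            self.voice_loaded = False
        return ConfigResult(bool(changed), bool(changed & MODEL_KEYS))


tts = ToyTTS({"voice": "azelma"})
voice_result = tts.configure(voice="anna")      # voice cache dropped, model kept
tokens_result = tts.configure(max_tokens=250)   # stored, nothing invalidated
quantize_result = tts.configure(quantize=True)  # full reload needed
print(voice_result, tokens_result, quantize_result)
```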

ProviderBundle.replace() and pipeline.update_providers() (see the Runtime Provider Updates section) work with any TTS provider including Pocket TTS.

CUDA DLL helper (Windows)

On Windows, NVIDIA wheel packages like nvidia-cublas-cu12 install DLLs under site-packages/nvidia/<package>/bin/, but C extension libraries such as CTranslate2 may not search those directories automatically. The framework ships a CUDA DLL discovery helper at converse_framework/cuda_utils.py that finds them and adds them to the DLL search path.

from converse_framework.cuda_utils import (
    add_nvidia_dll_directories,
    discover_nvidia_dll_dirs,
    format_nvidia_dll_diagnostic,
)

# Add all discovered NVIDIA DLL directories to the search path.
# Keep the handles alive for the lifetime of the process.
dll_handles = add_nvidia_dll_directories()

# Print a diagnostic string for debugging:
print(format_nvidia_dll_diagnostic())

The helper searches nvidia/cublas/bin, nvidia/cudnn/bin, nvidia/cusparse/bin, nvidia/cusolver/bin, and nvidia/curand/bin inside site-packages. It is Windows-only (no-op on other platforms) and best-effort — failures are logged, not raised.
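
The discovery step can be pictured as a directory scan followed by `os.add_dll_directory` on Windows. The sketch below is an approximation under that assumption, not the code in cuda_utils.py; `find_nvidia_bin_dirs` and `register_dll_dirs` are hypothetical names.

```python
import os
import sys
from pathlib import Path

# Package directories listed in the paragraph above.
SUBDIRS = ("cublas", "cudnn", "cusparse", "cusolver", "curand")


def find_nvidia_bin_dirs(site_packages: Path) -> list[Path]:
    """Return the existing nvidia/<package>/bin directories under site_packages."""
    candidates = (site_packages / "nvidia" / name / "bin" for name in SUBDIRS)
    return [path for path in candidates if path.is_dir()]


def register_dll_dirs(dirs: list[Path]) -> list:
    """Add directories to the DLL search path on Windows; do nothing elsewhere."""
    if sys.platform != "win32":
        return []
    # Keep the returned handles, as the usage example above recommends.
    return [os.add_dll_directory(str(path)) for path in dirs]
```

A caller would pass each entry of `site.getsitepackages()` to `find_nvidia_bin_dirs` and hand the combined result to `register_dll_dirs`.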

FasterWhisperASRProvider calls add_nvidia_dll_directories() automatically inside _ensure_model() when the config option auto_cuda_dll_dirs is True (the default). Disable with:

provider = FasterWhisperASRProvider({
    "model": "large-v3-turbo",
    "device": "cuda",
    "auto_cuda_dll_dirs": False,  # disable auto-discovery
})

Runtime Provider Updates

The framework supports swapping providers at runtime without recreating the pipeline or collector. This is useful for settings UIs that let users change TTS voice, VAD model, or ASR backend without restarting the conversation.

ProviderBundle.replace()

ProviderBundle.replace creates a new bundle with specific providers swapped out by keyword argument, inheriting the rest from the original bundle. It has no side effects and copies no providers; the caller owns the lifecycle of the old providers.

from converse_framework import build_provider_bundle, build_provider

bundle = build_provider_bundle({
    "vad": {"provider": "mock"},
    "asr": {"provider": "mock"},
    "llm": {"provider": "mock"},
    "tts": {"provider": "mock"},
})

new_tts = build_provider("tts", "mock", {"first_chunk_delay_ms": 500})
new_bundle = bundle.replace(tts=new_tts)
# new_bundle.tts is the new provider; vad/asr/llm are unchanged.
# bundle is unaffected.

Multiple providers can be replaced at once:

replaced = bundle.replace(vad=new_vad, tts=new_tts)

ProviderBundle.unload_replaced()

ProviderBundle.unload_replaced compares two bundles by identity and calls unload() on every provider that differs. Providers that are the same object in both bundles are left untouched.

old_bundle = build_provider_bundle(config)
new_bundle = old_bundle.replace(tts=new_tts)
await ProviderBundle.unload_replaced(old_bundle, new_bundle)
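
Comparing "by identity" means using `is`, not `==`, on each slot. The toy bundle below shows the idea with a frozen dataclass; `ToyBundle` and `replaced_providers` are hypothetical stand-ins, not the framework's ProviderBundle.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyBundle:
    vad: object
    asr: object
    llm: object
    tts: object


def replaced_providers(old: ToyBundle, new: ToyBundle) -> list[str]:
    """Return the slot names whose provider object differs by identity."""
    slots = ("vad", "asr", "llm", "tts")
    return [slot for slot in slots if getattr(old, slot) is not getattr(new, slot)]


old = ToyBundle(vad=object(), asr=object(), llm=object(), tts=object())
new = replace(old, tts=object())  # new bundle, other slots shared with old
print(replaced_providers(old, new))  # ['tts']
```

Only the slots returned here would need their old provider unloaded; shared providers are still in use by the new bundle.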

SpeechPipeline.update_providers()

SpeechPipeline.update_providers is the safe way to swap providers on an active pipeline. It cancels in-flight TTS synthesis by default (so the next turn picks up the new provider), swaps the bundle, and emits a providers.updated event with the serialized statuses of the new bundle. Conversation history is not cleared.

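import asyncio

from converse_framework import (
    PipelineConfig, QueueEventSink, SpeechPipeline,
    build_provider_bundle,
)

# Placeholder configs; substitute your own provider settings.
initial_config = {
    "vad": {"provider": "mock"},
    "asr": {"provider": "mock"},
    "llm": {"provider": "mock"},
    "tts": {"provider": "mock"},
}
updated_config = dict(initial_config)


async def main():
    queue = asyncio.Queue()
    pipeline = SpeechPipeline(
        providers=build_provider_bundle(initial_config),
        sink=QueueEventSink(queue),
        config=PipelineConfig(),
    )

    new_bundle = build_provider_bundle(updated_config)
    await pipeline.update_providers(new_bundle, reason="settings_change")
    # pipeline.providers is now new_bundle
    # TTS was cancelled if it was playing
    # providers.updated event was emitted


asyncio.run(main())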

AudioUtteranceCollector.update_vad_provider()

AudioUtteranceCollector.update_vad_provider swaps the VAD provider that drives utterance boundary detection. It raises RuntimeError if the collector is currently recording an utterance, to avoid corrupting in-flight VAD state. The pre-speech buffer is cleared on swap so stale audio from the old VAD is not passed to the new one.

new_vad = SileroVADProvider({"speech_threshold": 0.6})
collector.update_vad_provider(new_vad)

End-to-end pattern

A typical settings-update flow combines all the pieces:

# 1. Build the new bundle
new_bundle = bundle.replace(tts=new_tts)

# 2. Probe without loading models
probe_results = await new_bundle.probe_statuses()

# 3. On user confirmation, swap in the pipeline
await pipeline.update_providers(new_bundle)

# 4. Swap the VAD in the collector (separate because the
#    collector and pipeline are independent components)
if "vad" in updated:
    collector.update_vad_provider(new_bundle.vad)

# 5. Old providers are unloaded in the background by
#    pipeline.update_providers().

WebSocket Session Helper

The framework provides a reusable WebSocketSession that handles the common message-dispatch loop for browser-based voice apps. It owns the transport, sink, provider bundle, pipeline, collector, and frame stats, and routes seven built-in message types without requiring the application to copy the recipe state machine.

Built-in message types:

  • audio.frame — validated PCM frame forwarded to the utterance collector.
  • text.turn — text conversation turn.
  • conversation.clear — clears per-mode conversation history.
  • tts.cancel — cancels in-flight TTS synthesis.
  • status.request — emits probe/check/load status (kind selected by the probe / check / load flag in the payload).
  • settings.update — delegated to an optional WebSocketSessionHooks callback.
  • providers.reload — swaps the provider bundle and optionally reloads the VAD provider, with before / after hooks.

Unknown message types fall through to the optional on_unknown_message hook or emit a turn.error event.
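
The routing rule (known type, else the unknown-message hook, else an error event) is a plain dispatch table. The sketch below illustrates it with asyncio only; `dispatch`, `handle_text_turn`, and the returned strings are hypothetical and do not reflect WebSocketSession internals.

```python
import asyncio


async def dispatch(message: dict, handlers: dict, on_unknown=None) -> str:
    """Route a message by its "type" field, with a hook and an error fallback."""
    handler = handlers.get(message.get("type"))
    if handler is not None:
        return await handler(message)
    if on_unknown is not None:
        return await on_unknown(message)
    return "turn.error"  # no handler and no hook: report an error event


async def handle_text_turn(message: dict) -> str:
    return f"text turn: {message['text']}"


async def demo() -> list[str]:
    handlers = {"text.turn": handle_text_turn}
    return [
        await dispatch({"type": "text.turn", "text": "hi"}, handlers),
        await dispatch({"type": "nope"}, handlers),
    ]


results = asyncio.run(demo())
print(results)  # ['text turn: hi', 'turn.error']
```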

Configuration and hooks are supplied via:

  • WebSocketSessionConfig — provider config, collector config, pipeline config, default mode, auto-probe on reload.
  • WebSocketSessionHooks — optional async callbacks for unknown messages, settings updates, status requests, provider reload lifecycle, and event monitoring.

The session class lives at converse_framework.session and is not imported from the top-level __init__.py to keep lightweight imports for apps that do not use it.

Usage sketch:

from converse_framework.session import (
    WebSocketSession,
    WebSocketSessionConfig,
    WebSocketSessionHooks,
)

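# Hooks are described above as async callbacks, so define them with `async def`.
async def on_settings_update(cfg):
    print("settings updated", cfg)


async def on_event(ev):
    print("event", ev.type)


hooks = WebSocketSessionHooks(
    on_settings_update=on_settings_update,
    on_event=on_event,
)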
session = WebSocketSession(
    transport=your_transport,
    config=WebSocketSessionConfig(
        provider_config={"vad": {"provider": "mock"}, ...},
    ),
    hooks=hooks,
)

async for message in your_websocket:
    await session.handle_message(message)

Examples

Text chat (automated-test covered)

Run a real text conversation against SpeechPipeline using only the framework's public API. No FastAPI, no WebSocket, no profile files.

python -m converse_framework.examples.text_chat

Try a real provider by passing overrides (the matching extra must be installed):

python -m converse_framework.examples.text_chat \
    --provider asr=faster-whisper \
    --provider llm=llamacpp \
    --provider tts=kokoro

The driver behind the CLI is converse_framework.examples.text_chat.run_text_chat, which is what the test suite exercises.

Voice chat (manual)

The voice example wires an AudioUtteranceCollector to the pipeline and feeds it PCM frames. It is a manual example — you supply a WAV file (or replace the source with a microphone capture) and the script drives the conversation. It is intentionally not covered by the automated tests because it depends on platform audio I/O.

# With real providers installed
python -m converse_framework.examples.voice_chat --input path/to/16k_mono.wav

# Or run the same flow with mock providers to validate the path
python -m converse_framework.examples.voice_chat --mock --input path/to/16k_mono.wav

Framework / App Boundary

The framework owns the provider-agnostic speech stack:

  • Provider protocols (VADProvider, ASRProvider, LLMProvider, TTSProvider).
  • Audio frame parsing, PCM conversion, metering, and silence trimming.
  • Event sink API and the wire shape used by the browser UI.
  • SpeechPipeline turn orchestration (ASR → LLM → TTS, streaming chunks, cancellation, barge-in).
  • AudioUtteranceCollector (VAD-driven utterance collection).
  • A lazy provider registry and the optional concrete providers behind extras.
  • WebSocketSession (optional reusable message-dispatch loop).
  • Browser JS helpers (mic-frame-sender.js, speaker-echo-guard.js, browser-voice-client.js, tts-audio-player.js).
  • CUDA DLL discovery helper (cuda_utils).

As of v0.2 the framework also provides safe provider-swap mechanics (ProviderBundle.replace(), pipeline.update_providers(), collector.update_vad_provider()), first-class provider configuration (configure(), list_voices()), and lifecycle events (provider.loading, provider.loaded, provider.error).

The framework does not own the application. The following stay in the consumer app (e.g. the reference harness):

  • FastAPI app, REST endpoints, WebSocket handler.
  • Profile files and runtime settings persistence.
  • Character card parsing and first-message seeding.
  • Companion mode policy and memory store.
  • TTS preset manager and provider settings UX.
  • The WebSocket transport itself.

Transport boundary

The framework defines a generic Transport protocol and ships a QueueTransport for tests. The consumer app owns the real WebSocket transport — WebSocketTransport (or equivalent) lives in the app, not in the framework, so the framework never takes a hard dependency on FastAPI. The reference harness exposes conversational_harness.transport.WebSocketTransport for that purpose.
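
The boundary can be pictured with a structural protocol and a queue-backed implementation for tests. The method names `send` and `receive` below are assumptions made for illustration; the framework's actual Transport protocol and QueueTransport may differ.

```python
import asyncio
from typing import Protocol


class ToyTransport(Protocol):
    """Hypothetical transport shape: anything with async send and receive."""

    async def send(self, event: dict) -> None: ...
    async def receive(self) -> dict: ...


class ToyQueueTransport:
    """In-memory transport for tests; an app would wrap a real WebSocket instead."""

    def __init__(self) -> None:
        self.inbox: asyncio.Queue = asyncio.Queue()
        self.outbox: asyncio.Queue = asyncio.Queue()

    async def send(self, event: dict) -> None:
        await self.outbox.put(event)

    async def receive(self) -> dict:
        return await self.inbox.get()


async def echo_once(transport: ToyTransport) -> None:
    """Code written against the protocol never imports a web framework."""
    message = await transport.receive()
    await transport.send({"type": "echo", "payload": message})


async def demo() -> dict:
    transport = ToyQueueTransport()
    await transport.inbox.put({"type": "text.turn", "text": "hi"})
    await echo_once(transport)
    return transport.outbox.get_nowait()


sent = asyncio.run(demo())
print(sent)
```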

Status

The package is in v0.1 pre-release. The test matrix below is the current contract:

Surface | Tests
--- | ---
converse_framework (base) | 126
Reference harness (Reference-Repository-Conversational-AI-Harness) | 91 passed, 1 skipped

Run them locally:

# Framework (run from the package root)
python -m pytest

# Harness (run from inside the harness directory)
python -m pytest

Download files

Source Distribution

converse_framework-0.2.2.tar.gz (140.2 kB view details)

Uploaded Source

Built Distribution

converse_framework-0.2.2-py3-none-any.whl (96.3 kB view details)

Uploaded Python 3

File details

Details for the file converse_framework-0.2.2.tar.gz.

File metadata

  • Download URL: converse_framework-0.2.2.tar.gz
  • Size: 140.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for converse_framework-0.2.2.tar.gz
Algorithm Hash digest
SHA256 245f4fad4c19c36328b622a98f802c3c414cf01f35522b6bb41c41e0c409cdbd
MD5 23e504f96d0bc8c19d39e7a1b6eeab50
BLAKE2b-256 6ba08510146f7bfdf91a95e182bb2b154603f5ce12657a64afff1ca734210929

Provenance

The following attestation bundles were made for converse_framework-0.2.2.tar.gz:

Publisher: publish.yml on thomas9120/Converse-Framework

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file converse_framework-0.2.2-py3-none-any.whl.

File hashes

Hashes for converse_framework-0.2.2-py3-none-any.whl
Algorithm Hash digest
SHA256 6b02fac608b3b6c4a7b26621c99d7376880cd7f970a313ed49d6ce3abb82f948
MD5 a47918e0112ca4ce8792e65a917bb301
BLAKE2b-256 7c593a5eb9bc1209c89d54f1474b33549913d30642c3c190383b361e708a843f

Provenance

The following attestation bundles were made for converse_framework-0.2.2-py3-none-any.whl:

Publisher: publish.yml on thomas9120/Converse-Framework

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
