Converse Framework
Provider-agnostic speech stack for speech-to-speech applications.
Table of Contents
- Install
- Quick Start
- Recipes
- Runtime Provider Updates
- WebSocket Session Helper
- Examples
- Framework / App Boundary
- Status
Install
pip install converse-framework
The base install pulls in only numpy. Real VAD / ASR / LLM / TTS
providers live behind optional extras:
pip install converse-framework[silero] # Silero VAD
pip install converse-framework[faster-whisper] # faster-whisper ASR
pip install converse-framework[whisper-cpp] # whisper.cpp HTTP ASR
pip install converse-framework[llamacpp] # llama.cpp HTTP LLM
pip install converse-framework[kokoro] # Kokoro ONNX TTS
pip install converse-framework[pocket-tts] # Pocket TTS
pip install converse-framework[all] # everything
Missing dependency behavior
If a config requests a provider whose heavy backend is not installed,
build_provider (and therefore build_provider_bundle) returns an
UnavailableProvider sentinel for that slot instead of raising a bare
ImportError. The sentinel's status.message always names the provider
that was missing and includes the pip install extra to fix it. The
mapping is owned by converse_framework.providers.unavailable.EXTRA_HINTS
and exposed as extra_hint_for(kind, name), which returns the extra
name (e.g. "converse-framework[silero]") when one is known and None
otherwise.
from converse_framework import extra_hint_for
from converse_framework.providers.unavailable import UnavailableProvider
print(extra_hint_for("vad", "silero")) # converse-framework[silero]
print(extra_hint_for("asr", "faster-whisper")) # converse-framework[faster-whisper]
print(extra_hint_for("vad", "made-up")) # None
p = UnavailableProvider("vad", "silero")
print(p.status.message)
# Provider 'silero' (vad) is not available. Install the required extra
# with `pip install converse-framework[silero]`.
is_provider_available(kind, name) is the companion check: it returns
True only when the provider's heavy dependency is importable, so you
can fail fast before handing the config to a pipeline. UnavailableProvider
is a real implementation of all four provider protocols, so the rest of
the pipeline keeps running (turns fail with a clear RuntimeError when
the broken provider is actually invoked) and the consumer can decide
whether to prompt for the install or fall back to a different provider.
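The sentinel behaviour described above can be sketched without the framework installed. The names below (`HINTS`, `hint_for`, `is_available`, `Unavailable`) are hypothetical stand-ins rather than the framework's own classes, and the module names in `HINTS` are assumptions made for illustration.

```python
import importlib.util
from dataclasses import dataclass

# Hypothetical mapping: (kind, name) -> (importable module, pip extra).
HINTS = {
    ("vad", "silero"): ("silero_vad", "converse-framework[silero]"),
    ("asr", "faster-whisper"): ("faster_whisper", "converse-framework[faster-whisper]"),
}


def hint_for(kind, name):
    """Return the pip extra for a provider, or None when no hint is known."""
    entry = HINTS.get((kind, name))
    return entry[1] if entry else None


def is_available(kind, name):
    """True only when the provider's backend module can be found on this interpreter."""
    entry = HINTS.get((kind, name))
    if entry is None:
        return False
    return importlib.util.find_spec(entry[0]) is not None


@dataclass
class Unavailable:
    """Placeholder that fills a provider slot and fails only when invoked."""

    kind: str
    name: str

    @property
    def message(self):
        text = f"Provider '{self.name}' ({self.kind}) is not available."
        hint = hint_for(self.kind, self.name)
        if hint:
            text += f" Install the required extra with `pip install {hint}`."
        return text

    def invoke(self, *args, **kwargs):
        # Deferred failure: building the bundle succeeds, using the slot does not.
        raise RuntimeError(self.message)
```

The point of the design is that construction never raises, so a settings screen can still render every slot and show the install hint next to the broken one.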
Python version compatibility
The base package supports Python 3.11 and newer. Each extra has its
own constraints (the table below mirrors the markers in
pyproject.toml):
| Extra | Python | Notes |
|---|---|---|
| base | 3.11+ | numpy>=2.0 is the only required runtime dependency. |
| silero | 3.11+ | silero-vad + onnxruntime. No known upper bound. |
| faster-whisper | 3.11+ | The nvidia-cublas-cu12 wheel pins Windows. |
| llamacpp | 3.11+ | httpx itself supports 3.9+, so 3.11+ is the only constraint. |
| whisper-cpp | 3.11+ | Only needs httpx, which supports 3.9+. |
| kokoro | 3.11 to <3.14 | kokoro-onnx 0.5.0 requires Python <3.14. The wheel build fails fast on 3.14+. |
| pocket-tts | 3.11+ | No known upper bound. |
The kokoro extra is the only one with an upper-bound marker today.
If you are on Python 3.14+ and need a TTS provider, use pocket-tts
or a mock provider. New providers should add their own
python_version markers in pyproject.toml when their backend has a
known limit.
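If an application wants to warn about an unsupported interpreter before installing an extra, the bounds listed in this section can be checked at runtime. This is a minimal sketch: `PYTHON_BOUNDS` and `extra_supported` are hypothetical names, and the values are copied from the compatibility table rather than read from pyproject.toml.

```python
import sys

# (minimum inclusive, maximum exclusive or None), copied from the table in this section.
PYTHON_BOUNDS = {
    "base": ((3, 11), None),
    "silero": ((3, 11), None),
    "faster-whisper": ((3, 11), None),
    "llamacpp": ((3, 11), None),
    "whisper-cpp": ((3, 11), None),
    "kokoro": ((3, 11), (3, 14)),
    "pocket-tts": ((3, 11), None),
}


def extra_supported(extra, version=None):
    """Return True when `version` (default: the running interpreter) fits the extra's bounds."""
    major_minor = tuple((version or sys.version_info)[:2])
    low, high = PYTHON_BOUNDS[extra]
    return major_minor >= low and (high is None or major_minor < high)
```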
Quick Start
from converse_framework import build_provider_bundle
config = {
"vad": {"provider": "mock"},
"asr": {"provider": "mock"},
"llm": {"provider": "mock"},
"tts": {"provider": "mock"},
}
bundle = build_provider_bundle(config)
print(bundle.statuses())
import converse_framework only needs numpy to be installed — heavy
provider backends are loaded lazily through the registry.
Provider status semantics
Every provider exposes a status property (cached state, no I/O), a
lightweight probe_status() method (import checks, HTTP reachability
— does not load models), and a load_status() method (may load
or initialise heavy resources before returning).
Call probe_status() to check readiness without side effects — it
is safe for status screens and health checks:
import asyncio
# Probe without loading models
results = asyncio.run(bundle.probe_statuses())
for kind, status in results.items():
    print(f"{kind}: ready={status.ready} level={status.status_level}")
    if status.voices:
        print(f" voices={[v.id for v in status.voices]}")
Call load_status() when you need the definitive picture — it may
trigger model downloads or initialise GPU resources:
results = asyncio.run(bundle.load_statuses())
The status_level field distinguishes "ready", "configured",
"loading", "error", and "unavailable". The old
check_status() is kept for backward compatibility and behaves
the same as probe_status() for providers that implement it.
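The three tiers can be illustrated with a toy provider. This is a sketch under stated assumptions, not the framework's implementation: `Status` and `TieredProvider` are hypothetical, the "model" is a placeholder object, and the way status levels are assigned to each tier is a guess at one reasonable mapping.

```python
import asyncio
import importlib.util
from dataclasses import dataclass


@dataclass
class Status:
    ready: bool
    status_level: str  # one of "ready", "configured", "loading", "error", "unavailable"


class TieredProvider:
    """Hypothetical provider showing cached, probed, and loaded status."""

    def __init__(self, module_name):
        self._module_name = module_name
        self._model = None
        self._status = Status(ready=False, status_level="configured")

    @property
    def status(self):
        # Tier 1: cached state only, no imports and no I/O.
        return self._status

    async def probe_status(self):
        # Tier 2: cheap check that the backend is importable; never loads the model.
        if importlib.util.find_spec(self._module_name) is None:
            self._status = Status(False, "unavailable")
        elif self._model is None:
            self._status = Status(False, "configured")
        return self._status

    async def load_status(self):
        # Tier 3: allowed to do heavy work before answering.
        status = await self.probe_status()
        if status.status_level != "unavailable":
            self._model = object()  # stands in for a real model load
            self._status = Status(True, "ready")
        return self._status
```

A status screen would call only the first two tiers; the third belongs behind an explicit user action because it may be slow.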
Recipes
The recipes below are short, self-contained scripts that exercise the
public API. They all run with the base install (numpy + the framework)
unless a snippet is explicitly marked as requiring an extra.
Minimal mock text pipeline
build_provider_bundle returns a fully-mock provider bundle and
SpeechPipeline runs an end-to-end text turn against it. QueueEventSink
captures every event the pipeline emits so the script can assert or
print them.
import asyncio
from converse_framework import (
    PipelineConfig,
    QueueEventSink,
    SpeechPipeline,
    build_provider_bundle,
)

async def main():
    queue: asyncio.Queue = asyncio.Queue()
    sink = QueueEventSink(queue)
    pipeline = SpeechPipeline(
        providers=build_provider_bundle(
            {
                "vad": {"provider": "mock"},
                "asr": {"provider": "mock"},
                "llm": {"provider": "mock"},
                "tts": {"provider": "mock"},
            }
        ),
        sink=sink,
        config=PipelineConfig(tts_chunk_chars=80),
    )
    await pipeline.handle_text_turn("Hello, mock pipeline.")
    # Let the TTS streaming task finish, then drain the captured events.
    await asyncio.sleep(0.5)
    types = [queue.get_nowait()["type"] for _ in range(queue.qsize())]
    print(types)

asyncio.run(main())
Audio frame to utterance collector to pipeline
parse_audio_frame validates a wire payload and turns it into an
AudioFrame. AudioUtteranceCollector runs VAD on the frame, applies
the rejection gates, and on vad.speech_end hands the assembled PCM
bytes to its utterance_callback. The recipe wires that callback into
SpeechPipeline.handle_audio_turn. The in-process VAD below fires
vad.speech_start on the first frame and vad.speech_end on the third
so the collector has something to dispatch — the framework's own
MockVADProvider returns no events and is not useful for this path.
import asyncio
import base64
from converse_framework.audio_utils import AudioFrameStats, parse_audio_frame
from converse_framework.events import QueueEventSink
from converse_framework.pipeline import PipelineConfig, SpeechPipeline
from converse_framework.protocols import (
    ProviderCapabilities,
    ProviderStatus,
    VADEvent,
)
from converse_framework.registry import build_provider_bundle
from converse_framework.utterance_collector import (
    AudioUtteranceCollector,
    UtteranceCollectorConfig,
)

class ScriptedVAD:
    """A tiny in-process VAD: start on frame 0, end on frame 2."""

    def __init__(self) -> None:
        self._count = 0

    @property
    def status(self) -> ProviderStatus:
        return ProviderStatus(
            name="scripted",
            kind="vad",
            ready=True,
            message="Scripted VAD fires start at frame 0 and end at frame 2.",
            capabilities=ProviderCapabilities(),
        )

    async def check_status(self) -> ProviderStatus:
        return self.status

    async def process_frame(self, frame):
        self._count += 1
        events: list[VADEvent] = []
        if self._count == 1:
            events.append(VADEvent(type="vad.speech_start", probability=1.0, audio_ms=30))
        if self._count == 3:
            events.append(VADEvent(type="vad.speech_end", probability=1.0, audio_ms=90))
        return events

async def main():
    queue: asyncio.Queue = asyncio.Queue()
    sink = QueueEventSink(queue)
    bundle = build_provider_bundle(
        {
            "vad": {"provider": "mock"},
            "asr": {"provider": "mock"},
            "llm": {"provider": "mock"},
            "tts": {"provider": "mock"},
        }
    )
    pipeline = SpeechPipeline(providers=bundle, sink=sink, config=PipelineConfig(tts_chunk_chars=80))
    cfg = UtteranceCollectorConfig(
        sample_rate=16000,
        channels=1,
        frame_ms=30,
        # Disable the rejection gates -- this recipe shows the wiring
        # from frame to pipeline, not the collector's silence handling.
        min_speech_duration_ms=0,
        reject_low_energy_rms=0,
        reject_utterance_rms=0,
        trim_silence_rms=0,
    )
    stats = AudioFrameStats(
        expected_sample_rate=16000,
        expected_channels=1,
        expected_frame_ms=30,
    )

    async def on_utterance(pcm: bytes, sample_rate: int, mode: str) -> None:
        await pipeline.handle_audio_turn(pcm, sample_rate, mode=mode)

    collector = AudioUtteranceCollector(
        vad_provider=ScriptedVAD(),
        event_sink=sink,
        utterance_callback=on_utterance,
        config=cfg,
    )
    # Three 30 ms frames of silence (16 kHz mono -> 480 samples -> 960 bytes).
    silence = base64.b64encode(b"\x00\x00" * 480).decode("ascii")
    for seq in range(3):
        frame = parse_audio_frame(
            {
                "data": silence,
                "sample_rate": 16000,
                "channels": 1,
                "frame_ms": 30,
                "sequence": seq,
                "encoding": "pcm_s16le",
            },
            stats,
        )
        await collector.ingest_frame(frame)
    await pipeline.cancel_tts("done")
    await asyncio.sleep(0.3)
    types = [queue.get_nowait()["type"] for _ in range(queue.qsize())]
    print(types)

asyncio.run(main())
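The recipe hard-codes 480 samples and 960 bytes for a 30 ms frame at 16 kHz mono. The arithmetic generalises as below; `pcm_frame_bytes` and `silent_frame_payload` are hypothetical helpers written for this illustration, and the payload keys simply mirror the dict the recipe passes to parse_audio_frame.

```python
import base64


def pcm_frame_bytes(sample_rate, channels, frame_ms, bytes_per_sample=2):
    """Byte length of one PCM frame (s16le uses 2 bytes per sample)."""
    samples_per_channel = sample_rate * frame_ms // 1000
    return samples_per_channel * channels * bytes_per_sample


def silent_frame_payload(sample_rate=16000, channels=1, frame_ms=30, sequence=0):
    """Build a dict shaped like the wire payload used in the recipe, filled with silence."""
    pcm = b"\x00" * pcm_frame_bytes(sample_rate, channels, frame_ms)
    return {
        "data": base64.b64encode(pcm).decode("ascii"),
        "sample_rate": sample_rate,
        "channels": channels,
        "frame_ms": frame_ms,
        "sequence": sequence,
        "encoding": "pcm_s16le",
    }
```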
Custom provider registration
register_provider adds a new (kind, name) pair to the registry by
import string. build_provider_bundle then resolves the name on demand
and instantiates the class. is_provider_available is the companion
probe — it returns True only when the underlying module can be
imported, which is the safe check before handing the config to a
pipeline. The recipe points the new name at the framework's own mock
VAD so it runs against the base install; replace the import string
with your own my_pkg.providers:MyVADProvider to register a real
implementation.
from converse_framework.registry import (
build_provider_bundle,
is_provider_available,
register_provider,
)
# Register a custom VAD name. Replace the import string with your own
# `my_pkg.providers:MyVADProvider` to wire up a real implementation.
register_provider(
"vad",
"my-vad",
"converse_framework.providers.mock:MockVADProvider",
)
bundle = build_provider_bundle(
{
"vad": {"provider": "my-vad"},
"asr": {"provider": "mock"},
"llm": {"provider": "mock"},
"tts": {"provider": "mock"},
}
)
print(bundle.vad.status.provider_id) # "mock" (the registered class)
print(is_provider_available("vad", "my-vad")) # True
Custom event sink
SpeechPipeline accepts any EventSink subclass. The recipe prints
each event as it fires, which is handy when you are wiring up a new
transport and want to see the wire shape without standing up a queue.
import asyncio
from converse_framework import (
    EventSink,
    PipelineConfig,
    SpeechPipeline,
    build_provider_bundle,
)

class PrintSink(EventSink):
    """Minimal sink that prints each event as it fires."""

    async def emit(self, event_type, **payload):
        keys = ", ".join(payload) or "-"
        print(f"[event] {event_type} ({keys})")

async def main():
    sink = PrintSink()
    pipeline = SpeechPipeline(
        providers=build_provider_bundle(
            {
                "vad": {"provider": "mock"},
                "asr": {"provider": "mock"},
                "llm": {"provider": "mock"},
                "tts": {"provider": "mock"},
            }
        ),
        sink=sink,
        config=PipelineConfig(tts_chunk_chars=80),
    )
    await pipeline.handle_text_turn("Hello, custom sink.")
    # Let the TTS streaming task finish before the loop exits.
    await asyncio.sleep(0.5)

asyncio.run(main())
Browser playback (JS reference client)
The framework ships a vanilla JavaScript / Web Audio reference client at
converse_framework/js/tts-audio-player.js that turns the framework's
tts.audio events into sound without bundling a build step. It builds
AudioBuffers directly from PCM s16le bytes (avoiding
decodeAudioData on tiny chunks) and coalesces consecutive events
within a short window before scheduling, which is the same fix that
resolved Pocket TTS choppiness in the reference harness.
<script src="converse_framework/js/tts-audio-player.js"></script>
<script>
const player = new TtsAudioPlayer({ coalesceMs: 80 });
ws.addEventListener('message', (ev) => {
const event = JSON.parse(ev.data);
if (event.type === 'tts.audio') player.onEvent(event);
});
// when the conversation ends:
player.close();
</script>
The reference client handles the most common case (mono / stereo PCM
s16le with explicit sample rate, channels, and final flag) and
ignores anything that is not pcm_s16le with a console warning. Drop
the file into your static assets directory; no npm / bundler required.
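For readers who want to see what "building buffers directly from PCM s16le bytes" amounts to, here is the sample conversion written in Python. It is a sketch of the arithmetic only, not a port of the shipped JavaScript; `pcm_s16le_to_float` is a hypothetical name.

```python
import struct


def pcm_s16le_to_float(pcm, channels=1):
    """Decode interleaved little-endian 16-bit PCM into per-channel floats in [-1.0, 1.0)."""
    if len(pcm) % (2 * channels):
        raise ValueError("PCM length is not a whole number of frames")
    samples = struct.unpack(f"<{len(pcm) // 2}h", pcm)
    # De-interleave: channel `ch` owns every `channels`-th sample starting at `ch`.
    return [[s / 32768.0 for s in samples[ch::channels]] for ch in range(channels)]
```

Dividing by 32768 maps the most negative sample to exactly -1.0 and keeps the largest positive sample just below 1.0.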
Browser microphone capture (JS reference client)
The framework ships a vanilla JavaScript microphone capture class at
converse_framework/js/mic-frame-sender.js. It uses getUserMedia and
an AudioWorkletNode (with inline blob-URL processor, falling back to
ScriptProcessorNode) to deliver 16-bit PCM s16le frames at a
configurable interval:
<script src="converse_framework/js/mic-frame-sender.js"></script>
<script>
const ws = new WebSocket("ws://localhost:8000/ws");
const mic = new MicFrameSender({
webSocket: ws,
sampleRate: 16000,
channels: 1,
frameMs: 30,
onLevel: (db) => console.log("mic level", db.toFixed(1)),
});
mic.start(); // begins capture after user gesture
</script>
A composed client at converse_framework/js/browser-voice-client.js
combines MicFrameSender, TtsAudioPlayer, and an optional
SpeakerEchoGuard (see converse_framework/js/speaker-echo-guard.js)
into a single class with automatic WebSocket event dispatch.
Mobile microphone access requires additional HTTPS / tunnel setup (see next section).
Mobile Browser Microphone Testing
Browser microphone capture (via getUserMedia) requires a secure
context — HTTPS, localhost, or 127.0.0.1. This is not a
framework limitation; it is a browser security requirement.
Local desktop development — localhost is always considered
secure. A plain ws://localhost:8000/ws works with no extra setup.
Same-LAN testing (desktop): the WebSocket side also works, because
desktop browsers accept a plain ws://<lan-ip>:8000/ws connection from a
page loaded over http. It is the getUserMedia call that checks the page
context, not the WebSocket itself, so microphone capture still depends
on how the page was loaded. Serve the HTML page itself via HTTPS to keep
mobile browsers happy (see below).
Mobile device on same LAN — a plain http://<lan-ip> page will
be rejected by mobile browsers when calling getUserMedia. You need
either a tunnel that provides HTTPS or a local trusted certificate.
Option 1 — Cloudflare Tunnel (recommended for testing)
- Install cloudflared (winget install cloudflare.cloudflared on Windows, brew install cloudflare/cloudflare/cloudflared on macOS, or download from the Cloudflare Zero Trust dashboard).
- Start your server on port 8000:
uvicorn converse_framework.examples.websocket_voice_chat:create_app --factory
- Run the tunnel:
cloudflared tunnel --url http://localhost:8000
- Cloudflare prints a public https://<random>.trycloudflare.com URL.
- Open that URL on your mobile device. Change the WebSocket URL in your client to wss://<random>.trycloudflare.com/ws.
Option 2 — ngrok
- Install ngrok from https://ngrok.com/download.
- Start your server on port 8000.
- Tunnel:
ngrok http 8000
- Use the generated https://<random>.ngrok-free.app URL.
- WebSocket URL: wss://<random>.ngrok-free.app/ws.
Option 3 — Local trusted certificate (advanced)
Use mkcert to create a certificate for your LAN IP, signed by a locally trusted CA:
# Install mkcert once
brew install mkcert # macOS
winget install mkcert # Windows (or scoop install mkcert)
mkcert -install
# Create a cert for your LAN IP, e.g. 192.168.1.42
mkcert 192.168.1.42 localhost 127.0.0.1
# Run uvicorn with the generated key/cert files
uvicorn converse_framework.examples.websocket_voice_chat:create_app --factory \
--ssl-keyfile ./192.168.1.42-key.pem \
--ssl-certfile ./192.168.1.42.pem
The page and WebSocket are now served over https://192.168.1.42:8000
and wss://192.168.1.42:8000/ws respectively. The mkcert root CA
must be installed on the mobile device (see mkcert docs for Android
/iOS instructions).
Summary of WebSocket URL forms
| Scenario | Page URL | WebSocket URL |
|---|---|---|
| Desktop localhost | http://localhost:8000 | ws://localhost:8000/ws |
| Desktop same LAN | http://<lan-ip>:8000 | ws://<lan-ip>:8000/ws |
| Mobile via tunnel | https://<tunnel>/ | wss://<tunnel>/ws |
| Mobile via local cert | https://<lan-ip>:8000 | wss://<lan-ip>:8000/ws |
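Every row in the summary follows one rule: an http page pairs with ws, and an https page pairs with wss on the same host and port. A small helper makes that explicit; `websocket_url_for` is a hypothetical name, and the `/ws` path is taken from the examples in this section.

```python
from urllib.parse import urlsplit, urlunsplit


def websocket_url_for(page_url, path="/ws"):
    """Derive the WebSocket URL that matches a page URL's scheme, host, and port."""
    parts = urlsplit(page_url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unsupported page scheme: {parts.scheme!r}")
    scheme = "wss" if parts.scheme == "https" else "ws"
    return urlunsplit((scheme, parts.netloc, path, "", ""))
```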
Wrap an external CLI as a provider
When the engine you want to use is only available as a CLI binary
(whisper-cli, whisper.cpp/main, the Vosk CLI, …), the framework's
converse_framework.examples.subprocess_provider shows the pattern.
The class shells out to a configured binary, writes a WAV header
followed by the caller's PCM s16le body to the subprocess's stdin,
and yields the subprocess's stdout as a single final transcript
event.
from converse_framework.examples.subprocess_provider import (
SubprocessASRProvider,
)
provider = SubprocessASRProvider({
"binary": "whisper-cli",
"model": "ggml-small.en.bin",
"command_template": ["-m", "{model}", "-f", "-"],
"timeout_s": 120,
})
# Then plug it into a ProviderBundle:
from converse_framework.registry import build_provider_bundle
bundle = build_provider_bundle(
{
"vad": {"provider": "mock"},
"asr": {"provider": "subprocess"}, # see note below
"llm": {"provider": "mock"},
"tts": {"provider": "mock"},
},
)
SubprocessASRProvider is shipped as a recipe (not a registered
provider) because it is generic: copy the class, point it at your
binary of choice, and register it with register_provider("asr", "my-name", "my.module:MySubprocessProvider"). The example also
ships a fake-echo script (--use-fake-echo) that lets the driver
run end-to-end in CI without installing any real ASR.
Pocket TTS voice listing and configuration
Pocket TTS supports listing available voices and changing voice or
other options at runtime via TTSProvider.configure() (introduced
in protocol v0.2). All variants return a ProviderConfigResult
with changed and requires_reload flags.
List voices without importing the heavy ONNX backend:
from converse_framework.providers.pocket_tts import PocketTTSProvider
provider = PocketTTSProvider({"voice": "azelma"})
voices = provider.list_voices()
for v in voices:
    print(f"{v.id}: {v.name} ({v.gender}, {v.language})")
    # e.g. "azelma: Azelma (Female, en)"
Change voice (clears only the voice cache, preserves the loaded model):
result = provider.configure(voice="anna")
print(result.changed, result.requires_reload)
# True, False — model stays, voice state reloaded
Change quantization or temperature (clears both model and voice, requiring a full reload on next synthesis):
result = provider.configure(quantize=True)
print(result.requires_reload)
# True — both _model and _voice_state cleared
Change max_tokens or coalesce_ms without unloading:
result = provider.configure(max_tokens=250, coalesce_ms=120)
print(result.requires_reload)
# False — values stored, no cache invalidated
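The three cases above differ only in which cached state a changed option invalidates. A toy model of that rule follows. It is an assumption-laden sketch, not Pocket TTS code: `ConfigResult`, `ToyTTS`, and the key groupings are hypothetical and simply restate the behaviour described in this section.

```python
from dataclasses import dataclass


@dataclass
class ConfigResult:
    changed: bool
    requires_reload: bool


class ToyTTS:
    RELOAD_KEYS = {"quantize", "temperature"}  # invalidate model and voice state
    VOICE_KEYS = {"voice"}                     # invalidate voice state only

    def __init__(self, **options):
        self.options = dict(options)
        self.model = None
        self.voice_state = None

    def configure(self, **updates):
        # Only options whose value actually differs count as a change.
        changed = {key for key, value in updates.items() if self.options.get(key) != value}
        self.options.update(updates)
        if changed & self.RELOAD_KEYS:
            self.model = None
            self.voice_state = None
        elif changed & self.VOICE_KEYS:
            self.voice_state = None
        return ConfigResult(bool(changed), bool(changed & self.RELOAD_KEYS))
```

A settings UI can use requires_reload to decide whether to show a "reloading model" indicator before the next synthesis.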
ProviderBundle.replace() and pipeline.update_providers()
(see the Runtime Provider Updates section) work with any TTS
provider including Pocket TTS.
CUDA DLL helper (Windows)
On Windows, NVIDIA wheel packages like nvidia-cublas-cu12 install
DLLs under site-packages/nvidia/<package>/bin/, but C extension
libraries such as CTranslate2 may not search those directories
automatically. The framework ships a CUDA DLL discovery helper at
converse_framework/cuda_utils.py that finds them and adds them to
the DLL search path.
from converse_framework.cuda_utils import (
add_nvidia_dll_directories,
discover_nvidia_dll_dirs,
format_nvidia_dll_diagnostic,
)
# Add all discovered NVIDIA DLL directories to the search path.
# Keep the handles alive for the lifetime of the process.
dll_handles = add_nvidia_dll_directories()
# Print a diagnostic string for debugging:
print(format_nvidia_dll_diagnostic())
The helper searches nvidia/cublas/bin, nvidia/cudnn/bin,
nvidia/cusparse/bin, nvidia/cusolver/bin, and
nvidia/curand/bin inside site-packages. It is Windows-only
(no-op on other platforms) and best-effort — failures are logged,
not raised.
FasterWhisperASRProvider calls add_nvidia_dll_directories()
automatically inside _ensure_model() when the config option
auto_cuda_dll_dirs is True (the default). Disable with:
provider = FasterWhisperASRProvider({
"model": "large-v3-turbo",
"device": "cuda",
"auto_cuda_dll_dirs": False, # disable auto-discovery
})
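For orientation, the discovery step can be approximated with the standard library. This is a sketch rather than the contents of cuda_utils: `find_nvidia_bin_dirs` and `add_dll_dirs` are hypothetical names, and the package list repeats the directories named above. `os.add_dll_directory` exists only on Windows, which is why the second function returns early elsewhere.

```python
import os
import sys
import sysconfig
from pathlib import Path

NVIDIA_PACKAGES = ("cublas", "cudnn", "cusparse", "cusolver", "curand")


def find_nvidia_bin_dirs(site_packages=None):
    """Return the nvidia/<package>/bin directories that exist under site-packages."""
    root = Path(site_packages or sysconfig.get_paths()["purelib"])
    candidates = (root / "nvidia" / name / "bin" for name in NVIDIA_PACKAGES)
    return [path for path in candidates if path.is_dir()]


def add_dll_dirs(site_packages=None):
    """Register discovered directories with the Windows DLL loader; no-op elsewhere."""
    if sys.platform != "win32":
        return []
    handles = []
    for path in find_nvidia_bin_dirs(site_packages):
        try:
            # Keep the returned handle alive; closing it removes the directory again.
            handles.append(os.add_dll_directory(str(path)))
        except OSError:
            pass  # best-effort: skip directories the loader rejects
    return handles
```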
Runtime Provider Updates
The framework supports swapping providers at runtime without recreating the pipeline or collector. This is useful for settings UIs that let users change TTS voice, VAD model, or ASR backend without restarting the conversation.
ProviderBundle.replace()
ProviderBundle.replace() creates a new bundle with specific
providers swapped out by keyword argument, inheriting the rest
from the original bundle. It has no side effects and copies no
providers: the caller owns the lifecycle of the old providers.
from converse_framework import build_provider_bundle, build_provider
bundle = build_provider_bundle({
"vad": {"provider": "mock"},
"asr": {"provider": "mock"},
"llm": {"provider": "mock"},
"tts": {"provider": "mock"},
})
new_tts = build_provider("tts", "mock", {"first_chunk_delay_ms": 500})
new_bundle = bundle.replace(tts=new_tts)
# new_bundle.tts is the new provider; vad/asr/llm are unchanged.
# bundle is unaffected.
Multiple providers can be replaced at once:
replaced = bundle.replace(vad=new_vad, tts=new_tts)
ProviderBundle.unload_replaced()
ProviderBundle.unload_replaced() compares two bundles by
identity and calls unload() on every provider that differs.
Providers that are the same object in both bundles are left untouched.
old_bundle = build_provider_bundle(config)
new_bundle = old_bundle.replace(tts=new_tts)
await ProviderBundle.unload_replaced(old_bundle, new_bundle)
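The combination of a shallow replace and an identity comparison is easy to model with a frozen dataclass. The sketch below uses hypothetical `DemoProvider` and `DemoBundle` classes and a free function `unload_replaced`; it shows the idea only and makes no claim about how ProviderBundle is implemented.

```python
import asyncio
from dataclasses import dataclass, fields, replace


class DemoProvider:
    def __init__(self, label):
        self.label = label
        self.unloaded = False

    async def unload(self):
        self.unloaded = True


@dataclass(frozen=True)
class DemoBundle:
    vad: DemoProvider
    asr: DemoProvider
    llm: DemoProvider
    tts: DemoProvider


async def unload_replaced(old, new):
    """Unload every provider in `old` that is not the same object in `new`."""
    dropped = []
    for field in fields(old):
        before, after = getattr(old, field.name), getattr(new, field.name)
        if before is not after:  # identity, not equality
            await before.unload()
            dropped.append(field.name)
    return dropped
```

Because `dataclasses.replace` reuses the untouched field values, providers that were not swapped remain the very same objects and are therefore skipped.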
SpeechPipeline.update_providers()
SpeechPipeline.update_providers() is the safe way to swap
providers on an active pipeline. It cancels in-flight TTS
synthesis by default (so the next turn picks up the new
provider), swaps the bundle, and emits a providers.updated
event with the serialized statuses of the new bundle.
Conversation history is not cleared.
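import asyncio
from converse_framework import (
    PipelineConfig, QueueEventSink, SpeechPipeline,
    build_provider_bundle,
)
# initial_config and updated_config are provider config dicts like the
# one in Quick Start; the awaits below must run inside a coroutine.
queue = asyncio.Queue()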
pipeline = SpeechPipeline(
providers=build_provider_bundle(initial_config),
sink=QueueEventSink(queue),
config=PipelineConfig(),
)
new_bundle = build_provider_bundle(updated_config)
await pipeline.update_providers(new_bundle, reason="settings_change")
# pipeline.providers is now new_bundle
# TTS was cancelled if it was playing
# providers.updated event was emitted
AudioUtteranceCollector.update_vad_provider()
AudioUtteranceCollector.update_vad_provider() swaps the VAD
provider that drives utterance boundary detection. It raises
RuntimeError if the collector is currently recording an
utterance, to avoid corrupting in-flight VAD state. The
pre-speech buffer is cleared on swap so stale audio from the old
VAD is not passed to the new one.
new_vad = SileroVADProvider({"speech_threshold": 0.6})
collector.update_vad_provider(new_vad)
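Because the swap can be refused mid-utterance, callers need a plan for the RuntimeError. One option is to treat it as "try again later". The sketch below models the guard with a hypothetical `ToyCollector` and a caller-side helper `swap_vad_when_idle`; the attribute names are assumptions, not the collector's real internals.

```python
from collections import deque


class ToyCollector:
    """Hypothetical collector that refuses a VAD swap while recording."""

    def __init__(self, vad_provider, pre_speech_frames=10):
        self.vad_provider = vad_provider
        self.recording = False
        self.pre_speech = deque(maxlen=pre_speech_frames)

    def update_vad_provider(self, new_vad):
        if self.recording:
            raise RuntimeError("cannot swap VAD while an utterance is being recorded")
        self.pre_speech.clear()  # drop audio buffered under the old VAD
        self.vad_provider = new_vad


def swap_vad_when_idle(collector, new_vad):
    """Return True if the swap happened, False if it should be retried later."""
    try:
        collector.update_vad_provider(new_vad)
    except RuntimeError:
        return False
    return True
```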
End-to-end pattern
A typical settings-update flow combines all the pieces:
# 1. Build the new bundle
new_bundle = bundle.replace(tts=new_tts)
# 2. Probe without loading models
probe_results = await new_bundle.probe_statuses()
# 3. On user confirmation, swap in the pipeline
await pipeline.update_providers(new_bundle)
# 4. Swap the VAD in the collector (separate because the
# collector and pipeline are independent components)
if "vad" in updated:
    collector.update_vad_provider(new_bundle.vad)
# 5. Old providers are unloaded in the background by
# pipeline.update_providers().
WebSocket Session Helper
The framework provides a reusable WebSocketSession class that
handles the common message-dispatch loop for browser-based voice apps.
It owns the transport, sink, provider bundle, pipeline, collector, and
frame stats, and routes seven built-in message types without requiring
the application to copy the recipe state machine.
Built-in message types:
- audio.frame: validated PCM frame forwarded to the utterance collector.
- text.turn: text conversation turn.
- conversation.clear: clears per-mode conversation history.
- tts.cancel: cancels in-flight TTS synthesis.
- status.request: emits probe/check/load status (kind selected by the probe/check/load flag in the payload).
- settings.update: delegated to an optional WebSocketSessionHooks callback.
- providers.reload: swaps the provider bundle and optionally reloads the VAD provider, with before/after hooks.
Unknown message types fall through to the optional
on_unknown_message hook or emit a turn.error event.
Configuration and hooks are supplied via:
- WebSocketSessionConfig: provider config, collector config, pipeline config, default mode, auto-probe on reload.
- WebSocketSessionHooks: optional async callbacks for unknown messages, settings updates, status requests, provider reload lifecycle, and event monitoring.
The session class lives at converse_framework.session and is not
imported from the top-level __init__.py to keep lightweight imports
for apps that do not use it.
Usage sketch:
from converse_framework.session import (
WebSocketSession,
WebSocketSessionConfig,
WebSocketSessionHooks,
)
hooks = WebSocketSessionHooks(
    on_settings_update=lambda cfg: print("settings updated", cfg),
    on_event=lambda ev: print("event", ev.type),
)
session = WebSocketSession(
    transport=your_transport,
    config=WebSocketSessionConfig(
        provider_config={"vad": {"provider": "mock"}, ...},
    ),
    hooks=hooks,
)
async for message in your_websocket:
    await session.handle_message(message)
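The routing rule (known type, else the unknown-message hook, else a turn.error event) can be shown with a dictionary of handlers. This is a minimal sketch with a hypothetical `ToySession`; it covers two message types, and the `sketch.*` event names are invented for the example.

```python
import asyncio


class ToySession:
    """Hypothetical dispatcher: known type -> handler, else hook, else turn.error."""

    def __init__(self, on_unknown_message=None):
        self.events = []
        self._on_unknown = on_unknown_message
        self._handlers = {
            "text.turn": self._handle_text_turn,
            "tts.cancel": self._handle_tts_cancel,
        }

    async def handle_message(self, message):
        message_type = message.get("type")
        handler = self._handlers.get(message_type)
        if handler is not None:
            await handler(message)
        elif self._on_unknown is not None:
            await self._on_unknown(message)
        else:
            self.events.append(
                {"type": "turn.error", "message": f"unknown message type: {message_type!r}"}
            )

    async def _handle_text_turn(self, message):
        self.events.append({"type": "sketch.text_turn", "text": message.get("text", "")})

    async def _handle_tts_cancel(self, message):
        self.events.append({"type": "sketch.tts_cancelled"})
```

Keeping the handlers in a dict means an application-specific message type can be added without touching the dispatch logic.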
Examples
Text chat (automated-test covered)
Run a real text conversation against SpeechPipeline using only the
framework's public API. No FastAPI, no WebSocket, no profile files.
python -m converse_framework.examples.text_chat
Try a real provider by passing overrides (the matching extra must be installed):
python -m converse_framework.examples.text_chat \
--provider asr=faster-whisper \
--provider llm=llamacpp \
--provider tts=kokoro
The driver behind the CLI is converse_framework.examples.text_chat.run_text_chat,
which is what the test suite exercises.
Voice chat (manual)
The voice example wires an AudioUtteranceCollector to the pipeline
and feeds it PCM frames. It is a manual example — you supply a
WAV file (or replace the source with a microphone capture) and the
script drives the conversation. It is intentionally not covered by
the automated tests because it depends on platform audio I/O.
# With real providers installed
python -m converse_framework.examples.voice_chat --input path/to/16k_mono.wav
# Or run the same flow with mock providers to validate the path
python -m converse_framework.examples.voice_chat --mock --input path/to/16k_mono.wav
Framework / App Boundary
The framework owns the provider-agnostic speech stack:
- Provider protocols (VADProvider, ASRProvider, LLMProvider, TTSProvider).
- Audio frame parsing, PCM conversion, metering, and silence trimming.
- Event sink API and the wire shape used by the browser UI.
- SpeechPipeline turn orchestration (ASR → LLM → TTS, streaming chunks, cancellation, barge-in).
- AudioUtteranceCollector (VAD-driven utterance collection).
- A lazy provider registry and the optional concrete providers behind extras.
- WebSocketSession (optional reusable message-dispatch loop).
- Browser JS helpers (mic-frame-sender.js, speaker-echo-guard.js, browser-voice-client.js, tts-audio-player.js).
- CUDA DLL discovery helper (cuda_utils).
As of v0.2 the framework also provides safe provider-swap mechanics
(ProviderBundle.replace(), pipeline.update_providers(),
collector.update_vad_provider()), first-class provider
configuration (configure(), list_voices()), and lifecycle
events (provider.loading, provider.loaded, provider.error).
The framework does not own the application. The following stay in the consumer app (e.g. the reference harness):
- FastAPI app, REST endpoints, WebSocket handler.
- Profile files and runtime settings persistence.
- Character card parsing and first-message seeding.
- Companion mode policy and memory store.
- TTS preset manager and provider settings UX.
- The WebSocket transport itself.
Transport boundary
The framework defines a generic Transport protocol and ships a
QueueTransport for tests. The consumer app owns the real
WebSocket transport — WebSocketTransport (or equivalent) lives in
the app, not in the framework, so the framework never takes a hard
dependency on FastAPI. The reference harness exposes
conversational_harness.transport.WebSocketTransport for that
purpose.
Status
The package is in v0.1 pre-release. The test matrix below is the current contract:
| Surface | Tests |
|---|---|
| converse_framework (base) | 126 |
| Reference harness (Reference-Repository-Conversational-AI-Harness) | 91 passed, 1 skipped |
Run them locally:
# Framework (run from the package root)
python -m pytest
# Harness (run from inside the harness directory)
python -m pytest