Voice/text runner for FlowStore v0 specs — Pipecat-based dispatcher with browser audio bridge.
Project description
flowstore-runner
Voice/text runner for FlowStore v0 specs. Pipecat-based dispatcher with self-contained browser test pages for both modes.
Plan and rationale: RUNNER-PLAN.md. Strategy: ../whatsupp2/STRATEGY.md. Editor integration is wired through the /api/chat/* endpoints below.
Status
- Phase 0 ✅ — bare audio loop end-to-end (Pipecat WebRTC + Google STT/LLM/TTS).
- Phase 1 ✅ — v0 dispatcher: spec-driven flow interpretation, three-method routing (
direct/calculation/llm), interrupts withgoto: "RETURN", post-exit assigns + capability dispatch, event emission. - Phase 1.5 ✅ — text I/O adapter:
/api/chat/{session,turn,end}endpoints, BYOK or env-fallback auth,context_varsfor placeholder substitution + initial variable seeding. - Phase 2 (text path) ✅ — editor canvas highlighting from event stream.
FlowNoderings the active flow; edges pulse onexit_path_taken; variables stream into the simulate panel. Wired through HTTP (Phase 1.5's/api/chat/*), not SSE. Voice-mode → editor integration (SSE broker,/api/offerevent subscription) deferred until a voice consumer needs it. - Session event log persistence ✅ — set
UXFLOWS_EVENT_LOG_DIRand every session writes a{session_id}.jsonlfile alongside the live emitter. Per-session file = one file replay. Voice + text both honor it. - Voice-mode parity with text ✅ — silent-take_exit follow-up (tool-less re-inference when Gemini fires a routing tool with no spoken text) and
context_varsonPOST /api/offer(mirrors/api/chat/session).
Setup
- Install uv if not already installed.
- Sync dependencies:
uv sync - Auth — pick one (or both):
- Vertex (default, used by voice mode and text-mode env-fallback): drop your GCP service-account JSON at
data/credentials.json. Thedata/directory is gitignored. Service account needs Vertex AI User (Gemini), Cloud Speech Client (STT), Cloud Text-to-Speech User (TTS). - AI Studio (BYOK, text mode only): get a key from https://aistudio.google.com/app/apikey. No setup file — passed per-session via the API.
- Vertex (default, used by voice mode and text-mode env-fallback): drop your GCP service-account JSON at
- Copy the env template and fill in your project ID:
cp .env.example .env $EDITOR .env
Run
uv run flowstore-runner serve
Then pick the mode you want:
| Mode | URL | What it does |
|---|---|---|
| Voice | http://localhost:8000/ | Real-time voice via WebRTC. Mic in, TTS out. Full dispatcher loop. |
| Text | http://localhost:8000/text.html | Text chat against the dispatcher. No audio, no STT/TTS. Same routing/state/events as voice. |
| Bare audio | http://localhost:8000/audio-test.html | Audio loop with a hardcoded prompt — no spec, no dispatcher. Use to debug voice/STT/VAD when something feels off. |
curl localhost:8000/health returns {"ok": true} once the server has loaded.
Quick sanity test (text, headless)
SPEC=$(cat examples/coffee.json)
curl -s -X POST http://localhost:8000/api/chat/session \
-H "Content-Type: application/json" \
-d "{\"spec\": $SPEC}" | python3 -m json.tool
You should get back a session_id, an opening agent turn, and session_started + flow_entered events. Send a turn:
curl -s -X POST http://localhost:8000/api/chat/turn \
-H "Content-Type: application/json" \
-d '{"session_id":"<paste-id>","user_text":"i'"'"'d like a latte"}' | python3 -m json.tool
API surface
Three endpoints power the text mode (canonical implementations in server/app.py):
POST /api/chat/session— start a session. Body:{spec, api_key?, model?, language?, context_vars?}. Returns{session_id, agent_text, events, ended}.api_keyfalls back to env Vertex creds when omitted;context_varsseeds the variable bag and substitutes{KEY}placeholders in the system prompt (case-insensitive; unfilled placeholders stay as{KEY}literal).POST /api/chat/turn— send a user turn. Body:{session_id, user_text}. Returns{agent_text, events, ended}.POST /api/chat/end— explicit cleanup. Idle sessions get GC'd after 30 min regardless.
Voice uses WebRTC SDP exchange via /api/offer (Pipecat's SmallWebRTCRequestHandler).
Layout
src/flowstore_runner/
cli.py # `flowstore-runner serve`
config.py # env-driven config
spec/
types.py, loader.py # pydantic v0 spec types + per-spec lookup tables
dispatcher/
flow_state.py # stack-based FlowState (interrupts push/pop) + variable bag
expressions.py # calculation engine (simpleeval-based)
methods.py # three-method evaluator (direct / calculation / llm)
routing.py # plan() + resolve() — exit-path eval, interrupt collection
assigns.py # exit-fired variable assignment
capabilities.py # HTTP fire-and-forget + retrieval stub
prompt_builder.py # per-flow system prompt + per-turn tool schema + {KEY} substitution
processor.py # Pipecat seam — PreLLMPlanner, tool handlers, PostLLMResolver, apply_tool_call
session.py # per-connection Session (FlowState + LLMContext + emitter)
events/
schema.py # pydantic event types — runner-side contract
emitter.py # LoggingEmitter / QueueEmitter / BufferingEmitter
server/
app.py # FastAPI: /api/offer (voice), /api/chat/* (text), static /web mount
pipeline.py # Pipecat voice pipeline
pipeline_raw.py # Bare audio pipeline (no spec/dispatcher)
text_session.py # Text adapter — drives dispatcher per-turn without Pipecat audio
text_registry.py # In-memory TextSession registry + idle GC
web/
index.html, client.js # voice debug page (vanilla RTCPeerConnection)
text.html, text.js, text.css # text debug page (vanilla fetch)
audio-test.html, audio-test.js # bare audio debug page
style.css
examples/
coffee.json # self-contained order-bot spec
data/
credentials.json # GCP service-account JSON (gitignored)
tests/ # 95 passing — dispatcher core + text-mode e2e
Why three debug pages
Each page isolates a layer:
- voice page (
/) — full stack, the canonical voice surface. - text page (
/text.html) — same dispatcher as voice, no audio. Use when you want to iterate on a spec's logic without burning STT/TTS minutes, when you're on a flaky mic, or when reviewing flow transitions visually beats listening. Real designers use this through the editor's Simulate panel (../flowstore/components/runtime/SimulatePanel.tsx); the standalone page is for runner-side dev work. - bare audio page (
/audio-test.html) — no spec, no dispatcher, hardcoded prompt. First-pass triage for "is the audio path or the dispatcher misbehaving?" If the issue reproduces here, it's audio.
All three stay around as debug surfaces forever. They don't disappear when the editor integration lands.
Known rough edges
- Walkaway gap — Gemini sometimes responds to a graceful goodbye with text only, no
take_exit_path. Session stays "live" until idle GC. Documented in RUNNER-PLAN §"Live-test follow-up". Click Reset (or close the tab) to recover. - Single user, single host — v0 deployment model. Concurrent sessions across one process should work but are untested at scale.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file flowstore_runner-0.1.0.tar.gz.
File metadata
- Download URL: flowstore_runner-0.1.0.tar.gz
- Upload date:
- Size: 49.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.17
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
09880128820fe3cf59bce4a9ed52a9b655e3fab76ba6636aa8157d978a988fae
|
|
| MD5 |
5ec407441e8e8090b7061e83ef5afa46
|
|
| BLAKE2b-256 |
4f658d0c34497c36371c06f9a1c24275d351adf741956a831f71aea842e1c78b
|
File details
Details for the file flowstore_runner-0.1.0-py3-none-any.whl.
File metadata
- Download URL: flowstore_runner-0.1.0-py3-none-any.whl
- Upload date:
- Size: 63.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.17
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
1f3be03f9524034288dc1a3342539e2bc2f563a5c3e4c4d993a3f62eb4a5a392
|
|
| MD5 |
ba1cc1d6e14a75ab7edc90d306da0c68
|
|
| BLAKE2b-256 |
0a70aa58a16f97bd70d442ed0c1f438d3aa82e61e91ae2b11096bd10ef44eec5
|