Voice-first narrator and control layer for LLM coding CLIs
Listen to Control — listen to your coding agent and step in by voice the moment it matters.
Coding agents now run for minutes at a time: reading files, editing code, calling tools. You either babysit the terminal the whole time, or you look away and miss the moment it goes in the wrong direction. Voice Copilot narrates what the agent is doing in short spoken updates and lets you cut in by voice the second something matters — so you keep control without staring at the output.
Why this exists
Autonomous coding agents changed the loop. A single prompt can trigger minutes of work, and the useful signal — what it understood, what it changed, what risk it noticed — is buried in a fast-scrolling wall of text. Reading every line defeats the point of delegating; ignoring the terminal means you only find out about a bad turn after it happened.
Existing tools do not close this gap. Plain text-to-speech reads logs verbatim, which is noise, not signal. Voice-coding tools dictate prompts into the agent but tell you nothing about what it is doing. Voice Copilot sits in the opposite seat: it is a listening-first companion that watches the agent on your behalf, compresses the stream into short updates you can follow with your eyes off the screen, and keeps a path to interrupt by voice when you need to steer.
How it works
A separate lightweight commentator LLM (Haiku 4.5 by default) watches the coding agent's event stream and produces short summaries of what the agent appears to be doing, why, and what changed. This is parallel analysis and summarization — not verbatim playback of the model's hidden thinking.
coding agent   ──► event stream   ──► commentator LLM ──► spoken update ──► you
(Claude/Codex)     (json / proxy)     (short summary)     (TTS)             (listen · interrupt by voice)
You hear compressed, actionable updates instead of raw text, which lowers cognitive load while keeping the option to inspect the full trace and step in.
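The pipeline above can be sketched in a few lines. This is a minimal illustration, not Voice Copilot's code: it assumes the agent emits newline-delimited JSON events (as stream-JSON modes do), and the event fields (type, tool, path) are hypothetical. The summarize function is a rule-based stand-in for the commentator LLM, and print stands in for TTS.

```python
import json
from collections.abc import Iterable, Iterator


def parse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty NDJSON line, skipping malformed lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue  # partial or non-JSON output: ignore rather than crash


def summarize(batch: list[dict]) -> str:
    """Stand-in for the commentator LLM: compress a batch of events into one line."""
    tools = [e for e in batch if e.get("type") == "tool_use"]
    reads = [e.get("path", "?") for e in tools if e.get("tool") == "read"]
    edits = [e.get("path", "?") for e in tools if e.get("tool") == "edit"]
    parts = []
    if reads:
        parts.append(f"read {len(reads)} file(s)")
    if edits:
        parts.append("edited " + ", ".join(edits))
    return "Agent " + " and ".join(parts) + "." if parts else ""


def narrate(lines: Iterable[str], batch_size: int = 4) -> Iterator[str]:
    """Group events into small batches and yield one short update per batch."""
    batch: list[dict] = []
    for event in parse_events(lines):
        batch.append(event)
        if len(batch) >= batch_size:
            if text := summarize(batch):
                yield text
            batch = []
    if batch and (text := summarize(batch)):
        yield text


if __name__ == "__main__":
    stream = [
        '{"type": "tool_use", "tool": "read", "path": "tests/test_auth.py"}',
        '{"type": "tool_use", "tool": "read", "path": "src/auth.py"}',
        "not json",
        '{"type": "tool_use", "tool": "edit", "path": "src/auth.py"}',
    ]
    for update in narrate(stream):
        print(update)  # a TTS provider would speak this instead
```

Fixed-size batching keeps the sketch short; a real commentator would more likely batch by time window and by how much has changed since the last spoken update.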
Quickstart (~3 minutes to first narration)
Goal: hear Voice Copilot narrate a real Claude Code session. You need Python 3.11+ and an Anthropic API key.
1. Install (cloud-light defaults, no local models required):
pipx install voice-copilot # or: uv tool install voice-copilot
2. Add your key. This opens settings in the browser — paste your
ANTHROPIC_API_KEY (stored in your OS keychain, not in a file):
voice-copilot serve
Expected: a tab opens at http://127.0.0.1:8765 with a settings page.
3. Wrap a real agent run:
voice-copilot run claude -p "fix the failing tests"
Expected: the popup shows a live trace, and within a few seconds you hear a short spoken summary of what Claude is doing. Click once in the popup if the browser blocks autoplay.
4. Talk back. Hold Alt+Space, ask a question or give a correction, release.
It goes to speech-to-text and into the agent as a side-question, a queued next
message, or the clipboard (depending on CLI support).
That's the full loop: listen → understand → interrupt by voice.
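Step 4 picks a delivery path based on what the wrapped CLI supports. A minimal sketch of that choice, assuming the preference order is the order listed (native side-question, then queued message, then clipboard). CliCapabilities, Route and route_transcript are hypothetical names, not part of Voice Copilot's API.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    SIDE_QUESTION = "side_question"    # asked without derailing the current turn
    QUEUED_MESSAGE = "queued_message"  # delivered once the current turn ends
    CLIPBOARD = "clipboard"            # last resort: the user pastes it manually


@dataclass(frozen=True)
class CliCapabilities:
    supports_side_question: bool = False
    supports_message_queue: bool = False


def route_transcript(caps: CliCapabilities) -> Route:
    """Pick the richest delivery path the wrapped CLI supports."""
    if caps.supports_side_question:
        return Route.SIDE_QUESTION
    if caps.supports_message_queue:
        return Route.QUEUED_MESSAGE
    return Route.CLIPBOARD


if __name__ == "__main__":
    print(route_transcript(CliCapabilities(supports_message_queue=True)))
```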
Who it's for
Vibe coders building by feel, with the agent doing most of the typing. Reading a fast wall of diffs and tool calls breaks the flow and the fun. Voice Copilot keeps you in the creative loop: you hear what the agent decided and changed in plain language, stay aware of the direction, and jump in by voice the moment it drifts — no need to parse the terminal to stay in control.
Professional engineers running long, autonomous agent sessions (Claude Code, Codex) on real codebases. The risk isn't typing speed, it's a confident wrong turn buried in minutes of output. Voice Copilot surfaces the signal — root cause, risk, next step — so you keep situational awareness while doing something else, and interrupt early instead of reviewing a large bad diff after the fact. Listening also lets you supervise more than one session without staring at every token.
Multitaskers and reviewers who delegate work and need to know when to step in, not read everything. Narration is the ambient channel: glance at the trace only when an update tells you it matters.
CLI authors who want their tool to expose a clean event stream for companion narration (see the integration RFC below).
Status: 0.0.3 alpha. First public alpha, aimed at advanced users comfortable testing CLI workflows and sharing feedback. Created by Volodymyr Moskvin, Conus Vision. We are open to collaboration — info@conus.vision.
What it does
- Wraps an LLM coding CLI (Claude Code, Codex CLI — more to come) and listens to its event stream in real time.
- A small commentator LLM (Haiku 4.5 by default) summarises decisions, file edits and reasoning in short human-voice lines.
- Keeps listening as the primary experience: hear what matters, read the trace when useful, and interrupt only when needed.
- A browser popup on localhost exposes Play/Pause/Mute/Speak/Interrupt buttons and settings.
- Push-to-talk (default Alt+Space): your question goes to STT, then into the running agent as a native side-question, a queued next message, or the clipboard (depending on CLI capability).
- Pause the CLI while you talk (Alt+P or auto-on-speak): the subprocess is suspended via psutil, so there are no races with the agent.
- Works in English, Spanish, French, Ukrainian and Russian.
- Plug-in providers for TTS, STT and commentator LLM — run fully local or use cloud APIs.
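Pausing via psutil can be sketched as a context manager that suspends the child CLI (and its own children) while you talk and always resumes it afterwards. This is an illustration under assumptions, not the project's implementation: paused is a hypothetical helper, and suspending the whole process tree is a design choice made here. psutil's Process.suspend() sends SIGSTOP on POSIX and suspends threads on Windows.

```python
import contextlib
from collections.abc import Iterator

import psutil


@contextlib.contextmanager
def paused(pid: int) -> Iterator[None]:
    """Suspend a process and its children for the duration of the block, then resume."""
    proc = psutil.Process(pid)
    suspended: list[psutil.Process] = []
    try:
        for p in [proc, *proc.children(recursive=True)]:
            # A child may exit between listing and suspending; skip it quietly.
            with contextlib.suppress(psutil.NoSuchProcess):
                p.suspend()
                suspended.append(p)
        yield
    finally:
        # Resume everything we stopped, even if the body raised.
        for p in suspended:
            with contextlib.suppress(psutil.NoSuchProcess):
                p.resume()
```

Usage would look like `with paused(child.pid): ...` around recording and transcription, so the agent cannot emit new output while you speak.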
What Voice Copilot is not
- Not a prompt dictation app or voice keyboard
- Not a replacement for Claude Code, Codex, Gemini CLI, or other coding agents
- Not just text-to-speech for raw terminal logs
- Not direct verbatim reading of the model's hidden thinking or full answer
- Not another chat UI you need to stare at all day
Install options
The Quickstart above uses the cloud-light default. To run models locally instead, install the matching extra:
# light default: cloud STT/TTS
pipx install voice-copilot
# + local TTS (Silero / Piper)
pipx install "voice-copilot[local-tts]"
# + local STT (faster-whisper)
pipx install "voice-copilot[local-stt]"
# everything local
pipx install "voice-copilot[all]"
Or with uv:
uv tool install voice-copilot
uvx voice-copilot run claude -- -p "refactor the auth module"
Codex works the same way: voice-copilot run codex -p "explain what this repo does".
Narrate any CLI via proxy mode
voice-copilot run <target> only supports claude and codex. For everything
else (aider, opencode, Cline, GitHub Copilot CLI, or any other tool that talks to OpenAI or Anthropic),
run the proxy as a standalone service and point your CLI's BASE_URL at it:
voice-copilot proxy
# → prints ANTHROPIC_BASE_URL=http://127.0.0.1:8766/anthropic
#          OPENAI_BASE_URL=http://127.0.0.1:8766/openai/v1
#          ...and OpenRouter / Groq / Mistral / Ollama / Gemini
# in another terminal:
ANTHROPIC_BASE_URL=http://127.0.0.1:8766/anthropic \
aider --model anthropic/claude-3-5-sonnet-20241022
The popup shows one entry per connected client (distinguished by the
Authorization and User-Agent headers). Use the dropdown in the header to
choose which one to narrate; the others keep running silently.
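Per-client tracking can be sketched as follows. This is a hypothetical illustration: client_key and ClientRegistry are invented names, hashing the two headers is a choice made here so the raw bearer token never becomes an identifier, and narrating the first client by default is an assumption.

```python
import hashlib
from collections.abc import Mapping


def client_key(headers: Mapping[str, str]) -> str:
    """Derive a stable, non-secret client ID from Authorization + User-Agent."""
    lowered = {k.lower(): v for k, v in headers.items()}
    material = f"{lowered.get('authorization', '')}\n{lowered.get('user-agent', '')}"
    # Hash so the raw bearer token is never stored as a key or shown in the UI.
    return hashlib.sha256(material.encode("utf-8")).hexdigest()[:12]


class ClientRegistry:
    """Tracks connected clients and the single one currently narrated."""

    def __init__(self) -> None:
        self.labels: dict[str, str] = {}  # client key -> dropdown label
        self.narrated: str | None = None

    def observe(self, headers: Mapping[str, str]) -> str:
        key = client_key(headers)
        if key not in self.labels:
            lowered = {k.lower(): v for k, v in headers.items()}
            self.labels[key] = lowered.get("user-agent", "unknown client")
            if self.narrated is None:
                self.narrated = key  # assumption: first client is narrated by default
        return key

    def select(self, key: str) -> None:
        """Called when the user picks a client from the dropdown."""
        if key not in self.labels:
            raise KeyError(key)
        self.narrated = key

    def should_narrate(self, key: str) -> bool:
        # Traffic from other clients is still forwarded, just not narrated.
        return key == self.narrated
```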
Supported upstream providers:
| Provider | Env var | Upstream |
|---|---|---|
| Anthropic | ANTHROPIC_BASE_URL | api.anthropic.com |
| OpenAI | OPENAI_BASE_URL | api.openai.com |
| OpenRouter | OPENROUTER_BASE_URL | openrouter.ai/api |
| Groq | GROQ_BASE_URL | api.groq.com/openai |
| Mistral | MISTRAL_BASE_URL | api.mistral.ai |
| Ollama | OLLAMA_BASE_URL | 127.0.0.1:11434 (local) |
| Gemini | GEMINI_BASE_URL | generativelanguage.googleapis.com (passthrough) |
OAuth-authenticated CLIs (Claude Code subscription, Codex login flow) work out of the box — we only see the bearer token on the wire and forward it. The OAuth browser round-trip happens on different domains that we don't intercept.
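The routing half of the proxy amounts to mapping a local path prefix to an upstream base URL. A minimal sketch follows. Only the /anthropic and /openai prefixes appear in the proxy's printed output; the other prefix names and the https/http schemes are assumptions, and UPSTREAMS and upstream_url are hypothetical names. A real proxy would also forward headers (including the bearer token) unchanged and stream response bodies back.

```python
from urllib.parse import urlsplit

# Hypothetical prefix -> upstream map mirroring the table above.
UPSTREAMS: dict[str, str] = {
    "anthropic": "https://api.anthropic.com",
    "openai": "https://api.openai.com",
    "openrouter": "https://openrouter.ai/api",
    "groq": "https://api.groq.com/openai",
    "mistral": "https://api.mistral.ai",
    "ollama": "http://127.0.0.1:11434",
    "gemini": "https://generativelanguage.googleapis.com",
}


def upstream_url(path: str) -> str:
    """Map a local proxy path such as /anthropic/v1/messages to its upstream URL."""
    parts = urlsplit(path)
    prefix, _, rest = parts.path.lstrip("/").partition("/")
    try:
        base = UPSTREAMS[prefix]
    except KeyError:
        raise ValueError(f"unknown provider prefix: {prefix!r}") from None
    url = f"{base}/{rest}" if rest else base
    return f"{url}?{parts.query}" if parts.query else url
```

With OPENAI_BASE_URL set to http://127.0.0.1:8766/openai/v1, a client request to /openai/v1/chat/completions maps to https://api.openai.com/v1/chat/completions under this scheme.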
Hotkeys
| Action | Default combo | Notes |
|---|---|---|
| Push-to-talk | Alt+Space | Hold to record, release to send to STT. |
| Interrupt (pause & listen) | Alt+Shift+Space | Suspends the CLI process. |
| Pause / resume toggle | Alt+P | Manual pause of the child CLI. |
| Mute TTS | Alt+M | Stops narration without affecting the agent. |
All four are rebindable on the settings page.
Providers
Every layer is pluggable. Defaults are cloud-light, so pipx install voice-copilot
works out of the box.
| Layer | Default (light) | Local (extra) | Premium cloud | Secret name |
|---|---|---|---|---|
| TTS | edge-tts | silero, piper | elevenlabs, openai | ELEVENLABS_API_KEY, OPENAI_API_KEY |
| STT | openai-whisper-api | faster-whisper | deepgram | OPENAI_API_KEY, DEEPGRAM_API_KEY |
| LLM | anthropic (Haiku) | openai-compat (Ollama) | openai | ANTHROPIC_API_KEY, OPENAI_API_KEY, OPENAI_COMPAT_API_KEY |
Switch via the Settings page or by editing ~/.voice-copilot/config.yaml.
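One way to make each layer pluggable is a registry keyed by layer and provider name, resolved from config. This is a sketch under assumptions: the flat {"tts": "piper"} config shape, register, build and the stub classes are all hypothetical, not the real config.yaml schema or plugin API. An unknown name raises immediately instead of silently picking another provider.

```python
from collections.abc import Callable
from typing import Any

Factory = Callable[[], Any]
REGISTRY: dict[str, dict[str, Factory]] = {}  # layer -> provider name -> factory


def register(layer: str, name: str) -> Callable[[Factory], Factory]:
    """Class/function decorator that records a provider factory under layer/name."""
    def decorator(factory: Factory) -> Factory:
        REGISTRY.setdefault(layer, {})[name] = factory
        return factory
    return decorator


@register("tts", "edge-tts")
class EdgeTTSStub:
    def speak(self, text: str) -> str:
        return f"[edge-tts] {text}"


@register("tts", "piper")
class PiperStub:
    def speak(self, text: str) -> str:
        return f"[piper] {text}"


def build(layer: str, config: dict[str, str]) -> Any:
    """Instantiate the configured provider for a layer; no fallback on a bad name."""
    name = config[layer]
    providers = REGISTRY.get(layer, {})
    if name not in providers:
        known = ", ".join(sorted(providers)) or "none"
        raise LookupError(f"unknown {layer} provider {name!r} (known: {known})")
    return providers[name]()


if __name__ == "__main__":
    tts = build("tts", {"tts": "piper"})
    print(tts.speak("Tests are green."))
```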
Configuration
- ~/.voice-copilot/config.yaml: edited by hand or via the settings page.
- Secrets live in the OS keychain (Windows Credential Manager, macOS Keychain, or Linux Secret Service) or in a .env file next to where you run voice-copilot.
- No fallbacks between providers: if the configured one fails, the error surfaces in the popup and narration stops. Fail loud, not silently.
Interception strategies
- Stream-JSON mode of the target CLI (Claude Code, Codex): the default when available. Use voice-copilot run claude / run codex.
- HTTP reverse proxy (voice-copilot proxy, or run … --proxy): routes provider BASE_URLs through us so we can narrate thinking blocks even when the underlying TUI hides them. Works with any CLI that respects *_BASE_URL env vars. No CA certs, no TLS interception: the client just talks plain HTTP to localhost. When --proxy runs alongside a stream-JSON adapter, the adapter suppresses duplicate LLM events so the proxy is the single source of truth.
- PTY fallback: wraps any binary. Lower fidelity, last resort.
See docs/architecture.md.
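The strategy choice and the duplicate suppression described above can be sketched as two small functions. Assumptions: PTY is used only when neither a stream-JSON adapter nor the proxy applies, and the event kinds treated as "LLM events" (thinking, assistant_text) are hypothetical labels, not the project's event schema.

```python
from collections.abc import Iterable, Iterator
from enum import Enum


class Source(Enum):
    STREAM_JSON = "stream-json"  # structured events from the CLI itself
    PROXY = "proxy"              # LLM traffic observed via *_BASE_URL
    PTY = "pty"                  # raw terminal output, last resort


# Assumption: targets that have a stream-JSON adapter, per the list above.
STREAM_JSON_TARGETS = frozenset({"claude", "codex"})
# Assumption: event kinds the proxy also observes, so the adapter drops them.
LLM_EVENT_KINDS = frozenset({"thinking", "assistant_text"})


def select_sources(target: str, use_proxy: bool) -> list[Source]:
    """Prefer structured sources; fall back to PTY only when nothing else applies."""
    sources: list[Source] = []
    if target in STREAM_JSON_TARGETS:
        sources.append(Source.STREAM_JSON)
    if use_proxy:
        sources.append(Source.PROXY)
    return sources or [Source.PTY]


def adapter_events(events: Iterable[dict], proxy_active: bool) -> Iterator[dict]:
    """Pass adapter events through, dropping LLM events the proxy already reports."""
    for event in events:
        if proxy_active and event.get("kind") in LLM_EVENT_KINDS:
            continue
        yield event
```

Keeping tool events from the adapter while the proxy owns model output means each event reaches the commentator exactly once.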
Roadmap
Voice Copilot is in its first alpha. The goal right now is to validate the core idea with advanced users. Planned work:
- improve narration quality, timing, and signal-to-noise ratio
- stabilize multi-session workflows and session switching
- expand structured integrations with more coding CLIs
- refine the companion interface together with CLI authors so it matches real integration needs
- improve advanced configuration, onboarding, and developer documentation
- explore richer host UIs such as VS Code while keeping the core lightweight
The core is open-source under MIT to maximize adoption, experimentation, and community contributions. The CLI companion integration RFC lives in docs/cli-companion-interface.md, with the normative schema in docs/schemas/cli-companion-interface.schema.json.
Development
git clone https://github.com/conus-vision/voice-copilot
cd voice-copilot
uv sync --extra dev
uv run voice-copilot serve --demo # emit synthetic events, exercise the UI
uv run ruff check .
uv run mypy src/voice_copilot
uv run pytest
Troubleshooting
- No voice output: open DevTools in the popup and check that the audio element is receiving audio_header/bytes frames. Most browsers need a user click before autoplay unlocks; click anywhere in the popup once.
- Mic denied: the popup only works over http://127.0.0.1:<port> (a localhost-trusted origin). Don't serve it from a LAN IP without HTTPS.
- keyring says no backend on headless Linux: pip install keyrings.alt (for a pipx install, pipx inject voice-copilot keyrings.alt) or set env vars instead.
- Commentator silent: check commentator.min_importance on the settings page; set it to low to hear everything while debugging.
Get involved
If Voice Copilot is useful to you, here's how to help it grow:
- ⭐ Star this repo — it's the clearest signal that the listening-first approach resonates, and it helps other engineers find the project.
- 🗣️ Tell us how you use it — open an issue or email info@conus.vision. Real workflows shape the roadmap.
- 🔌 Building a coding CLI? Let's design the companion interface together (see the integration RFC).
Contact: info@conus.vision · conus.vision
License
This repository is open-source under the MIT license.
That means individuals, teams, companies, and other open-source projects can use, modify, fork, and redistribute the core with minimal friction.
See LICENSE and LICENSING.md.
File details
Details for the file voice_copilot-0.0.3.tar.gz.
File metadata
- Download URL: voice_copilot-0.0.3.tar.gz
- Upload date:
- Size: 99.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.5.24
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 436110e8ff8b2805b478c503b0d1b6cb2af51a712394a2567793991e820c4d6f |
| MD5 | fadb473c39c7011036653eb05ec15039 |
| BLAKE2b-256 | 998725063187307d9faf4d5feab9fb9bf30c523ed53b456712643a7a7eff169c |
File details
Details for the file voice_copilot-0.0.3-py3-none-any.whl.
File metadata
- Download URL: voice_copilot-0.0.3-py3-none-any.whl
- Upload date:
- Size: 130.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.5.24
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8d28403557cd56d804443e99c29cb43dc63fa3db3ad36a25eae544166233581f |
| MD5 | b94eac8b2a06af2c67737a77d66af352 |
| BLAKE2b-256 | 388e3dbfec5daa43c7d2915feab2866437c328e11bc35de7751663fed347f643 |