Skip to main content

Voice-first narrator and control layer for LLM coding CLIs

Project description

Voice Copilot

Voice Copilot is a listening-first companion for LLM coding CLIs. It uses a separate commentator LLM to analyze what the coding agent is doing, summarize the important parts of its reasoning and response, and let you intervene at the right moment.

Instead of constantly reading terminal output, you can listen to short spoken updates, glance at the trace when needed, and step in by voice when the agent needs correction. The main value is lower cognitive load and better situational awareness while the coding agent works.

Status: 0.0.1 alpha. This is the first public alpha release. Created by Volodymyr Moskvin, Conus Vision. Contact: info@conus.vision | https://conus.vision We are open to collaboration and to growing the project together.

Project direction

Alpha status and development plans

Voice Copilot is currently in its first alpha release.

The current goal is to validate the core idea with advanced users: a separate commentator LLM that helps you follow what a coding agent is doing by listening first, reading when needed, and intervening at the right moment.

Planned areas of development:

  • improve narration quality, timing, and signal-to-noise ratio
  • stabilize multi-session workflows and session switching
  • expand structured integrations with more coding CLIs
  • refine the companion interface together with CLI authors so it matches real integration needs
  • improve advanced configuration, onboarding, and developer documentation
  • explore richer host UIs such as VS Code while keeping the core workflow lightweight

What Voice Copilot Is Not

  • Not a prompt dictation app or voice keyboard
  • Not a replacement for Claude Code, Codex, Gemini CLI, or other coding agents
  • Not just text-to-speech for raw terminal logs
  • Not direct verbatim reading of the model's hidden thinking or full answer
  • Not another chat UI you need to stare at all day

How Narration Works

Voice Copilot turns a coding agent's event stream into short spoken updates.

A separate lightweight commentator model watches the coding agent's event stream and produces short summaries of what the agent appears to be doing, why it is doing it, and what changed. In other words, this is parallel analysis and summarization, not direct playback of the coding model's thinking or answer.

That is what makes the listening experience useful: you hear compressed, actionable updates instead of a long stream of raw text, which reduces cognitive load while preserving the option to inspect the trace and intervene.

What it does

  • Wraps an LLM coding CLI (Claude Code, Codex CLI — more to come) and listens to its event stream in real time.
  • A small commentator LLM (Haiku 4.5 by default) summarises decisions, file edits and reasoning in short human-voice lines.
  • Keeps listening as the primary experience: hear what matters, read the trace when useful, and interrupt only when needed.
  • A browser popup on localhost exposes Play/Pause/Mute/Speak/Interrupt buttons and settings.
  • Push-to-talk (default Alt+Space): your question goes to STT → then into the running agent as a native side-question, a queued next message, or to the clipboard (depending on CLI capability).
  • Pause the CLI while you talk (Alt+P or auto-on-speak): the subprocess is suspended via psutil, no races with the agent.
  • Works in English, Spanish, French, Ukrainian and Russian.
  • Plug-in providers for TTS, STT and commentator LLM — run fully local or use cloud APIs.

Install

# light default: cloud STT/TTS
pipx install voice-copilot

# + local TTS (Silero / Piper)
pipx install "voice-copilot[local-tts]"

# + local STT (faster-whisper)
pipx install "voice-copilot[local-stt]"

# everything local
pipx install "voice-copilot[all]"

Or with uv:

uv tool install voice-copilot
uvx voice-copilot run claude -- -p "refactor the auth module"

Quick start

# 1. point voice-copilot at your Anthropic key (keychain-backed)
voice-copilot serve                 # opens settings → paste ANTHROPIC_API_KEY

# 2. wrap Claude Code
voice-copilot run claude -p "fix failing tests"

# 3. or wrap Codex CLI
voice-copilot run codex  -p "explain what this repo does"

A browser tab opens at http://127.0.0.1:8765. Hold Alt+Space to speak.

Narrate any CLI via proxy mode

voice-copilot run <target> only knows claude and codex. For everything else (aider, opencode, Cline, GitHub Copilot CLI that hits OpenAI/Anthropic), run the proxy as a standalone service and point your CLI's BASE_URL at it:

voice-copilot proxy
# → prints ANTHROPIC_BASE_URL=http://127.0.0.1:8766/anthropic
#          OPENAI_BASE_URL   =http://127.0.0.1:8766/openai/v1
#          ...and OpenRouter / Groq / Mistral / Ollama / Gemini

# in another terminal:
ANTHROPIC_BASE_URL=http://127.0.0.1:8766/anthropic \
  aider --model anthropic/claude-3-5-sonnet-20241022

The popup shows one entry per connected client (seen via distinct Authorization + User-Agent). Pick from the dropdown in the header to choose which one to narrate — the others keep running silently.

Supported upstream providers:

Provider Env var Upstream
Anthropic ANTHROPIC_BASE_URL api.anthropic.com
OpenAI OPENAI_BASE_URL api.openai.com
OpenRouter OPENROUTER_BASE_URL openrouter.ai/api
Groq GROQ_BASE_URL api.groq.com/openai
Mistral MISTRAL_BASE_URL api.mistral.ai
Ollama OLLAMA_BASE_URL 127.0.0.1:11434 (local)
Gemini GEMINI_BASE_URL generativelanguage.googleapis.com (passthrough)

OAuth-authenticated CLIs (Claude Code subscription, Codex login flow) work out of the box — we only see the bearer token on the wire and forward it. The OAuth browser round-trip happens on different domains that we don't intercept.

Hotkeys

Action Default combo Notes
Push-to-talk Alt+Space Hold to record, release to send to STT.
Interrupt (pause & listen) Alt+Shift+Space Suspends the CLI process.
Pause / resume toggle Alt+P Manual pause of the child CLI.
Mute TTS Alt+M Stops narration without affecting the agent.

All four are rebindable on the settings page.

Providers

Every layer is pluggable. Defaults are cloud-light so pipx install voice-copilot works out of the box.

Default (light) Local (extra) Premium cloud Secret name
TTS edge-tts silero, piper elevenlabs, openai ELEVENLABS_API_KEY, OPENAI_API_KEY
STT openai-whisper-api faster-whisper deepgram OPENAI_API_KEY, DEEPGRAM_API_KEY
LLM anthropic (Haiku) openai-compat (Ollama) openai ANTHROPIC_API_KEY, OPENAI_API_KEY, OPENAI_COMPAT_API_KEY

Switch via the Settings page or by editing ~/.voice-copilot/config.yaml.

Configuration

  • ~/.voice-copilot/config.yaml — edited by hand or via the settings page.
  • Secrets live in the OS keychain (Credential Manager / Keychain / Secret Service) or in a .env next to where you run voice-copilot.
  • No fallbacks between providers: if the configured one fails, the error surfaces in the popup and narration stops. Fail loud, not silently.

Interception strategies

  1. Stream-JSON mode of the target CLI (Claude Code, Codex) — default when available. Use voice-copilot run claude / run codex.
  2. HTTP reverse-proxy (voice-copilot proxy, or run … --proxy) — routes provider BASE_URLs through us so we can narrate thinking blocks even when the underlying TUI hides them. Works with any CLI that respects *_BASE_URL env vars. No CA certs, no TLS interception — the client just talks plain HTTP to localhost. When --proxy runs alongside a stream-JSON adapter, the adapter suppresses duplicate LLM events so the proxy is the single source of truth.
  3. PTY fallback — wraps any binary. Lower fidelity, last resort.

See docs/architecture.md.

Development

git clone https://github.com/voice-copilot/voice-copilot
cd voice-copilot
uv sync --extra dev
uv run voice-copilot serve --demo     # emit synthetic events, exercise the UI
uv run ruff check .
uv run mypy src/voice_copilot
uv run pytest

Troubleshooting

  • No voice output — open DevTools in the popup, check the audio element is receiving audio_header/bytes frames. Most browsers need a user click before autoplay unlocks; click anywhere in the popup once.
  • Mic denied — the popup only works over http://127.0.0.1:<port> (a localhost-trusted origin). Don't serve it from a LAN IP without HTTPS.
  • keyring says no backend on headless Linux — pip install keyrings.alt or set env vars instead.
  • Commentator silent — check commentator.min_importance on the settings page; set to low to hear everything while debugging.

Contact

For collaboration, feedback, integrations, and community discussions:

License

This repository is open-source under the MIT license.

That means individuals, teams, companies, and other open-source projects can use, modify, fork, and redistribute the core with minimal friction.

See LICENSE and LICENSING.md.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

voice_copilot-0.0.1.tar.gz (326.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

voice_copilot-0.0.1-py3-none-any.whl (127.5 kB view details)

Uploaded Python 3

File details

Details for the file voice_copilot-0.0.1.tar.gz.

File metadata

  • Download URL: voice_copilot-0.0.1.tar.gz
  • Upload date:
  • Size: 326.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.5.24

File hashes

Hashes for voice_copilot-0.0.1.tar.gz
Algorithm Hash digest
SHA256 2c3446e8d322cbd057f902700d8c125d8aedbcf9b0f6e5307e6da205bacb2e31
MD5 3323f74f17b1b83448566dd12f40e12b
BLAKE2b-256 7ee0e9f8114cc7fd234ee738587454d7a35054f0388e4d50dafa79cb9ffd659f

See more details on using hashes here.

File details

Details for the file voice_copilot-0.0.1-py3-none-any.whl.

File metadata

File hashes

Hashes for voice_copilot-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 58bc5aedaa3451fa8905ecb277c47c8783072ceda3f489dd61e9f38fcb6fb392
MD5 95737321187a231a9495feb4a459fb54
BLAKE2b-256 c930629d3f34aeb0a8cb598209291b3ca044ca788c864eda9e7dd8fda188f786

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page