voice-copilot

Voice-first narrator and control layer for LLM coding CLIs

These details have not been verified by PyPI

Project links

Project description

Voice Copilot

Listen to Control — listen to your coding agent and step in by voice the moment it matters.

Coding agents now run for minutes at a time: reading files, editing code, calling tools. You either babysit the terminal the whole time, or you look away and miss the moment it goes in the wrong direction. Voice Copilot narrates what the agent is doing in short spoken updates and lets you cut in by voice the second something matters — so you keep control without staring at the output.

▶ Watch the 60-second demo · Website · Quickstart

Why this exists

Autonomous coding agents changed the loop. A single prompt can trigger minutes of work, and the useful signal — what it understood, what it changed, what risk it noticed — is buried in a fast-scrolling wall of text. Reading every line defeats the point of delegating; ignoring the terminal means you only find out about a bad turn after it happened.

Existing tools do not close this gap. Plain text-to-speech reads logs verbatim, which is noise, not signal. Voice-coding tools dictate prompts into the agent but tell you nothing about what it is doing. Voice Copilot sits in the opposite seat: it is a listening-first companion that watches the agent on your behalf, compresses the stream into short updates you can follow with your eyes off the screen, and keeps a path to interrupt by voice when you need to steer.

How it works

A separate lightweight commentator LLM (Haiku 4.5 by default) watches the coding agent's event stream and produces short summaries of what the agent appears to be doing, why, and what changed. This is parallel analysis and summarization — not verbatim playback of the model's hidden thinking.

 coding agent  ──►  event stream  ──►  commentator LLM  ──►  spoken update  ──►  you
 (Claude/Codex)     (json / proxy)     (short summary)        (TTS)              (listen · interrupt by voice)

You hear compressed, actionable updates instead of raw text, which lowers cognitive load while keeping the option to inspect the full trace and step in.

Quickstart (~3 minutes to first narration)

Goal: hear Voice Copilot narrate a real Claude Code session. You need Python 3.11+ and an Anthropic API key.

1. Install (cloud-light defaults, no local models required):

pipx install voice-copilot          # or: uv tool install voice-copilot

2. Add your key. This opens settings in the browser — paste your ANTHROPIC_API_KEY (stored in your OS keychain, not in a file):

voice-copilot serve

Expected: a tab opens at http://127.0.0.1:8765 with a settings page.

3. Wrap a real agent run:

voice-copilot run claude -p "fix the failing tests"

Expected: the popup shows a live trace, and within a few seconds you hear a short spoken summary of what Claude is doing. Click once in the popup if the browser blocks autoplay.

4. Talk back. Hold Alt+Space, ask a question or give a correction, release. It goes to speech-to-text and into the agent as a side-question, a queued next message, or the clipboard (depending on CLI support).

That's the full loop: listen → understand → interrupt by voice.

Who it's for

Vibe coders building by feel, with the agent doing most of the typing. Reading a fast wall of diffs and tool calls breaks the flow and the fun. Voice Copilot keeps you in the creative loop: you hear what the agent decided and changed in plain language, stay aware of the direction, and jump in by voice the moment it drifts — no need to parse the terminal to stay in control.

Professional engineers running long, autonomous agent sessions (Claude Code, Codex) on real codebases. The risk isn't typing speed, it's a confident wrong turn buried in minutes of output. Voice Copilot surfaces the signal — root cause, risk, next step — so you keep situational awareness while doing something else, and interrupt early instead of reviewing a large bad diff after the fact. Listening also lets you supervise more than one session without staring at every token.

Multitaskers and reviewers who delegate work and need to know when to step in, not read everything. Narration is the ambient channel: glance at the trace only when an update tells you it matters.

CLI authors who want their tool to expose a clean event stream for companion narration (see the integration RFC below).

Status: 0.0.3 alpha. First public alpha, aimed at advanced users comfortable testing CLI workflows and sharing feedback. Created by Volodymyr Moskvin, Conus Vision. We are open to collaboration — info@conus.vision.

What it does

Wraps an LLM coding CLI (Claude Code, Codex CLI — more to come) and listens to its event stream in real time.
A small commentator LLM (Haiku 4.5 by default) summarises decisions, file edits and reasoning in short human-voice lines.
Keeps listening as the primary experience: hear what matters, read the trace when useful, and interrupt only when needed.
A browser popup on localhost exposes Play/Pause/Mute/Speak/Interrupt buttons and settings.
Push-to-talk (default Alt+Space): your question goes to STT → then into the running agent as a native side-question, a queued next message, or to the clipboard (depending on CLI capability).
Pause the CLI while you talk (Alt+P or auto-on-speak): the subprocess is suspended via psutil, no races with the agent.
Works in English, Spanish, French, Ukrainian and Russian.
Plug-in providers for TTS, STT and commentator LLM — run fully local or use cloud APIs.

What Voice Copilot is not

Not a prompt dictation app or voice keyboard
Not a replacement for Claude Code, Codex, Gemini CLI, or other coding agents
Not just text-to-speech for raw terminal logs
Not direct verbatim reading of the model's hidden thinking or full answer
Not another chat UI you need to stare at all day

Install options

The Quickstart above uses the cloud-light default. To run models locally instead, install the matching extra:

# light default: cloud STT/TTS
pipx install voice-copilot

# + local TTS (Silero / Piper)
pipx install "voice-copilot[local-tts]"

# + local STT (faster-whisper)
pipx install "voice-copilot[local-stt]"

# everything local
pipx install "voice-copilot[all]"

Or with uv:

uv tool install voice-copilot
uvx voice-copilot run claude -- -p "refactor the auth module"

Codex works the same way: voice-copilot run codex -p "explain what this repo does".

Narrate any CLI via proxy mode

voice-copilot run <target> only knows claude and codex. For everything else (aider, opencode, Cline, GitHub Copilot CLI that hits OpenAI/Anthropic), run the proxy as a standalone service and point your CLI's BASE_URL at it:

voice-copilot proxy
# → prints ANTHROPIC_BASE_URL=http://127.0.0.1:8766/anthropic
#          OPENAI_BASE_URL   =http://127.0.0.1:8766/openai/v1
#          ...and OpenRouter / Groq / Mistral / Ollama / Gemini

# in another terminal:
ANTHROPIC_BASE_URL=http://127.0.0.1:8766/anthropic \
  aider --model anthropic/claude-3-5-sonnet-20241022

The popup shows one entry per connected client (seen via distinct Authorization + User-Agent). Pick from the dropdown in the header to choose which one to narrate — the others keep running silently.

Supported upstream providers:

Provider	Env var	Upstream
Anthropic	`ANTHROPIC_BASE_URL`	`api.anthropic.com`
OpenAI	`OPENAI_BASE_URL`	`api.openai.com`
OpenRouter	`OPENROUTER_BASE_URL`	`openrouter.ai/api`
Groq	`GROQ_BASE_URL`	`api.groq.com/openai`
Mistral	`MISTRAL_BASE_URL`	`api.mistral.ai`
Ollama	`OLLAMA_BASE_URL`	`127.0.0.1:11434` (local)
Gemini	`GEMINI_BASE_URL`	`generativelanguage.googleapis.com` (passthrough)

OAuth-authenticated CLIs (Claude Code subscription, Codex login flow) work out of the box — we only see the bearer token on the wire and forward it. The OAuth browser round-trip happens on different domains that we don't intercept.

Hotkeys

Action	Default combo	Notes
Push-to-talk	`Alt+Space`	Hold to record, release to send to STT.
Interrupt (pause & listen)	`Alt+Shift+Space`	Suspends the CLI process.
Pause / resume toggle	`Alt+P`	Manual pause of the child CLI.
Mute TTS	`Alt+M`	Stops narration without affecting the agent.

All four are rebindable on the settings page.

Providers

Every layer is pluggable. Defaults are cloud-light so pipx install voice-copilot works out of the box.

	Default (light)	Local (extra)	Premium cloud	Secret name
TTS	`edge-tts`	`silero`, `piper`	`elevenlabs`, `openai`	`ELEVENLABS_API_KEY`, `OPENAI_API_KEY`
STT	`openai-whisper-api`	`faster-whisper`	`deepgram`	`OPENAI_API_KEY`, `DEEPGRAM_API_KEY`
LLM	`anthropic` (Haiku)	`openai-compat` (Ollama)	`openai`	`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `OPENAI_COMPAT_API_KEY`

Switch via the Settings page or by editing ~/.voice-copilot/config.yaml.

Configuration

~/.voice-copilot/config.yaml — edited by hand or via the settings page.
Secrets live in the OS keychain (Credential Manager / Keychain / Secret Service) or in a .env next to where you run voice-copilot.
No fallbacks between providers: if the configured one fails, the error surfaces in the popup and narration stops. Fail loud, not silently.

Interception strategies

Stream-JSON mode of the target CLI (Claude Code, Codex) — default when available. Use voice-copilot run claude / run codex.
HTTP reverse-proxy (voice-copilot proxy, or run … --proxy) — routes provider BASE_URLs through us so we can narrate thinking blocks even when the underlying TUI hides them. Works with any CLI that respects *_BASE_URL env vars. No CA certs, no TLS interception — the client just talks plain HTTP to localhost. When --proxy runs alongside a stream-JSON adapter, the adapter suppresses duplicate LLM events so the proxy is the single source of truth.
PTY fallback — wraps any binary. Lower fidelity, last resort.

See docs/architecture.md.

Roadmap

Voice Copilot is in its first alpha. The goal right now is to validate the core idea with advanced users. Planned work:

improve narration quality, timing, and signal-to-noise ratio
stabilize multi-session workflows and session switching
expand structured integrations with more coding CLIs
refine the companion interface together with CLI authors so it matches real integration needs
improve advanced configuration, onboarding, and developer documentation
explore richer host UIs such as VS Code while keeping the core lightweight

The core is open-source under MIT to maximize adoption, experimentation, and community contributions. The CLI companion integration RFC lives in docs/cli-companion-interface.md, with the normative schema in docs/schemas/cli-companion-interface.schema.json.

Development

git clone https://github.com/conus-vision/voice-copilot
cd voice-copilot
uv sync --extra dev
uv run voice-copilot serve --demo     # emit synthetic events, exercise the UI
uv run ruff check .
uv run mypy src/voice_copilot
uv run pytest

Troubleshooting

No voice output — open DevTools in the popup, check the audio element is receiving audio_header/bytes frames. Most browsers need a user click before autoplay unlocks; click anywhere in the popup once.
Mic denied — the popup only works over http://127.0.0.1:<port> (a localhost-trusted origin). Don't serve it from a LAN IP without HTTPS.
keyring says no backend on headless Linux — pip install keyrings.alt or set env vars instead.
Commentator silent — check commentator.min_importance on the settings page; set to low to hear everything while debugging.

Get involved

If Voice Copilot is useful to you, here's how to help it grow:

⭐ Star this repo — it's the clearest signal that the listening-first approach resonates, and it helps other engineers find the project.
🗣️ Tell us how you use it — open an issue or email info@conus.vision. Real workflows shape the roadmap.
🔌 Building a coding CLI? Let's design the companion interface together (see the integration RFC).

Contact: info@conus.vision · conus.vision

License

This repository is open-source under the MIT license.

That means individuals, teams, companies, and other open-source projects can use, modify, fork, and redistribute the core with minimal friction.

See LICENSE and LICENSING.md.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.0.3

Jun 23, 2026

0.0.2 yanked

Jun 23, 2026

Reason this release was yanked:

Superseded by 0.0.3 (stale README/demo)

0.0.1

Apr 27, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

voice_copilot-0.0.3.tar.gz (99.5 kB view details)

Uploaded Jun 23, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

voice_copilot-0.0.3-py3-none-any.whl (130.9 kB view details)

Uploaded Jun 23, 2026 Python 3

File details

Details for the file voice_copilot-0.0.3.tar.gz.

File metadata

Download URL: voice_copilot-0.0.3.tar.gz
Upload date: Jun 23, 2026
Size: 99.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.5.24

File hashes

Hashes for voice_copilot-0.0.3.tar.gz
Algorithm	Hash digest
SHA256	`436110e8ff8b2805b478c503b0d1b6cb2af51a712394a2567793991e820c4d6f`
MD5	`fadb473c39c7011036653eb05ec15039`
BLAKE2b-256	`998725063187307d9faf4d5feab9fb9bf30c523ed53b456712643a7a7eff169c`

See more details on using hashes here.

File details

Details for the file voice_copilot-0.0.3-py3-none-any.whl.

File metadata

Download URL: voice_copilot-0.0.3-py3-none-any.whl
Upload date: Jun 23, 2026
Size: 130.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.5.24

File hashes

Hashes for voice_copilot-0.0.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8d28403557cd56d804443e99c29cb43dc63fa3db3ad36a25eae544166233581f`
MD5	`b94eac8b2a06af2c67737a77d66af352`
BLAKE2b-256	`388e3dbfec5daa43c7d2915feab2866437c328e11bc35de7751663fed347f643`

See more details on using hashes here.

voice-copilot 0.0.3

Navigation

Verified details

Maintainers

Meta

Unverified details

Project links

Meta

Classifiers

Project description

Why this exists

How it works

Quickstart (~3 minutes to first narration)

Who it's for

What it does

What Voice Copilot is not

Install options

Narrate any CLI via proxy mode

Hotkeys

Providers

Configuration

Interception strategies

Roadmap

Development

Troubleshooting

Get involved

License

Project details

Verified details

Maintainers

Meta

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes