Voice-first narrator and control layer for LLM coding CLIs

Voice Copilot

Voice Copilot is a listening-first companion for LLM coding CLIs. It uses a separate commentator LLM to analyze what the coding agent is doing, summarize the important parts of its reasoning and response, and let you intervene at the right moment.

Instead of constantly reading terminal output, you can listen to short spoken updates, glance at the trace when needed, and step in by voice when the agent needs correction. The main value is lower cognitive load and better situational awareness while the coding agent works.

Status: 0.0.1 alpha. This is the first public alpha release. Created by Volodymyr Moskvin, Conus Vision. We are open to collaboration and to growing the project together. Contact: info@conus.vision | https://conus.vision

Project direction

The Voice Copilot core is open-source under MIT to maximize adoption, experimentation, and community contributions.
Launch messaging drafts for video and social posts live in docs/launch/README.md.
The CLI companion integration RFC lives in docs/cli-companion-interface.md, with the normative schema in docs/schemas/cli-companion-interface.schema.json.

Alpha status and development plans

Voice Copilot is currently in its first alpha release.

The current goal is to validate the core idea with advanced users: a separate commentator LLM that helps you follow what a coding agent is doing by listening first, reading when needed, and intervening at the right moment.

Planned areas of development:

improve narration quality, timing, and signal-to-noise ratio
stabilize multi-session workflows and session switching
expand structured integrations with more coding CLIs
refine the companion interface together with CLI authors so it matches real integration needs
improve advanced configuration, onboarding, and developer documentation
explore richer host UIs such as VS Code while keeping the core workflow lightweight

What Voice Copilot Is Not

Not a prompt dictation app or voice keyboard
Not a replacement for Claude Code, Codex, Gemini CLI, or other coding agents
Not just text-to-speech for raw terminal logs
Not direct verbatim reading of the model's hidden thinking or full answer
Not another chat UI you need to stare at all day

How Narration Works

Voice Copilot turns a coding agent's event stream into short spoken updates.

A separate lightweight commentator model watches the coding agent's event stream and produces short summaries of what the agent appears to be doing, why it is doing it, and what changed. In other words, this is parallel analysis and summarization, not direct playback of the coding model's thinking or answer.

That is what makes the listening experience useful: you hear compressed, actionable updates instead of a long stream of raw text, which reduces cognitive load while preserving the option to inspect the trace and intervene.
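As a rough illustration of "parallel summarization, not playback", here is a minimal sketch. The `narrate` function, the batching by `batch_size`, and the stub summarizer are all hypothetical; the real commentator is an LLM call and its batching rules are not described in this README.

```python
from typing import Callable, Iterable, Iterator


def narrate(
    events: Iterable[str],
    summarize: Callable[[list[str]], str],
    batch_size: int = 3,  # assumed window size, purely illustrative
) -> Iterator[str]:
    """Yield one short line per batch of raw agent events."""
    batch: list[str] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield summarize(batch)
            batch = []
    if batch:  # flush whatever is left when the stream ends
        yield summarize(batch)


def stub_summarizer(batch: list[str]) -> str:
    """Stand-in for the commentator LLM: reports only the last event."""
    return f"{len(batch)} events, latest: {batch[-1]}"
```

The point of the shape is that the listener receives one line per batch rather than every raw event.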

What it does

Wraps an LLM coding CLI (Claude Code, Codex CLI — more to come) and listens to its event stream in real time.
A small commentator LLM (Haiku 4.5 by default) summarises decisions, file edits and reasoning in short human-voice lines.
Keeps listening as the primary experience: hear what matters, read the trace when useful, and interrupt only when needed.
A browser popup on localhost exposes Play/Pause/Mute/Speak/Interrupt buttons and settings.
Push-to-talk (default Alt+Space): your speech is transcribed by STT, then delivered to the running agent as a native side-question or a queued next message, or copied to the clipboard, depending on what the CLI supports.
Pause the CLI while you talk (Alt+P or auto-on-speak): the subprocess is suspended via psutil, so the agent cannot race ahead while you speak.
Works in English, Spanish, French, Ukrainian and Russian.
Plug-in providers for TTS, STT and commentator LLM — run fully local or use cloud APIs.
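The pause behaviour in the list above can be sketched as follows. The README says suspension goes through psutil; to stay dependency-free, this sketch sends the POSIX stop and continue signals directly, which is what psutil's `Process.suspend()` and `Process.resume()` do on POSIX systems. The helper names and the sleeping child process are hypothetical stand-ins, and the sketch does not run on Windows.

```python
import os
import signal
import subprocess
import sys


def suspend(proc: subprocess.Popen) -> None:
    """Freeze the child; it stays alive but gets no CPU time."""
    os.kill(proc.pid, signal.SIGSTOP)


def resume(proc: subprocess.Popen) -> None:
    """Let a previously suspended child continue where it stopped."""
    os.kill(proc.pid, signal.SIGCONT)


def demo() -> tuple[bool, bool]:
    # Hypothetical stand-in for the wrapped coding CLI.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    try:
        suspend(child)
        # WUNTRACED makes waitpid report a stopped (not exited) child.
        _, status = os.waitpid(child.pid, os.WUNTRACED)
        stopped = os.WIFSTOPPED(status)

        resume(child)
        # WCONTINUED makes waitpid report the SIGCONT transition.
        _, status = os.waitpid(child.pid, os.WCONTINUED)
        continued = os.WIFCONTINUED(status)
        return stopped, continued
    finally:
        child.kill()
        child.wait()
```

Because the child is frozen rather than killed, it keeps its state and picks up exactly where it left off once resumed.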

Install

# light default: cloud STT/TTS
pipx install voice-copilot

# + local TTS (Silero / Piper)
pipx install "voice-copilot[local-tts]"

# + local STT (faster-whisper)
pipx install "voice-copilot[local-stt]"

# everything local
pipx install "voice-copilot[all]"

Or with uv:

uv tool install voice-copilot
uvx voice-copilot run claude -- -p "refactor the auth module"

Quick start

# 1. point voice-copilot at your Anthropic key (keychain-backed)
voice-copilot serve                 # opens settings → paste ANTHROPIC_API_KEY

# 2. wrap Claude Code
voice-copilot run claude -p "fix failing tests"

# 3. or wrap Codex CLI
voice-copilot run codex  -p "explain what this repo does"

A browser tab opens at http://127.0.0.1:8765. Hold Alt+Space to speak.

Narrate any CLI via proxy mode

voice-copilot run <target> only knows claude and codex. For any other CLI that talks to OpenAI or Anthropic endpoints (aider, opencode, Cline, GitHub Copilot CLI), run the proxy as a standalone service and point your CLI's BASE_URL at it:

voice-copilot proxy
# → prints ANTHROPIC_BASE_URL=http://127.0.0.1:8766/anthropic
#          OPENAI_BASE_URL=http://127.0.0.1:8766/openai/v1
#          ...and OpenRouter / Groq / Mistral / Ollama / Gemini

# in another terminal:
ANTHROPIC_BASE_URL=http://127.0.0.1:8766/anthropic \
  aider --model anthropic/claude-3-5-sonnet-20241022

The popup shows one entry per connected client, identified by its distinct combination of Authorization and User-Agent headers. Pick one from the dropdown in the header to choose which client to narrate; the others keep running silently.
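One way such a per-client identity could be derived is sketched below. The `client_key` function, the SHA-256 hashing, and the 12-character truncation are assumptions for illustration; the README only states that clients are distinguished by Authorization and User-Agent.

```python
import hashlib


def client_key(headers: dict[str, str]) -> str:
    """Derive a stable, non-reversible ID from Authorization + User-Agent."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    auth = lowered.get("authorization", "")
    agent = lowered.get("user-agent", "")
    # Hash instead of storing the raw bearer token; NUL separates the fields
    # so ("ab", "c") and ("a", "bc") cannot collide.
    digest = hashlib.sha256(f"{auth}\0{agent}".encode("utf-8")).hexdigest()
    return digest[:12]
```

Two requests with the same token and user agent map to the same key, so they would share one dropdown entry; changing either header yields a new entry.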

Supported upstream providers:

Provider	Env var	Upstream
Anthropic	`ANTHROPIC_BASE_URL`	`api.anthropic.com`
OpenAI	`OPENAI_BASE_URL`	`api.openai.com`
OpenRouter	`OPENROUTER_BASE_URL`	`openrouter.ai/api`
Groq	`GROQ_BASE_URL`	`api.groq.com/openai`
Mistral	`MISTRAL_BASE_URL`	`api.mistral.ai`
Ollama	`OLLAMA_BASE_URL`	`127.0.0.1:11434` (local)
Gemini	`GEMINI_BASE_URL`	`generativelanguage.googleapis.com` (passthrough)
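The table above implies a prefix-to-upstream mapping, which could be sketched like this. Only the `/anthropic` and `/openai` prefixes appear in the proxy's printed output; the remaining prefixes, the `https` schemes, and the `resolve_upstream` name are assumptions.

```python
# Path prefix -> upstream origin. Only "anthropic" and "openai" are confirmed
# by the proxy's printed output; the other prefixes are assumed.
UPSTREAMS = {
    "anthropic": "https://api.anthropic.com",
    "openai": "https://api.openai.com",
    "openrouter": "https://openrouter.ai/api",
    "groq": "https://api.groq.com/openai",
    "mistral": "https://api.mistral.ai",
    "ollama": "http://127.0.0.1:11434",
    "gemini": "https://generativelanguage.googleapis.com",
}


def resolve_upstream(path: str) -> str:
    """Map a local path such as /anthropic/v1/messages to its upstream URL."""
    prefix, _, rest = path.lstrip("/").partition("/")
    try:
        origin = UPSTREAMS[prefix]
    except KeyError:
        # Fail loudly instead of guessing a provider.
        raise LookupError(f"no upstream configured for prefix {prefix!r}") from None
    return f"{origin}/{rest}" if rest else origin
```

With this mapping, a client pointed at `http://127.0.0.1:8766/openai/v1` has its requests forwarded under `https://api.openai.com/v1`.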

OAuth-authenticated CLIs (Claude Code subscription, Codex login flow) work out of the box: the proxy only sees the bearer token on the wire and forwards it. The OAuth browser round-trip happens on different domains, which the proxy does not intercept.

Hotkeys

Action	Default combo	Notes
Push-to-talk	`Alt+Space`	Hold to record, release to send to STT.
Interrupt (pause & listen)	`Alt+Shift+Space`	Suspends the CLI process.
Pause / resume toggle	`Alt+P`	Manual pause of the child CLI.
Mute TTS	`Alt+M`	Stops narration without affecting the agent.

All four are rebindable on the settings page.
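Since the combos are rebindable, a settings page would likely need to parse them and reject clashes. The sketch below shows one way to do that; `parse_combo`, `find_conflicts`, and the accepted modifier set are hypothetical and not taken from the project.

```python
MODIFIERS = {"alt", "ctrl", "shift", "cmd"}  # assumed set of modifier names


def parse_combo(combo: str) -> tuple[frozenset[str], str]:
    """Split 'Alt+Shift+Space' into (modifiers, key), case-insensitively."""
    parts = [part.strip().lower() for part in combo.split("+")]
    if any(not part for part in parts):
        raise ValueError(f"empty segment in combo {combo!r}")
    *mods, key = parts
    unknown = set(mods) - MODIFIERS
    if unknown:
        raise ValueError(f"unknown modifier(s): {sorted(unknown)}")
    if key in MODIFIERS:
        raise ValueError("combo must end with a non-modifier key")
    return frozenset(mods), key


def find_conflicts(bindings: dict[str, str]) -> list[tuple[str, str]]:
    """Return pairs of actions that are bound to the same combo."""
    seen: dict[tuple[frozenset[str], str], str] = {}
    clashes = []
    for action, combo in bindings.items():
        parsed = parse_combo(combo)
        if parsed in seen:
            clashes.append((seen[parsed], action))
        else:
            seen[parsed] = action
    return clashes
```

Modifiers are stored as a set, so `Shift+Alt+Space` and `Alt+Shift+Space` count as the same binding.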

Providers

Every layer is pluggable. Defaults are cloud-light so pipx install voice-copilot works out of the box.

Layer	Default (light)	Local (extra)	Premium cloud	Secret names
TTS	`edge-tts`	`silero`, `piper`	`elevenlabs`, `openai`	`ELEVENLABS_API_KEY`, `OPENAI_API_KEY`
STT	`openai-whisper-api`	`faster-whisper`	`deepgram`	`OPENAI_API_KEY`, `DEEPGRAM_API_KEY`
LLM	`anthropic` (Haiku)	`openai-compat` (Ollama)	`openai`	`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `OPENAI_COMPAT_API_KEY`

Switch via the Settings page or by editing ~/.voice-copilot/config.yaml.

Configuration

Settings live in ~/.voice-copilot/config.yaml, which you can edit by hand or through the settings page.
Secrets live in the OS keychain (Credential Manager / Keychain / Secret Service) or in a .env next to where you run voice-copilot.
No fallbacks between providers: if the configured one fails, the error surfaces in the popup and narration stops. Fail loud, not silently.
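The "fail loud" rule for secrets might look like the sketch below. It covers only environment variables and a `.env` file; the OS keychain lookup is omitted to keep the example dependency-free. The function names and the lookup order (environment first, then `.env`) are assumptions.

```python
import os
from pathlib import Path


def read_dotenv(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values: dict[str, str] = {}
    if not path.is_file():
        return values
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_secret(name: str, dotenv: Path = Path(".env")) -> str:
    """Return the secret or raise; never substitute another provider's key."""
    value = os.environ.get(name) or read_dotenv(dotenv).get(name)
    if not value:
        raise LookupError(f"{name} is not set; configure it before starting narration")
    return value
```

Raising instead of returning a default keeps a missing key visible right away, which matches the no-fallback policy described above.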

Interception strategies

Stream-JSON mode of the target CLI (Claude Code, Codex) — default when available. Use voice-copilot run claude / run codex.
HTTP reverse-proxy (voice-copilot proxy, or run … --proxy) — routes provider BASE_URLs through us so we can narrate thinking blocks even when the underlying TUI hides them. Works with any CLI that respects *_BASE_URL env vars. No CA certs, no TLS interception — the client just talks plain HTTP to localhost. When --proxy runs alongside a stream-JSON adapter, the adapter suppresses duplicate LLM events so the proxy is the single source of truth.
PTY fallback — wraps any binary. Lower fidelity, last resort.
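The duplicate-suppression rule for `--proxy` combined with a stream-JSON adapter can be illustrated with a small filter. The `Event` shape, the source and kind labels, and `merge_events` are hypothetical; the README only states that the adapter suppresses duplicate LLM events so the proxy is the single source of truth.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Event:
    source: str  # "adapter" or "proxy" (hypothetical labels)
    kind: str    # e.g. "llm", "tool", "file_edit" (hypothetical kinds)
    text: str


def merge_events(events: Iterable[Event], proxy_active: bool) -> Iterator[Event]:
    """Drop adapter-side LLM events while the proxy is active."""
    for event in events:
        if proxy_active and event.source == "adapter" and event.kind == "llm":
            continue  # the proxy already reports this LLM traffic
        yield event
```

Non-LLM adapter events, such as file edits, still pass through, so only the overlapping LLM traffic is deduplicated.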

See docs/architecture.md.

Development

git clone https://github.com/voice-copilot/voice-copilot
cd voice-copilot
uv sync --extra dev
uv run voice-copilot serve --demo     # emit synthetic events, exercise the UI
uv run ruff check .
uv run mypy src/voice_copilot
uv run pytest

Troubleshooting

No voice output: open DevTools in the popup and check that the audio element is receiving audio_header/bytes frames. Most browsers block autoplay until the user interacts with the page, so click anywhere in the popup once.
Mic denied: browsers grant microphone access only in secure contexts, and http://127.0.0.1:<port> counts as one because it is a localhost origin. Don't serve the popup from a LAN IP without HTTPS.
keyring reports no backend on headless Linux: install keyrings.alt into the same environment as voice-copilot (for a pipx install, pipx inject voice-copilot keyrings.alt), or set env vars instead.
Commentator silent: check commentator.min_importance on the settings page; set it to low to hear everything while debugging.
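To see why a high `commentator.min_importance` can silence narration, consider this sketch of a threshold filter. Only the value `low` appears in this README; the `medium` and `high` levels and the `should_narrate` function are assumptions.

```python
# Ordered from least to most important. Only "low" is mentioned in the
# README; "medium" and "high" are assumed for illustration.
LEVELS = ["low", "medium", "high"]


def should_narrate(importance: str, min_importance: str) -> bool:
    """Speak an update only if it meets the configured threshold."""
    return LEVELS.index(importance) >= LEVELS.index(min_importance)
```

With the threshold at `low`, every update passes, which is why that setting is useful while debugging.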

Contact

For collaboration, feedback, integrations, and community discussions:

Email: info@conus.vision
Website: https://conus.vision

License

This repository is open-source under the MIT license.

That means individuals, teams, companies, and other open-source projects can use, modify, fork, and redistribute the core with minimal friction.

See LICENSE and LICENSING.md.

Release history

Version	Released	Notes
0.0.3	Jun 23, 2026	
0.0.2 (yanked)	Jun 23, 2026	Superseded by 0.0.3 (stale README/demo)
0.0.1 (this version)	Apr 27, 2026	

Download files

File	Type	Size	Uploaded
voice_copilot-0.0.1.tar.gz	Source distribution	326.6 kB	Apr 27, 2026
voice_copilot-0.0.1-py3-none-any.whl	Built distribution (Python 3)	127.5 kB	Apr 27, 2026

File details

Details for the file voice_copilot-0.0.1.tar.gz.

File metadata

Download URL: voice_copilot-0.0.1.tar.gz
Upload date: Apr 27, 2026
Size: 326.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.5.24

File hashes

Hashes for voice_copilot-0.0.1.tar.gz
Algorithm	Hash digest
SHA256	`2c3446e8d322cbd057f902700d8c125d8aedbcf9b0f6e5307e6da205bacb2e31`
MD5	`3323f74f17b1b83448566dd12f40e12b`
BLAKE2b-256	`7ee0e9f8114cc7fd234ee738587454d7a35054f0388e4d50dafa79cb9ffd659f`

File details

Details for the file voice_copilot-0.0.1-py3-none-any.whl.

File metadata

Download URL: voice_copilot-0.0.1-py3-none-any.whl
Upload date: Apr 27, 2026
Size: 127.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.5.24

File hashes

Hashes for voice_copilot-0.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`58bc5aedaa3451fa8905ecb277c47c8783072ceda3f489dd61e9f38fcb6fb392`
MD5	`95737321187a231a9495feb4a459fb54`
BLAKE2b-256	`c930629d3f34aeb0a8cb598209291b3ca044ca788c864eda9e7dd8fda188f786`
