VAD-driven streaming voice dictation for macOS with editing commands, tech-term correction, and keyboard output

Project description

voice-command

VAD-driven streaming voice dictation for macOS (Apple Silicon). Speak into your mic, and the text appears in a terminal buffer or is typed directly into any app.

All inference runs locally — Whisper for ASR, Silero for VAD, Qwen for tech-term correction. No cloud APIs.

Requirements

macOS with Apple Silicon
Python 3.12+
Microphone access (System Settings > Privacy & Security > Microphone)
Accessibility access for type mode (System Settings > Privacy & Security > Accessibility)

Install

From PyPI

pip install voice-command
# or
uv tool install voice-command

From source

git clone https://github.com/depoledna/voice-command.git
cd voice-command
uv sync

Models download automatically on first run (~300MB for Whisper + ~1GB for Qwen).

Usage

voice-cmd                # launch the TUI
voice-cmd --version      # print version and exit
voice-cmd --help         # usage

Dictation always types into the focused window (the TUI itself is skipped automatically to avoid feedback loops). The TUI mirrors the transcript so you can see it and edit it by voice. When running from source, launch with uv run python voice_cmd.py instead.

Configuration

Persistent settings live at ~/.config/voice-command/settings.json (or $XDG_CONFIG_HOME/voice-command/settings.json). The file is auto-created with defaults on first launch.

Key	Default	Description
`device`	`null`	Audio input device index. `null` → auto-detect
`llm_correction`	`false`	Run Qwen tech-term correction after ASR (downloads ~1GB)
`vad_threshold`	`0.45`	VAD speech threshold (0.10–0.95)
`min_silence_ms`	`600`	Silence (ms) required to end an utterance
`inactivity_clear_seconds`	`5`	Clear the TUI buffer + status message after this idle time. `0` disables auto-clear

With llm_correction set to false (the default), the ~1GB Qwen model is never loaded.
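
As a rough illustration of the lookup and defaults described above, the sketch below resolves the settings path and merges a file with the documented defaults. The function names (settings_path, load_settings) are hypothetical and not taken from the package; only the keys, defaults, and the 0.10–0.95 range come from the table.

```python
import json
import os
from pathlib import Path

# Defaults copied from the configuration table.
DEFAULTS = {
    "device": None,
    "llm_correction": False,
    "vad_threshold": 0.45,
    "min_silence_ms": 600,
    "inactivity_clear_seconds": 5,
}


def settings_path(env=None):
    """Resolve settings.json, honoring $XDG_CONFIG_HOME when it is set."""
    env = os.environ if env is None else env
    base = env.get("XDG_CONFIG_HOME") or str(Path.home() / ".config")
    return Path(base) / "voice-command" / "settings.json"


def load_settings(path):
    """Read settings, creating the file with defaults when it is missing."""
    path = Path(path)
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(DEFAULTS, indent=2))
        return dict(DEFAULTS)
    # Keys absent from the file fall back to their defaults.
    settings = {**DEFAULTS, **json.loads(path.read_text())}
    # Assumption: out-of-range thresholds are clamped to the documented range.
    settings["vad_threshold"] = min(0.95, max(0.10, float(settings["vad_threshold"])))
    return settings
```

Clamping out-of-range values is a design choice made for this sketch; the real tool may reject them or handle them differently.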

Hotkeys

The TUI shows a sticky 3-line header and accepts hotkeys at any time:

Key	Action
`P` / `Space`	Pause / resume listening (live)
`L`	Toggle LLM tech-term correction (live)
`D`	Pick audio device (live; mic is reattached on save)
`V`	Adjust VAD threshold (live)
`S`	Adjust min-silence (live)
`?`	Show help + voice commands
`Q` / `^C`	Quit

All hotkey changes auto-save to settings.json.

Voice Commands

Command	Action
`period` / `comma` / `question mark`	Insert punctuation
`new line`	Line break
`new paragraph`	Double line break
`scratch that`	Delete last ~5 words
`delete last N words`	Delete last N words
`undo`	Undo last action
`clear all`	Clear buffer
`stop listening`	Pause
`start listening`	Resume
`copy all`	Copy to clipboard
`done`	Copy to clipboard and exit
`show commands`	Show help overlay
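
The editing commands in the table can be pictured as operations on a word buffer with an undo history. The class below is a hypothetical sketch, not the package's implementation; in particular it treats "scratch that" as exactly five words, where the table only says roughly five.

```python
class DictationBuffer:
    """Hypothetical word buffer with snapshot-based undo."""

    def __init__(self):
        self.words = []
        self._history = []  # snapshots taken before each change

    def _snapshot(self):
        self._history.append(list(self.words))

    def append(self, text):
        """Add newly dictated text to the end of the buffer."""
        self._snapshot()
        self.words.extend(text.split())

    def delete_last(self, n=5):
        """'scratch that' uses the default; 'delete last N words' passes N."""
        self._snapshot()
        if n > 0:
            del self.words[-n:]

    def clear(self):
        """'clear all': empty the buffer, still undoable."""
        self._snapshot()
        self.words = []

    def undo(self):
        """'undo': restore the buffer as it was before the last change."""
        if self._history:
            self.words = self._history.pop()

    def text(self):
        return " ".join(self.words)
```

Storing full snapshots keeps undo trivial at the cost of memory, which seems acceptable for dictation-sized buffers.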

Commands can appear inline with dictated text: "Send the email period new line Don't forget the attachment" produces two lines with proper punctuation.
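
To make the inline example concrete, here is a minimal sketch that rewrites spoken punctuation and line-break commands with a regular expression. This is a simplified stand-in, not the sentence-splitting extraction the tool itself uses, and the names SPOKEN and apply_inline_commands are made up for illustration. It will also rewrite a literal use of a word such as "period", which a real implementation would need to guard against.

```python
import re

# Spoken phrase -> inserted text. Only the punctuation and line-break
# commands from the table are covered here.
SPOKEN = {
    "new paragraph": "\n\n",
    "question mark": "?",
    "new line": "\n",
    "period": ".",
    "comma": ",",
}

# Longest phrases first so multi-word commands win over shorter ones.
_PATTERN = re.compile(
    r"\s*\b("
    + "|".join(sorted(map(re.escape, SPOKEN), key=len, reverse=True))
    + r")\b[ \t]*",
    re.IGNORECASE,
)


def apply_inline_commands(utterance):
    """Replace spoken commands with their text and tidy the spacing."""

    def substitute(match):
        token = SPOKEN[match.group(1).lower()]
        # Punctuation attaches to the previous word and is followed by a
        # space; line breaks swallow the whitespace on both sides.
        return token if token.startswith("\n") else token + " "

    text = _PATTERN.sub(substitute, utterance)
    # Drop spaces left in front of a line break, then trim the ends.
    text = re.sub(r"[ \t]+\n", "\n", text)
    return text.strip()


print(apply_inline_commands(
    "Send the email period new line Don't forget the attachment"
))
# Send the email.
# Don't forget the attachment
```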

Pipeline

Audio - sounddevice captures mic input, which is resampled to 16 kHz
VAD - Silero VAD with hysteresis detects speech boundaries (32ms frames, pre-roll buffering)
ASR - MLX Whisper (small, 8-bit) with dev-vocabulary prompt
LLM - Qwen3 1.7B (4-bit) fixes tech terms: "fast api" -> "FastAPI", "type script" -> "TypeScript" (toggle off via L or llm_correction: false)
Commands - Sentence splitting + leading/trailing command extraction
Output - TUI buffer display or keystroke diff-typing via pynput
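
The VAD stage can be sketched as a small state machine over per-frame speech probabilities. The code below assumes those probabilities have already been produced (for example by Silero VAD) and shows only the boundary logic: hysteresis, the min-silence timeout, and pre-roll. The exit_margin and pre_roll_frames values are assumptions chosen for illustration, not the tool's actual parameters.

```python
from collections import deque

FRAME_MS = 32  # frame length given in the pipeline description


def segment_speech(probs, threshold=0.45, min_silence_ms=600,
                   exit_margin=0.15, pre_roll_frames=3):
    """Return (start, end) frame pairs, end exclusive.

    probs is a sequence of per-frame speech probabilities in [0, 1].
    """
    exit_threshold = max(0.0, threshold - exit_margin)
    silence_needed = -(-min_silence_ms // FRAME_MS)  # ceiling division
    pre_roll = deque(maxlen=pre_roll_frames)  # recent non-speech frame indices
    segments = []
    start = None
    silent_run = 0

    for i, p in enumerate(probs):
        if start is None:
            if p >= threshold:
                # Reach back into the pre-roll so the first syllable
                # is not clipped.
                start = pre_roll[0] if pre_roll else i
                silent_run = 0
                pre_roll.clear()
            else:
                pre_roll.append(i)
        else:
            # Hysteresis: only probabilities below the lower threshold
            # count toward the silence timeout.
            silent_run = silent_run + 1 if p < exit_threshold else 0
            if silent_run >= silence_needed:
                segments.append((start, i - silent_run + 1))
                start = None
                silent_run = 0

    if start is not None:
        # Input ended while still in speech.
        segments.append((start, len(probs)))
    return segments
```

With the default min_silence_ms of 600 and 32 ms frames, an utterance ends after 19 consecutive quiet frames. Using a lower exit threshold than entry threshold keeps brief dips in confidence from splitting one utterance into two.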

Output

Each utterance is typed into whichever app is currently focused via pynput. If your own terminal (the one running voice-cmd) is the frontmost window, typing is suppressed to avoid echoing into your shell. A 3-second countdown after launch lets you switch focus to the target app.
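
The pipeline mentions "keystroke diff-typing" without spelling it out. One plausible reading, sketched below under that assumption, is that when the text for an utterance changes (for example after tech-term correction), only the differing tail is retyped: backspace over what no longer matches, then type the new suffix. The helper name diff_keystrokes is hypothetical.

```python
import os


def diff_keystrokes(typed, target):
    """Return (backspaces, suffix) that turn the typed text into target."""
    # commonprefix compares character by character, so it works on plain
    # strings as well as on paths.
    common = len(os.path.commonprefix([typed, target]))
    return len(typed) - common, target[common:]


backspaces, suffix = diff_keystrokes("I use fast api", "I use FastAPI")
print(backspaces, repr(suffix))
# 8 'FastAPI'
```

With pynput, the result could be applied by pressing and releasing Key.backspace the returned number of times on a keyboard Controller and then passing the suffix to its type method.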

Benchmarks

# Compare ASR models (requires test fixtures in tests/fixtures/)
uv run python tests/benchmark.py

# Pipeline diagnostics
uv run python tests/diagnose_pipeline.py

Releasing

Update the version in pyproject.toml
Commit: git commit -am "chore: bump version to X.Y.Z"
Tag: git tag vX.Y.Z
Push: git push origin main --tags

The GitHub Actions workflow builds and publishes to PyPI automatically via trusted publishers (OIDC).

First-time PyPI setup

Go to https://pypi.org/manage/account/publishing/
Add a "pending publisher":
- Package name: voice-command
- Owner: depoledna
- Repository: voice-command
- Workflow: release.yml
- Environment: pypi
In the GitHub repo, go to Settings > Environments and create an environment named pypi

License

MIT

Project details

Maintainers

depoledn

Release history

0.2.0 (this version): Apr 26, 2026
0.1.0: Apr 7, 2026

Download files

Source Distribution

voice_command-0.2.0.tar.gz (22.5 kB)

Uploaded Apr 26, 2026 Source

Built Distribution

voice_command-0.2.0-py3-none-any.whl (26.3 kB)

Uploaded Apr 26, 2026 Python 3

File details

Details for the file voice_command-0.2.0.tar.gz.

File metadata

Download URL: voice_command-0.2.0.tar.gz
Upload date: Apr 26, 2026
Size: 22.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.11.7 (uv publish, run in CI on Ubuntu 24.04)

File hashes

Hashes for voice_command-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`bf8ab520a32eb5fc2286a3e936bfc06cbe704463644415d8388f827a77aef2f2`
MD5	`b049e28ad2864c3b0ab9a452d34de353`
BLAKE2b-256	`f7f010885fab7b272c09b815682c2becc758fd7e219f0604ff58844db8e8f3e5`
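
A downloaded file can be checked against the SHA256 digest listed above with the standard library. This is a generic sketch; the helper names are illustrative and the local file path is whatever you saved the download as.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's SHA256 with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, matches("voice_command-0.2.0.tar.gz", "bf8ab520a32eb5fc2286a3e936bfc06cbe704463644415d8388f827a77aef2f2") should return True for an intact copy of the source distribution.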

File details

Details for the file voice_command-0.2.0-py3-none-any.whl.

File metadata

Download URL: voice_command-0.2.0-py3-none-any.whl
Upload date: Apr 26, 2026
Size: 26.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.11.7 (uv publish, run in CI on Ubuntu 24.04)

File hashes

Hashes for voice_command-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d2ec01902bdb8dd287abb0bf1f74b5a8513138d3ae61fe99fdf98e0ca06b71a9`
MD5	`ee187386c847817278b5c8f8d3b3b9fa`
BLAKE2b-256	`1d8db40ef250974db490ebbdbbe72567fb963fc77ece1ee52ca169446407e934`
