VAD-driven streaming voice dictation for macOS with editing commands, tech-term correction, and keyboard output
voice-command
VAD-driven streaming voice dictation for macOS (Apple Silicon). Speak into your mic, and the text appears in a terminal buffer or is typed directly into any app.
All inference runs locally: Whisper for ASR, Silero for VAD, and Qwen for tech-term correction. No cloud APIs.
Requirements
- macOS with Apple Silicon
- Python 3.12+
- Microphone access (System Settings > Privacy > Microphone)
- Accessibility access for type mode, i.e. typing into other apps (System Settings > Privacy > Accessibility)
Install
From PyPI
```bash
pip install voice-command
# or
uv tool install voice-command
```
From source
```bash
git clone https://github.com/depoledna/voice-command.git
cd voice-command
uv sync
```
Models download automatically on first run (~300MB for Whisper, plus ~1GB for Qwen if llm_correction is enabled).
Usage
```bash
voice-cmd            # launch the TUI
voice-cmd --version  # print version and exit
voice-cmd --help     # show usage
```
Dictation always types into the focused window (the TUI itself is skipped automatically to avoid feedback loops). The TUI mirrors what was transcribed so you can see it and edit it by voice. When running from source, use uv run python voice_cmd.py.
Configuration
Persistent settings live at ~/.config/voice-command/settings.json (or $XDG_CONFIG_HOME/voice-command/settings.json). The file is auto-created with defaults on first launch.
| Key | Default | Description |
|---|---|---|
| device | null | Audio input device index. null → auto-detect |
| llm_correction | false | Run Qwen tech-term correction after ASR (downloads ~1GB) |
| vad_threshold | 0.45 | VAD speech threshold (0.10–0.95) |
| min_silence_ms | 600 | Silence (ms) required to end an utterance |
| inactivity_clear_seconds | 5 | Clear the TUI buffer + status message after this idle time (seconds). 0 disables auto-clear |
Disabling llm_correction skips loading the ~1GB Qwen model entirely.
Hotkeys
The TUI shows a sticky 3-line header and accepts hotkeys at any time:
| Key | Action |
|---|---|
| P / Space | Pause / resume listening (live) |
| L | Toggle LLM tech-term correction (live) |
| D | Pick audio device (live; mic is reattached on save) |
| V | Adjust VAD threshold (live) |
| S | Adjust min-silence (live) |
| ? | Show help + voice commands |
| Q / ^C | Quit |
All hotkey changes auto-save to settings.json.
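Because every hotkey change is persisted immediately, an interrupted write could leave a truncated settings.json. A minimal sketch of one safe approach: clamp the new VAD threshold to the 0.10–0.95 range from the Configuration table, then write to a temporary file in the same directory and swap it in with os.replace. The function names are hypothetical, and the step size is a parameter because the README does not say how far one press of V moves the threshold.

```python
import json
import os
import tempfile
from pathlib import Path

VAD_MIN, VAD_MAX = 0.10, 0.95  # documented range for vad_threshold


def adjust_vad_threshold(current: float, step: float) -> float:
    """Hypothetical hotkey handler: nudge and clamp, rounded to 2 decimals."""
    return round(min(VAD_MAX, max(VAD_MIN, current + step)), 2)


def save_settings(settings: dict, path: Path) -> None:
    """Write atomically: temp file in the same directory, then os.replace."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(settings, f, indent=2)
        os.replace(tmp, path)  # atomic rename on the same filesystem
    except BaseException:
        os.unlink(tmp)  # don't leave a stray temp file behind
        raise
```

Keeping the temp file in the same directory matters: os.replace is only atomic within one filesystem.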
Voice Commands
| Command | Action |
|---|---|
| period / comma / question mark | Insert punctuation |
| new line | Line break |
| new paragraph | Double line break |
| scratch that | Delete last ~5 words |
| delete last N words | Delete last N words |
| undo | Undo last action |
| clear all | Clear buffer |
| stop listening | Pause |
| start listening | Resume |
| copy all | Copy to clipboard |
| done | Copy to clipboard and exit |
| show commands | Show help overlay |
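The editing commands in the table (scratch that, delete last N words, clear all, undo) can be modelled as operations on a word list plus a stack of snapshots for undo. This is a hypothetical sketch: DictationBuffer is not a class from voice-command, and it treats scratch that as exactly five words, whereas the table says roughly five.

```python
from dataclasses import dataclass, field


@dataclass
class DictationBuffer:
    """Hypothetical word buffer supporting the editing voice commands."""
    words: list[str] = field(default_factory=list)
    history: list[list[str]] = field(default_factory=list)

    def _snapshot(self) -> None:
        # Save a copy of the current state so `undo` can restore it.
        self.history.append(list(self.words))

    def append(self, text: str) -> None:
        self._snapshot()
        self.words.extend(text.split())

    def delete_last(self, n: int) -> None:
        """`delete last N words`; `scratch that` would call this with n=5."""
        self._snapshot()
        del self.words[max(0, len(self.words) - n):]

    def clear(self) -> None:
        """`clear all`."""
        self._snapshot()
        self.words.clear()

    def undo(self) -> None:
        """Restore the state before the most recent edit, if any."""
        if self.history:
            self.words = self.history.pop()

    def text(self) -> str:
        return " ".join(self.words)
```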
Commands can appear inline with dictated text: "Send the email period new line Don't forget the attachment" produces two lines with proper punctuation.
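One simple way to handle inline commands is a left-to-right token scan that tries the longest command phrase first. This is a simplified stand-in, not voice-command's method (the Pipeline section describes sentence splitting with leading/trailing command extraction); INLINE_COMMANDS and apply_inline_commands are hypothetical and cover only the punctuation and line-break commands.

```python
# Hypothetical phrase table: spoken words -> inserted text.
INLINE_COMMANDS = {
    ("new", "paragraph"): "\n\n",
    ("new", "line"): "\n",
    ("question", "mark"): "?",
    ("period",): ".",
    ("comma",): ",",
}
MAX_LEN = max(len(k) for k in INLINE_COMMANDS)


def apply_inline_commands(utterance: str) -> str:
    """Replace spoken punctuation/line-break phrases with their symbols."""
    tokens = utterance.split()
    out = ""
    i = 0
    while i < len(tokens):
        for size in range(MAX_LEN, 0, -1):  # longest phrase first
            # Normalise case and stray ASR punctuation before matching.
            key = tuple(t.lower().strip(".,?") for t in tokens[i:i + size])
            if len(key) == size and key in INLINE_COMMANDS:
                out += INLINE_COMMANDS[key]  # attach directly, no space
                i += size
                break
        else:
            # Ordinary word: space-separate unless at the start of a line.
            sep = "" if not out or out.endswith("\n") else " "
            out += sep + tokens[i]
            i += 1
    return out
```

Applied to the example sentence above, this yields "Send the email." and "Don't forget the attachment" on two lines. A scheme like this also converts a literal spoken "period", an ambiguity any inline-command parser has to resolve somehow.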
Pipeline
- Audio - sounddevice captures mic input, resampled to 16kHz
- VAD - Silero VAD with hysteresis detects speech boundaries (32ms frames, pre-roll buffering)
- ASR - MLX Whisper (small, 8-bit) with dev-vocabulary prompt
- LLM - Qwen3 1.7B (4-bit) fixes tech terms: "fast api" -> "FastAPI", "type script" -> "TypeScript" (toggle with L or the llm_correction setting; off by default)
- Commands - Sentence splitting + leading/trailing command extraction
- Output - TUI buffer display or keystroke diff-typing via pynput
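The VAD stage's hysteresis can be sketched as a small state machine over per-frame speech probabilities (at 16kHz, a 32ms frame is 512 samples). This sketch assumes speech starts when a frame reaches vad_threshold and ends only after min_silence_ms of frames below a lower threshold; the 0.15 gap between the two thresholds and the three pre-roll frames are assumed values, and segment_speech is a hypothetical name. In the real pipeline the probabilities would come from Silero.

```python
from collections import deque

FRAME_MS = 32  # 512 samples at 16 kHz


def segment_speech(probs, threshold=0.45, min_silence_ms=600,
                   neg_offset=0.15, pre_roll_frames=3):
    """Group frame indices into utterances using hysteresis.

    Speech starts when prob >= threshold, but only ends once prob stays
    below (threshold - neg_offset) for min_silence_ms. Frames just before
    the onset are kept as pre-roll so the first syllable is not clipped.
    """
    neg_threshold = threshold - neg_offset
    silence_needed = -(-min_silence_ms // FRAME_MS)  # ceiling division
    pre_roll = deque(maxlen=pre_roll_frames)
    segments, current, silent = [], None, 0
    for i, p in enumerate(probs):
        if current is None:
            if p >= threshold:
                current = list(pre_roll) + [i]  # onset: prepend pre-roll
                silent = 0
            else:
                pre_roll.append(i)
            continue
        current.append(i)
        # Frames between neg_threshold and threshold do not count as silence.
        silent = silent + 1 if p < neg_threshold else 0
        if silent >= silence_needed:
            segments.append(current[:-silent])  # drop trailing silence
            current, silent = None, 0
            pre_roll.clear()
    if current is not None:
        segments.append(current)  # utterance still open at end of input
    return segments
```

With the defaults, 600ms of silence rounds up to 19 frames. The gap between the two thresholds is what stops a dip to, say, 0.35 mid-word from splitting an utterance.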
Output
Each utterance is typed into whichever app is currently focused via pynput. If your own terminal (the one running voice-cmd) is the frontmost window, typing is suppressed to avoid echoing into your shell. A 3-second countdown after launch lets you switch focus to the target app.
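Keystroke diff-typing (mentioned under Pipeline) can be read as: keep what is already on screen, backspace only the part that differs, and type the rest. Below is a minimal sketch under that interpretation; diff_keystrokes and send_diff are hypothetical helpers, send_diff uses pynput's keyboard Controller (which needs the Accessibility permission listed under Requirements), and the frontmost-window check is omitted.

```python
import os


def diff_keystrokes(typed: str, target: str) -> tuple[int, str]:
    """Return (backspaces, text_to_type) that turns `typed` into `target`.

    Only the part after the longest common prefix is retyped, so revising
    the tail of an utterance costs a few backspaces, not a full retype.
    """
    common = len(os.path.commonprefix([typed, target]))
    return len(typed) - common, target[common:]


def send_diff(typed: str, target: str) -> None:
    """Apply the diff to the focused app with pynput."""
    from pynput.keyboard import Controller, Key  # imported lazily

    keyboard = Controller()
    backspaces, text = diff_keystrokes(typed, target)
    for _ in range(backspaces):
        keyboard.press(Key.backspace)
        keyboard.release(Key.backspace)
    keyboard.type(text)
```

For example, correcting "Send the emal" to "Send the email." needs one backspace followed by typing "il.".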
Benchmarks
```bash
# Compare ASR models (requires test fixtures in tests/fixtures/)
uv run python tests/benchmark.py

# Pipeline diagnostics
uv run python tests/diagnose_pipeline.py
```
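The README does not say which metric tests/benchmark.py reports. For a quick standalone comparison of transcripts, word error rate (WER) is the standard ASR metric; here is a minimal sketch using word-level edit distance. word_error_rate is a hypothetical helper, not part of the repository.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference words."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # prev[j] = edit distance between the ref prefix so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    return prev[-1] / max(len(ref), 1)
```

Note that "install fast api" vs "install fastapi" scores 2/3 here (one substitution plus one deletion), so WER alone under-rewards correct tech-term joining.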
Releasing
- Update the version in pyproject.toml
- Commit: git commit -am "chore: bump version to X.Y.Z"
- Tag: git tag vX.Y.Z
- Push: git push origin main --tags
The GitHub Actions workflow builds and publishes to PyPI automatically via trusted publishers (OIDC).
First-time PyPI setup
- Go to https://pypi.org/manage/account/publishing/
- Add a "pending publisher":
  - Package name: voice-command
  - Owner: depoledna
  - Repository: voice-command
  - Workflow: release.yml
  - Environment: pypi
- In the GitHub repo, go to Settings > Environments and create an environment named pypi
License
MIT
File details
Details for the file voice_command-0.2.0.tar.gz.
File metadata
- Download URL: voice_command-0.2.0.tar.gz
- Upload date:
- Size: 22.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.7 {"installer":{"name":"uv","version":"0.11.7","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bf8ab520a32eb5fc2286a3e936bfc06cbe704463644415d8388f827a77aef2f2 |
| MD5 | b049e28ad2864c3b0ab9a452d34de353 |
| BLAKE2b-256 | f7f010885fab7b272c09b815682c2becc758fd7e219f0604ff58844db8e8f3e5 |
File details
Details for the file voice_command-0.2.0-py3-none-any.whl.
File metadata
- Download URL: voice_command-0.2.0-py3-none-any.whl
- Upload date:
- Size: 26.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.7 {"installer":{"name":"uv","version":"0.11.7","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d2ec01902bdb8dd287abb0bf1f74b5a8513138d3ae61fe99fdf98e0ca06b71a9 |
| MD5 | ee187386c847817278b5c8f8d3b3b9fa |
| BLAKE2b-256 | 1d8db40ef250974db490ebbdbbe72567fb963fc77ece1ee52ca169446407e934 |