VAD-driven streaming voice dictation for macOS with editing commands, tech-term correction, and keyboard output

voice-command

VAD-driven streaming voice dictation for macOS (Apple Silicon). Speak into your mic, and the text appears in a terminal buffer or is typed directly into whichever app has focus.

All inference runs locally: Whisper for ASR, Silero for VAD, and Qwen for tech-term correction. No cloud APIs are used.

Requirements

macOS with Apple Silicon
Python 3.12+
Microphone access (System Settings > Privacy & Security > Microphone)
Accessibility access for --type mode (System Settings > Privacy & Security > Accessibility)

Install

From PyPI

pip install voice-command
# or
uv tool install voice-command

From source

git clone https://github.com/depoledna/voice-command.git
cd voice-command
uv sync

Models download automatically on first run (~300 MB for Whisper and ~1 GB for Qwen).

Usage

# Terminal buffer mode (TUI)
voice-cmd

# Type directly into the focused app
voice-cmd --type

# Transcribe a recording
voice-cmd --file recording.m4a

# List audio devices
voice-cmd --list-devices

# Use a specific device
voice-cmd --device 1

When running from source, use uv run python voice_cmd.py instead of voice-cmd.

Voice Commands

Command	Action
`period` / `comma` / `question mark`	Insert punctuation
`new line`	Line break
`new paragraph`	Double line break
`scratch that`	Delete last ~5 words
`delete last N words`	Delete last N words
`undo`	Undo last action
`clear all`	Clear buffer
`stop listening`	Pause
`start listening`	Resume
`copy all`	Copy to clipboard
`done`	Copy to clipboard and exit
`show commands`	Show command list

Commands can appear inline with dictated text: "Send the email period new line Don't forget the attachment" produces two lines with proper punctuation.
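
As an illustration of the inline behaviour described above, here is a minimal sketch that rewrites spoken punctuation and line-break commands inside a transcript. It is not the project's parser (the Pipeline section describes sentence splitting plus leading/trailing command extraction); the name `apply_inline_commands` and the `INSERTIONS` table are hypothetical, and the naive matching would also rewrite a literal word such as "period" in ordinary speech.

```python
import re

# Hypothetical subset of the command table: spoken phrase -> inserted text.
INSERTIONS = {
    "question mark": "?",
    "new paragraph": "\n\n",
    "new line": "\n",
    "period": ".",
    "comma": ",",
}

# Try longer phrases first so multi-word commands are matched as a whole.
_PHRASES = sorted(INSERTIONS, key=len, reverse=True)
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(p) for p in _PHRASES) + r")\b",
    re.IGNORECASE,
)


def apply_inline_commands(transcript: str) -> str:
    """Replace spoken punctuation and line-break commands with their text."""
    out = _PATTERN.sub(lambda m: INSERTIONS[m.group(1).lower()], transcript)
    # Drop the space left before punctuation: "email ." -> "email."
    out = re.sub(r"[ \t]+([.,?])", r"\1", out)
    # Drop spaces around line breaks: ". \n Don't" -> ".\nDon't"
    out = re.sub(r"[ \t]*\n[ \t]*", "\n", out)
    return out.strip()


if __name__ == "__main__":
    print(apply_inline_commands(
        "Send the email period new line Don't forget the attachment"
    ))
```

Run on the example sentence, this prints "Send the email." on the first line and "Don't forget the attachment" on the second.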

Pipeline

Audio - sounddevice captures mic input, resampled to 16kHz
VAD - Silero VAD with hysteresis detects speech boundaries (32ms frames, pre-roll buffering)
ASR - MLX Whisper (small, 8-bit) with dev-vocabulary prompt
LLM - Qwen3 1.7B (4-bit) fixes tech terms: "fast api" -> "FastAPI", "type script" -> "TypeScript"
Commands - Sentence splitting + leading/trailing command extraction
Output - TUI buffer display or keystroke diff-typing via pynput
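
The VAD step mentions hysteresis and pre-roll buffering. The sketch below shows one common way to combine the two, given a per-frame speech probability such as a VAD model would produce. The class name, thresholds, and frame counts are assumptions chosen for illustration, not values taken from this project.

```python
from collections import deque


class HysteresisSegmenter:
    """Turn per-frame speech probabilities into speech segments.

    All default values are illustrative assumptions.
    """

    def __init__(self, start_threshold=0.6, end_threshold=0.35,
                 hangover_frames=15, preroll_frames=5):
        self.start_threshold = start_threshold  # enter speech at or above this
        self.end_threshold = end_threshold      # count as silence below this
        self.hangover_frames = hangover_frames  # consecutive silent frames that end a segment
        self._preroll = deque(maxlen=preroll_frames)  # frames kept from before onset
        self._segment = []
        self._silence = 0
        self._speaking = False

    def push(self, frame, speech_prob):
        """Feed one frame; return a finished segment (list of frames) or None."""
        if not self._speaking:
            if speech_prob >= self.start_threshold:
                # Onset: prepend buffered frames so the first syllable is not clipped.
                self._segment = list(self._preroll) + [frame]
                self._preroll.clear()
                self._silence = 0
                self._speaking = True
            else:
                self._preroll.append(frame)
            return None

        # While speaking, keep every frame (including trailing silence).
        self._segment.append(frame)
        if speech_prob < self.end_threshold:
            self._silence += 1
        else:
            self._silence = 0  # probabilities between the thresholds keep speech alive
        if self._silence >= self.hangover_frames:
            finished, self._segment = self._segment, []
            self._speaking = False
            return finished
        return None
```

Using two thresholds means a probability that dips between them mid-word does not end the segment, while the pre-roll buffer keeps the audio that arrived just before the detector crossed the start threshold.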

Type Mode

--type mode sends keystrokes to the focused app. It detects when its own terminal is focused and skips typing to avoid feedback loops. A 3-second countdown lets you switch to the target app after launching.
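
The pipeline's output step refers to keystroke diff-typing. One plausible reading, sketched below, is that when a streaming transcript is revised, only the part that changed is retyped: keep the common prefix, backspace over the stale suffix, and type the new suffix. The function name `diff_keystrokes` is hypothetical and the project's actual logic may differ.

```python
def diff_keystrokes(typed: str, target: str) -> tuple[int, str]:
    """Return (backspaces, text_to_type) that turns `typed` into `target`."""
    common = 0
    for a, b in zip(typed, target):
        if a != b:
            break
        common += 1
    # Everything after the shared prefix must be erased, then replaced.
    return len(typed) - common, target[common:]


if __name__ == "__main__":
    print(diff_keystrokes("hello wor", "hello world"))
    print(diff_keystrokes("use type script", "use TypeScript"))
```

The first call yields no backspaces and the text "ld"; the second yields 11 backspaces followed by "TypeScript". With pynput, such a plan could be carried out by pressing and releasing `Key.backspace` the given number of times and then calling `Controller.type` on the remaining text.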

Benchmarks

# Compare ASR models (requires test fixtures in tests/fixtures/)
uv run python tests/benchmark.py

# Pipeline diagnostics
uv run python tests/diagnose_pipeline.py

Releasing

Update the version in pyproject.toml
Commit: git commit -am "chore: bump version to X.Y.Z"
Tag: git tag vX.Y.Z
Push: git push origin main --tags

The GitHub Actions workflow builds and publishes to PyPI automatically via trusted publishers (OIDC).

First-time PyPI setup

Go to https://pypi.org/manage/account/publishing/
Add a "pending publisher":
- Package name: voice-command
- Owner: depoledna
- Repository: voice-command
- Workflow: release.yml
- Environment: pypi
In the GitHub repo, go to Settings > Environments and create an environment named pypi

License

MIT

Maintainers

depoledn

Release history

0.2.0 (Apr 26, 2026)
0.1.0 (Apr 7, 2026, this version)

Download files

Source Distribution

voice_command-0.1.0.tar.gz (13.6 kB)

Uploaded Apr 7, 2026 Source

Built Distribution

voice_command-0.1.0-py3-none-any.whl (20.4 kB)

Uploaded Apr 7, 2026 Python 3

File details

Details for the file voice_command-0.1.0.tar.gz.

File metadata

Download URL: voice_command-0.1.0.tar.gz
Upload date: Apr 7, 2026
Size: 13.6 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.11.3 {"installer":{"name":"uv","version":"0.11.3","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for voice_command-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`adf988ae42002f39f401e0b0c03114f32ddbeb270a137808d06e35a0429a8b23`
MD5	`9b9ff886b503fa579a39f2f26e867354`
BLAKE2b-256	`4438deffbbd41c0e4d9fdd3367d77f7a5b71cfbad019edfc8a6a9d48e7b2cce1`

File details

Details for the file voice_command-0.1.0-py3-none-any.whl.

File metadata

Download URL: voice_command-0.1.0-py3-none-any.whl
Upload date: Apr 7, 2026
Size: 20.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.11.3 {"installer":{"name":"uv","version":"0.11.3","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for voice_command-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d6d9b0d613b3680632b129078c912109ec7d760dac9eebca2c04b0016d4d98db`
MD5	`dbfdc58cb822150d0585ef4940a22155`
BLAKE2b-256	`b5ae163b404f5cca4ddbd642b36e609bd55e02140bdf7033359dd2c370f04e1e`
