VAD-driven streaming voice dictation for macOS with editing commands, tech-term correction, and keyboard output

voice-command

VAD-driven streaming voice dictation for macOS (Apple Silicon). Speak into your mic, and the text appears in a terminal buffer or is typed directly into any app.

All inference runs locally: Whisper for ASR, Silero for VAD, and Qwen for tech-term correction. No cloud APIs are used.

Requirements

  • macOS with Apple Silicon
  • Python 3.12+
  • Microphone access (System Settings > Privacy & Security > Microphone)
  • Accessibility access for --type mode (System Settings > Privacy & Security > Accessibility)

Install

From PyPI

pip install voice-command
# or
uv tool install voice-command

From source

git clone https://github.com/depoledna/voice-command.git
cd voice-command
uv sync

Models download automatically on first run (~300MB for Whisper + ~1GB for Qwen).

Usage

# Terminal buffer mode (TUI)
voice-cmd

# Type directly into the focused app
voice-cmd --type

# Transcribe a recording
voice-cmd --file recording.m4a

# List audio devices
voice-cmd --list-devices

# Use a specific device
voice-cmd --device 1

When running from source, use uv run python voice_cmd.py instead of voice-cmd.

Voice Commands

Command                          Action
period / comma / question mark   Insert punctuation
new line                         Line break
new paragraph                    Double line break
scratch that                     Delete last ~5 words
delete last N words              Delete last N words
undo                             Undo last action
clear all                        Clear buffer
stop listening                   Pause
start listening                  Resume
copy all                         Copy to clipboard
done                             Copy to clipboard and exit
show commands                    Show command list
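
To illustrate how editing commands such as "delete last N words", "clear all", and "undo" can interact, here is a minimal sketch of a word buffer with snapshot-based undo. The class and method names are hypothetical and not taken from the project; the real buffer may track state differently.

```python
class DictationBuffer:
    """Word buffer with snapshot-based undo (illustrative, not the project's class)."""

    def __init__(self) -> None:
        self.words: list[str] = []
        self._history: list[list[str]] = []  # one snapshot per change

    def _snapshot(self) -> None:
        # Save a copy of the current words so the next change can be undone.
        self._history.append(list(self.words))

    def append(self, text: str) -> None:
        self._snapshot()
        self.words.extend(text.split())

    def delete_last(self, n: int) -> None:
        self._snapshot()
        if n > 0:  # guard: words[-0:] would select the whole list
            del self.words[-n:]

    def clear(self) -> None:
        self._snapshot()
        self.words.clear()

    def undo(self) -> None:
        # Restore the most recent snapshot, if there is one.
        if self._history:
            self.words = self._history.pop()

    def text(self) -> str:
        return " ".join(self.words)


buf = DictationBuffer()
buf.append("send the email to the whole team today")
buf.delete_last(2)   # "delete last 2 words"
buf.undo()           # "undo" brings the two words back
print(buf.text())
```

Keeping full snapshots is the simplest way to make every command undoable; for very long buffers a diff-based history would use less memory.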

Commands can appear inline with dictated text: "Send the email period new line Don't forget the attachment" produces two lines with proper punctuation.
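
As a rough sketch of the inline behavior, the snippet below replaces spoken punctuation and line-break commands inside a transcript. The `SPOKEN` mapping and the `apply_inline_commands` function are assumptions made for illustration. The project's own extraction is described as sentence splitting plus leading/trailing command detection, and may work differently.

```python
import re

# Hypothetical mapping; the project's real command table may differ.
SPOKEN = {
    "question mark": "?",
    "new paragraph": "\n\n",
    "new line": "\n",
    "period": ".",
    "comma": ",",
}

# Longest phrases first so multi-word commands are tried before shorter ones.
_PATTERN = re.compile(
    r"\s*\b("
    + "|".join(sorted(map(re.escape, SPOKEN), key=len, reverse=True))
    + r")\b[ \t]*",
    re.IGNORECASE,
)


def apply_inline_commands(transcript: str) -> str:
    def repl(match: re.Match) -> str:
        symbol = SPOKEN[match.group(1).lower()]
        # Punctuation attaches to the previous word and is followed by a space;
        # line breaks are emitted without padding.
        return symbol if symbol.startswith("\n") else symbol + " "

    text = _PATTERN.sub(repl, transcript)
    text = re.sub(r"[ \t]+\n", "\n", text)  # drop spaces left before a line break
    return text.strip()


print(apply_inline_commands(
    "Send the email period new line Don't forget the attachment"
))
```

A plain substitution like this also rewrites literal uses of a command word (for example "a period of time"), which is one reason a real implementation might only treat commands at sentence boundaries.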

Pipeline

  1. Audio - sounddevice captures mic input, resampled to 16kHz
  2. VAD - Silero VAD with hysteresis detects speech boundaries (32ms frames, pre-roll buffering)
  3. ASR - MLX Whisper (small, 8-bit) with dev-vocabulary prompt
  4. LLM - Qwen3 1.7B (4-bit) fixes tech terms: "fast api" -> "FastAPI", "type script" -> "TypeScript"
  5. Commands - Sentence splitting + leading/trailing command extraction
  6. Output - TUI buffer display or keystroke diff-typing via pynput
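
The hysteresis and pre-roll ideas in step 2 can be sketched as follows. The function takes per-frame speech probabilities (the kind of score a VAD model produces) and returns speech segments as frame ranges. The thresholds, the hangover length, and all names here are illustrative assumptions, not the project's actual settings.

```python
from dataclasses import dataclass

FRAME_MS = 32  # frame length stated in the pipeline description


@dataclass
class Segment:
    start_frame: int  # inclusive, already extended backwards by the pre-roll
    end_frame: int    # exclusive


def segment_speech(probs, on=0.6, off=0.35, hangover_frames=10, preroll_frames=5):
    """Split a probability stream into speech segments using two thresholds."""
    segments: list[Segment] = []
    in_speech = False
    start = 0
    quiet = 0  # consecutive frames below the `off` threshold
    for i, p in enumerate(probs):
        if not in_speech:
            if p >= on:  # a higher bar to enter speech...
                floor = segments[-1].end_frame if segments else 0
                start = max(floor, i - preroll_frames)  # keep audio just before onset
                in_speech = True
                quiet = 0
        elif p < off:  # ...and a lower bar to leave it
            quiet += 1
            if quiet >= hangover_frames:
                segments.append(Segment(start, i - quiet + 1))
                in_speech = False
        else:
            quiet = 0
    if in_speech:  # stream ended mid-utterance
        segments.append(Segment(start, len(probs)))
    return segments


probs = [0.1] * 10 + [0.9] * 5 + [0.5] * 3 + [0.1] * 12
for seg in segment_speech(probs):
    print(seg.start_frame * FRAME_MS, "ms to", seg.end_frame * FRAME_MS, "ms")
```

Using separate `on` and `off` thresholds keeps a segment open while the score hovers in between, so a brief dip in confidence does not cut a word in half.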

Type Mode

--type mode sends keystrokes to the focused app. It detects when its own terminal is focused and skips typing to avoid feedback loops. A 3-second countdown lets you switch to the target app after launching.
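
The "diff-typing" idea mentioned in the pipeline can be illustrated with a small helper: when the text for an utterance changes (for example after tech-term correction), only the part after the common prefix needs to be erased and retyped. The function name and return shape are assumptions for this sketch, and sending the resulting keystrokes is left out.

```python
def typing_diff(previous: str, current: str) -> tuple[int, str]:
    """Return (backspaces, text_to_type) that turns `previous` into `current`."""
    common = 0
    for a, b in zip(previous, current):
        if a != b:
            break
        common += 1
    # Erase everything after the shared prefix, then type the new tail.
    return len(previous) - common, current[common:]


backspaces, tail = typing_diff("use fast api", "use FastAPI")
print(backspaces, repr(tail))
```

In this example the shared prefix is "use ", so eight backspaces followed by typing "FastAPI" updates the already-typed text without retyping the whole line.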

Benchmarks

# Compare ASR models (requires test fixtures in tests/fixtures/)
uv run python tests/benchmark.py

# Pipeline diagnostics
uv run python tests/diagnose_pipeline.py

Releasing

  1. Update the version in pyproject.toml
  2. Commit: git commit -am "chore: bump version to X.Y.Z"
  3. Tag: git tag vX.Y.Z
  4. Push: git push origin main --tags

The GitHub Actions workflow builds and publishes to PyPI automatically via trusted publishers (OIDC).

First-time PyPI setup

  1. Go to https://pypi.org/manage/account/publishing/
  2. Add a "pending publisher":
    • Package name: voice-command
    • Owner: depoledna
    • Repository: voice-command
    • Workflow: release.yml
    • Environment: pypi
  3. In the GitHub repo, go to Settings > Environments > create pypi

License

MIT
