Skip to main content

Voice input layer: captures spoken intent, transcribes locally, injects into active workflow or agent pipeline.

Project description

Vox

Vox is the voice input layer for the system. It captures speech via push-to-talk, transcribes it locally with faster-whisper, and injects the text into the clipboard (and optionally into the focused window). No cloud calls; no silent failures.

Install

From PyPI (no clone required; package name is vox-core because vox is taken on PyPI):

uvx vox-core

Or install the tool, then run it (the CLI command is still vox):

pip install vox-core
vox

From source (development or latest):

git clone https://github.com/jeffrichley/vox.git && cd vox
uv sync
uv run vox

Or install in editable mode: pip install -e . (use a venv), then vox.

Pre-built binaries (GitHub Releases)

Packaged binaries for Windows, macOS, and Linux are built on each GitHub Release. Download the archive for your platform (e.g. vox-<version>-windows-amd64.zip, vox-<version>-macos-arm64.zip, vox-<version>-linux-x86_64.tar.gz), unpack it, then run the binary:

  • Windows: Unzip the archive, then run vox.exe from inside the vox folder (e.g. .\vox\vox.exe --help).
  • macOS / Linux: Unzip or untar the archive, then run ./vox/vox from the extracted directory (e.g. ./vox/vox --help).

On first run, the Whisper model is downloaded automatically (see Transcription model). Binaries are built with PyInstaller and are not signed; you may see a security or Gatekeeper prompt on Windows or macOS.

Configuration

  • Config file: ~/.vox/vox.toml. Create the directory if needed: mkdir -p ~/.vox.
  • Override path: set VOX_CONFIG to the full path of your config file.
  • Env overrides: VOX_HOTKEY, VOX_DEVICE_ID, VOX_MODEL_SIZE, VOX_COMPUTE_TYPE, VOX_COMPUTE_DEVICE, VOX_INJECTION_MODE, VOX_TRAY override the same keys from the file.

Copy the example config and edit:

cp vox.toml.example ~/.vox/vox.toml
# Edit hotkey and optionally device_id, model_size, etc.

Commands

  • vox or vox run — Start push-to-talk. By default a small window with a Stop button appears; with use_tray = true in config (or VOX_TRAY=1), a system tray icon is shown instead—click the icon and choose Quit to stop. Press and hold your configured hotkey, speak, release; the audio is transcribed and placed on the clipboard (and optionally pasted into the focused window).
  • vox devices — List audio input devices (ID, name, host API). Use this to choose device_id in config.
  • vox test-mic [--device ID] [--seconds N] — Record for N seconds, play back the recording, then transcribe and print text. Default 2 seconds. Use to verify mic and model before using vox.

Transcription model (faster-whisper)

  • First run: The model is downloaded automatically from Hugging Face (size from config, default base). No system FFmpeg required (PyAV is used).
  • CPU: Use compute_type = "int8" in config for lower memory and faster inference.
  • GPU: Set compute_device = "cuda" and compute_type = "float16" (or int8) in config. Requires CUDA 12 and cuDNN 9.
  • Model size: Set model_size in config (e.g. tiny, base, small, medium, large-v3) for speed vs accuracy.

OS permissions

  • Microphone: Required for capture. On Windows, allow app access to the microphone. On macOS, grant Microphone access when prompted.
  • Accessibility / input injection: Only needed if you use injection_mode = "clipboard_and_paste" (paste into focused window). On Windows, run the app with normal privileges; on macOS, grant Accessibility permission to Terminal (or the app running vox run) so it can simulate paste.

Definition of Visible Done

A human can verify the MVP by:

  1. Install: From repo run uv sync (or pip install -e .).
  2. Configure: Copy vox.toml.example to ~/.vox/vox.toml; set hotkey (e.g. ctrl+shift+v) and optionally device_id, model_size, compute_type, injection_mode.
  3. Run: Execute uv run vox or vox (or vox run). A small “Vox” window with a Stop button appears, or a tray icon if use_tray is enabled.
  4. Trigger: Focus any text field (or leave focus anywhere). Press and hold the configured hotkey, speak a short phrase, release the key.
  5. Verify: Paste from clipboard (Ctrl+V / Cmd+V) and see the transcribed phrase. If injection_mode = "clipboard_and_paste", the text also appears in the focused field.
  6. Stop: Click Stop in the Vox window (or close the window) to exit.
  7. Errors: If mic or model is missing, a clear error message appears (no silent failure).

Development

  • Quality gate: just test quality (tests, format, lint, types, docstrings, security checks).
  • Tests: just test (pytest). Unit tests under tests/unit/, integration under tests/integration/.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vox_core-0.2.1.tar.gz (7.7 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

vox_core-0.2.1-py3-none-any.whl (3.7 MB view details)

Uploaded Python 3

File details

Details for the file vox_core-0.2.1.tar.gz.

File metadata

  • Download URL: vox_core-0.2.1.tar.gz
  • Upload date:
  • Size: 7.7 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.10.11 {"installer":{"name":"uv","version":"0.10.11","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for vox_core-0.2.1.tar.gz
Algorithm Hash digest
SHA256 16a039ba13f240b5e51137159c0e1cc24da7575bb5c291c40564ec098c540f53
MD5 1db8b03a259aaa32d90bb16628a09cd5
BLAKE2b-256 6a28a32a3393b2053d41144db456aad49977ce47f8194f887beb4076d538437b

See more details on using hashes here.

File details

Details for the file vox_core-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: vox_core-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 3.7 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.10.11 {"installer":{"name":"uv","version":"0.10.11","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for vox_core-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 708eb4989a28905ad365ecfba685a09ef4fb40a0e2b01a0b58acbc1ef94d02d8
MD5 2957660631cf6d57828a5a7a75f97d41
BLAKE2b-256 1f1ec9f75cf4ca97671c5cc184c7c6afb53a03cb0feab336d4a5cedb6b6ce185

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page