Voice input layer: captures spoken intent, transcribes locally, injects into active workflow or agent pipeline.

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

jeffrichley

These details have not been verified by PyPI

Development Status
- 4 - Beta
Environment
- Console
Intended Audience
- Developers
- End Users/Desktop
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
Topic
- Multimedia :: Sound/Audio :: Speech

Project description

Vox

Vox is the voice input layer for the system. It captures speech via push-to-talk, transcribes it locally with faster-whisper, and injects the text into the clipboard (and optionally into the focused window). No cloud calls; no silent failures.

Install

From PyPI (no clone required; package name is vox-core because vox is taken on PyPI):

uvx vox-core

Or install the tool, then run it (the CLI command is still vox):

pip install vox-core
vox

From source (development or latest):

git clone https://github.com/jeffrichley/vox.git && cd vox
uv sync
uv run vox

Or install in editable mode inside a virtual environment with pip install -e ., then run vox.

Pre-built binaries (GitHub Releases)

Packaged binaries for Windows, macOS, and Linux are built on each GitHub Release. Download the archive for your platform (e.g. vox-<version>-windows-amd64.zip, vox-<version>-macos-arm64.zip, vox-<version>-linux-x86_64.tar.gz), unpack it, then run the binary:

Windows: Unzip the archive, then run vox.exe from inside the vox folder (e.g. .\vox\vox.exe --help).
macOS / Linux: Unzip or untar the archive, then run ./vox/vox from the extracted directory (e.g. ./vox/vox --help).

On first run, the Whisper model is downloaded automatically (see Transcription model). Binaries are built with PyInstaller and are not signed, so Windows may show a SmartScreen warning and macOS may show a Gatekeeper prompt.

Configuration

Config file: ~/.vox/vox.toml. Create the directory if needed: mkdir -p ~/.vox.
Override path: set VOX_CONFIG to the full path of your config file.
Env overrides: VOX_HOTKEY, VOX_DEVICE_ID, VOX_MODEL_SIZE, VOX_COMPUTE_TYPE, VOX_COMPUTE_DEVICE, VOX_INJECTION_MODE, VOX_CUE_VOLUME, and VOX_TRAY take precedence over the corresponding keys in the file.
Settings screen: run vox settings to edit the supported config keys in a desktop window instead of editing TOML manually. Valid completed changes autosave immediately; there is no Save or Cancel flow.
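
The lookup order described above can be sketched in Python. This is an illustration, not Vox's actual loader: the function names are hypothetical, the config file is assumed to use flat top-level keys, and the mapping of VOX_TRAY to use_tray is inferred from the Commands section. Type coercion of environment values is left out.

```python
import os
from pathlib import Path

# Environment variable -> config key, following the list above.
# VOX_TRAY -> use_tray is inferred from the Commands section.
ENV_OVERRIDES = {
    "VOX_HOTKEY": "hotkey",
    "VOX_DEVICE_ID": "device_id",
    "VOX_MODEL_SIZE": "model_size",
    "VOX_COMPUTE_TYPE": "compute_type",
    "VOX_COMPUTE_DEVICE": "compute_device",
    "VOX_INJECTION_MODE": "injection_mode",
    "VOX_CUE_VOLUME": "cue_volume",
    "VOX_TRAY": "use_tray",
}


def config_path(env=os.environ):
    """Return VOX_CONFIG if it is set, otherwise ~/.vox/vox.toml."""
    override = env.get("VOX_CONFIG")
    return Path(override) if override else Path.home() / ".vox" / "vox.toml"


def effective_config(file_values, env=os.environ):
    """Overlay VOX_* variables on file values and report the overridden keys."""
    merged = dict(file_values)
    overridden = []
    for var, key in ENV_OVERRIDES.items():
        if var in env:
            merged[key] = env[var]  # raw string; type coercion is omitted here
            overridden.append(key)
    return merged, overridden


def load(env=os.environ):
    """Read the config file (if present) and apply environment overrides."""
    import tomllib  # standard library in Python 3.11+

    path = config_path(env)
    file_values = tomllib.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    return effective_config(file_values, env)
```

The list of overridden keys is the kind of information a settings window would need in order to warn that an on-disk value is not the effective one.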

Copy the example config and edit:

cp vox.toml.example ~/.vox/vox.toml
# Edit hotkey and optionally device_id, cue_volume, model_size, etc.

Commands

vox or vox run: Start push-to-talk. By default a small window with a Stop button appears. With use_tray = true in config (or VOX_TRAY=1), a system tray icon is shown instead; click the icon and choose Quit to stop. Audible start and end cues are preloaded during startup so the first hotkey cycle does not stall on cue decoding. Press and hold your configured hotkey to hear the start cue and begin recording, then release it to hear the end cue. Transcription and injection then proceed according to injection_mode: clipboard only, clipboard then paste, or direct typing into the focused window.
vox settings: Open the standalone settings window. It exposes Recording, Transcription, Output, and Runtime sections, autosaves each valid completed change, warns when environment variables currently override file-backed values, and shows restart guidance for changes that an already-running session picks up only after restart.
Cue volume: Set cue_volume in config (or VOX_CUE_VOLUME) to any value from 0.0 to 1.0. Default is 0.5. In the settings screen, moving the cue-volume slider autosaves after a short debounce and then plays a cue preview automatically at the new level.
vox devices: List audio input devices (ID, name, host API). Use this to choose device_id in config.
vox test-mic [--device ID] [--seconds N]: Record for N seconds, play back the recording, then transcribe and print the text. Default is 2 seconds. Use it to verify the mic and model before running vox.
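
The three injection behaviors can be pictured as a small dispatch. The sketch below is hypothetical: the inject function and its callable parameters are invented for illustration, and "clipboard" as the name of the clipboard-only mode is an assumption (only "clipboard_and_paste" and "type" are named elsewhere in this README).

```python
from typing import Callable


def inject(
    text: str,
    mode: str,
    copy: Callable[[str], None],
    paste: Callable[[], None],
    type_text: Callable[[str], None],
) -> None:
    """Deliver transcribed text according to injection_mode."""
    if mode == "clipboard":  # assumed name for the clipboard-only mode
        copy(text)
    elif mode == "clipboard_and_paste":
        copy(text)
        paste()  # simulate the paste shortcut in the focused window
    elif mode == "type":
        type_text(text)  # simulate keystrokes instead of touching the clipboard
    else:
        # Fail loudly rather than dropping the text silently.
        raise ValueError(f"unknown injection_mode: {mode!r}")
```

Passing the clipboard and keyboard actions in as callables keeps the dispatch testable without real OS input permissions.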

Settings Screen

Autosave rules: dropdowns and toggles save immediately after a valid selection; text-like fields such as hotkey save on Enter or focus loss; cue_volume saves after a short debounce so dragging the slider does not write on every movement.
Runtime access: vox settings opens the window directly; if Vox is already running, the Stop window and tray both expose a Settings / Settings... action that launches the same screen as a separate process.
Override warnings: when a VOX_* environment variable currently supersedes a file-backed value, the settings window shows that warning so the on-disk value is not mistaken for the effective runtime value.
Restart/apply guidance: changing hotkey, device selection, transcription settings, injection mode, or tray usage updates the config immediately, but an already-running Vox session applies those changes after restart.
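
The debounce rule for cue_volume can be illustrated with a timer that restarts on every change. This is a minimal sketch under assumptions: the DebouncedSave class and the 0.3 second delay are hypothetical, and a real GUI toolkit would more likely use its own timer facility than threading.Timer.

```python
import threading


class DebouncedSave:
    """Call save(value) once, after changes have stopped for `delay` seconds."""

    def __init__(self, save, delay=0.3):
        self._save = save
        self._delay = delay
        self._timer = None
        self._lock = threading.Lock()

    def submit(self, value):
        """Record a new value and restart the countdown."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()  # drop the pending write for the older value
            self._timer = threading.Timer(self._delay, self._save, args=(value,))
            self._timer.daemon = True
            self._timer.start()
```

Dragging a slider would call submit on every movement, and only the last value would reach the save callback.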

Transcription model (faster-whisper)

First run: The model is downloaded automatically from Hugging Face (size from config, default base). No system FFmpeg required (PyAV is used).
CPU: Use compute_type = "int8" in config for lower memory and faster inference.
GPU: Set compute_device = "cuda" and compute_type = "float16" (or int8) in config. Requires CUDA 12 and cuDNN 9.
Model size: Set model_size in config (e.g. tiny, base, small, medium, large-v3) for speed vs accuracy.
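
As a rough illustration of how these settings map onto faster-whisper, the sketch below builds WhisperModel arguments from a config dictionary. The helper names are hypothetical, and the fallbacks for compute_device ("cpu") and compute_type are assumptions based on the guidance above, not documented Vox defaults.

```python
def model_kwargs(config):
    """Map config keys onto faster-whisper's WhisperModel arguments."""
    device = config.get("compute_device", "cpu")  # assumed default
    # int8 on CPU and float16 on CUDA follow the guidance above;
    # using them as fallbacks is an assumption.
    default_type = "float16" if device == "cuda" else "int8"
    return {
        "model_size_or_path": config.get("model_size", "base"),
        "device": device,
        "compute_type": config.get("compute_type", default_type),
    }


def transcribe(path, config):
    """Transcribe an audio file and return the joined segment text."""
    # Imported lazily so the mapping above works without the package installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(**model_kwargs(config))  # downloads the model on first use
    segments, _info = model.transcribe(path)
    return " ".join(segment.text.strip() for segment in segments)
```

For example, model_kwargs({"compute_device": "cuda"}) selects float16, while an empty config falls back to the base model on CPU with int8.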

OS permissions

Microphone: Required for capture. On Windows, allow app access to the microphone. On macOS, grant Microphone access when prompted.
Accessibility / input injection: Needed if you use injection_mode = "clipboard_and_paste" or injection_mode = "type" (paste or type into the focused window). On Windows, run the app with normal privileges; on macOS, grant Accessibility permission to Terminal (or the app running vox run) so it can simulate input.

Definition of Visible Done

A human can verify the shipped feature set with these steps:

Install: From the repo, run uv sync (or pip install -e .).
Open settings: Run uv run vox settings.
Autosave: Change injection_mode and confirm ~/.vox/vox.toml (or VOX_CONFIG) updates without any Save button.
Cue preview: Drag cue_volume, pause briefly, and confirm the file updates once plus the cue preview plays automatically at the new level.
Validation: Edit hotkey to an invalid value and confirm the window shows an error and the invalid value is not persisted.
Round-trip: Close and reopen vox settings and confirm the saved values reload from disk.
Runtime access: Run uv run vox, then launch settings from the Stop window or tray affordance and confirm the settings window opens without shutting down the active session.
Voice workflow: Press and hold the configured hotkey, speak, release, and confirm transcription/injection still works as configured.
Errors: If mic, model, or launch prerequisites are missing, Vox surfaces a clear error message instead of failing silently.
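
The Validation step above relies on a validate-then-persist order. The sketch below shows that pattern for cue_volume only; the function names are hypothetical and hotkey validation is not covered because its format is not described here.

```python
def validate_cue_volume(raw):
    """Return a float in [0.0, 1.0] or raise ValueError with a readable message."""
    try:
        value = float(raw)
    except (TypeError, ValueError):
        raise ValueError(f"cue_volume must be a number, got {raw!r}") from None
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"cue_volume must be between 0.0 and 1.0, got {value}")
    return value


def apply_change(config, key, raw, validators, persist):
    """Validate one edited field and persist only when validation passes."""
    try:
        value = validators[key](raw)
    except ValueError as exc:
        return str(exc)  # message for the user; nothing is written
    persist({**config, key: value})
    return None
```

Because persist is only reached after validation succeeds, an invalid edit leaves the file on disk unchanged.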

Development

Quality gate: just quality && just test (tests, format, lint, types, docstrings, security checks).
Tests: just test (pytest). Unit tests under tests/unit/, integration under tests/integration/.

Release history

0.2.4 (this version): Mar 24, 2026
0.2.3: Mar 18, 2026
0.2.2: Mar 18, 2026
0.2.1: Mar 18, 2026
0.2.0: Mar 17, 2026

Download files

Source Distribution

vox_core-0.2.4.tar.gz (10.0 MB)

Uploaded Mar 24, 2026 Source

Built Distribution

vox_core-0.2.4-py3-none-any.whl (3.8 MB)

Uploaded Mar 24, 2026 Python 3

File details

Details for the file vox_core-0.2.4.tar.gz.

File metadata

Download URL: vox_core-0.2.4.tar.gz
Upload date: Mar 24, 2026
Size: 10.0 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.11.0

File hashes

Hashes for vox_core-0.2.4.tar.gz
Algorithm	Hash digest
SHA256	`bc37b6d0b194fcaf5ea7c44d32634876437065e89c5bbaa4b1f59a81fef0c786`
MD5	`129c269169f0b77cae63aa23a65819cb`
BLAKE2b-256	`b3107f6ab37c2726100d1b96504f86f3dd8a58903c501fa866ea84bbc58a2dae`
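
A downloaded file can be checked against the SHA256 digest in the table above with the standard library. This is a generic sketch; the function names are hypothetical and it assumes the file has already been downloaded to a local path.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True when the file's SHA256 matches the published digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, the expected value would be the SHA256 digest listed above, passed as a plain hex string without the surrounding backticks.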

File details

Details for the file vox_core-0.2.4-py3-none-any.whl.

File metadata

Download URL: vox_core-0.2.4-py3-none-any.whl
Upload date: Mar 24, 2026
Size: 3.8 MB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.11.0

File hashes

Hashes for vox_core-0.2.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0ab2cd25000f9a0c4db4b6258584ad30cd397b5ac4640699d60bfe6aee1b15c5`
MD5	`ff1d1c09330d82fc6556d0e0cb949f1d`
BLAKE2b-256	`9933e7daa62f97eee0575f6c6343de7b47a1eb10217a7f74fd08a4e9d107670a`
