voiceio
Speak → text, locally, instantly.
Quick start
# 1. Install system dependencies (Ubuntu/Debian)
sudo apt install pipx ibus gir1.2-ibus-1.0 python3-gi portaudio19-dev
# 2. Install voiceio
pipx install python-voiceio
# 3. Run the setup wizard
voiceio setup
That's it. Press Ctrl+Alt+V (or your chosen hotkey) to start dictating.
Fedora
sudo dnf install pipx ibus python3-gobject portaudio-devel
pipx install python-voiceio
voiceio setup
Arch Linux
sudo pacman -S python-pipx ibus python-gobject portaudio
pipx install python-voiceio
voiceio setup
Windows
# Option A: Install with pip (requires Python 3.11+)
pip install python-voiceio
voiceio setup
# Option B: Download the installer from GitHub Releases (no Python needed)
# https://github.com/Hugo0/voiceio/releases
# Also available as a portable .zip if you prefer no installation.
Windows uses pynput for hotkeys and text injection. No extra system dependencies required.
macOS
pipx install python-voiceio
voiceio setup
Build from source
Build from source if you want the code locally to hack on or customize for personal use. PRs are welcome!
git clone https://github.com/Hugo0/voiceio
cd voiceio
uv pip install -e ".[linux,dev]"
# Bootstrap CLI commands onto PATH (creates ~/.local/bin/voiceio)
uv run voiceio setup
Note: Source installs live inside a virtualenv, so `voiceio` isn't on PATH until setup creates symlinks in `~/.local/bin/`. If `voiceio` isn't found after setup, restart your terminal or run `export PATH="$HOME/.local/bin:$PATH"`.
You can also install with `uv tool install python-voiceio` or `pip install python-voiceio`.
How it works
hotkey → mic capture (pre-buffered) → whisper (local, streaming) → text at cursor (IBus / clipboard)
Press your hotkey to start recording (a 1 s pre-buffer catches the first syllable). Text streams into the focused app as an underlined preview. Press the hotkey again to commit. Transcription runs locally via faster-whisper; text is injected through IBus (any GTK/Qt app), with a clipboard fallback for terminals.
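The pre-buffer idea can be sketched as a small ring buffer that keeps only the most recent second of audio while idle, so a recording can start with what was captured just before the hotkey press. This is an illustrative sketch, not voiceio's actual code: the `PreBuffer` class, its method names, and the 100 ms chunk size are assumptions made for the example.

```python
from collections import deque


class PreBuffer:
    """Keep only the most recent `seconds` of audio chunks (hypothetical helper)."""

    def __init__(self, seconds: float, chunk_seconds: float) -> None:
        # Number of chunks needed to cover the pre-buffer window (at least one).
        self.max_chunks = max(1, round(seconds / chunk_seconds))
        self._chunks: deque[bytes] = deque(maxlen=self.max_chunks)

    def push(self, chunk: bytes) -> None:
        # deque(maxlen=...) silently drops the oldest chunk once it is full.
        self._chunks.append(chunk)

    def drain(self) -> list[bytes]:
        # Hand the buffered audio to the recorder and start fresh.
        chunks = list(self._chunks)
        self._chunks.clear()
        return chunks


# Simulate 25 idle chunks of 100 ms each with a 1 s pre-buffer.
buf = PreBuffer(seconds=1.0, chunk_seconds=0.1)
for i in range(25):
    buf.push(f"chunk-{i}".encode())

head = buf.drain()  # what a recording would start with on hotkey press
print(len(head), head[0], head[-1])
```

Only the last ten chunks survive, so memory use stays constant no matter how long the daemon sits idle.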
Features
- Streaming: text appears as you speak, not after you stop
- Works everywhere: IBus input method for GUI apps, clipboard for terminals
- Wayland + X11: evdev hotkeys work on both, no root required
- Pre-buffer: never miss the first syllable
- Auto-healing: falls back to the next working backend if one fails
- Autostart: optional systemd service, restarts on crash
- Self-diagnosing: `voiceio doctor` checks everything, `--fix` repairs it
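The auto-healing behavior in the list above amounts to trying backends in priority order and moving on when one fails. The sketch below shows that pattern in isolation. The function `inject_with_fallback` and the two stand-in backends are hypothetical names for illustration, not voiceio's API.

```python
from typing import Callable, Sequence

# Each backend is a (name, inject) pair; inject raises on failure.
Backend = tuple[str, Callable[[str], None]]


def inject_with_fallback(text: str, backends: Sequence[Backend]) -> str:
    """Try each backend in order and return the name of the one that worked."""
    errors: list[str] = []
    for name, inject in backends:
        try:
            inject(text)
            return name
        except Exception as exc:  # a real daemon would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))


def broken_ibus(text: str) -> None:
    # Hypothetical stand-in for an IBus backend whose component is missing.
    raise ConnectionError("IBus component not registered")


typed: list[str] = []


def clipboard(text: str) -> None:
    # Hypothetical stand-in for a clipboard-paste backend.
    typed.append(text)


used = inject_with_fallback(
    "hello world", [("ibus", broken_ibus), ("clipboard", clipboard)]
)
print(used, typed)
```

Collecting the error messages rather than discarding them makes it possible to report why each backend was skipped, which is the kind of information a health check would want to surface.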
Models
| Model | Size | Speed | Accuracy | Good for |
|---|---|---|---|---|
| tiny | 75 MB | ~10x realtime | Basic | Quick notes, low-end hardware |
| base | 150 MB | ~7x realtime | Good | Daily use (default) |
| small | 500 MB | ~4x realtime | Better | Longer dictation |
| medium | 1.5 GB | ~2x realtime | Great | Accuracy-sensitive work |
| large-v3 | 3 GB | ~1x realtime | Best | Maximum quality, GPU recommended |
Models download automatically on first use. Switch anytime: voiceio --model small.
Commands
voiceio                  Start the daemon
voiceio setup            Interactive setup wizard
voiceio doctor           Health check (--fix to auto-repair)
voiceio test             Test microphone + live transcription
voiceio toggle           Toggle recording on a running daemon
voiceio update           Update to latest version
voiceio service install  Autostart on login (systemd / Windows Startup)
voiceio logs             View recent logs
voiceio uninstall        Remove all system integrations
Configuration
voiceio setup handles everything interactively. To tweak later, edit the config file or override at runtime:
- Linux/macOS: `~/.config/voiceio/config.toml`
- Windows: `%LOCALAPPDATA%\voiceio\config\config.toml`
voiceio --model large-v3 --language auto -v
See config.example.toml for all options.
Troubleshooting
voiceio doctor # see what's working
voiceio doctor --fix # auto-fix issues
voiceio logs # check debug output
| Problem | Fix |
|---|---|
| No text appears | voiceio doctor --fix - usually a missing IBus component or GNOME input source |
| Hotkey doesn't work on Wayland | sudo usermod -aG input $USER then log out and back in |
| Transcription too slow | Use a smaller model: voiceio --model tiny |
| Want to start fresh | voiceio uninstall then voiceio setup |
| Windows: antivirus blocks hotkeys | pynput uses global keyboard hooks — add an exception for voiceio |
| Windows: no sound feedback | Check voiceio logs for audio device info |
| macOS issues | Experimental — consider aquavoice.com or contribute a PR |
Platform support
| Platform | Status | Text injection | Hotkeys | Streaming preview |
|---|---|---|---|---|
| Ubuntu / Debian (GNOME, Wayland) | Tested daily | IBus | evdev / GNOME shortcut | Yes |
| Ubuntu / Debian (GNOME, X11) | Supported | IBus | evdev / pynput | Yes |
| Fedora (GNOME) | Supported | IBus | evdev / GNOME shortcut | Yes |
| Arch Linux | Supported | IBus | evdev | Yes |
| KDE / Sway / Hyprland | Should work | IBus / ydotool / wtype | evdev | Yes |
| Windows 10/11 | Experimental | pynput / clipboard | pynput | Type-and-correct (no preedit) |
| macOS | Experimental | pynput / clipboard | pynput | Type-and-correct (no preedit) |
voiceio auto-detects your platform and picks the best available backends. Run voiceio doctor to see what's working on your system.
Uninstall
voiceio uninstall # removes service, IBus, shortcuts, symlinks
pipx uninstall python-voiceio # removes the package
Roadmap
Contributions welcome! See CONTRIBUTING.md and open issues.
Now
- macOS polish (IMKit for native preedit, Accessibility API for text injection)
Soon
- Per-app context awareness (detect focused app, adapt formatting/behavior)
- File/audio transcription mode (`voiceio transcribe recording.mp3`)
Backlog
- Multiple engine backends (whisper.cpp for Vulkan/AMD, VOSK for low-end hardware)
- Echo cancellation (filter system audio for meeting use)
- Wake word activation ("Hey voiceio")
- Text-to-speech output (Piper/espeak-ng — completes the "io")
Done
- LLM auto-audit dictionary (`voiceio correct --auto`: scan history with LLM, interactive correction)
- LLM post-processing via Ollama (grammar cleanup, spelling fixes on final pass)
- Corrections dictionary — auto-replace misheard words, "correct that" voice command
- Transcription history — searchable log of everything you've dictated
- Number-to-digit conversion ("three hundred forty two" → "342")
- VAD-based silence filtering (Silero VAD, prevents Whisper hallucinations)
- Voice commands — "new line", "new paragraph", "scratch that", punctuation by name
- Custom vocabulary / personal dictionary (bias Whisper via `initial_prompt`)
- Smart punctuation & capitalization post-processing
- Windows support
- System tray icon with animated states
- Auto-stop on silence
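The number-to-digit item in the Done list ("three hundred forty two" → "342") can be illustrated with a small accumulator that builds each group of hundreds and flushes it on "thousand" or "million". This is a minimal sketch of the general technique for English number words, not voiceio's implementation, and `words_to_number` is a name invented for the example.

```python
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1_000, "million": 1_000_000}


def words_to_number(phrase: str) -> int:
    """Convert a spoken English number phrase to an integer."""
    total = 0    # completed groups (thousands, millions)
    current = 0  # group currently being built
    for word in phrase.lower().replace("-", " ").split():
        if word == "and":
            continue  # "two thousand and five"
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100  # bare "hundred" means 100
        elif word in SCALES:
            total += max(current, 1) * SCALES[word]
            current = 0
        else:
            raise ValueError(f"not a number word: {word!r}")
    return total + current


print(words_to_number("three hundred forty two"))
print(words_to_number("two thousand and five"))
```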
License
MIT