Speak → text, locally, instantly.
Project description
voiceio
Quick start
# 1. Install system dependencies (Ubuntu/Debian)
sudo apt install pipx ibus gir1.2-ibus-1.0 python3-gi portaudio19-dev
# 2. Install voiceio
pipx install python-voiceio
# 3. Run the setup wizard
voiceio setup
That's it. Press Ctrl+Alt+V (or your chosen hotkey) to start dictating.
Fedora
sudo dnf install pipx ibus python3-gobject portaudio-devel
pipx install python-voiceio
voiceio setup
Arch Linux
sudo pacman -S python-pipx ibus python-gobject portaudio
pipx install python-voiceio
voiceio setup
Build from source
Use this if you want the source code locally to hack on or customize for personal use. PRs are welcome!
git clone https://github.com/Hugo0/voiceio
cd voiceio
pip install -e ".[linux,dev]"
voiceio setup
You can also install with uv tool install python-voiceio or pip install python-voiceio.
How it works
hotkey → mic capture (pre-buffered) → whisper (local, streaming) → text at cursor (IBus / clipboard)
- Press your hotkey: voiceio starts recording (with a 1-second pre-buffer, so it catches your first words even if you start speaking just before pressing)
- Speak naturally: text streams into the focused app in real-time as an underlined preview
- Press the hotkey again: the final transcription replaces the preview and is committed
Transcription runs locally via faster-whisper. Text is injected through IBus (works in any GTK/Qt app: browsers, Telegram, editors) with an automatic clipboard fallback for terminals.
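The 1-second pre-buffer can be pictured as a ring buffer that the microphone callback keeps filling while idle; when the hotkey is pressed, its contents become the start of the recording. This is a minimal sketch assuming 16 kHz mono audio arriving in fixed-size chunks of float samples (a real implementation would more likely use NumPy arrays); the `PreBuffer` class and its methods are hypothetical, not voiceio's internals.

```python
from collections import deque


class PreBuffer:
    """Hypothetical ring buffer keeping the most recent `seconds` of audio chunks."""

    def __init__(self, sample_rate: int = 16_000, chunk_size: int = 1_600, seconds: float = 1.0):
        # How many chunks fit in the window (1600 samples = 100 ms at 16 kHz, so 10 chunks = 1 s).
        max_chunks = max(1, int(sample_rate * seconds) // chunk_size)
        self._chunks: deque = deque(maxlen=max_chunks)
        self.recording: list = []
        self.is_recording = False

    def on_audio(self, chunk: list) -> None:
        """Called for every chunk delivered by the microphone callback."""
        if self.is_recording:
            self.recording.append(chunk)
        else:
            # A deque with maxlen silently drops the oldest chunk when full.
            self._chunks.append(chunk)

    def start(self) -> None:
        """Hotkey pressed: seed the recording with the audio captured just before."""
        self.recording = list(self._chunks)
        self._chunks.clear()
        self.is_recording = True

    def stop(self) -> list:
        """Hotkey pressed again: return every recorded sample as one flat list."""
        self.is_recording = False
        return [sample for chunk in self.recording for sample in chunk]
```

Because the buffer is bounded, idle listening costs a fixed amount of memory no matter how long the daemon runs.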
Features
- Streaming: text appears as you speak, not after you stop
- Works everywhere: IBus input method for GUI apps, clipboard for terminals
- Wayland + X11: evdev hotkeys work on both, no root required
- Pre-buffer: never miss the first syllable
- Auto-healing: falls back to the next working backend if one fails
- Autostart: optional systemd service, restarts on crash
- Self-diagnosing: voiceio doctor checks everything, and voiceio doctor --fix repairs it
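The auto-healing behaviour amounts to trying backends in preference order and moving on when one raises. A minimal sketch under that assumption; `inject_with_fallback` and `AllBackendsFailed` are hypothetical names, and each backend is modelled as a plain callable that either injects the text or raises.

```python
from typing import Callable, Sequence


class AllBackendsFailed(RuntimeError):
    """Raised when every backend in the chain failed."""


def inject_with_fallback(text: str, backends: Sequence[tuple[str, Callable[[str], None]]]) -> str:
    """Try each (name, inject) pair in order and return the name of the first that works.

    Hypothetical: `backends` might be [("ibus", ...), ("clipboard", ...)].
    """
    errors: list[str] = []
    for name, inject in backends:
        try:
            inject(text)
            return name
        except Exception as exc:  # any failure means "fall through to the next backend"
            errors.append(f"{name}: {exc}")
    raise AllBackendsFailed("; ".join(errors))
```

Collecting each failure message means a total failure still reports why every backend was skipped, which is the kind of detail a doctor-style health check can surface.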
Models
| Model | Size | Speed | Accuracy | Good for |
|---|---|---|---|---|
| tiny | 75 MB | ~10x realtime | Basic | Quick notes, low-end hardware |
| base | 150 MB | ~7x realtime | Good | Daily use (default) |
| small | 500 MB | ~4x realtime | Better | Longer dictation |
| medium | 1.5 GB | ~2x realtime | Great | Accuracy-sensitive work |
| large-v3 | 3 GB | ~1x realtime | Best | Maximum quality, GPU recommended |
Models download automatically on first use. Switch anytime: voiceio --model small.
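"~Nx realtime" means roughly N seconds of audio are transcribed per second of compute, so expected latency is audio length divided by N. A small worked example using the approximate figures from the table; actual speed depends heavily on your CPU or GPU.

```python
# Approximate speeds from the Models table ("~Nx realtime").
SPEED_X_REALTIME = {"tiny": 10, "base": 7, "small": 4, "medium": 2, "large-v3": 1}


def estimated_seconds(model: str, audio_seconds: float) -> float:
    """Rough transcription time for a clip of the given length."""
    return audio_seconds / SPEED_X_REALTIME[model]


for model in SPEED_X_REALTIME:
    print(f"{model:>9}: ~{estimated_seconds(model, 30):.1f} s for a 30 s clip")
```

On these figures, a 30-second dictation takes roughly 3 s with tiny, about 4.3 s with base, and about 30 s with large-v3.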
Commands
voiceio Start the daemon
voiceio setup Interactive setup wizard
voiceio doctor Health check (--fix to auto-repair)
voiceio test Test microphone + live transcription
voiceio toggle Toggle recording on a running daemon
voiceio update Update to latest version
voiceio service install Autostart on login via systemd
voiceio logs View recent logs
voiceio uninstall Remove all system integrations
Configuration
voiceio setup handles everything interactively. To tweak later, edit ~/.config/voiceio/config.toml or override at runtime:
voiceio --model large-v3 --language auto -v
See config.example.toml for all options.
Troubleshooting
voiceio doctor # see what's working
voiceio doctor --fix # auto-fix issues
voiceio logs # check debug output
| Problem | Fix |
|---|---|
| No text appears | Run voiceio doctor --fix; the cause is usually a missing IBus component or GNOME input source |
| Hotkey doesn't work on Wayland | sudo usermod -aG input $USER then log out and back in |
| Transcription too slow | Use a smaller model: voiceio --model tiny |
| Want to start fresh | voiceio uninstall then voiceio setup |
| Doesn't work on macOS | I haven't added proper support for Apple yet. Either use https://aquavoice.com/ or make a PR |
Platform support
| Platform | Status | Text injection | Hotkeys | Streaming preview |
|---|---|---|---|---|
| Ubuntu / Debian (GNOME, Wayland) | Tested daily | IBus | evdev / GNOME shortcut | Yes |
| Ubuntu / Debian (GNOME, X11) | Supported | IBus | evdev / pynput | Yes |
| Fedora (GNOME) | Supported | IBus | evdev / GNOME shortcut | Yes |
| Arch Linux | Supported | IBus | evdev | Yes |
| KDE / Sway / Hyprland | Should work | IBus / ydotool / wtype | evdev | Yes |
| macOS | Experimental | pynput / clipboard | pynput | Type-and-correct (no preedit) |
voiceio auto-detects your platform and picks the best available backends. Run voiceio doctor to see what's working on your system.
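A minimal sketch of how such auto-detection could work, assuming the standard `XDG_SESSION_TYPE` and `XDG_CURRENT_DESKTOP` environment variables. The backend names and preference order are read off the platform table; `pick_backends` is hypothetical and not voiceio's actual detection code.

```python
from __future__ import annotations

import os
import sys
from typing import Mapping


def pick_backends(env: Mapping[str, str] | None = None, platform: str = sys.platform) -> dict[str, list[str]]:
    """Return candidate backends in preference order, loosely following the platform table."""
    env = os.environ if env is None else env
    if platform == "darwin":
        return {"typer": ["pynput", "clipboard"], "hotkey": ["pynput"]}

    desktop = env.get("XDG_CURRENT_DESKTOP", "").lower()  # e.g. "ubuntu:GNOME", "KDE", "sway"
    session = env.get("XDG_SESSION_TYPE", "").lower()  # usually "wayland" or "x11"

    typers = ["ibus"]
    if session == "wayland" and "gnome" not in desktop:
        typers += ["ydotool", "wtype"]  # extra injectors for non-GNOME Wayland compositors
    typers.append("clipboard")  # last-resort fallback, e.g. for terminals

    hotkeys = ["evdev"]  # works on Wayland and X11 (user must be in the input group)
    if session == "x11":
        hotkeys.append("pynput")
    elif "gnome" in desktop:
        hotkeys.append("gnome-shortcut")
    return {"typer": typers, "hotkey": hotkeys}
```

Returning ordered lists rather than a single choice pairs naturally with a fallback chain: the first entry is tried, and later entries take over if it fails.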
Uninstall
voiceio uninstall # removes service, IBus, shortcuts, symlinks
pipx uninstall python-voiceio # removes the package
Wishlist
Contributions welcome! See CONTRIBUTING.md. Open an issue to discuss before starting.
High impact
- macOS support: test and polish pynput hotkey + typer backends
- Silence filtering: VAD-based trimming to prevent Whisper hallucinations on silence
- distil-whisper models: better speed/accuracy tradeoffs
- IBus on non-GNOME desktops: KDE, Sway, Hyprland activation (currently GNOME-only via gsettings)
- Text-to-speech (voice output): select text, press a hotkey, hear it spoken aloud. Completes the "io" in voiceio. Use a local TTS engine (Piper, Coqui, espeak-ng), same philosophy: no cloud, no API keys
- Wake word: "Hey voiceio" hands-free activation (no hotkey needed). Use a small always-on keyword model (e.g. openWakeWord, Porcupine)
- Custom vocabulary / hot words: user-defined word list for names, jargon, technical terms that Whisper gets wrong. Boost via initial_prompt or fine-tuned logit bias
- Per-app profiles: different language/model/output settings per application (e.g. formal writing in docs, casual in chat)
- Voice commands: "select all", "new line", "undo that", "delete last sentence". Parse transcribed text for command patterns before injecting
- Punctuation & formatting commands: "period", "comma", "new paragraph", "capitalize that"
- Auto-punctuation model: post-process Whisper output with a small punctuation/capitalization model for cleaner text
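The voice-command and punctuation-command ideas above boil down to rewriting command phrases in the transcript before it is injected. A minimal regex-based sketch of the punctuation part; the command table and `apply_spoken_punctuation` are hypothetical, and a real version would need care around words that users actually mean to type (e.g. saying "period" as a noun).

```python
import re

# Hypothetical spoken commands mapped to the text they insert.
SPOKEN_PUNCTUATION = {
    "new paragraph": "\n\n",
    "new line": "\n",
    "question mark": "?",
    "period": ".",
    "comma": ",",
}

# Longest phrases first so multi-word commands win; \b avoids matching inside words
# such as "periodically"; [.,]? swallows punctuation Whisper may append to a command.
_COMMAND_RE = re.compile(
    r"\s*\b("
    + "|".join(re.escape(p) for p in sorted(SPOKEN_PUNCTUATION, key=len, reverse=True))
    + r")\b[.,]?",
    re.IGNORECASE,
)


def apply_spoken_punctuation(text: str) -> str:
    """Replace spoken punctuation commands with symbols before injecting the text."""
    out = _COMMAND_RE.sub(lambda m: SPOKEN_PUNCTUATION[m.group(1).lower()], text)
    out = re.sub(r"[ \t]*\n[ \t]*", "\n", out)  # drop stray spaces around line breaks
    return out.strip(" ")
```

For example, "hello world period new line how are you question mark" becomes "hello world." and "how are you?" on two lines.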
Platform expansion
- macOS Input Method (IMKit): native streaming preedit on macOS, matching IBus quality on Linux
- Windows support: Text Services Framework (TSF) for text injection, global hotkeys via win32api
- Flatpak / Snap packaging: sandboxed distribution for Linux
- AUR package: community package for Arch Linux
UX polish
- System tray icon with recording animation: pulsing/colored icon showing recording state, quick menu for model/language switching
- Desktop notifications with transcribed text: show what was typed, with an undo button
- Confidence indicator: visual hint when Whisper is uncertain (maybe highlight low-confidence words)
- Recording timeout: auto-stop after N seconds of silence or max duration, preventing forgotten recordings
- Sound themes: bundled sound packs (subtle, mechanical, sci-fi, none)
- First-run onboarding overlay: lightweight "press Ctrl+Alt+V to start" hint on first launch
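The recording-timeout item combines two stop conditions: a run of silence and a hard cap on total length. A minimal sketch using RMS energy as a crude silence test (a real VAD would be more robust); the class name, defaults, and threshold are assumptions, with samples taken to be floats in [-1, 1].

```python
import math
from dataclasses import dataclass, field


@dataclass
class RecordingTimeout:
    """Hypothetical auto-stop rule: stop after `silence_s` of quiet audio or `max_s` in total."""

    silence_s: float = 3.0
    max_s: float = 120.0
    rms_threshold: float = 0.01  # assumed level below which a chunk counts as silence
    elapsed: float = field(default=0.0, init=False)
    silent_for: float = field(default=0.0, init=False)

    def feed(self, chunk: list[float], sample_rate: int = 16_000) -> bool:
        """Account for one audio chunk; return True once recording should stop."""
        duration = len(chunk) / sample_rate
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk)) if chunk else 0.0
        self.elapsed += duration
        # Any chunk above the threshold counts as speech and resets the silence timer.
        self.silent_for = self.silent_for + duration if rms < self.rms_threshold else 0.0
        return self.silent_for >= self.silence_s or self.elapsed >= self.max_s
```

Resetting the silence counter on every loud chunk means only an uninterrupted pause ends the recording, while `max_s` catches a forgotten recording even in a noisy room.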
Power features
- Multi-language in one session: auto-detect language switches mid-dictation (Whisper supports this but needs tuning)
- Speaker diarization: "Person 1: ... Person 2: ..." for meeting notes (via pyannote or whisperX)
- LLM post-processing: pipe transcription through a local LLM (Ollama) for grammar correction, summarization, or reformatting
- Clipboard history: keep last N transcriptions, quick-paste from history
- Transcription log / journal: searchable history of everything you've dictated, with timestamps
- API / webhook: expose a local API so other tools can trigger recording or receive transcriptions
- Browser extension: inject text into web apps that don't work with IBus (e.g. some Electron apps)
Developer experience
- Plugin system: hooks for pre/post processing (e.g. custom formatters, translators, text transforms)
- Alternative STT backends: support Whisper.cpp, Deepgram, AssemblyAI, OpenAI Whisper API as optional backends
- GPU acceleration docs: CUDA/ROCm setup guide for faster transcription on large models
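One common shape for the plugin-system item is a registry of text hooks that run in order on each transcription before injection. A minimal sketch covering only post-processing; `HookRegistry`, `post_process`, and the two example hooks are hypothetical.

```python
from typing import Callable

TextHook = Callable[[str], str]


class HookRegistry:
    """Hypothetical plugin registry: post-processing hooks run in registration order."""

    def __init__(self) -> None:
        self._post: list[TextHook] = []

    def post_process(self, fn: TextHook) -> TextHook:
        """Decorator that registers `fn` as a post-transcription hook."""
        self._post.append(fn)
        return fn

    def run(self, text: str) -> str:
        """Pass the transcription through every registered hook in turn."""
        for hook in self._post:
            text = hook(text)
        return text


hooks = HookRegistry()


@hooks.post_process
def collapse_spaces(text: str) -> str:
    return " ".join(text.split())


@hooks.post_process
def capitalize_first(text: str) -> str:
    return text[:1].upper() + text[1:]
```

Keeping each hook as a plain `str -> str` function makes formatters, translators, or an LLM rewrite step interchangeable and easy to test in isolation.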
License
MIT
Download files
Source Distribution
Built Distribution
File details
Details for the file python_voiceio-0.2.4.tar.gz.
File metadata
- Download URL: python_voiceio-0.2.4.tar.gz
- Upload date:
- Size: 74.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0e7ba82b3301379f8bfbb52fe5c834b6cc2ccf5f1177cad7ae77a6b6d872b464 |
| MD5 | 4b642d60b1576c395fbe82c1d9e6fdf8 |
| BLAKE2b-256 | fbe32c46274c57b461a48d800d9e1f72f7b6d2df5ed5844883599c0f830a4fa9 |
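To check a manually downloaded file against the published SHA256 digest, hash it locally and compare. A minimal sketch with the standard-library `hashlib`; the digest constant is the source distribution's SHA256 from the table above, and the helper names are hypothetical. (pip can enforce the same check itself via `--require-hashes` with a `--hash=sha256:...` entry in a requirements file.)

```python
import hashlib
from pathlib import Path

# SHA256 published above for python_voiceio-0.2.4.tar.gz.
EXPECTED_SHA256 = "0e7ba82b3301379f8bfbb52fe5c834b6cc2ccf5f1177cad7ae77a6b6d872b464"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large files never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```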
Provenance
The following attestation bundles were made for python_voiceio-0.2.4.tar.gz:
Publisher: publish.yml on Hugo0/voiceio
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: python_voiceio-0.2.4.tar.gz
- Subject digest: 0e7ba82b3301379f8bfbb52fe5c834b6cc2ccf5f1177cad7ae77a6b6d872b464
- Sigstore transparency entry: 1067842244
- Sigstore integration time:
Permalink: Hugo0/voiceio@385cad77066add8cd76585b308d899e99f6f64ca
Branch / Tag: refs/tags/v0.2.3
Owner: https://github.com/Hugo0
Access: public
Token Issuer: https://token.actions.githubusercontent.com
Runner Environment: github-hosted
Publication workflow: publish.yml@385cad77066add8cd76585b308d899e99f6f64ca
Trigger Event: release
File details
Details for the file python_voiceio-0.2.4-py3-none-any.whl.
File metadata
- Download URL: python_voiceio-0.2.4-py3-none-any.whl
- Upload date:
- Size: 74.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b10bf7921a59b0a52e99eeea7e5ebdcdb1212b5415e6679892a2ad64834b3f05 |
| MD5 | 0dc9273ed682e3429a7621fb512ddf7b |
| BLAKE2b-256 | 4795da91402d5ac4d6aaaf135fb75da16f60ccfe66bb749fda3465d992e099b8 |
Provenance
The following attestation bundles were made for python_voiceio-0.2.4-py3-none-any.whl:
Publisher: publish.yml on Hugo0/voiceio
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: python_voiceio-0.2.4-py3-none-any.whl
- Subject digest: b10bf7921a59b0a52e99eeea7e5ebdcdb1212b5415e6679892a2ad64834b3f05
- Sigstore transparency entry: 1067842281
- Sigstore integration time:
Permalink: Hugo0/voiceio@385cad77066add8cd76585b308d899e99f6f64ca
Branch / Tag: refs/tags/v0.2.3
Owner: https://github.com/Hugo0
Access: public
Token Issuer: https://token.actions.githubusercontent.com
Runner Environment: github-hosted
Publication workflow: publish.yml@385cad77066add8cd76585b308d899e99f6f64ca
Trigger Event: release