voiceio

Push-to-talk voice-to-text for Linux (with experimental macOS support), in any app. Press a hotkey, speak, press again - text appears at your cursor.

100% local and open source. No API keys, no cloud, no telemetry. Use and modify it as you wish.

Quick start

# 1. Install system dependencies (Ubuntu/Debian)
sudo apt install pipx ibus gir1.2-ibus-1.0 python3-gi portaudio19-dev

# 2. Install voiceio
pipx install python-voiceio

# 3. Run the setup wizard
voiceio setup

That's it. Press Ctrl+Alt+V (or your chosen hotkey) to start dictating.

Fedora

sudo dnf install pipx ibus python3-gobject portaudio-devel
pipx install python-voiceio
voiceio setup

Arch Linux

sudo pacman -S python-pipx ibus python-gobject portaudio
pipx install python-voiceio
voiceio setup

Build from source

Clone the repository if you want the source code locally to hack on or customize for personal use. PRs are welcome!

git clone https://github.com/Hugo0/voiceio
cd voiceio
pip install -e ".[linux,dev]"
voiceio setup

You can also install with uv tool install python-voiceio or pip install python-voiceio.

How it works

hotkey → mic capture → whisper (local) → text at cursor
          pre-buffered   streaming        IBus / clipboard

Press your hotkey: voiceio starts recording (with a 1-second pre-buffer, so it catches the beginning even if you start speaking before pressing)
Speak naturally: text streams into the focused app in real-time as an underlined preview
Press the hotkey again: the final transcription replaces the preview and is committed

Transcription runs locally via faster-whisper. Text is injected through IBus (works in any GTK/Qt app: browsers, Telegram, editors) with an automatic clipboard fallback for terminals.
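
The pre-buffer idea can be illustrated with a small ring buffer. This is a minimal sketch, not voiceio's actual implementation: the `PreBuffer` class, the 16 kHz sample rate, and the chunk size are all assumptions made for the example. The microphone is imagined to push chunks continuously, and only the newest second of audio is kept until the hotkey is pressed.

```python
from collections import deque


class PreBuffer:
    """Keep only the most recent `seconds` of audio chunks (hypothetical helper)."""

    def __init__(self, seconds: float, sample_rate: int, chunk_size: int) -> None:
        # Number of whole chunks needed to cover the requested duration.
        max_chunks = max(1, round(seconds * sample_rate / chunk_size))
        self._chunks = deque(maxlen=max_chunks)

    def push(self, chunk: bytes) -> None:
        # When the deque is full, the oldest chunk is dropped automatically.
        self._chunks.append(chunk)

    def drain(self) -> list:
        # Hand the buffered audio to the recorder and start over.
        chunks = list(self._chunks)
        self._chunks.clear()
        return chunks


# 1 second at an assumed 16 kHz in 1600-sample chunks: the 10 newest chunks are kept.
buf = PreBuffer(seconds=1.0, sample_rate=16_000, chunk_size=1_600)
for i in range(25):
    buf.push(bytes([i]))  # stand-in for real microphone data

recording = buf.drain()  # called when the hotkey is pressed
print(len(recording))
```

Draining on the hotkey press means the recording starts with audio captured just before the press, which is what lets the first syllable survive.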

Features

Streaming: text appears as you speak, not after you stop
Works everywhere: IBus input method for GUI apps, clipboard for terminals
Wayland + X11: evdev hotkeys work on both, no root required
Pre-buffer: never miss the first syllable
Auto-healing: falls back to the next working backend if one fails
Autostart: optional systemd service, restarts on crash
Self-diagnosing: voiceio doctor checks everything, --fix repairs it
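
The auto-healing behaviour described above amounts to trying backends in priority order. The sketch below shows that pattern under stated assumptions: `inject_with_fallback`, the `Backend` pair, and the stand-in backends are hypothetical names for illustration and are not taken from the voiceio code base.

```python
from typing import Callable, Sequence, Tuple

# A backend is a (name, inject) pair; `inject` raises if it cannot deliver text.
Backend = Tuple[str, Callable[[str], None]]


def inject_with_fallback(text: str, backends: Sequence[Backend]) -> str:
    """Try each backend in order and return the name of the first that works."""
    errors = []
    for name, inject in backends:
        try:
            inject(text)
            return name
        except Exception as exc:  # a real daemon would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))


def broken_ibus(text: str) -> None:
    # Stand-in for a backend whose component is missing.
    raise ConnectionError("IBus engine not registered")


delivered = []  # stand-in for a clipboard backend that always succeeds

used = inject_with_fallback(
    "hello world",
    [("ibus", broken_ibus), ("clipboard", delivered.append)],
)
print(used, delivered)
```

Collecting the error from each failed backend keeps the final exception useful for diagnosis when nothing works.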

Models

Model	Size	Speed	Accuracy	Good for
`tiny`	75 MB	~10x realtime	Basic	Quick notes, low-end hardware
`base`	150 MB	~7x realtime	Good	Daily use (default)
`small`	500 MB	~4x realtime	Better	Longer dictation
`medium`	1.5 GB	~2x realtime	Great	Accuracy-sensitive work
`large-v3`	3 GB	~1x realtime	Best	Maximum quality, GPU recommended

Models download automatically on first use. Switch anytime: voiceio --model small.
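
To get a feel for what the speed column means in practice, the arithmetic can be worked through in a few lines. This sketch simply divides the audio length by the approximate factors from the table above; `estimated_seconds` is a hypothetical helper, and real timings depend on your hardware.

```python
# Approximate speed factors copied from the table above (seconds of audio
# transcribed per wall-clock second). Real numbers depend on your hardware.
REALTIME_FACTOR = {
    "tiny": 10.0,
    "base": 7.0,
    "small": 4.0,
    "medium": 2.0,
    "large-v3": 1.0,
}


def estimated_seconds(model: str, audio_seconds: float) -> float:
    """Rough wall-clock time to transcribe `audio_seconds` of speech."""
    return audio_seconds / REALTIME_FACTOR[model]


for model in REALTIME_FACTOR:
    print(f"{model:>8}: ~{estimated_seconds(model, 30.0):.1f} s for 30 s of audio")
```

Going by these factors, 30 seconds of speech would take about 3 seconds with `tiny` and about 30 seconds with `large-v3`.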

Commands

voiceio                  Start the daemon
voiceio setup            Interactive setup wizard
voiceio doctor           Health check (--fix to auto-repair)
voiceio test             Test microphone + live transcription
voiceio toggle           Toggle recording on a running daemon
voiceio service install  Autostart on login via systemd
voiceio logs             View recent logs
voiceio uninstall        Remove all system integrations

Configuration

voiceio setup handles everything interactively. To tweak later, edit ~/.config/voiceio/config.toml or override at runtime:

voiceio --model large-v3 --language auto -v

See config.example.toml for all options.

Troubleshooting

voiceio doctor           # see what's working
voiceio doctor --fix     # auto-fix issues
voiceio logs             # check debug output

Problem	Fix
No text appears	`voiceio doctor --fix` - usually a missing IBus component or GNOME input source
Hotkey doesn't work on Wayland	`sudo usermod -aG input $USER` then log out and back in
Transcription too slow	Use a smaller model: `voiceio --model tiny`
Want to start fresh	`voiceio uninstall` then `voiceio setup`
Doesn't work on macOS	I haven't added proper macOS support yet. Either use https://aquavoice.com/ or open a PR

Platform support

Platform	Status	Text injection	Hotkeys	Streaming preview
Ubuntu / Debian (GNOME, Wayland)	Tested daily	IBus	evdev / GNOME shortcut	Yes
Ubuntu / Debian (GNOME, X11)	Supported	IBus	evdev / pynput	Yes
Fedora (GNOME)	Supported	IBus	evdev / GNOME shortcut	Yes
Arch Linux	Supported	IBus	evdev	Yes
KDE / Sway / Hyprland	Should work	IBus / ydotool / wtype	evdev	Yes
macOS	Experimental	pynput / clipboard	pynput	Type-and-correct (no preedit)

voiceio auto-detects your platform and picks the best available backends. Run voiceio doctor to see what's working on your system.

Uninstall

voiceio uninstall        # removes service, IBus, shortcuts, symlinks
pipx uninstall python-voiceio   # removes the package

TODO

Launch

Publish to PyPI
Record demo video + thumbnail
Test clean install on a fresh VM/container
GitHub repo: description, topics, social preview image
Bump version to 0.2.0

Code quality

IBus activation on non-GNOME desktops (KDE, Sway, Hyprland), currently GNOME-only via gsettings
voiceio doctor --json for machine-readable output
Shell completions (voiceio completion bash/zsh/fish)
Refactor wizard.py (882 lines) into smaller, testable modules
Socket protocol versioning (e.g. v1:preedit:text)
Configurable log file path
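
For the socket protocol versioning item, one possible shape for a parser is sketched below. It assumes only the `v1:preedit:text` example given above; the `Message` type, the `parse_message` function, and the error handling are hypothetical and not part of the current code.

```python
from typing import NamedTuple


class Message(NamedTuple):
    version: int
    command: str
    payload: str


def parse_message(line: str) -> Message:
    """Parse 'v<version>:<command>:<payload>' into its three parts."""
    # maxsplit=2 keeps any further colons inside the payload.
    parts = line.split(":", 2)
    if len(parts) != 3:
        raise ValueError(f"malformed message: {line!r}")
    version_tag, command, payload = parts
    if not (version_tag.startswith("v") and version_tag[1:].isdigit()):
        raise ValueError(f"missing version prefix: {line!r}")
    return Message(int(version_tag[1:]), command, payload)


print(parse_message("v1:preedit:meet at 10:30"))
```

Splitting at most twice matters because dictated text can itself contain colons, as in the time above.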

Wishlist

Contributions welcome! Open an issue to discuss before starting.

High impact

Text-to-speech (voice output): select text, press a hotkey, hear it spoken aloud. Completes the "io" in voiceio. Use a local TTS engine (Piper, Coqui, espeak-ng), same philosophy: no cloud, no API keys
Wake word: "Hey voiceio" hands-free activation (no hotkey needed). Use a small always-on keyword model (e.g. openWakeWord, Porcupine)
Custom vocabulary / hot words: user-defined word list for names, jargon, technical terms that Whisper gets wrong. Boost via initial_prompt or fine-tuned logit bias
Per-app profiles: different language/model/output settings per application (e.g. formal writing in docs, casual in chat)
Voice commands: "select all", "new line", "undo that", "delete last sentence". Parse transcribed text for command patterns before injecting
Punctuation & formatting commands: "period", "comma", "new paragraph", "capitalize that"
Auto-punctuation model: post-process Whisper output with a small punctuation/capitalization model for cleaner text
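
As a starting point for the punctuation and formatting commands idea, a naive version could rewrite spoken keywords before the text is injected. This is only a sketch under stated assumptions: the `SPOKEN` table and `apply_spoken_punctuation` are hypothetical, and the approach cannot tell a command from ordinary speech (for example "period" in "a period of time").

```python
import re

# Hypothetical spoken-command table mapping phrases to the text they produce.
SPOKEN = {
    "new paragraph": "\n\n",
    "new line": "\n",
    "period": ".",
    "comma": ",",
}

# Match a whole command phrase together with the whitespace around it.
_PATTERN = re.compile(
    r"\s*\b(" + "|".join(re.escape(k) for k in SPOKEN) + r")\b\s*",
    flags=re.IGNORECASE,
)


def apply_spoken_punctuation(text: str) -> str:
    """Replace spoken punctuation words with the symbols they name."""

    def replace(match):
        symbol = SPOKEN[match.group(1).lower()]
        # Newlines need no trailing space; punctuation is followed by one.
        return symbol if symbol.isspace() else symbol + " "

    return _PATTERN.sub(replace, text).strip()


print(apply_spoken_punctuation("hello comma world period"))
```

A real implementation would likely need context, such as pauses or a confidence signal, before treating a word as a command.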

Platform expansion

macOS Input Method (IMKit): native streaming preedit on macOS, matching IBus quality on Linux
Windows support: Text Services Framework (TSF) for text injection, global hotkeys via win32api
Flatpak / Snap packaging: sandboxed distribution for Linux
AUR package: community package for Arch Linux

UX polish

System tray icon with recording animation: pulsing/colored icon showing recording state, quick menu for model/language switching
Desktop notifications with transcribed text: show what was typed, with an undo button
Confidence indicator: visual hint when Whisper is uncertain (maybe highlight low-confidence words)
Recording timeout: auto-stop after N seconds of silence or max duration, preventing forgotten recordings
Sound themes: bundled sound packs (subtle, mechanical, sci-fi, none)
First-run onboarding overlay: lightweight "press Ctrl+Alt+V to start" hint on first launch

Power features

Multi-language in one session: auto-detect language switches mid-dictation (Whisper supports this but needs tuning)
Speaker diarization: "Person 1: ... Person 2: ..." for meeting notes (via pyannote or whisperX)
LLM post-processing: pipe transcription through a local LLM (Ollama) for grammar correction, summarization, or reformatting
Clipboard history: keep last N transcriptions, quick-paste from history
Transcription log / journal: searchable history of everything you've dictated, with timestamps
API / webhook: expose a local API so other tools can trigger recording or receive transcriptions
Browser extension: inject text into web apps that don't work with IBus (e.g. some Electron apps)

Developer experience

Plugin system: hooks for pre/post processing (e.g. custom formatters, translators, text transforms)
Alternative STT backends: support Whisper.cpp, Deepgram, AssemblyAI, OpenAI Whisper API as optional backends
GPU acceleration docs: CUDA/ROCm setup guide for faster transcription on large models

License

MIT
