A seamless voice dictation system for Linux

Vocalinux

Voice-to-text for Linux, finally done right!

Linux has always punched above its weight, except when it comes to voice typing. Vocalinux fixes that.

It's a free, GPLv3-licensed desktop app that lets you dictate text into any application, on X11 or Wayland, using fully offline speech recognition. Pick from three engines (whisper.cpp, OpenAI Whisper, or VOSK), get automatic GPU acceleration via Vulkan, and control it all with customizable keyboard shortcuts: toggle or push-to-talk.

No internet required. No data leaves your machine. Just speak and type.

📚 What's New in v0.13.0-beta

🎉 Release: Guided whisper.cpp model selection, plus Wayland text-injection reliability, hotplug keyboard support, dictation spacing fixes, and refreshed website docs.

🚀 Highlights

Feature	Description
🎙️ Guided Whisper Models	Pick a whisper.cpp size and specialization (English-only, quantized, Turbo) through split dropdowns with in-app guidance
🔌 Hotplug Keyboard Support	Shortcuts keep working on keyboards connected after startup, with automatic recovery from disconnects
✍️ Dictation Spacing	Spacing is preserved between speech segments separated by a pause in the same session
🖥️ Wayland Reliability	Fixes silent text drops on wlroots/COSMIC compositors and garbled output on non-US keyboard layouts

✨ New Features

Guided whisper.cpp model variants — The Settings dialog now splits whisper.cpp selection into Model Size and Specialization, exposing English-only, quantized (Q5/Q8), Large v3 Turbo, and legacy large models with language-aware recommendations and hover guidance. Exact model IDs (e.g. medium.en-q5_0, large-v3-turbo) can also be passed to --model (#465)
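
To illustrate how an exact model ID breaks down into the Model Size and Specialization parts described above, here is a minimal sketch. It is not Vocalinux's own parser: the `parse_model_id` function, the `ModelVariant` type, and the list of accepted sizes (taken from whisper.cpp's ggml model naming) are assumptions made for illustration.

```python
import re
from typing import NamedTuple, Optional

# Sizes follow the whisper.cpp ggml naming scheme. The set of IDs that
# Vocalinux actually accepts may differ (assumption).
_MODEL_ID = re.compile(
    r"^(?P<size>tiny|base|small|medium|large-v3-turbo|large-v3|large-v2|large-v1)"
    r"(?P<english>\.en)?"          # optional English-only suffix
    r"(?:-(?P<quant>q\d_\d))?$"    # optional quantization, e.g. q5_0 or q8_0
)


class ModelVariant(NamedTuple):
    size: str
    english_only: bool
    quantization: Optional[str]


def parse_model_id(model_id: str) -> ModelVariant:
    """Split an exact model ID into its size and specialization parts."""
    match = _MODEL_ID.match(model_id)
    if match is None:
        raise ValueError(f"unrecognized model ID: {model_id!r}")
    return ModelVariant(
        size=match.group("size"),
        english_only=match.group("english") is not None,
        quantization=match.group("quant"),
    )


print(parse_model_id("medium.en-q5_0"))
print(parse_model_id("large-v3-turbo"))
```

A plain size such as `medium` parses with no specialization, which matches the way `--model medium` and `--model medium.en-q5_0` are both accepted.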

🐛 Bug Fixes

Dictation: Preserve spacing between speech segments separated by a pause, so words no longer run together after a silence within the same session (#464)
Shortcuts: The evdev keyboard backend rescans for hotplugged keyboards, so shortcuts work on devices connected after startup and disconnected devices can be replugged (#467)
KDE Plasma Wayland: Detect KDE Plasma Wayland sessions and guide you to enable IBus Wayland (System Settings → Keyboard → Virtual Keyboard) when wtype injection fails, with matching hints during install and in logs (#466)
Wayland: Fix garbled output on non-US keyboard layouts (AZERTY/QWERTZ/Dvorak) and a clipboard-copy hang; ydotool now pastes through the clipboard for layout-independent injection (#480)
Wayland/IBus: Use wtype/ydotool instead of IBus on compositors that don't bridge IBus to native apps (COSMIC, Sway, Hyprland, and similar), fixing silent text drops (#486)
Wayland/IBus: Require a real IM engine before using IBus on Wayland, so a bare xkb layout no longer causes silent text drops on GNOME/Mutter and other compositors (#478)
Wayland: Preserve the keyboard layout on Wayland by not running setxkbmap, which was flipping XWayland/Electron apps to us after dictation (#474)
UI: Cap the settings dialog height on high-resolution displays (#465)
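
The two Wayland/IBus fixes above amount to a decision rule about when IBus can be trusted. The sketch below restates that rule in Python as a reading aid only. The function name, the returned labels, and the idea of detecting compositors through `XDG_CURRENT_DESKTOP` are all assumptions; the real detection logic is not shown in these notes.

```python
from typing import Mapping

# Compositors named in the release notes as not bridging IBus to native apps.
# Matching them by substring in XDG_CURRENT_DESKTOP is an assumption.
NO_IBUS_BRIDGE = ("cosmic", "sway", "hyprland")


def choose_injection_backend(env: Mapping[str, str], has_im_engine: bool) -> str:
    """Return a hypothetical label for the text-injection path to use."""
    session = env.get("XDG_SESSION_TYPE", "").lower()
    desktop = env.get("XDG_CURRENT_DESKTOP", "").lower()

    if session != "wayland":
        return "x11"  # the rules below only concern Wayland sessions
    if any(name in desktop for name in NO_IBUS_BRIDGE):
        return "wtype/ydotool"  # IBus text would be dropped silently (#486)
    if not has_im_engine:
        return "wtype/ydotool"  # a bare xkb layout is not a real IM engine (#478)
    return "ibus"


print(choose_injection_backend(
    {"XDG_SESSION_TYPE": "wayland", "XDG_CURRENT_DESKTOP": "sway"},
    has_im_engine=True,
))
```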

🔧 Improvements

Performance — Faster ydotool text injection via an explicit --key-delay (#488)
Website — New documentation pages for Remote API, Silero VAD, advanced whisper.cpp settings, and desktop reliability, plus responsive layout polish (#470)
CI — Automatic pull-request labeling by changed files (#473)

✨ Features

🎤 Toggle or Push-to-Talk activation modes
⚡ Real-time transcription with minimal latency
🌎 Universal compatibility across all Linux applications
🔒 100% Offline operation for privacy and reliability
🤖 whisper.cpp by default - High-performance C++ speech recognition
🎮 Universal GPU support - Vulkan acceleration for AMD, Intel, and NVIDIA
🎨 System tray integration with visual status indicators
🚀 Start on login support via XDG autostart (desktop-session startup)
🔊 Pleasant audio feedback - smooth gliding tones, headphone-friendly
⚙️ Graphical settings dialog for easy configuration
📦 3 engine choices - whisper.cpp (default), OpenAI Whisper, or VOSK

📸 Screenshots

Here are some screenshots showcasing Vocalinux in action:

Real-time voice-to-text transcription	System tray with listening indicator
About view with version info	Log viewer for debugging
Overview of key features and configuration options with annotations

🚀 Quick Install

Interactive Install (Recommended)

Our new interactive installer guides you through setup with intelligent hardware detection:

curl -fsSL https://raw.githubusercontent.com/jatinkrmalik/vocalinux/main/install.sh -o /tmp/vl.sh && bash /tmp/vl.sh

Choose your engine:

whisper.cpp ⭐ (Recommended) - Fast, works with any GPU via Vulkan
Whisper (OpenAI) - PyTorch-based, NVIDIA GPU only
VOSK - Lightweight, works on older systems

The installer will:

Auto-detect your hardware (GPU, RAM, Vulkan support)
Recommend the best engine for your system
Download the appropriate model (~74MB for the default whisper.cpp tiny model)
Install neural VAD support when ONNX Runtime is available
Install in ~1-2 minutes (vs 5-10 min with old Whisper)

Note: Always installs the latest release. For a specific version, check GitHub Releases.

Installation Options

Default (whisper.cpp - recommended):

curl -fsSL https://raw.githubusercontent.com/jatinkrmalik/vocalinux/main/install.sh -o /tmp/vl.sh && bash /tmp/vl.sh

Fastest installation (~1-2 min), universal GPU support via Vulkan.

Whisper (OpenAI) - if you prefer PyTorch:

curl -fsSL https://raw.githubusercontent.com/jatinkrmalik/vocalinux/main/install.sh -o /tmp/vl.sh && bash /tmp/vl.sh --engine=whisper

NVIDIA GPU only (~5-10 min, downloads PyTorch + CUDA).

VOSK only - for low-RAM systems:

curl -fsSL https://raw.githubusercontent.com/jatinkrmalik/vocalinux/main/install.sh -o /tmp/vl.sh && bash /tmp/vl.sh --engine=vosk

Lightweight option (~40MB), works on systems with 4GB RAM.

Alternative: Install from Source

# Clone the repository
git clone https://github.com/jatinkrmalik/vocalinux.git
cd vocalinux

# Run the installer (will prompt for Whisper)
./install.sh

# Or with Whisper support
./install.sh --with-whisper

The installer handles everything: system dependencies, Python environment, speech models, and desktop integration.

🌙 Nightly Releases (Bleeding Edge)

Developers and early adopters who want to test the latest features can use our GitHub Releases page, which includes both beta and nightly builds.

⚠️ Warning: Nightly releases contain the absolute latest code and may be unstable. For production use, we recommend using the latest beta release.

Nightly builds are automatically generated from the main branch every day. They include all merged changes but haven't undergone the same testing as beta releases.

Release Channels:

Beta (Recommended) - Tested pre-releases with known features
Nightly - Untested bleeding edge with latest commits

After Installation

# If ~/.local/bin is in your PATH (recommended):
vocalinux

# Or activate the virtual environment first:
source ~/.local/bin/activate-vocalinux.sh
vocalinux

# Or run directly:
~/.local/share/vocalinux/venv/bin/vocalinux

Or launch it from your application menu!
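
If you start Vocalinux from another script, the lookup order above (the `PATH` first, then the virtual environment) can be sketched as follows. The `find_vocalinux` helper is hypothetical and only mirrors the paths listed in this section.

```python
import shutil
from pathlib import Path
from typing import Optional


def find_vocalinux(home: Optional[Path] = None) -> Optional[str]:
    """Locate the vocalinux launcher, or return None if it is not installed."""
    # 1. Prefer whatever is on PATH (for example ~/.local/bin/vocalinux).
    on_path = shutil.which("vocalinux")
    if on_path:
        return on_path

    # 2. Fall back to the launcher inside the installer's virtual environment.
    home = home or Path.home()
    candidate = home / ".local/share/vocalinux/venv/bin/vocalinux"
    return str(candidate) if candidate.is_file() else None
```

The result can be passed to `subprocess.run([path, "--debug"])` once you have checked that it is not `None`.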

📋 Requirements

OS: Linux (tested on Ubuntu 22.04+, Debian 11+, Fedora 39+, Arch Linux, openSUSE Tumbleweed)
Python: 3.9 or newer
Display: X11 or Wayland
Hardware: Microphone for voice input

Note: See Distribution Compatibility for distribution-specific information and experimental support for Gentoo, Alpine, Void, Solus, and more.

🎙️ Usage

Voice Dictation

Start: in toggle mode, double-tap the shortcut key (default: Ctrl) to begin recording
Speak clearly into your microphone
Stop: in toggle mode, double-tap again (or pause speaking); in push-to-talk mode, release the key

Voice Commands

Command	Action
"new line"	Inserts a line break
"period" / "full stop"	Types a period (.)
"comma"	Types a comma (,)
"question mark"	Types a question mark (?)
"exclamation mark"	Types an exclamation mark (!)
"delete that"	Deletes the last sentence
"capitalize"	Capitalizes the next word

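To make the command table concrete, here is a small sketch of how spoken commands could be turned into text. It is an illustration rather than Vocalinux's actual command processor: `apply_commands` is a hypothetical name, "delete that" is left out because it needs sentence history, and any literal use of a word such as "period" is treated as a command.

```python
# Spoken phrase -> text to insert, taken from the table above.
COMMANDS = {
    "new line": "\n",
    "period": ".",
    "full stop": ".",
    "comma": ",",
    "question mark": "?",
    "exclamation mark": "!",
}


def apply_commands(transcript: str) -> str:
    """Replace spoken commands in a transcript with the text they stand for."""
    words = transcript.split()
    out = []                 # output pieces, joined at the end
    capitalize_next = False
    i = 0
    while i < len(words):
        pair = " ".join(words[i:i + 2]).lower() if i + 1 < len(words) else None
        single = words[i].lower()
        if pair in COMMANDS:            # two-word commands take priority
            out.append(COMMANDS[pair])
            i += 2
        elif single in COMMANDS:
            out.append(COMMANDS[single])
            i += 1
        elif single == "capitalize":
            capitalize_next = True
            i += 1
        else:
            word = words[i]
            if capitalize_next:
                word = word[:1].upper() + word[1:]
                capitalize_next = False
            # No leading space at the very start or right after a line break.
            needs_space = bool(out) and not out[-1].endswith("\n")
            out.append((" " if needs_space else "") + word)
            i += 1
    return "".join(out)


print(apply_commands("hello comma capitalize world exclamation mark"))
```
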
Command Line Options

vocalinux --help                  # Show all options
vocalinux --debug                 # Enable debug logging
vocalinux --engine whisper_cpp    # Use whisper.cpp engine (default)
vocalinux --engine whisper        # Use OpenAI Whisper engine
vocalinux --engine vosk           # Use VOSK engine
vocalinux --model medium          # Use medium-sized model
vocalinux --model medium.en-q5_0  # Use exact whisper.cpp model variant
vocalinux --model large-v3-turbo  # Use large-v3 Turbo with whisper.cpp
vocalinux --wayland               # Force Wayland mode
vocalinux --start-minimized       # Start without first-run modal prompts
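
When launching Vocalinux from a script, the options above can be assembled programmatically. The sketch below uses only the flags listed in this section; the `build_command` helper itself is hypothetical.

```python
import shlex
from typing import List, Optional

ENGINES = ("whisper_cpp", "whisper", "vosk")


def build_command(
    engine: str = "whisper_cpp",
    model: Optional[str] = None,
    debug: bool = False,
    wayland: bool = False,
    start_minimized: bool = False,
) -> List[str]:
    """Build an argument list for the vocalinux command line."""
    if engine not in ENGINES:
        raise ValueError(f"unknown engine: {engine!r}")
    cmd = ["vocalinux", "--engine", engine]
    if model:
        cmd += ["--model", model]
    if debug:
        cmd.append("--debug")
    if wayland:
        cmd.append("--wayland")
    if start_minimized:
        cmd.append("--start-minimized")
    return cmd


# Show the command as it would be typed in a shell.
print(shlex.join(build_command(model="medium.en-q5_0", debug=True)))
```

Passing the returned list to `subprocess.run` avoids shell quoting issues.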

Autostart on Login

Vocalinux uses the Linux desktop standard for autostart:

Mechanism: XDG autostart desktop entry (vocalinux.desktop)
Path: $XDG_CONFIG_HOME/autostart/ or ~/.config/autostart/ (fallback)
Launch mode: Starts as a regular user desktop app in your graphical session
Not used: No systemd unit/service is created by Vocalinux for autostart

How to enable/disable:

First-run welcome dialog
Tray menu: Start on Login
Settings dialog: Start on Login

Compatibility notes:

Works on mainstream desktop environments (GNOME, KDE, Xfce, Cinnamon, MATE, LXQt)
On minimal/custom window-manager sessions, an autostart handler may be required (for example DE-specific startup hooks or tools like dex)
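
To check whether autostart is currently enabled, you can resolve the desktop-entry path using the rule given above: `$XDG_CONFIG_HOME/autostart/` when the variable is set, otherwise `~/.config/autostart/`. This is a minimal sketch; the `autostart_entry_path` helper is hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def autostart_entry_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return where the vocalinux.desktop autostart entry is expected to live."""
    env = os.environ if env is None else env
    config_home = env.get("XDG_CONFIG_HOME")
    # An unset or empty XDG_CONFIG_HOME falls back to ~/.config.
    base = Path(config_home) if config_home else Path.home() / ".config"
    return base / "autostart" / "vocalinux.desktop"


entry = autostart_entry_path()
print(entry, "exists" if entry.exists() else "is missing")
```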

⚙️ Configuration

Configuration is stored in ~/.config/vocalinux/config.json:

{
  "speech_recognition": {
    "engine": "whisper_cpp",
    "model_size": "tiny",
    "vad_sensitivity": 3,
    "silence_timeout": 2.0
  }
}

For whisper.cpp, model_size may be a size such as tiny or an exact ggml model ID such as medium.en-q5_0 or large-v3-turbo. You can also configure this through the graphical Settings dialog, where whisper.cpp models are split into Model Size and Specialization controls.
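
For scripts that need to inspect these settings, the file can be read with the standard library. The sketch below is an assumption-laden illustration: `load_speech_settings` is a hypothetical helper, and the fallback values simply copy the example above rather than Vocalinux's real defaults.

```python
import json
from pathlib import Path
from typing import Any, Dict

CONFIG_PATH = Path.home() / ".config" / "vocalinux" / "config.json"

# Copied from the example config; not necessarily the application's defaults.
DEFAULTS: Dict[str, Any] = {
    "engine": "whisper_cpp",
    "model_size": "tiny",
    "vad_sensitivity": 3,
    "silence_timeout": 2.0,
}


def load_speech_settings(path: Path = CONFIG_PATH) -> Dict[str, Any]:
    """Read the speech_recognition section, filling in missing keys."""
    settings = dict(DEFAULTS)
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return settings  # no config written yet
    settings.update(data.get("speech_recognition", {}))
    return settings


print(load_speech_settings())
```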

Neural Voice Activity Detection

Vocalinux ships with a Silero VAD model and uses it automatically when onnxruntime is available. The official installer attempts to install this support automatically. Without it, recording falls back to the simpler amplitude-threshold VAD.

For manual or PyPI installs, enable neural VAD with:

pip install "vocalinux[vad]"

Restart Vocalinux after installing. The Recognition tab in Settings shows which backend is active. The same vad_sensitivity setting (1-5) applies to both backends; for Silero it is mapped internally to a probability threshold (1 = 0.8, 5 = 0.3).
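
The sensitivity mapping can be pictured with a short sketch. Only the two endpoints (1 = 0.8 and 5 = 0.3) are documented here, so the linear interpolation between them is an assumption, as are the function names.

```python
import importlib.util


def silero_threshold(sensitivity: int) -> float:
    """Map vad_sensitivity (1-5) to a speech-probability threshold.

    Endpoints come from the documentation; the straight line between
    them is an assumption.
    """
    if not 1 <= sensitivity <= 5:
        raise ValueError("vad_sensitivity must be between 1 and 5")
    high, low = 0.8, 0.3
    return high - (sensitivity - 1) * (high - low) / 4


def neural_vad_available() -> bool:
    """Report whether onnxruntime can be imported in this environment."""
    return importlib.util.find_spec("onnxruntime") is not None


for level in range(1, 6):
    print(level, round(silero_threshold(level), 3))
print("onnxruntime installed:", neural_vad_available())
```

A higher sensitivity lowers the threshold, so quieter or less certain speech is still treated as speech.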

🔧 Development Setup

# Clone and install in dev mode
git clone https://github.com/jatinkrmalik/vocalinux.git
cd vocalinux
./install.sh --dev

# Activate environment
source venv/bin/activate

# Run tests
pytest

# Run from source with debug
python -m vocalinux.main --debug

📁 Project Structure

vocalinux/
├── src/vocalinux/                 # Main application code
│   ├── speech_recognition/        # Speech recognition engines (VOSK, Whisper, whisper.cpp)
│   │   └── recognition_manager.py # Unified engine interface
│   ├── text_injection/            # Text injection (X11/Wayland)
│   ├── ui/                        # GTK UI components
│   └── utils/                     # Utility functions
│       ├── whispercpp_model_info.py   # whisper.cpp model metadata & hardware detection
│       └── vosk_model_info.py         # VOSK model metadata
├── tests/                         # Test suite
├── scripts/                       # Development utilities
│   └── generate_sounds.py         # Sound generation script
├── resources/                     # Icons and sounds
├── docs/                          # Documentation
└── web/                           # Website source

📖 Documentation

Installation Guide - Detailed installation instructions
Update Guide - How to update Vocalinux
User Guide - Complete user documentation
Distribution Compatibility - Distro/session behavior and caveats
Contributing - Development setup and contribution guidelines

🔊 Sound Customization

Vocalinux uses smooth, pleasant gliding tones for audio feedback:

Start: Ascending F4→A4 (0.6s) - positive, uplifting
Stop: Descending A4→F4 (0.6s) - resolves completion
Error: Lower descending E4→C4 (0.7s) - gentle but noticeable

All sounds use pure sine waves with smoothstep interpolation for buttery smooth pitch transitions - perfect for headphone use!
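
For readers curious how a smoothstep glide works, here is a self-contained sketch of the idea. It is not the contents of scripts/generate_sounds.py: the sample rate, amplitude, and function names are assumptions, and the note frequencies are the standard equal-temperament values for F4 and A4.

```python
import math
import struct
import wave
from typing import List

SAMPLE_RATE = 44100      # assumption: CD-quality mono
F4, A4 = 349.23, 440.00  # equal-temperament pitches in Hz


def smoothstep(t: float) -> float:
    """Ease in and out between 0 and 1 (zero slope at both ends)."""
    return t * t * (3.0 - 2.0 * t)


def glide(start_hz: float, end_hz: float, duration_s: float,
          amplitude: float = 0.3) -> List[float]:
    """Generate a sine tone whose pitch glides from start_hz to end_hz."""
    n = round(SAMPLE_RATE * duration_s)
    samples = []
    phase = 0.0
    for i in range(n):
        t = i / (n - 1) if n > 1 else 0.0
        freq = start_hz + (end_hz - start_hz) * smoothstep(t)
        samples.append(amplitude * math.sin(phase))
        # Accumulate phase so the waveform stays continuous as pitch changes.
        phase += 2.0 * math.pi * freq / SAMPLE_RATE
    return samples


def write_wav(path: str, samples: List[float]) -> None:
    """Write samples in the range [-1, 1] as 16-bit mono PCM."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(b"".join(
            struct.pack("<h", int(s * 32767)) for s in samples
        ))


start_tone = glide(F4, A4, 0.6)  # ascending, like the Start sound
stop_tone = glide(A4, F4, 0.6)   # descending, like the Stop sound
```

Calling `write_wav("start.wav", start_tone)` saves the result for listening. Accumulating phase, rather than computing `sin(2πft)` directly with a changing `f`, is what keeps the glide free of clicks.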

Regenerate Sounds

To modify or regenerate the notification sounds:

python scripts/generate_sounds.py

This script generates all three sounds using the same smooth glide algorithm. You can edit the frequencies, durations, and amplitudes in the script to customize the sounds to your preference.

🗺️ Roadmap

~~Custom icon design~~ ✅
~~Graphical settings dialog~~ ✅
~~Whisper AI support~~ ✅
~~Multi-language support (FR, DE, RU)~~ ✅
~~whisper.cpp integration (default engine)~~ ✅
~~Vulkan GPU support~~ ✅
In-app update mechanism
Application-specific commands
Debian/Ubuntu package (.deb)
~~Wayland support via IBus~~ ✅
Voice command customization

🌐 The Voca Ecosystem

Vocalinux is part of a family of privacy-first, offline voice dictation tools. Same mission, every operating system.

Platform	Project	Website	GitHub	Status
🐧 Linux	VocaLinux	vocalinux.com	jatinkrmalik/vocalinux	✅ Beta v0.13.0
🍎 macOS	VocaMac	vocamac.com	jatinkrmalik/vocamac	🚀 Beta
🪟 Windows	VocaWin	vocawin.com	jatinkrmalik/vocawin	📋 Planned

Each platform uses native technologies for the best possible integration, while sharing the same privacy-first philosophy and offline-only architecture.

🤝 Contributing

We welcome contributions! Whether it's bug reports, feature requests, or code contributions, please check out our Contributing Guide.

Contributors

Thanks to everyone who has contributed to Vocalinux! 🙌

⭐ Support

If you find Vocalinux useful, please consider:

⭐ Starring this repository
🐛 Reporting bugs you encounter
📖 Improving documentation
🔀 Contributing code

📜 License

This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.

Made with ❤️ for the Linux community

Release history

Version	Type	Date
0.13.0b0 (this version)	pre-release	Jul 2, 2026
0.12.0b0	pre-release	Jun 7, 2026
0.11.0b0	pre-release	May 30, 2026
0.10.2b0	pre-release	Apr 8, 2026
0.10.1b0	pre-release	Mar 30, 2026
0.10.0b0	pre-release	Mar 26, 2026
0.9.0b0	pre-release	Mar 14, 2026
0.8.0b0	pre-release	Mar 1, 2026
0.7.0b0	pre-release	Feb 23, 2026
0.6.3b0	pre-release	Feb 19, 2026
0.6.2b0	pre-release	Feb 18, 2026
0.6.1b0	pre-release	Feb 12, 2026
0.6.0b0	pre-release	Feb 12, 2026
0.5.0b0	pre-release	Feb 6, 2026
0.4.1a0	pre-release	Jan 29, 2026
0.4.0a0	pre-release	Jan 29, 2026
0.3.0a0	pre-release	Jan 21, 2026

Download files

Source Distribution

vocalinux-0.13.0b0.tar.gz (2.8 MB)

Uploaded Jul 2, 2026 Source

Built Distribution

vocalinux-0.13.0b0-py3-none-any.whl (2.4 MB)

Uploaded Jul 2, 2026 Python 3

File details

Details for the file vocalinux-0.13.0b0.tar.gz.

File metadata

Download URL: vocalinux-0.13.0b0.tar.gz
Upload date: Jul 2, 2026
Size: 2.8 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for vocalinux-0.13.0b0.tar.gz
Algorithm	Hash digest
SHA256	`ab597b57e0df86b45f68cefdf68d06c6078545add6ea491c175e3144dbfe1d7f`
MD5	`8d14c49afe31c3139f569a68c6ee3540`
BLAKE2b-256	`b00a847a76d92b912193ec6d98b774330bdbb22d8c26cb59a8a89947d23efd2b`

File details

Details for the file vocalinux-0.13.0b0-py3-none-any.whl.

File metadata

Download URL: vocalinux-0.13.0b0-py3-none-any.whl
Upload date: Jul 2, 2026
Size: 2.4 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for vocalinux-0.13.0b0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`03274404dcf4844a45f051bf1d29c4780f54ca5fd4aa99ec5b7adc51052fbbb6`
MD5	`681f8780f246863f5dee9c1bfd04deee`
BLAKE2b-256	`f1465d1acbb0bf8a7da46c293a4c07b9197a33efc522db18212443a19a71c504`
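
The published digests can be used to check a downloaded file before installing it. The sketch below compares a local file against the SHA256 values listed above using Python's standard `hashlib`; the helper names are hypothetical, and the file is assumed to have kept its original name.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the tables above.
EXPECTED_SHA256 = {
    "vocalinux-0.13.0b0.tar.gz":
        "ab597b57e0df86b45f68cefdf68d06c6078545add6ea491c175e3144dbfe1d7f",
    "vocalinux-0.13.0b0-py3-none-any.whl":
        "03274404dcf4844a45f051bf1d29c4780f54ca5fd4aa99ec5b7adc51052fbbb6",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large downloads do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path) -> bool:
    """Return True if the file's SHA256 matches the published digest."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected
```

For example, `verify_download(Path("vocalinux-0.13.0b0.tar.gz"))` returns `False` if the archive was corrupted or altered in transit.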
