A seamless voice dictation system for Linux
Vocalinux
Voice-to-text for Linux, finally done right!
A seamless, free, open-source, private voice dictation system for Linux, comparable to the built-in solutions on macOS and Windows.
What's New in v0.6.3-beta
Beta release with whisper.cpp: fast, private, offline voice dictation for Linux.
Highlights (v0.6.0 to v0.6.3)
| Feature | Description |
|---|---|
| whisper.cpp Default | Installs in ~1-2 min (vs 5-10 min for OpenAI Whisper), C++ optimized inference |
| Universal GPU Support | Vulkan acceleration for AMD, Intel, and NVIDIA GPUs |
| Interactive Installer | Choose between 3 engines with hardware auto-detection |
| Multi-Distro Support | Works on Ubuntu, Debian, Fedora, Arch, and more |
New Features (v0.6.3)
- IBus Text Injection Engine: Full Wayland support via the IBus input method
- X11 IBus Support: IBus support extended to X11 for non-US keyboard layouts
- Thread-Safe Model Access: Improved stability under concurrent model operations
Bug Fixes (v0.6.3)
- #229: Fixed `[BLANK_AUDIO]` token suppression in whisper.cpp output
- #228: Removed premature pkg-config check in installer
- #227: Fixed text injection for non-US keyboard layouts on X11
- #216: Fixed thread safety crash when accessing speech models
- #221: Fixed missing `psutil` dependency for fresh installs
- #219: Suppressed `[BLANK_AUDIO]` tokens in whisper.cpp output
- #204: Fixed PyAudio `paInt16` error on device reconnection
- #205: Fixed whisper module installation with `--auto` flag
Recent Improvements
- Interactive Backend Selection: Choose GPU (Vulkan/CUDA) or CPU backend
- Enhanced Welcome Message: Clear post-install instructions
- Simplified Install Commands: No more `--tag` parameter needed
- Better Vulkan Detection: Improved shader package installation
Features
- Double-tap Ctrl to start/stop voice dictation
- Real-time transcription with minimal latency
- Universal compatibility across all Linux applications
- 100% offline operation for privacy and reliability
- whisper.cpp by default: high-performance C++ speech recognition
- Universal GPU support: Vulkan acceleration for AMD, Intel, and NVIDIA
- System tray integration with visual status indicators
- Pleasant audio feedback: smooth gliding tones, headphone-friendly
- Graphical settings dialog for easy configuration
- 3 engine choices: whisper.cpp (default), OpenAI Whisper, or VOSK
Screenshots
The screenshots show Vocalinux in action:
- Real-time voice-to-text transcription
- System tray with listening indicator
- About view with version info
- Log viewer for debugging
- Overview of key features and configuration options with annotations
Quick Install
Interactive Install (Recommended)
The interactive installer detects your hardware and guides you through setup:
curl -fsSL https://raw.githubusercontent.com/jatinkrmalik/vocalinux/v0.6.3-beta/install.sh -o /tmp/vl.sh && bash /tmp/vl.sh
Choose your engine:
- whisper.cpp (Recommended) - Fast, works with any GPU via Vulkan
- Whisper (OpenAI) - PyTorch-based, NVIDIA GPU only
- VOSK - Lightweight, works on older systems
The installer will:
- Auto-detect your hardware (GPU, RAM, Vulkan support)
- Recommend the best engine for your system
- Download the appropriate model (~39MB for whisper.cpp tiny)
- Install in ~1-2 minutes (vs 5-10 min with old Whisper)
Note: Installs v0.6.3-beta. For other versions, check GitHub Releases.
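The detection steps above can be illustrated with a short Python sketch. This is not the installer's actual logic (the installer is a shell script whose rules are not shown here). The function names, the `vulkaninfo` heuristic, and the 4 GB threshold are assumptions chosen only to mirror the description: low-RAM systems get VOSK, everything else gets whisper.cpp, with Vulkan used when it appears to be available.

```python
import shutil
from typing import Optional, Tuple


def read_total_ram_gb(meminfo_path: str = "/proc/meminfo") -> Optional[float]:
    """Return total RAM in GiB parsed from a meminfo-style file, or None."""
    try:
        with open(meminfo_path, encoding="utf-8") as fh:
            for line in fh:
                if line.startswith("MemTotal:"):
                    kib = int(line.split()[1])  # the kernel reports this value in kB
                    return kib / (1024 * 1024)
    except (OSError, ValueError, IndexError):
        pass
    return None


def has_vulkan() -> bool:
    """Heuristic (assumption): a `vulkaninfo` binary on PATH suggests Vulkan support."""
    return shutil.which("vulkaninfo") is not None


def recommend(ram_gb: Optional[float], vulkan: bool,
              low_ram_gb: float = 4.0) -> Tuple[str, str]:
    """Return a hypothetical (engine, backend) recommendation.

    Assumed rule: VOSK on low-RAM machines, whisper.cpp otherwise;
    the Vulkan backend only when Vulkan appears to be available.
    """
    engine = "vosk" if ram_gb is not None and ram_gb <= low_ram_gb else "whisper_cpp"
    backend = "vulkan" if vulkan and engine == "whisper_cpp" else "cpu"
    return engine, backend
```

Calling `recommend(read_total_ram_gb(), has_vulkan())` yields a pair such as `("whisper_cpp", "vulkan")`, which could then be mapped onto the `--engine` option shown in the installation commands.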
Installation Options
Default (whisper.cpp, recommended):
curl -fsSL https://raw.githubusercontent.com/jatinkrmalik/vocalinux/v0.6.3-beta/install.sh -o /tmp/vl.sh && bash /tmp/vl.sh
Fastest installation (~1-2 min), universal GPU support via Vulkan.
Whisper (OpenAI), if you prefer PyTorch:
curl -fsSL https://raw.githubusercontent.com/jatinkrmalik/vocalinux/v0.6.3-beta/install.sh -o /tmp/vl.sh && bash /tmp/vl.sh --engine=whisper
NVIDIA GPU only (~5-10 min, downloads PyTorch + CUDA).
VOSK only, for low-RAM systems:
curl -fsSL https://raw.githubusercontent.com/jatinkrmalik/vocalinux/v0.6.3-beta/install.sh -o /tmp/vl.sh && bash /tmp/vl.sh --engine=vosk
Lightweight option (~40MB), works on systems with 4GB RAM.
Alternative: Install from Source
# Clone the repository
git clone https://github.com/jatinkrmalik/vocalinux.git
cd vocalinux
# Run the installer (will prompt for Whisper)
./install.sh
# Or with Whisper support
./install.sh --with-whisper
The installer handles everything: system dependencies, Python environment, speech models, and desktop integration.
Nightly Releases (Bleeding Edge)
For developers and early adopters who want to test the latest features, the GitHub Releases page includes both beta and nightly builds.
Warning: Nightly releases contain the latest code and may be unstable. For production use, we recommend the latest beta release.
Nightly builds are generated automatically from the main branch every day. They include all merged changes but have not undergone the same testing as beta releases.
Release Channels:
- Beta (Recommended): Tested pre-releases with known features
- Nightly: Untested bleeding edge with the latest commits
After Installation
# If ~/.local/bin is in your PATH (recommended):
vocalinux
# Or activate the virtual environment first:
source ~/.local/bin/activate-vocalinux.sh
vocalinux
# Or run directly:
~/.local/share/vocalinux/venv/bin/vocalinux
Or launch it from your application menu!
Requirements
- OS: Linux (tested on Ubuntu 22.04+, Debian 11+, Fedora 39+, Arch Linux, openSUSE Tumbleweed)
- Python: 3.8 or newer
- Display: X11 or Wayland
- Hardware: Microphone for voice input
Note: See Distribution Compatibility for distribution-specific information and experimental support for Gentoo, Alpine, Void, Solus, and more.
Usage
Voice Dictation
- Double-tap Ctrl to start recording
- Speak clearly into your microphone
- Double-tap Ctrl again (or pause speaking) to stop
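As a rough illustration of how a double-tap trigger can work, the sketch below reports a double-tap when two key presses arrive within a short window. It is not Vocalinux's keyboard handling code. The class name, the 0.3 second window, and the injectable clock are assumptions made for the example, and wiring it to real Ctrl key events is left out.

```python
import time
from typing import Callable, Optional


class DoubleTapDetector:
    """Reports a double-tap when two presses arrive within `max_interval` seconds."""

    def __init__(self, max_interval: float = 0.3,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_interval = max_interval  # assumed window; tune to taste
        self._clock = clock               # injectable so the logic can be tested
        self._last_press: Optional[float] = None

    def on_press(self) -> bool:
        """Call once per key press; returns True when this press completes a double-tap."""
        now = self._clock()
        last = self._last_press
        if last is not None and (now - last) <= self.max_interval:
            self._last_press = None  # reset so a third quick tap starts a new pair
            return True
        self._last_press = now
        return False
```

A caller would toggle recording each time `on_press()` returns `True`. Using a monotonic clock avoids false triggers when the system time is adjusted.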
Voice Commands
| Command | Action |
|---|---|
| "new line" | Inserts a line break |
| "period" / "full stop" | Types a period (.) |
| "comma" | Types a comma (,) |
| "question mark" | Types a question mark (?) |
| "exclamation mark" | Types an exclamation mark (!) |
| "delete that" | Deletes the last sentence |
| "capitalize" | Capitalizes the next word |
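The command table can be read as a mapping from spoken phrases to text edits. The sketch below shows one way such a mapping could be applied to a transcript. It is illustrative only and not the project's command processor: the function names are hypothetical, and "delete that" is omitted because it needs a history of previously injected text.

```python
import re

# Spoken phrase -> replacement text, taken from the command table above.
PUNCTUATION = {
    "new line": "\n",
    "period": ".",
    "full stop": ".",
    "comma": ",",
    "question mark": "?",
    "exclamation mark": "!",
}

# Match any command as a whole phrase, swallowing the whitespace before it
# so that "hello comma" becomes "hello," rather than "hello ,".
_COMMANDS = re.compile(
    r"\s*\b("
    + "|".join(re.escape(p) for p in sorted(PUNCTUATION, key=len, reverse=True))
    + r")\b",
    flags=re.IGNORECASE,
)


def apply_punctuation_commands(transcript: str) -> str:
    """Replace spoken punctuation commands with the characters they stand for."""
    text = _COMMANDS.sub(lambda m: PUNCTUATION[m.group(1).lower()], transcript)
    # Drop spaces left over at the start of a line after a "new line" command.
    return re.sub(r"\n[ \t]+", "\n", text)


def apply_capitalize(transcript: str) -> str:
    """Capitalize the word that follows each spoken "capitalize" command."""
    return re.sub(
        r"\bcapitalize\s+(\w)",
        lambda m: m.group(1).upper(),
        transcript,
        flags=re.IGNORECASE,
    )
```

A plain phrase match like this cannot tell the command "period" from the ordinary word (as in "a long period of time"), so a real implementation would need extra context to disambiguate.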
Command Line Options
vocalinux --help # Show all options
vocalinux --debug # Enable debug logging
vocalinux --engine whisper_cpp # Use whisper.cpp engine (default)
vocalinux --engine whisper # Use OpenAI Whisper engine
vocalinux --engine vosk # Use VOSK engine
vocalinux --model medium # Use medium-sized model
vocalinux --wayland # Force Wayland mode
Configuration
Configuration is stored in ~/.config/vocalinux/config.json:
{
  "speech_recognition": {
    "engine": "whisper_cpp",
    "model_size": "tiny",
    "vad_sensitivity": 3,
    "silence_timeout": 2.0
  }
}
You can also configure settings through the graphical Settings dialog (right-click the tray icon).
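For scripts that need to read this file, a minimal sketch of loading it with fallback defaults might look like the following. The `load_config` helper and the `DEFAULTS` dictionary are hypothetical and are not part of Vocalinux's API. The default values simply repeat the example above, and the real application may handle missing keys differently.

```python
import json
from pathlib import Path
from typing import Any, Dict

CONFIG_PATH = Path.home() / ".config" / "vocalinux" / "config.json"

# Assumed defaults, copied from the example configuration above.
DEFAULTS: Dict[str, Any] = {
    "speech_recognition": {
        "engine": "whisper_cpp",
        "model_size": "tiny",
        "vad_sensitivity": 3,
        "silence_timeout": 2.0,
    }
}


def load_config(path: Path = CONFIG_PATH) -> Dict[str, Any]:
    """Return the defaults overlaid with whatever the file at `path` provides."""
    config = {section: dict(values) for section, values in DEFAULTS.items()}
    try:
        user = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return config  # missing or malformed file: fall back to defaults
    if not isinstance(user, dict):
        return config
    for section, values in user.items():
        if isinstance(values, dict) and isinstance(config.get(section), dict):
            config[section].update(values)  # merge key by key within a section
        else:
            config[section] = values
    return config
```

Merging section by section means a file that sets only `model_size` still ends up with an `engine` value.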
Development Setup
# Clone and install in dev mode
git clone https://github.com/jatinkrmalik/vocalinux.git
cd vocalinux
./install.sh --dev
# Activate environment
source venv/bin/activate
# Run tests
pytest
# Run from source with debug
python -m vocalinux.main --debug
Project Structure
vocalinux/
├── src/vocalinux/                    # Main application code
│   ├── speech_recognition/           # Speech recognition engines (VOSK, Whisper, whisper.cpp)
│   │   └── recognition_manager.py    # Unified engine interface
│   ├── text_injection/               # Text injection (X11/Wayland)
│   ├── ui/                           # GTK UI components
│   └── utils/                        # Utility functions
│       ├── whispercpp_model_info.py  # whisper.cpp model metadata & hardware detection
│       └── vosk_model_info.py        # VOSK model metadata
├── tests/                            # Test suite
├── scripts/                          # Development utilities
│   └── generate_sounds.py            # Sound generation script
├── resources/                        # Icons and sounds
├── docs/                             # Documentation
└── web/                              # Website source
Documentation
- Installation Guide - Detailed installation instructions
- Update Guide - How to update Vocalinux
- User Guide - Complete user documentation
- Contributing - Development setup and contribution guidelines
Sound Customization
Vocalinux uses smooth, pleasant gliding tones for audio feedback:
- Start: Ascending F4 to A4 (0.6s) - positive, uplifting
- Stop: Descending A4 to F4 (0.6s) - signals completion
- Error: Lower descending E4 to C4 (0.7s) - gentle but noticeable
All sounds use pure sine waves with smoothstep interpolation for smooth pitch transitions, which suits headphone use.
Regenerate Sounds
To modify or regenerate the notification sounds:
python scripts/generate_sounds.py
This script generates all three sounds using the same smooth glide algorithm. You can edit the frequencies, durations, and amplitudes in the script to customize the sounds to your preference.
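To show what a sine glide with smoothstep interpolation involves, here is a small standard-library sketch. It is not the contents of `scripts/generate_sounds.py`. The sample rate, amplitude, function names, and the absence of any fade envelope are assumptions. The note frequencies are the standard equal-temperament values (F4 is about 349.23 Hz, A4 is 440 Hz).

```python
import math
import struct
import wave
from typing import List

SAMPLE_RATE = 44100  # assumed; the real script may use a different rate


def smoothstep(t: float) -> float:
    """Classic smoothstep 3t^2 - 2t^3, with t clamped to [0, 1]."""
    t = max(0.0, min(1.0, t))
    return t * t * (3.0 - 2.0 * t)


def glide_samples(f_start: float, f_end: float, duration: float,
                  amplitude: float = 0.3,
                  sample_rate: int = SAMPLE_RATE) -> List[float]:
    """Return sine samples whose pitch eases from f_start to f_end."""
    n = int(round(duration * sample_rate))
    samples = []
    phase = 0.0
    for i in range(n):
        t = i / max(n - 1, 1)
        freq = f_start + (f_end - f_start) * smoothstep(t)
        samples.append(amplitude * math.sin(phase))
        # Accumulate phase from the instantaneous frequency so the
        # waveform stays continuous while the pitch changes.
        phase += 2.0 * math.pi * freq / sample_rate
    return samples


def write_wav(path: str, samples: List[float],
              sample_rate: int = SAMPLE_RATE) -> None:
    """Write mono 16-bit PCM samples (floats in [-1, 1]) to a WAV file."""
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(frames)
```

For example, `write_wav("start.wav", glide_samples(349.23, 440.0, 0.6))` would produce an ascending F4 to A4 tone of the length listed above. Accumulating phase, rather than computing `sin(2*pi*freq*t)` directly, is what keeps the glide free of discontinuities.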
Roadmap
- Custom icon design (done)
- Graphical settings dialog (done)
- Whisper AI support (done)
- Multi-language support (FR, DE, RU) (done)
- whisper.cpp integration (default engine) (done)
- Vulkan GPU support (done)
- In-app update mechanism
- Application-specific commands
- Debian/Ubuntu package (.deb)
- Wayland support via IBus (done)
- Voice command customization
Contributing
We welcome contributions! Whether it's bug reports, feature requests, or code contributions, please check out our Contributing Guide.
Quick Links
- Report a Bug
- Request a Feature
- Discussions
Support
If you find Vocalinux useful, please consider:
- Starring this repository
- Reporting bugs you encounter
- Improving documentation
- Contributing code
License
This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.
Made with ❤️ for the Linux community