A seamless voice dictation system for Linux

Vocalinux

Voice-to-text for Linux, finally done right!

Linux has always punched above its weight, except when it comes to voice typing. Vocalinux fixes that.

It's a free, GPLv3-licensed desktop app that lets you dictate text into any application, on X11 or Wayland, using fully offline speech recognition. Pick from three engines (whisper.cpp, OpenAI Whisper, or VOSK), get automatic GPU acceleration via Vulkan, and control it all with customizable keyboard shortcuts: toggle or push-to-talk.

No internet required. No data leaves your machine. Just speak and type.

📚 What's New in v0.11.0-beta

🎉 Release: Advanced Settings tab with anti-hallucination parameters, IBus/recognition hardening, and distro compatibility improvements.

🚀 Highlights

Feature | Description
⚙️ Advanced Settings Tab | New Advanced tab with whisper.cpp anti-hallucination parameters (temperature, no_speech_threshold, etc.)
🔌 IBus Hardening | Engine readiness probe at startup, runtime failure recovery, and proper engine destruction on layout switch
🎤 Recognition Reliability | Preserve final speech on stop and improved stop-sound playback timing
📦 Distro Compatibility | Hardened Debian layer, corrected openSUSE Tumbleweed deps, Python 3.14 support
🔧 Installer Fixes | Repair pywhispercpp loading, validate project configs (pyproject.toml/setup.py) before local repo mode, reuse existing whispercpp builds

✨ New Features

  • Advanced Settings tab: New tab in Settings dialog exposing whisper.cpp anti-hallucination parameters:
    • Temperature override
    • No-speech threshold
    • Max segment length
    • Other inference-time parameters for fine-tuning recognition behavior

๐Ÿ› Bug Fixes

  • IBus: Probe engine readiness at startup with hardened retries (#391)
  • IBus: Handle engine instance destruction on layout switch (#389)
  • IBus: Recover from runtime failures gracefully (#411)
  • Recognition: Preserve final speech when stopping (#401)
  • Recognition: Play stop sound immediately on release (#426) and after audio thread joins (#436)
  • Install: Repair pywhispercpp library loading (#433)
  • Install: Correct openSUSE Tumbleweed dependencies (#420, #418)
  • Install: Harden Debian compatibility layer (#437)
  • Install: Validate pyproject.toml/setup.py before local repo mode (#396)
  • Install: Reuse existing whispercpp builds (#421)
  • Install: Refresh ldconfig after openSUSE typelib install (#438)
  • whisper.cpp: Reduce CPU threads and ensure GPU backend builds in dev mode (#439)
  • Logging: Clean up runtime log noise and cache hardware detection
  • Python 3.14 support: Compatibility fix and lxml>=6.1.0 (#404)

🔧 Improvements

  • Installer refresh: openSUSE fallback handling, Debian compatibility hardening
  • Test coverage: Recognition internals, IBus edge cases, CI notification suppression (#410, #414)
  • Dependency bumps: Next.js security updates (#399, #429), PostCSS (#2753679)
  • PyPI docs: Clarify installation requirements (#423)

✨ Features

  • 🎤 Toggle or Push-to-Talk activation modes
  • ⚡ Real-time transcription with minimal latency
  • 🌎 Universal compatibility across all Linux applications
  • 🔒 100% Offline operation for privacy and reliability
  • 🤖 whisper.cpp by default - High-performance C++ speech recognition
  • 🎮 Universal GPU support - Vulkan acceleration for AMD, Intel, and NVIDIA
  • 🎨 System tray integration with visual status indicators
  • 🚀 Start on login support via XDG autostart (desktop-session startup)
  • 🔊 Pleasant audio feedback - smooth gliding tones, headphone-friendly
  • ⚙️ Graphical settings dialog for easy configuration
  • 📦 3 engine choices - whisper.cpp (default), OpenAI Whisper, or VOSK

📸 Screenshots

Here are some screenshots showcasing Vocalinux in action:

  • Transcription in Action: Real-time voice-to-text transcription
  • System Tray: System tray with listening indicator
  • About View: About view with version info
  • Log Viewer: Log viewer for debugging
  • Features Overview: Overview of key features and configuration options with annotations

🚀 Quick Install

Interactive Install (Recommended)

Our new interactive installer guides you through setup with intelligent hardware detection:

curl -fsSL https://raw.githubusercontent.com/jatinkrmalik/vocalinux/main/install.sh -o /tmp/vl.sh && bash /tmp/vl.sh

Choose your engine:

  1. whisper.cpp ⭐ (Recommended) - Fast, works with any GPU via Vulkan
  2. Whisper (OpenAI) - PyTorch-based, NVIDIA GPU only
  3. VOSK - Lightweight, works on older systems

The installer will:

  • Auto-detect your hardware (GPU, RAM, Vulkan support)
  • Recommend the best engine for your system
  • Download the appropriate model (~39MB for whisper.cpp tiny)
  • Install in ~1-2 minutes (vs 5-10 min with old Whisper)

Note: Always installs the latest release. For a specific version, check GitHub Releases.

Installation Options

Default (whisper.cpp - recommended):

curl -fsSL https://raw.githubusercontent.com/jatinkrmalik/vocalinux/main/install.sh -o /tmp/vl.sh && bash /tmp/vl.sh

Fastest installation (~1-2 min), universal GPU support via Vulkan.

Whisper (OpenAI) - if you prefer PyTorch:

curl -fsSL https://raw.githubusercontent.com/jatinkrmalik/vocalinux/main/install.sh -o /tmp/vl.sh && bash /tmp/vl.sh --engine=whisper

NVIDIA GPU only (~5-10 min, downloads PyTorch + CUDA).

VOSK only - for low-RAM systems:

curl -fsSL https://raw.githubusercontent.com/jatinkrmalik/vocalinux/main/install.sh -o /tmp/vl.sh && bash /tmp/vl.sh --engine=vosk

Lightweight option (~40MB), works on systems with 4GB RAM.

Alternative: Install from Source

# Clone the repository
git clone https://github.com/jatinkrmalik/vocalinux.git
cd vocalinux

# Run the installer (will prompt for Whisper)
./install.sh

# Or with Whisper support
./install.sh --with-whisper

The installer handles everything: system dependencies, Python environment, speech models, and desktop integration.

🌙 Nightly Releases (Bleeding Edge)

For developers and early adopters who want to test the latest features, check out our GitHub Releases page which includes both beta and nightly builds.

⚠️ Warning: Nightly releases contain the absolute latest code and may be unstable. For production use, we recommend using the latest beta release.

Nightly builds are automatically generated from the main branch every day. They include all merged changes but haven't undergone the same testing as beta releases.

Release Channels:

  • Beta (Recommended) - Tested pre-releases with known features
  • Nightly - Untested bleeding edge with latest commits

After Installation

# If ~/.local/bin is in your PATH (recommended):
vocalinux

# Or activate the virtual environment first:
source ~/.local/bin/activate-vocalinux.sh
vocalinux

# Or run directly:
~/.local/share/vocalinux/venv/bin/vocalinux

Or launch it from your application menu!

📋 Requirements

  • OS: Linux (tested on Ubuntu 22.04+, Debian 11+, Fedora 39+, Arch Linux, openSUSE Tumbleweed)
  • Python: 3.9 or newer
  • Display: X11 or Wayland
  • Hardware: Microphone for voice input

Note: See Distribution Compatibility for distribution-specific information and experimental support for Gentoo, Alpine, Void, Solus, and more.
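
The requirements above can be checked before installing. The following is a minimal sketch, not part of Vocalinux: `check_environment` is a hypothetical helper, and it assumes the session type is exposed through the `XDG_SESSION_TYPE` environment variable, which most login managers set but minimal sessions may not.

```python
import os
import sys

MIN_PYTHON = (3, 9)  # from the Requirements list


def check_environment(version_info=None, environ=None):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper for illustration only.
    """
    version_info = sys.version_info if version_info is None else version_info
    environ = os.environ if environ is None else environ
    problems = []

    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append("Python 3.9 or newer is required")

    # Assumption: XDG_SESSION_TYPE reports "x11" or "wayland". If it is
    # unset we cannot tell, so we treat that as unknown rather than fatal.
    session = environ.get("XDG_SESSION_TYPE", "").lower()
    if session and session not in ("x11", "wayland"):
        problems.append(f"Unsupported session type: {session}")

    return problems
```

Calling `check_environment()` with no arguments inspects the running interpreter and the current environment; passing explicit values makes the check easy to test.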

🎙️ Usage

Voice Dictation

  1. Start recording: in Toggle mode, double-tap the shortcut key (default Ctrl); in Push-to-Talk mode, press and hold it
  2. Speak clearly into your microphone
  3. Stop recording: in Toggle mode, double-tap again (or pause speaking); in Push-to-Talk mode, release the key

Voice Commands

Command | Action
"new line" | Inserts a line break
"period" / "full stop" | Types a period (.)
"comma" | Types a comma (,)
"question mark" | Types a question mark (?)
"exclamation mark" | Types an exclamation mark (!)
"delete that" | Deletes the last sentence
"capitalize" | Capitalizes the next word

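To illustrate how spoken commands like those in the table could map onto text, here is a rough sketch. It is not Vocalinux's actual command processor: `apply_commands` and the regular expressions are assumptions made for illustration, and "delete that" is left out because it needs knowledge of what was already typed.

```python
import re

# Spoken phrase -> literal text, taken from the Voice Commands table.
PUNCTUATION = {
    "new line": "\n",
    "period": ".",
    "full stop": ".",
    "comma": ",",
    "question mark": "?",
    "exclamation mark": "!",
}

# Match a command phrase together with the whitespace in front of it,
# so that "hello comma" becomes "hello," rather than "hello ,".
_COMMANDS = re.compile(
    r"\s*\b("
    + "|".join(sorted(map(re.escape, PUNCTUATION), key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def apply_commands(transcript: str) -> str:
    """Hypothetical post-processing of a recognized transcript."""
    # "capitalize foo" -> "Foo"
    text = re.sub(
        r"\bcapitalize\s+(\w)",
        lambda m: m.group(1).upper(),
        transcript,
        flags=re.IGNORECASE,
    )
    # Replace each spoken command with its literal character.
    text = _COMMANDS.sub(lambda m: PUNCTUATION[m.group(1).lower()], text)
    # Drop spaces left over at the start of a new line.
    return re.sub(r"\n[ \t]+", "\n", text)
```

A sketch this simple cannot tell a command from the same word used literally (for example, "the Jurassic period"); a real implementation would need context to decide.
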
Command Line Options

vocalinux --help                  # Show all options
vocalinux --debug                 # Enable debug logging
vocalinux --engine whisper_cpp    # Use whisper.cpp engine (default)
vocalinux --engine whisper        # Use OpenAI Whisper engine
vocalinux --engine vosk           # Use VOSK engine
vocalinux --model medium          # Use medium-sized model
vocalinux --wayland               # Force Wayland mode
vocalinux --start-minimized       # Start without first-run modal prompts

Autostart on Login

Vocalinux uses the Linux desktop standard for autostart:

  • Mechanism: XDG autostart desktop entry (vocalinux.desktop)
  • Path: $XDG_CONFIG_HOME/autostart/ or ~/.config/autostart/ (fallback)
  • Launch mode: Starts as a regular user desktop app in your graphical session
  • Not used: No systemd unit/service is created by Vocalinux for autostart

How to enable/disable:

  • First-run welcome dialog
  • Tray menu: Start on Login
  • Settings dialog: Start on Login

Compatibility notes:

  • Works on mainstream desktop environments (GNOME, KDE, Xfce, Cinnamon, MATE, LXQt)
  • On minimal/custom window-manager sessions, an autostart handler may be required (for example DE-specific startup hooks or tools like dex)
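
The path rule described above (use `$XDG_CONFIG_HOME/autostart/`, falling back to `~/.config/autostart/`) can be sketched as follows. The function names are hypothetical and this is not taken from the Vocalinux source; only the directory rule and the `vocalinux.desktop` file name come from the text.

```python
import os
from pathlib import Path


def autostart_entry_path(environ=None, home=None) -> Path:
    """Where the vocalinux.desktop autostart entry is expected to live."""
    environ = os.environ if environ is None else environ
    home = Path.home() if home is None else Path(home)
    # An unset or empty XDG_CONFIG_HOME falls back to ~/.config.
    config_home = environ.get("XDG_CONFIG_HOME") or str(home / ".config")
    return Path(config_home) / "autostart" / "vocalinux.desktop"


def autostart_entry_exists(environ=None, home=None) -> bool:
    """True if the entry file is present (an approximation of 'enabled')."""
    return autostart_entry_path(environ, home).is_file()
```

File presence is only an approximation: under the XDG autostart specification an entry can remain on disk but be switched off with `Hidden=true`, so a stricter check would also parse the file.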

โš™๏ธ Configuration

Configuration is stored in ~/.config/vocalinux/config.json:

{
  "speech_recognition": {
    "engine": "whisper_cpp",
    "model_size": "tiny",
    "vad_sensitivity": 3,
    "silence_timeout": 2.0
  }
}

You can also configure settings through the graphical Settings dialog (right-click the tray icon).
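
For scripted changes, the file can be edited with any JSON tool. The sketch below assumes only the keys shown in the example above; the helper names are hypothetical, and whether a running Vocalinux instance picks up external edits is not stated here, so it is safest to treat this as something to do while the app is closed.

```python
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".config" / "vocalinux" / "config.json"


def load_config(path=CONFIG_PATH) -> dict:
    """Read the config file, returning {} if it does not exist yet."""
    path = Path(path)
    if not path.exists():
        return {}
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def set_model_size(config: dict, model_size: str) -> dict:
    """Return a copy of config with speech_recognition.model_size changed."""
    updated = dict(config)
    section = dict(updated.get("speech_recognition", {}))
    section["model_size"] = model_size
    updated["speech_recognition"] = section
    return updated


def save_config(config: dict, path=CONFIG_PATH) -> None:
    """Write to a temporary file first, then rename it over the original."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    tmp.replace(path)
```

For example, `save_config(set_model_size(load_config(), "medium"))` would switch to the medium model, matching the `--model medium` command-line option.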

🔧 Development Setup

# Clone and install in dev mode
git clone https://github.com/jatinkrmalik/vocalinux.git
cd vocalinux
./install.sh --dev

# Activate environment
source venv/bin/activate

# Run tests
pytest

# Run from source with debug
python -m vocalinux.main --debug

📁 Project Structure

vocalinux/
├── src/vocalinux/                 # Main application code
│   ├── speech_recognition/        # Speech recognition engines (VOSK, Whisper, whisper.cpp)
│   │   └── recognition_manager.py # Unified engine interface
│   ├── text_injection/            # Text injection (X11/Wayland)
│   ├── ui/                        # GTK UI components
│   └── utils/                     # Utility functions
│       ├── whispercpp_model_info.py   # whisper.cpp model metadata & hardware detection
│       └── vosk_model_info.py         # VOSK model metadata
├── tests/                         # Test suite
├── scripts/                       # Development utilities
│   └── generate_sounds.py         # Sound generation script
├── resources/                     # Icons and sounds
├── docs/                          # Documentation
└── web/                           # Website source

📖 Documentation

🔊 Sound Customization

Vocalinux uses smooth, pleasant gliding tones for audio feedback:

  • Start: Ascending F4→A4 (0.6s) - positive, uplifting
  • Stop: Descending A4→F4 (0.6s) - resolves completion
  • Error: Lower descending E4→C4 (0.7s) - gentle but noticeable

All sounds use pure sine waves with smoothstep interpolation for buttery smooth pitch transitions - perfect for headphone use!
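
To make the idea concrete, here is a minimal sketch of a sine-wave glide with smoothstep interpolation. It is an illustration of the technique, not the contents of scripts/generate_sounds.py: the sample rate, amplitude, and function names are assumptions, and the pitches use standard equal-temperament values (A4 = 440 Hz).

```python
import math

F4_HZ = 349.23  # equal-temperament F4
A4_HZ = 440.00  # concert pitch A4


def smoothstep(t: float) -> float:
    """Ease-in/ease-out curve: 0 at t=0, 1 at t=1, zero slope at both ends."""
    t = min(1.0, max(0.0, t))
    return t * t * (3.0 - 2.0 * t)


def glide(start_hz, end_hz, duration_s, sample_rate=44100, amplitude=0.3):
    """Return float samples of a pure sine sweeping from start_hz to end_hz.

    The phase is accumulated sample by sample, so the waveform stays
    continuous while the frequency changes.
    """
    n = round(duration_s * sample_rate)
    samples = []
    phase = 0.0
    for i in range(n):
        t = i / (n - 1) if n > 1 else 0.0
        freq = start_hz + (end_hz - start_hz) * smoothstep(t)
        samples.append(amplitude * math.sin(phase))
        phase += 2.0 * math.pi * freq / sample_rate
    return samples
```

With these assumptions, `glide(F4_HZ, A4_HZ, 0.6)` approximates the start tone and `glide(A4_HZ, F4_HZ, 0.6)` the stop tone. Accumulating phase rather than computing `sin(2*pi*f*t)` directly avoids the pitch overshoot that appears when the frequency itself depends on time.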

Regenerate Sounds

To modify or regenerate the notification sounds:

python scripts/generate_sounds.py

This script generates all three sounds using the same smooth glide algorithm. You can edit the frequencies, durations, and amplitudes in the script to customize the sounds to your preference.

🗺️ Roadmap

  • Custom icon design ✅
  • Graphical settings dialog ✅
  • Whisper AI support ✅
  • Multi-language support (FR, DE, RU) ✅
  • whisper.cpp integration (default engine) ✅
  • Vulkan GPU support ✅
  • In-app update mechanism
  • Application-specific commands
  • Debian/Ubuntu package (.deb)
  • Wayland support via IBus ✅
  • Voice command customization

The Voca Ecosystem

Vocalinux is part of a family of privacy-first, offline voice dictation tools. Same mission, every operating system.

Platform | Project | Website | GitHub | Status
🐧 Linux | VocaLinux | vocalinux.com | jatinkrmalik/vocalinux | ✅ Beta v0.11.0
🍎 macOS | VocaMac | vocamac.com | jatinkrmalik/vocamac | 🚀 Beta
🪟 Windows | VocaWin | vocawin.com | jatinkrmalik/vocawin | 📋 Planned

Each platform uses native technologies for the best possible integration, while sharing the same privacy-first philosophy and offline-only architecture.

🤝 Contributing

We welcome contributions! Whether it's bug reports, feature requests, or code contributions, please check out our Contributing Guide.

Contributors

Thanks to everyone who has contributed to Vocalinux! 🙌

โญ Support

If you find Vocalinux useful, please consider:

  • โญ Starring this repository
  • ๐Ÿ› Reporting bugs you encounter
  • ๐Ÿ“– Improving documentation
  • ๐Ÿ”€ Contributing code

๐Ÿ“œ License

This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.

Made with ❤️ for the Linux community

Download files

Source Distribution

vocalinux-0.11.0b0.tar.gz (773.0 kB)

  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.20

Hashes for vocalinux-0.11.0b0.tar.gz

Algorithm | Hash digest
SHA256 | d5b948ce7726caf1097b42b74f17242a7fac84227fc930dfb2f6e35a3683cf16
MD5 | 54dc7c0fb9ee83fb03f0f70533cd24c9
BLAKE2b-256 | c80f0401ca05bb0992278c79b6056d9190253dbffa8dde326536b5b22b2db11b

Built Distribution

vocalinux-0.11.0b0-py3-none-any.whl (397.6 kB)

  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.20

Hashes for vocalinux-0.11.0b0-py3-none-any.whl

Algorithm | Hash digest
SHA256 | 05922c7b06d1e9a36e162b2dede5d6fb414356d62dfd3f5c899df5723b7c397a
MD5 | 184402c2baf086f20071f0d95274be56
BLAKE2b-256 | b0886e52d076fde8fbaa638d5f780ca3e63570e88a37e72815ba056a9dee182d
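
A downloaded file can be checked against the SHA256 digests listed above with Python's standard `hashlib` module. This is a generic sketch: the helper names are hypothetical, and the file is assumed to have been saved under its original name.

```python
import hashlib

# SHA256 digests copied from the file listing for release 0.11.0b0.
EXPECTED_SHA256 = {
    "vocalinux-0.11.0b0.tar.gz":
        "d5b948ce7726caf1097b42b74f17242a7fac84227fc930dfb2f6e35a3683cf16",
    "vocalinux-0.11.0b0-py3-none-any.whl":
        "05922c7b06d1e9a36e162b2dede5d6fb414356d62dfd3f5c899df5723b7c397a",
}


def sha256_of(path, chunk_size=1 << 20) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex: str) -> bool:
    """True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `verify("vocalinux-0.11.0b0.tar.gz", EXPECTED_SHA256["vocalinux-0.11.0b0.tar.gz"])` should return `True` for an intact download of the source distribution.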
