A seamless voice dictation system for Linux

Vocalinux

Voice-to-text for Linux, finally done right!

A seamless, free, open-source, and private voice dictation system for Linux, comparable to the built-in solutions on macOS and Windows.

📚 What's New in v0.6.1-beta

This patch release brings significant improvements to the installation experience and GPU support!

✨ New Features

  • Interactive Backend Selection - When installing whisper.cpp, choose between GPU (Vulkan/CUDA) or CPU backend

    • Intelligent hardware detection recommends the best option
    • Shows helpful tips when GPU libraries are missing
    • Respects your choice during installation
  • Enhanced Welcome Message - Completely redesigned end-of-installation experience

    • Shows exactly what was installed (engine, backend, locations)
    • Clear step-by-step testing instructions
    • Voice command examples ("period", "comma", "new line", "delete that")
    • All management commands in one place

🔧 Improvements

  • Simplified Install Commands - No more --tag parameter needed

    • Before: curl .../main/install.sh | bash -s -- --tag=v0.6.1-beta
    • After: curl .../v0.6.1-beta/install.sh | bash
  • Version Consistency - Install script is now downloaded directly from the release tag

    • Ensures perfect version matching between script and code
    • No risk of main branch changes affecting releases
  • Better GPU Support - Fixed Vulkan development library detection

    • Properly installs libvulkan-dev, glslc, and other required packages
    • Falls back gracefully when GPU libraries are missing
    • Clear error messages with installation instructions

๐Ÿ› Bug Fixes

  • Fixed Bash Compatibility - Resolved ((var++)) arithmetic issues with set -e
  • Fixed TTY Detection - Interactive mode now works correctly with --interactive flag
  • Auto-install Git - Installer now automatically installs git if not present
  • Fixed GPU Installation - Properly executes pip commands for GPU backend installation

Previous Release: v0.6.0-beta

Major milestone: whisper.cpp is now the default engine!

  • ⚡ 10x faster installation (~1-2 min vs ~5-10 min)
  • 🎮 Universal GPU support - AMD, Intel, NVIDIA via Vulkan
  • 🤖 whisper.cpp integration - C++ optimized speech recognition
  • 📦 Interactive installer - Choose between 3 engines
  • 🔧 Hardware auto-detection - GPU, RAM, Vulkan support detection
  • 🐛 Critical bug fixes - Text escaping, audio feedback, punctuation

🎉 Beta Release with whisper.cpp!

We're excited to share Vocalinux Beta with the community! whisper.cpp brings 10x faster installation and universal GPU support. See "What's New" above for details.


✨ Features

  • 🎤 Double-tap Ctrl to start/stop voice dictation
  • ⚡ Real-time transcription with minimal latency
  • 🌎 Universal compatibility across all Linux applications
  • 🔒 100% Offline operation for privacy and reliability
  • 🤖 whisper.cpp by default - High-performance C++ speech recognition
  • 🎮 Universal GPU support - Vulkan acceleration for AMD, Intel, and NVIDIA
  • 🎨 System tray integration with visual status indicators
  • 🔊 Pleasant audio feedback - smooth gliding tones, headphone-friendly
  • ⚙️ Graphical settings dialog for easy configuration
  • 📦 3 engine choices - whisper.cpp (default), OpenAI Whisper, or VOSK

📸 Screenshots

Here are some screenshots showcasing Vocalinux in action:

  • Transcription in Action: Real-time voice-to-text transcription
  • System Tray: System tray with listening indicator
  • About View: About view with version info
  • Log Viewer: Log viewer for debugging
  • Features Overview: Overview of key features and configuration options with annotations

🚀 Quick Install

Interactive Install (Recommended)

Our new interactive installer guides you through setup with intelligent hardware detection:

curl -fsSL https://raw.githubusercontent.com/jatinkrmalik/vocalinux/v0.6.1-beta/install.sh -o /tmp/vl.sh && bash /tmp/vl.sh

Choose your engine:

  1. whisper.cpp ⭐ (Recommended) - Fast, works with any GPU via Vulkan
  2. Whisper (OpenAI) - PyTorch-based, NVIDIA GPU only
  3. VOSK - Lightweight, works on older systems

The installer will:

  • Auto-detect your hardware (GPU, RAM, Vulkan support)
  • Recommend the best engine for your system
  • Download the appropriate model (~39MB for whisper.cpp tiny)
  • Install in ~1-2 minutes (vs 5-10 min with old Whisper)
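
The installer's actual detection logic is not shown here, so the following is only a minimal sketch of the idea in Python. The function names, the `vulkaninfo` lookup, and the 4 GB threshold are assumptions made for illustration (the threshold is loosely based on the VOSK note later in this README); only the engine names `whisper_cpp` and `vosk` come from the documented options.

```python
import shutil


def total_ram_gb(meminfo_path="/proc/meminfo"):
    """Read MemTotal (reported in kB) from /proc/meminfo and convert it to GiB."""
    with open(meminfo_path, encoding="ascii") as fh:
        for line in fh:
            if line.startswith("MemTotal:"):
                return int(line.split()[1]) / (1024 * 1024)
    raise RuntimeError("MemTotal not found in " + meminfo_path)


def has_vulkan_tools():
    """Rough check: is the vulkaninfo utility on PATH? Not proof of a working GPU."""
    return shutil.which("vulkaninfo") is not None


def recommend(ram_gb, vulkan, low_ram_gb=4.0):
    """Hypothetical rule: VOSK on low-RAM machines, whisper.cpp otherwise.

    Returns an (engine, backend) pair; the backend is only meaningful for whisper.cpp.
    """
    if ram_gb <= low_ram_gb:
        return ("vosk", "cpu")
    return ("whisper_cpp", "vulkan" if vulkan else "cpu")
```

On a Linux machine, `recommend(total_ram_gb(), has_vulkan_tools())` would give a rough suggestion; the real installer may weigh other factors, such as the GPU vendor or CUDA availability.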

Note: Installs v0.6.1-beta. For other versions, check GitHub Releases.

Installation Options

Default (whisper.cpp - recommended):

curl -fsSL https://raw.githubusercontent.com/jatinkrmalik/vocalinux/v0.6.1-beta/install.sh -o /tmp/vl.sh && bash /tmp/vl.sh

Fastest installation (~1-2 min), universal GPU support via Vulkan.

Whisper (OpenAI) - if you prefer PyTorch:

curl -fsSL https://raw.githubusercontent.com/jatinkrmalik/vocalinux/v0.6.1-beta/install.sh -o /tmp/vl.sh && bash /tmp/vl.sh --engine=whisper

NVIDIA GPU only (~5-10 min, downloads PyTorch + CUDA).

VOSK only - for low-RAM systems:

curl -fsSL https://raw.githubusercontent.com/jatinkrmalik/vocalinux/v0.6.1-beta/install.sh -o /tmp/vl.sh && bash /tmp/vl.sh --engine=vosk

Lightweight option (~40MB), works on systems with 4GB RAM.

Alternative: Install from Source

# Clone the repository
git clone https://github.com/jatinkrmalik/vocalinux.git
cd vocalinux

# Run the installer (will prompt for Whisper)
./install.sh

# Or with Whisper support
./install.sh --with-whisper

The installer handles everything: system dependencies, Python environment, speech models, and desktop integration.

After Installation

# If ~/.local/bin is in your PATH (recommended):
vocalinux

# Or activate the virtual environment first:
source ~/.local/bin/activate-vocalinux.sh
vocalinux

# Or run directly:
~/.local/share/vocalinux/venv/bin/vocalinux

Or launch it from your application menu!

📋 Requirements

  • OS: Linux (tested on Ubuntu 22.04+, Debian 11+, Fedora 39+, Arch Linux, openSUSE Tumbleweed)
  • Python: 3.8 or newer
  • Display: X11 or Wayland
  • Hardware: Microphone for voice input

Note: See Distribution Compatibility for distribution-specific information and experimental support for Gentoo, Alpine, Void, Solus, and more.

๐ŸŽ™๏ธ Usage

Voice Dictation

  1. Double-tap Ctrl to start recording
  2. Speak clearly into your microphone
  3. Double-tap Ctrl again (or pause speaking) to stop
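
How Vocalinux detects the double-tap internally is not described here. As a rough illustration of the general technique, the sketch below flags two taps that arrive within a short window. The class name, the 0.3 s window, and the injectable clock are all assumptions for illustration, not the project's API.

```python
import time


class DoubleTapDetector:
    """Report True when two taps arrive within `max_gap` seconds (hypothetical helper)."""

    def __init__(self, max_gap=0.3, clock=time.monotonic):
        self.max_gap = max_gap      # assumed window; the real value may differ
        self._clock = clock         # injectable so the logic can be tested without sleeping
        self._last_tap = None

    def tap(self):
        """Register one key tap and return True if it completes a double-tap."""
        now = self._clock()
        is_double = (
            self._last_tap is not None
            and (now - self._last_tap) <= self.max_gap
        )
        # Reset after a double-tap so a third quick tap starts a new pair.
        self._last_tap = None if is_double else now
        return is_double
```

A caller would invoke `tap()` on each Ctrl key release and toggle recording whenever it returns `True`.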

Voice Commands

| Command | Action |
|---|---|
| "new line" | Inserts a line break |
| "period" / "full stop" | Types a period (.) |
| "comma" | Types a comma (,) |
| "question mark" | Types a question mark (?) |
| "exclamation mark" | Types an exclamation mark (!) |
| "delete that" | Deletes the last sentence |
| "capitalize" | Capitalizes the next word |
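
To make the command table concrete, here is a deliberately naive sketch of how spoken commands could be turned into text. It is not Vocalinux's implementation: `apply_commands` is a hypothetical helper, it handles only the punctuation, "new line", and "capitalize" commands, and it cannot tell a command from the same word used literally.

```python
import re

# Spoken phrase -> text to insert, taken from the command table above.
PUNCTUATION = {
    "new line": "\n",
    "period": ".",
    "full stop": ".",
    "comma": ",",
    "question mark": "?",
    "exclamation mark": "!",
}

# Try longer phrases first so multi-word commands are matched as a whole.
_PATTERN = re.compile(
    r"\b("
    + "|".join(sorted(map(re.escape, PUNCTUATION), key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def apply_commands(transcript):
    """Replace spoken commands in a raw transcript (hypothetical helper)."""
    text = _PATTERN.sub(lambda m: PUNCTUATION[m.group(1).lower()], transcript)
    # "capitalize <word>": drop the command and upper-case the next word's first letter.
    text = re.sub(
        r"\bcapitalize\s+(\w)",
        lambda m: m.group(1).upper(),
        text,
        flags=re.IGNORECASE,
    )
    # Remove the space left in front of inserted punctuation and around line breaks.
    text = re.sub(r"[ ]+([.,?!])", r"\1", text)
    text = re.sub(r" *\n *", "\n", text)
    return text
```

"delete that" is left out on purpose: it needs the history of already-injected text, which a pure string function does not have.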

Command Line Options

vocalinux --help                  # Show all options
vocalinux --debug                 # Enable debug logging
vocalinux --engine whisper_cpp    # Use whisper.cpp engine (default)
vocalinux --engine whisper        # Use OpenAI Whisper engine
vocalinux --engine vosk           # Use VOSK engine
vocalinux --model medium          # Use medium-sized model
vocalinux --wayland               # Force Wayland mode

โš™๏ธ Configuration

Configuration is stored in ~/.config/vocalinux/config.json:

{
  "speech_recognition": {
    "engine": "whisper_cpp",
    "model_size": "tiny",
    "vad_sensitivity": 3,
    "silence_timeout": 2.0
  }
}

You can also configure settings through the graphical Settings dialog (right-click the tray icon).
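
If you want to inspect these settings from a script, a minimal sketch such as the following reads the file with the Python standard library. The function name `load_speech_settings` is hypothetical, and using the example values above as fallbacks is an assumption rather than documented behavior.

```python
import json
from pathlib import Path

# Fallbacks mirror the example config above; treating them as defaults is an assumption.
DEFAULTS = {
    "engine": "whisper_cpp",
    "model_size": "tiny",
    "vad_sensitivity": 3,
    "silence_timeout": 2.0,
}


def load_speech_settings(path=None):
    """Return the speech_recognition settings, falling back to DEFAULTS."""
    if path is None:
        path = Path.home() / ".config" / "vocalinux" / "config.json"
    settings = dict(DEFAULTS)
    try:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
    except (FileNotFoundError, json.JSONDecodeError):
        return settings  # missing or unreadable file: keep the fallbacks
    settings.update(data.get("speech_recognition", {}))
    return settings
```

Calling `load_speech_settings()["engine"]` then tells you which engine is configured without opening the Settings dialog.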

🔧 Development Setup

# Clone and install in dev mode
git clone https://github.com/jatinkrmalik/vocalinux.git
cd vocalinux
./install.sh --dev

# Activate environment
source venv/bin/activate

# Run tests
pytest

# Run from source with debug
python -m vocalinux.main --debug

๐Ÿ“ Project Structure

vocalinux/
├── src/vocalinux/                 # Main application code
│   ├── speech_recognition/        # Speech recognition engines (VOSK, Whisper, whisper.cpp)
│   │   └── recognition_manager.py # Unified engine interface
│   ├── text_injection/            # Text injection (X11/Wayland)
│   ├── ui/                        # GTK UI components
│   └── utils/                     # Utility functions
│       ├── whispercpp_model_info.py   # whisper.cpp model metadata & hardware detection
│       └── vosk_model_info.py         # VOSK model metadata
├── tests/                         # Test suite
├── scripts/                       # Development utilities
│   └── generate_sounds.py         # Sound generation script
├── resources/                     # Icons and sounds
├── docs/                          # Documentation
└── web/                           # Website source

📖 Documentation

🔊 Sound Customization

Vocalinux uses smooth, pleasant gliding tones for audio feedback:

  • Start: Ascending F4→A4 (0.6s) - positive, uplifting
  • Stop: Descending A4→F4 (0.6s) - resolves completion
  • Error: Lower descending E4→C4 (0.7s) - gentle but noticeable

All sounds use pure sine waves with smoothstep interpolation for buttery smooth pitch transitions - perfect for headphone use!

Regenerate Sounds

To modify or regenerate the notification sounds:

python scripts/generate_sounds.py

This script generates all three sounds using the same smooth glide algorithm. You can edit the frequencies, durations, and amplitudes in the script to customize the sounds to your preference.
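
The contents of `scripts/generate_sounds.py` are not reproduced here. As a standalone sketch of the technique described above (a pure sine wave whose pitch follows a smoothstep curve), the following uses only the standard library. The sample rate, amplitude, 10 ms fade, and function names are assumptions; the note frequencies are the standard equal-temperament values.

```python
import math
import struct
import wave

SAMPLE_RATE = 44100  # assumed; the project's script may use a different rate
F4, A4 = 349.23, 440.00  # equal-temperament pitches in Hz


def smoothstep(t):
    """Classic smoothstep: 0 at t=0, 1 at t=1, with zero slope at both ends."""
    return t * t * (3.0 - 2.0 * t)


def glide(f_start, f_end, duration, amplitude=0.3):
    """Return float samples of a sine tone gliding from f_start to f_end (Hz)."""
    n = round(SAMPLE_RATE * duration)
    fade_len = 0.01 * SAMPLE_RATE  # 10 ms fade in/out to avoid clicks
    samples = []
    phase = 0.0
    for i in range(n):
        freq = f_start + (f_end - f_start) * smoothstep(i / n)
        # Accumulate phase so the changing pitch never causes a discontinuity.
        phase += 2.0 * math.pi * freq / SAMPLE_RATE
        fade = min(1.0, i / fade_len, (n - 1 - i) / fade_len)
        samples.append(amplitude * fade * math.sin(phase))
    return samples


def write_wav(path, samples):
    """Write the samples as mono 16-bit PCM."""
    frames = b"".join(struct.pack("<h", int(s * 32767)) for s in samples)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(frames)
```

For example, `write_wav("start.wav", glide(F4, A4, 0.6))` would produce an ascending tone like the start sound, and swapping the two frequencies gives the descending stop sound.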

๐Ÿ—บ๏ธ Roadmap

  • Custom icon design โœ…
  • Graphical settings dialog โœ…
  • Whisper AI support โœ…
  • Multi-language support (FR, DE, RU) โœ…
  • whisper.cpp integration (default engine) โœ…
  • Vulkan GPU support โœ…
  • In-app update mechanism
  • Application-specific commands
  • Debian/Ubuntu package (.deb)
  • Improved Wayland support โœ…
  • Voice command customization

๐Ÿค Contributing

We welcome contributions! Whether it's bug reports, feature requests, or code contributions, please check out our Contributing Guide.

โญ Support

If you find Vocalinux useful, please consider:

  • โญ Starring this repository
  • ๐Ÿ› Reporting bugs you encounter
  • ๐Ÿ“– Improving documentation
  • ๐Ÿ”€ Contributing code

📜 License

This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.


Made with ❤️ for the Linux community
