
System-wide push-to-talk speech-to-text service for Linux


Alto


Overview

Alto is a lightweight background daemon that enables voice dictation in any Linux application. It uses a simple push-to-talk workflow: hold a dedicated key, speak, release, and the transcribed text is automatically typed at your cursor position.

Unlike traditional dictation tools that only work in specific applications, Alto operates at the system level and works everywhere: browsers, text editors, IDEs, terminals, messaging apps, and more.

Features

  • Universal dictation: Works in any application (browser, mail client, IDE, terminal, messaging...)
  • Push-to-talk: Simple key press/release workflow, no wake word needed
  • Cross-desktop: Compatible with both X11 and Wayland
  • Powered by Mistral AI: Uses the Voxtral model for accurate, multilingual transcription
  • System tray indicator: Visual feedback of recording/transcription status
  • Lightweight: Runs silently in the background with minimal resource usage
  • Configurable: Customize language, model, typing speed, and hotkey

Requirements

System Requirements

  • OS: Linux (tested on Ubuntu, Fedora, Arch)
  • Python: 3.11 or higher
  • Desktop: X11 or Wayland

External Dependencies

  • ydotool and ydotoold: For keyboard input simulation (works on both X11 and Wayland)
  • Input device access: Your user must be in the input group to read keyboard events

API Requirements

  • Mistral API key: Required for speech-to-text transcription (create one from your Mistral AI account)

Installation

Quick Installation (Recommended)

For a fully automated installation, use the install.sh script:

# Clone the repository
git clone https://codeberg.org/michael-nedjam/Alto.git
cd Alto

# Run the installation script
./install.sh

The script will automatically:

  • Detect your Linux distribution (Ubuntu/Debian, Fedora, or Arch)
  • Install system dependencies (ydotool, portaudio, python3-dev, AppIndicator)
  • Add your user to the input group
  • Start and enable the ydotoold service
  • Create a Python virtual environment and install dependencies
  • Configure your push-to-talk key (interactive)
  • Install the systemd service
  • Set up your Mistral API key (interactive)

Note: You will need to log out and log back in after installation for input group permissions to take effect.

Manual Installation

If you prefer to install manually or need more control over the installation process, follow these steps:

Step 1: Install System Dependencies

Ubuntu/Debian

sudo apt update
sudo apt install ydotool portaudio19-dev python3-dev python3-venv \
    gir1.2-appindicator3-0.1 libappindicator3-1

Fedora

sudo dnf install ydotool portaudio-devel python3-devel \
    libappindicator-gtk3 libappindicator-gtk3-devel

Arch Linux

sudo pacman -S ydotool portaudio libappindicator-gtk3

Step 2: Configure ydotool

Set up ydotoold as a user service (recommended over the system-level service, see Autostart for details):

# Grant /dev/uinput access to the input group
echo 'KERNEL=="uinput", GROUP="input", MODE="0660"' | sudo tee /etc/udev/rules.d/99-uinput.rules
sudo udevadm control --reload-rules
sudo udevadm trigger /dev/uinput

# Disable the system-level service if active
sudo systemctl stop ydotoold 2>/dev/null
sudo systemctl disable ydotoold 2>/dev/null

# Enable and start as a user service
systemctl --user enable ydotoold
systemctl --user start ydotoold

Verify it's running:

systemctl --user status ydotoold

Step 3: Add User to Input Group

To allow Alto to read keyboard events, add your user to the input group:

sudo usermod -a -G input $USER

Important: You must log out and log back in for this change to take effect.

Verify the change:

groups | grep input

Step 4: Clone and Install Alto

# Clone the repository
git clone https://codeberg.org/michael-nedjam/Alto.git
cd Alto

# Create a virtual environment
python3 -m venv venv
source venv/bin/activate

# Install Python dependencies
pip install -r requirements.txt

Step 5: Configure Mistral API Key

Set your Mistral API key as an environment variable:

export MISTRAL_API_KEY="your-api-key-here"

To make it permanent, add it to your shell profile (~/.bashrc, ~/.zshrc, etc.):

echo 'export MISTRAL_API_KEY="your-api-key-here"' >> ~/.bashrc
source ~/.bashrc

Step 6: Configure Your Push-to-Talk Key

Use the built-in key detection utility to find your preferred push-to-talk key:

python -m alto --detect-key

Press your desired key (for example F20 or Right Control) and the tool will display its evdev name. Copy this name into config.json:

{
  "evdev_key": "KEY_F20"
}
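If you prefer to script this step, the snippet below is a minimal sketch that writes the detected name into config.json while leaving every other option untouched. The helper name set_ptt_key is hypothetical (Alto does not ship it), and the KEY_ prefix check assumes a keyboard key like the KEY_F20 and KEY_RIGHTCTRL examples in this README.

```python
import json
from pathlib import Path


def set_ptt_key(config_path: Path, key_name: str) -> dict:
    """Write `key_name` into config.json as "evdev_key", preserving other options.

    Hypothetical helper for illustration; not part of Alto.
    """
    if not key_name.startswith("KEY_"):
        # Assumes a keyboard key name as printed by --detect-key, e.g. KEY_F20
        raise ValueError(f"expected an evdev name such as KEY_F20, got {key_name!r}")
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config["evdev_key"] = key_name
    # Pretty-print so the file stays easy to edit by hand
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

For example, set_ptt_key(Path("config.json"), "KEY_RIGHTCTRL"). Restart Alto afterwards so the new key is picked up.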

Configuration

Alto is configured via the config.json file in the project root. Here are all available options:

{
  "model": "voxtral-mini-latest",
  "language": "fr",
  "evdev_key": "KEY_F20",
  "evdev_devices": "auto",
  "typing_delay_ms": 12,
  "log_level": "INFO",
  "audio_sample_rate": 16000,
  "audio_channels": 1,
  "audio_dtype": "int16",
  "audio_blocksize": 0,
  "audio_enable_noise_suppression": false,
  "audio_enable_agc": false,
  "feedback_url": "https://codeberg.org/michael-nedjam/Alto/issues",
  "enable_telemetry": false,
  "check_for_updates": true,
  "repository_url": "https://codeberg.org/michael-nedjam/Alto.git"
}

Configuration Options

General Settings

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| model | string | "voxtral-mini-latest" | Mistral model to use for transcription. Options: voxtral-mini-latest, voxtral-large-latest |
| language | string | "fr" | Transcription language code (e.g. "en", "fr", "es", "de") |
| evdev_key | string | "KEY_F20" | Push-to-talk key name (use --detect-key to find yours) |
| evdev_devices | string or array | "auto" | Input devices to monitor. "auto" scans all keyboards, or provide specific device paths |
| typing_delay_ms | number | 12 | Delay in milliseconds between each keystroke when typing transcribed text |
| log_level | string | "INFO" | Logging verbosity. Options: "DEBUG", "INFO", "WARNING", "ERROR" |

Audio Recording Settings

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| audio_sample_rate | number | 16000 | Sample rate in Hz. Options: 8000, 16000, 22050, 44100, 48000. Recommended: 16000 for speech recognition |
| audio_channels | number | 1 | Number of audio channels. Options: 1 (mono), 2 (stereo). Recommended: 1 for speech |
| audio_dtype | string | "int16" | Audio sample format. Options: "int16", "int32", "float32". Recommended: "int16" for compatibility |
| audio_blocksize | number | 0 | Audio buffer size in frames (0 = auto). Increase for lower CPU usage, decrease for lower latency |
| audio_enable_noise_suppression | boolean | false | Enable a simple noise gate to suppress background noise. Useful in noisy environments |
| audio_enable_agc | boolean | false | Enable automatic gain control to normalize volume levels. Useful if microphone levels vary |
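To make the two optional processing steps concrete, here is a simplified sketch operating on int16 samples: a per-sample noise gate, and whole-buffer peak normalization standing in for AGC. Both functions and their threshold and target values are illustrative assumptions; this README does not describe Alto's actual algorithms, and a production gate or AGC would smooth its decisions over time rather than act sample by sample or on the whole buffer at once.

```python
from typing import List

INT16_MIN, INT16_MAX = -32768, 32767


def noise_gate(samples: List[int], threshold: int = 500) -> List[int]:
    """Zero out samples whose absolute amplitude is below `threshold`."""
    return [s if abs(s) >= threshold else 0 for s in samples]


def auto_gain(samples: List[int], target_peak: int = 16000) -> List[int]:
    """Scale the whole buffer so its loudest sample reaches `target_peak`."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return list(samples)  # silence: nothing to normalize
    gain = target_peak / peak
    # Clamp to the int16 range as a safety net against overflow
    return [max(INT16_MIN, min(INT16_MAX, round(s * gain))) for s in samples]
```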

Feedback and Update Settings

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| feedback_url | string | "https://codeberg.org/michael-nedjam/Alto/issues" | URL for feedback and issue reporting |
| enable_telemetry | boolean | false | Enable opt-in telemetry for anonymized usage data |
| check_for_updates | boolean | true | Check for updates on daemon startup |
| repository_url | string | "https://codeberg.org/michael-nedjam/Alto.git" | Git repository URL for update checking |
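The option tables above map naturally onto a merge-and-validate step. The following is a minimal sketch, not Alto's actual loader: it overlays a user config.json on the documented defaults and checks the enumerated values listed in the tables. The names DEFAULTS, CHOICES and load_config are hypothetical, and rejecting unknown keys (which catches typos such as "evdev_kye") is a design choice Alto itself may not make.

```python
import json
from pathlib import Path

# Defaults copied from the documented config.json
DEFAULTS = {
    "model": "voxtral-mini-latest",
    "language": "fr",
    "evdev_key": "KEY_F20",
    "evdev_devices": "auto",
    "typing_delay_ms": 12,
    "log_level": "INFO",
    "audio_sample_rate": 16000,
    "audio_channels": 1,
    "audio_dtype": "int16",
    "audio_blocksize": 0,
    "audio_enable_noise_suppression": False,
    "audio_enable_agc": False,
    "feedback_url": "https://codeberg.org/michael-nedjam/Alto/issues",
    "enable_telemetry": False,
    "check_for_updates": True,
    "repository_url": "https://codeberg.org/michael-nedjam/Alto.git",
}

# Enumerated values from the option tables
CHOICES = {
    "log_level": {"DEBUG", "INFO", "WARNING", "ERROR"},
    "audio_sample_rate": {8000, 16000, 22050, 44100, 48000},
    "audio_channels": {1, 2},
    "audio_dtype": {"int16", "int32", "float32"},
}


def load_config(path: Path) -> dict:
    """Overlay a user config.json on DEFAULTS and validate the result."""
    user = json.loads(path.read_text()) if path.exists() else {}
    unknown = set(user) - set(DEFAULTS)
    if unknown:  # most likely a typo in an option name
        raise ValueError(f"unknown option(s): {sorted(unknown)}")
    config = {**DEFAULTS, **user}
    for key, allowed in CHOICES.items():
        if config[key] not in allowed:
            raise ValueError(f"{key}={config[key]!r} is not one of {sorted(allowed)}")
    devices = config["evdev_devices"]
    # evdev_devices is either "auto" or a list of device paths
    if devices != "auto" and not (
        isinstance(devices, list) and all(isinstance(d, str) for d in devices)
    ):
        raise ValueError('evdev_devices must be "auto" or a list of device paths')
    if config["typing_delay_ms"] < 0 or config["audio_blocksize"] < 0:
        raise ValueError("typing_delay_ms and audio_blocksize must not be negative")
    return config
```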

Audio Optimization Guidelines:

  • Best quality: Keep the defaults (16000 Hz, mono, int16), which are optimized for speech recognition
  • Noisy environment: Set audio_enable_noise_suppression to true
  • Variable microphone volume: Set audio_enable_agc to true
  • Lower latency: Set audio_blocksize to 512 or 1024
  • Lower CPU usage: Set audio_blocksize to 2048 or 4096
  • Note: Higher sample rates (44.1 kHz, 48 kHz) increase file size and processing time without improving speech recognition accuracy
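The latency and file-size trade-offs above follow from simple arithmetic: one buffer of audio_blocksize frames spans blocksize / sample_rate seconds, and raw PCM size grows with sample rate, channel count and sample width. A short sketch (the function names are illustrative):

```python
# Bytes per sample for each documented audio_dtype
DTYPE_BYTES = {"int16": 2, "int32": 4, "float32": 4}


def block_latency_ms(blocksize: int, sample_rate: int) -> float:
    """Time covered by one audio buffer of `blocksize` frames."""
    return 1000 * blocksize / sample_rate


def bytes_per_second(sample_rate: int, channels: int, dtype: str) -> int:
    """Raw PCM data rate before any compression or upload."""
    return sample_rate * channels * DTYPE_BYTES[dtype]


for blocksize in (512, 1024, 2048, 4096):
    print(f"blocksize {blocksize}: {block_latency_ms(blocksize, 16000):.0f} ms per buffer at 16 kHz")

print(bytes_per_second(16000, 1, "int16"))    # 32000
print(bytes_per_second(48000, 2, "float32"))  # 384000
```

At the defaults, a 10-second recording is about 320 kB of raw PCM, while 48 kHz stereo float32 would be twelve times larger. Buffer time is only one part of end-to-end latency; the 1-3 second transcription step is usually the larger share.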

Example Configurations

English dictation with faster typing:

{
  "model": "voxtral-mini-latest",
  "language": "en",
  "evdev_key": "KEY_RIGHTCTRL",
  "typing_delay_ms": 5,
  "log_level": "INFO"
}
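Lowering typing_delay_ms speeds up text injection roughly in proportion. A rough estimate, assuming one delay per typed character (the real number of key events per character, for example Shift for capitals, depends on the injector, so treat this as a ballpark figure):

```python
def typing_time_s(text: str, typing_delay_ms: int) -> float:
    """Approximate injection time from the per-keystroke delay alone."""
    return len(text) * typing_delay_ms / 1000


paragraph = "x" * 200  # stand-in for a ~200-character dictated paragraph
print(typing_time_s(paragraph, 12))  # 2.4
print(typing_time_s(paragraph, 5))   # 1.0
```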

Noisy environment optimization:

{
  "model": "voxtral-mini-latest",
  "language": "en",
  "evdev_key": "KEY_F20",
  "typing_delay_ms": 12,
  "audio_enable_noise_suppression": true,
  "audio_enable_agc": true,
  "log_level": "INFO"
}

Low latency configuration:

{
  "model": "voxtral-mini-latest",
  "language": "en",
  "evdev_key": "KEY_F20",
  "typing_delay_ms": 5,
  "audio_blocksize": 512,
  "log_level": "INFO"
}

Debug mode with specific input device:

{
  "model": "voxtral-mini-latest",
  "language": "fr",
  "evdev_key": "KEY_F20",
  "evdev_devices": ["/dev/input/event3"],
  "typing_delay_ms": 12,
  "log_level": "DEBUG"
}

Usage

Starting Alto

From within the project directory with your virtual environment activated:

python -m alto

The daemon will:

  1. Perform preflight checks (ydotool running, input permissions, API key set)
  2. Start the system tray icon (green = ready)
  3. Begin listening for your push-to-talk key
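The preflight checks in step 1 can be approximated with the standard library. This is a sketch of equivalent checks, not Alto's own code; the function names are hypothetical, and looking for ydotool on PATH is only a stand-in for verifying that ydotoold is actually running. Note that os.getgroups() only reflects groups granted at login, which is why adding yourself to the input group requires logging out and back in.

```python
import grp
import os
import shutil
from typing import List


def in_group(group_name: str) -> bool:
    """True if the current session has `group_name` among its groups."""
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:  # the group does not exist on this system
        return False
    return gid == os.getgid() or gid in os.getgroups()


def preflight_problems() -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if not os.environ.get("MISTRAL_API_KEY", "").strip():
        problems.append("MISTRAL_API_KEY is not set")
    if shutil.which("ydotool") is None:
        problems.append("ydotool is not installed or not on PATH")
    if not in_group("input"):
        problems.append("user is not in the input group (log out and back in after usermod)")
    return problems


if __name__ == "__main__":
    for issue in preflight_problems():
        print("preflight:", issue)
```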

Using Push-to-Talk Dictation

  1. Position your cursor where you want text to appear
  2. Press and hold your configured push-to-talk key
  3. Speak clearly while holding the key
  4. Release the key when done speaking
  5. Wait for transcription (1-3 seconds, icon turns yellow)
  6. Text appears automatically at cursor position
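The press, record, release cycle above can be modeled as a small event handler. In this sketch the class PushToTalk and its callbacks are hypothetical stand-ins for Alto's hotkey, recorder, transcriber and injector components; the 0.1 s minimum comes from the Troubleshooting section, where shorter recordings are ignored. An injectable clock keeps the sketch testable.

```python
import time
from typing import Callable, List, Optional

MIN_RECORDING_S = 0.1  # shorter recordings are ignored


class PushToTalk:
    """Hypothetical press/release handler wiring recorder, transcriber and injector."""

    def __init__(
        self,
        transcribe: Callable[[bytes], str],
        type_text: Callable[[str], None],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.transcribe = transcribe
        self.type_text = type_text
        self.clock = clock
        self.pressed_at: Optional[float] = None
        self.chunks: List[bytes] = []

    def on_press(self) -> None:
        self.pressed_at = self.clock()
        self.chunks = []

    def on_audio(self, chunk: bytes) -> None:
        if self.pressed_at is not None:  # only buffer while the key is held
            self.chunks.append(chunk)

    def on_release(self) -> Optional[str]:
        if self.pressed_at is None:
            return None  # release without a matching press
        duration = self.clock() - self.pressed_at
        self.pressed_at = None
        if duration < MIN_RECORDING_S:
            return None  # too short, most likely an accidental tap
        text = self.transcribe(b"".join(self.chunks)).strip()
        if text:
            self.type_text(text)  # hand off to the injector
        return text or None
```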

System Tray Icon States

The system tray icon provides visual feedback:

  • Green: Ready for dictation
  • Red: Recording audio (speak now)
  • Yellow: Transcribing speech (processing)
  • Grey: Error state (check logs)

Right-click the tray icon to quit Alto.

Command-Line Options

# Start Alto with default config
python -m alto

# Use a custom config file
python -m alto --config /path/to/config.json

# Run key detection utility
python -m alto --detect-key

Stopping Alto

  • Via tray icon: Right-click and select "Quit"
  • Via terminal: Press Ctrl+C
  • Via signal: Send SIGTERM to the process
  • Via systemd: systemctl --user stop alto (if running as a service)

Auto-Update Notifications

Alto automatically checks for updates when it starts (if check_for_updates is enabled in config.json).

Update notification behavior:

  • On startup, Alto checks the remote Git repository for new versions
  • If an update is available, a menu item appears in the system tray: "⬆ Update Available: x.x.x"
  • Click this menu item to display update instructions in the console/logs

To disable update checks:

Edit config.json and set:

{
  "check_for_updates": false
}

Manual update (Git installation):

If you installed Alto via Git:

# Stop Alto
systemctl --user stop alto  # (if running as service)

# Pull latest changes
cd /path/to/Alto
git pull origin main

# Update dependencies
source venv/bin/activate
pip install -r requirements.txt

# Restart Alto
systemctl --user start alto  # (or run manually)

Manual update (package installation):

If you installed from a release archive or a package manager rather than from a Git clone:

  1. Visit: https://codeberg.org/michael-nedjam/Alto/releases
  2. Download the latest release
  3. Follow installation instructions in README.md

Autostart: Running Alto at Every Login

Alto can be configured to start automatically when you open your graphical session, using systemd user services. This section describes the full setup, including the ydotoold dependency.

Overview

The autostart relies on these components:

  1. ydotoold runs as a user service (not root) for keyboard simulation
  2. A udev rule grants your user access to /dev/uinput
  3. Alto runs as a user service, ordered after ydotoold and the graphical session
  4. environment.d provides the API key to systemd services (variables exported in shell profiles such as .bashrc are not inherited by systemd)
  5. .xprofile imports graphical session variables (DISPLAY, etc.) into systemd at login

Step 1: Set up ydotoold as a user service

By default, ydotoold may run as a system (root) service. This creates a socket owned by root that your user cannot access. Running it as a user service solves this.

# Stop and disable the system-level service (if active)
sudo systemctl stop ydotoold
sudo systemctl disable ydotoold

# Grant /dev/uinput access to the input group
echo 'KERNEL=="uinput", GROUP="input", MODE="0660"' | sudo tee /etc/udev/rules.d/99-uinput.rules
sudo udevadm control --reload-rules
sudo udevadm trigger /dev/uinput

# Enable and start ydotoold as a user service
systemctl --user enable ydotoold
systemctl --user start ydotoold

Verify it works:

systemctl --user status ydotoold
YDOTOOL_SOCKET=/run/user/$(id -u)/.ydotool_socket ydotool key 0:0  # should exit silently
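The same check can be scripted, for example from a monitoring script. This sketch takes the socket path from the verification command above (honoring YDOTOOL_SOCKET if it is set) and confirms that it is a Unix socket your user can read and write; the function names are hypothetical.

```python
import os
import stat


def ydotool_socket_path() -> str:
    """Socket path from the verification command, overridable via YDOTOOL_SOCKET."""
    default = f"/run/user/{os.getuid()}/.ydotool_socket"
    return os.environ.get("YDOTOOL_SOCKET", default)


def socket_ready(path: str) -> bool:
    """True if `path` exists, is a Unix socket, and this user may read and write it."""
    try:
        mode = os.stat(path).st_mode
    except OSError:  # missing file or no permission to look at it
        return False
    return stat.S_ISSOCK(mode) and os.access(path, os.R_OK | os.W_OK)


if __name__ == "__main__":
    path = ydotool_socket_path()
    print(path, "ready" if socket_ready(path) else "NOT ready")
```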

Step 2: Configure the API key for systemd

Systemd user services do not inherit variables from .bashrc or .zshrc. Use environment.d instead:

mkdir -p ~/.config/environment.d
echo "MISTRAL_API_KEY=your-key-here" > ~/.config/environment.d/mistral.conf

This file is read by the systemd user manager at startup and makes the key available to all user services.

Step 3: Import graphical session variables

Alto needs DISPLAY (and XAUTHORITY on X11) to show the system tray icon. These variables are set by the display manager at login, but systemd services don't automatically see them.

Create (or append to) ~/.xprofile, which display managers such as GDM, LightDM and SDDM source at login:

cat >> ~/.xprofile << 'EOF'
# Import graphical session variables into systemd user session
systemctl --user import-environment DISPLAY XAUTHORITY DBUS_SESSION_BUS_ADDRESS
EOF

Step 4: Install and enable the Alto service

# Copy and configure the service file
cp alto.service.template alto.service
# Edit alto.service: replace ALTO_PATH with your installation path (e.g. /home/youruser/Alto)
nano alto.service

# Install
mkdir -p ~/.config/systemd/user
cp alto.service ~/.config/systemd/user/
systemctl --user daemon-reload

# Enable (auto-start at login) and start
systemctl --user enable alto
systemctl --user start alto

Startup chain

At each login, the following happens in order:

  1. Display manager (GDM/LightDM) starts the graphical session
  2. .xprofile runs and imports DISPLAY/XAUTHORITY into systemd
  3. ydotoold.service starts, creates socket at /run/user/$UID/.ydotool_socket
  4. alto.service starts (after ydotoold + graphical-session), connects to the socket, listens for the hotkey

Managing the service

systemctl --user status alto       # Check status
journalctl --user -u alto -f       # Follow logs
systemctl --user restart alto      # Restart (e.g. after config change)
systemctl --user stop alto         # Stop
systemctl --user disable alto      # Disable auto-start

Troubleshooting autostart

Service fails to start:

journalctl --user -u alto -n 50    # Check error logs

ydotool verification failed:

  • Ensure ydotoold runs as a user service (not root)
  • Check the udev rule: ls -la /dev/uinput should show group input
  • Verify your user is in the input group: groups | grep input
  • PrivateTmp must be set to false in the service file (PrivateTmp gives the service its own /tmp, which would hide a ydotool socket created there)

Tray icon not showing:

  • Check that .xprofile exists and imports DISPLAY
  • Verify: systemctl --user show-environment | grep DISPLAY

API key not found:

  • Verify: systemctl --user show-environment | grep MISTRAL
  • Check ~/.config/environment.d/mistral.conf exists and is not empty
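To check the file itself, the sketch below parses simple KEY=VALUE lines. It is a deliberately reduced parser with hypothetical function names: the real environment.d format (see environment.d(5)) also supports variable expansion, which this sketch ignores. systemctl --user show-environment remains the authoritative check, since it reflects what the user manager actually loaded.

```python
from pathlib import Path
from typing import Dict

DEFAULT_CONF = Path.home() / ".config" / "environment.d" / "mistral.conf"


def read_environment_d(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def api_key_configured(path: Path = DEFAULT_CONF) -> bool:
    """True if the file exists and defines a non-empty MISTRAL_API_KEY."""
    return path.is_file() and bool(read_environment_d(path).get("MISTRAL_API_KEY"))
```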

Troubleshooting

Alto won't start / Preflight checks failing

Issue: Error message about missing dependencies

Solutions:

  1. ydotoold not running:

    systemctl --user start ydotoold
    pidof ydotoold  # Verify it's running
    
  2. No input device permissions:

    # Add user to input group
    sudo usermod -a -G input $USER
    # Log out and log back in
    groups | grep input  # Verify membership
    
  3. Missing MISTRAL_API_KEY:

    echo $MISTRAL_API_KEY  # Check if set
    export MISTRAL_API_KEY="your-key-here"
    

Push-to-talk key not detected

Issue: Pressing the PTT key doesn't start recording

Solutions:

  1. Verify key name: Run python -m alto --detect-key and press your key. Update config.json with the exact name shown.

  2. Check logs: Look for "Key press event received" messages in the console output.

  3. Test input permissions:

    ls -l /dev/input/event*
    # You should be able to read these files
    

Text not being typed

Issue: Transcription completes but text doesn't appear

Solutions:

  1. Verify ydotoold is running:

    pidof ydotoold
    
  2. Check permissions on /dev/uinput:

    ls -l /dev/uinput
    # Should be readable/writable by input group
    
  3. Try slower typing speed: Increase typing_delay_ms in config.json to 20 or higher.

Transcription returns empty text

Issue: Recording completes but no text is typed

Solutions:

  1. Check recording duration: Recordings shorter than 0.1s are ignored. Hold the key longer.

  2. Verify audio input: Check that your microphone is working and selected as the default input device.

  3. Test API connectivity:

    curl -H "Authorization: Bearer $MISTRAL_API_KEY" https://api.mistral.ai/v1/models
    
  4. Enable debug logging: Set "log_level": "DEBUG" in config.json to see detailed transcription responses.
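To test transcription independently of Alto, you can send a short recording to the API yourself. The following standard-library sketch builds a multipart request; the endpoint URL, the model/language/file field names, and the "text" field in the JSON response are assumptions based on Mistral's public API documentation and should be checked against the current API reference. build_transcription_request only constructs the request, so it can be inspected without network access; transcribe() actually sends it.

```python
import json
import urllib.request
import uuid

# Assumed endpoint; verify against Mistral's API reference
TRANSCRIPTIONS_URL = "https://api.mistral.ai/v1/audio/transcriptions"


def build_transcription_request(
    wav: bytes, api_key: str, model: str = "voxtral-mini-latest", language: str = "fr"
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data transcription request."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in (("model", model), ("language", language)):
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    parts.append(
        (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="file"; filename="audio.wav"\r\n'
            "Content-Type: audio/wav\r\n\r\n"
        ).encode()
        + wav
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return urllib.request.Request(
        TRANSCRIPTIONS_URL,
        data=b"".join(parts),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )


def transcribe(wav: bytes, api_key: str) -> str:
    """Send the request; assumes the JSON response carries the text under "text"."""
    request = build_transcription_request(wav, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read())["text"]
```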

High CPU usage or lag

Issue: Alto consumes too many resources

Solutions:

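  1. Use a smaller model: Switch to voxtral-mini-latest instead of voxtral-large-latest. Transcription runs on Mistral's servers, so the model choice mainly affects response time, not local CPU load.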

  2. Increase typing delay: Higher typing_delay_ms reduces CPU load during text injection.

  3. Check for audio device issues: Some audio configurations can cause high CPU usage. Try selecting a different microphone as the system's default input device, or increase audio_blocksize.

Logs and Debugging

Enable debug logging for detailed output:

{
  "log_level": "DEBUG"
}

Logs include:

  • State transitions (READY → RECORDING → TRANSCRIBING → TYPING)
  • Audio recording duration
  • Transcription API responses
  • Text injection progress
  • Error messages and stack traces
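The state transitions listed above can be sketched as a small state machine. The READY → RECORDING → TRANSCRIBING → TYPING → READY path comes from the log description and the colors from the System Tray Icon States list; the shortcuts back to READY (too-short recording, empty transcription) and recovery from ERROR are assumptions, TYPING has no documented tray color, and all names are hypothetical.

```python
from enum import Enum


class State(Enum):
    READY = "ready"
    RECORDING = "recording"
    TRANSCRIBING = "transcribing"
    TYPING = "typing"
    ERROR = "error"


# Allowed transitions; ERROR is reachable from any state
TRANSITIONS = {
    State.READY: {State.RECORDING},
    State.RECORDING: {State.TRANSCRIBING, State.READY},  # READY: recording too short
    State.TRANSCRIBING: {State.TYPING, State.READY},  # READY: empty transcription
    State.TYPING: {State.READY},
    State.ERROR: {State.READY},  # assumption: recovery returns to READY
}

# Documented tray colors (TYPING intentionally absent)
TRAY_COLOR = {
    State.READY: "green",
    State.RECORDING: "red",
    State.TRANSCRIBING: "yellow",
    State.ERROR: "grey",
}


class StateMachine:
    def __init__(self) -> None:
        self.state = State.READY
        self.history = [State.READY]

    def go(self, new: State) -> None:
        """Move to `new`, rejecting transitions not listed in TRANSITIONS."""
        if new is not State.ERROR and new not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new.name}")
        self.state = new
        self.history.append(new)
```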

Testing

Automated Tests

Alto includes comprehensive unit and integration tests to verify core functionality:

# Run all tests
python -m pytest tests/

# Run with verbose output
python -m pytest tests/ -v

# Run specific test file
python -m pytest tests/test_integration.py -v

The test suite covers:

  • Unit tests for all core components (recorder, transcriber, injector, state machine, etc.)
  • Integration tests for the complete workflow
  • Error handling and recovery scenarios
  • State machine transitions
  • Tray icon integration
  • Systemd service configuration
  • Multi-distribution compatibility (Ubuntu/Debian, Fedora/RHEL, Arch Linux)

End-to-End Testing

For manual end-to-end testing in real-world environments:

  1. Run verification script to check prerequisites:

    python tests/verify_e2e.py
    
  2. Follow the test plan in tests/END_TO_END_TESTING.md:

    • Core workflow testing in various applications (browsers, IDEs, terminals)
    • State transition verification
    • Error handling and recovery
    • Performance and latency measurements
    • Multilingual support
    • Desktop environment compatibility (X11/Wayland)
    • Systemd service integration

The end-to-end test plan includes detailed steps, expected results, and a comprehensive checklist for manual verification.

Multi-Distribution Testing

For testing across different Linux distributions:

  1. Run multi-distribution tests to verify compatibility:

    python -m pytest tests/test_multi_distro.py -v
    
  2. Follow the multi-distribution test plan in tests/MULTI_DISTRO_TESTING.md:

    • Installation testing on Ubuntu/Debian, Fedora/RHEL, and Arch Linux
    • End-to-end workflow verification on each distribution
    • System tray integration across different desktop environments
    • Logging and error handling consistency
    • Systemd service compatibility

The multi-distribution test plan provides a comprehensive checklist for verifying Alto works correctly across all major Linux distribution families.

CI/CD Pipeline

Alto uses Woodpecker CI (integrated with Codeberg) for automated testing and continuous deployment.

Pipeline Overview

The CI/CD pipeline automatically runs on every push, pull request, and tag creation. It consists of:

  • Testing: Unit tests, integration tests, multi-distribution tests, performance tests
  • Quality Checks: Code formatting (black, isort), linting (flake8)
  • Build: Automated release artifact generation (Python wheel and source distribution)
  • Coverage: Test coverage reporting and tracking

Quick Start

The pipeline is automatically triggered when you push code:

# Push to any branch (runs tests)
git push origin feature-branch

# Create a release (runs tests + builds artifacts)
git tag -a v1.0.0 -m "Release 1.0.0"
git push origin v1.0.0

Viewing Build Status

Local Testing (Pre-Push)

Before pushing, run tests locally to catch issues early:

# Run all tests with coverage
./run_tests.sh

# Or manually
source venv/bin/activate
pytest tests/ -v --cov=alto --cov-report=term-missing

Release Workflow

To create a new release:

  1. Update version numbers in relevant files
  2. Commit changes: git commit -m "Prepare release v1.0.0"
  3. Create tag: git tag -a v1.0.0 -m "Release version 1.0.0"
  4. Push: git push origin main --tags
  5. CI builds artifacts automatically
  6. Download artifacts from Woodpecker and publish to Codeberg Releases

Documentation

  • Full CI/CD Guide: .woodpecker/README.md
  • Quick Reference: .woodpecker/QUICK_REFERENCE.md
  • Pipeline Configuration: .woodpecker.yml

Key Features

  • Automated testing on every commit
  • Multi-distribution validation (Ubuntu, Fedora, Arch)
  • Code quality enforcement (formatting, linting)
  • Release automation for tags
  • Coverage tracking to maintain test quality
  • Fast feedback (typical build time: 3-5 minutes)

Architecture

Alto consists of several modular components:

  • daemon.py: Main orchestrator that coordinates all components
  • hotkey.py: Monitors keyboard events via evdev
  • recorder.py: Captures audio from the microphone
  • transcriber.py: Sends audio to Mistral API and receives text
  • injector.py: Types text using ydotool
  • tray.py: System tray icon for status indication
  • state.py: State machine managing workflow transitions
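As an illustration of the recorder-to-transcriber hand-off, the sketch below packs 16-bit samples into an in-memory WAV file with Python's wave module. Whether Alto uploads WAV, and the function name pcm_to_wav, are assumptions; the defaults mirror audio_sample_rate, audio_channels and audio_dtype from config.json.

```python
import io
import sys
import wave
from array import array
from typing import Sequence


def pcm_to_wav(samples: Sequence[int], sample_rate: int = 16000, channels: int = 1) -> bytes:
    """Pack int16 samples (interleaved when stereo) into an in-memory WAV file."""
    pcm = array("h", samples)  # "h" = signed 16-bit, matching audio_dtype "int16"
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV stores samples little-endian
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # bytes per int16 sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())
    return buffer.getvalue()  # wave leaves caller-supplied buffers open
```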

Development Status

This project is in early development. See Development Roadmap for planned features including:

  • Custom vocabulary and corrections
  • Silence detection for automatic recording termination
  • Audio feedback options
  • Enhanced multilingual support

Contributing

Contributions are welcome. Please feel free to submit issues and pull requests on the repository.

License

Proprietary. See LICENSE.



Download files


Source Distribution

alto_dictation-1.0.21.tar.gz (80.9 kB)

Uploaded Source

Built Distribution


alto_dictation-1.0.21-py3-none-any.whl (75.0 kB)

Uploaded Python 3

File details

Details for the file alto_dictation-1.0.21.tar.gz.

File metadata

  • Download URL: alto_dictation-1.0.21.tar.gz
  • Size: 80.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for alto_dictation-1.0.21.tar.gz
Algorithm Hash digest
SHA256 55977cc15da92a7161119dd894eed55519598d344f8d9231ffe89e6d600c3682
MD5 8f2b6bcb6d203152c1310283a187c973
BLAKE2b-256 81587fd56b8b5578041c79e5a028f43713eb214ca69eb6da6ebddd005c80d2b4


File details

Details for the file alto_dictation-1.0.21-py3-none-any.whl.

File metadata

File hashes

Hashes for alto_dictation-1.0.21-py3-none-any.whl
Algorithm Hash digest
SHA256 9a5d02e0a6f7654fe3cfebcc6f25273afca647a186d9024e5a97e1c13d2a348f
MD5 2cea31050c4e16ee35f3c18996b7c620
BLAKE2b-256 2b4e29b9ddde9babfba7e330b617e7982facf3e1b89d3905c33d1a8578b3a0db

