Alto
System-wide push-to-talk speech-to-text service for Linux.
Overview
Alto is a lightweight background daemon that enables voice dictation in any Linux application. It uses a simple push-to-talk workflow: hold a dedicated key, speak, release, and the transcribed text is automatically typed at your cursor position.
Unlike traditional dictation tools that only work in specific applications, Alto operates at the system level and works everywhere - browsers, text editors, IDEs, terminals, messaging apps, and more.
Features
- Universal dictation: Works in any application (browser, mail client, IDE, terminal, messaging...)
- Push-to-talk: Simple key press/release workflow - no wake words needed
- Cross-desktop: Compatible with both X11 and Wayland
- Powered by Mistral AI: Uses the Voxtral model for accurate, multilingual transcription
- System tray indicator: Visual feedback of recording/transcription status
- Lightweight: Runs silently in the background with minimal resource usage
- Configurable: Customize language, model, typing speed, and hotkey
Requirements
System Requirements
- OS: Linux (tested on Ubuntu, Fedora, Arch)
- Python: 3.11 or higher
- Desktop: X11 or Wayland
External Dependencies
- ydotool and ydotoold: For keyboard input simulation (works on both X11 and Wayland)
- Input device access: Your user must be in the input group to read keyboard events
API Requirements
- Mistral API key: Required for speech-to-text transcription (Get one here)
Installation
Quick Installation (Recommended)
For a fully automated installation, use the install.sh script:
# Clone the repository
git clone https://codeberg.org/michael-nedjam/Alto.git
cd Alto
# Run the installation script
./install.sh
The script will automatically:
- Detect your Linux distribution (Ubuntu/Debian, Fedora, or Arch)
- Install system dependencies (ydotool, portaudio, python3-dev, AppIndicator)
- Add your user to the input group
- Start and enable the ydotoold service
- Create a Python virtual environment and install dependencies
- Configure your push-to-talk key (interactive)
- Install the systemd service
- Set up your Mistral API key (interactive)
Note: You will need to log out and log back in after installation for input group permissions to take effect.
Manual Installation
If you prefer to install manually or need more control over the installation process, follow these steps:
Step 1: Install System Dependencies
Ubuntu/Debian
sudo apt update
sudo apt install ydotool portaudio19-dev python3-dev python3-venv \
gir1.2-appindicator3-0.1 libappindicator3-1
Fedora
sudo dnf install ydotool portaudio-devel python3-devel \
libappindicator-gtk3 libappindicator-gtk3-devel
Arch Linux
sudo pacman -S ydotool portaudio libappindicator-gtk3
Step 2: Configure ydotool
Set up ydotoold as a user service (recommended over the system-level service, see Autostart for details):
# Grant /dev/uinput access to the input group
echo 'KERNEL=="uinput", GROUP="input", MODE="0660"' | sudo tee /etc/udev/rules.d/99-uinput.rules
sudo udevadm control --reload-rules
sudo udevadm trigger /dev/uinput
# Disable the system-level service if active
sudo systemctl stop ydotoold 2>/dev/null
sudo systemctl disable ydotoold 2>/dev/null
# Enable and start as a user service
systemctl --user enable ydotoold
systemctl --user start ydotoold
Verify it's running:
systemctl --user status ydotoold
Step 3: Add User to Input Group
To allow Alto to read keyboard events, add your user to the input group:
sudo usermod -a -G input $USER
Important: You must log out and log back in for this change to take effect.
Verify the change:
groups | grep input
Step 4: Clone and Install Alto
# Clone the repository
git clone https://codeberg.org/michael-nedjam/Alto.git
cd Alto
# Create a virtual environment
python3 -m venv venv
source venv/bin/activate
# Install Python dependencies
pip install -r requirements.txt
Step 5: Configure Mistral API Key
Set your Mistral API key as an environment variable:
export MISTRAL_API_KEY="your-api-key-here"
To make it permanent, add it to your shell profile (~/.bashrc, ~/.zshrc, etc.):
echo 'export MISTRAL_API_KEY="your-api-key-here"' >> ~/.bashrc
source ~/.bashrc
Step 6: Configure Your Push-to-Talk Key
Use the built-in key detection utility to find your preferred push-to-talk key:
python -m alto --detect-key
Press your desired key (e.g., F20, Right Control, etc.), and the tool will display its evdev name. Copy this name and update config.json:
{
"evdev_key": "KEY_F20"
}
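If you prefer to script this step, the following is a minimal sketch of updating the key in config.json from Python. The helper name set_evdev_key is hypothetical and not part of Alto, and the check that names start with KEY_ or BTN_ is an assumption based on the usual evdev naming.

```python
import json
from pathlib import Path


def set_evdev_key(config_path: Path, key_name: str) -> dict:
    """Store key_name under "evdev_key" in the JSON config and return the config."""
    # Assumption: evdev key names look like KEY_F20 or BTN_LEFT.
    if not key_name.startswith(("KEY_", "BTN_")):
        raise ValueError(f"not an evdev key name: {key_name!r}")
    # Keep any options already present in the file.
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config["evdev_key"] = key_name
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

For example, set_evdev_key(Path("config.json"), "KEY_F20") rewrites the file with the new key and leaves the other options unchanged.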
Configuration
Alto is configured via the config.json file in the project root. Here are all available options:
{
"model": "voxtral-mini-latest",
"language": "fr",
"evdev_key": "KEY_F20",
"evdev_devices": "auto",
"typing_delay_ms": 12,
"log_level": "INFO",
"audio_sample_rate": 16000,
"audio_channels": 1,
"audio_dtype": "int16",
"audio_blocksize": 0,
"audio_enable_noise_suppression": false,
"audio_enable_agc": false,
"feedback_url": "https://codeberg.org/michael-nedjam/Alto/issues",
"enable_telemetry": false,
"check_for_updates": true,
"repository_url": "https://codeberg.org/michael-nedjam/Alto.git"
}
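To illustrate how a partial config.json can be combined with the documented defaults, here is a small sketch. Whether Alto itself fills in missing keys this way is an assumption, and merge_config and load_config are hypothetical names used only for illustration.

```python
import json
from pathlib import Path

# Defaults copied from the documented config.json (subset shown).
DEFAULTS = {
    "model": "voxtral-mini-latest",
    "language": "fr",
    "evdev_key": "KEY_F20",
    "evdev_devices": "auto",
    "typing_delay_ms": 12,
    "log_level": "INFO",
}


def merge_config(user_values: dict) -> dict:
    """Return DEFAULTS overridden by user_values, with evdev_devices normalized."""
    config = {**DEFAULTS, **user_values}
    devices = config["evdev_devices"]
    # "auto" stays as-is; a single path string becomes a one-element list.
    if isinstance(devices, str) and devices != "auto":
        config["evdev_devices"] = [devices]
    return config


def load_config(path: Path) -> dict:
    """Read a JSON config file and merge it over the defaults."""
    return merge_config(json.loads(path.read_text()))
```

With this approach a file containing only {"language": "en"} would still yield a complete configuration.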
Configuration Options
General Settings
| Option | Type | Default | Description |
|---|---|---|---|
| model | string | "voxtral-mini-latest" | Mistral model to use for transcription. Options: voxtral-mini-latest, voxtral-large-latest |
| language | string | "fr" | Transcription language code (e.g., "en", "fr", "es", "de") |
| evdev_key | string | "KEY_F20" | Push-to-talk key name (use --detect-key to find yours) |
| evdev_devices | string/array | "auto" | Input devices to monitor. "auto" scans all keyboards, or provide specific device paths |
| typing_delay_ms | number | 12 | Delay in milliseconds between each keystroke when typing transcribed text |
| log_level | string | "INFO" | Logging verbosity. Options: "DEBUG", "INFO", "WARNING", "ERROR" |
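As a rough guide to choosing typing_delay_ms, the sketch below estimates how long a transcription takes to type. It counts only the per-keystroke delay and ignores any overhead in ydotool itself, which is an assumption; the function name is hypothetical.

```python
def estimated_typing_seconds(text: str, typing_delay_ms: int = 12) -> float:
    """Lower bound on typing time: one delay per character."""
    return len(text) * typing_delay_ms / 1000
```

Under this estimate, a 100-character sentence takes about 1.2 seconds at the default delay of 12 ms and about 0.5 seconds at 5 ms.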
Audio Recording Settings
| Option | Type | Default | Description |
|---|---|---|---|
| audio_sample_rate | number | 16000 | Sample rate in Hz. Options: 8000, 16000, 22050, 44100, 48000. Recommended: 16000 for speech recognition |
| audio_channels | number | 1 | Number of audio channels. Options: 1 (mono), 2 (stereo). Recommended: 1 for speech |
| audio_dtype | string | "int16" | Audio data type. Options: "int16", "int32", "float32". Recommended: int16 for compatibility |
| audio_blocksize | number | 0 | Audio buffer size in frames (0 = auto). Increase for lower CPU usage, decrease for lower latency |
| audio_enable_noise_suppression | boolean | false | Enable simple noise gate to suppress background noise. Useful in noisy environments |
| audio_enable_agc | boolean | false | Enable automatic gain control to normalize volume levels. Useful if microphone levels vary |
Feedback and Update Settings
| Option | Type | Default | Description |
|---|---|---|---|
| feedback_url | string | "https://codeberg.org/michael-nedjam/Alto/issues" | URL for feedback and issue reporting |
| enable_telemetry | boolean | false | Enable opt-in telemetry for anonymized usage data |
| check_for_updates | boolean | true | Check for updates on daemon startup |
| repository_url | string | "https://codeberg.org/michael-nedjam/Alto.git" | Git repository URL for update checking |
Audio Optimization Guidelines:
- Best quality: Keep defaults (16000 Hz, mono, int16), which are optimized for speech recognition
- Noisy environment: Enable audio_enable_noise_suppression: true
- Variable microphone volume: Enable audio_enable_agc: true
- Lower latency: Set audio_blocksize: 512 or 1024
- Lower CPU usage: Set audio_blocksize: 2048 or 4096
- Note: Higher sample rates (44.1 kHz, 48 kHz) increase file size and processing time without improving speech accuracy
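The arithmetic behind these guidelines can be sketched as follows, assuming uncompressed PCM audio. The function names are hypothetical and the numbers describe raw capture only, not what Alto actually uploads.

```python
# Bytes used by one sample for each documented audio_dtype value.
BYTES_PER_SAMPLE = {"int16": 2, "int32": 4, "float32": 4}


def bytes_per_second(sample_rate: int, channels: int, dtype: str) -> int:
    """Raw PCM data rate for the given recording settings."""
    return sample_rate * channels * BYTES_PER_SAMPLE[dtype]


def block_duration_ms(blocksize: int, sample_rate: int) -> float:
    """How much audio one buffer of blocksize frames holds, in milliseconds."""
    return 1000 * blocksize / sample_rate
```

With the defaults (16000 Hz, mono, int16) the raw rate is 32000 bytes per second, while 48000 Hz triples it to 96000. At 16000 Hz a blocksize of 512 frames holds 32 ms of audio and 4096 frames holds 256 ms, which is why smaller blocks lower latency and larger blocks mean fewer callbacks.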
Example Configurations
English dictation with faster typing:
{
"model": "voxtral-mini-latest",
"language": "en",
"evdev_key": "KEY_RIGHTCTRL",
"typing_delay_ms": 5,
"log_level": "INFO"
}
Noisy environment optimization:
{
"model": "voxtral-mini-latest",
"language": "en",
"evdev_key": "KEY_F20",
"typing_delay_ms": 12,
"audio_enable_noise_suppression": true,
"audio_enable_agc": true,
"log_level": "INFO"
}
Low latency configuration:
{
"model": "voxtral-mini-latest",
"language": "en",
"evdev_key": "KEY_F20",
"typing_delay_ms": 5,
"audio_blocksize": 512,
"log_level": "INFO"
}
Debug mode with specific input device:
{
"model": "voxtral-mini-latest",
"language": "fr",
"evdev_key": "KEY_F20",
"evdev_devices": ["/dev/input/event3"],
"typing_delay_ms": 12,
"log_level": "DEBUG"
}
Usage
Starting Alto
From within the project directory with your virtual environment activated:
python -m alto
The daemon will:
- Perform preflight checks (ydotool running, input permissions, API key set)
- Start the system tray icon (green = ready)
- Begin listening for your push-to-talk key
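The preflight checks listed above could look roughly like the sketch below. This is not Alto's actual implementation: the function names are hypothetical, and the socket location is taken from the path used elsewhere in this document (/run/user/$UID/.ydotool_socket).

```python
import grp
import os
from pathlib import Path


def default_socket_path() -> Path:
    """ydotoold socket location when it runs as a user service."""
    return Path(f"/run/user/{os.getuid()}/.ydotool_socket")


def current_group_names() -> set[str]:
    """Names of the groups the current process belongs to."""
    names = set()
    for gid in os.getgroups():
        try:
            names.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            pass  # gid without a name entry
    return names


def preflight_problems(env: dict, group_names: set, socket_path: Path) -> list[str]:
    """Return human-readable problems; an empty list means ready."""
    problems = []
    if not env.get("MISTRAL_API_KEY"):
        problems.append("MISTRAL_API_KEY is not set")
    if "input" not in group_names:
        problems.append("user is not in the input group")
    if not socket_path.exists():
        problems.append(f"ydotoold socket not found at {socket_path}")
    return problems
```

A typical call would be preflight_problems(os.environ, current_group_names(), default_socket_path()). Passing the environment and groups in as arguments keeps the check easy to test.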
Using Push-to-Talk Dictation
- Position your cursor where you want text to appear
- Press and hold your configured push-to-talk key
- Speak clearly while holding the key
- Release the key when done speaking
- Wait for transcription (1-3 seconds, icon turns yellow)
- Text appears automatically at cursor position
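A very brief key press produces a very short recording, and the troubleshooting section notes that recordings shorter than 0.1 s are ignored. The sketch below shows how such a check can be derived from the frame count and sample rate; the names are hypothetical and the real check in Alto may differ.

```python
MIN_RECORDING_SECONDS = 0.1  # threshold quoted in the troubleshooting section


def recording_seconds(num_frames: int, sample_rate: int = 16000) -> float:
    """Duration of a recording given its frame count."""
    return num_frames / sample_rate


def is_too_short(num_frames: int, sample_rate: int = 16000) -> bool:
    """True when the recording would be ignored under the 0.1 s rule."""
    return recording_seconds(num_frames, sample_rate) < MIN_RECORDING_SECONDS
```

At the default 16000 Hz, 1000 frames correspond to 62.5 ms and would be dropped, while one second of speech is 16000 frames.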
System Tray Icon States
The system tray icon provides visual feedback:
- Green: Ready for dictation
- Red: Recording audio (speak now)
- Yellow: Transcribing speech (processing)
- Grey: Error state (check logs)
Right-click the tray icon to quit Alto.
Command-Line Options
# Start Alto with default config
python -m alto
# Use a custom config file
python -m alto --config /path/to/config.json
# Run key detection utility
python -m alto --detect-key
Stopping Alto
- Via tray icon: Right-click and select "Quit"
- Via terminal: Press Ctrl+C
- Via signal: Send SIGTERM to the process
- Via systemd: systemctl --user stop alto (if running as a service)
Auto-Update Notifications
Alto automatically checks for updates when it starts (if check_for_updates is enabled in config.json).
Update notification behavior:
- On startup, Alto checks the remote Git repository for new versions
- If an update is available, a menu item appears in the system tray: "⬆ Update Available: x.x.x"
- Click this menu item to display update instructions in the console/logs
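How Alto compares versions is not described here. As an illustration only, a check for purely numeric dotted versions (optionally prefixed with "v", as in the v1.0.0 tags used in this document) might look like this, with hypothetical function names:

```python
def parse_version(tag: str) -> tuple[int, ...]:
    """Turn "v1.0.21" or "1.0.21" into (1, 0, 21)."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))


def update_available(installed: str, latest: str) -> bool:
    """True when latest is strictly newer than installed."""
    return parse_version(latest) > parse_version(installed)
```

Comparing tuples of integers rather than strings matters here: as strings, "1.0.9" sorts after "1.0.21", whereas (1, 0, 21) is correctly greater than (1, 0, 9).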
To disable update checks:
Edit config.json and set:
{
"check_for_updates": false
}
Manual update (Git installation):
If you installed Alto via Git:
# Stop Alto
systemctl --user stop alto # (if running as service)
# Pull latest changes
cd /path/to/Alto
git pull origin main
# Update dependencies
source venv/bin/activate
pip install -r requirements.txt
# Restart Alto
systemctl --user start alto # (or run manually)
Manual update (package installation):
If you installed via package manager or manual installation:
- Visit: https://codeberg.org/michael-nedjam/Alto/releases
- Download the latest release
- Follow installation instructions in README.md
Autostart: Running Alto at Every Login
Alto can be configured to start automatically when you open your graphical session, using systemd user services. This section describes the full setup, including the ydotoold dependency.
Overview
The autostart relies on these components:
- ydotoold runs as a user service (not root) for keyboard simulation
- A udev rule grants your user access to /dev/uinput
- Alto runs as a user service, ordered after ydotoold and the graphical session
- environment.d provides the API key to systemd services (shell profile variables like .bashrc are not inherited by systemd)
- .xprofile imports graphical session variables (DISPLAY, etc.) into systemd at login
Step 1: Set up ydotoold as a user service
By default, ydotoold may run as a system (root) service. This creates a socket owned by root that your user cannot access. Running it as a user service solves this.
# Stop and disable the system-level service (if active)
sudo systemctl stop ydotoold
sudo systemctl disable ydotoold
# Grant /dev/uinput access to the input group
echo 'KERNEL=="uinput", GROUP="input", MODE="0660"' | sudo tee /etc/udev/rules.d/99-uinput.rules
sudo udevadm control --reload-rules
sudo udevadm trigger /dev/uinput
# Enable and start ydotoold as a user service
systemctl --user enable ydotoold
systemctl --user start ydotoold
Verify it works:
systemctl --user status ydotoold
YDOTOOL_SOCKET=/run/user/$(id -u)/.ydotool_socket ydotool key 0:0 # should exit silently
Step 2: Configure the API key for systemd
Systemd user services do not inherit variables from .bashrc or .zshrc. Use environment.d instead:
mkdir -p ~/.config/environment.d
echo "MISTRAL_API_KEY=your-key-here" > ~/.config/environment.d/mistral.conf
This file is read by the systemd user manager at startup and makes the key available to all user services.
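Because this file stores a secret in plain text, you may want it readable only by your own user. The sketch below writes the same file from Python with mode 0600. This is a suggestion rather than something Alto requires, and write_api_key_conf is a hypothetical helper.

```python
import os
from pathlib import Path


def write_api_key_conf(key: str, conf_dir: Path) -> Path:
    """Write MISTRAL_API_KEY=<key> to conf_dir/mistral.conf with mode 0600."""
    conf_dir.mkdir(parents=True, exist_ok=True)
    target = conf_dir / "mistral.conf"
    # Create with owner-only permissions so other local users cannot read the key.
    fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(f"MISTRAL_API_KEY={key}\n")
    # os.open does not change the mode of a file that already existed.
    os.chmod(target, 0o600)
    return target
```

Calling write_api_key_conf("your-key-here", Path.home() / ".config" / "environment.d") produces the same content as the echo command above.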
Step 3: Import graphical session variables
Alto needs DISPLAY (and XAUTHORITY on X11) to show the system tray icon. These variables are set by the display manager at login, but systemd services don't automatically see them.
Create ~/.xprofile (sourced by GDM, LightDM, SDDM at login):
cat >> ~/.xprofile << 'EOF'
# Import graphical session variables into systemd user session
systemctl --user import-environment DISPLAY XAUTHORITY DBUS_SESSION_BUS_ADDRESS
EOF
Step 4: Install and enable the Alto service
# Copy and configure the service file
cp alto.service.template alto.service
# Edit alto.service: replace ALTO_PATH with your installation path (e.g. /home/youruser/Alto)
nano alto.service
# Install
mkdir -p ~/.config/systemd/user
cp alto.service ~/.config/systemd/user/
systemctl --user daemon-reload
# Enable (auto-start at login) and start
systemctl --user enable alto
systemctl --user start alto
Startup chain
At each login, the following happens in order:
- Display manager (GDM/LightDM) starts the graphical session
- .xprofile runs and imports DISPLAY/XAUTHORITY into systemd
- ydotoold.service starts, creates socket at /run/user/$UID/.ydotool_socket
- alto.service starts (after ydotoold + graphical-session), connects to the socket, listens for the hotkey
Managing the service
systemctl --user status alto # Check status
journalctl --user -u alto -f # Follow logs
systemctl --user restart alto # Restart (e.g. after config change)
systemctl --user stop alto # Stop
systemctl --user disable alto # Disable auto-start
Troubleshooting autostart
Service fails to start:
journalctl --user -u alto -n 50 # Check error logs
ydotool verification failed:
- Ensure ydotoold runs as a user service (not root)
- Check the udev rule: ls -la /dev/uinput should show group input
- Verify your user is in the input group: groups | grep input
- PrivateTmp must be false in the service file (ydotool socket is in /tmp or /run/user)
Tray icon not showing:
- Check that .xprofile exists and imports DISPLAY
- Verify: systemctl --user show-environment | grep DISPLAY
API key not found:
- Verify: systemctl --user show-environment | grep MISTRAL
- Check ~/.config/environment.d/mistral.conf exists and is not empty
Troubleshooting
Alto won't start / Preflight checks failing
Issue: Error message about missing dependencies
Solutions:
- ydotoold not running:
systemctl --user start ydotoold
pidof ydotoold # Verify it's running
- No input device permissions:
# Add user to input group
sudo usermod -a -G input $USER
# Log out and log back in
groups | grep input # Verify membership
- Missing MISTRAL_API_KEY:
echo $MISTRAL_API_KEY # Check if set
export MISTRAL_API_KEY="your-key-here"
Push-to-talk key not detected
Issue: Pressing the PTT key doesn't start recording
Solutions:
- Verify key name: Run python -m alto --detect-key and press your key. Update config.json with the exact name shown.
- Check logs: Look for "Key press event received" messages in the console output.
- Test input permissions:
ls -l /dev/input/event* # You should be able to read these files
Text not being typed
Issue: Transcription completes but text doesn't appear
Solutions:
- Verify ydotoold is running:
pidof ydotoold
- Check permissions on /dev/uinput:
ls -l /dev/uinput # Should be readable/writable by input group
- Try slower typing speed: Increase typing_delay_ms in config.json to 20 or higher.
Transcription returns empty text
Issue: Recording completes but no text is typed
Solutions:
- Check recording duration: Recordings shorter than 0.1s are ignored. Hold the key longer.
- Verify audio input: Check that your microphone is working and selected as the default input device.
- Test API connectivity:
curl -H "Authorization: Bearer $MISTRAL_API_KEY" https://api.mistral.ai/v1/models
- Enable debug logging: Set "log_level": "DEBUG" in config.json to see detailed transcription responses.
High CPU usage or lag
Issue: Alto consumes too many resources
Solutions:
- Use smaller model: Switch to voxtral-mini-latest instead of voxtral-large-latest.
- Increase typing delay: Higher typing_delay_ms reduces CPU load during text injection.
- Check for audio device issues: Some audio configurations can cause high CPU usage. Try specifying a specific device.
Logs and Debugging
Enable debug logging for detailed output:
{
"log_level": "DEBUG"
}
Logs include:
- State transitions (READY → RECORDING → TRANSCRIBING → TYPING)
- Audio recording duration
- Transcription API responses
- Text injection progress
- Error messages and stack traces
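To help read the state transitions in the logs, here is a minimal sketch of the workflow as a state machine. The state names follow the sequence listed above plus an error state; the event names and the next_state function are hypothetical and may not match state.py.

```python
from enum import Enum, auto


class State(Enum):
    READY = auto()
    RECORDING = auto()
    TRANSCRIBING = auto()
    TYPING = auto()
    ERROR = auto()


# Allowed transitions; the event names are assumptions for illustration.
TRANSITIONS = {
    (State.READY, "key_pressed"): State.RECORDING,
    (State.RECORDING, "key_released"): State.TRANSCRIBING,
    (State.TRANSCRIBING, "text_received"): State.TYPING,
    (State.TYPING, "typing_done"): State.READY,
}


def next_state(state: State, event: str) -> State:
    """Return the state after event; unexpected events leave the state unchanged."""
    if event == "error":
        return State.ERROR
    return TRANSITIONS.get((state, event), state)
```

A full dictation cycle walks READY, RECORDING, TRANSCRIBING, TYPING and back to READY, so a log that stops partway through that sequence points at the stage that failed.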
Testing
Automated Tests
Alto includes comprehensive unit and integration tests to verify core functionality:
# Run all tests
python -m pytest tests/
# Run with verbose output
python -m pytest tests/ -v
# Run specific test file
python -m pytest tests/test_integration.py -v
The test suite covers:
- Unit tests for all core components (recorder, transcriber, injector, state machine, etc.)
- Integration tests for the complete workflow
- Error handling and recovery scenarios
- State machine transitions
- Tray icon integration
- Systemd service configuration
- Multi-distribution compatibility (Ubuntu/Debian, Fedora/RHEL, Arch Linux)
End-to-End Testing
For manual end-to-end testing in real-world environments:
- Run verification script to check prerequisites:
python tests/verify_e2e.py
- Follow the test plan in tests/END_TO_END_TESTING.md:
  - Core workflow testing in various applications (browsers, IDEs, terminals)
  - State transition verification
  - Error handling and recovery
  - Performance and latency measurements
  - Multilingual support
  - Desktop environment compatibility (X11/Wayland)
  - Systemd service integration
The end-to-end test plan includes detailed steps, expected results, and a comprehensive checklist for manual verification.
Multi-Distribution Testing
For testing across different Linux distributions:
- Run multi-distribution tests to verify compatibility:
python -m pytest tests/test_multi_distro.py -v
- Follow the multi-distribution test plan in tests/MULTI_DISTRO_TESTING.md:
  - Installation testing on Ubuntu/Debian, Fedora/RHEL, and Arch Linux
  - End-to-end workflow verification on each distribution
  - System tray integration across different desktop environments
  - Logging and error handling consistency
  - Systemd service compatibility
The multi-distribution test plan provides a comprehensive checklist for verifying Alto works correctly across all major Linux distribution families.
CI/CD Pipeline
Alto uses Woodpecker CI (integrated with Codeberg) for automated testing and continuous deployment.
Pipeline Overview
The CI/CD pipeline automatically runs on every push, pull request, and tag creation. It consists of:
- Testing: Unit tests, integration tests, multi-distribution tests, performance tests
- Quality Checks: Code formatting (black, isort), linting (flake8)
- Build: Automated release artifact generation (Python wheel and source distribution)
- Coverage: Test coverage reporting and tracking
Quick Start
The pipeline is automatically triggered when you push code:
# Push to any branch (runs tests)
git push origin feature-branch
# Create a release (runs tests + builds artifacts)
git tag -a v1.0.0 -m "Release 1.0.0"
git push origin v1.0.0
Viewing Build Status
- Woodpecker Dashboard: https://ci.codeberg.org/michael-nedjam/Alto
- Build Logs: Available in the Woodpecker UI for each pipeline run
- Test Coverage: Generated in the htmlcov/ directory after tests run
Local Testing (Pre-Push)
Before pushing, run tests locally to catch issues early:
# Run all tests with coverage
./run_tests.sh
# Or manually
source venv/bin/activate
pytest tests/ -v --cov=alto --cov-report=term-missing
Release Workflow
To create a new release:
- Update version numbers in relevant files
- Commit changes: git commit -m "Prepare release v1.0.0"
- Create tag: git tag -a v1.0.0 -m "Release version 1.0.0"
- Push: git push origin main --tags
- CI builds artifacts automatically
- Download artifacts from Woodpecker and publish to Codeberg Releases
Documentation
- Full CI/CD Guide: .woodpecker/README.md
- Quick Reference: .woodpecker/QUICK_REFERENCE.md
- Pipeline Configuration: .woodpecker.yml
Key Features
- ✅ Automated testing on every commit
- ✅ Multi-distribution validation (Ubuntu, Fedora, Arch)
- ✅ Code quality enforcement (formatting, linting)
- ✅ Release automation for tags
- ✅ Coverage tracking to maintain test quality
- ✅ Fast feedback (typical build time: 3-5 minutes)
Architecture
Alto consists of several modular components:
- daemon.py: Main orchestrator that coordinates all components
- hotkey.py: Monitors keyboard events via evdev
- recorder.py: Captures audio from the microphone
- transcriber.py: Sends audio to Mistral API and receives text
- injector.py: Types text using ydotool
- tray.py: System tray icon for status indication
- state.py: State machine managing workflow transitions
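The way these components fit together for a single dictation can be sketched as below. The three callables stand in for the recorder, transcriber, and injector; their signatures are assumptions made for illustration and do not reflect the real module interfaces.

```python
from typing import Callable


def dictate_once(
    record: Callable[[], bytes],
    transcribe: Callable[[bytes], str],
    inject: Callable[[str], None],
) -> str:
    """Run one record -> transcribe -> type cycle and return the typed text."""
    audio = record()
    if not audio:
        return ""  # nothing captured, so skip the API call
    text = transcribe(audio).strip()
    if text:
        inject(text)  # only type when the transcription is non-empty
    return text
```

Passing the stages in as functions keeps each one replaceable, for example by substituting a fake transcriber in tests so that no API key or microphone is needed.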
Development Status
This project is in early development. See Development Roadmap for planned features including:
- Custom vocabulary and corrections
- Silence detection for automatic recording termination
- Audio feedback options
- Enhanced multilingual support
Contributing
Contributions are welcome. Please feel free to submit issues and pull requests on the repository.
License
Proprietary. See LICENSE.