Talky
System-wide dictation for Linux using OpenAI's Whisper AI model.
Overview
Talky is a system-wide dictation application for Linux, similar to WisprFlow AI on Windows/Mac. It uses OpenAI's Whisper model for accurate speech-to-text transcription and works across all applications.
Status: ✅ Production Ready | 7/7 Integration Tests Passing | Full GUI | Phases 1-4 Complete
Features
- Push-to-Talk: Hold hotkey, speak, release - just like WisprFlow AI!
- System-wide: Works in any application (browsers, editors, terminals, chat apps)
- Fast: <1.5s latency with CUDA GPU acceleration
- Multi-language: 99 languages supported with real-time switching
- System Tray: Full GUI with language selection, settings, and setup wizard
- First-Run Wizard: Easy configuration on first launch
- Settings Dialog: Complete GUI for all configuration options
- Wayland Helper: Built-in permission checker and setup guide
- X11 & Wayland: Compatible with both display servers
- Privacy-focused: Local processing, no cloud required
- Easy Install: Desktop integration and application menu entry
Requirements
System Requirements
- Linux (X11 or Wayland)
- Python 3.10+
- NVIDIA GPU with CUDA (recommended) or CPU
External Tools
- X11: xdotool (for text injection)
- Wayland: ydotool (for text injection)
Install external tools:
# Ubuntu/Debian
sudo apt install xdotool ydotool
# Fedora
sudo dnf install xdotool ydotool
# Arch
sudo pacman -S xdotool ydotool
Wayland Permissions Setup
For Wayland users, ydotool requires special permissions:
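# Create the uinput group if it does not exist yet (usermod fails on a missing group)
sudo groupadd -f uinput
# Add your user to the input and uinput groups
sudo usermod -aG input,uinput "$USER"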
# Load uinput kernel module
sudo modprobe uinput
# Create udev rule for persistent access
echo 'KERNEL=="uinput", MODE="0660", GROUP="uinput", OPTIONS+="static_node=uinput"' | sudo tee /etc/udev/rules.d/80-uinput.rules
# Reload udev rules
sudo udevadm control --reload-rules && sudo udevadm trigger
# Log out and back in for group changes to take effect
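After logging back in, the result of the steps above can be checked from Python. This is a minimal sketch, not Talky's built-in checker: the function name `check_uinput_access` and the shape of the returned dictionary are assumptions made for illustration.

```python
import grp
import os


def check_uinput_access(device="/dev/uinput", groups=("input", "uinput")):
    """Report whether the current process can likely write to the uinput device."""
    # Collect the names of the groups the current process belongs to.
    member_of = set()
    for gid in os.getgroups():
        try:
            member_of.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            # A numeric gid without a name entry; skip it.
            continue
    return {
        "device_exists": os.path.exists(device),
        "device_writable": os.access(device, os.W_OK),
        "missing_groups": [g for g in groups if g not in member_of],
    }


if __name__ == "__main__":
    for key, value in check_uinput_access().items():
        print(f"{key}: {value}")
```

If `missing_groups` is non-empty right after running `usermod`, the usual cause is that the session has not been restarted yet.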
Installation
Option 1: Install from PyPI (Recommended - Coming Soon)
# Basic installation
pip install talky-dictation
# With GPU support
pip install "talky-dictation[gpu]"
# Install desktop integration (optional)
talky-install-desktop
Option 2: Install from Source (Development)
# Clone repository
git clone https://github.com/ChrisKalahiki/talky.git
cd talky
# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# For NVIDIA GPU support
pip install "faster-whisper[gpu]"
# Install in editable mode
pip install -e .
Desktop Integration (Optional)
Add Talky to your application menu:
# Generate icons (run once)
python scripts/generate_icons.py
# Install desktop entry and icons
./scripts/install_desktop.sh
# For system-wide installation (all users)
sudo ./scripts/install_desktop.sh
This installs:
- Desktop entry file (.desktop)
- Icons at multiple resolutions (16x16 to 256x256)
- An application menu entry under "AudioVideo > Utility"
Configuration
Configuration file location: ~/.config/talky/config.yaml
Default configuration:
audio:
  sample_rate: 16000
  channels: 1
  buffer_size: 1024
whisper:
  model: base            # tiny, base, small, medium, large-v3
  language: en           # Language code or "auto"
  device: auto           # auto, cuda, cpu
  compute_type: default
hotkeys:
  toggle_recording: "<ctrl>+<super>"  # Ctrl+Win (push-to-talk)
platform:
  prefer_method: auto    # auto, xdotool, ydotool, clipboard
  typing_delay_ms: 0
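One common way to combine defaults like these with a partial user file is a recursive merge. The sketch below illustrates that idea under stated assumptions: `DEFAULTS`, `deep_merge`, and `load_config` are hypothetical names, and Talky's own loader may behave differently. Reading the file relies on PyYAML (`yaml.safe_load`).

```python
import copy
from pathlib import Path

# Mirrors the default configuration listed above.
DEFAULTS = {
    "audio": {"sample_rate": 16000, "channels": 1, "buffer_size": 1024},
    "whisper": {"model": "base", "language": "en",
                "device": "auto", "compute_type": "default"},
    "hotkeys": {"toggle_recording": "<ctrl>+<super>"},
    "platform": {"prefer_method": "auto", "typing_delay_ms": 0},
}


def deep_merge(base, overrides):
    """Return a copy of base with overrides applied recursively."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_config(path=None):
    """Load the YAML file if it exists and layer it over DEFAULTS."""
    import yaml  # PyYAML, imported lazily so deep_merge works without it

    if path is None:
        path = Path.home() / ".config" / "talky" / "config.yaml"
    path = Path(path)
    if not path.exists():
        return copy.deepcopy(DEFAULTS)
    with path.open(encoding="utf-8") as handle:
        user = yaml.safe_load(handle) or {}
    return deep_merge(DEFAULTS, user)
```

With this approach a user file that only sets `whisper.model` keeps every other default intact.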
Whisper Models
| Model | Parameters | VRAM | Speed | Accuracy |
|---|---|---|---|---|
| tiny | 39M | ~1GB | Fastest | Good |
| base | 74M | ~1GB | Very Fast | Better |
| small | 244M | ~2GB | Fast | High |
| medium | 769M | ~5GB | Moderate | Very High |
| large-v3 | 1550M | ~10GB | Slow | Highest |
Recommended: Start with base for balanced performance.
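The VRAM column can also be used to choose a model programmatically. The helper below is a hypothetical illustration, not part of Talky; it simply returns the largest model whose approximate VRAM requirement from the table fits the available memory.

```python
# (model name, approximate VRAM in GB), ordered from smallest to largest,
# taken from the table above.
MODEL_VRAM_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large-v3", 10),
]


def largest_model_that_fits(vram_gb):
    """Return the largest model that fits in vram_gb, or None if none does."""
    fitting = [name for name, needed in MODEL_VRAM_GB if needed <= vram_gb]
    return fitting[-1] if fitting else None


if __name__ == "__main__":
    for vram in (1, 4, 8, 12):
        print(vram, "GB ->", largest_model_that_fits(vram))
```

The figures are approximate, so leaving some headroom for other GPU workloads is a sensible precaution.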
Usage
Running Talky
# Run from terminal
talky
# Or if installed in development mode
python -m talky.main
Basic Workflow (Push-to-Talk)
- Launch: Start Talky
- Hold Hotkey: Press and hold Ctrl+Win (or your configured hotkey)
- Speak: Recording starts immediately - speak while holding
- Release: Let go of the hotkey when finished speaking
- Wait: Brief processing (<1.5s with GPU)
- Text Appears: Transcribed text automatically inserted at cursor
Just like WisprFlow AI - hold to talk, release to transcribe!
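The workflow above amounts to a small state machine driven by key press and release events. The following sketch shows one way to structure it; the class `PushToTalk` and its four callbacks are assumptions for illustration, with stand-in lambdas in place of real audio capture, Whisper, and text injection.

```python
class PushToTalk:
    """Tiny push-to-talk controller driven by key press/release events."""

    def __init__(self, start_recording, stop_recording, transcribe, inject):
        self._start = start_recording
        self._stop = stop_recording
        self._transcribe = transcribe
        self._inject = inject
        self.recording = False

    def on_press(self):
        # Keyboards auto-repeat while a key is held; only the first press counts.
        if self.recording:
            return
        self.recording = True
        self._start()

    def on_release(self):
        # Ignore a release that was not preceded by a press.
        if not self.recording:
            return
        self.recording = False
        audio = self._stop()
        text = self._transcribe(audio).strip()
        if text:
            self._inject(text)


if __name__ == "__main__":
    typed = []
    ptt = PushToTalk(
        start_recording=lambda: None,
        stop_recording=lambda: b"fake-audio",
        transcribe=lambda audio: " hello world ",
        inject=typed.append,
    )
    ptt.on_press()
    ptt.on_press()    # simulated key repeat, ignored
    ptt.on_release()
    print(typed)
```

Guarding against repeated press events matters because many systems deliver auto-repeat presses while a key is held down.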
GUI Features
System Tray
- Visual Status Indicators: Icon changes color based on state (idle/recording/processing)
- Language Selection: Quick-switch between 99 languages via tray menu
- Settings: Access full configuration GUI from tray
- About: View version and current configuration
- Desktop Notifications: Get notified on transcription completion
First-Run Setup Wizard
On first launch, Talky guides you through:
- Welcome and features overview
- Platform detection and Wayland setup (if needed)
- Whisper model and language selection
- Configuration summary and autostart option
Skip with: talky --skip-setup-wizard
Settings Dialog
Access via tray menu → Settings:
- General Tab: Version info, config file location
- Whisper Tab: Model selection, language, device (CUDA/CPU)
- Hotkeys Tab: Configure push-to-talk hotkey
- Platform Tab: View system info (display server, DE, tools)
CLI Options
# Standard usage
talky
# Disable system tray (headless mode)
talky --no-tray
# Check Wayland setup status
talky --wayland-setup
# Show complete Wayland setup guide
talky --wayland-setup-guide
# Autostart management
talky --enable-autostart
talky --disable-autostart
talky --autostart-status
# Skip first-run wizard
talky --skip-setup-wizard
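For readers scripting around these flags, here is a sketch of how such a command line could be declared with `argparse`. It is not Talky's actual parser: the help strings and the choice to make the three autostart flags mutually exclusive are assumptions.

```python
import argparse


def build_parser():
    """Declare the flags listed above (illustrative, not Talky's real parser)."""
    parser = argparse.ArgumentParser(prog="talky")
    parser.add_argument("--no-tray", action="store_true",
                        help="run without the system tray")
    parser.add_argument("--skip-setup-wizard", action="store_true",
                        help="skip the first-run wizard")
    parser.add_argument("--wayland-setup", action="store_true",
                        help="check Wayland setup status")
    parser.add_argument("--wayland-setup-guide", action="store_true",
                        help="print the Wayland setup guide")
    # Assumption: only one autostart action makes sense per invocation.
    autostart = parser.add_mutually_exclusive_group()
    autostart.add_argument("--enable-autostart", action="store_true")
    autostart.add_argument("--disable-autostart", action="store_true")
    autostart.add_argument("--autostart-status", action="store_true")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args(["--no-tray", "--autostart-status"]))
```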
Platform Support
X11 Support ✅
- Global hotkeys: Native support via pynput
- Text injection: xdotool (primary) or clipboard
- Works on: GNOME (X11), KDE (X11), XFCE, MATE, etc.
Wayland Support ⚠️
- Global hotkeys: Desktop-specific (GNOME, KDE) or manual config
- Text injection: ydotool (requires setup) or clipboard
- Works on: GNOME (Wayland), KDE (Wayland), Sway, Hyprland
Note: Wayland has security restrictions that require additional setup. See Wayland Permissions Setup above.
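Choosing between the two paths starts with detecting the display server. The sketch below uses the environment variables `XDG_SESSION_TYPE`, `WAYLAND_DISPLAY`, and `DISPLAY`, then picks an injection tool based on what is installed. The function names are hypothetical and Talky's own detection logic may differ.

```python
import os
import shutil


def detect_display_server(env=None):
    """Return "wayland", "x11", or "unknown" based on environment variables."""
    env = os.environ if env is None else env
    session = env.get("XDG_SESSION_TYPE", "").lower()
    if session in ("wayland", "x11"):
        return session
    # Fall back to the variables each display server sets for its clients.
    if env.get("WAYLAND_DISPLAY"):
        return "wayland"
    if env.get("DISPLAY"):
        return "x11"
    return "unknown"


def pick_injection_tool(display_server, which=shutil.which):
    """Prefer the native tool when installed, otherwise fall back to clipboard."""
    preferred = {"x11": "xdotool", "wayland": "ydotool"}.get(display_server)
    if preferred and which(preferred):
        return preferred
    return "clipboard"


if __name__ == "__main__":
    server = detect_display_server()
    print(server, "->", pick_injection_tool(server))
```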
Testing & Development
Test Suites
Comprehensive test suites are available for quality assurance:
Transcription Quality Tests
# Interactive tests with live microphone
python tests/test_transcription_quality.py
Features:
- Live recording tests with similarity metrics
- Multi-language validation
- Pass/fail criteria (80% similarity threshold)
- Automated report generation
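The README does not say which similarity metric the tests use. As one plausible reading of an 80% threshold, the sketch below compares word sequences with the standard library's `difflib.SequenceMatcher` after lowercasing and stripping punctuation; the function names are assumptions.

```python
import difflib
import re


def normalize(text):
    """Lowercase, drop punctuation, and split into words."""
    return re.sub(r"[^\w\s]", "", text.lower()).split()


def similarity(expected, actual):
    """Word-level similarity ratio between 0.0 and 1.0."""
    matcher = difflib.SequenceMatcher(None, normalize(expected), normalize(actual))
    return matcher.ratio()


def passes(expected, actual, threshold=0.80):
    """Apply the pass/fail threshold to a transcription result."""
    return similarity(expected, actual) >= threshold


if __name__ == "__main__":
    reference = "The quick brown fox jumps over the lazy dog"
    heard = "the quick brown fox jumps over a lazy dog"
    print(round(similarity(reference, heard), 2), passes(reference, heard))
```

Comparing words rather than characters keeps a single misheard word from being masked by many matching letters.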
Performance Benchmarking
# Benchmark all components
python tests/benchmark_performance.py
Measures:
- Audio capture latency (<50ms target)
- Whisper inference time by duration
- End-to-end workflow (<1.5s target)
- Memory usage tracking
- Real-time factor (RTF) calculations
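Real-time factor is processing time divided by audio duration, so values below 1.0 mean transcription runs faster than real time. A minimal sketch of the calculation, with hypothetical helper names:

```python
import time


def real_time_factor(processing_seconds, audio_seconds):
    """RTF = processing time / audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def timed(func, *args, **kwargs):
    """Run func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Example: 1.5 s of processing for a 6 s recording.
    print(real_time_factor(1.5, 6.0))
```

For instance, meeting the 1.5 s end-to-end target on a 6 s utterance corresponds to an RTF of at most 0.25.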
Memory Profiling
# Profile memory usage
python tests/profile_memory.py
Analyzes:
- Model loading memory footprint
- Memory usage during repeated transcriptions
- Memory leak detection
- Component-level profiling
Integration Tests
# Run all integration tests
python tests/test_integration.py
Status: 7/7 tests passing ✅
Contributing
See CLAUDE.md for development guidelines and architecture overview.
Autostart
To launch Talky automatically when you log in:
# Enable autostart
talky --enable-autostart
# Disable autostart
talky --disable-autostart
# Check status
talky --autostart-status
Or edit your config file (~/.config/talky/config.yaml):
autostart:
  enabled: true
  delay_seconds: 5  # Wait 5 seconds after login before starting
How it works:
- Creates a .desktop file in ~/.config/autostart/
- Uses the standard XDG Autostart specification
- Works across all Linux desktop environments (GNOME, KDE, XFCE, etc.)
- You can also manage it via your desktop's "Startup Applications" settings
Notes:
- Autostart is disabled by default (opt-in)
- Requires system tray mode (autostart won't work with --no-tray)
- Desktop file automatically updates when you upgrade Talky
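To make the mechanism concrete, here is a sketch that writes an XDG autostart entry by hand. Several details are assumptions rather than Talky's actual behavior: the file name `talky.desktop`, the `Name` value, and implementing the delay by wrapping the command in `sh -c "sleep N && ..."`.

```python
import os
from pathlib import Path


def autostart_dir(env=None):
    """Return the XDG autostart directory, honoring XDG_CONFIG_HOME."""
    env = os.environ if env is None else env
    base = env.get("XDG_CONFIG_HOME") or os.path.join(os.path.expanduser("~"), ".config")
    return Path(base) / "autostart"


def write_autostart_entry(directory, delay_seconds=5, command="talky"):
    """Write a minimal .desktop entry and return its path."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    if delay_seconds > 0:
        # Assumption: delay startup by sleeping before launching the command.
        exec_line = f'sh -c "sleep {int(delay_seconds)} && {command}"'
    else:
        exec_line = command
    entry = "\n".join([
        "[Desktop Entry]",
        "Type=Application",
        "Name=Talky",
        f"Exec={exec_line}",
        "Terminal=false",
        "",
    ])
    path = directory / "talky.desktop"
    path.write_text(entry, encoding="utf-8")
    return path


if __name__ == "__main__":
    print(write_autostart_entry(autostart_dir()))
```

Deleting the generated file disables autostart again, which matches how desktop "Startup Applications" tools manage these entries.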
Troubleshooting
No audio capture
# Check audio devices
python -c "import sounddevice as sd; print(sd.query_devices())"
# Test PipeWire/PulseAudio
pactl list sources
Text not injecting (X11)
# Install xdotool
sudo apt install xdotool
# Test manually
xdotool type "test"
Text not injecting (Wayland)
# Check ydotool service
systemctl --user status ydotool
# Verify permissions
groups | grep -E 'input|uinput'
# Test manually
ydotool type "test"
CUDA not detected
# Check NVIDIA driver
nvidia-smi
# Verify CUDA installation
python -c "import torch; print(torch.cuda.is_available())"
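The `torch` one-liner only works when PyTorch is installed. Since faster-whisper runs on CTranslate2, a check against that library may be more representative of what the transcription engine sees. The sketch below is an assumption about a useful diagnostic, not a Talky command; `cuda_report` is a hypothetical name.

```python
import importlib.util
import shutil


def cuda_report():
    """Collect basic hints about whether CUDA is usable for transcription."""
    report = {
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
        # None means ctranslate2 is not installed or could not be queried.
        "ctranslate2_cuda_devices": None,
    }
    if importlib.util.find_spec("ctranslate2") is not None:
        try:
            import ctranslate2

            report["ctranslate2_cuda_devices"] = ctranslate2.get_cuda_device_count()
        except Exception:
            # Import or driver problems are reported as "unknown" (None).
            pass
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

A device count of 0 alongside a working `nvidia-smi` usually points at missing CUDA runtime libraries rather than a driver problem.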
Project Structure
talky/
├── src/talky/
│   ├── audio/          # Audio capture
│   ├── whisper/        # Whisper integration
│   ├── input/          # Text injection
│   ├── hotkeys/        # Hotkey management
│   ├── ui/             # System tray & UI
│   └── utils/          # Config, logging, platform
├── config/             # Default configuration
├── tests/              # Test suite
├── PROJECT_PLAN.md     # Development roadmap
└── README.md           # This file
Development Status
Phase 1: Core Foundation ✅ Complete
- Project structure with modular architecture
- Platform detection (X11/Wayland/DE/CUDA)
- Abstract interfaces (AudioCapture, TextInjector, HotkeyManager)
- YAML configuration system
- Audio capture (sounddevice, 16kHz mono)
Phase 2: Platform Backends ✅ Complete
- X11 text injection (xdotool → pynput → clipboard fallback)
- Wayland text injection (ydotool → clipboard fallback)
- X11 hotkeys (pynput global listener)
- Wayland hotkeys (DE-specific + manual compositor config)
- Push-to-talk implementation (on_press/on_release callbacks)
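The fallback chains listed for text injection can be expressed as an ordered list of injectors tried in turn. This is a sketch of that pattern under stated assumptions: `inject_with_fallback` and `xdotool_type` are hypothetical names, and a clipboard fallback would be appended to the list in the same way.

```python
import subprocess


def xdotool_type(text):
    """Type text with xdotool; raises if the tool is missing or fails."""
    subprocess.run(["xdotool", "type", "--", text], check=True)


def inject_with_fallback(text, injectors):
    """Try each (name, callable) in order and return the name that succeeded."""
    errors = []
    for name, inject in injectors:
        try:
            inject(text)
            return name
        except Exception as exc:
            # Remember why this backend failed, then try the next one.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all injectors failed: " + "; ".join(errors))


if __name__ == "__main__":
    chain = [
        ("xdotool", xdotool_type),
        ("print", print),  # stand-in for a clipboard-based fallback
    ]
    print("used:", inject_with_fallback("test", chain))
```

Keeping the backends as plain callables makes the order easy to change, for example when `prefer_method` is set in the configuration.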
Phase 3: Whisper Integration ✅ Complete
- faster-whisper engine with CUDA support
- Multi-language support (99 languages)
- Model management & caching (~/.cache/talky/models/)
- Voice Activity Detection (VAD)
- Main application orchestrator (main.py)
- Full end-to-end pipeline working
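For orientation, this is roughly what transcribing a file with faster-whisper and its built-in VAD filter looks like. It is a sketch, not Talky's engine code: `transcribe_file` and `whisper_language` are hypothetical helpers, and a long-running application would load the model once and reuse it instead of creating it per call.

```python
def whisper_language(code):
    """Map the config value "auto" to None, which asks Whisper to detect it."""
    return None if code == "auto" else code


def transcribe_file(path, model_name="base", language="en",
                    device="auto", compute_type="default"):
    """Transcribe one audio file with faster-whisper and return the text."""
    from faster_whisper import WhisperModel  # lazy import: heavy dependency

    model = WhisperModel(model_name, device=device, compute_type=compute_type)
    # vad_filter=True skips stretches without speech before decoding.
    segments, info = model.transcribe(
        path,
        language=whisper_language(language),
        vad_filter=True,
    )
    # segments is a generator; decoding happens while it is consumed.
    return " ".join(segment.text.strip() for segment in segments)


if __name__ == "__main__":
    import sys

    print(transcribe_file(sys.argv[1]))
```

The parameter defaults mirror the `whisper` section of the configuration file.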
Integration Tests: ✅ 7/7 Passing
- Configuration System ✅
- Platform Detection ✅
- Audio Capture ✅
- Whisper Engine ✅
- Text Injector ✅
- Hotkey Manager (Push-to-Talk) ✅
- End-to-End Workflow ✅
Phase 4: UI & Polish ✅ Complete
- System tray interface (pystray)
- Desktop notifications
- Settings GUI
- Visual recording state indicator
Phase 5: Packaging 🚧 In Progress
- PyPI distribution
- AppImage build
- Distribution packages (deb/rpm/AUR)
- Systemd user service
Contributing
Contributions welcome! See PROJECT_PLAN.md for development roadmap.
License
MIT License - See LICENSE file for details
Acknowledgments
- OpenAI Whisper - Speech recognition model
- faster-whisper - Optimized Whisper implementation
- Inspired by WisprFlow AI
Project Status
Development Phases
- ✅ Phase 1: Core Foundation (Config, Platform Detection, Interfaces)
- ✅ Phase 2: Platform Backends (X11/Wayland Text Injection & Hotkeys)
- ✅ Phase 3: Whisper Integration (faster-whisper, Multi-language, CUDA)
- ✅ Phase 4: UI & Integration (System Tray, Settings GUI, Setup Wizard)
- 🚧 Phase 5: Testing & Packaging (In Progress - High-priority items complete)
What's Working
- ✅ Push-to-talk dictation on X11 and Wayland
- ✅ 99 language support with real-time switching
- ✅ CUDA GPU acceleration (<1.5s transcription)
- ✅ System tray with visual indicators
- ✅ Complete settings GUI and first-run wizard
- ✅ Desktop integration (app menu, icons)
- ✅ Wayland setup checker and guide
- ✅ Comprehensive test suites (quality, performance, memory)
- ✅ 7/7 integration tests passing
- ✅ PyPI-ready packaging (pyproject.toml)
What's Next
- ⏳ Cross-platform validation (Ubuntu, Fedora, Arch)
- ⏳ Application compatibility testing
- ⏳ Native packages (.deb, .rpm, AUR)
- ⏳ AppImage build
- ⏳ PyPI publication
See PROJECT_PLAN.md for detailed roadmap.
Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: your.email@example.com
Made with ❤️ for the Linux community
File details
Details for the file talky_dictation-0.5.0.tar.gz.
File metadata
- Download URL: talky_dictation-0.5.0.tar.gz
- Upload date:
- Size: 112.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 775894b646447499fbab2cdc083b2b9fc0bf41281487136d848d670bb84a5b62 |
| MD5 | 8d5007b6603fb6dbaa8846c493d73463 |
| BLAKE2b-256 | 6f058a66f3c85969c81d24b54b53b213dcc1878ef9d76d2b01a2f2802b9f0c8f |
File details
Details for the file talky_dictation-0.5.0-py3-none-any.whl.
File metadata
- Download URL: talky_dictation-0.5.0-py3-none-any.whl
- Upload date:
- Size: 51.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a60fe9349fd0525db389fa6fed7dd1d7b1f161fbf2fabd9365974b6cf171845e |
| MD5 | 279882cc4ae247271f8114d017806618 |
| BLAKE2b-256 | 789ef86df4268bffa388cfb371c0c54c033e4f8de28b281aa3bfb23a12521eb9 |