Talky 🎤

System-wide dictation for Linux using OpenAI's Whisper AI model.

Overview

Talky is a system-wide dictation application for Linux, similar to WisprFlow AI on Windows/Mac. It uses OpenAI's Whisper model for accurate speech-to-text transcription and works across all applications.

Status: ✅ Production Ready | 7/7 Integration Tests Passing | Full GUI | Phases 1-4 Complete

Features

🎤 Push-to-Talk: Hold hotkey, speak, release - just like WisprFlow AI!
🎯 System-wide: Works in any application (browsers, editors, terminals, chat apps)
⚡ Fast: <1.5s latency with CUDA GPU acceleration
🌍 Multi-language: 99 languages supported with real-time switching
🖥️ System Tray: Full GUI with language selection, settings, and setup wizard
📝 First-Run Wizard: Easy configuration on first launch
⚙️ Settings Dialog: Complete GUI for all configuration options
🔧 Wayland Helper: Built-in permission checker and setup guide
🖥️ X11 & Wayland: Compatible with both display servers
🔒 Privacy-focused: Local processing, no cloud required
📦 Easy Install: Desktop integration and application menu entry

Requirements

System Requirements

Linux (X11 or Wayland)
Python 3.10+
NVIDIA GPU with CUDA (recommended) or CPU

External Tools

X11: xdotool (for text injection)
Wayland: ydotool (for text injection)

Install external tools:

# Ubuntu/Debian
sudo apt install xdotool ydotool

# Fedora
sudo dnf install xdotool ydotool

# Arch
sudo pacman -S xdotool ydotool

Wayland Permissions Setup

For Wayland users, ydotool requires special permissions:

# Create the uinput group if it does not exist, then add your user to input and uinput
sudo groupadd -f uinput
sudo usermod -aG input,uinput "$USER"

# Load uinput kernel module
sudo modprobe uinput

# Create udev rule for persistent access
echo 'KERNEL=="uinput", MODE="0660", GROUP="uinput", OPTIONS+="static_node=uinput"' | sudo tee /etc/udev/rules.d/80-uinput.rules

# Reload udev rules
sudo udevadm control --reload-rules && sudo udevadm trigger

# Log out and back in for group changes to take effect

Installation

Option 1: Install from PyPI (Recommended)

# Basic installation
pip install talky-dictation

# With GPU support (quoted so shells such as zsh do not expand the brackets)
pip install "talky-dictation[gpu]"

# Install desktop integration (optional)
talky-install-desktop

Option 2: Install from Source (Development)

# Clone repository
git clone https://github.com/ChrisKalahiki/talky.git
cd talky

# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# For NVIDIA GPU support (quoted so shells such as zsh do not expand the brackets)
pip install "faster-whisper[gpu]"

# Install in editable mode
pip install -e .

Desktop Integration (Optional)

Add Talky to your application menu:

# Generate icons (run once)
python scripts/generate_icons.py

# Install desktop entry and icons
./scripts/install_desktop.sh

# For system-wide installation (all users)
sudo ./scripts/install_desktop.sh

This installs:

Desktop entry file (.desktop)
Icons at multiple resolutions (16x16 to 256x256)
Application menu entry under "AudioVideo > Utility"

Configuration

Configuration file location: ~/.config/talky/config.yaml

Default configuration:

audio:
  sample_rate: 16000
  channels: 1
  buffer_size: 1024

whisper:
  model: base           # tiny, base, small, medium, large-v3
  language: en          # Language code or "auto"
  device: auto          # auto, cuda, cpu
  compute_type: default

hotkeys:
  toggle_recording: "<ctrl>+<super>"  # Ctrl+Win (push-to-talk)

platform:
  prefer_method: auto   # auto, xdotool, ydotool, clipboard
  typing_delay_ms: 0
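
A config file normally only needs the keys you want to change. One way to combine a partial user file with the defaults above is a recursive merge, sketched below. This is an illustration under assumptions: merge_config and DEFAULTS are hypothetical names, and the user overrides are assumed to arrive as a plain dict (for example from a YAML parser).

```python
import copy

# Mirrors the default configuration shown above.
DEFAULTS = {
    "audio": {"sample_rate": 16000, "channels": 1, "buffer_size": 1024},
    "whisper": {
        "model": "base",
        "language": "en",
        "device": "auto",
        "compute_type": "default",
    },
    "hotkeys": {"toggle_recording": "<ctrl>+<super>"},
    "platform": {"prefer_method": "auto", "typing_delay_ms": 0},
}


def merge_config(defaults, overrides):
    """Return a new dict with overrides layered on top of defaults."""
    merged = copy.deepcopy(defaults)
    for key, value in (overrides or {}).items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections key by key instead of replacing them.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


if __name__ == "__main__":
    user = {"whisper": {"model": "small", "language": "auto"}}
    print(merge_config(DEFAULTS, user)["whisper"])
```

Merging per section means that overriding whisper.model leaves whisper.device and the other defaults untouched.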

Whisper Models

Model	Parameters	VRAM	Speed	Accuracy
tiny	39M	~1GB	Fastest	Good
base	74M	~1GB	Very Fast	Better
small	244M	~2GB	Fast	High
medium	769M	~5GB	Moderate	Very High
large-v3	1550M	~10GB	Slow	Highest

Recommended: Start with base for balanced performance.
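
If you want to move beyond base, the VRAM column gives a rough upper bound on what your GPU can hold. The sketch below picks the largest model from the table that fits a given amount of free VRAM. The helper name largest_model_that_fits is hypothetical, and the figures are the approximate values from the table rather than measurements.

```python
# Approximate VRAM needs in GB, copied from the table above (smallest first).
MODEL_VRAM_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large-v3", 10),
]


def largest_model_that_fits(free_vram_gb):
    """Return the largest listed model whose VRAM estimate fits, or None."""
    fitting = [name for name, need in MODEL_VRAM_GB if need <= free_vram_gb]
    return fitting[-1] if fitting else None


if __name__ == "__main__":
    for vram in (0.5, 1, 4, 8, 12):
        print(vram, "GB ->", largest_model_that_fits(vram))
```

A result of None suggests running on CPU, where a smaller model is usually the practical choice.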

Usage

Running Talky

# Run from terminal
talky

# Or if installed in development mode
python -m talky.main

Basic Workflow (Push-to-Talk)

Launch: Start Talky
Hold Hotkey: Press and hold Ctrl+Win (or your configured hotkey)
Speak: Recording starts immediately - speak while holding
Release: Let go of the hotkey when finished speaking
Wait: Brief processing (<1.5s with GPU)
Text Appears: Transcribed text automatically inserted at cursor

Just like WisprFlow AI - hold to talk, release to transcribe!
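
The workflow above is essentially a three-state machine: idle, recording, processing. The following sketch shows one way to wire it to press and release callbacks. It is not Talky's actual orchestrator; the PushToTalk class and the four injected callables are hypothetical stand-ins for the audio, Whisper, and text-injection components.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    RECORDING = "recording"
    PROCESSING = "processing"


class PushToTalk:
    """Hold-to-record, release-to-transcribe state machine."""

    def __init__(self, start_recording, stop_recording, transcribe, inject):
        self._start_recording = start_recording
        self._stop_recording = stop_recording
        self._transcribe = transcribe
        self._inject = inject
        self.state = State.IDLE

    def on_press(self):
        # Ignore key auto-repeat and presses while a transcription is running.
        if self.state is not State.IDLE:
            return
        self.state = State.RECORDING
        self._start_recording()

    def on_release(self):
        if self.state is not State.RECORDING:
            return
        self.state = State.PROCESSING
        try:
            audio = self._stop_recording()
            text = self._transcribe(audio)
            if text:
                self._inject(text)
        finally:
            # Always return to idle so a failed transcription cannot wedge the hotkey.
            self.state = State.IDLE


if __name__ == "__main__":
    typed = []
    ptt = PushToTalk(lambda: None, lambda: b"pcm", lambda audio: "hello", typed.append)
    ptt.on_press()
    ptt.on_release()
    print(typed)
```

Guarding on the current state matters because many keyboards send repeated press events while a key is held down.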

GUI Features

System Tray

Visual Status Indicators: Icon changes color based on state (idle/recording/processing)
Language Selection: Quick-switch between 99 languages via tray menu
Settings: Access full configuration GUI from tray
About: View version and current configuration
Desktop Notifications: Get notified on transcription completion

First-Run Setup Wizard

On first launch, Talky guides you through:

Welcome and features overview
Platform detection and Wayland setup (if needed)
Whisper model and language selection
Configuration summary and autostart option

Skip with: talky --skip-setup-wizard

Settings Dialog

Access via tray menu → Settings:

General Tab: Version info, config file location
Whisper Tab: Model selection, language, device (CUDA/CPU)
Hotkeys Tab: Configure push-to-talk hotkey
Platform Tab: View system info (display server, DE, tools)

CLI Options

# Standard usage
talky

# Disable system tray (headless mode)
talky --no-tray

# Check Wayland setup status
talky --wayland-setup

# Show complete Wayland setup guide
talky --wayland-setup-guide

# Autostart management
talky --enable-autostart
talky --disable-autostart
talky --autostart-status

# Skip first-run wizard
talky --skip-setup-wizard

Platform Support

X11 Support ✅

Global hotkeys: Native support via pynput
Text injection: xdotool (primary) or clipboard
Works on: GNOME (X11), KDE (X11), XFCE, MATE, etc.

Wayland Support ⚠️

Global hotkeys: Desktop-specific (GNOME, KDE) or manual config
Text injection: ydotool (requires setup) or clipboard
Works on: GNOME (Wayland), KDE (Wayland), Sway, Hyprland

Note: Wayland has security restrictions that require additional setup. See Wayland Permissions Setup above.
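
To see which path applies to your session, you can inspect the standard session environment variables and check which injection tool is on your PATH. The sketch below is an assumption about how such detection could be done, not Talky's own platform module; both function names are hypothetical.

```python
import os
import shutil


def detect_display_server(env=None):
    """Return "wayland", "x11", or "unknown" from session environment variables."""
    env = os.environ if env is None else env
    session = env.get("XDG_SESSION_TYPE", "").lower()
    if session in ("wayland", "x11"):
        return session
    # Fall back to the socket/display variables when the session type is unset.
    if env.get("WAYLAND_DISPLAY"):
        return "wayland"
    if env.get("DISPLAY"):
        return "x11"
    return "unknown"


def choose_injection_method(display_server, which=shutil.which):
    """Prefer the native tool for the display server, else use the clipboard."""
    preferred = {"x11": "xdotool", "wayland": "ydotool"}.get(display_server)
    if preferred and which(preferred):
        return preferred
    return "clipboard"


if __name__ == "__main__":
    server = detect_display_server()
    print(server, "->", choose_injection_method(server))
```

Passing env and which as parameters keeps the functions easy to test without changing the real environment.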

Testing & Development

Test Suites

Comprehensive test suites are available for quality assurance:

Transcription Quality Tests

# Interactive tests with live microphone
python tests/test_transcription_quality.py

Features:

Live recording tests with similarity metrics
Multi-language validation
Pass/fail criteria (80% similarity threshold)
Automated report generation
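
The exact similarity metric used by the test suite is not described here. As an illustration of how an 80% threshold can be applied, the sketch below compares normalized text with difflib.SequenceMatcher from the standard library; the choice of metric and the helper names are assumptions.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace."""
    kept = "".join(c for c in text.lower() if c.isalnum() or c.isspace())
    return " ".join(kept.split())


def similarity(expected, actual):
    """Return a ratio in [0, 1]; 1.0 means identical after normalization."""
    return SequenceMatcher(None, normalize(expected), normalize(actual)).ratio()


def passes(expected, actual, threshold=0.80):
    return similarity(expected, actual) >= threshold


if __name__ == "__main__":
    print(passes("Hello, world!", "hello world"))
```

Normalizing first keeps casing and punctuation differences from counting against an otherwise correct transcription.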

Performance Benchmarking

# Benchmark all components
python tests/benchmark_performance.py

Measures:

Audio capture latency (<50ms target)
Whisper inference time by duration
End-to-end workflow (<1.5s target)
Memory usage tracking
Real-time factor (RTF) calculations
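
Real-time factor is processing time divided by audio duration, so values below 1.0 mean transcription runs faster than the speech it covers. A minimal sketch of the calculation follows; the helper names are hypothetical and are not taken from benchmark_performance.py.

```python
import time


def real_time_factor(processing_seconds, audio_seconds):
    """RTF = processing time / audio duration (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Example: 1.5 s of processing for a 6 s recording.
    print(real_time_factor(1.5, 6.0))
```

For example, hitting the 1.5 s end-to-end target on a 6 s recording corresponds to an RTF of 0.25.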

Memory Profiling

# Profile memory usage
python tests/profile_memory.py

Analyzes:

Model loading memory footprint
Memory usage during repeated transcriptions
Memory leak detection
Component-level profiling
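
One simple way to look for leaks across repeated transcriptions is to compare traced memory before and after a loop. The sketch below uses tracemalloc from the standard library and is an assumption about the approach, not the contents of profile_memory.py. Note that tracemalloc only sees allocations made through Python's allocators, so native or GPU memory used by the Whisper backend would need a different tool.

```python
import gc
import tracemalloc


def memory_growth(fn, iterations=50):
    """Return bytes of traced Python memory retained after calling fn repeatedly."""
    gc.collect()
    tracemalloc.start()
    try:
        fn()  # warm-up call so one-time allocations are not counted as growth
        gc.collect()
        before, _peak = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            fn()
        gc.collect()
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


if __name__ == "__main__":
    retained = []
    leaky = lambda: retained.append(bytearray(100_000))
    print("leaky growth (bytes):", memory_growth(leaky, iterations=20))
```

Growth that scales with the number of iterations points to objects being retained between calls.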

Integration Tests

# Run all integration tests
python tests/test_integration.py

Status: 7/7 tests passing ✅

Contributing

See CLAUDE.md for development guidelines and architecture overview.

Autostart

To launch Talky automatically when you log in:

# Enable autostart
talky --enable-autostart

# Disable autostart
talky --disable-autostart

# Check status
talky --autostart-status

Or edit your config file (~/.config/talky/config.yaml):

autostart:
  enabled: true
  delay_seconds: 5  # Wait 5 seconds after login before starting

How it works:

Creates a .desktop file in ~/.config/autostart/
Uses the standard XDG Autostart specification
Works across all Linux desktop environments (GNOME, KDE, XFCE, etc.)
You can also manage it via your desktop's "Startup Applications" settings
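
For reference, an XDG autostart entry is a small .desktop file. The sketch below builds one as a string and shows where it would be placed. It is illustrative only: the file name talky.desktop, the helper names, and the use of a shell sleep to implement the startup delay are assumptions, not necessarily what Talky writes.

```python
from pathlib import Path


def autostart_entry(exec_cmd="talky", delay_seconds=0, name="Talky"):
    """Return the text of a minimal XDG autostart .desktop entry."""
    if delay_seconds > 0:
        # Assumed approach: delay startup by sleeping in a shell first.
        exec_line = f'sh -c "sleep {int(delay_seconds)} && {exec_cmd}"'
    else:
        exec_line = exec_cmd
    lines = [
        "[Desktop Entry]",
        "Type=Application",
        f"Name={name}",
        f"Exec={exec_line}",
        "Terminal=false",
        "",
    ]
    return "\n".join(lines)


def autostart_path(config_home=None, filename="talky.desktop"):
    """Return the autostart file path under the given or default config directory."""
    base = Path(config_home) if config_home else Path.home() / ".config"
    return base / "autostart" / filename


if __name__ == "__main__":
    print(autostart_path())
    print(autostart_entry(delay_seconds=5))
```

Deleting the file, or toggling it in your desktop's startup settings, disables autostart again.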

Notes:

Autostart is disabled by default (opt-in)
Requires system tray mode (autostart won't work with --no-tray)
Desktop file automatically updates when you upgrade Talky

Troubleshooting

No audio capture

# Check audio devices
python -c "import sounddevice as sd; print(sd.query_devices())"

# Test PipeWire/PulseAudio
pactl list sources

Text not injecting (X11)

# Install xdotool
sudo apt install xdotool

# Test manually
xdotool type "test"

Text not injecting (Wayland)

# Check ydotool service
systemctl --user status ydotool

# Verify permissions
groups | grep -E 'input|uinput'

# Test manually
ydotool type "test"

CUDA not detected

# Check NVIDIA driver
nvidia-smi

# Verify that Python can see CUDA (this check needs PyTorch installed)
python -c "import torch; print(torch.cuda.is_available())"

Project Structure

talky/
├── src/talky/
│   ├── audio/          # Audio capture
│   ├── whisper/        # Whisper integration
│   ├── input/          # Text injection
│   ├── hotkeys/        # Hotkey management
│   ├── ui/             # System tray & UI
│   └── utils/          # Config, logging, platform
├── config/             # Default configuration
├── tests/              # Test suite
├── PROJECT_PLAN.md     # Development roadmap
└── README.md           # This file

Development Status

Phase 1: Core Foundation ✅ Complete

Project structure with modular architecture
Platform detection (X11/Wayland/DE/CUDA)
Abstract interfaces (AudioCapture, TextInjector, HotkeyManager)
YAML configuration system
Audio capture (sounddevice, 16kHz mono)

Phase 2: Platform Backends ✅ Complete

X11 text injection (xdotool → pynput → clipboard fallback)
Wayland text injection (ydotool → clipboard fallback)
X11 hotkeys (pynput global listener)
Wayland hotkeys (DE-specific + manual compositor config)
Push-to-talk implementation (on_press/on_release callbacks)
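
The fallback chains above (for example xdotool → pynput → clipboard) can be expressed as an ordered list of backends that are tried until one succeeds. The following is a minimal sketch of that pattern, with hypothetical names; the real backends are represented here by plain callables.

```python
def inject_with_fallback(text, backends):
    """Try each (name, inject) pair in order and return the name that worked."""
    errors = []
    for name, inject in backends:
        try:
            inject(text)
            return name
        except Exception as exc:  # a missing tool or denied permission lands here
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all injection backends failed: " + "; ".join(errors))


if __name__ == "__main__":
    def broken(text):
        raise OSError("tool not installed")

    clipboard = []
    used = inject_with_fallback("hello", [("xdotool", broken), ("clipboard", clipboard.append)])
    print(used, clipboard)
```

Collecting the error from every failed backend makes it easier to see why the chain ended up on the clipboard.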

Phase 3: Whisper Integration ✅ Complete

faster-whisper engine with CUDA support
Multi-language support (99 languages)
Model management & caching (~/.cache/talky/models/)
Voice Activity Detection (VAD)
Main application orchestrator (main.py)
Full end-to-end pipeline working

Integration Tests: ✅ 7/7 Passing

Configuration System ✓
Platform Detection ✓
Audio Capture ✓
Whisper Engine ✓
Text Injector ✓
Hotkey Manager (Push-to-Talk) ✓
End-to-End Workflow ✓

Phase 4: UI & Polish ✅ Complete

System tray interface (pystray)
Desktop notifications
Settings GUI
Visual recording state indicator

Phase 5: Packaging 🚧 In Progress

PyPI distribution
AppImage build
Distribution packages (deb/rpm/AUR)
Systemd user service

Contributing

Contributions welcome! See PROJECT_PLAN.md for development roadmap.

License

MIT License - See LICENSE file for details

Acknowledgments

OpenAI Whisper - Speech recognition model
faster-whisper - Optimized Whisper implementation
Inspired by WisprFlow AI

Project Status

Development Phases

✅ Phase 1: Core Foundation (Config, Platform Detection, Interfaces)
✅ Phase 2: Platform Backends (X11/Wayland Text Injection & Hotkeys)
✅ Phase 3: Whisper Integration (faster-whisper, Multi-language, CUDA)
✅ Phase 4: UI & Integration (System Tray, Settings GUI, Setup Wizard)
🚧 Phase 5: Testing & Packaging (In Progress - High-priority items complete)

What's Working

✅ Push-to-talk dictation on X11 and Wayland
✅ 99 language support with real-time switching
✅ CUDA GPU acceleration (<1.5s transcription)
✅ System tray with visual indicators
✅ Complete settings GUI and first-run wizard
✅ Desktop integration (app menu, icons)
✅ Wayland setup checker and guide
✅ Comprehensive test suites (quality, performance, memory)
✅ 7/7 integration tests passing
✅ PyPI-ready packaging (pyproject.toml)

What's Next

⏳ Cross-platform validation (Ubuntu, Fedora, Arch)
⏳ Application compatibility testing
⏳ Native packages (.deb, .rpm, AUR)
⏳ AppImage build
⏳ PyPI publication

See PROJECT_PLAN.md for detailed roadmap.

Support

🐛 Issues: GitHub Issues
💬 Discussions: GitHub Discussions
📧 Email: your.email@example.com

Made with ❤️ for the Linux community

Version: 0.5.0 (Nov 7, 2025)
