Whisper-to-Me
A real-time voice transcription tool that converts speech to text using FasterWhisper and types the result directly into any application via simulated keystrokes.
Features
- Push-to-talk and tap-to-start recording modes with configurable hotkeys
- Local speech recognition (no internet required)
- Global hotkey support across all applications
- Multiple language support with auto-detection
- Multiple audio device support
- System tray integration with visual recording indicator
- Single-instance protection: only one copy of the application can run at a time
- Recording discard option in tap mode (press Esc to cancel)
- Debug mode for troubleshooting
- High-accuracy transcription using FasterWhisper
- Real-time performance optimized for responsiveness
Requirements
- Python 3.12+
- CUDA-capable GPU (optional, CPU mode available)
- Audio input device (microphone)
- Linux operating system
Installation
From PyPI (Recommended)
# Install using pip
pip install whisper-to-me
# Or using uv (faster)
uv tool install whisper-to-me
From Source
- Install system dependencies:
# Ubuntu/Debian
sudo apt install portaudio19-dev libsndfile1-dev
# Fedora
sudo dnf install portaudio-devel libsndfile-devel
# Arch Linux
sudo pacman -S portaudio libsndfile
- Clone and install:
git clone https://github.com/marnunez/whisper-to-me.git
cd whisper-to-me
uv tool install .
Usage
Basic Usage
After installation, run:
whisper-to-me
The application will:
1. Load the Whisper model (first run may take a moment)
2. Show a system tray icon (microphone)
3. Listen for the trigger key (Scroll Lock by default)
Push-to-talk mode (default):
4. Press and hold the trigger key to record
5. Release to transcribe and type the text
Tap mode (--tap-mode):
4. Tap the trigger key to start recording
5. Tap again to stop and transcribe, or press Esc to discard
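The difference between the two modes can be pictured as a small state machine. The sketch below is illustrative only and is not taken from the project's source: the class name `RecorderState` and its methods are hypothetical, and it records event names instead of capturing audio.

```python
from dataclasses import dataclass, field


@dataclass
class RecorderState:
    """Hypothetical model of the trigger-key behaviour described above."""

    tap_mode: bool = False
    recording: bool = False
    events: list = field(default_factory=list)

    def on_trigger_press(self) -> None:
        if self.tap_mode:
            # Tap mode: each press toggles between start and stop.
            if self.recording:
                self._stop("transcribe")
            else:
                self._start()
        elif not self.recording:
            # Push-to-talk: a press starts recording; repeated press
            # events while the key is held are ignored.
            self._start()

    def on_trigger_release(self) -> None:
        # Only push-to-talk reacts to the key being released.
        if not self.tap_mode and self.recording:
            self._stop("transcribe")

    def on_discard_press(self) -> None:
        # The discard key only matters during a tap-mode recording.
        if self.tap_mode and self.recording:
            self._stop("discard")

    def _start(self) -> None:
        self.recording = True
        self.events.append("start")

    def _stop(self, outcome: str) -> None:
        self.recording = False
        self.events.append(outcome)
```

In this sketch a key release is a no-op in tap mode, which is why tapping and letting go does not end the recording.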
Command Line Options
whisper-to-me [options]
Options:
  --model MODEL        Whisper model size (tiny, base, small, medium, large-v3)
  --device DEVICE      Processing device (cpu, cuda)
  --key KEY            Trigger key (scroll_lock, pause, ctrl, alt, caps, etc.)
  --language LANG      Target language (auto, en, es, fr, etc.)
  --list-devices       List available audio input devices
  --audio-device ID    Audio device ID to use
  --debug              Save recorded audio files for debugging
  --no-tray            Disable system tray icon
  --tap-mode           Use tap-to-start/tap-to-stop instead of push-to-talk
  --discard-key KEY    Key to discard recording in tap mode (default: esc)
  --help               Show help message
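For readers who want to see how such a flag set maps onto parsed values, here is a minimal `argparse` sketch that mirrors the options listed above. It is an approximation, not the project's actual parser: the function name `build_parser` is hypothetical, the defaults are taken from this README (large-v3, CUDA, Scroll Lock, auto language, Esc), and treating the audio device ID as an integer is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options."""
    parser = argparse.ArgumentParser(prog="whisper-to-me")
    # Defaults below follow the values this README states.
    parser.add_argument("--model", default="large-v3")
    parser.add_argument("--device", default="cuda")
    parser.add_argument("--key", default="scroll_lock")
    parser.add_argument("--language", default="auto")
    parser.add_argument("--list-devices", action="store_true")
    # Assumption: device IDs are integers, as in "--audio-device 2".
    parser.add_argument("--audio-device", type=int, default=None)
    parser.add_argument("--debug", action="store_true")
    parser.add_argument("--no-tray", action="store_true")
    parser.add_argument("--tap-mode", action="store_true")
    parser.add_argument("--discard-key", default="esc")
    return parser


# Example: a smaller model on CPU with a different trigger key.
args = build_parser().parse_args(
    ["--model", "base", "--device", "cpu", "--key", "caps_lock"]
)
```

`argparse` converts dashes in flag names to underscores, so `--tap-mode` is read as `args.tap_mode` and `--audio-device` as `args.audio_device`.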
Examples
# Use default settings (large-v3 model, CUDA, scroll lock key, auto language)
whisper-to-me
# Use smaller model on CPU with caps lock trigger
whisper-to-me --model base --device cpu --key caps_lock
# Spanish transcription with debug mode, using audio device 2
whisper-to-me --language es --debug --audio-device 2
# Run without system tray (terminal only)
whisper-to-me --no-tray
# List available audio devices
whisper-to-me --list-devices
# Use tap-to-start/tap-to-stop mode
whisper-to-me --tap-mode
# Tap mode with delete key to discard recordings
whisper-to-me --tap-mode --discard-key delete
System Tray
The system tray icon shows:
- Gray microphone: Ready to record
- Red microphone: Currently recording
- Right-click menu: View status and quit
How It Works
- Single Instance Protection: Ensures only one instance runs at a time
- Global Hotkey Detection: Monitors for the configured trigger key across all applications
- Audio Recording: Captures microphone input while the trigger key is held (push-to-talk) or between two taps (tap mode)
- Speech Processing: Uses FasterWhisper for local speech-to-text conversion
- Keystroke Simulation: Types the transcribed text directly into the active application
- System Integration: Shows status in system tray with visual feedback
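One common way to get single-instance protection on Linux is an exclusive, non-blocking lock on a well-known file. The sketch below shows that general technique with the standard library's `fcntl.flock`. It is an assumption about how such a guard can work, not a description of this project's implementation, and the function names and lock file name are hypothetical.

```python
import fcntl
import os
import tempfile


def default_lock_path() -> str:
    # Hypothetical location; any path shared by all instances would do.
    return os.path.join(tempfile.gettempdir(), "whisper-to-me-example.lock")


def acquire_single_instance_lock(path: str):
    """Return an open file holding an exclusive lock, or None if the
    lock is already held through another open of the same file."""
    handle = open(path, "a+")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    # The caller must keep this object alive; closing it releases the lock.
    return handle
```

A program using this pattern would call `acquire_single_instance_lock(default_lock_path())` at startup and exit with an "already running" message when it gets `None`. Because the kernel drops the lock when the process ends, a crash does not leave a stale lock behind.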
Performance Notes
- First Run: May take longer as the Whisper model downloads (~1-3GB)
- GPU Acceleration: CUDA significantly improves transcription speed
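- Model Sizes (figures are the Whisper models' approximate parameter counts, not download sizes):
  - tiny: Fastest, least accurate (~39M parameters)
  - base: Good balance (~74M parameters)
  - small: Better accuracy (~244M parameters)
  - medium: High accuracy (~769M parameters)
  - large-v3: Best accuracy (~1550M parameters, default)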
- Audio Quality: Better microphone input improves transcription accuracy
Troubleshooting
Common Issues
- "Already running" error: Only one instance is allowed. Check the system tray for a running copy, or stop it with pkill whisper-to-me
- Permission errors: The application may need permissions for global key capture and microphone access
- Audio issues: Run --list-devices to confirm the microphone is detected, and check microphone permissions
- CUDA errors: Install CUDA drivers or use --device cpu
- Trigger key not working: Try a different key, for example --key caps_lock
Debug Mode
Use --debug to save recorded audio files for troubleshooting:
whisper-to-me --debug
System Requirements Check
# Check audio devices
whisper-to-me --list-devices
# Test with smaller model
whisper-to-me --model tiny --device cpu
Uninstallation
# If installed with pip
pip uninstall whisper-to-me
# If installed with uv tool
uv tool uninstall whisper-to-me
Development
Setup Development Environment
git clone https://github.com/marnunez/whisper-to-me.git
cd whisper-to-me
uv sync --all-extras --dev
Run Tests
uv run pytest
Code Quality
uv run ruff check
uv run ruff format
Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Add tests if applicable
- Ensure code quality (uv run ruff check && uv run pytest)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- FasterWhisper for fast speech recognition
- OpenAI Whisper for the underlying model
- pynput for cross-platform input control