Whisper-to-Me
A real-time voice transcription tool that converts speech to text using FasterWhisper and types the result directly into any application via simulated keystrokes.
Features
- Push-to-talk and tap-to-start recording modes with configurable hotkeys
- Local speech recognition (no internet required)
- Global hotkey support across all applications
- Multiple language support with auto-detection
- Multiple audio device support
- System tray integration with visual recording indicator
- Single-instance protection (only one copy can run at a time)
- Recording discard option in tap mode (press Esc to cancel)
- Debug mode for troubleshooting
- High-accuracy transcription using FasterWhisper
- Real-time performance optimized for responsiveness
Requirements
- Python 3.12+
- CUDA-capable GPU (optional, CPU mode available)
- Audio input device (microphone)
- Linux operating system
Installation
From PyPI (Recommended)
# Install using pip
pip install whisper-to-me
# Or using uv (faster)
uv tool install whisper-to-me
From Source
- Install system dependencies:
# Ubuntu/Debian
sudo apt install portaudio19-dev libsndfile1-dev
# Fedora
sudo dnf install portaudio-devel libsndfile-devel
# Arch Linux
sudo pacman -S portaudio libsndfile
- Clone and install:
git clone https://github.com/marnunez/whisper-to-me.git
cd whisper-to-me
uv tool install .
Usage
Basic Usage
Simply run the command after installation:
whisper-to-me
The application will:
- Load the Whisper model (first run may take a moment)
- Show a system tray icon (microphone)
- Listen for the trigger key (Scroll Lock by default)
Push-to-talk mode (default):
- Press and hold the trigger key to record
- Release to transcribe and type the text
Tap mode (--tap-mode):
- Tap the trigger key to start recording
- Tap again to stop and transcribe, or press Esc to discard
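The two modes differ only in how key events map to starting and stopping a recording. The following is a minimal sketch of that mapping, not the project's actual code: the `RecorderState` class and its method names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RecorderState:
    """Hypothetical model of how key events drive recording in each mode."""

    tap_mode: bool = False
    recording: bool = False
    actions: list[str] = field(default_factory=list)

    def on_trigger_press(self) -> None:
        if not self.tap_mode:
            # Push-to-talk: pressing the key starts recording.
            self._start()
        elif self.recording:
            # Tap mode: a second tap stops and transcribes.
            self._stop(transcribe=True)
        else:
            # Tap mode: the first tap starts recording.
            self._start()

    def on_trigger_release(self) -> None:
        # Only push-to-talk reacts to the key being released.
        if not self.tap_mode and self.recording:
            self._stop(transcribe=True)

    def on_discard_press(self) -> None:
        # The discard key (Esc by default) only applies in tap mode.
        if self.tap_mode and self.recording:
            self._stop(transcribe=False)

    def _start(self) -> None:
        # Ignore repeated press events while already recording.
        if not self.recording:
            self.recording = True
            self.actions.append("start")

    def _stop(self, transcribe: bool) -> None:
        self.recording = False
        self.actions.append("transcribe" if transcribe else "discard")
```

In this sketch, a held key that auto-repeats does not restart the recording, and the discard key is ignored in push-to-talk mode.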
Command Line Options
whisper-to-me [options]
Options:
--model MODEL Whisper model size (tiny, base, small, medium, large-v3)
--device DEVICE Processing device (cpu, cuda)
--key KEY Trigger key (single key or combination, e.g., <scroll_lock>, <ctrl>+<shift>+r)
--language LANG Target language (auto, en, es, fr, etc.)
--list-devices List available audio input devices
--audio-device ID Audio device ID to use
--debug Save recorded audio files for debugging
--no-tray Disable system tray icon
--tap-mode Use tap-to-start/tap-to-stop instead of push-to-talk
--discard-key KEY Key to discard recording in tap mode (default: esc)
--help Show help message
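For readers who want to script around these options, here is a rough sketch of how the documented flags could be declared with Python's `argparse`. It is an illustration only: `build_parser` is a hypothetical name, the defaults are taken from the text of this README, and treating the audio device ID as an integer is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the options documented above."""
    parser = argparse.ArgumentParser(prog="whisper-to-me")
    parser.add_argument(
        "--model",
        default="large-v3",
        choices=["tiny", "base", "small", "medium", "large-v3"],
    )
    parser.add_argument("--device", default="cuda", choices=["cpu", "cuda"])
    parser.add_argument("--key", default="<scroll_lock>")
    parser.add_argument("--language", default="auto")
    parser.add_argument("--list-devices", action="store_true")
    # Assumption: device IDs are integers, as in the "--audio-device 2" example.
    parser.add_argument("--audio-device", type=int, default=None)
    parser.add_argument("--debug", action="store_true")
    parser.add_argument("--no-tray", action="store_true")
    parser.add_argument("--tap-mode", action="store_true")
    parser.add_argument("--discard-key", default="<esc>")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--model", "base", "--device", "cpu"])
    print(args.model, args.device, args.key)
```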
Examples
# Use default settings (large-v3 model, CUDA, scroll lock key, auto language)
whisper-to-me
# Use smaller model on CPU with caps lock trigger
whisper-to-me --model base --device cpu --key "<caps_lock>"
# Use key combination as trigger (Ctrl+Shift+R)
whisper-to-me --key "<ctrl>+<shift>+r"
# Use Ctrl+- (minus) as trigger
whisper-to-me --key "<ctrl>+-"
# Spanish transcription with debug mode, using audio device 2
whisper-to-me --language es --debug --audio-device 2
# Run without system tray (terminal only)
whisper-to-me --no-tray
# List available audio devices
whisper-to-me --list-devices
# Use tap-to-start/tap-to-stop mode
whisper-to-me --tap-mode
# Tap mode with delete key to discard recordings
whisper-to-me --tap-mode --discard-key "<delete>"
Configuration
Whisper-to-Me supports persistent configuration through a TOML config file and multiple profiles for different use cases.
Configuration File
Location: ~/.config/whisper-to-me/config.toml
View the config file location:
whisper-to-me --config-path
Configuration Sections
General Settings ([general])
- model: Whisper model size
  - Options: "tiny", "base", "small", "medium", "large-v3" (default)
  - Affects: Transcription accuracy vs speed trade-off
- device: Processing device
  - Options: "cpu", "cuda" (default)
  - Affects: Transcription speed (GPU acceleration)
- language: Target language
  - Options: "auto" (default), "en", "es", "fr", etc.
  - Affects: Transcription accuracy for specific languages
- debug: Debug mode
  - Options: true, false (default)
  - Affects: Saves audio files for troubleshooting
Recording Settings ([recording])
- mode: Recording mode
  - Options: "push-to-talk" (default), "tap-mode"
  - Affects: How recording is triggered
- trigger_key: Key combination to trigger recording
  - Default: "<scroll_lock>"
  - Examples: "<caps_lock>", "<ctrl>+<shift>+r", "<alt>+<space>"
- discard_key: Key to discard recording in tap mode
  - Default: "<esc>"
  - Options: Single keys like "<delete>", "<backspace>"
- audio_device: Audio input device ID
  - Default: "" (system default)
  - Use --list-devices to see available devices
UI Settings ([ui])
- use_tray: System tray integration
  - Options: true (default), false
  - Affects: Shows microphone icon in system tray
Advanced Settings ([advanced])
- sample_rate: Audio sample rate
  - Default: 16000 Hz
  - Affects: Audio quality and processing speed
- chunk_size: Audio processing chunk size
  - Default: 512
  - Affects: Real-time processing performance
- vad_filter: Voice Activity Detection filter
  - Default: true
  - Affects: Noise filtering during recording
Configuration Profiles
Create and manage multiple configuration profiles for different use cases:
Profile Management
# List available profiles
whisper-to-me --list-profiles
# Use specific profile
whisper-to-me --profile work
# Create new profile from current settings
whisper-to-me --model tiny --device cpu --create-profile quick
Example Profile Configuration
[general]
model = "large-v3"
device = "cuda"
language = "auto"
debug = false
last_profile = "default"
[recording]
mode = "push-to-talk"
trigger_key = "<scroll_lock>"
discard_key = "<esc>"
audio_device = ""
[ui]
use_tray = true
[advanced]
sample_rate = 16000
chunk_size = 512
vad_filter = true
# Work profile - English only, medium model, caps lock trigger
[profiles.work]
[profiles.work.general]
language = "en"
model = "medium"
[profiles.work.recording]
trigger_key = "<caps_lock>"
# Spanish profile - Spanish language, large model
[profiles.spanish]
[profiles.spanish.general]
language = "es"
model = "large-v3"
# Quick profile - Fast transcription, CPU only
[profiles.quick]
[profiles.quick.general]
model = "tiny"
device = "cpu"
[profiles.quick.recording]
mode = "tap-mode"
Configuration Priority
Settings are applied in this order (highest to lowest priority):
- Command line arguments
- Profile settings
- Base configuration file
- Default values
System Tray
The system tray icon shows:
- Gray microphone: Ready to record
- Red microphone: Currently recording
- Right-click menu: View status and quit
How It Works
- Single Instance Protection: Ensures only one instance runs at a time
- Global Hotkey Detection: Monitors for configured trigger key across all applications
- Audio Recording: Captures microphone input while the trigger key is held (push-to-talk) or between two taps (tap mode)
- Speech Processing: Uses FasterWhisper for local speech-to-text conversion
- Keystroke Simulation: Types the transcribed text directly into the active application
- System Integration: Shows status in system tray with visual feedback
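This README does not say how single-instance protection is implemented. One common approach on Linux is an exclusive, non-blocking lock on a lock file; the sketch below illustrates that technique with `fcntl.flock`. The function name and the lock path are hypothetical.

```python
import fcntl
import os


def acquire_single_instance_lock(lock_path: str):
    """Return an open, locked file if no other instance holds it, else None."""
    # Open in append mode so an existing holder's file is not truncated.
    lock_file = open(lock_path, "a+")
    try:
        # Non-blocking exclusive lock: fails at once if someone else holds it.
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        lock_file.close()
        return None
    # We own the lock: record our PID for anyone inspecting the file.
    lock_file.seek(0)
    lock_file.truncate()
    lock_file.write(str(os.getpid()))
    lock_file.flush()
    return lock_file


if __name__ == "__main__":
    # Hypothetical lock location, chosen only for this example.
    lock = acquire_single_instance_lock("/tmp/whisper-to-me-example.lock")
    print("running" if lock else "already running")
```

The lock is released automatically when the returned file is closed or the process exits, so a crash does not leave a stale lock behind.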
Performance Notes
- First Run: May take longer as the Whisper model downloads (~1-3GB)
- GPU Acceleration: CUDA significantly improves transcription speed
- Model Sizes:
  - tiny: Fastest, least accurate (~39M parameters)
  - base: Good balance (~74M parameters)
  - small: Better accuracy (~244M parameters)
  - medium: High accuracy (~769M parameters)
  - large-v3: Best accuracy (~1550M parameters, default)
- Audio Quality: Better microphone input improves transcription accuracy
Key Combinations
You can use key combinations as trigger keys:
# Single keys
whisper-to-me --key "<scroll_lock>"
whisper-to-me --key "<caps_lock>"
whisper-to-me --key "a" # Single character
# Key combinations
whisper-to-me --key "<ctrl>+<shift>+r"
whisper-to-me --key "<alt>+<space>"
whisper-to-me --key "<ctrl>+-" # Ctrl + minus
whisper-to-me --key "<shift>+1" # Shift + 1
Uses standard pynput format:
- Named keys: Wrap in angle brackets: <ctrl>, <alt>, <shift>, <esc>, <tab>, etc.
- Single characters: Use directly: a, 1, -, +, etc.
- Combinations: Join with the + symbol
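The format rules above can be illustrated with a small, dependency-free tokenizer. pynput ships its own parser (`pynput.keyboard.HotKey.parse`); the hypothetical `parse_key_combo` below is only a sketch showing how named keys, single characters, and the `+` separator fit together, including the case where the key itself is `+` or `-`.

```python
def parse_key_combo(spec: str) -> list[str]:
    """Split a pynput-style key string into its individual key tokens."""
    tokens: list[str] = []
    i = 0
    while i < len(spec):
        if spec[i] == "<":
            # Named key: read up to the closing angle bracket.
            end = spec.find(">", i)
            if end == -1:
                raise ValueError(f"unterminated named key in {spec!r}")
            tokens.append(spec[i : end + 1])
            i = end + 1
        else:
            # Single character, which may itself be "+" or "-".
            tokens.append(spec[i])
            i += 1
        if i < len(spec):
            # Anything after a key must be a "+" followed by another key.
            if spec[i] != "+" or i + 1 == len(spec):
                raise ValueError(f"expected '+' between keys in {spec!r}")
            i += 1
    if not tokens:
        raise ValueError("empty key specification")
    return tokens


if __name__ == "__main__":
    for example in ["<scroll_lock>", "<ctrl>+<shift>+r", "<ctrl>+-"]:
        print(example, parse_key_combo(example))
```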
Troubleshooting
Common Issues
- "Already running" error: Only one instance is allowed. Check the system tray or use pkill whisper-to-me
- Permission errors: May need permissions for global key capture and microphone access
- Audio issues: Confirm the microphone is detected with --list-devices
- CUDA errors: Install CUDA drivers or use --device cpu
- Trigger key not working: Try a different key, such as --key "<caps_lock>"
Debug Mode
Use --debug to save recorded audio files for troubleshooting:
whisper-to-me --debug
System Requirements Check
# Check audio devices
whisper-to-me --list-devices
# Test with smaller model
whisper-to-me --model tiny --device cpu
Uninstallation
# If installed with pip
pip uninstall whisper-to-me
# If installed with uv tool
uv tool uninstall whisper-to-me
Development
Setup Development Environment
git clone https://github.com/marnunez/whisper-to-me.git
cd whisper-to-me
uv sync --all-extras --dev
Run Tests
uv run pytest
Code Quality
uv run ruff check
uv run ruff format
Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Add tests if applicable
- Ensure code quality (uv run ruff check && uv run pytest)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- FasterWhisper for fast speech recognition
- OpenAI Whisper for the underlying model
- pynput for cross-platform input control