

Whisper-to-Me

A real-time voice transcription tool that converts speech to text using FasterWhisper and types the result directly into any application via simulated keystrokes.

Features

  • Push-to-talk and tap-to-start recording modes with configurable hotkeys
  • Local speech recognition (no internet required)
  • Global hotkey support across all applications
  • Multiple language support with auto-detection
  • Multiple audio device support
  • System tray integration with visual recording indicator
  • Single-instance protection (prevents multiple copies from running at once)
  • Recording discard option in tap mode (press Esc to cancel)
  • Debug mode for troubleshooting
  • High-accuracy transcription using FasterWhisper
  • Real-time performance optimized for responsiveness

Requirements

  • Python 3.12+
  • CUDA-capable GPU (optional, CPU mode available)
  • Audio input device (microphone)
  • Linux operating system

Installation

From PyPI (Recommended)

# Install using pip
pip install whisper-to-me

# Or using uv (faster)
uv tool install whisper-to-me

From Source

  1. Install system dependencies:
# Ubuntu/Debian
sudo apt install portaudio19-dev libsndfile1-dev

# Fedora
sudo dnf install portaudio-devel libsndfile-devel

# Arch Linux
sudo pacman -S portaudio libsndfile
  2. Clone and install:
git clone https://github.com/marnunez/whisper-to-me.git
cd whisper-to-me
uv tool install .

Usage

Basic Usage

Simply run the command after installation:

whisper-to-me

The application will:

  1. Load the Whisper model (first run may take a moment)
  2. Show a system tray icon (microphone)
  3. Listen for the trigger key (Scroll Lock by default)

Push-to-talk mode (default):

  4. Press and hold the trigger key to record
  5. Release to transcribe and type the text

Tap mode (--tap-mode):

  4. Tap the trigger key to start recording
  5. Tap again to stop and transcribe, or press Esc to discard

Command Line Options

whisper-to-me [options]

Options:
  --model MODEL          Whisper model size (tiny, base, small, medium, large-v3)
  --device DEVICE        Processing device (cpu, cuda)
  --key KEY              Trigger key (single key or combination, e.g., <scroll_lock>, <ctrl>+<shift>+r)
  --language LANG        Target language (auto, en, es, fr, etc.)
  --list-devices         List available audio input devices
  --audio-device ID      Audio device ID to use
  --debug                Save recorded audio files for debugging
  --no-tray              Disable system tray icon
  --tap-mode             Use tap-to-start/tap-to-stop instead of push-to-talk
  --discard-key KEY      Key to discard recording in tap mode (default: <esc>)
  --config-path          Show the configuration file location
  --list-profiles        List available configuration profiles
  --profile NAME         Use a specific configuration profile
  --create-profile NAME  Create a new profile from the current settings
  --help                 Show help message
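As a rough illustration of how command line options can sit on top of saved settings, the following argparse sketch leaves every option at None unless it is passed, so only flags the user actually typed override the config file. The parser, the cli_overrides helper and the option-to-setting mapping are assumptions for illustration (profile-related options are omitted); the tool's real parser may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options (not the tool's own code)."""
    p = argparse.ArgumentParser(prog="whisper-to-me")
    # default=None everywhere, so "not given" can be told apart from a real value.
    p.add_argument("--model", choices=["tiny", "base", "small", "medium", "large-v3"], default=None)
    p.add_argument("--device", choices=["cpu", "cuda"], default=None)
    p.add_argument("--key", default=None)
    p.add_argument("--language", default=None)
    p.add_argument("--audio-device", default=None)
    p.add_argument("--discard-key", default=None)
    p.add_argument("--list-devices", action="store_true")
    for flag in ("--debug", "--no-tray", "--tap-mode"):
        p.add_argument(flag, action="store_true", default=None)
    return p


def cli_overrides(args: argparse.Namespace) -> dict:
    """Collect only the options that were passed, grouped by config section."""
    mapping = {
        "model": ("general", "model"),
        "device": ("general", "device"),
        "language": ("general", "language"),
        "debug": ("general", "debug"),
        "key": ("recording", "trigger_key"),
        "discard_key": ("recording", "discard_key"),
        "audio_device": ("recording", "audio_device"),
    }
    overrides: dict[str, dict] = {}
    for attr, (section, key) in mapping.items():
        value = getattr(args, attr)
        if value is not None:
            overrides.setdefault(section, {})[key] = value
    if args.tap_mode:
        overrides.setdefault("recording", {})["mode"] = "tap-mode"
    if args.no_tray:
        overrides.setdefault("ui", {})["use_tray"] = False
    return overrides
```

For example, --model base --device cpu --key "<caps_lock>" yields overrides for general.model, general.device and recording.trigger_key only; every other setting keeps its configured value.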

Examples

# Use default settings (large-v3 model, CUDA, scroll lock key, auto language)
whisper-to-me

# Use smaller model on CPU with caps lock trigger
whisper-to-me --model base --device cpu --key "<caps_lock>"

# Use key combination as trigger (Ctrl+Shift+R)
whisper-to-me --key "<ctrl>+<shift>+r"

# Use Ctrl+- (minus) as trigger
whisper-to-me --key "<ctrl>+-"

# Spanish transcription with debug mode, recording from audio device 2
whisper-to-me --language es --debug --audio-device 2

# Run without system tray (terminal only)
whisper-to-me --no-tray

# List available audio devices
whisper-to-me --list-devices

# Use tap-to-start/tap-to-stop mode
whisper-to-me --tap-mode

# Tap mode with delete key to discard recordings
whisper-to-me --tap-mode --discard-key "<delete>"

Configuration

Whisper-to-Me supports persistent configuration through a TOML config file and multiple profiles for different use cases.

Configuration File

Location: ~/.config/whisper-to-me/config.toml

View the config file location:

whisper-to-me --config-path

Configuration Sections

General Settings ([general])

  • model: Whisper model size

    • Options: "tiny", "base", "small", "medium", "large-v3" (default)
    • Affects: Transcription accuracy vs speed trade-off
  • device: Processing device

    • Options: "cpu", "cuda" (default)
    • Affects: Transcription speed (GPU acceleration)
  • language: Target language

    • Options: "auto" (default), "en", "es", "fr", etc.
    • Affects: Transcription accuracy for specific languages
  • debug: Debug mode

    • Options: true, false (default)
    • Affects: Saves audio files for troubleshooting

Recording Settings ([recording])

  • mode: Recording mode

    • Options: "push-to-talk" (default), "tap-mode"
    • Affects: How recording is triggered
  • trigger_key: Key combination to trigger recording

    • Default: "<scroll_lock>"
    • Examples: "<caps_lock>", "<ctrl>+<shift>+r", "<alt>+<space>"
  • discard_key: Key to discard recording in tap mode

    • Default: "<esc>"
    • Examples: single keys such as "<delete>" or "<backspace>"
  • audio_device: Audio input device ID

    • Default: "" (system default)
    • Use --list-devices to see available devices

UI Settings ([ui])

  • use_tray: System tray integration

    • Options: true (default), false
    • Affects: Shows microphone icon in system tray

Advanced Settings ([advanced])

  • sample_rate: Audio sample rate

    • Default: 16000 Hz
    • Affects: Audio quality and processing speed; Whisper models operate on 16 kHz audio, so the default matches what the model expects
  • chunk_size: Audio processing chunk size

    • Default: 512
    • Affects: Real-time processing performance
  • vad_filter: Voice Activity Detection filter

    • Default: true
    • Affects: Filtering of silence and other non-speech audio
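The sample_rate and chunk_size defaults translate directly into timing. Assuming chunk_size counts audio frames per buffer read (the usual meaning in PortAudio-based recording code, but not confirmed for this tool), the defaults give 512 × 1000 / 16000 = 32 ms of audio per chunk. The helper names below are hypothetical.

```python
import math

WHISPER_SAMPLE_RATE = 16_000  # Whisper models operate on 16 kHz audio


def chunk_duration_ms(chunk_size: int, sample_rate: int = WHISPER_SAMPLE_RATE) -> float:
    """Milliseconds of audio in one chunk, assuming chunk_size counts frames."""
    return chunk_size * 1000 / sample_rate


def chunks_for(seconds: float, chunk_size: int, sample_rate: int = WHISPER_SAMPLE_RATE) -> int:
    """Number of chunk reads needed to capture `seconds` of audio."""
    return math.ceil(seconds * sample_rate / chunk_size)
```

Smaller chunks let the recorder react to a key release sooner but mean more buffer reads per second; at the defaults, a 5-second utterance takes 157 reads (80000 / 512 = 156.25, rounded up).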

Configuration Profiles

Create and manage multiple configuration profiles for different use cases:

Profile Management

# List available profiles
whisper-to-me --list-profiles

# Use specific profile
whisper-to-me --profile work

# Create new profile from current settings
whisper-to-me --model tiny --device cpu --create-profile quick

Example Profile Configuration

[general]
model = "large-v3"
device = "cuda"
language = "auto"
debug = false
last_profile = "default"

[recording]
mode = "push-to-talk"
trigger_key = "<scroll_lock>"
discard_key = "<esc>"
audio_device = ""

[ui]
use_tray = true

[advanced]
sample_rate = 16000
chunk_size = 512
vad_filter = true

# Work profile - English only, medium model, caps lock trigger
[profiles.work]
[profiles.work.general]
language = "en"
model = "medium"
[profiles.work.recording]
trigger_key = "<caps_lock>"

# Spanish profile - Spanish language, large model
[profiles.spanish]
[profiles.spanish.general]
language = "es"
model = "large-v3"

# Quick profile - Fast transcription, CPU only
[profiles.quick]
[profiles.quick.general]
model = "tiny"
device = "cpu"
[profiles.quick.recording]
mode = "tap-mode"

Configuration Priority

Settings are applied in this order (highest to lowest priority):

  1. Command line arguments
  2. Profile settings
  3. Base configuration file
  4. Default values

System Tray

The system tray icon shows:

  • Gray microphone: Ready to record
  • Red microphone: Currently recording
  • Right-click menu: View status and quit

How It Works

  1. Single Instance Protection: Ensures only one instance runs at a time
  2. Global Hotkey Detection: Monitors for configured trigger key across all applications
  3. Audio Recording: Captures microphone input while the trigger key is held (push-to-talk) or between taps (tap mode)
  4. Speech Processing: Uses FasterWhisper for local speech-to-text conversion
  5. Keystroke Simulation: Types the transcribed text directly into the active application
  6. System Integration: Shows status in system tray with visual feedback
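The README does not say how single-instance protection is implemented. One common Linux approach, sketched here as an assumption, is an exclusive non-blocking flock on a lock file. The kernel releases the lock automatically when the process exits, so a crash never leaves a stale lock behind. The function name and the lock file location are hypothetical.

```python
import fcntl
import os
from pathlib import Path


def acquire_single_instance_lock(lock_path: Path):
    """Return an open, locked file object, or None if another instance holds the lock.

    The caller must keep the returned object alive: the lock is released when it
    is closed or when the process exits.
    """
    lock_path.parent.mkdir(parents=True, exist_ok=True)
    handle = open(lock_path, "a+")  # "a+" creates the file without truncating it
    try:
        # Exclusive and non-blocking: fails immediately if someone else holds it.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    # We own the lock now, so it is safe to replace the contents with our PID.
    handle.seek(0)
    handle.truncate()
    handle.write(str(os.getpid()))
    handle.flush()
    return handle
```

A caller would print an "Already running" message and exit when the function returns None, and otherwise keep the returned file object open for the lifetime of the process.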

Performance Notes

  • First Run: May take longer while the Whisper model downloads; the download grows with model size, up to roughly 3 GB for the default large-v3
  • GPU Acceleration: CUDA significantly improves transcription speed
  • Model Sizes (approximate parameter counts):
    • tiny: Fastest, least accurate (~39M parameters)
    • base: Good balance (~74M parameters)
    • small: Better accuracy (~244M parameters)
    • medium: High accuracy (~769M parameters)
    • large-v3: Best accuracy (~1550M parameters, default)
  • Audio Quality: Better microphone input improves transcription accuracy

Key Combinations

You can use key combinations as trigger keys:

# Single keys
whisper-to-me --key "<scroll_lock>"
whisper-to-me --key "<caps_lock>"
whisper-to-me --key "a"           # Single character

# Key combinations  
whisper-to-me --key "<ctrl>+<shift>+r"
whisper-to-me --key "<alt>+<space>"
whisper-to-me --key "<ctrl>+-"    # Ctrl + minus
whisper-to-me --key "<shift>+1"   # Shift + 1

Trigger keys use the pynput hotkey format:

  • Named keys: Wrap in angle brackets <ctrl>, <alt>, <shift>, <esc>, <tab>, etc.
  • Single characters: Use directly a, 1, -, +, etc.
  • Combinations: Join with + symbol
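To see how a combination string splits into individual keys, here is a standalone sketch of the splitting rule. To our understanding it mirrors pynput's keyboard.HotKey.parse, which treats a + at the start of a part as the literal plus key. split_hotkey and describe are hypothetical helpers; in real code, call pynput.keyboard.HotKey.parse directly.

```python
def split_hotkey(spec: str) -> list[str]:
    """Split a hotkey string such as "<ctrl>+<shift>+r" into its parts.

    A "+" separates parts, except when it starts a part, where it means the
    literal plus key (so "<ctrl>++" is Ctrl + plus).
    """
    parts: list[str] = []
    start = 0
    for i, ch in enumerate(spec):
        if ch == "+" and i != start:
            parts.append(spec[start:i])
            start = i + 1
    if start == len(spec):
        raise ValueError(f"hotkey has an empty part: {spec!r}")
    parts.append(spec[start:])
    return parts


def describe(part: str) -> str:
    """Label a part as a named key (<...>) or a single character."""
    if part.startswith("<") and part.endswith(">") and len(part) > 2:
        return f"named key {part[1:-1]!r}"
    if len(part) == 1:
        return f"character {part!r}"
    raise ValueError(f"not a valid key: {part!r}")
```

For example, "<ctrl>+-" splits into "<ctrl>" and "-", which is why Ctrl + minus needs no special escaping.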

Troubleshooting

Common Issues

  1. "Already running" error: Only one instance is allowed; check the system tray for the running instance, or stop it with pkill whisper-to-me
  2. Permission errors: May need permissions for global key capture and microphone access
  3. Audio issues: Check microphone permissions, and run --list-devices to confirm the microphone is detected (select it with --audio-device)
  4. CUDA errors: Install CUDA drivers or use --device cpu
  5. Trigger key not working: Try different keys like --key "<caps_lock>"

Debug Mode

Use --debug to save recorded audio files for troubleshooting:

whisper-to-me --debug

System Requirements Check

# Check audio devices
whisper-to-me --list-devices

# Test with smaller model
whisper-to-me --model tiny --device cpu

Uninstallation

# If installed with pip
pip uninstall whisper-to-me

# If installed with uv tool
uv tool uninstall whisper-to-me

Development

Setup Development Environment

git clone https://github.com/marnunez/whisper-to-me.git
cd whisper-to-me
uv sync --all-extras --dev

Run Tests

uv run pytest

Code Quality

uv run ruff check
uv run ruff format

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Add tests if applicable
  5. Ensure code quality (uv run ruff check && uv run pytest)
  6. Commit your changes (git commit -m 'Add amazing feature')
  7. Push to the branch (git push origin feature/amazing-feature)
  8. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

