Real-time voice transcription tool that converts speech to text and types it directly into any application

Whisper-to-Me

A real-time voice transcription tool that converts speech to text using FasterWhisper and types the result directly into any application via simulated keystrokes.

Features

Push-to-talk and tap-to-start recording modes with configurable hotkeys
Local speech recognition (no internet required)
Global hotkey support across all applications
Multiple language support with auto-detection
Multiple audio device support
System tray integration with visual recording indicator
Single instance protection - prevents multiple instances
Recording discard option in tap mode (press Esc to cancel)
Debug mode for troubleshooting
High-accuracy transcription using FasterWhisper
Real-time performance optimized for responsiveness

Requirements

Python 3.12+
CUDA-capable GPU (optional, CPU mode available)
Audio input device (microphone)
Linux operating system

Installation

From PyPI (Recommended)

# Install using pip
pip install whisper-to-me

# Or using uv (faster)
uv tool install whisper-to-me

From Source

Install system dependencies:

# Ubuntu/Debian
sudo apt install portaudio19-dev libsndfile1-dev

# Fedora
sudo dnf install portaudio-devel libsndfile-devel

# Arch Linux
sudo pacman -S portaudio libsndfile

Clone and install:

git clone https://github.com/marnunez/whisper-to-me.git
cd whisper-to-me
uv tool install .

Usage

Basic Usage

Simply run the command after installation:

whisper-to-me

The application will:

Load the Whisper model (first run may take a moment)
Show a system tray icon (microphone)
Listen for the trigger key (Scroll Lock by default)

Push-to-talk mode (default):

Press and hold the trigger key to record
Release to transcribe and type the text

Tap mode (--tap-mode):

Tap the trigger key to start recording
Tap again to stop and transcribe, or press Esc to discard
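The two recording modes can be pictured as a small state machine. The sketch below is illustrative only: the `Recorder` class, its method names, and the returned action strings are hypothetical and not taken from the Whisper-to-Me source.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    RECORDING = auto()


class Recorder:
    """Hypothetical controller; each handler returns the action a real app would take."""

    def __init__(self, tap_mode: bool = False) -> None:
        self.tap_mode = tap_mode
        self.state = State.IDLE

    def on_trigger_press(self) -> str:
        if self.state is State.IDLE:
            self.state = State.RECORDING
            return "start"
        if self.tap_mode:
            # A second tap stops recording and hands the audio to transcription.
            self.state = State.IDLE
            return "transcribe"
        # Push-to-talk: repeated press events while the key is held are ignored.
        return "ignore"

    def on_trigger_release(self) -> str:
        # Only push-to-talk reacts to the key being released.
        if not self.tap_mode and self.state is State.RECORDING:
            self.state = State.IDLE
            return "transcribe"
        return "ignore"

    def on_discard_press(self) -> str:
        # The discard key only has an effect while recording in tap mode.
        if self.tap_mode and self.state is State.RECORDING:
            self.state = State.IDLE
            return "discard"
        return "ignore"


tap = Recorder(tap_mode=True)
print(tap.on_trigger_press(), tap.on_discard_press())
```

In this model the only difference between the modes is which event ends a recording: key release in push-to-talk, a second tap (or the discard key) in tap mode.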

Command Line Options

whisper-to-me [options]

Options:
  --model MODEL        Whisper model size (tiny, base, small, medium, large-v3)
  --device DEVICE      Processing device (cpu, cuda)
  --key KEY            Trigger key (single key or combination, e.g., <scroll_lock>, <ctrl>+<shift>+r)
  --language LANG      Target language (auto, en, es, fr, etc.)
  --list-devices       List available audio input devices
  --audio-device ID    Audio device ID to use
  --debug              Save recorded audio files for debugging
  --no-tray            Disable system tray icon
  --tap-mode           Use tap-to-start/tap-to-stop instead of push-to-talk
  --discard-key KEY    Key to discard recording in tap mode (default: esc)
  --help               Show help message
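For readers who want to script around these flags, here is a rough `argparse` sketch that mirrors the documented options. It is not the project's actual parser: `build_parser` is a hypothetical name, the integer type for `--audio-device` is an assumption, and the defaults are the ones stated in the Examples and Configuration sections.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags (not the real one)."""
    parser = argparse.ArgumentParser(prog="whisper-to-me")
    parser.add_argument("--model", default="large-v3",
                        choices=["tiny", "base", "small", "medium", "large-v3"])
    parser.add_argument("--device", default="cuda", choices=["cpu", "cuda"])
    parser.add_argument("--key", default="<scroll_lock>")
    parser.add_argument("--language", default="auto")
    parser.add_argument("--list-devices", action="store_true")
    # Assumption: device IDs are integers, as in "--audio-device 2".
    parser.add_argument("--audio-device", type=int, default=None)
    parser.add_argument("--debug", action="store_true")
    parser.add_argument("--no-tray", action="store_true")
    parser.add_argument("--tap-mode", action="store_true")
    parser.add_argument("--discard-key", default="<esc>")
    return parser


args = build_parser().parse_args(["--model", "base", "--device", "cpu", "--tap-mode"])
print(args.model, args.device, args.tap_mode)
```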

Examples

# Use default settings (large-v3 model, CUDA, scroll lock key, auto language)
whisper-to-me

# Use smaller model on CPU with caps lock trigger
whisper-to-me --model base --device cpu --key "<caps_lock>"

# Use key combination as trigger (Ctrl+Shift+R)
whisper-to-me --key "<ctrl>+<shift>+r"

# Use Ctrl+- (minus) as trigger
whisper-to-me --key "<ctrl>+-"

# Spanish transcription with debug mode
whisper-to-me --language es --debug --audio-device 2

# Run without system tray (terminal only)
whisper-to-me --no-tray

# List available audio devices
whisper-to-me --list-devices

# Use tap-to-start/tap-to-stop mode
whisper-to-me --tap-mode

# Tap mode with delete key to discard recordings
whisper-to-me --tap-mode --discard-key "<delete>"

Configuration

Whisper-to-Me supports persistent configuration through a TOML config file and multiple profiles for different use cases.

Configuration File

Location: ~/.config/whisper-to-me/config.toml

View the config file location:

whisper-to-me --config-path

Configuration Sections

General Settings (`[general]`)

model: Whisper model size
- Options: "tiny", "base", "small", "medium", "large-v3" (default)
- Affects: Transcription accuracy vs speed trade-off
device: Processing device
- Options: "cpu", "cuda" (default)
- Affects: Transcription speed (GPU acceleration)
language: Target language
- Options: "auto" (default), "en", "es", "fr", etc.
- Affects: Transcription accuracy for specific languages
debug: Debug mode
- Options: true, false (default)
- Affects: Saves audio files for troubleshooting

Recording Settings (`[recording]`)

mode: Recording mode
- Options: "push-to-talk" (default), "tap-mode"
- Affects: How recording is triggered
trigger_key: Key combination to trigger recording
- Default: "<scroll_lock>"
- Examples: "<caps_lock>", "<ctrl>+<shift>+r", "<alt>+<space>"
discard_key: Key to discard recording in tap mode
- Default: "<esc>"
- Options: Single keys like "<delete>", "<backspace>"
audio_device: Audio input device ID
- Default: "" (system default)
- Use --list-devices to see available devices

UI Settings (`[ui]`)

use_tray: System tray integration
- Options: true (default), false
- Affects: Shows microphone icon in system tray

Advanced Settings (`[advanced]`)

sample_rate: Audio sample rate
- Default: 16000 Hz
- Affects: Audio quality and processing speed
chunk_size: Audio processing chunk size
- Default: 512
- Affects: Real-time processing performance
vad_filter: Voice Activity Detection filter
- Default: true
- Affects: Noise filtering during recording
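To get a feel for the default sample_rate and chunk_size values, the arithmetic below works out how much audio one chunk holds. The 16-bit mono sample format is an assumption made for the byte-rate estimate; the document does not state the sample format.

```python
SAMPLE_RATE_HZ = 16_000   # default sample_rate
CHUNK_SIZE = 512          # default chunk_size, in samples
BYTES_PER_SAMPLE = 2      # assumption: 16-bit mono PCM

# Duration of a single chunk in milliseconds.
chunk_ms = CHUNK_SIZE * 1000 / SAMPLE_RATE_HZ

# How many chunks arrive per second of audio.
chunks_per_second = SAMPLE_RATE_HZ / CHUNK_SIZE

# Raw size of a 10-second recording under the 16-bit mono assumption.
ten_seconds_bytes = 10 * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE

print(f"{chunk_ms} ms per chunk, {chunks_per_second} chunks/s, "
      f"{ten_seconds_bytes} bytes per 10 s")
```

With the defaults, each chunk covers 32 ms of audio, so the recorder handles a little over 31 chunks per second.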

Configuration Profiles

Create and manage multiple configuration profiles for different use cases:

Profile Management

# List available profiles
whisper-to-me --list-profiles

# Use specific profile
whisper-to-me --profile work

# Create new profile from current settings
whisper-to-me --model tiny --device cpu --create-profile quick

Example Profile Configuration

[general]
model = "large-v3"
device = "cuda"
language = "auto"
debug = false
last_profile = "default"

[recording]
mode = "push-to-talk"
trigger_key = "<scroll_lock>"
discard_key = "<esc>"
audio_device = ""

[ui]
use_tray = true

[advanced]
sample_rate = 16000
chunk_size = 512
vad_filter = true

# Work profile - English only, medium model, caps lock trigger
[profiles.work]
[profiles.work.general]
language = "en"
model = "medium"
[profiles.work.recording]
trigger_key = "<caps_lock>"

# Spanish profile - Spanish language, large model
[profiles.spanish]
[profiles.spanish.general]
language = "es"
model = "large-v3"

# Quick profile - Fast transcription, CPU only
[profiles.quick]
[profiles.quick.general]
model = "tiny"
device = "cpu"
[profiles.quick.recording]
mode = "tap-mode"

Configuration Priority

Settings are applied in this order (highest to lowest priority):

Command line arguments
Profile settings
Base configuration file
Default values
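The priority order can be sketched with `collections.ChainMap`, which looks keys up in the first mapping that contains them. This is an illustration under simplifying assumptions (flat settings, `None` meaning "not given on the command line"); `resolve` and `DEFAULTS` are hypothetical names.

```python
from collections import ChainMap

DEFAULTS = {"model": "large-v3", "device": "cuda", "language": "auto"}


def resolve(cli: dict, profile: dict, base_file: dict) -> dict:
    """Combine the four layers; earlier layers win over later ones."""
    # Drop CLI options that were not given so they do not mask lower layers.
    cli_set = {k: v for k, v in cli.items() if v is not None}
    # ChainMap searches its maps left to right: CLI, profile, file, defaults.
    return dict(ChainMap(cli_set, profile, base_file, DEFAULTS))


settings = resolve(
    cli={"model": None, "device": "cpu"},
    profile={"model": "medium", "language": "en"},
    base_file={"model": "small"},
)
print(settings)
```

Here the profile's model overrides the one in the base file, the command line overrides the device, and nothing needs the built-in defaults because every key is set higher up.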

System Tray

The system tray icon shows:

Gray microphone: Ready to record
Red microphone: Currently recording
Right-click menu: View status and quit

How It Works

Single Instance Protection: Ensures only one instance runs at a time
Global Hotkey Detection: Monitors for configured trigger key across all applications
Audio Recording: Captures microphone input while the trigger key is held (push-to-talk) or between two taps (tap mode)
Speech Processing: Uses FasterWhisper for local speech-to-text conversion
Keystroke Simulation: Types the transcribed text directly into the active application
System Integration: Shows status in system tray with visual feedback

Performance Notes

First Run: May take longer while the Whisper model downloads (~1-3GB for the larger models)
GPU Acceleration: CUDA significantly improves transcription speed
Model Sizes (approximate parameter counts):
- tiny: Fastest, least accurate (~39M parameters)
- base: Good balance (~74M parameters)
- small: Better accuracy (~244M parameters)
- medium: High accuracy (~769M parameters)
- large-v3: Best accuracy (~1550M parameters, default)
Audio Quality: Better microphone input improves transcription accuracy

Key Combinations

You can use key combinations as trigger keys:

# Single keys
whisper-to-me --key "<scroll_lock>"
whisper-to-me --key "<caps_lock>"
whisper-to-me --key "a"           # Single character

# Key combinations  
whisper-to-me --key "<ctrl>+<shift>+r"
whisper-to-me --key "<alt>+<space>"
whisper-to-me --key "<ctrl>+-"    # Ctrl + minus
whisper-to-me --key "<shift>+1"   # Shift + 1

Trigger keys follow the pynput hotkey format:

Named keys: Wrap in angle brackets, e.g. <ctrl>, <alt>, <shift>, <esc>, <tab>
Single characters: Use as-is, e.g. a, 1, -, +
Combinations: Join keys with the + symbol
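To make the format rules concrete, here is a small standalone tokenizer for such strings. It is a hand-written illustration, not pynput's parser and not the project's code; `parse_hotkey` is a hypothetical name, and treating a `+` in key position as the literal plus key is an assumption of this sketch.

```python
def parse_hotkey(spec: str) -> list[str]:
    """Split a hotkey string such as "<ctrl>+<shift>+r" into key names."""
    tokens: list[str] = []
    i = 0
    expect_key = True  # alternate between a key and a "+" separator
    while i < len(spec):
        ch = spec[i]
        if expect_key:
            if ch == "<":
                # Named key: read up to the closing angle bracket.
                end = spec.find(">", i)
                if end == -1 or end == i + 1:
                    raise ValueError(f"bad named key in {spec!r}")
                tokens.append(spec[i + 1:end].lower())
                i = end + 1
            else:
                # Any other character in key position is a literal key,
                # which is how "-" and "+" can be used as keys.
                tokens.append(ch)
                i += 1
            expect_key = False
        else:
            if ch != "+":
                raise ValueError(f"expected '+' at position {i} in {spec!r}")
            i += 1
            expect_key = True
    if expect_key:
        raise ValueError(f"incomplete hotkey: {spec!r}")
    return tokens


print(parse_hotkey("<ctrl>+<shift>+r"))
print(parse_hotkey("<ctrl>+-"))
```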

Troubleshooting

Common Issues

"Already running" error: Only one instance allowed - check system tray or use pkill whisper-to-me
Permission errors: May need permissions for global key capture and microphone access
Audio issues: Check that your microphone appears in --list-devices and select it with --audio-device if needed
CUDA errors: Install CUDA drivers or use --device cpu
Trigger key not working: Try different keys like --key "<caps_lock>"

Debug Mode

Use --debug to save recorded audio files for troubleshooting:

whisper-to-me --debug

System Requirements Check

# Check audio devices
whisper-to-me --list-devices

# Test with smaller model
whisper-to-me --model tiny --device cpu

Uninstallation

# If installed with pip
pip uninstall whisper-to-me

# If installed with uv tool
uv tool uninstall whisper-to-me

Development

Setup Development Environment

git clone https://github.com/marnunez/whisper-to-me.git
cd whisper-to-me
uv sync --all-extras --dev

Run Tests

uv run pytest

Code Quality

uv run ruff check
uv run ruff format

Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Make your changes
Add tests if applicable
Ensure code quality (uv run ruff check && uv run pytest)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

FasterWhisper for fast speech recognition
OpenAI Whisper for the underlying model
pynput for cross-platform input control

Project details

Maintainers

nebelwerfer

Release history

0.5.0: Jul 22, 2025
0.4.0: Jul 22, 2025
0.3.0: Jul 21, 2025 (this version)
0.2.0: Jul 19, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

whisper_to_me-0.3.0.tar.gz (71.3 kB view details)

Uploaded Jul 21, 2025 Source

Built Distribution

whisper_to_me-0.3.0-py3-none-any.whl (12.2 kB view details)

Uploaded Jul 21, 2025 Python 3

File details

Details for the file whisper_to_me-0.3.0.tar.gz.

File metadata

Download URL: whisper_to_me-0.3.0.tar.gz
Upload date: Jul 21, 2025
Size: 71.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.8.0

File hashes

Hashes for whisper_to_me-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`b39ed822b1a662b2c492941e18bc30b3e94215685b5e8514886c85c1467ab924`
MD5	`b51a4f7f838ba222a09b9d9d02917595`
BLAKE2b-256	`0e0dcfe27880ed553fe6a2b1f4a6b8cf96deebc98bcaf6936b9756685e106039`

File details

Details for the file whisper_to_me-0.3.0-py3-none-any.whl.

File metadata

Download URL: whisper_to_me-0.3.0-py3-none-any.whl
Upload date: Jul 21, 2025
Size: 12.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.8.0

File hashes

Hashes for whisper_to_me-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5a1ea364e590bd2d300b05f12325a022d5250db2b3131a34c6bcfb4ec73f6292`
MD5	`b1a57fff5eb909cd3df82ca3a80b43e8`
BLAKE2b-256	`1d3cdf7a5c7a00ba38794ab9a1ccd951bbf5f6ea25aaffc0c040da95922eb8b6`
