HoldTranscribe
Hotkey-Activated Voice-to-Clipboard Transcriber
A lightweight tool that records audio while you hold a configurable hotkey, transcribes speech using OpenAI's Whisper model, and copies the result to your clipboard.
Features
- Hold-to-record using a customizable hotkey combination
- GPU acceleration with automatic CUDA detection and CPU fallback
- Instant copy of transcribed text to the clipboard
- Persistent model instance for low-latency transcription
- Configurable model size and beam search settings
- Detailed debug output and performance metrics
- Cross-platform support (Linux, macOS, Windows)
- Voice Activity Detection (VAD) for clean audio capture
- Auto-start service integration for all platforms
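To illustrate the Voice Activity Detection feature, here is a minimal sketch of how recorded audio could be cut into fixed-size frames and filtered before transcription. It is not the project's actual code: the names `split_frames`, `keep_speech`, and `naive_is_speech` are hypothetical, and the audio is assumed to be 16-bit mono PCM at 16 kHz, cut into 30 ms frames (a frame length webrtcvad accepts).

```python
from typing import Callable, Iterator

SAMPLE_RATE = 16000      # Hz; assumed capture rate (one of the rates webrtcvad accepts)
FRAME_MS = 30            # webrtcvad accepts 10, 20, or 30 ms frames
BYTES_PER_SAMPLE = 2     # 16-bit mono PCM


def split_frames(pcm: bytes, sample_rate: int = SAMPLE_RATE,
                 frame_ms: int = FRAME_MS) -> Iterator[bytes]:
    """Yield fixed-size frames, dropping a trailing partial frame."""
    frame_bytes = sample_rate * frame_ms // 1000 * BYTES_PER_SAMPLE
    for start in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        yield pcm[start:start + frame_bytes]


def keep_speech(pcm: bytes, is_speech: Callable[[bytes], bool]) -> bytes:
    """Concatenate only the frames the detector marks as speech."""
    return b"".join(frame for frame in split_frames(pcm) if is_speech(frame))


def naive_is_speech(frame: bytes) -> bool:
    """Hypothetical stand-in detector: any non-zero byte counts as speech."""
    return any(frame)
```

With webrtcvad installed, the stand-in detector could be swapped for `vad = webrtcvad.Vad(2)` and `lambda frame: vad.is_speech(frame, SAMPLE_RATE)`.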
Platform-Specific Requirements
Linux
- Python 3.8 or later
- Bash-compatible shell (for installer script)
- A CUDA-capable GPU (optional, for hardware acceleration)
- PulseAudio or equivalent audio system
- Permissions to read input events (user in the input group)
- X11 or Wayland desktop environment
macOS
- Python 3.8 or later
- macOS 10.14 (Mojave) or later
- Microphone access permissions
- Accessibility permissions for global hotkey monitoring
- Optional: CUDA-capable GPU (not available on Apple Silicon Macs; NVIDIA no longer supports CUDA on current macOS releases)
Windows
- Python 3.8 or later
- Windows 10 or later (Windows 11 recommended)
- Microphone access permissions
- Optional: CUDA-capable GPU with appropriate drivers
- PowerShell 5.0 or later (for service installation)
Installation
Option 1: Pip Installation (Recommended)
From GitHub (all platforms):
pip install git+https://github.com/binaryninja/holdtranscribe.git
From PyPI (when available):
pip install holdtranscribe
Option 2: Manual Installation
- Clone the repository:
git clone https://github.com/binaryninja/holdtranscribe.git
cd holdtranscribe
- Install Python dependencies:
pip install faster-whisper sounddevice pynput webrtcvad pyperclip notify2 numpy psutil
- Optional GPU acceleration:
Linux/Windows with CUDA:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
macOS with Metal Performance Shaders:
pip install torch torchvision torchaudio
Platform-Specific Setup
Linux Setup
- Add user to input group (if needed):
sudo usermod -aG input $USER
Log out and back in for changes to take effect.
- Install system dependencies (Ubuntu/Debian):
sudo apt update
sudo apt install python3-pip portaudio19-dev pulseaudio
- Install system dependencies (Fedora/RHEL):
sudo dnf install python3-pip portaudio-devel pulseaudio
macOS Setup
- Install dependencies via Homebrew:
# Install Homebrew if not already installed
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# Install PortAudio
brew install portaudio
- Grant permissions:
- Microphone Access: System Preferences → Security & Privacy → Privacy → Microphone → Enable for Terminal/your Python environment
- Accessibility Access: System Preferences → Security & Privacy → Privacy → Accessibility → Enable for Terminal/your Python environment
- Input Monitoring: System Preferences → Security & Privacy → Privacy → Input Monitoring → Enable for Terminal/your Python environment
- For Apple Silicon Macs:
# Install Python dependencies with conda for better compatibility
conda install python=3.9
pip install faster-whisper sounddevice pynput webrtcvad pyperclip notify2 numpy psutil
Windows Setup
- Install Python via the Microsoft Store or python.org:
- Download Python from python.org or install it from the Microsoft Store
- Ensure "Add Python to PATH" is checked during installation
- Install Visual C++ Build Tools (if compilation errors occur):
- Download and install Microsoft C++ Build Tools
- Or install Visual Studio Community with the C++ workload
- Grant microphone permissions:
- Settings → Privacy → Microphone → Allow apps to access microphone → Enable for Python/Terminal
Usage
Basic Usage (All Platforms)
# Run with default settings (if installed via pip)
holdtranscribe
# Or if using the script directly
python voice_hold_to_clip.py
Command Line Options
--model <size> Whisper model size (tiny, base, small, medium, large-v3). Default: large-v3
--beam-size <n> Beam search width (1 for fastest). Default: 5
--fast Shorthand for `--model base --beam-size 1`
--debug Enable verbose timing and resource metrics
--device <cpu|cuda> Force CPU or GPU mode
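As a rough illustration of how the options above fit together, the following sketch parses them with `argparse`. It is not taken from the project source: `build_parser` and `parse_options` are hypothetical names, and the rule that `--fast` overrides an explicit `--model` or `--beam-size` is an assumption.

```python
import argparse

MODEL_SIZES = ["tiny", "base", "small", "medium", "large-v3"]


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented flags with their documented defaults."""
    parser = argparse.ArgumentParser(prog="holdtranscribe")
    parser.add_argument("--model", choices=MODEL_SIZES, default="large-v3")
    parser.add_argument("--beam-size", type=int, default=5)
    parser.add_argument("--fast", action="store_true")
    parser.add_argument("--debug", action="store_true")
    parser.add_argument("--device", choices=["cpu", "cuda"], default=None)
    return parser


def parse_options(argv):
    """Parse argv and expand the --fast shorthand."""
    args = build_parser().parse_args(argv)
    if args.fast:
        # Assumption: --fast wins over any explicit --model/--beam-size.
        args.model, args.beam_size = "base", 1
    return args
```

For example, `parse_options(["--fast", "--debug"])` yields the `base` model with a beam size of 1 and debug output enabled.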
Platform-Specific Examples
Linux/macOS:
holdtranscribe --model tiny --beam-size 1
Windows (Command Prompt):
holdtranscribe --model tiny --beam-size 1
Windows (PowerShell):
holdtranscribe --model tiny --beam-size 1
Auto-Start Service Setup
Linux (systemd)
- Create service directory:
mkdir -p ~/.config/systemd/user
- Create service file:
cat > ~/.config/systemd/user/holdtranscribe.service << 'EOF'
[Unit]
Description=HoldTranscribe Voice Transcriber
After=graphical-session.target

[Service]
Type=simple
ExecStart=/usr/bin/holdtranscribe --model large-v3 --beam-size 1
Restart=always
RestartSec=5
Environment=DISPLAY=:0
Environment=XDG_RUNTIME_DIR=/run/user/%U
WorkingDirectory=%h

[Install]
WantedBy=default.target
EOF
- Enable and start:
systemctl --user daemon-reload
systemctl --user enable holdtranscribe.service
systemctl --user start holdtranscribe.service
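A pip installation does not always place the executable in /usr/bin (user installs often land in ~/.local/bin), so the ExecStart path may need adjusting. The sketch below renders a unit file around whatever path `shutil.which` finds. `render_unit` and `UNIT_TEMPLATE` are hypothetical names, and the Environment lines are omitted for brevity.

```python
import shutil

# Mirrors the service file above, with the executable path and flags as placeholders.
UNIT_TEMPLATE = """\
[Unit]
Description=HoldTranscribe Voice Transcriber
After=graphical-session.target

[Service]
Type=simple
ExecStart={exe} --model {model} --beam-size {beam_size}
Restart=always
RestartSec=5
WorkingDirectory=%h

[Install]
WantedBy=default.target
"""


def render_unit(exe=None, model="large-v3", beam_size=1) -> str:
    """Fill the template, locating the executable on PATH when none is given."""
    exe = exe or shutil.which("holdtranscribe") or "/usr/bin/holdtranscribe"
    return UNIT_TEMPLATE.format(exe=exe, model=model, beam_size=beam_size)
```

The result of `render_unit()` can be written to ~/.config/systemd/user/holdtranscribe.service before running the enable and start commands.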
macOS (launchd)
- Create launch agent directory:
mkdir -p ~/Library/LaunchAgents
- Create plist file:
cat > ~/Library/LaunchAgents/com.holdtranscribe.plist << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.holdtranscribe</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/holdtranscribe</string>
        <string>--model</string>
        <string>large-v3</string>
        <string>--beam-size</string>
        <string>1</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
</dict>
</plist>
EOF
- Load the service:
launchctl load ~/Library/LaunchAgents/com.holdtranscribe.plist
launchctl start com.holdtranscribe
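As an alternative to writing the XML by hand, the same launch agent can be generated with Python's standard `plistlib` module, which avoids quoting and nesting mistakes. This is only a sketch: `build_launch_agent` is a hypothetical helper, and the executable path is the one assumed in the plist above.

```python
import plistlib


def build_launch_agent(exe="/usr/local/bin/holdtranscribe",
                       model="large-v3", beam_size=1) -> bytes:
    """Return an XML property list equivalent to the hand-written plist."""
    agent = {
        "Label": "com.holdtranscribe",
        "ProgramArguments": [exe, "--model", model, "--beam-size", str(beam_size)],
        "RunAtLoad": True,
        "KeepAlive": True,
    }
    return plistlib.dumps(agent)  # XML format is the default
```

The returned bytes can be written to ~/Library/LaunchAgents/com.holdtranscribe.plist and loaded with `launchctl load` as shown above.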
Windows (Task Scheduler)
- Create batch file for easier management:
@echo off
holdtranscribe --model large-v3 --beam-size 1
Save as holdtranscribe.bat
- Using Task Scheduler GUI:
- Open Task Scheduler (taskschd.msc)
- Create Basic Task → Name: "HoldTranscribe"
- Trigger: When I log on
- Action: Start a program → Browse to your batch file
- Finish and test
- Using PowerShell (run as Administrator):
$action = New-ScheduledTaskAction -Execute "C:\path\to\holdtranscribe.bat"
$trigger = New-ScheduledTaskTrigger -AtLogon
$settings = New-ScheduledTaskSettingsSet -AllowStartIfOnBatteries -DontStopIfGoingOnBatteries
Register-ScheduledTask -TaskName "HoldTranscribe" -Action $action -Trigger $trigger -Settings $settings
Configuration
Hotkey Customization
Edit the HOTKEY set in the script to change key combinations:
# Default: Ctrl + Mouse Forward Button
HOTKEY = {keyboard.Key.ctrl, mouse.Button.button9}
# Alternative examples:
# HOTKEY = {keyboard.Key.ctrl, keyboard.Key.space} # Ctrl + Space
# HOTKEY = {keyboard.Key.alt, mouse.Button.left} # Alt + Left Click
# HOTKEY = {mouse.Button.button8} # Mouse Back Button only
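Treating the hotkey as a set suggests a simple way to implement hold-to-record: recording starts once every member of the set is held down and stops as soon as any member is released. The sketch below shows that logic in isolation. `HoldTracker` is a hypothetical class, not the project's implementation, and plain strings stand in for pynput key and button objects.

```python
class HoldTracker:
    """Fire callbacks when a key/button combination becomes fully held or is broken."""

    def __init__(self, hotkey, on_start, on_stop):
        self.hotkey = frozenset(hotkey)
        self.pressed = set()
        self.active = False
        self.on_start = on_start
        self.on_stop = on_stop

    def press(self, key):
        self.pressed.add(key)
        # Start only on the transition into "all hotkey members held".
        if not self.active and self.hotkey <= self.pressed:
            self.active = True
            self.on_start()

    def release(self, key):
        self.pressed.discard(key)
        # Stop as soon as any hotkey member is no longer held.
        if self.active and not self.hotkey <= self.pressed:
            self.active = False
            self.on_stop()


events = []
tracker = HoldTracker({"ctrl", "button9"},
                      on_start=lambda: events.append("start"),
                      on_stop=lambda: events.append("stop"))
for action, key in [("press", "ctrl"), ("press", "button9"),
                    ("release", "ctrl"), ("release", "button9")]:
    getattr(tracker, action)(key)
```

In a real listener, `press` and `release` would be passed to pynput's `keyboard.Listener` as `on_press` and `on_release`, and called from a `mouse.Listener` `on_click` handler according to its `pressed` argument.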
Platform-Specific Mouse Button Notes
- Windows: Button numbers may vary by mouse driver
- macOS: Some mouse buttons may require additional permissions
- Linux: Button numbers can be checked with the xev command
Environment Variables
- CUDA_VISIBLE_DEVICES - Control GPU usage
- TRANSFORMERS_CACHE - Customize model cache location
- DISABLE_NOTIFY=1 - Suppress desktop notifications
- PULSE_SERVER (Linux) - Specify PulseAudio server
- PORTAUDIO_DEVICE - Force specific audio device
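How the script reads these variables is not documented here, so the following is only a sketch of one plausible interpretation. `notifications_enabled` and `visible_gpus` are hypothetical helpers. The one firm fact used is that an empty CUDA_VISIBLE_DEVICES hides all GPUs from CUDA applications.

```python
import os


def notifications_enabled(env=None) -> bool:
    """Assumption: only the exact value "1" suppresses notifications."""
    env = os.environ if env is None else env
    return env.get("DISABLE_NOTIFY") != "1"


def visible_gpus(env=None):
    """Return None when the variable is unset (all GPUs visible),
    or the list of device IDs it names (an empty list hides every GPU)."""
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    return [part.strip() for part in raw.split(",") if part.strip()]
```

For example, launching with `CUDA_VISIBLE_DEVICES=""` makes `visible_gpus()` return an empty list, which corresponds to forcing CPU mode.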
Monitoring and Logs
Linux (systemd)
# View logs
journalctl --user -u holdtranscribe.service -f
# Check status
systemctl --user status holdtranscribe.service
macOS (launchd)
# View logs (launchd writes this file only if the plist sets StandardOutPath and StandardErrorPath to it)
tail -f ~/Library/Logs/com.holdtranscribe.log
# Check status
launchctl list | grep holdtranscribe
Windows (Task Scheduler)
- Task Scheduler → Task Scheduler Library → HoldTranscribe → History tab
- Or check Windows Event Viewer → Applications and Services Logs
Troubleshooting
Common Issues (All Platforms)
Model loading errors:
# Clear cache and retry
rm -rf ~/.cache/huggingface/transformers/
holdtranscribe --model tiny # Start with smaller model
Audio device issues:
# List available devices
python -c "import sounddevice as sd; print(sd.query_devices())"
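The full device list includes output-only devices, which cannot be used for recording. A small filter makes the usable microphones easier to spot. In this sketch, `input_devices` and `list_inputs` are hypothetical helpers. They rely on each entry returned by `sounddevice.query_devices()` being a dict with `name` and `max_input_channels` keys.

```python
def input_devices(devices):
    """Return (index, name) pairs for devices that can capture audio."""
    return [(index, dev["name"])
            for index, dev in enumerate(devices)
            if dev.get("max_input_channels", 0) > 0]


def list_inputs():
    """Query sounddevice if it is usable; otherwise return an empty list."""
    try:
        import sounddevice as sd  # raises OSError when PortAudio is missing
    except (ImportError, OSError):
        return []
    return input_devices(sd.query_devices())
```

An empty result from `list_inputs()` points either to a missing PortAudio installation or to a system with no capture devices.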
Linux-Specific Issues
Permission denied on input events:
sudo usermod -aG input $USER
# Log out and back in
Audio issues with PulseAudio:
# Restart PulseAudio
pulseaudio -k
pulseaudio --start
X11 forwarding issues:
export DISPLAY=:0
xhost +local:
macOS-Specific Issues
Accessibility permissions denied:
- System Preferences → Security & Privacy → Privacy → Accessibility
- Add Terminal or your Python executable
- May need to remove and re-add if issues persist
Microphone access denied:
- System Preferences → Security & Privacy → Privacy → Microphone
- Enable for Terminal/Python
"Operation not permitted" errors:
# Try running with sudo temporarily to identify permission issue
sudo holdtranscribe --debug
Python/PortAudio conflicts:
# Reinstall with Homebrew
brew uninstall portaudio
brew install portaudio
pip uninstall sounddevice
pip install sounddevice
Windows-Specific Issues
DLL load failures:
# Install Visual C++ Redistributable
# Download from Microsoft website
Microphone access denied:
- Settings → Privacy → Microphone → Allow apps to access microphone
- Ensure Python/Terminal is enabled
CUDA issues:
# Check CUDA installation
nvidia-smi
python -c "import torch; print(torch.cuda.is_available())"
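The two checks above can be combined into one report, together with the kind of CPU fallback the feature list describes. This is a sketch under assumptions: `pick_device` and `cuda_report` are hypothetical names, and the tool's actual detection logic may differ.

```python
import shutil


def pick_device(requested=None) -> str:
    """Honor an explicit choice; otherwise use CUDA only if PyTorch reports it."""
    if requested in ("cpu", "cuda"):
        return requested
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch installed: fall back to CPU
    return "cuda" if torch.cuda.is_available() else "cpu"


def cuda_report() -> dict:
    """Summarize whether the NVIDIA driver tool and a CUDA device are visible."""
    return {
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        "device": pick_device(),
    }
```

If `nvidia_smi_on_path` is true but `device` is `cpu`, the driver is present and the likely culprit is a CPU-only PyTorch build.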
PowerShell execution policy:
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
Antivirus blocking:
- Add Python executable to antivirus exclusions
- Add HoldTranscribe directory to exclusions
Performance Optimization
For slower systems:
# Use fastest settings (omit --fast here: it selects the base model and would conflict with --model tiny)
holdtranscribe --model tiny --beam-size 1
For better accuracy:
# Use larger model with more processing
holdtranscribe --model large-v3 --beam-size 5
Memory management:
# Monitor memory usage
holdtranscribe --debug
Contributing
Contributions, issues, and feature requests are welcome! Please:
- Fork the repository
- Create a feature branch
- Test on multiple platforms when possible
- Submit a pull request
When reporting issues, please include:
- Operating system and version
- Python version
- Full error message
- Steps to reproduce
License
This project is licensed under the MIT License. See LICENSE for details.
Acknowledgments
- OpenAI Whisper team for the excellent speech recognition model
- Contributors to the faster-whisper implementation
- All the open-source libraries that make this project possible