Hotkey-Activated Voice-to-Clipboard Transcriber

HoldTranscribe

A lightweight tool that records audio while you hold a configurable hotkey, transcribes speech using OpenAI's Whisper model, and copies the result to your clipboard.

Features

Hold-to-record using a customizable hotkey combination
GPU acceleration with automatic CUDA detection and CPU fallback
Instant copy of transcribed text to the clipboard
Persistent model instance for low-latency transcription
Configurable model size and beam search settings
Detailed debug output and performance metrics
Cross-platform support (Linux, macOS, Windows)
Voice Activity Detection (VAD) for clean audio capture
Auto-start service integration for all platforms
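The Voice Activity Detection feature depends on webrtcvad (listed among the dependencies below), which accepts only 16-bit mono PCM frames of 10, 20, or 30 ms. The sketch below shows one way a recording buffer could be cut into such frames. The 16 kHz rate, the 30 ms frame length, and the name `split_frames` are assumptions for illustration, not taken from the HoldTranscribe source.

```python
from typing import List

SAMPLE_RATE = 16_000   # assumed capture rate in Hz
FRAME_MS = 30          # webrtcvad accepts 10, 20, or 30 ms frames
BYTES_PER_SAMPLE = 2   # 16-bit PCM


def split_frames(pcm: bytes, sample_rate: int = SAMPLE_RATE,
                 frame_ms: int = FRAME_MS) -> List[bytes]:
    """Cut raw 16-bit mono PCM into fixed-size frames.

    A trailing partial frame is dropped, because the VAD rejects
    frames of any other length.
    """
    frame_bytes = sample_rate * frame_ms // 1000 * BYTES_PER_SAMPLE
    last_start = len(pcm) - frame_bytes
    return [pcm[i:i + frame_bytes] for i in range(0, last_start + 1, frame_bytes)]
```

Each frame could then be passed to `webrtcvad.Vad(mode).is_speech(frame, SAMPLE_RATE)` to decide whether it contains speech.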

Platform-Specific Requirements

Linux

Python 3.8 or later
Bash-compatible shell (for installer script)
A CUDA-capable GPU (optional, for hardware acceleration)
PulseAudio or equivalent audio system
Permissions to read input events (user in input group)
X11 or Wayland desktop environment

macOS

Python 3.8 or later
macOS 10.14 (Mojave) or later
Microphone access permissions
Accessibility permissions for global hotkey monitoring
Optional: GPU acceleration (CUDA requires an NVIDIA GPU, which Apple Silicon Macs do not have)

Windows

Python 3.8 or later
Windows 10 or later (Windows 11 recommended)
Microphone access permissions
Optional: CUDA-capable GPU with appropriate drivers
PowerShell 5.0 or later (for service installation)

Installation

Option 1: Pip Installation (Recommended)

From GitHub (all platforms):

pip install git+https://github.com/binaryninja/holdtranscribe.git

From PyPI (when available):

pip install holdtranscribe

Option 2: Manual Installation

Clone the repository:

git clone https://github.com/binaryninja/holdtranscribe.git
cd holdtranscribe

Install Python dependencies:

pip install faster-whisper sounddevice pynput webrtcvad pyperclip notify2 numpy psutil

Optional GPU acceleration:

Linux/Windows with CUDA:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

macOS with Metal Performance Shaders:

pip install torch torchvision torchaudio
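After installing PyTorch, it can help to confirm that Python actually sees a CUDA device. The following is a minimal sketch of CUDA detection with CPU fallback, as described in the feature list. The function names and the `int8`/`float16` compute types are assumptions, not the tool's confirmed behavior.

```python
from typing import Optional, Tuple


def cuda_available() -> bool:
    """Return True if PyTorch is installed and reports a usable CUDA device."""
    try:
        import torch  # optional dependency; absent on CPU-only installs
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


def pick_device(requested: Optional[str] = None,
                has_cuda: Optional[bool] = None) -> Tuple[str, str]:
    """Return (device, compute_type), falling back to CPU when CUDA is missing.

    `requested` mirrors the --device option; `has_cuda` can be injected for testing.
    """
    if has_cuda is None:
        has_cuda = cuda_available()
    if requested == "cpu" or not has_cuda:
        return "cpu", "int8"      # assumed CPU compute type
    return "cuda", "float16"      # assumed GPU compute type
```

In this sketch, asking for `cuda` on a machine without CUDA silently falls back to the CPU. A real tool might prefer to report an error instead.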

Platform-Specific Setup

Linux Setup

Add user to input group (if needed):
```
sudo usermod -aG input $USER
```
Log out and back in for changes to take effect.

Install system dependencies (Ubuntu/Debian):

sudo apt update
sudo apt install python3-pip portaudio19-dev pulseaudio

Install system dependencies (Fedora/RHEL):

sudo dnf install python3-pip portaudio-devel pulseaudio

macOS Setup

Install dependencies via Homebrew:

# Install Homebrew if not already installed
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Install PortAudio
brew install portaudio

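Grant permissions (on macOS 13 and later, System Preferences → Security & Privacy is named System Settings → Privacy & Security):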
- Microphone Access: System Preferences → Security & Privacy → Privacy → Microphone → Enable for Terminal/your Python environment
- Accessibility Access: System Preferences → Security & Privacy → Privacy → Accessibility → Enable for Terminal/your Python environment
- Input Monitoring: System Preferences → Security & Privacy → Privacy → Input Monitoring → Enable for Terminal/your Python environment

For Apple Silicon Macs:

# Install Python dependencies with conda for better compatibility
conda install python=3.9
pip install faster-whisper sounddevice pynput webrtcvad pyperclip notify2 numpy psutil

Windows Setup

Install Python:
- Download Python from python.org or install it via the Microsoft Store
- Ensure "Add Python to PATH" is checked during installation

Install Visual C++ Build Tools (if compilation errors occur):
- Download and install Microsoft C++ Build Tools
- Or install Visual Studio Community with the C++ workload

Grant microphone permissions:
- Settings → Privacy → Microphone → Allow apps to access microphone → Enable for Python/Terminal

Usage

Basic Usage (All Platforms)

# Run with default settings (if installed via pip)
holdtranscribe

# Or if using the script directly
python voice_hold_to_clip.py

Command Line Options

--model <size>       Whisper model size (tiny, base, small, medium, large-v3). Default: large-v3
--beam-size <n>      Beam search width (1 for fastest). Default: 5
--fast               Shorthand for `--model base --beam-size 1`
--debug              Enable verbose timing and resource metrics
--device <cpu|cuda>  Force CPU or GPU mode

Platform-Specific Examples

Linux/macOS:

holdtranscribe --model tiny --beam-size 1

Windows (Command Prompt):

holdtranscribe --model tiny --beam-size 1

Windows (PowerShell):

holdtranscribe --model tiny --beam-size 1

Auto-Start Service Setup

Linux (systemd)

Create service directory:
```
mkdir -p ~/.config/systemd/user
```

Create service file:

cat > ~/.config/systemd/user/holdtranscribe.service << 'EOF'
[Unit]
Description=HoldTranscribe Voice Transcriber
After=graphical-session.target

[Service]
Type=simple
ExecStart=/usr/bin/holdtranscribe --model large-v3 --beam-size 1
Restart=always
RestartSec=5
Environment=DISPLAY=:0
Environment=XDG_RUNTIME_DIR=/run/user/%U
WorkingDirectory=%h

[Install]
WantedBy=default.target
EOF

Enable and start:

systemctl --user daemon-reload
systemctl --user enable holdtranscribe.service
systemctl --user start holdtranscribe.service
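The unit file above hard-codes `/usr/bin/holdtranscribe`, but a pip installation may place the executable elsewhere (for example under `~/.local/bin`). As a hedged sketch, the unit text could be generated with the path resolved from `PATH`. The helper name `render_unit` is hypothetical, and the template is trimmed to the essential keys.

```python
import shlex
import shutil
from typing import List, Optional

UNIT_TEMPLATE = """\
[Unit]
Description=HoldTranscribe Voice Transcriber
After=graphical-session.target

[Service]
Type=simple
ExecStart={command}
Restart=always
RestartSec=5
WorkingDirectory=%h

[Install]
WantedBy=default.target
"""


def render_unit(args: List[str], executable: Optional[str] = None) -> str:
    """Return unit-file text whose ExecStart uses an absolute executable path."""
    exe = executable or shutil.which("holdtranscribe")
    if exe is None:
        raise FileNotFoundError("holdtranscribe is not on PATH")
    # shlex.join quotes arguments containing spaces; simple flags pass through unchanged.
    command = shlex.join([exe, *args])
    return UNIT_TEMPLATE.format(command=command)
```

The returned text could be written to `~/.config/systemd/user/holdtranscribe.service` before running the `systemctl --user` commands.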

macOS (launchd)

Create launch agent directory:
```
mkdir -p ~/Library/LaunchAgents
```

Create plist file:

cat > ~/Library/LaunchAgents/com.holdtranscribe.plist << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.holdtranscribe</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/bin/holdtranscribe</string>
        <string>--model</string>
        <string>large-v3</string>
        <string>--beam-size</string>
        <string>1</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
</dict>
</plist>
EOF

Load the service:

launchctl load ~/Library/LaunchAgents/com.holdtranscribe.plist
launchctl start com.holdtranscribe
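The plist above does not direct output anywhere, so the log file referenced under Monitoring and Logs would not be created by it. The sketch below builds an equivalent plist with the standard library's `plistlib` and adds `StandardOutPath` and `StandardErrorPath`. The function name and the choice of a single shared log file are assumptions.

```python
import plistlib
from pathlib import Path
from typing import List

# Matches the path used in the Monitoring and Logs section.
LOG_PATH = Path.home() / "Library" / "Logs" / "com.holdtranscribe.log"


def build_plist(program_args: List[str], log_path: Path = LOG_PATH) -> bytes:
    """Return XML plist bytes for a launch agent that logs to log_path."""
    job = {
        "Label": "com.holdtranscribe",
        "ProgramArguments": program_args,
        "RunAtLoad": True,
        "KeepAlive": True,
        "StandardOutPath": str(log_path),
        "StandardErrorPath": str(log_path),
    }
    return plistlib.dumps(job)
```

The bytes could be written to `~/Library/LaunchAgents/com.holdtranscribe.plist` and then loaded with the `launchctl` commands shown above.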

Windows (Task Scheduler)

Create batch file for easier management:
```
@echo off
holdtranscribe --model large-v3 --beam-size 1
```
Save the file as holdtranscribe.bat.

Using Task Scheduler GUI:
- Open Task Scheduler (taskschd.msc)
- Create Basic Task → Name: "HoldTranscribe"
- Trigger: When I log on
- Action: Start a program → Browse to your batch file
- Finish and test

Using PowerShell (run as Administrator):

$action = New-ScheduledTaskAction -Execute "C:\path\to\holdtranscribe.bat"
$trigger = New-ScheduledTaskTrigger -AtLogon
$settings = New-ScheduledTaskSettingsSet -AllowStartIfOnBatteries -DontStopIfGoingOnBatteries
Register-ScheduledTask -TaskName "HoldTranscribe" -Action $action -Trigger $trigger -Settings $settings

Configuration

Hotkey Customization

Edit the HOTKEY set in the script to change key combinations:

# Default: Ctrl + Mouse Forward Button
HOTKEY = {keyboard.Key.ctrl, mouse.Button.button9}

# Alternative examples:
# HOTKEY = {keyboard.Key.ctrl, keyboard.Key.space}  # Ctrl + Space
# HOTKEY = {keyboard.Key.alt, mouse.Button.left}    # Alt + Left Click
# HOTKEY = {mouse.Button.button8}                   # Mouse Back Button only

Platform-Specific Mouse Button Notes

Windows: Button numbers may vary by mouse driver
macOS: Some mouse buttons may require additional permissions
Linux: Button numbers can be checked with xev command

Environment Variables

CUDA_VISIBLE_DEVICES - Control GPU usage
TRANSFORMERS_CACHE - Customize model cache location
DISABLE_NOTIFY=1 - Suppress desktop notifications
PULSE_SERVER (Linux) - Specify PulseAudio server
PORTAUDIO_DEVICE - Force specific audio device
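These variables can be set in the shell or passed by a launcher. As a small sketch, a wrapper script might build the environment like this. The helper name `launch_env` is hypothetical, and how HoldTranscribe interprets each variable beyond the descriptions above is not confirmed here.

```python
from typing import Dict, Mapping, Optional


def launch_env(base: Mapping[str, str], gpu: Optional[int] = None,
               quiet: bool = False) -> Dict[str, str]:
    """Return a copy of base with the documented variables applied."""
    env = dict(base)
    if gpu is not None:
        # Restricts CUDA to a single device index.
        env["CUDA_VISIBLE_DEVICES"] = str(gpu)
    if quiet:
        # Documented above as suppressing desktop notifications.
        env["DISABLE_NOTIFY"] = "1"
    return env
```

A launcher could then call `subprocess.run(["holdtranscribe"], env=launch_env(os.environ, gpu=0, quiet=True))`.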

Monitoring and Logs

Linux (systemd)

# View logs
journalctl --user -u holdtranscribe.service -f

# Check status
systemctl --user status holdtranscribe.service

macOS (launchd)

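# View logs (this file exists only if the plist sets StandardOutPath and
# StandardErrorPath to it; the sample plist above sets neither)
tail -f ~/Library/Logs/com.holdtranscribe.log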

# Check status
launchctl list | grep holdtranscribe

Windows (Task Scheduler)

Task Scheduler → Task Scheduler Library → HoldTranscribe → History tab
Or check Windows Event Viewer → Applications and Services Logs

Troubleshooting

Common Issues (All Platforms)

Model loading errors:

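# Clear cached models and retry. faster-whisper downloads models through
# huggingface_hub, whose default cache is ~/.cache/huggingface/hub.
# Note: this removes every model cached there, not only Whisper models.
rm -rf ~/.cache/huggingface/hub/
holdtranscribe --model tiny  # Start with a smaller model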

Audio device issues:

# List available devices
python -c "import sounddevice as sd; print(sd.query_devices())"
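The full device list includes output-only devices. A short sketch for narrowing it to microphones follows. It relies on the `name` and `max_input_channels` keys that sounddevice reports for each device, and the helper name `input_devices` is hypothetical.

```python
from typing import Iterable, List, Mapping, Tuple


def input_devices(devices: Iterable[Mapping]) -> List[Tuple[int, str]]:
    """Return (index, name) for every device that can record audio."""
    return [
        (index, device["name"])
        for index, device in enumerate(devices)
        if device["max_input_channels"] > 0
    ]
```

With sounddevice installed, `input_devices(sd.query_devices())` lists the candidate indices after `import sounddevice as sd`.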

Linux-Specific Issues

Permission denied on input events:

sudo usermod -aG input $USER
# Log out and back in

Audio issues with PulseAudio:

# Restart PulseAudio
pulseaudio -k
pulseaudio --start

X11 forwarding issues:

export DISPLAY=:0
xhost +local:

macOS-Specific Issues

Accessibility permissions denied:

System Preferences → Security & Privacy → Privacy → Accessibility
Add Terminal or your Python executable
May need to remove and re-add if issues persist

Microphone access denied:

System Preferences → Security & Privacy → Privacy → Microphone
Enable for Terminal/Python

"Operation not permitted" errors:

# Try running with sudo temporarily to identify permission issue
sudo holdtranscribe --debug

Python/PortAudio conflicts:

# Reinstall with Homebrew
brew uninstall portaudio
brew install portaudio
pip uninstall sounddevice
pip install sounddevice

Windows-Specific Issues

DLL load failures:

# Install Visual C++ Redistributable
# Download from Microsoft website

Microphone access denied:

Settings → Privacy → Microphone → Allow apps to access microphone
Ensure Python/Terminal is enabled

CUDA issues:

# Check CUDA installation
nvidia-smi
python -c "import torch; print(torch.cuda.is_available())"

PowerShell execution policy:

Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser

Antivirus blocking:

Add Python executable to antivirus exclusions
Add HoldTranscribe directory to exclusions

Performance Optimization

For slower systems:

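# Use the fastest settings (--fast is shorthand for --model base --beam-size 1,
# so it is omitted here to avoid conflicting with --model tiny)
holdtranscribe --model tiny --beam-size 1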

For better accuracy:

# Use larger model with more processing
holdtranscribe --model large-v3 --beam-size 5

Memory management:

# Monitor memory usage
holdtranscribe --debug
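When comparing model sizes and beam widths, a convenient figure is the real-time factor: processing time divided by audio duration, where values below 1 mean transcription is faster than speech. The sketch below shows how it could be measured. Whether `--debug` reports this particular metric is not stated here, and both function names are assumptions.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def real_time_factor(processing_s: float, audio_s: float) -> float:
    """Processing time per second of audio; lower is faster."""
    if audio_s <= 0:
        raise ValueError("audio duration must be positive")
    return processing_s / audio_s
```

For example, transcribing a 2-second clip in 0.5 seconds gives `real_time_factor(0.5, 2.0) == 0.25`.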

Contributing

Contributions, issues, and feature requests are welcome! Please:

Fork the repository
Create a feature branch
Test on multiple platforms when possible
Submit a pull request

When reporting issues, please include:

Operating system and version
Python version
Full error message
Steps to reproduce

License

This project is licensed under the MIT License. See LICENSE for details.

Acknowledgments

OpenAI Whisper team for the excellent speech recognition model
Contributors to the faster-whisper implementation
All the open-source libraries that make this project possible

Release history

1.0.1 (this version): Jul 18, 2025
1.0.0: Jul 18, 2025

Download files

Source Distribution: holdtranscribe-1.0.1.tar.gz (26.0 kB), uploaded Jul 18, 2025
Built Distribution: holdtranscribe-1.0.1-py3-none-any.whl (13.8 kB), uploaded Jul 18, 2025, Python 3

File details

Details for the file holdtranscribe-1.0.1.tar.gz.

File metadata

Download URL: holdtranscribe-1.0.1.tar.gz
Upload date: Jul 18, 2025
Size: 26.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.2

File hashes

Hashes for holdtranscribe-1.0.1.tar.gz
Algorithm	Hash digest
SHA256	`5b1c63f5a9480c93a4648b344ede67fb5342b0e4c5c361a2431bd1b27e1ff865`
MD5	`2fe9738cbfb9e690a8505b72379c0d4a`
BLAKE2b-256	`25355f7c82aefff237db3f7c1c160541043918c2da5a9dc08ee53a2ffd981982`

File details

Details for the file holdtranscribe-1.0.1-py3-none-any.whl.

File metadata

Download URL: holdtranscribe-1.0.1-py3-none-any.whl
Upload date: Jul 18, 2025
Size: 13.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.2

File hashes

Hashes for holdtranscribe-1.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`af6c5b22a3af62efbfccd4ae28a8c12df774d593efda2d2581321590b0f9e6e2`
MD5	`ea35c8bd833e2098790998d5c3628bfa`
BLAKE2b-256	`f83a396824b62b27eb02c0e1f6acf16ddf7e9b213cf13be1f87cc9790b81e9c1`
