A speech-to-text application for Arch Linux + Hyprland
SpeechShift
A speech-to-text application for desktops running a Wayland compositor such as Hyprland.
It records audio when a hotkey is pressed, transcribes it using faster-whisper, and automatically types the transcribed text into the focused window.
System Requirements
We'll expand compatibility in the coming days.
- Display server: Wayland (for example, the Hyprland compositor)
- Python: 3.8+
- Package manager: UV
Installation
1. Automatic Installation (Recommended)
# Clone or download the project
git clone <repository-url> speechshift
cd speechshift
# Build the wheel, then install it together with its dependencies
uv build
pip install dist/speechshift*.whl --force-reinstall
Run the self-test to make sure PipeWire and wl-clipboard are present. It also downloads the Whisper model (small, ~80 MB) used for transcription.
speechshift --test
Add these lines to your ~/.config/hypr/hyprland.conf.
The recommended default is Super+Shift+R, but you can set it to anything you like:
# SpeechShift POC Keybinds
bind = SUPER_SHIFT, R, exec, /path/to/speechshift --toggle
Then set up the SpeechShift daemon to start automatically by adding this line to ~/.config/hypr/hyprland.conf:
exec-once = /path/to/speechshift --deamon
Then either restart your session so that the daemon starts automatically, or start the SpeechShift daemon manually for the current session by running:
speechshift --deamon
Usage
Basic Usage
- Start recording (Super+Shift+R): You'll see a notification: "🎤 Recording started..."
- Stop recording (Super+Shift+R again): The audio is automatically transcribed using faster-whisper, and the transcribed text is typed into the focused window. Notifications show: "🔄 Transcribing audio..." → "✅ Transcribed: [preview]"
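The toggle behaviour described above can be pictured as a two-state machine. The following is a minimal sketch, not SpeechShift's actual code: `ToggleRecorder` and its `start`, `stop_and_transcribe`, and `notify` callbacks are hypothetical names introduced for illustration, and the 40-character preview length is an assumption.

```python
from typing import Callable


class ToggleRecorder:
    """Two-state toggle: idle -> recording -> idle (hypothetical sketch)."""

    def __init__(
        self,
        start: Callable[[], None],
        stop_and_transcribe: Callable[[], str],
        notify: Callable[[str], None],
    ) -> None:
        self._start = start
        self._stop_and_transcribe = stop_and_transcribe
        self._notify = notify
        self.recording = False

    def toggle(self) -> None:
        if not self.recording:
            # First press: begin capturing audio.
            self._start()
            self.recording = True
            self._notify("🎤 Recording started...")
        else:
            # Second press: stop, transcribe, and report a short preview.
            self.recording = False
            self._notify("🔄 Transcribing audio...")
            text = self._stop_and_transcribe()
            self._notify(f"✅ Transcribed: {text[:40]}")
```

Calling `toggle()` twice on one instance walks through the full start/stop cycle; the callbacks keep the audio and notification backends swappable.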
How It Works
Architecture Overview
Keybind (Super+Shift+R)
↓
Main Python Script
├── PipeWire Audio Recording (sounddevice)
├── AI Transcription (faster-whisper)
├── Temporary File Management
├── Wayland Text Input (wl-clipboard + wtype)
├── Smart Notifications (notify-send)
└── Hyprland IPC (optional window detection)
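As an illustration of the transcription component, here is a minimal sketch of calling faster-whisper on a recorded file. The helper names `join_segments` and `transcribe` are hypothetical, and the `device="cpu"` and `compute_type="int8"` settings are assumptions chosen for machines without a GPU; SpeechShift's own settings may differ.

```python
from typing import Iterable, Protocol


class Segment(Protocol):
    """Anything with a .text attribute, such as a faster-whisper segment."""

    text: str


def join_segments(segments: Iterable[Segment]) -> str:
    """Concatenate non-empty segment texts into one line of text."""
    parts = (s.text.strip() for s in segments)
    return " ".join(p for p in parts if p)


def transcribe(wav_path: str, model_size: str = "base") -> str:
    """Transcribe a WAV file with faster-whisper (imported lazily)."""
    from faster_whisper import WhisperModel  # third-party dependency

    # Assumed settings: CPU inference with int8 quantization.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(wav_path)
    return join_segments(segments)
```

The lazy import keeps the module importable on systems where faster-whisper is not installed yet, which is convenient for a dependency self-test.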
Recording Workflow
- Keybind Press: Hyprland detects the Super+Shift+R press
- Recording Start:
  - Python script starts PipeWire audio capture
  - Notification: "🎤 Recording started..."
  - Audio streams to a temporary WAV file in /tmp
- Second Keybind Press: Hyprland detects Super+Shift+R again and runs --toggle a second time
- Recording Stop & Transcription:
  - Audio capture stops
  - Notification: "🔄 Transcribing audio..."
  - faster-whisper transcribes the audio
  - Transcribed text is inserted via wtype
  - Temporary file is automatically deleted
  - Success notification: "✅ Transcribed: [preview]"
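The temporary-file part of this workflow can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions: `write_temp_wav` and `process_and_cleanup` are hypothetical helpers, the `speechshift_` filename prefix is invented, and the format constants follow the Technical Details section.

```python
import os
import tempfile
import wave

SAMPLE_RATE = 44100  # Hz, per the Technical Details section
SAMPLE_WIDTH = 2     # bytes per sample (16-bit)
CHANNELS = 1         # mono


def write_temp_wav(pcm: bytes) -> str:
    """Write raw 16-bit PCM to a temporary WAV file and return its path."""
    # mkstemp uses the system temp directory, which is usually /tmp on Linux.
    fd, path = tempfile.mkstemp(prefix="speechshift_", suffix=".wav")
    os.close(fd)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    return path


def process_and_cleanup(path: str, transcribe) -> str:
    """Run transcribe(path) and delete the file even if transcription raises."""
    try:
        return transcribe(path)
    finally:
        if os.path.exists(path):
            os.remove(path)
```

Putting the deletion in a `finally` block means a failed transcription does not leave recordings behind in the temp directory.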
Technical Details
- Audio Format: 16-bit WAV, 44.1kHz, mono
- Transcription Model: faster-whisper "base" model (configurable)
- File Handling: Temporary files in /tmp, auto-cleanup after transcription
- Text Insertion: Direct typing via wtype, fallback to clipboard paste
- Notifications: Smart status updates via notify-send
- Error Handling: Graceful fallback with error notifications
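The "type first, paste as a fallback" strategy might look roughly like the sketch below. It is an assumption about how the fallback could be wired, not the project's confirmed implementation: `insert_text` and the injectable `run` callback are hypothetical, and pasting with Ctrl+V is assumed (some terminals expect Ctrl+Shift+V instead).

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def _run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code (127 if it is not installed)."""
    try:
        return subprocess.run(list(cmd), check=False).returncode
    except FileNotFoundError:
        return 127


def insert_text(text: str, run: Runner = _run) -> str:
    """Type text with wtype; fall back to the clipboard if typing fails."""
    if run(["wtype", text]) == 0:
        return "typed"
    # Fallback: copy to the Wayland clipboard, then send Ctrl+V via wtype.
    if run(["wl-copy", text]) == 0 and run(
        ["wtype", "-M", "ctrl", "v", "-m", "ctrl"]
    ) == 0:
        return "pasted"
    return "failed"
```

Returning a status string rather than raising lets the caller decide which notification to show for each outcome.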
Troubleshooting
Common Issues
- "sounddevice not available":
  - Install manually: pip install --user sounddevice numpy
- "Audio recording failed":
  - Check PipeWire is running: systemctl --user status pipewire
  - Test microphone: pw-record --list-targets
  - Verify permissions: ensure user is in the audio group
- "Hyprland socket not found":
  - Ensure running under Hyprland
  - Check environment variables: echo $HYPRLAND_INSTANCE_SIGNATURE
- "Text insertion not working":
  - Verify wtype is installed: wtype --version
  - Test manually: wtype "test"
  - Check focused window accepts text input
- "Notifications not showing":
  - Test manually: notify-send "test" "message"
Debug Mode
Detailed logs are written to ~/.speechshift.log. Follow them live with:
tail -f ~/.speechshift.log
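For readers who want to see how a script could write to such a log file, here is a minimal sketch using Python's standard logging module. The `setup_logging` helper, the logger name, and the message format are assumptions for illustration; the project's real logging configuration may differ.

```python
import logging
from pathlib import Path
from typing import Optional


def setup_logging(log_path: Optional[Path] = None) -> logging.Logger:
    """Return a logger that appends DEBUG-level messages to a log file."""
    path = log_path or Path.home() / ".speechshift.log"
    logger = logging.getLogger("speechshift")
    logger.setLevel(logging.DEBUG)
    # Attach the file handler only once, so repeated calls do not
    # duplicate every log line.
    if not logger.handlers:
        handler = logging.FileHandler(path, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

With this in place, a call such as `setup_logging().debug("recording started")` appends a timestamped line that `tail -f` picks up immediately.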
System Testing
Run comprehensive system tests:
python main.py --test
This checks:
- System dependencies availability
- Audio device detection
- Wayland tools functionality
- PipeWire integration
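A check of this kind can be approximated with a few lines of Python. The sketch below is not the project's test suite: `check_system` and the `REQUIRED_TOOLS` list are hypothetical, assembled from the tools named elsewhere in this README.

```python
import os
import shutil
from typing import Callable, Dict, Iterable, Mapping, Optional

# Assumed tool list, taken from the commands mentioned in this README.
REQUIRED_TOOLS = ("wtype", "wl-copy", "notify-send", "pw-record")


def check_system(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
    env: Mapping[str, str] = os.environ,
) -> Dict[str, bool]:
    """Report which command-line tools and session variables are present."""
    report = {tool: which(tool) is not None for tool in tools}
    report["wayland session"] = bool(env.get("WAYLAND_DISPLAY"))
    report["hyprland session"] = bool(env.get("HYPRLAND_INSTANCE_SIGNATURE"))
    return report


if __name__ == "__main__":
    for name, ok in check_system().items():
        print(f"{'OK  ' if ok else 'MISS'} {name}")
```

Passing `which` and `env` as parameters makes the check easy to exercise without touching the real system.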
License
This is a proof-of-concept implementation. Use and modify as needed for your projects.