
A speech-to-text application for Arch Linux + Hyprland


SpeechShift

A speech-to-text application for desktops running a Wayland compositor such as Hyprland.

Records audio when a hotkey is pressed, transcribes it using faster-whisper, and automatically types the transcribed text.

System Requirements

We'll expand compatibility in the coming days.

  • Display protocol: Wayland (a compositor such as Hyprland)
  • Python: 3.8+
  • Package manager: uv

Installation

1. Automatic Installation (Recommended)

# Clone or download the project
git clone <repository-url> speechshift
cd speechshift

# Build the wheel, then install it along with its dependencies
uv build
pip install dist/speechshift*.whl --force-reinstall

Run the test to make sure PipeWire and wl-clipboard are present. It also downloads the Whisper model (small, ~80 MB) used for transcription.

speechshift --test

The recommended default keybind is Super+Shift+R, but you can set it to anything you like.

Add these lines to your ~/.config/hypr/hyprland.conf:

# SpeechShift POC Keybinds
bind = SUPER_SHIFT, R, exec, /path/to/speechshift --toggle

Then set the SpeechShift daemon to start on login by adding this line to ~/.config/hypr/hyprland.conf:

exec-once = /path/to/speechshift --deamon

Then either restart so that the daemon starts automatically, or start it manually for the current session by running:

speechshift --deamon

Usage

Basic Usage

  1. Start recording (Super+Shift+R): You'll see a notification: "🎤 Recording started..."
  2. Stop recording (Super+Shift+R): The audio is automatically transcribed using faster-whisper, and the transcribed text is typed into the focused window. Notifications show: "🔄 Transcribing audio..." → "✅ Transcribed: [preview]"
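
The start/stop behaviour above amounts to a two-state toggle. The following is a minimal sketch of that idea, not SpeechShift's actual code: the class name `ToggleRecorder` and its `toggle` method are hypothetical, and only the notification strings come from this README.

```python
class ToggleRecorder:
    """Hypothetical toggle state: each call flips between idle and recording."""

    def __init__(self):
        self.recording = False

    def toggle(self):
        """Flip the state and return the notification text for the new state."""
        self.recording = not self.recording
        if self.recording:
            return "🎤 Recording started..."
        return "🔄 Transcribing audio..."
```

Pressing the keybind twice would map to two `toggle()` calls: the first starts recording, the second stops it and hands the audio over for transcription.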

How It Works

Architecture Overview

Keybind (Super+Shift+R)
    ↓
Main Python Script
    ├── PipeWire Audio Recording (sounddevice)
    ├── AI Transcription (faster-whisper)
    ├── Temporary File Management
    ├── Wayland Text Input (wl-clipboard + wtype)
    ├── Smart Notifications (notify-send)
    └── Hyprland IPC (optional window detection)
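
As an illustration of the transcription component, here is a hedged sketch of calling faster-whisper on a recorded file. The function names `join_segments` and `transcribe_file` are hypothetical, and `device="cpu"` with `compute_type="int8"` are assumptions for a CPU-only machine; SpeechShift's own settings may differ.

```python
def join_segments(segments):
    """Concatenate segment texts into one trimmed string."""
    return "".join(segment.text for segment in segments).strip()


def transcribe_file(path, model_size="base"):
    """Transcribe an audio file; requires the faster-whisper package."""
    # Imported lazily so join_segments works without the package installed.
    from faster_whisper import WhisperModel

    # device and compute_type are assumptions, not SpeechShift's settings.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    # transcribe() returns a lazy iterable of segments plus run metadata.
    segments, _info = model.transcribe(path)
    return join_segments(segments)
```

The segments are produced lazily, so the actual decoding happens while `join_segments` iterates over them.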

Recording Workflow

  1. Keybind Press: Hyprland detects Super+Shift+R press
  2. Recording Start:
    • Python script starts PipeWire audio capture
    • Notification: "🎤 Recording started..."
    • Audio streams to temporary WAV file in /tmp
  3. Second Keybind Press: Hyprland detects Super+Shift+R again and runs the toggle command
  4. Recording Stop & Transcription:
    • Audio capture stops
    • Notification: "🔄 Transcribing audio..."
    • faster-whisper transcribes the audio
    • Transcribed text inserted via wtype
    • Temporary file automatically deleted
    • Success notification: "✅ Transcribed: [preview]"
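
The stop-and-transcribe step can be sketched as follows. This is an illustration under stated assumptions, not the project's code: `process_recording` is a hypothetical name, the `transcribe`, `insert_text` and `notify` arguments are stand-ins for faster-whisper, wtype and notify-send, and the 40-character preview length is invented. For simplicity the sketch writes the audio in one go, whereas the workflow above streams it during recording.

```python
import os
import tempfile


def process_recording(audio_bytes, transcribe, insert_text, notify):
    """Save audio to a temp WAV, transcribe it, type the text, then clean up.

    transcribe, insert_text and notify are hypothetical callables standing in
    for faster-whisper, wtype and notify-send.
    """
    fd, path = tempfile.mkstemp(suffix=".wav", dir=tempfile.gettempdir())
    try:
        with os.fdopen(fd, "wb") as wav_file:
            wav_file.write(audio_bytes)
        notify("🔄 Transcribing audio...")
        text = transcribe(path)
        insert_text(text)
        notify(f"✅ Transcribed: {text[:40]}")
        return text
    finally:
        # Runs on success and on error, so the temp file never lingers.
        os.remove(path)
```

Putting the cleanup in `finally` means a failed transcription still removes the temporary file.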

Technical Details

  • Audio Format: 16-bit WAV, 44.1kHz, mono
  • Transcription Model: faster-whisper "base" model (configurable)
  • File Handling: Temporary files in /tmp, auto-cleanup after transcription
  • Text Insertion: Direct typing via wtype, fallback to clipboard paste
  • Notifications: Smart status updates via notify-send
  • Error Handling: Graceful fallback with error notifications
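
The wtype-first, clipboard-second behaviour listed above could look roughly like this. It is a sketch with a hypothetical function name, `insert_text`, and it only copies the text to the clipboard in the fallback case; whether SpeechShift also triggers the paste itself is not stated here.

```python
import subprocess


def insert_text(text, run=subprocess.run):
    """Type text with wtype; on failure, copy it to the clipboard instead.

    Returns "typed" or "clipboard". `run` is injectable for testing.
    """
    try:
        # Same invocation as the manual test in Troubleshooting: wtype "test"
        run(["wtype", text], check=True)
        return "typed"
    except (FileNotFoundError, subprocess.CalledProcessError):
        # Fallback: wl-copy reads the text from stdin when given no arguments.
        run(["wl-copy"], input=text, text=True, check=True)
        return "clipboard"
```

`FileNotFoundError` covers a missing wtype binary, and `CalledProcessError` covers wtype exiting with an error.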

Troubleshooting

Common Issues

  1. "sounddevice not available":

    • Install manually: pip install --user sounddevice numpy
    
  2. "Audio recording failed":

    • Check PipeWire is running: systemctl --user status pipewire
    • Test microphone: pw-record --list-targets
    • Verify permissions: ensure user is in audio group
  3. "Hyprland socket not found":

    • Ensure running under Hyprland
    • Check environment variables: echo $HYPRLAND_INSTANCE_SIGNATURE
  4. "Text insertion not working":

    • Verify wtype is installed: wtype --version
    • Test manually: wtype "test"
    • Check focused window accepts text input
  5. "Notifications not showing":

    • Test manually: notify-send "test" "message"

Debug Mode

Detailed logs are written to ~/.speechshift.log. Follow them live with:

tail -f ~/.speechshift.log

System Testing

Run comprehensive system tests:

speechshift --test

This checks:

  • System dependencies availability
  • Audio device detection
  • Wayland tools functionality
  • PipeWire integration
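
A rough idea of what such checks involve, written as a standalone sketch: the function `check_system` and the `REQUIRED_TOOLS` list are hypothetical, built from the tool names mentioned in this README, and the real `--test` may check more or different things.

```python
import os
import shutil

# Tool names are taken from this README; the real --test may check others.
REQUIRED_TOOLS = ("pw-record", "wl-copy", "wtype", "notify-send")


def check_system(tools=REQUIRED_TOOLS, which=shutil.which, env=os.environ):
    """Return a dict mapping each check name to True (ok) or False."""
    results = {tool: which(tool) is not None for tool in tools}
    # Hyprland sets this variable for processes started inside its session.
    results["hyprland"] = bool(env.get("HYPRLAND_INSTANCE_SIGNATURE"))
    return results
```

Calling `check_system()` with no arguments inspects the current PATH and environment; any entry that comes back `False` points to the matching item under Common Issues.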

License

This is a proof-of-concept implementation. Use and modify as needed for your projects.
