
A fully local, offline-first speech-to-text application made for Linux.


SpeechShift

A fully local, offline-first speech-to-text application for Linux desktops running a Wayland compositor such as Hyprland.

Records audio when a hotkey is pressed, transcribes it using faster-whisper, and automatically types the transcribed text.

Demo

Demo recorded on Omarchy running Hyprland.


Roadmap

  • Support for faster local transcription models such as NVIDIA Parakeet
  • Custom vocabulary support
  • Optional use of LLMs such as ChatGPT to auto-format text before pasting

System Requirements

We'll expand compatibility in the coming days.

  • Display server: a Wayland compositor (the setup below targets Hyprland)
  • Python: 3.8+
  • Package manager: uv

Installation

1. Automatic Installation (Recommended)

uv tool install speechshift

Run the self-test to check that PipeWire and wl-clipboard are present. It also downloads the Whisper model (small, ~80 MB) used for transcription.

speechshift --test
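If you want to check the external tools yourself before running the self-test, a minimal sketch is shown below. The list of binaries is an assumption based on the tools this README mentions (PipeWire, wl-clipboard, wtype, notify-send); it is not necessarily what `speechshift --test` checks.

```python
import shutil

# Assumed tool names, taken from the tools this README mentions; the exact
# binaries that speechshift itself looks for are not documented here.
REQUIRED_TOOLS = ("pw-record", "wl-copy", "wtype", "notify-send")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required tools found.")
```

Anything reported as missing can usually be installed through your distribution's package manager.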

Add the following keybind to your ~/.config/hypr/hyprland.conf. The recommended default is Super+Shift+R, but you can bind any key combination you like:

# SpeechShift POC Keybinds
bind = SUPER_SHIFT, R, exec, /path/to/speechshift --toggle

Then set up the SpeechShift daemon to start automatically by adding this line to ~/.config/hypr/hyprland.conf:

exec-once = /path/to/speechshift --deamon

Then either restart your session so that the daemon starts automatically, or start it manually for the current session by running:

speechshift --deamon

Usage

  1. Start recording (Super+Shift+R): You'll see a notification: "🎤 Recording started..."
  2. Stop recording (Super+Shift+R): Audio is automatically transcribed using faster-whisper or AssemblyAI. Transcribed text is typed into the focused window. Notifications show: "🔄 Transcribing audio..." → "✅ Transcribed: [preview]"
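The start/stop behaviour above amounts to a two-state toggle. The sketch below illustrates the idea only; `ToggleRecorder`, its callbacks, and the 40-character preview length are hypothetical and not taken from the SpeechShift source.

```python
class ToggleRecorder:
    """Hypothetical helper: flips between idle and recording on each toggle."""

    def __init__(self, start, stop_and_transcribe, notify):
        self._start = start                              # begins audio capture
        self._stop_and_transcribe = stop_and_transcribe  # returns the transcript
        self._notify = notify                            # shows a status message
        self.recording = False

    def toggle(self):
        """First call starts recording; second call stops and returns the text."""
        if not self.recording:
            self._start()
            self.recording = True
            self._notify("🎤 Recording started...")
            return None
        self.recording = False
        self._notify("🔄 Transcribing audio...")
        text = self._stop_and_transcribe()
        # Preview length of 40 characters is an assumption for illustration.
        self._notify(f"✅ Transcribed: {text[:40]}")
        return text
```

Passing the capture, transcription, and notification steps in as callables keeps the state logic independent of PipeWire or any particular transcription engine.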

Configuration

SpeechShift can be configured by creating a config.json file in ~/.config/speechshift/. If the file doesn't exist, it will be created with default settings upon first run.

Here's an example configuration to override the default whisper model and language:

{
  "transcription": {
    "engine": "whisper"
  },
  "whisper": {
    "model": "medium",
    "language": "en"
  },
  "audio": {
    "recording_device": null,
    "notification_timeout": 3000
  }
}

To use AssemblyAI instead, set the ASSEMBLYAI_API_KEY environment variable and set the transcription engine to assemblyai.

AssemblyAI is highly recommended for its better accuracy and speed. Note that it is a cloud service, so audio is sent over the network rather than transcribed locally.
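To illustrate how a user config.json could override defaults, here is a minimal sketch that merges the file over a default dictionary and checks the AssemblyAI key. The default values, `deep_merge`, and `load_config` are assumptions for illustration; SpeechShift's actual defaults and loader may differ.

```python
import copy
import json
import os
from pathlib import Path

# Assumed defaults, modelled on the example above (not the real defaults).
DEFAULT_CONFIG = {
    "transcription": {"engine": "whisper"},
    "whisper": {"model": "base", "language": "en"},
    "audio": {"recording_device": None, "notification_timeout": 3000},
}
CONFIG_PATH = Path.home() / ".config" / "speechshift" / "config.json"


def deep_merge(base, override):
    """Return a copy of `base` with values from `override` applied recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_config(path=CONFIG_PATH, env=os.environ):
    """Load user settings over the defaults and validate the engine choice."""
    path = Path(path)
    user = json.loads(path.read_text()) if path.exists() else {}
    config = deep_merge(DEFAULT_CONFIG, user)
    engine = config["transcription"]["engine"]
    if engine == "assemblyai" and not env.get("ASSEMBLYAI_API_KEY"):
        raise RuntimeError("ASSEMBLYAI_API_KEY must be set to use assemblyai")
    return config
```

With a merge like this, a config file only needs to list the keys it changes; everything else falls back to the defaults.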

How It Works

Architecture Overview

Keybind (Super+Shift+R)
    ↓
Main Python Script
    ├── PipeWire Audio Recording (sounddevice)
    ├── AI Transcription (faster-whisper)
    ├── Temporary File Management
    ├── Wayland Text Input (wl-clipboard + hyprctl simulate control+V)
    └── Smart Notifications (notify-send)

Workflow

  1. Keybind Press: Hyprland detects Super+Shift+R press
  2. Recording Start:
    • Python script starts PipeWire audio capture
    • Notification: "🎤 Recording started..."
    • Audio streams to temporary WAV file in /tmp
  3. Second Keybind Press: Hyprland detects Super+Shift+R again and runs the toggle command
  4. Recording Stop & Transcription:
    • Audio capture stops
    • Notification: "🔄 Transcribing audio..."
    • faster-whisper transcribes the audio
    • Transcribed text pasted into active window
    • Temporary file automatically deleted
    • Success notification: "✅ Transcribed: [preview]"
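The temporary-file part of this workflow can be sketched with the standard library alone. The helper names below are hypothetical, and the audio parameters follow the format listed under Technical Details; the real implementation records through sounddevice and may handle files differently.

```python
import os
import tempfile
import wave

SAMPLE_RATE = 44100  # Hz
SAMPLE_WIDTH = 2     # bytes per sample (16-bit)
CHANNELS = 1         # mono


def write_temp_wav(pcm_bytes, directory=None):
    """Write raw 16-bit PCM to a temporary WAV file and return its path.

    With directory=None the system temp directory (usually /tmp) is used.
    """
    fd, path = tempfile.mkstemp(prefix="speechshift_", suffix=".wav", dir=directory)
    with os.fdopen(fd, "wb") as raw, wave.open(raw, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm_bytes)
    return path


def transcribe_and_cleanup(pcm_bytes, transcribe, directory=None):
    """Run `transcribe(path)` on a temp WAV, deleting the file afterwards."""
    path = write_temp_wav(pcm_bytes, directory)
    try:
        return transcribe(path)
    finally:
        # Remove the file even if transcription raises.
        os.remove(path)
```

The `try`/`finally` ensures the recording does not linger in the temp directory when transcription fails.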

Technical Details

  • Audio Format: 16-bit WAV, 44.1kHz, mono
  • Transcription Model: faster-whisper "base" model (configurable)
  • File Handling: Temporary files in /tmp, auto-cleanup after transcription
  • Text Insertion: Direct typing via wtype, fallback to clipboard paste
  • Notifications: Smart status updates via notify-send
  • Error Handling: Graceful fallback with error notifications
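The "type directly, else paste from the clipboard" strategy could look roughly like the sketch below. It is an illustration under assumptions: `insert_text` and `run_ok` are hypothetical names, and `trigger_paste` stands in for whatever sends the paste shortcut to the focused window (this README mentions hyprctl for that step, but the exact command is not shown).

```python
import subprocess


def run_ok(cmd, input_text=None, run=subprocess.run):
    """Run a command; return True on exit status 0, False if it fails or is missing."""
    try:
        return run(cmd, input=input_text, text=True).returncode == 0
    except OSError:  # includes FileNotFoundError when the binary is absent
        return False


def insert_text(text, trigger_paste, run=subprocess.run):
    """Type `text` with wtype; fall back to a clipboard paste if that fails.

    `trigger_paste` is a hypothetical callable that sends the paste shortcut
    to the focused window and returns True on success.
    Returns the name of the method that succeeded, or None.
    """
    if run_ok(["wtype", text], run=run):
        return "wtype"
    # Fallback: put the text on the Wayland clipboard, then paste it.
    if run_ok(["wl-copy"], input_text=text, run=run) and trigger_paste():
        return "clipboard"
    return None
```

Returning the method name makes it easy to log which path was taken when debugging text insertion.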

Troubleshooting

Common Issues

  1. "sounddevice not available":

    # Install manually:
    pip install --user sounddevice numpy
    
  2. "Audio recording failed":

    • Check PipeWire is running: systemctl --user status pipewire
    • List available recording targets: pw-record --list-targets
    • Verify permissions: ensure user is in audio group
  3. "Hyprland socket not found":

    • Ensure running under Hyprland
    • Check environment variables: echo $HYPRLAND_INSTANCE_SIGNATURE
  4. "Text insertion not working":

    • Verify wtype is installed: wtype --version
    • Test manually: wtype "test"
    • Check focused window accepts text input
  5. "Notifications not showing":

    • Test manually: notify-send "test" "message"
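For reference, a status notification with a timeout can be sent from Python as sketched below. The function names and the "SpeechShift" app name are assumptions for illustration; `-t` (expiry time in milliseconds) and `-a` (application name) are standard notify-send options, and the 3000 ms default mirrors the notification_timeout value in the example configuration.

```python
import subprocess


def build_notify_command(summary, body="", timeout_ms=3000):
    """Build a notify-send command line; -t sets the expiry in milliseconds."""
    cmd = ["notify-send", "-a", "SpeechShift", "-t", str(timeout_ms), summary]
    if body:
        cmd.append(body)
    return cmd


def notify(summary, body="", timeout_ms=3000):
    """Send a desktop notification; return False if notify-send is unavailable."""
    try:
        result = subprocess.run(build_notify_command(summary, body, timeout_ms))
    except OSError:
        return False
    return result.returncode == 0
```

If `notify("test", "message")` returns False, notify-send is missing or no notification daemon is accepting messages.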

Debug Mode

Detailed logs are written to ~/.speechshift.log. Follow them live with:

tail -f ~/.speechshift.log

