
Local speech-to-text for desktop using faster-whisper


stt2desktop

License: GPL-3.0-or-later


Lets you dictate text into any application without sending audio to any cloud service. Everything runs locally on your machine; no internet connection is required after the initial model download.

Currently only tested on Linux with KDE ;)

How it works

  1. Run stt2desktop listen (or ./cli.py listen in a source checkout). The Whisper model is downloaded on first run and cached on disk
  2. Hold Scroll Lock to record from your microphone
  3. Release Scroll Lock — the audio is transcribed locally by faster-whisper
  4. The transcribed text is copied to the clipboard via wl-copy and pasted into the focused window via ydotool key ctrl+v
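Step 4 above can be sketched in Python as follows. This is a minimal illustration, not stt2desktop's actual code. The function name paste_text and its injectable run parameter are hypothetical. The sketch assumes an ydotool version that accepts key names such as ctrl+v, as used above.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical type for an injectable command runner (defaults to subprocess.run).
Runner = Callable[..., object]


def paste_text(text: str, run: Runner = subprocess.run) -> list[Sequence[str]]:
    """Copy `text` to the Wayland clipboard and paste it into the focused window.

    Sketch only: wl-copy first, then `ydotool key ctrl+v`.
    Returns the executed commands, which makes the function easy to test.
    """
    executed: list[Sequence[str]] = []
    if not text.strip():
        return executed  # nothing was recognized: leave the clipboard untouched

    copy_cmd = ["wl-copy"]
    # Without arguments, wl-copy reads the text to copy from stdin.
    run(copy_cmd, input=text, text=True, check=True)
    executed.append(copy_cmd)

    paste_cmd = ["ydotool", "key", "ctrl+v"]
    # ydotool injects a synthetic Ctrl+V key press via uinput.
    run(paste_cmd, check=True)
    executed.append(paste_cmd)
    return executed
```

Pasting through the clipboard means the text never has to be "typed" key by key. This is why keyboard layout differences do not matter.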

Used tools:

  • faster-whisper for local speech recognition
  • ydotool to simulate keyboard input (works on Wayland and X11)
  • wl-clipboard (wl-copy) to paste text via clipboard — avoids keyboard layout issues
  • chime to play notification sounds
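Here is a minimal sketch of a local transcription call with faster-whisper. It assumes faster-whisper is installed (pip install faster-whisper). WhisperModel, its transcribe() method and the .text attribute of the returned segments are faster-whisper's public API. The helpers transcribe_file and join_segments are hypothetical and not part of stt2desktop.

```python
from typing import Iterable, Protocol


class HasText(Protocol):
    # faster-whisper's Segment objects expose the recognized text as `.text`.
    text: str


def join_segments(segments: Iterable[HasText]) -> str:
    """Join recognized segments into one string, dropping empty pieces."""
    parts = (segment.text.strip() for segment in segments)
    return " ".join(part for part in parts if part)


def transcribe_file(path: str, model_size: str = "small") -> str:
    """Transcribe an audio file locally; the model is downloaded on first use."""
    from faster_whisper import WhisperModel  # heavy dependency, imported lazily

    model = WhisperModel(model_size, device="auto", compute_type="int8")
    # transcribe() returns a lazy generator of segments plus audio info;
    # decoding actually happens while the generator is consumed.
    segments, _info = model.transcribe(path)
    return join_segments(segments)
```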

Prepare installation

Requirements: Python 3.12+, a working microphone, wl-clipboard, ydotool and ydotoold:

sudo apt install ydotool ydotoold wl-clipboard
sudo usermod -aG input $USER
echo 'KERNEL=="uinput", GROUP="input", MODE="0660"' | sudo tee /etc/udev/rules.d/60-uinput.rules
sudo udevadm control --reload-rules && sudo udevadm trigger

Then log out and back in (or run newgrp input in the current shell) so that the group change takes effect.
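After logging in again, you can check the prerequisites from Python. This is a small standard-library sketch. The helper names and the list of tools are assumptions derived from the requirements above; stt2desktop does not ship this command.

```python
import grp
import os
import shutil
from typing import Callable, Optional

# Assumed from the requirements above: the binaries that must be on $PATH.
REQUIRED_TOOLS = ("wl-copy", "ydotool", "ydotoold")


def missing_tools(which: Callable[[str], Optional[str]] = shutil.which) -> list[str]:
    """Return the required command-line tools that are not found on $PATH."""
    return [tool for tool in REQUIRED_TOOLS if which(tool) is None]


def in_input_group() -> bool:
    """True if the current process already has the `input` group."""
    try:
        input_gid = grp.getgrnam("input").gr_gid
    except KeyError:
        return False  # the group does not exist on this system
    return input_gid in os.getgroups() or input_gid == os.getgid()


def uinput_writable(path: str = "/dev/uinput") -> bool:
    """True if the uinput device exists and is writable (used by ydotool/ydotoold)."""
    return os.path.exists(path) and os.access(path, os.W_OK)


if __name__ == "__main__":
    print("missing tools:", missing_tools() or "none")
    print("in 'input' group:", in_input_group())
    print("/dev/uinput writable:", uinput_writable())
```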

Install via pipx

You can install "stt2desktop" with pipx:

sudo apt install pipx
pipx install stt2desktop

Then run:

stt2desktop listen

The default global hotkey is Scroll Lock (called "Rollen" on German keyboards). You can change it via the --hotkey option, which expects an evdev key name (see below). Suggested alternative keys: KEY_RIGHTCTRL, KEY_RIGHTALT, KEY_RIGHTMETA, KEY_RIGHTSHIFT ;)
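Holding and releasing the hotkey produces evdev key events with value 1 (down), 0 (up) and 2 (auto-repeat while held). Below is a sketch of a hold-to-record state machine that ignores auto-repeat and duplicate events. HoldToRecord is hypothetical, not stt2desktop's implementation. The key codes are copied from linux/input-event-codes.h.

```python
from dataclasses import dataclass, field
from typing import Optional

# A few key codes from linux/input-event-codes.h; python-evdev exposes the
# same values as evdev.ecodes.KEY_SCROLLLOCK etc.
KEY_CODES = {
    "KEY_SCROLLLOCK": 70,
    "KEY_RIGHTSHIFT": 54,
    "KEY_RIGHTCTRL": 97,
    "KEY_RIGHTALT": 100,
    "KEY_RIGHTMETA": 126,
}

# Values carried by an EV_KEY event
KEY_UP, KEY_DOWN, KEY_REPEAT = 0, 1, 2


@dataclass
class HoldToRecord:
    """Turn raw key events into 'start'/'stop' actions for a single hotkey."""

    hotkey: str = "KEY_SCROLLLOCK"
    recording: bool = False
    code: int = field(init=False)

    def __post_init__(self) -> None:
        if self.hotkey not in KEY_CODES:
            raise ValueError(f"unknown hotkey: {self.hotkey}")
        self.code = KEY_CODES[self.hotkey]

    def handle(self, code: int, value: int) -> Optional[str]:
        """Return 'start', 'stop' or None for one (code, value) key event."""
        if code != self.code or value == KEY_REPEAT:
            return None  # other keys and auto-repeat while holding are ignored
        if value == KEY_DOWN and not self.recording:
            self.recording = True
            return "start"  # begin recording from the microphone
        if value == KEY_UP and self.recording:
            self.recording = False
            return "stop"  # stop recording, then transcribe and paste
        return None  # duplicate press/release: never handle the hotkey twice
```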

CLI listen

usage: stt2desktop listen [-h] [LISTEN OPTIONS]

Start the STT listener. Hold the hotkey to record, release to transcribe and insert.

╭─ options ────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ -h, --help                show this help message and exit                                                            │
│ -v, --verbosity           Verbosity level; e.g.: -v, -vv, -vvv, etc. (repeatable)                                    │
│ --model {tiny_en,tiny,base_en,base,small_en,small,medium_en,medium,large_v1,large_v2,large_v3,large,distil_large_v2, │
│ distil_medium_en,distil_small_en,distil_large_v3,distil_large_v3_5,large_v3_turbo,turbo}                             │
│                           Whisper model to use for transcription. (default: small)                                   │
│ --hotkey STR              evdev key name to hold for recording. Release to transcribe and insert text. Examples:     │
│                           KEY_SCROLLLOCK, KEY_RIGHTCTRL, KEY_RIGHTALT. (default: KEY_SCROLLLOCK)                     │
│ --sample-rate INT         Audio sample rate in Hz. Whisper expects 16000. (default: 16000)                           │
│ --device STR              Device to run inference on, e.g. cpu or cuda. (default: auto)                              │
│ --compute-type STR        Quantization type, e.g. int8, float16, float32. (default: int8)                            │
│ --num-workers {None}|INT  Number of parallel transcription workers. Defaults to CPU count. (default: None)           │
│ --sounds, --no-sounds     Play notification sounds via chime. (default: True)                                        │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
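The --model choices appear to be faster-whisper model names with dots and hyphens replaced by underscores (e.g. distil_small_en for distil-small.en). Assuming that, here is a sketch of how these options could be turned into a faster-whisper model. The helpers to_faster_whisper_name and load_model are hypothetical. Mapping --num-workers to WhisperModel's num_workers parameter is also an assumption.

```python
import os
from typing import Optional


def to_faster_whisper_name(choice: str) -> str:
    """Map a --model choice such as 'distil_small_en' to 'distil-small.en'."""
    name = choice
    if name.endswith("_en"):
        name = name[: -len("_en")] + ".en"  # English-only variants end in '.en'
    name = name.replace("v3_5", "v3.5")  # 'distil_large_v3_5' -> 'distil-large-v3.5'
    return name.replace("_", "-")


def load_model(model: str = "small", device: str = "auto",
               compute_type: str = "int8", num_workers: Optional[int] = None):
    """Build a faster-whisper WhisperModel from CLI-style options."""
    from faster_whisper import WhisperModel  # heavy dependency, imported lazily

    return WhisperModel(
        to_faster_whisper_name(model),
        device=device,
        compute_type=compute_type,
        # None mirrors the CLI help text: "Defaults to CPU count".
        num_workers=num_workers or os.cpu_count() or 1,
    )
```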

Whisper models

A selection of models with approximate values:

Model   Size     Speed    Accuracy
tiny    ~75 MB   fastest  lowest
base    ~145 MB  fast     good
small   ~460 MB  slower   better (default)
medium  ~1.5 GB  slow     high

Larger models produce more accurate transcriptions but take longer to process ;)

Troubleshooting

Use pavucontrol to check your audio setup and make sure the correct microphone is selected and working.

Test audio recording:

./cli.py test-recording
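If the test recording yields no usable text, a quick signal-level check helps tell a muted or wrong microphone apart from a recognition problem. This sketch assumes raw 16-bit signed mono PCM samples. At the default --sample-rate of 16000 Hz, 16000 samples are one second of audio. The helpers rms_level and looks_silent, and the 1 % threshold, are illustrative choices, not stt2desktop features.

```python
import array
import math


def rms_level(pcm: bytes) -> float:
    """Root-mean-square level of 16-bit signed PCM audio, scaled to 0.0..1.0."""
    samples = array.array("h")  # 'h' = signed 16-bit integers, native byte order
    samples.frombytes(pcm)
    if not samples:
        return 0.0
    mean_square = sum(s * s for s in samples) / len(samples)
    return math.sqrt(mean_square) / 32768.0


def looks_silent(pcm: bytes, threshold: float = 0.01) -> bool:
    """Heuristic: a level below ~1 % of full scale suggests a muted microphone."""
    return rms_level(pcm) < threshold
```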

Some terminal commands to check your audio setup:

# List capture devices in PulseAudio sound server:
pactl list sources short

# Check current volume:
pactl list sources | grep -A1 "Name: .*input\|Volume:"

# Displays the current state in PipeWire:
wpctl status
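The short source listing from pactl is tab-separated: index, name, driver, sample format and state. The sketch below parses it and hides .monitor sources, which capture what your speakers play rather than a microphone. The helpers parse_sources, microphones and list_microphones are hypothetical.

```python
import subprocess
from typing import NamedTuple


class Source(NamedTuple):
    index: int
    name: str
    driver: str
    sample_spec: str
    state: str


def parse_sources(output: str) -> list[Source]:
    """Parse `pactl list sources short` output (one tab-separated line per source)."""
    sources = []
    for line in output.splitlines():
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # skip blank or unexpected lines
        index, name, driver, spec, state = fields[:5]
        sources.append(Source(int(index), name, driver, spec, state))
    return sources


def microphones(sources: list[Source]) -> list[Source]:
    """Drop '.monitor' sources: they record speaker output, not a microphone."""
    return [s for s in sources if not s.name.endswith(".monitor")]


def list_microphones() -> list[Source]:
    """Run pactl and return only the real capture devices."""
    out = subprocess.run(["pactl", "list", "sources", "short"],
                         capture_output=True, text=True, check=True).stdout
    return microphones(parse_sources(out))
```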

Set up loopback mode to hear yourself:

# Start:
pactl load-module module-loopback
# Undo:
pactl unload-module module-loopback

Start development

At least uv is needed. Install it, e.g., via pipx:

sudo apt-get install pipx
pipx install uv

Clone the project and run the CLI help commands. A virtual environment is created or updated automatically.

~$ git clone https://github.com/jedie/stt2desktop.git
~$ cd stt2desktop
~/stt2desktop$ ./cli.py --help
~/stt2desktop$ ./dev-cli.py --help
usage: ./dev-cli.py [-h] {coverage,install,lint,mypy,nox,pip-audit,publish,test,update,update-readme-history,update-test-snapshot-files,version}



╭─ options ────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ -h, --help    show this help message and exit                                                                        │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ subcommands ────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ (required)                                                                                                           │
│   • coverage  Run tests and show coverage report.                                                                    │
│   • install   Install requirements and 'stt2desktop' via pip as editable.                                            │
│   • lint      Check/fix code style by run: "ruff check --fix"                                                        │
│   • mypy      Run Mypy (configured in pyproject.toml)                                                                │
│   • nox       Run nox                                                                                                │
│   • pip-audit                                                                                                        │
│               Run pip-audit check against current requirements files                                                 │
│   • publish   Build and upload this project to PyPi                                                                  │
│   • test      Run unittests                                                                                          │
│   • update    Update dependencies (uv.lock) and git pre-commit hooks                                                 │
│   • update-readme-history                                                                                            │
│               Update project history base on git commits/tags in README.md Will be exited with 1 if the README.md    │
│               was updated otherwise with 0.                                                                          │
│                                                                                                                      │
│               Also, callable via e.g.:                                                                               │
│                   python -m cli_base update-readme-history -v                                                        │
│   • update-test-snapshot-files                                                                                       │
│               Update all test snapshot files (by remove and recreate all snapshot files)                             │
│   • version   Print version and exit                                                                                 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

History

  • v0.3.0
    • 2026-04-23 - avoid double hotkey processing
    • 2026-04-23 - nicer exit
    • 2026-04-23 - fix code style
    • 2026-04-23 - Use a lock file to ensure that only one instance is running
    • 2026-04-23 - restore old clipboard after pasting the STT text
  • v0.2.0
    • 2026-04-22 - paste text via clipboard to avoid keyboard layout issues
    • 2026-04-16 - Add test commands and migrate to ydotool
  • v0.1.2
    • 2026-03-30 - print warning when not running on Linux
    • 2026-03-30 - Update requirements
    • 2026-03-27 - Update README
  • v0.1.1
    • 2026-03-27 - +Proposal for alternative hotkey
    • 2026-03-27 - fix color outputs
    • 2026-03-27 - Update requirements
    • 2026-03-27 - add missing license file.
  • v0.1.0
    • 2026-03-27 - Use chime to play notification sounds
    • 2026-03-27 - Try to fix github CI run
    • 2026-03-27 - Cleanup README
    • 2026-03-27 - pipx usage
  • v0.0.1
    • 2026-03-26 - Add POC
    • 2026-03-26 - init


Download files

Download the file for your platform.

Source Distribution

stt2desktop-0.3.0.tar.gz (128.2 kB)

Uploaded Source

Built Distribution


stt2desktop-0.3.0-py3-none-any.whl (34.8 kB)

Uploaded Python 3

File details

Details for the file stt2desktop-0.3.0.tar.gz.

File metadata

  • Download URL: stt2desktop-0.3.0.tar.gz
  • Size: 128.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for stt2desktop-0.3.0.tar.gz
Algorithm Hash digest
SHA256 3d7b26bc16f7a0638ba262941ed8accdab6faa197fcb06f1d2880cd0b59ca45f
MD5 c06a2706927b076963e430366f7c07b6
BLAKE2b-256 1b4db7dc35a57d412300bfcfbfcd32d209436e0e7bea3f194200a0869b656519


File details

Details for the file stt2desktop-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: stt2desktop-0.3.0-py3-none-any.whl
  • Size: 34.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for stt2desktop-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 4f2c39a6f7d08e210c3acd7a90d0441c73c97af8b41acf516513e55131ce32aa
MD5 4802c48b32c8e33c5b69e8371645d985
BLAKE2b-256 8558e7716ed75591be4195cfb975650a8567ce173bc94aa1a93902bf3e454cfe

