
A local, private Linux text-to-speech tool. Select text in any app, press a hotkey, hear it read by Kokoro-82M on your GPU.


Lexaloud

A local, private text-to-speech tool for Linux. Select text, press a hotkey, hear it read by a neural voice running on your own machine.


How it works

  1. Select text in any application
  2. Press a global hotkey (e.g., Ctrl+0)
  3. Hear it spoken sentence by sentence, with pause / skip / rewind controls
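Speaking "sentence by sentence" means the text is first cut into sentences. The sketch below shows a naive regex splitter; it is an illustration under stated assumptions, not Lexaloud's actual chunker, and the name split_sentences is hypothetical.

```python
import re

# Hypothetical splitter: break after ".", "!" or "?" when whitespace follows.
# Naive on purpose: abbreviations such as "e.g. " are mis-split.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> list[str]:
    """Split text into sentences, dropping empty pieces."""
    return [s.strip() for s in _SENTENCE_END.split(text.strip()) if s.strip()]
```

For example, split_sentences("Select text. Press a hotkey! Hear it?") yields three sentences, each of which can be synthesized and queued on its own.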

Lexaloud runs a small daemon on your machine that synthesizes speech using Kokoro-82M, an open-weights neural voice model. Nothing leaves your computer — no cloud API, no account, no telemetry.

To hear what Kokoro sounds like before installing, try the live demo on Hugging Face.

Features

  • Global hotkey on any desktop — works on GNOME, KDE Plasma, Sway, Hyprland, XFCE, Cinnamon, and any window manager that supports custom keybindings. GNOME is the primary tested path with integrated tray + hotkey UI; other desktops bind the same CLI commands manually. See docs/hotkeys/.
  • MPRIS2 / media keys — desktop media keys, GNOME's top-bar media indicator, KDE's media widget, Bluetooth headphone buttons, and playerctl all control Lexaloud playback with zero configuration. Uses dbus-fast (optional dependency).
  • Floating overlay — an always-on-top caption bar showing the current sentence (off by default). Enable it via [advanced] overlay = true in config.toml or the Settings tab of the control window. Uses gtk-layer-shell on wlroots compositors and KWin, with a fallback for X11 and GNOME Wayland.
  • XDG GlobalShortcuts portal — Wayland-native global hotkey binding on KDE Plasma 6+, Sway, and Hyprland via the org.freedesktop.portal.GlobalShortcuts portal. GNOME does not support this portal and continues using the gsettings path.
  • GPU-accelerated neural TTS — Kokoro-82M via kokoro-onnx on onnxruntime-gpu with NVIDIA CUDA. CPU fallback runs at ~10x real-time, which is fine for reading along.
  • Sentence-granularity streaming with bounded backpressure and cooperative cancellation. Pause, skip, rewind, or stop mid-article without audio clipping.
  • 12 built-in voices — American and British, male and female, from warm to serious. The control window lets you preview and switch voices; see the full list in docs/models.md.
  • GTK3 tray indicator + control window — visible on any desktop that supports AppIndicator (GNOME with the ubuntu-appindicators extension, KDE, Budgie, etc.). The control window holds voice, speed, and hotkey settings. The CLI works without the tray on minimal setups.
  • Privacy-first — see the Privacy section.
  • Open-source — MIT-licensed code, Apache-2.0-licensed model weights. See THIRD_PARTY_LICENSES.md.

Requirements

  • OS: Linux only. Tier 1: Ubuntu 24.04, Debian 13. Tier 2: Fedora 41, Arch, Mint, Pop!_OS. Not supported: Windows, macOS.
  • Init system: systemd (for the --user daemon unit). Non-systemd distros (Artix, Void) can run lexaloud daemon manually.
  • Python: 3.11 or newer.
  • GPU (optional): NVIDIA with a CUDA 12-compatible driver. AMD ROCm and Intel Arc are not yet supported; on those the daemon falls back to CPU, which runs at ~10x real-time and is fine for reading along.
  • Audio: PipeWire, PulseAudio, or bare ALSA (via PortAudio/libportaudio2). Most desktop Linux distros ship PipeWire by default.
  • Disk: ~400 MB for model weights (downloaded once on first setup).
  • Desktop (optional): GNOME for the integrated tray + hotkey UI. KDE, Sway, XFCE, Cinnamon, and others work via manual hotkey binding (see docs/hotkeys/). The CLI works headless.

Install

Ubuntu / Debian (Tier 1)

sudo apt install python3-venv wl-clipboard xclip libportaudio2 libnotify-bin \
                 python3-gi gir1.2-gtk-3.0 gir1.2-ayatanaappindicator3-0.1

git clone https://github.com/Gustavjiversen01/lexaloud.git
cd lexaloud
./scripts/install.sh

~/.local/share/lexaloud/venv/bin/lexaloud setup
systemctl --user daemon-reload
systemctl --user enable --now lexaloud.service

Then bind a hotkey — see docs/hotkeys/gnome.md or the walkthrough lexaloud setup prints.
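On GNOME the hotkey goes through gsettings custom keybindings. The sketch below only builds the standard GNOME commands for one binding; it is not necessarily what lexaloud setup writes, and the slot name custom0 and the display name are assumptions. If lexaloud is not on the PATH GNOME uses, substitute the absolute path of ~/.local/share/lexaloud/venv/bin/lexaloud (gsettings does not expand ~).

```python
SCHEMA = "org.gnome.settings-daemon.plugins.media-keys"
BASE = "/org/gnome/settings-daemon/plugins/media-keys/custom-keybindings"


def gsettings_commands(slot: str, name: str, command: str, binding: str) -> list[list[str]]:
    """Build argv lists that define one GNOME custom keybinding.

    Values are wrapped in single quotes so gsettings parses them as GVariant
    strings; this simple quoting does not escape quotes inside the values.
    Caution: the first command replaces the whole custom-keybindings list,
    dropping any existing custom shortcuts. Merge with the output of
    `gsettings get` before running it for real.
    """
    path = f"{BASE}/{slot}/"
    keybinding = f"{SCHEMA}.custom-keybinding:{path}"  # relocatable schema at that path
    return [
        ["gsettings", "set", SCHEMA, "custom-keybindings", f"['{path}']"],
        ["gsettings", "set", keybinding, "name", f"'{name}'"],
        ["gsettings", "set", keybinding, "command", f"'{command}'"],
        ["gsettings", "set", keybinding, "binding", f"'{binding}'"],
    ]


for argv in gsettings_commands("custom0", "Lexaloud speak selection",
                               "lexaloud speak-selection", "<Control>0"):
    print(" ".join(argv))  # review first; then run each with subprocess.run(argv, check=True)
```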

Full walkthrough: docs/install/ubuntu-debian.md

Fedora (Tier 2)

sudo dnf install python3 python3-pip python3-gobject gtk3 \
                 wl-clipboard xclip portaudio libnotify

Then the same git clone → ./scripts/install.sh → lexaloud setup → systemctl flow. Full walkthrough: docs/install/fedora.md

Arch / Manjaro (Tier 2)

sudo pacman -S python python-gobject gtk3 wl-clipboard xclip portaudio libnotify

Then git clone → ./scripts/install.sh → lexaloud setup → systemctl. Full walkthrough: docs/install/arch.md

Other distros

The installer auto-detects your distro via /etc/os-release and prints the right package names if any are missing. For distros not in the table, file a PR against docs/install/.

GPU backend

The installer detects NVIDIA via nvidia-smi and picks the right lockfile automatically. To force a backend:

./scripts/install.sh --backend cuda12   # NVIDIA GPU
./scripts/install.sh --backend cpu      # CPU only (AMD, Intel, or no GPU)
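The automatic choice amounts to "does nvidia-smi run successfully?". Here is that check as a Python sketch (detect_backend is a hypothetical name; the installer itself is shell). It does not verify that the driver is CUDA 12-compatible, which is one reason to pass --backend explicitly when in doubt.

```python
import shutil
import subprocess


def detect_backend() -> str:
    """Guess the install backend: 'cuda12' if nvidia-smi runs cleanly, else 'cpu'."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return "cpu"  # no NVIDIA tooling on PATH
    try:
        result = subprocess.run([exe], capture_output=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return "cpu"  # binary present but unusable or hung
    return "cuda12" if result.returncode == 0 else "cpu"
```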

Wayland users: read this

On GNOME Wayland (the default on Ubuntu 24.04), speak-selection may come back empty for some apps (VS Code, Obsidian, Slack) because Electron apps don't always publish to the PRIMARY selection. The reliable workflow is:

  1. Ctrl+C to copy the selection to the clipboard
  2. Press your speak-clipboard hotkey

Both commands are in the CLI — bind whichever suits your workflow, or bind both to different keys. Details in docs/gotchas.md.

Not via pip install

pip install lexaloud does not give you a working installation. The TTS runtime requires a specific install sequence for kokoro-onnx + onnxruntime-gpu that pip cannot express in one command (the two packages share an internal directory and silently break each other if both are installed normally; see docs/design-rationale.md for the full story). scripts/install.sh is the only supported install path.
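A common way to end up with a broken GPU runtime is having both the CPU onnxruntime and the onnxruntime-gpu distributions installed, since both provide the same onnxruntime import package. A quick diagnostic sketch using only importlib.metadata (the function name is hypothetical):

```python
from importlib.metadata import PackageNotFoundError, version


def installed_onnxruntime_dists() -> dict[str, str]:
    """Return {distribution: version} for whichever onnxruntime wheels are installed."""
    found: dict[str, str] = {}
    for dist in ("onnxruntime", "onnxruntime-gpu"):
        try:
            found[dist] = version(dist)
        except PackageNotFoundError:
            pass  # not installed in this environment
    return found
```

Run it with the venv's interpreter (~/.local/share/lexaloud/venv/bin/python); more than one entry in the result suggests the two wheels are overwriting each other's files.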

CLI

lexaloud speak-selection      # capture PRIMARY selection, speak it
lexaloud speak-clipboard      # capture CLIPBOARD (after Ctrl+C), speak it
lexaloud pause                # pause at the next sentence boundary
lexaloud resume
lexaloud toggle               # pause if speaking, resume if paused
lexaloud skip                 # skip the current sentence
lexaloud back                 # rewind one sentence
lexaloud stop                 # stop and clear the queue
lexaloud status               # daemon state as JSON
lexaloud download-models      # fetch model weights (~340 MB, once)
lexaloud setup                # first-time configuration walkthrough
lexaloud bug-report           # system diagnostics for filing issues
lexaloud daemon               # run the daemon (normally via systemd)

Exit codes: 0 success, 1 error, 2 empty selection, 3 daemon down, 4 oversized payload, 5 capture tool missing/timeout.

Full reference: docs/cli-reference.md
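The exit codes make it easy to wrap the CLI, for example to turn a failed hotkey press into a desktop notification. A minimal sketch, assuming lexaloud and notify-send are on PATH; speak_selection_or_notify is a hypothetical wrapper, not part of Lexaloud.

```python
import subprocess

# Exit codes as documented for the lexaloud CLI.
EXIT_CODES = {
    0: "success",
    1: "error",
    2: "empty selection",
    3: "daemon down",
    4: "oversized payload",
    5: "capture tool missing/timeout",
}


def describe_exit(code: int) -> str:
    return EXIT_CODES.get(code, f"unknown exit code {code}")


def speak_selection_or_notify() -> int:
    """Run `lexaloud speak-selection`; on failure, show a desktop notification."""
    proc = subprocess.run(["lexaloud", "speak-selection"])
    if proc.returncode != 0:
        # notify-send comes from libnotify-bin / libnotify in the install steps.
        subprocess.run(["notify-send", "Lexaloud", describe_exit(proc.returncode)])
    return proc.returncode
```

A hotkey can then point at a script that calls speak_selection_or_notify() instead of the bare command.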

Privacy

Lexaloud performs no telemetry. No text, metadata, or usage statistics are transmitted anywhere. The only outbound network calls are the one-time model downloads on first setup, fetched over HTTPS from the kokoro-onnx GitHub releases page and SHA256-verified against pins in src/lexaloud/models.py.
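Pinned-hash verification of a download boils down to hashing the file and comparing against a stored digest. The sketch below shows the pattern with hashlib; the function names are hypothetical and the real pins live in src/lexaloud/models.py.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large weight files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_pin(path: Path, expected_hex: str) -> None:
    """Raise ValueError unless the file's SHA-256 matches the pinned digest."""
    actual = sha256_file(path)
    if actual != expected_hex.lower():
        raise ValueError(f"SHA256 mismatch for {path.name}: got {actual}, expected {expected_hex}")
```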

The daemon listens on a Unix domain socket at $XDG_RUNTIME_DIR/lexaloud/lexaloud.sock (mode 0700 enforced by systemd's RuntimeDirectoryMode=). Only processes running as your user can reach it. There is no open TCP port.

Selection text is never written to disk. Log entries that mention a sentence replace the content with a SHA-1 fingerprint + length, so journalctl never contains readable user text.
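A fingerprint lets you correlate log lines that refer to the same sentence without storing the sentence itself. A sketch of the idea; the exact format (prefix, 12 hex digits, character length) is an assumption, and fingerprint and log_sentence are hypothetical names.

```python
import hashlib
import logging


def fingerprint(text: str, digits: int = 12) -> str:
    """Log-safe stand-in for user text: truncated SHA-1 hex digest plus length in characters."""
    digest = hashlib.sha1(text.encode("utf-8")).hexdigest()[:digits]
    return f"sha1:{digest} len={len(text)}"


log = logging.getLogger("lexaloud-sketch")


def log_sentence(sentence: str) -> None:
    # Only the fingerprint reaches the log, never the sentence text.
    log.info("speaking sentence %s", fingerprint(sentence))
```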

Known limitations (v0.3.0)

  • NVIDIA only for GPU acceleration — AMD ROCm and Intel Arc are not supported. CPU fallback works on any x86_64 Linux.
  • No karaoke word-level highlighting — deferred (Kokoro doesn't expose word timings).
  • No browser extension — deferred.
  • Sentence-level pause granularity — the last ~100 ms of the current sub-chunk may play out after pressing pause.
  • GNOME Wayland primary-selection gaps — some Electron apps don't publish to PRIMARY. Workaround: use speak-clipboard + Ctrl+C. See docs/gotchas.md.
  • GlobalShortcuts portal not supported on GNOME — GNOME 46/47 does not implement the XDG GlobalShortcuts portal. GNOME users continue using the gsettings-based hotkey path.

Full list: ROADMAP.md
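The pause limitation follows from checking the pause flag only between audio sub-chunks. A sketch of that behavior, assuming a 24 kHz output rate and 100 ms blocks (both are assumptions here; split_blocks and play are hypothetical names):

```python
import threading

SAMPLE_RATE = 24_000  # assumption: Kokoro's output sample rate in Hz
BLOCK_MS = 100        # assumption: sub-chunk length, matching the ~100 ms figure


def split_blocks(samples: list[float], block_ms: int = BLOCK_MS, rate: int = SAMPLE_RATE) -> list[list[float]]:
    size = rate * block_ms // 1000  # 24_000 * 100 // 1000 = 2_400 samples per block
    return [samples[i:i + size] for i in range(0, len(samples), size)]


def play(samples: list[float], paused: threading.Event, write) -> int:
    """Write blocks until `paused` is set; return the number of blocks written.

    The flag is only checked between blocks, so a block that is already being
    written finishes first: up to one block (~BLOCK_MS) of extra audio.
    """
    written = 0
    for block in split_blocks(samples):
        if paused.is_set():
            break
        write(block)  # e.g. a blocking audio-device write in a real player
        written += 1
    return written
```

Smaller blocks tighten the pause response but mean more frequent writes to the audio device.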

Architecture

A FastAPI daemon (systemd --user) owns the TTS provider and audio sink. A thin CLI sends HTTP requests over the Unix socket. A GTK3 tray indicator polls daemon state for visual feedback.

Component diagram + data-flow walkthrough: docs/architecture.md. Design decisions: docs/design-rationale.md.
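Because the CLI speaks HTTP over a Unix socket, any HTTP client that can dial a socket path can talk to the daemon. A standard-library sketch follows; the route "/status" is a hypothetical placeholder (the real routes are in docs/architecture.md), and UnixHTTPConnection and get_json are illustrative names.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that connects to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path: str, timeout: float = 5.0) -> None:
        super().__init__("localhost", timeout=timeout)  # host only fills the Host header
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.settimeout(self.timeout)
            sock.connect(self.socket_path)
        except OSError:
            sock.close()  # don't leak the socket if the daemon is down
            raise
        self.sock = sock


def get_json(socket_path: str, route: str) -> dict:
    """GET `route` from the daemon and decode the JSON body."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", route)
        resp = conn.getresponse()
        body = resp.read()
        if resp.status != 200:
            raise RuntimeError(f"{route}: HTTP {resp.status}")
        return json.loads(body)
    finally:
        conn.close()
```

Usage would look like get_json(os.path.join(os.environ["XDG_RUNTIME_DIR"], "lexaloud", "lexaloud.sock"), "/status"). If httpx is available, httpx.HTTPTransport(uds=path) provides the same socket-path dialing.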

Tests

# Set up a dev environment (one-time)
python3 -m venv .venv && source .venv/bin/activate
pip install -e '.[test]'   # quotes stop shells such as zsh from globbing the brackets

# Run the suite
python -m pytest tests/ --ignore=tests/test_real_kokoro_smoke.py -q

206 tests, ~2.5 seconds. No GPU or audio device required — tests use FakeProvider + NullSink + ASGITransport.
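Running without a GPU or sound card works because the provider and sink are swapped for test doubles. The sketch below shows that dependency-injection pattern; the names FakeProvider and NullSink come from the test suite, but the interfaces shown here (synthesize, play, speak) are hypothetical.

```python
from typing import Protocol


class Provider(Protocol):
    def synthesize(self, text: str) -> list[float]: ...


class Sink(Protocol):
    def play(self, samples: list[float]) -> None: ...


class FakeProvider:
    """Deterministic stand-in: 10 silent samples per character, no model needed."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def synthesize(self, text: str) -> list[float]:
        self.calls.append(text)
        return [0.0] * (10 * len(text))


class NullSink:
    """Counts samples instead of opening an audio device."""

    def __init__(self) -> None:
        self.samples_played = 0

    def play(self, samples: list[float]) -> None:
        self.samples_played += len(samples)


def speak(sentences: list[str], provider: Provider, sink: Sink) -> None:
    for sentence in sentences:
        sink.play(provider.synthesize(sentence))
```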

There is also an optional integration test that uses the real Kokoro model and sounddevice (1 extra test, 207 total):

LEXALOUD_REAL_TTS=1 python -m pytest tests/test_real_kokoro_smoke.py -s

Contributing

See CONTRIBUTING.md. Pull requests should be signed off with git commit -s (DCO).

Please read CODE_OF_CONDUCT.md before participating.

Security vulnerabilities: use GitHub private vulnerability reporting rather than public issues. See SECURITY.md.

Acknowledgments

Significant portions of this codebase were developed in collaboration with Claude (Anthropic) via Claude Code. Code review and final editorial decisions are the author's.

License

MIT. See LICENSE for the full text and THIRD_PARTY_LICENSES.md for runtime dependency disclosures (the TTS stack includes GPL-3.0 dynamic dependencies via phonemizer-fork → espeakng-loader → espeak-ng).

Download files


Source Distribution

lexaloud-0.3.0.tar.gz (230.1 kB)


Built Distribution


lexaloud-0.3.0-py3-none-any.whl (89.6 kB)


File details

Details for the file lexaloud-0.3.0.tar.gz.

File metadata

  • Download URL: lexaloud-0.3.0.tar.gz
  • Size: 230.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for lexaloud-0.3.0.tar.gz
  • SHA256: 3ae9b9efc3bd30895a21183c84b8e2579c589328707b65c6d75c24de2327a63a
  • MD5: 87880ef4b385513c49c9afc4112f2c0b
  • BLAKE2b-256: 8b16e58734144de0acaf744823f0797c582ee97b009c1e44f6aa9b3e5aa14544


Provenance

The following attestation bundles were made for lexaloud-0.3.0.tar.gz:

Publisher: release.yml on Gustavjiversen01/lexaloud


File details

Details for the file lexaloud-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: lexaloud-0.3.0-py3-none-any.whl
  • Size: 89.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for lexaloud-0.3.0-py3-none-any.whl
  • SHA256: fd19c3bb611738aa6de6ed53f15c431f7fdb26b95b4b3a6ab27c889250ca1ec6
  • MD5: 47b379bcdf4cacc936df99ae506e0bb5
  • BLAKE2b-256: 56ab128687d3a90502f2e1eec81372afeb54e744bc50915e4e86df0daeee3f9e


Provenance

The following attestation bundles were made for lexaloud-0.3.0-py3-none-any.whl:

Publisher: release.yml on Gustavjiversen01/lexaloud

