A local, private Linux text-to-speech tool. Select text in any app, press a hotkey, hear it read by Kokoro-82M on your GPU.
Lexaloud
A local, private text-to-speech tool for Linux. Select text, press a hotkey, hear it read by a neural voice running on your own machine.
How it works
- Select text in any application
- Press a global hotkey (e.g., Ctrl+0)
- Hear it spoken sentence by sentence, with pause / skip / rewind controls
Lexaloud runs a small daemon on your machine that synthesizes speech using Kokoro-82M, an open-weights neural voice model. Nothing leaves your computer — no cloud API, no account, no telemetry.
To hear what Kokoro sounds like before installing, try the live demo on Hugging Face.
Features
- Global hotkey on any desktop: works on GNOME, KDE Plasma,
Sway, Hyprland, XFCE, Cinnamon, and any window manager that
supports custom keybindings. GNOME is the primary tested path with
integrated tray + hotkey UI; other desktops bind the same CLI
commands manually. See docs/hotkeys/.
- MPRIS2 / media keys: desktop media keys, GNOME's top-bar
media indicator, KDE's media widget, Bluetooth headphone buttons,
and playerctl all control Lexaloud playback with zero configuration.
Uses dbus-fast (optional dependency).
- Floating overlay: an always-on-top sentence caption bar (off
by default). Enable via [advanced] overlay = true in config.toml or
the control window's Settings tab. Supports both gtk-layer-shell
(wlroots/KWin) and X11/GNOME Wayland fallback.
- XDG GlobalShortcuts portal: Wayland-native global hotkey
binding on KDE Plasma 6+, Sway, and Hyprland via the
org.freedesktop.portal.GlobalShortcuts portal. GNOME does not support
this portal and continues using the gsettings path.
- GPU-accelerated neural TTS: Kokoro-82M via kokoro-onnx on
onnxruntime-gpu with NVIDIA CUDA. CPU fallback runs at ~10x
real-time, which is fine for reading along.
- Sentence-granularity streaming with bounded backpressure and
cooperative cancellation. Pause, skip, rewind, or stop mid-article
without audio clipping.
- 12 built-in voices: American and British, male and female,
from warm to serious. The control window lets you preview and switch
voices; see the full list in docs/models.md.
- GTK3 tray indicator + control window: visible on any desktop
that supports AppIndicator (GNOME with the ubuntu-appindicators
extension, KDE, Budgie, etc.). Voice, speed, and hotkey settings.
The CLI works without the tray on minimal setups.
- Privacy-first: see the Privacy section.
- Open-source: MIT-licensed code, Apache-2.0-licensed model
weights. See THIRD_PARTY_LICENSES.md.
Requirements
| Requirement | Details |
|---|---|
| OS | Linux only. Tier 1: Ubuntu 24.04, Debian 13. Tier 2: Fedora 41, Arch, Mint, Pop!_OS. Not supported: Windows, macOS. |
| Init system | systemd (for the --user daemon unit). Non-systemd distros (Artix, Void) can run lexaloud daemon manually. |
| Python | 3.11 or newer |
| GPU (optional) | NVIDIA with CUDA 12-compatible driver. AMD ROCm and Intel Arc are not yet supported — the daemon falls back to CPU, which runs at ~10x real-time and is fine for reading along. |
| Audio | PipeWire, PulseAudio, or bare ALSA (via PortAudio/libportaudio2). Most desktop Linux distros ship PipeWire by default. |
| Disk | ~400 MB for model weights (downloaded once on first setup) |
| Desktop (optional) | GNOME for the integrated tray + hotkey UI. KDE, Sway, XFCE, Cinnamon, and others work via manual hotkey binding — see docs/hotkeys/. The CLI works headless. |
Install
Ubuntu / Debian (Tier 1)
sudo apt install python3-venv wl-clipboard xclip libportaudio2 libnotify-bin \
python3-gi gir1.2-gtk-3.0 gir1.2-ayatanaappindicator3-0.1
git clone https://github.com/Gustavjiversen01/lexaloud.git
cd lexaloud
./scripts/install.sh
~/.local/share/lexaloud/venv/bin/lexaloud setup
systemctl --user daemon-reload
systemctl --user enable --now lexaloud.service
Then bind a hotkey — see docs/hotkeys/gnome.md
or the walkthrough lexaloud setup prints.
Full walkthrough: docs/install/ubuntu-debian.md
Fedora (Tier 2)
sudo dnf install python3 python3-pip python3-gobject gtk3 \
wl-clipboard xclip portaudio libnotify
Then the same git clone → ./scripts/install.sh → lexaloud setup →
systemctl flow. Full walkthrough:
docs/install/fedora.md
Arch / Manjaro (Tier 2)
sudo pacman -S python python-gobject gtk3 wl-clipboard xclip portaudio libnotify
Then git clone → ./scripts/install.sh → lexaloud setup → systemctl.
Full walkthrough: docs/install/arch.md
Other distros
The installer auto-detects your distro via /etc/os-release and prints
the right package names if any are missing. For distros not in the table,
file a PR against docs/install/.
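To make the distro detection concrete, here is a minimal sketch of reading /etc/os-release and mapping it to the package commands listed in the install sections above. It is not the installer's actual logic: the function names, the fallback through ID_LIKE, and the lookup table layout are assumptions for illustration.

```python
from typing import Optional

# Package commands copied from the install sections of this README.
APT = ("sudo apt install python3-venv wl-clipboard xclip libportaudio2 "
       "libnotify-bin python3-gi gir1.2-gtk-3.0 gir1.2-ayatanaappindicator3-0.1")
DNF = ("sudo dnf install python3 python3-pip python3-gobject gtk3 "
       "wl-clipboard xclip portaudio libnotify")
PACMAN = ("sudo pacman -S python python-gobject gtk3 wl-clipboard xclip "
          "portaudio libnotify")

# Hypothetical lookup table keyed by os-release ID values.
INSTALL_HINTS = {"ubuntu": APT, "debian": APT, "fedora": DNF, "arch": PACMAN}


def parse_os_release(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blanks and comments, unquoting values."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip("\"'")
    return fields


def install_hint(os_release_text: str) -> Optional[str]:
    """Return a package command for the distro, trying ID then ID_LIKE."""
    fields = parse_os_release(os_release_text)
    candidates = [fields.get("ID", "")] + fields.get("ID_LIKE", "").split()
    for name in candidates:
        if name in INSTALL_HINTS:
            return INSTALL_HINTS[name]
    return None  # unknown distro: nothing to suggest
```

Falling back to ID_LIKE is what lets a derivative such as Mint or Manjaro resolve to its parent's package list in this sketch.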
GPU backend
The installer detects NVIDIA via nvidia-smi and picks the right
lockfile automatically. To force a backend:
./scripts/install.sh --backend cuda12 # NVIDIA GPU
./scripts/install.sh --backend cpu # CPU only (AMD, Intel, or no GPU)
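The automatic choice can be pictured roughly as follows. This is a simplified sketch that only checks whether nvidia-smi is on PATH; the real installer may probe further (for example by running the tool), and the function name pick_backend is hypothetical. The backend names cuda12 and cpu are the ones accepted by --backend above.

```python
import shutil
from typing import Callable, Optional


def pick_backend(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Return "cuda12" if nvidia-smi is found on PATH, otherwise "cpu".

    The `which` parameter exists only so the lookup can be replaced in tests.
    """
    return "cuda12" if which("nvidia-smi") else "cpu"
```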
Wayland users: read this
On GNOME Wayland (the default on Ubuntu 24.04), speak-selection may
return empty for some apps (VS Code, Obsidian, Slack) because Electron
apps don't always publish to the PRIMARY selection. The reliable
workflow is:
- Ctrl+C to copy the selection to the clipboard
- Press your speak-clipboard hotkey
Both commands are in the CLI — bind whichever suits your workflow, or
bind both to different keys. Details in
docs/gotchas.md.
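If you would rather bind a single key, one option is a small wrapper that tries speak-selection first and falls back to speak-clipboard when the CLI reports an empty selection (exit code 2, as listed in the CLI section). The sketch below is not part of Lexaloud: it assumes lexaloud is on your PATH, the function names are hypothetical, and the fallback only helps if you pressed Ctrl+C beforehand, since otherwise the clipboard may hold stale text.

```python
import subprocess
from typing import Callable, Sequence

EXIT_EMPTY_SELECTION = 2  # documented exit code for an empty selection


def run_cli(args: Sequence[str]) -> int:
    """Run `lexaloud <args>` and return its exit code (assumes it is on PATH)."""
    return subprocess.run(["lexaloud", *args]).returncode


def speak_with_fallback(run: Callable[[Sequence[str]], int] = run_cli) -> int:
    """Try the PRIMARY selection first; fall back to the clipboard if empty."""
    code = run(["speak-selection"])
    if code == EXIT_EMPTY_SELECTION:
        code = run(["speak-clipboard"])
    return code
```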
Not via pip install
pip install lexaloud does not give you a working installation.
The TTS runtime requires a specific install sequence for kokoro-onnx
and onnxruntime-gpu that pip cannot express in one command: the two
packages share an internal directory and silently break each other if
both are installed normally (see docs/design-rationale.md for the full
story). scripts/install.sh is the only supported install path.
CLI
lexaloud speak-selection # capture PRIMARY selection, speak it
lexaloud speak-clipboard # capture CLIPBOARD (after Ctrl+C), speak it
lexaloud pause # pause at the next sentence boundary
lexaloud resume
lexaloud toggle # pause if speaking, resume if paused
lexaloud skip # skip the current sentence
lexaloud back # rewind one sentence
lexaloud stop # stop and clear the queue
lexaloud status # daemon state as JSON
lexaloud download-models # fetch model weights (~340 MB, once)
lexaloud setup # first-time configuration walkthrough
lexaloud bug-report # system diagnostics for filing issues
lexaloud daemon # run the daemon (normally via systemd)
Exit codes: 0 success, 1 error, 2 empty selection, 3 daemon down, 4 oversized payload, 5 capture tool missing/timeout.
Full reference: docs/cli-reference.md
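For scripts that wrap the CLI, the exit codes above can be mirrored in a small enum. The numeric values come from this README; the class, member names, and message strings are illustrative and are not exported by Lexaloud.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    """Exit codes as documented above; names are illustrative."""
    SUCCESS = 0
    ERROR = 1
    EMPTY_SELECTION = 2
    DAEMON_DOWN = 3
    OVERSIZED_PAYLOAD = 4
    CAPTURE_TOOL_FAILURE = 5  # capture tool missing or timed out


MESSAGES = {
    ExitCode.SUCCESS: "success",
    ExitCode.ERROR: "error",
    ExitCode.EMPTY_SELECTION: "empty selection",
    ExitCode.DAEMON_DOWN: "daemon down",
    ExitCode.OVERSIZED_PAYLOAD: "oversized payload",
    ExitCode.CAPTURE_TOOL_FAILURE: "capture tool missing/timeout",
}


def describe(code: int) -> str:
    """Translate a lexaloud exit code into a short human-readable message."""
    try:
        return MESSAGES[ExitCode(code)]
    except ValueError:
        return f"unknown exit code {code}"
```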
Privacy
Lexaloud performs no telemetry. No text, metadata, or usage
statistics are transmitted anywhere. The only outbound network calls
are the one-time model downloads on first setup, fetched over HTTPS
from the kokoro-onnx
GitHub releases page and SHA256-verified against pins in
src/lexaloud/models.py.
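Verification against a pinned digest generally looks like the sketch below. It is a generic illustration using Python's hashlib, not the code in src/lexaloud/models.py; the function names and the chunk size are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large model weights need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, pinned_hex: str) -> bool:
    """Return True only if the file's SHA256 matches the pinned hex digest."""
    return sha256_of(path) == pinned_hex.strip().lower()
```

A caller would typically delete the file and abort when verify_download returns False, rather than load unverified weights.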
The daemon listens on a Unix domain socket at
$XDG_RUNTIME_DIR/lexaloud/lexaloud.sock. The containing directory is
mode 0700, enforced by systemd's RuntimeDirectoryMode=, so only
processes running as your user can reach it. There is no open TCP port.
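For readers curious what talking HTTP over that socket looks like, here is a standard-library sketch. The socket location is taken from this README; the class and function names are hypothetical, and so is any route you pass in, since this README does not list the daemon's HTTP routes. In practice the lexaloud CLI is the supported client.

```python
import http.client
import os
import socket
from pathlib import Path
from typing import Mapping


def socket_path(env: Mapping[str, str] = os.environ) -> Path:
    """Build the documented socket path from XDG_RUNTIME_DIR."""
    return Path(env["XDG_RUNTIME_DIR"]) / "lexaloud" / "lexaloud.sock"


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that connects to a Unix domain socket instead of TCP."""

    def __init__(self, path: str, timeout: float = 5.0) -> None:
        # The host name is only used for the Host header; no TCP is involved.
        super().__init__("localhost", timeout=timeout)
        self._unix_path = path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self._unix_path)
        self.sock = sock


def fetch(route: str) -> bytes:
    """GET a route from the daemon. The route name is the caller's assumption."""
    conn = UnixHTTPConnection(str(socket_path()))
    try:
        conn.request("GET", route)
        return conn.getresponse().read()
    finally:
        conn.close()
```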
Selection text is never written to disk. Log entries that mention a
sentence replace the content with a SHA-1 fingerprint + length, so
journalctl never contains readable user text.
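The redaction idea can be sketched in a few lines. The exact log format Lexaloud uses is not shown in this README, so the output layout and the function name redact below are assumptions; only the "SHA-1 fingerprint + length" scheme comes from the text above.

```python
import hashlib


def redact(sentence: str) -> str:
    """Replace sentence text with a SHA-1 fingerprint and its length.

    The fingerprint lets two log lines be correlated without exposing
    the text itself. The output format is illustrative.
    """
    data = sentence.encode("utf-8")
    # SHA-1 is used here as a fingerprint, not for security.
    digest = hashlib.sha1(data, usedforsecurity=False).hexdigest()
    return f"<sha1={digest} len={len(sentence)}>"
```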
Known limitations (v0.3.0)
- NVIDIA only for GPU acceleration — AMD ROCm and Intel Arc are not supported. CPU fallback works on any x86_64 Linux.
- No karaoke word-level highlighting — deferred (Kokoro doesn't expose word timings).
- No browser extension — deferred.
- Sentence-level pause granularity — the last ~100 ms of the current sub-chunk may play out after pressing pause.
- GNOME Wayland primary-selection gaps: some Electron apps don't
publish to PRIMARY. Workaround: use speak-clipboard + Ctrl+C. See
docs/gotchas.md.
- GlobalShortcuts portal not supported on GNOME: GNOME 46/47 does not implement the XDG GlobalShortcuts portal. GNOME users continue using the gsettings-based hotkey path.
Full list: ROADMAP.md
Architecture
A FastAPI daemon (systemd --user) owns the TTS provider and audio
sink. A thin CLI sends HTTP requests over the Unix socket. A GTK3
tray indicator polls daemon state for visual feedback.
Component diagram + data-flow walkthrough:
docs/architecture.md. Design decisions:
docs/design-rationale.md.
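The sentence-granularity streaming listed under Features, with bounded backpressure and cooperative cancellation between the TTS provider and the audio sink, can be illustrated with a toy asyncio pipeline. This is a conceptual sketch, not the daemon's code: synthesize is a stand-in for the real provider, the list `played` stands in for the audio sink, and every name and parameter here is hypothetical.

```python
import asyncio
from typing import Optional


async def synthesize(sentence: str) -> bytes:
    """Stand-in for the TTS provider: returns fake 'audio' bytes."""
    await asyncio.sleep(0)  # yield to the event loop, as real synthesis would
    return sentence.encode("utf-8")


async def speak(sentences: list[str], max_buffered: int = 2,
                stop_after: Optional[int] = None) -> list[bytes]:
    """Stream sentences through a bounded queue; return what was 'played'."""
    queue: asyncio.Queue = asyncio.Queue(maxsize=max_buffered)
    cancel = asyncio.Event()
    played: list[bytes] = []

    async def produce() -> None:
        for sentence in sentences:
            if cancel.is_set():  # cooperative cancellation check
                break
            # put() waits while the queue is full: that is the backpressure.
            await queue.put(await synthesize(sentence))
        await queue.put(None)  # sentinel: no more audio

    async def consume() -> None:
        while (chunk := await queue.get()) is not None:
            if cancel.is_set():
                continue  # drain and discard anything queued before the stop
            played.append(chunk)  # stand-in for writing to the audio sink
            if stop_after is not None and len(played) >= stop_after:
                cancel.set()  # simulate the user pressing stop

    await asyncio.gather(produce(), consume())
    return played
```

The bounded queue keeps synthesis at most a couple of sentences ahead of playback, and because the consumer keeps draining until the sentinel arrives, a stop request never leaves the producer blocked on a full queue.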
Tests
# Set up a dev environment (one-time)
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[test]"   # quoted so shells like zsh don't glob the brackets
# Run the suite
python -m pytest tests/ --ignore=tests/test_real_kokoro_smoke.py -q
206 tests, ~2.5 seconds. No GPU or audio device required — tests use
FakeProvider + NullSink + ASGITransport.
There is also an optional integration test that uses the real Kokoro
model and sounddevice (1 extra test, 207 total):
LEXALOUD_REAL_TTS=1 python -m pytest tests/test_real_kokoro_smoke.py -s
Contributing
See CONTRIBUTING.md. Pull requests should be
signed off with git commit -s (DCO).
Please read CODE_OF_CONDUCT.md before
participating.
Security vulnerabilities: use
GitHub private vulnerability reporting
rather than public issues. See SECURITY.md.
Acknowledgments
- Kokoro-82M by hexgrad: the open-weights neural TTS model.
- kokoro-onnx by thewh1teagle: the ONNX wrapper.
- ONNX Runtime + NVIDIA CUDA for GPU-accelerated inference from Python.
- phonemizer-fork, pysbd, and sounddevice.
- The GNOME and freedesktop.org communities for GTK, libnotify, systemd-user, and AppIndicator.
Significant portions of this codebase were developed in collaboration with Claude (Anthropic) via Claude Code. Code review and final editorial decisions are the author's.
License
MIT. See LICENSE for the full text and
THIRD_PARTY_LICENSES.md for runtime
dependency disclosures (the TTS stack includes GPL-3.0 dynamic
dependencies via phonemizer-fork → espeakng-loader → espeak-ng).