Voice-to-smart-paste pipeline for Linux
Reason this release was yanked:
BYO-key users hit ModuleNotFoundError: No module named 'baml_client' on first rewrite — fixed in next release
Talk-to-Tux
Voice-to-smart-paste pipeline for Linux. Hold a mouse button and speak: your speech is transcribed, rewritten by an LLM, and pasted into your active application, formatted for the context you're in.
flowchart LR
A["Hold Button"] --> B["Record Audio"]
B --> C["STT (Whisper)"]
C --> D["LLM Rewrite"]
D --> E["Smart Paste"]
F["App Context\n(window + screenshot)"] --> D
How It Works
- Hold your mouse side button (or keyboard hotkey) and speak. The microphone is always listening into a 2-second ring buffer, so any speech immediately before the button press is also captured.
- Release the button. A voice activity detector (VAD) checks the audio for speech before it is sent to the STT provider chain.
- The transcription is rewritten by an LLM using your active window's context (app name, window title, AT-SPI widget text, screenshot).
- The result is pasted into the focused application using the paste shortcut configured for that application (see Per-App Rules).
- Double-tap the side button after a paste to send Enter (e.g., submit a chat message)
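The pre-roll behaviour in the first step can be pictured with a bounded deque. This is a minimal sketch, not the project's recorder: the `PreRollBuffer` class, the `start_recording` helper, and the 16 kHz default sample rate are assumptions made for illustration.

```python
from collections import deque


class PreRollBuffer:
    """Keep only the most recent `seconds` of mono audio samples (hypothetical helper)."""

    def __init__(self, seconds: float = 2.0, sample_rate: int = 16_000) -> None:
        self.capacity = int(seconds * sample_rate)
        # A deque with maxlen silently discards the oldest samples once it is full.
        self._samples = deque(maxlen=self.capacity)

    def write(self, chunk: list) -> None:
        """Append one chunk from the always-on microphone stream."""
        self._samples.extend(chunk)

    def snapshot(self) -> list:
        """Return the buffered samples, oldest first, without clearing them."""
        return list(self._samples)


def start_recording(pre_roll: PreRollBuffer) -> list:
    """On button press, seed the recording with whatever was just said."""
    return pre_roll.snapshot()


# Tiny sample rate so the demo stays readable: capacity is 2 s * 4 Hz = 8 samples.
buf = PreRollBuffer(seconds=2.0, sample_rate=4)
buf.write([1, 2, 3, 4, 5])
buf.write([6, 7, 8, 9, 10])        # samples 1 and 2 fall out of the window
recording = start_recording(buf)
recording.extend([11, 12])         # samples captured while the button is held
print(recording)
```

Only the last eight samples survive in the buffer, so the recording starts with the two seconds of audio that preceded the press.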
Works on both X11 and Wayland (GNOME, tested on Ubuntu 24.04+).
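Supporting both display servers means choosing tools at runtime. The sketch below shows one common way to tell the two apart from environment variables; the `detect_session` function and the `CLIPBOARD_TOOL` mapping are illustrative assumptions, not the project's actual detection code.

```python
import os
from collections.abc import Mapping


def detect_session(env: Mapping = os.environ) -> str:
    """Return 'wayland', 'x11', or 'unknown' based on standard session variables."""
    session = env.get("XDG_SESSION_TYPE", "").lower()
    if session in ("wayland", "x11"):
        return session
    # Fall back to the display sockets when XDG_SESSION_TYPE is missing or 'tty'.
    if env.get("WAYLAND_DISPLAY"):
        return "wayland"
    if env.get("DISPLAY"):
        return "x11"
    return "unknown"


# Assumed mapping from session type to the clipboard writer to call.
CLIPBOARD_TOOL = {"wayland": "wl-copy", "x11": "xclip"}

session = detect_session({"XDG_SESSION_TYPE": "wayland", "DISPLAY": ":0"})
print(session, CLIPBOARD_TOOL.get(session))
```

Checking `XDG_SESSION_TYPE` first matters under XWayland, where both `WAYLAND_DISPLAY` and `DISPLAY` are usually set.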
Architecture
flowchart TB
subgraph Trigger["Trigger Layer"]
SB["Side Button / Keyboard Hotkey"]
end
subgraph Recording["Recording Phase"]
SB -->|hold| REC["Audio Recorder\n(sounddevice + 2s ring buffer)"]
SB -->|press| CTX0["Capture Context\nwindow + AT-SPI + screenshot"]
end
subgraph Processing["Processing Phase"]
REC -->|release| VAD["VAD Gate\n(Silero)"]
VAD -->|raw WAV| STT["STT Chain\ngpu_server -> elevenlabs -> groq -> openai -> google"]
STT --> TX["Transcription"]
CTX0 --> RP["Rewrite Prompt\n6-layer context"]
TX --> RP
RP --> LLM["BAML SmartRewrite\n(Ollama / Gemini / Groq)"]
end
subgraph Output["Paste Phase"]
LLM --> TT["Tooltip / Notification\n(confirm or auto-paste)"]
TT --> PASTE["Paster\nxclip/wl-copy + xdotool/wtype"]
end
Prerequisites
- OS: Linux with GNOME (X11 or Wayland)
- Python: 3.11+
- STT backend: GPU server with CUDA, or an API key for ElevenLabs/Groq/OpenAI/Google
System packages
| Tool(s) | Group | Ubuntu/Debian (apt) | Arch (pacman) | Fedora (dnf) |
|---|---|---|---|---|
| libportaudio2 | Audio | libportaudio2 | portaudio | portaudio |
| grim | Screenshots (Wayland) | grim | grim | grim |
| scrot | Screenshots (X11/XWayland) | scrot | scrot | scrot |
| xclip | Clipboard (X11) | xclip | xclip | xclip |
| wl-copy, wl-paste | Clipboard (Wayland) | wl-clipboard | wl-clipboard | wl-clipboard |
| xdotool | Keystroke injection (X11) | xdotool | xdotool | xdotool |
| wtype | Keystroke injection (Wayland/wlroots) | wtype | wtype | wtype |
| ydotool + ydotoold | Keystroke injection (GNOME Wayland) | see note below | ydotool | ydotool |
| dbus-send, busctl, gdbus | D-Bus utilities | dbus / systemd | dbus / systemd | dbus / systemd |
| notify-send | Desktop notifications | libnotify-bin | libnotify | libnotify |
| evtest | Input device debug | evtest | evtest | evtest |
| pgrep | Process checks | procps | procps-ng | procps-ng |
ydotool on Ubuntu/Debian — build v1.0+ from source
Ubuntu's apt ships ydotool 0.1.8, which has no daemon and injects garbled keystrokes. Build v1.0 or later from source:
# Build dependencies
sudo apt install cmake libevdev-dev libudev-dev
# Clone and build
git clone https://github.com/ReimuNotMoe/ydotool
cd ydotool && cmake -B build && cmake --build build && sudo cmake --install build
# Enable the daemon and grant /dev/uinput access
systemctl --user enable --now ydotoold
sudo usermod -aG input $USER # re-login required
# or add udev rule: echo 'KERNEL=="uinput", GROUP="input", MODE="0660"' | sudo tee /etc/udev/rules.d/99-uinput.rules
Arch and Fedora ship a working ydotool via their package managers.
Self-verify
uv run talk-to-tux --doctor
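What `--doctor` checks is not spelled out here. As a rough idea of the kind of check involved, the sketch below looks up a few tools from the system packages table on `PATH`. The `REQUIRED` grouping and the `missing_tools` function are assumptions for illustration and do not reflect the real diagnostics.

```python
import shutil
from collections.abc import Callable

# Assumed grouping of command-line tools by session type (names from the table above).
REQUIRED = {
    "x11": ["xclip", "xdotool", "scrot", "notify-send"],
    "wayland": ["wl-copy", "wl-paste", "grim", "notify-send"],
}


def missing_tools(session: str, which: Callable = shutil.which) -> list:
    """Return the tools for `session` that cannot be found on PATH."""
    return [tool for tool in REQUIRED.get(session, []) if which(tool) is None]


def fake_which(tool: str):
    """Stand-in for shutil.which so the demo does not depend on this machine."""
    installed = {"xclip", "scrot", "notify-send"}
    return f"/usr/bin/{tool}" if tool in installed else None


print(missing_tools("x11", which=fake_which))
```

Calling `missing_tools("x11")` without the `which` argument performs the real lookup with `shutil.which`.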
Quick Start
git clone https://github.com/viperjuice/talk-to-tux.git
cd talk-to-tux
uv sync --all-groups
# Copy and edit config — secrets live under ~/.config (CWD .env is not loaded)
mkdir -p ~/.config/talk-to-tux
cp .env.example ~/.config/talk-to-tux/secrets.env
# Run
uv run talk-to-tux
On first run, the app auto-detects your mouse and starts listening for side button presses. A system tray indicator shows the current state.
Trigger Modes and Key Mapping
Default: Mouse Side Buttons (hold-to-record)
| Button | evdev Code | Action |
|---|---|---|
| BTN_SIDE (thumb back) | 275 | Either button starts recording |
| BTN_EXTRA (thumb forward) | 276 | Release ALL buttons to stop |
The device is grabbed exclusively so side buttons don't trigger browser back/forward. All other mouse events (movement, clicks, scroll) are forwarded transparently via uinput.
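The start and stop rule in the table (either button starts, all buttons released stops) amounts to tracking the set of held trigger buttons. A minimal sketch under that reading follows; `HoldTracker` is a hypothetical name, and the real daemon's event loop, device grab, and uinput forwarding are omitted.

```python
from typing import Optional

BTN_SIDE, BTN_EXTRA = 275, 276  # evdev codes from the table above


class HoldTracker:
    """Turn raw key events into 'start' / 'stop' transitions (illustrative only)."""

    def __init__(self, button_codes=(BTN_SIDE, BTN_EXTRA)) -> None:
        self.button_codes = frozenset(button_codes)
        self._held = set()

    def feed(self, code: int, value: int) -> Optional[str]:
        """Process one EV_KEY event: value 1 is a press, 0 is a release."""
        if code not in self.button_codes:
            return None  # not a trigger button; the daemon would forward it
        was_recording = bool(self._held)
        if value == 1:
            self._held.add(code)
        elif value == 0:
            self._held.discard(code)
        recording = bool(self._held)
        if recording and not was_recording:
            return "start"
        if was_recording and not recording:
            return "stop"
        return None


tracker = HoldTracker()
events = [(275, 1), (276, 1), (275, 0), (276, 0)]
transitions = [tracker.feed(code, value) for code, value in events]
print(transitions)
```

Pressing the second button while the first is held does nothing, and recording stops only once the last held button is released.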
Alternative: Keyboard Hotkey
| Key Combo | evdev Names | Action |
|---|---|---|
| Ctrl + Super (left) | KEY_LEFTCTRL+KEY_LEFTMETA | Toggle recording |
Note: Pressing Super can also open the GNOME Activities overview. To disable that overlay key:
gsettings set org.gnome.mutter overlay-key ''
Customizing the Trigger
Option 1: TOML config (~/.config/talk-to-tux/config.toml)
[trigger]
mode = "mouse" # "auto", "mouse", or "keyboard"
record_mode = "hold" # "hold" (release to stop) or "toggle" (tap/tap)
[trigger.mouse]
button_codes = [275, 276] # any evdev button codes
device_name = "Logitech G502" # match by name substring (stable across reboots and USB replug)
# device_path = "/dev/input/event5" # or explicit path (fragile)
grab = true
[trigger.keyboard]
hotkey = "KEY_LEFTCTRL+KEY_LEFTMETA"
Option 2: Environment variables
TTT_TRIGGER_MODE=keyboard
TTT_HOTKEY=KEY_RIGHTCTRL
TTT_RECORD_MODE=toggle
# Or nested format:
TTT_TRIGGER__MOUSE__BUTTON_CODES='[275, 276]'
TTT_TRIGGER__MOUSE__DEVICE_NAME="Logitech"
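The nested format reads as a path into the TOML structure, with `__` separating levels. The sketch below shows one way such names could be expanded into a nested dictionary. It is an assumption about the mapping, not the project's loader: the `env_to_nested` function is hypothetical, values are decoded as JSON where possible, and flat names such as `TTT_HOTKEY` are skipped.

```python
import json

PREFIX = "TTT_"
DELIM = "__"


def env_to_nested(env: dict) -> dict:
    """Expand TTT_A__B__C=value entries into {'a': {'b': {'c': value}}}."""
    result: dict = {}
    for name, raw in env.items():
        if not name.startswith(PREFIX) or DELIM not in name:
            continue  # unrelated variable, or a flat alias this sketch ignores
        *parents, leaf = name[len(PREFIX):].lower().split(DELIM)
        node = result
        for key in parents:
            node = node.setdefault(key, {})
        try:
            node[leaf] = json.loads(raw)   # lists, numbers, booleans
        except json.JSONDecodeError:
            node[leaf] = raw               # plain strings stay as they are
    return result


nested = env_to_nested({
    "TTT_TRIGGER__MOUSE__BUTTON_CODES": "[275, 276]",
    "TTT_TRIGGER__MOUSE__DEVICE_NAME": "Logitech",
    "TTT_HOTKEY": "KEY_RIGHTCTRL",
    "HOME": "/home/user",
})
print(nested)
```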
Option 3: CLI flags
uv run talk-to-tux --trigger keyboard --record-mode toggle
Finding Your Button Codes
# List input devices
sudo evtest
# Pick your mouse, press buttons, note the codes:
# Event: type 1 (EV_KEY), code 275 (BTN_SIDE), value 1
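If you capture evtest output to a file, the button codes can be pulled out with a short regular expression. This is a convenience sketch; the `parse_key_event` helper is hypothetical and assumes event lines shaped like the example above, with or without a leading timestamp.

```python
import re

# Matches e.g. "type 1 (EV_KEY), code 275 (BTN_SIDE), value 1"
EVENT_RE = re.compile(r"type (\d+) \((\w+)\), code (\d+) \((\w+)\), value (-?\d+)")


def parse_key_event(line: str):
    """Return (code, name, value) for EV_KEY lines, or None for anything else."""
    match = EVENT_RE.search(line)
    if match is None or match.group(2) != "EV_KEY":
        return None
    return int(match.group(3)), match.group(4), int(match.group(5))


lines = [
    "Event: type 1 (EV_KEY), code 275 (BTN_SIDE), value 1",
    "Event: time 1700000000.123456, type 2 (EV_REL), code 0 (REL_X), value -3",
    "Event: time 1700000000.223456, type 1 (EV_KEY), code 276 (BTN_EXTRA), value 0",
]
parsed = [parse_key_event(line) for line in lines]
print(parsed)
```

The numeric codes in the first element of each tuple are what goes into `button_codes`.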
Configuration
Configuration is loaded with this precedence (highest first):
- CLI arguments (--trigger mouse, --debug, etc.)
- Environment variables (TTT_* prefix)
- ~/.config/talk-to-tux/secrets.env (CWD .env is intentionally not loaded, which prevents a rogue .env in a project directory from overriding secrets)
- TOML config (~/.config/talk-to-tux/config.toml)
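The precedence order can be thought of as merging layers from lowest to highest, so later layers win. A minimal sketch of that idea, assuming each source has already been parsed into a nested dictionary; `deep_merge` and `resolve` are hypothetical names, not the project's API.

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of `base` with `override` applied, merging nested dicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve(toml_cfg: dict, secrets_env: dict, environ: dict, cli: dict) -> dict:
    """Apply layers from lowest to highest precedence."""
    resolved: dict = {}
    for layer in (toml_cfg, secrets_env, environ, cli):
        resolved = deep_merge(resolved, layer)
    return resolved


resolved = resolve(
    toml_cfg={"trigger": {"mode": "mouse", "record_mode": "hold"}},
    secrets_env={},
    environ={"trigger": {"mode": "keyboard"}},
    cli={"trigger": {"record_mode": "toggle"}},
)
print(resolved)
```

Here the environment overrides the TOML trigger mode, and the CLI flag overrides the record mode, while untouched keys would fall through from the lower layers.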
Key Settings
| Section | Setting | Default | Description |
|---|---|---|---|
| stt | providers | gpu_server,elevenlabs,groq,openai,google | STT fallback chain (tried in order) |
| stt.gpu_server | url | http://localhost:8000 | Self-hosted Whisper server URL |
| rewrite | enabled | true | Enable LLM smart rewrite |
| rewrite | ollama_base_url | http://localhost:11434/v1 | Ollama / vLLM endpoint |
| context | screenshot_enabled | true | Include screenshot in LLM context |
| ducking | enabled | true | Reduce other apps' volume while recording |
| ducking | factor | 0.15 | Duck to 15% of original volume |
| tooltip | enabled | false | Show confirm-before-paste tooltip (disabled = auto-paste) |
| tooltip | use_notifications | true | Use desktop notifications (vs GTK tooltip) |
| paste | enabled | true | Auto-paste into active window |
| indicator | enabled | true | Show system tray indicator |
Per-App Rules
Customize behavior per application in config.toml:
[[app]]
match = "Google-chrome"
match_title = "*ChatGPT*" # optional title filter (glob or ~regex)
paste_shortcut = "ctrl+shift+v" # override paste shortcut
rewrite_hint = "Conversational tone, no markdown"
[[app]]
match = "Code"
rewrite_hint = "Generate code in the language of the active file"
[[app]]
match = "kitty"
is_terminal = true
paste_shortcut = "ctrl+shift+v"
Default rules for common apps (browsers, terminals, editors, chat apps) are
shipped in src/talk_to_tux/data/default_app_rules.toml. User rules in
config.toml take priority.
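Rule selection can be sketched as a first-match scan with user rules placed ahead of the defaults. Several details below are assumptions rather than documented behaviour: that `match` is compared exactly against the window class, that the first matching rule wins, that the default paste shortcut is `ctrl+v`, and the `AppRule`, `title_matches`, and `pick_rule` names themselves.

```python
import fnmatch
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class AppRule:
    match: str                          # window class to match (assumed exact)
    match_title: Optional[str] = None   # glob, or regex when prefixed with "~"
    paste_shortcut: str = "ctrl+v"      # assumed default
    rewrite_hint: str = ""
    is_terminal: bool = False


def title_matches(pattern: Optional[str], title: str) -> bool:
    """No pattern matches everything; '~' selects regex, otherwise glob."""
    if pattern is None:
        return True
    if pattern.startswith("~"):
        return re.search(pattern[1:], title) is not None
    return fnmatch.fnmatchcase(title, pattern)


def pick_rule(rules: list, wm_class: str, title: str) -> Optional[AppRule]:
    """Return the first rule whose class and title filter both match."""
    for rule in rules:
        if rule.match == wm_class and title_matches(rule.match_title, title):
            return rule
    return None


user_rules = [
    AppRule("Google-chrome", "*ChatGPT*", "ctrl+shift+v", "Conversational tone, no markdown"),
    AppRule("kitty", paste_shortcut="ctrl+shift+v", is_terminal=True),
]
default_rules = [AppRule("Google-chrome")]  # hypothetical shipped default
rules = user_rules + default_rules           # user rules are scanned first

chat = pick_rule(rules, "Google-chrome", "ChatGPT - Google Chrome")
other = pick_rule(rules, "Google-chrome", "Inbox - Google Chrome")
print(chat.paste_shortcut, other.paste_shortcut)
```

A Chrome window whose title does not mention ChatGPT skips the user rule and falls through to the default.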
API Keys
Store API keys in ~/.config/talk-to-tux/secrets.env (CWD .env is not loaded):
TTT_OPENAI_API_KEY=sk-...
TTT_GROQ_API_KEY=gsk_...
TTT_GOOGLE_API_KEY=AIza...
TTT_ELEVENLABS_API_KEY=...
GPU Server Deployment
The STT server runs faster-whisper on NVIDIA GPUs.
cd server
uv sync
uv run ttt-server --host 0.0.0.0 --port 8000
# Or Docker:
docker build -f deploy/Dockerfile.server -t ttt-server .
docker run --gpus all -p 8000:8000 ttt-server
Systemd service files are in deploy/.
Development
uv sync --all-groups # install all deps including dev
uv run pytest tests/ -q # run all tests (1611)
make lint # ruff check
make format # ruff format
uv run baml-cli generate # regenerate BAML client after .baml changes
See CONTRIBUTING.md for the full developer guide.
CLI Reference
uv run talk-to-tux [COMMAND] [OPTIONS]
Daemon options (no subcommand):
--trigger {auto,mouse,keyboard} Trigger mode
--record-mode {toggle,hold} Recording mode
--no-indicator Disable system tray
--no-tooltip Disable tooltip/notifications
--no-validation Disable recording validation sound
--show-config Print resolved config and exit
--doctor Run diagnostics and exit
--setup Run interactive first-run setup wizard
--migrate-config Convert .env to config.toml
--debug Enable debug logging
Hosted-mode subcommands (beta):
login Sign in to hosted mode (GitHub OAuth)
logout Clear saved hosted-mode token
whoami Show signed-in account + tier + token expiry
usage Show current quota (STT hours, rewrite calls)
switch-mode {hosted, byo-key} Switch between hosted and BYO-key API modes
On a fresh install with no config.toml, no TTT_API_MODE env var, no
secrets.env, and no stored hosted token, the first invocation auto-launches
the setup wizard (same as running --setup). The wizard asks you to pick
hosted (GitHub OAuth, quota-managed) or byo-key (provide your own
OpenAI/Groq/ElevenLabs keys in secrets.env). switch-mode byo-key never
creates or modifies secrets.env — you edit it yourself.
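The fresh-install condition is a conjunction of four absence checks. A minimal sketch of that test follows; `needs_setup_wizard` is a hypothetical name, and because the location of the stored hosted token is not stated here, its presence is passed in as a boolean.

```python
import tempfile
from pathlib import Path


def needs_setup_wizard(config_dir: Path, env: dict, has_hosted_token: bool) -> bool:
    """True only when none of the four configuration signals are present."""
    return not (
        (config_dir / "config.toml").exists()
        or "TTT_API_MODE" in env
        or (config_dir / "secrets.env").exists()
        or has_hosted_token
    )


with tempfile.TemporaryDirectory() as tmp:
    config_dir = Path(tmp)
    fresh = needs_setup_wizard(config_dir, {}, has_hosted_token=False)
    # Creating secrets.env by hand (BYO-key) is enough to skip the wizard.
    (config_dir / "secrets.env").write_text("TTT_GROQ_API_KEY=gsk_...\n")
    after_keys = needs_setup_wizard(config_dir, {}, has_hosted_token=False)

print(fresh, after_keys)
```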
Running as a Service
To start Talk-to-Tux automatically on login and restart on crash:
# Copy the service file
cp deploy/talk-to-tux.service ~/.config/systemd/user/
# Enable and start
systemctl --user enable --now talk-to-tux
# Check status / logs
systemctl --user status talk-to-tux
journalctl --user -u talk-to-tux -f
The service auto-restarts within 3 seconds if the app crashes. The GNOME Shell extension also auto-hides the recording overlay after 30 seconds if the app stops responding.
License
AGPL-3.0 — see LICENSE. Commercial licensing available for organizations that need to use Talk-to-Tux without the copyleft requirements.