Real-time speech-to-text for Linux. Hold a hotkey, speak, release — your words appear wherever your cursor is.

PushToType

PushToType is a local, real-time speech-to-text tool for Linux. It transcribes your voice using a local Whisper model and types the result directly into whatever application has focus — no clipboard, no cloud, no API keys.

An open-source alternative to Wispr Flow, which has no Linux support.


Features

  • Works everywhere: types into any focused app, including browsers, editors, terminals, and search bars
  • Local-only: faster-whisper runs on your GPU (CUDA) with automatic CPU fallback
  • No cloud: no API keys, no network required after the one-time model download
  • Fast: ~250 ms from hotkey release to text appearing
  • Configurable: TOML config file, interactive setup wizard, CLI flags
  • Wayland + X11: hotkey detection works on both display servers via evdev; text is typed with xdotool (X11) or wtype (Wayland)

Quick Start

# Install
uv tool install pushtotype   # or: pip install pushtotype

# System dependencies (X11; on Wayland, install wtype and wl-clipboard instead of xdotool)
sudo apt install libportaudio2 xdotool

# Add yourself to the input group (required for hotkey detection)
sudo usermod -aG input $USER
# Log out and back in for this to take effect

# Run the setup wizard
pushtotype config

# Start
pushtotype

Hold your configured hotkey (default: right Ctrl), speak, release. Text appears at the cursor.


How It Works

[Hold hotkey] → [Record audio] → [Whisper transcription] → [Type into focused app]
     evdev            sounddevice       faster-whisper           xdotool type

PushToType runs as a background daemon. A global hotkey listener (built on evdev, which reads key events directly from /dev/input/) starts recording when you press the hotkey. When you release it, the recorded audio is passed to faster-whisper for transcription, and xdotool type then injects the resulting text into whichever window has focus. On Wayland, the wtype and wl-clipboard tools are used in place of xdotool.
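The hold, record, transcribe, type flow can be pictured as a small state machine. The sketch below is illustrative only and is not PushToType's actual code: the class name PushToTalkSession and its methods are hypothetical, and the transcription and typing steps are passed in as plain callables so the example runs without evdev, sounddevice, faster-whisper, or xdotool installed.

```python
from typing import Callable, List


class PushToTalkSession:
    """Hypothetical hold-to-record state machine (not the project's real class)."""

    def __init__(
        self,
        transcribe: Callable[[bytes], str],
        type_text: Callable[[str], None],
    ) -> None:
        self._transcribe = transcribe  # stand-in for the Whisper step
        self._type_text = type_text    # stand-in for the xdotool/wtype step
        self._chunks: List[bytes] = []
        self.recording = False

    def on_key_down(self) -> None:
        # Key auto-repeat can deliver many "down" events; only the first
        # one starts a new recording.
        if not self.recording:
            self._chunks = []
            self.recording = True

    def on_audio(self, chunk: bytes) -> None:
        # Keep audio only while the hotkey is held.
        if self.recording:
            self._chunks.append(chunk)

    def on_key_up(self) -> None:
        # Releasing the key ends the recording and triggers transcription.
        if not self.recording:
            return
        self.recording = False
        audio = b"".join(self._chunks)
        if not audio:
            return
        text = self._transcribe(audio).strip()
        if text:
            self._type_text(text)
```

In a real daemon, on_key_down and on_key_up would be driven by the hotkey listener and on_audio by the audio input callback. Injecting the two callables keeps the control flow testable without hardware.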


Installation

Recommended: uv

uv tool install pushtotype

pip / pipx

pip install pushtotype
# or
pipx install pushtotype

From source

git clone https://github.com/danielgraviet/pushtotype.git
cd pushtotype
uv pip install -e ".[dev]"

System Requirements

Requirement            Notes
Linux                  X11 or Wayland
Python                 3.10+
libportaudio2          sudo apt install libportaudio2
xdotool                X11 only: sudo apt install xdotool
wtype + wl-clipboard   Wayland only: sudo apt install wtype wl-clipboard
input group            sudo usermod -aG input $USER
NVIDIA GPU             Recommended for speed; CPU works but is slower

Configuration

The config file lives at ~/.config/pushtotype/config.toml. Run pushtotype config to create it interactively.

[hotkey]
keys = ["KEY_RIGHTCTRL"]

[audio]
device = "default"
sample_rate = 16000

[model]
name = "base.en"
device = "auto"
compute_type = "float16"

[feedback]
enabled = true
volume = 0.5

[output]
method = "auto"   # "auto", "x11", or "wayland"
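
If you want to inspect or sanity-check the file from a script, a minimal sketch along these lines may help. It assumes Python 3.11+ for the standard-library tomllib module (on 3.10 the third-party tomli package exposes the same loads function). The parse_config helper and its validation rule are hypothetical, not part of PushToType. The only check it makes is the one documented above: output.method must be "auto", "x11", or "wayland".

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:  # Python 3.10: pip install tomli
    import tomli as tomllib

VALID_OUTPUT_METHODS = {"auto", "x11", "wayland"}


def parse_config(text: str) -> dict:
    """Parse config text and check output.method (hypothetical helper)."""
    config = tomllib.loads(text)
    # Treat a missing [output] table as "auto" (an assumption for this sketch).
    method = config.get("output", {}).get("method", "auto")
    if method not in VALID_OUTPUT_METHODS:
        raise ValueError(
            f"output.method must be one of {sorted(VALID_OUTPUT_METHODS)}, "
            f"got {method!r}"
        )
    return config


EXAMPLE = """
[hotkey]
keys = ["KEY_RIGHTCTRL"]

[audio]
device = "default"
sample_rate = 16000

[model]
name = "base.en"
device = "auto"
compute_type = "float16"

[feedback]
enabled = true
volume = 0.5

[output]
method = "auto"   # "auto", "x11", or "wayland"
"""

config = parse_config(EXAMPLE)
```

Each TOML table becomes a nested dict, so the model name is available as config["model"]["name"].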

Config priority (highest to lowest)

  1. CLI flags (e.g. --model small.en)
  2. Environment variables (e.g. PUSHTOTYPE_MODEL=small.en)
  3. Config file (~/.config/pushtotype/config.toml)
  4. Built-in defaults
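
One way to picture this priority order is as a chain of lookups in which the first layer that defines a key wins. The sketch below is an illustration, not PushToType's implementation: the effective_config function is hypothetical, and it uses flat dotted keys such as "model.name" purely to keep the example short.

```python
from collections import ChainMap
from typing import Any, Dict, Optional

# Illustrative defaults only, taken from the sample config file.
DEFAULTS: Dict[str, Any] = {"model.name": "base.en", "model.device": "auto"}


def effective_config(
    cli: Dict[str, Optional[Any]],
    env: Dict[str, Any],
    file_cfg: Dict[str, Any],
    defaults: Dict[str, Any] = DEFAULTS,
) -> Dict[str, Any]:
    """Merge the four layers, highest priority first."""
    # Flags the user did not pass show up as None and must not mask lower layers.
    cli_set = {key: value for key, value in cli.items() if value is not None}
    # ChainMap searches its maps left to right, so earlier layers win.
    return dict(ChainMap(cli_set, env, file_cfg, defaults))
```

For example, effective_config({"model.name": "small.en"}, {"model.name": "medium.en"}, {}) resolves model.name to "small.en", because the CLI layer is consulted before the environment layer.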

Environment variables

Variable               Config key
PUSHTOTYPE_MODEL       model.name
PUSHTOTYPE_DEVICE      model.device
PUSHTOTYPE_AUDIO_DEV   audio.device
PUSHTOTYPE_FEEDBACK    feedback.enabled
PUSHTOTYPE_HOTKEY      hotkey.keys (comma-separated)
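
The mapping above could be turned into config overrides roughly as follows. This is a sketch under stated assumptions: the overrides_from_env function is hypothetical, and the set of strings accepted as "true" for PUSHTOTYPE_FEEDBACK is a guess, since the accepted values are not documented here. Only the variable names, the config keys, and the comma-separated hotkey format come from the table.

```python
from typing import Any, Dict, Mapping

ENV_TO_KEY = {
    "PUSHTOTYPE_MODEL": "model.name",
    "PUSHTOTYPE_DEVICE": "model.device",
    "PUSHTOTYPE_AUDIO_DEV": "audio.device",
    "PUSHTOTYPE_FEEDBACK": "feedback.enabled",
    "PUSHTOTYPE_HOTKEY": "hotkey.keys",
}

# Assumption: which spellings count as "enabled" is not specified by the docs.
TRUE_VALUES = {"1", "true", "yes", "on"}


def overrides_from_env(environ: Mapping[str, str]) -> Dict[str, Any]:
    """Collect config overrides from PUSHTOTYPE_* variables that are set."""
    overrides: Dict[str, Any] = {}
    for variable, key in ENV_TO_KEY.items():
        raw = environ.get(variable)
        if not raw:
            continue  # unset or empty: leave this key to lower-priority layers
        if key == "feedback.enabled":
            overrides[key] = raw.strip().lower() in TRUE_VALUES
        elif key == "hotkey.keys":
            # Comma-separated list, e.g. "KEY_LEFTCTRL,KEY_SPACE"
            overrides[key] = [part.strip() for part in raw.split(",") if part.strip()]
        else:
            overrides[key] = raw
    return overrides
```

Passing os.environ as the argument reads the real environment; passing a plain dict makes the function easy to test.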

CLI Reference

pushtotype                  Start the push-to-talk daemon
pushtotype config           Run the interactive setup wizard
pushtotype config --show    Print the current effective config
pushtotype devices          List available audio input devices
pushtotype test             Record 5 seconds and transcribe (verify setup)
pushtotype download [MODEL] Pre-download a Whisper model

Global flags:

-v, --verbose     Enable debug logging (shows per-step timings)
-q, --quiet       Suppress all output except errors
--log-file PATH   Write logs to a file
--model NAME      Override model (e.g. small.en)
--hotkey COMBO    Override hotkey (e.g. ctrl+shift+s)
--device INDEX    Override audio device index
--no-feedback     Disable start/stop beeps
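
For readers who want to wrap or script the CLI, the global flags can be modeled with argparse as in the sketch below. This is a hypothetical reconstruction based only on the flag list above, not the project's actual parser: subcommands are left out, making -v and -q mutually exclusive is an assumption, and so is parsing --device as an integer.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering only the documented global flags."""
    parser = argparse.ArgumentParser(prog="pushtotype")
    # Assumption: verbose and quiet cannot be combined.
    verbosity = parser.add_mutually_exclusive_group()
    verbosity.add_argument("-v", "--verbose", action="store_true",
                           help="enable debug logging")
    verbosity.add_argument("-q", "--quiet", action="store_true",
                           help="suppress all output except errors")
    parser.add_argument("--log-file", metavar="PATH", help="write logs to a file")
    parser.add_argument("--model", metavar="NAME", help="override model")
    parser.add_argument("--hotkey", metavar="COMBO", help="override hotkey")
    parser.add_argument("--device", metavar="INDEX", type=int,
                        help="override audio device index")
    parser.add_argument("--no-feedback", action="store_true",
                        help="disable start/stop beeps")
    return parser
```

Flags that are not passed come back as None (or False for the switches), which makes it straightforward to let them fall through to environment variables and the config file.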

Troubleshooting

Permission denied on /dev/input/

You need to be in the input group:

sudo usermod -aG input $USER
# Log out and back in
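
To confirm that the group change has actually reached your current session, you can check the running process's groups rather than the group database. The helper below is a small sketch (the function name process_in_group is made up for this example). It uses only the standard grp and os modules.

```python
import grp
import os


def process_in_group(group_name: str = "input") -> bool:
    """Return True if the current process runs with the given group."""
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:
        return False  # the group does not exist on this system
    # os.getgroups() reflects the groups this process started with, which is
    # why a fresh login is needed after usermod before this returns True.
    return gid in os.getgroups() or gid == os.getegid()
```

If this returns False right after running usermod, that is expected: log out and back in, then check again.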

xdotool not found

sudo apt install xdotool

Text doesn't appear in my terminal

Terminals paste with Ctrl+Shift+V rather than Ctrl+V, but that does not matter here: PushToType uses xdotool type, which simulates keystrokes and bypasses the clipboard entirely, so it should work in any terminal without special configuration.

CUDA not available

PushToType automatically falls back to CPU. Transcription will be slower (~1-3 s per 5 s of audio, versus ~0.2 s on GPU). Run pushtotype -v and check the startup output to see which device is in use.
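
The fallback idea can be sketched as "try each device in order and keep the first that loads". The code below is an assumption-laden illustration, not PushToType's actual logic: load_with_fallback and the candidate list are hypothetical, the choice of int8 on CPU is an assumption, and so is the set of exception types caught. With faster-whisper, the load_model callable could wrap WhisperModel(name, device=device, compute_type=compute_type).

```python
from typing import Any, Callable, Sequence, Tuple

# (device, compute_type) pairs, tried in order. Assumption: int8 on CPU.
CANDIDATES: Sequence[Tuple[str, str]] = (("cuda", "float16"), ("cpu", "int8"))


def load_with_fallback(
    load_model: Callable[[str, str, str], Any],
    name: str,
    candidates: Sequence[Tuple[str, str]] = CANDIDATES,
) -> Tuple[Any, str]:
    """Return (model, device) for the first candidate that loads."""
    last_error = None
    for device, compute_type in candidates:
        try:
            return load_model(name, device, compute_type), device
        except (RuntimeError, ValueError) as exc:
            # Assumption: an unavailable device surfaces as one of these errors.
            last_error = exc
    raise RuntimeError(f"could not load {name!r} on any device") from last_error
```

Returning the device alongside the model makes it easy to log which backend ended up being used.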

Model download fails / slow

Models are cached in ~/.cache/huggingface/hub/ after the first download. To pre-download a model manually:

pushtotype download base.en

wtype or wl-copy not found (Wayland)

sudo apt install wtype wl-clipboard

Known Limitations

  • English only (base.en model)
  • No AMD GPU (ROCm) support
  • Wayland session detection relies on XDG_SESSION_TYPE or WAYLAND_DISPLAY
  • No GUI — terminal only
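
As an illustration of the session-detection limitation, a check based on those two environment variables might look like the sketch below. The function names are hypothetical and the exact order of checks in PushToType may differ. Falling back to "x11" when neither variable is informative is an assumption made for this example.

```python
from typing import Mapping


def detect_session(environ: Mapping[str, str]) -> str:
    """Guess the display server from XDG_SESSION_TYPE or WAYLAND_DISPLAY."""
    session_type = environ.get("XDG_SESSION_TYPE", "").strip().lower()
    if session_type in ("wayland", "x11"):
        return session_type
    # XDG_SESSION_TYPE can be missing or "tty" (for example over SSH), so
    # fall back to checking for a Wayland socket name.
    if environ.get("WAYLAND_DISPLAY"):
        return "wayland"
    return "x11"  # assumption: default when nothing else is known


def resolve_output_method(method: str, environ: Mapping[str, str]) -> str:
    """Map output.method ("auto", "x11", "wayland") to a concrete backend."""
    return detect_session(environ) if method == "auto" else method
```

If detection guesses wrong on your system, setting method = "x11" or method = "wayland" in the [output] section avoids the guess altogether.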

Contributing

See CONTRIBUTING.md. Issues and PRs welcome.


License

MIT
