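Cross-platform voice input tool: hold a hotkey to speak, release to type automatically (local SenseVoice ONNX inference; supports mixed Chinese, English, Japanese, Korean, and Cantonese)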

Whisper Input

Cross-platform voice input tool: hold a hotkey, speak, and release to have your speech transcribed and typed into the focused window.

Uses the official DAMO Academy SenseVoice-Small quantized ONNX model, run directly through Microsoft's onnxruntime. Supports Chinese, English, Japanese, Korean, and Cantonese, with built-in punctuation, inverse text normalization, and casing. The model (~231 MB) is downloaded from the ModelScope CDN on first launch; after that, the tool runs fully offline.

Supports Linux (X11) and macOS.

Features

  • Local speech recognition, works offline
  • Mixed-language input (Chinese, English, Japanese, Korean, Cantonese)
  • Configurable hotkey (distinguishes left/right modifier keys)
  • Browser-based settings UI + system tray
  • Auto-start on login
  • Automatic platform detection with matching backend

System Requirements

Common

  • Python 3.12, plus uv (recommended) or pipx. To install uv:

    curl -LsSf https://astral.sh/uv/install.sh | sh
    

Linux

  • Ubuntu 24.04+ / Debian 13+ (X11 desktop environment)
  • Any x86_64 CPU (onnxruntime CPU inference; real-time factor (RTF) ≈ 0.1, so latency stays under 1 s for short utterances)
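
To make the RTF figure concrete: the real-time factor is processing time divided by audio duration, so at RTF ≈ 0.1 a 5-second utterance needs roughly 0.5 s of compute. A small sketch of that arithmetic follows; the function name is illustrative and not part of whisper-input.

```python
def estimated_latency_s(audio_seconds: float, rtf: float = 0.1) -> float:
    """Estimate transcription time from the real-time factor (RTF).

    RTF = processing time / audio duration,
    so processing time = RTF * audio duration.
    """
    if audio_seconds < 0 or rtf < 0:
        raise ValueError("audio_seconds and rtf must be non-negative")
    return audio_seconds * rtf


# A 5-second utterance at RTF 0.1 needs roughly half a second of compute.
print(f"{estimated_latency_s(5.0):.2f} s")
```

Actual latency also depends on CPU speed and model load time, so treat this as a rough estimate only.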

macOS

  • macOS 12+ (Monterey or later)
  • Apple Silicon (recommended) or Intel Mac, both use CPU ONNX inference

Installation

macOS

# Install system dependency
brew install portaudio

# Install the tool (--compile-bytecode skips the first-run .pyc compile step)
uv tool install --compile-bytecode whisper-input
# or: pipx install whisper-input

# One-time setup: install .app bundle + download STT model (~231 MB)
whisper-input --init

# Run
whisper-input

First-run permissions required in System Settings > Privacy & Security:

  1. Accessibility (for global hotkey listening and text input)
  2. Microphone (for voice recording; the system will prompt on first recording)

Note: On first run (or via whisper-input --init), the tool installs a minimal .app bundle at ~/Applications/Whisper Input.app. macOS permission dialogs and System Settings entries will show "Whisper Input"; grant Accessibility to that entry. To fully uninstall, run whisper-input --uninstall before uv tool uninstall whisper-input.

Linux

# Install system dependencies (see table below for details)
sudo apt install xdotool xclip pulseaudio-utils libportaudio2 \
                 libgirepository-2.0-dev libcairo2-dev gir1.2-gtk-3.0 \
                 gir1.2-ayatanaappindicator3-0.1

# Add yourself to the input group (evdev needs /dev/input/* access)
sudo usermod -aG input $USER && newgrp input

# Install the tool (--compile-bytecode skips the first-run .pyc compile step)
uv tool install --compile-bytecode whisper-input
# or: pipx install whisper-input

# One-time setup: download STT model (~231 MB)
whisper-input --init

# Run
whisper-input

System dependency reference:

| Package | Purpose | Notes |
| --- | --- | --- |
| xdotool, xclip | Text input | xclip for the X11 clipboard; xdotool to simulate Shift+Insert paste |
| libportaudio2 | Audio recording | PortAudio library, runtime dependency of Python sounddevice |
| pulseaudio-utils | Sound notifications | Provides paplay for start/stop recording sounds |
| libgirepository-2.0-dev, libcairo2-dev | Build dependencies | Headers for compiling the pygobject and pycairo C extensions |
| gir1.2-gtk-3.0 | Recording overlay | GTK 3 typelib for the recording status overlay |
| gir1.2-ayatanaappindicator3-0.1 | System tray icon | AppIndicator typelib, runtime dependency of pystray on Linux |

On first run, whisper-input downloads the SenseVoice ONNX model (~231 MB) via modelscope.snapshot_download to ~/.cache/modelscope/hub/. After one successful download, the app is fully offline.
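
The download step can be sketched roughly as below. This is a minimal illustration, not the project's actual code: ensure_model and its injectable download parameter are hypothetical, and it assumes modelscope.snapshot_download takes a model ID and returns the local directory, reusing the cache when the files are already present.

```python
from pathlib import Path
from typing import Callable, Optional

MODEL_ID = "iic/SenseVoiceSmall-onnx"  # model ID named in this README


def ensure_model(download: Optional[Callable[[str], str]] = None) -> Path:
    """Return the local model directory, downloading it on first use.

    `download` is injectable (hypothetical design) so the function can be
    exercised offline; by default it uses modelscope.snapshot_download.
    """
    if download is None:
        # Imported lazily so this module loads even without modelscope.
        from modelscope import snapshot_download

        download = snapshot_download
    return Path(download(MODEL_ID))
```

Calling ensure_model() with no arguments requires modelscope to be installed and, the first time, network access.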

From Source (Contributors)

git clone https://github.com/pkulijing/whisper-input
cd whisper-input
bash scripts/setup_macos.sh   # or setup_linux.sh
uv run whisper-input

Usage

# Specify hotkey
whisper-input -k KEY_FN          # macOS: Fn/Globe key
whisper-input -k KEY_RIGHTALT    # Linux: Right Alt key

# More options
whisper-input --help

A browser settings page opens automatically on startup; you can also access it via the system tray icon.

How to use

  1. Start the app, then hold the hotkey to begin recording
    • macOS default: Right Command key
    • Linux default: Right Ctrl key
  2. Speak into the microphone
  3. Release the hotkey, wait for recognition
  4. The recognized text is automatically typed at the cursor position

Release Flow (Maintainers)

PyPI distribution via GitHub Actions tag trigger + Trusted Publishing (OIDC):

  1. Bump version in pyproject.toml
  2. git commit -am "release: v0.5.1" and push to master
  3. git tag v0.5.1 && git push --tags
  4. .github/workflows/release.yml triggers automatically: verify tag matches version -> uv build -> publish to PyPI via pypa/gh-action-pypi-publish -> create GitHub Release

Configuration

Settings live in config.yaml and can also be edited through the browser settings UI:

| Setting | Description | macOS Default | Linux Default |
| --- | --- | --- | --- |
| hotkey | Trigger hotkey | KEY_RIGHTMETA | KEY_RIGHTCTRL |
| sensevoice.use_itn | Inverse text normalization | true | true |
| sound.enabled | Recording sound notification | true | true |
| ui.language | Interface language (zh/en/fr) | zh | zh |
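
One way to picture how these defaults combine with user overrides is a recursive merge. The sketch below is illustrative only: it assumes the dotted setting names map to nested keys in config.yaml, and default_config and merge are hypothetical helpers rather than the project's API.

```python
import copy
import sys

# Defaults taken from the settings listed above; the nested layout is an assumption.
_COMMON = {
    "sensevoice": {"use_itn": True},
    "sound": {"enabled": True},
    "ui": {"language": "zh"},
}
_HOTKEY = {"darwin": "KEY_RIGHTMETA", "linux": "KEY_RIGHTCTRL"}


def default_config(platform: str = sys.platform) -> dict:
    """Build the per-platform defaults (anything that is not macOS gets Linux's)."""
    cfg = copy.deepcopy(_COMMON)
    cfg["hotkey"] = _HOTKEY["darwin" if platform == "darwin" else "linux"]
    return cfg


def merge(base: dict, override: dict) -> dict:
    """Recursively overlay `override` on `base` without mutating either."""
    out = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = copy.deepcopy(value)
    return out
```

For example, merge(default_config("linux"), {"sound": {"enabled": False}}) turns off the notification sound while keeping every other default.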

Known Limitations

  • Linux supports X11 only; Wayland is not yet supported
  • The Super/Win key is intercepted by the GNOME desktop, so it is not recommended as a hotkey
  • macOS requires Accessibility permission for global hotkey monitoring
  • First run downloads the SenseVoice ONNX model (~231 MB from DAMO Academy ModelScope)

Technical Architecture

The project uses a src layout, with all Python code under src/whisper_input/, and installs as a standard package. The entry point is the whisper-input console script (equivalent to python -m whisper_input).

Hold hotkey -> HotkeyListener (whisper_input.backends) -> AudioRecorder (sounddevice)
Release     -> stt.SenseVoiceSTT (onnxruntime) -> InputMethod -> Text typed into focused window

Platform backends (whisper_input.backends) auto-select at runtime via sys.platform:

  • Linux: evdev for keyboard events + xclip/xdotool clipboard paste
  • macOS: pynput global keyboard listener + pbcopy/pbpaste + Cmd+V paste
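
The runtime selection described above might be sketched like this. The tool names and shortcuts come from this README, but the Backend type and select_backend function are hypothetical illustrations, not the contents of whisper_input.backends.

```python
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str
    hotkey_source: str   # library that delivers global key events
    clipboard_tool: str  # CLI used to put text on the clipboard
    paste_shortcut: str  # key combo simulated to paste


# Values are taken from this README; the Backend type itself is hypothetical.
_BACKENDS = {
    "linux": Backend("linux", "evdev", "xclip", "Shift+Insert"),
    "darwin": Backend("macos", "pynput", "pbcopy", "Cmd+V"),
}


def select_backend(platform: str = sys.platform) -> Backend:
    """Pick the backend for a sys.platform value, or fail loudly."""
    key = "linux" if platform.startswith("linux") else platform
    try:
        return _BACKENDS[key]
    except KeyError:
        raise RuntimeError(f"unsupported platform: {platform!r}") from None
```

Keeping the platform check in one place means the rest of the pipeline can stay identical on both systems.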

STT inference (whisper_input.stt):

  • Model: DAMO Academy official iic/SenseVoiceSmall-onnx (quantized), downloaded via modelscope.snapshot_download to ~/.cache/modelscope/hub/
  • Runtime: Microsoft official onnxruntime, no torch dependency
  • Feature extraction, BPE decoding, meta-tag post-processing: ported from DAMO's funasr_onnx (MIT license, ~250 lines pure Python), bit-aligned with FunASR
  • Dependency tree: onnxruntime + kaldi-native-fbank + sentencepiece + numpy + modelscope (modelscope base is only 36 MB, no torch/transformers)

Common features:

  • A 300 ms delay after a modifier key press distinguishes combos (e.g., Ctrl+C) from single-key triggers
  • Clipboard paste instead of key simulation, avoiding CJK encoding issues
  • Unified CPU inference path, zero code difference between macOS/Linux
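
The modifier-delay idea can be illustrated with a small state machine. This is a sketch of one possible approach, not the project's implementation: the class and method names are hypothetical, and timestamps are passed in explicitly so the logic can be tested without real key events.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    PENDING = auto()    # modifier down, waiting out the grace window
    RECORDING = auto()
    COMBO = auto()      # another key arrived first; ignore until release


class ModifierTrigger:
    """Decide whether a held modifier is a solo trigger or part of a combo."""

    def __init__(self, delay_s: float = 0.3) -> None:
        self.delay_s = delay_s
        self.state = State.IDLE
        self._pressed_at = 0.0

    def on_modifier_down(self, now: float) -> None:
        # Key auto-repeat sends repeated "down" events; only the first counts.
        if self.state is State.IDLE:
            self.state = State.PENDING
            self._pressed_at = now

    def on_other_key(self, now: float) -> None:
        # Another key inside the window means a combo such as Ctrl+C.
        if self.state is State.PENDING:
            self.state = State.COMBO

    def poll(self, now: float) -> bool:
        """Call periodically; True at the moment recording should start."""
        if self.state is State.PENDING and now - self._pressed_at >= self.delay_s:
            self.state = State.RECORDING
            return True
        return False

    def on_modifier_up(self, now: float) -> bool:
        """True if a recording was in progress and should be transcribed."""
        was_recording = self.state is State.RECORDING
        self.state = State.IDLE
        return was_recording
```

The COMBO state matters because a cancelled press must stay cancelled until the modifier is released; otherwise key auto-repeat would restart the grace window.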

License

MIT
