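Cross-platform voice input tool: hold a hotkey to speak, release to have the text typed automatically (local Qwen3-ASR inference via ONNX; Chinese, English, Japanese, Korean, Cantonese, plus technical terms)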

Maintainers

pkuyplijing


Project description


Whisper Input

Cross-platform voice input tool — hold a hotkey, speak, release to have speech transcribed and typed into the focused window.

Speech-to-text is handled by Qwen3-ASR from Alibaba's Qwen team, an encoder-decoder, LLM-style ASR model with strong multilingual coverage (Chinese, English, Japanese, Korean, Cantonese, and more) and built-in punctuation, inverse text normalization, and casing. Inference runs directly on Microsoft's onnxruntime and is fully offline after the first model download. Two variants can be selected on the settings page: 0.6B (default, ~990 MB, about 1.5 s for a 10 s utterance on Apple Silicon) and 1.7B (~2.4 GB, highest accuracy).

Supports Linux (X11) and macOS.

Features

Local speech recognition, works offline
Multi-language mixed input (Chinese, English, etc.)
Configurable hotkey (distinguishes left/right modifier keys)
Browser-based settings UI + system tray
Auto-start on login
Automatic platform detection with matching backend

System Requirements

Linux

Ubuntu 24.04+ / Debian 13+ (X11 desktop environment)
Any x86_64 CPU (onnxruntime CPU inference; real-time factor (RTF) ~0.1, meaning transcription takes about a tenth of the audio's duration; latency under 1 s for short utterances)
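Real-time factor is processing time divided by audio duration, so latency grows with utterance length. A minimal sketch of that arithmetic using the figures quoted in this README (the function names are illustrative, not part of whisper-input):

```python
def real_time_factor(processing_s: float, audio_s: float) -> float:
    """RTF = processing time / audio duration; below 1.0 means faster than real time."""
    if audio_s <= 0:
        raise ValueError("audio duration must be positive")
    return processing_s / audio_s


def expected_latency(rtf: float, audio_s: float) -> float:
    """Approximate transcription time for an utterance of audio_s seconds at a given RTF."""
    return rtf * audio_s


# macOS figure above: ~1.5 s to transcribe a 10 s utterance with the 0.6B model.
print(real_time_factor(1.5, 10.0))  # 0.15
# Linux figure: at RTF ~0.1, a 5 s utterance takes roughly half a second.
print(expected_latency(0.1, 5.0))  # 0.5
```

This ignores fixed costs such as model loading, which happen once at startup rather than per utterance.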

macOS

macOS 12+ (Monterey or later)
Apple Silicon (recommended) or Intel Mac; both run ONNX inference on the CPU

Installation

One-liner (recommended)

On macOS or Linux:

curl -LsSf https://raw.githubusercontent.com/pkulijing/whisper-input/master/install.sh | sh

The script interactively picks a language (中文 / English), then installs uv, Python 3.12, required system libraries, and whisper-input itself. It runs whisper-input --init (pre-downloads the ~990 MB Qwen3-ASR 0.6B ONNX model; on macOS also installs ~/Applications/Whisper Input.app) and finally asks whether to launch the app immediately. It's safe to re-run — already-installed pieces are skipped, and uv tool install --upgrade upgrades whisper-input to the latest version.

On Linux the script will offer to add the current user to the input group (requires sudo; takes effect after a logout/login cycle).

Note: piping curl into sh means trusting this repository. To review the script first, download it with curl -LsSf <URL> -o install.sh, inspect it, and then run it.

Manual installation

macOS

# Install system dependency
brew install portaudio

# Install the tool (--compile-bytecode skips the first-run .pyc compile step)
uv tool install --compile-bytecode whisper-input

# One-time setup: install .app bundle + download STT model (~990 MB for Qwen3-ASR 0.6B)
whisper-input --init

# Run
whisper-input

First-run permissions required in System Settings > Privacy & Security:

Accessibility (for global hotkey listening and text input)
Microphone (for voice recording; the system will prompt on first recording)

Note: On first run (or via whisper-input --init), the tool installs a minimal .app bundle at ~/Applications/Whisper Input.app. macOS permission dialogs and System Settings entries will show "Whisper Input" — grant Accessibility to that entry. To fully uninstall, run whisper-input --uninstall before uv tool uninstall whisper-input.
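macOS ties Privacy & Security grants to an app's identity, which is presumably why the tool wraps itself in a bundle named "Whisper Input". A minimal sketch of writing such a bundle's Info.plist with the standard-library plistlib; the bundle identifier, executable name and usage string are hypothetical, and the launcher that would live in Contents/MacOS is not created:

```python
import plistlib
from pathlib import Path


def write_minimal_app_bundle(app_dir: Path, executable: str = "whisper-input") -> Path:
    """Create <app_dir>/Contents/Info.plist for a minimal .app bundle (sketch only)."""
    contents = app_dir / "Contents"
    # Contents/MacOS would hold the launcher named by CFBundleExecutable (not created here).
    (contents / "MacOS").mkdir(parents=True, exist_ok=True)
    info = {
        "CFBundleName": "Whisper Input",  # name shown in permission dialogs
        "CFBundleIdentifier": "com.example.whisper-input",  # hypothetical identifier
        "CFBundleExecutable": executable,  # hypothetical launcher name
        "CFBundlePackageType": "APPL",
        # Text macOS shows when asking for microphone access (hypothetical wording).
        "NSMicrophoneUsageDescription": "Records speech while the hotkey is held.",
    }
    plist_path = contents / "Info.plist"
    with plist_path.open("wb") as f:
        plistlib.dump(info, f)
    return plist_path
```

Calling it with Path.home() / "Applications" / "Whisper Input.app" would target the location named above; the real --init step may write different keys.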

Linux

# Install system dependencies (see table below for details)
sudo apt install xdotool xclip pulseaudio-utils libportaudio2 \
                 libgirepository-2.0-dev libcairo2-dev gir1.2-gtk-3.0 \
                 gir1.2-ayatanaappindicator3-0.1

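# Add yourself to the input group (evdev needs read access to /dev/input/*)
# newgrp applies the group to the current shell only; log out and back in for it to apply session-wide
sudo usermod -aG input "$USER" && newgrp input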

# Install the tool (--compile-bytecode skips the first-run .pyc compile step)
uv tool install --compile-bytecode whisper-input

# One-time setup: download STT model (~990 MB for Qwen3-ASR 0.6B)
whisper-input --init

# Run
whisper-input
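usermod only edits the group database; running processes keep the group list they started with, which is why the change needs newgrp or a fresh login. A small standard-library sketch that distinguishes "configured" from "active" membership (the function names are illustrative):

```python
import grp
import os
import pwd


def group_configured(group_name: str = "input") -> bool:
    """True if the group database lists the current user as a member of group_name."""
    try:
        group = grp.getgrnam(group_name)
        user = pwd.getpwuid(os.getuid())
    except KeyError:
        return False  # unknown group (or user missing from the passwd database)
    # Membership can be supplementary (gr_mem) or the user's primary group (pw_gid).
    return user.pw_name in group.gr_mem or user.pw_gid == group.gr_gid


def group_active(group_name: str = "input") -> bool:
    """True if the current process already carries group_name in its credentials."""
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:
        return False
    return gid == os.getegid() or gid in os.getgroups()


if __name__ == "__main__":
    if group_configured() and not group_active():
        print("Added to 'input', but this session predates it: log out and back in.")
    elif not group_configured():
        print('Not in "input" yet: sudo usermod -aG input "$USER"')
    else:
        print("'input' group is active for this process.")
```

On typical distributions /dev/input/event* is owned by group input, so an active membership is what lets evdev open those devices.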

System dependency reference:

Package	Purpose	Notes
`xdotool`, `xclip`	Text input	xclip for X11 clipboard, xdotool to simulate Shift+Insert paste
`libportaudio2`	Audio recording	PortAudio library, runtime dependency of Python `sounddevice`
`pulseaudio-utils`	Sound notifications	Provides `paplay` for start/stop recording sounds
`libgirepository-2.0-dev`, `libcairo2-dev`	Build dependencies	Headers for compiling `pygobject` and `pycairo` C extensions
`gir1.2-gtk-3.0`	Recording overlay	GTK 3 typelib for the recording status overlay
`gir1.2-ayatanaappindicator3-0.1`	System tray icon	AppIndicator typelib, runtime dependency of `pystray` on Linux
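The runtime pieces in this table can be probed from Python before launching. A sketch using shutil.which for the command-line tools and ctypes.util.find_library for PortAudio; the executable-to-package mapping mirrors the table, the GTK/AppIndicator typelibs and -dev headers are not checked, and whisper-input itself may perform (or skip) these checks differently:

```python
import ctypes.util
import shutil

# Executables from the table above -> Debian/Ubuntu package that provides them.
REQUIRED_TOOLS = {
    "xdotool": "xdotool",
    "xclip": "xclip",
    "paplay": "pulseaudio-utils",
}


def missing_packages(tools: dict[str, str] = REQUIRED_TOOLS,
                     check_portaudio: bool = True) -> list[str]:
    """Return the apt packages whose runtime pieces cannot be found on this machine."""
    missing = {pkg for exe, pkg in tools.items() if shutil.which(exe) is None}
    # sounddevice loads the PortAudio shared library when it is imported.
    if check_portaudio and ctypes.util.find_library("portaudio") is None:
        missing.add("libportaudio2")
    return sorted(missing)


if __name__ == "__main__":
    pkgs = missing_packages()
    print("sudo apt install " + " ".join(pkgs) if pkgs else "All runtime tools found.")
```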

On first run, whisper-input downloads the Qwen3-ASR ONNX model (~990 MB for the 0.6B default) via modelscope.snapshot_download to ~/.cache/modelscope/hub/. After one successful download, the app is fully offline. You can switch to the 1.7B variant later from the in-app settings page (pulls an additional ~2.4 GB).
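The "download once, then offline" behavior comes down to a cache check before any network access. A sketch of that pattern with a completion marker, so an interrupted download is retried instead of being mistaken for a finished one. The marker file and the injected download callable are hypothetical stand-ins; in the real tool, modelscope.snapshot_download manages its own cache under ~/.cache/modelscope/hub/:

```python
from pathlib import Path
from typing import Callable

MARKER = ".download-complete"  # hypothetical marker file name


def ensure_model(local_dir: Path, download: Callable[[Path], None]) -> Path:
    """Return local_dir, calling download(local_dir) only if no complete copy exists yet."""
    marker = local_dir / MARKER
    if marker.exists():
        return local_dir  # cache hit: no network access needed
    local_dir.mkdir(parents=True, exist_ok=True)
    # If download raises, the marker is never written, so the next run retries.
    download(local_dir)
    marker.touch()
    return local_dir
```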

From Source (Contributors)

git clone https://github.com/pkulijing/whisper-input
cd whisper-input
bash scripts/setup.sh
uv run whisper-input

Usage

# Specify hotkey
whisper-input -k KEY_FN          # macOS: Fn/Globe key
whisper-input -k KEY_RIGHTALT    # Linux: Right Alt key

# More options
whisper-input --help

A browser settings page opens automatically on startup; you can also access it via the system tray icon.

How to use

1. Start the app, then hold the hotkey to begin recording:
   - macOS default: Right Command key
   - Linux default: Right Ctrl key
2. Speak into the microphone.
3. Release the hotkey and wait for recognition.
4. The recognized text is typed automatically at the cursor position.

Release Flow (Maintainers)

PyPI distribution via GitHub Actions tag trigger + Trusted Publishing (OIDC):

1. Bump the version in pyproject.toml
2. git commit -am "release: v0.5.1" and push to master
3. git tag v0.5.1 && git push --tags
4. .github/workflows/release.yml then runs automatically: verify the tag matches the version -> uv build -> publish to PyPI via pypa/gh-action-pypi-publish -> create a GitHub Release

Configuration

Config file config.yaml, also editable via the browser settings UI:

Setting	Description	macOS Default	Linux Default
`hotkey`	Trigger hotkey	`KEY_RIGHTMETA`	`KEY_RIGHTCTRL`
`qwen3.variant`	STT model size (`0.6B` / `1.7B`)	`0.6B`	`0.6B`
`sound.enabled`	Recording sound notification	`true`	`true`
`ui.language`	Interface language (zh/en/fr)	`zh`	`zh`
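The dotted names in this table presumably correspond to nested YAML keys (qwen3.variant being variant under a qwen3 mapping). A sketch of resolving a setting from an already-parsed config with per-platform fallbacks taken from the table; YAML parsing is omitted and the merge logic is an assumption, not whisper-input's actual code:

```python
import sys
from typing import Any

# Defaults from the table above, keyed by platform.
DEFAULTS: dict[str, dict[str, Any]] = {
    "darwin": {"hotkey": "KEY_RIGHTMETA", "qwen3": {"variant": "0.6B"},
               "sound": {"enabled": True}, "ui": {"language": "zh"}},
    "linux": {"hotkey": "KEY_RIGHTCTRL", "qwen3": {"variant": "0.6B"},
              "sound": {"enabled": True}, "ui": {"language": "zh"}},
}
_MISSING = object()


def _lookup(cfg: dict[str, Any], dotted_key: str) -> Any:
    """Walk nested dicts along 'a.b.c'; return _MISSING if any level is absent."""
    node: Any = cfg
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return _MISSING
        node = node[part]
    return node


def get_setting(user_cfg: dict[str, Any], dotted_key: str,
                platform: str = sys.platform) -> Any:
    """User value if present, otherwise the platform default; KeyError for unknown keys."""
    value = _lookup(user_cfg, dotted_key)
    if value is _MISSING:
        value = _lookup(DEFAULTS["darwin" if platform == "darwin" else "linux"], dotted_key)
    if value is _MISSING:
        raise KeyError(dotted_key)
    return value
```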

Known Limitations

Linux supports X11 only; Wayland is not yet supported
The Super/Win key is intercepted by the GNOME desktop, so it is not recommended as a hotkey
macOS requires Accessibility permission for global hotkey monitoring
First run downloads the Qwen3-ASR 0.6B ONNX model (~990 MB from ModelScope); switching to 1.7B later pulls another ~2.4 GB
Current flow is press-to-talk / release-to-transcribe (batch mode) — real-time streaming is planned for a future release

Technical Architecture

The project uses src layout with all Python code under src/whisper_input/, installable as a standard package. The entry point is the whisper-input console script (equivalent to python -m whisper_input).

Hold hotkey -> HotkeyListener (whisper_input.backends) -> AudioRecorder (sounddevice)
Release     -> stt.Qwen3ASRSTT (onnxruntime) -> InputMethod -> Text typed into focused window

Platform backends (whisper_input.backends) auto-select at runtime via sys.platform:

Linux: evdev for keyboard events + xclip/xdotool clipboard paste
macOS: pynput global keyboard listener + pbcopy/pbpaste + Cmd+V paste
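Both backends share one idea: put the text on the clipboard, then trigger a paste, so arbitrary Unicode (including CJK) never has to be mapped to individual keystrokes. A sketch of per-platform commands selected by sys.platform; the xclip selection and xdotool arguments are assumptions based on the tools named here (whether Shift+Insert pastes CLIPBOARD or PRIMARY depends on the target application), and the macOS Cmd+V step is not shown:

```python
import subprocess
import sys


def paste_commands(platform: str = sys.platform) -> list[list[str]]:
    """argv lists for 'put text on the clipboard, then paste it' on each backend."""
    if platform == "darwin":
        # pbcopy reads the text from stdin; the Cmd+V keystroke is sent separately.
        return [["pbcopy"]]
    if platform.startswith("linux"):
        # xclip reads stdin into the CLIPBOARD selection; xdotool then sends Shift+Insert.
        return [["xclip", "-selection", "clipboard"], ["xdotool", "key", "shift+Insert"]]
    raise NotImplementedError(f"unsupported platform: {platform}")


def paste_text(text: str, platform: str = sys.platform) -> None:
    """Copy text via the clipboard tool, then run any key-simulation commands."""
    copy_cmd, *key_cmds = paste_commands(platform)
    # The text travels as UTF-8 bytes on stdin, never as per-character key events.
    subprocess.run(copy_cmd, input=text.encode("utf-8"), check=True)
    for cmd in key_cmds:
        subprocess.run(cmd, check=True)
```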

STT inference (whisper_input.stt.qwen3):

Model: Qwen3-ASR ONNX int8 from zengshuishui/Qwen3-ASR-onnx on ModelScope, downloaded via modelscope.snapshot_download to ~/.cache/modelscope/hub/. Two variants side-by-side (0.6B / 1.7B), switchable via the settings page
Runtime: Microsoft official onnxruntime, no torch / transformers dependency
3-stage pipeline: conv_frontend.onnx → encoder.int8.onnx → decoder.int8.onnx (28-layer KV-cache autoregressive decoder)
Log-mel feature extraction: ~100 lines of pure numpy, numerically matching Whisper's reference extractor (within rtol=1e-4)
Tokenization: HuggingFace tokenizers (Rust byte-level BPE, ~10 MB) loading Qwen3-ASR's vocab.json + merges.txt directly — no transformers dependency
Dependency tree: onnxruntime + tokenizers + modelscope + numpy (modelscope base is only 36 MB, no torch/transformers)
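The decoder stage is a standard autoregressive loop: the encoder output and prompt prime the KV cache, then one token is generated per step while the cache holds the attention keys/values of everything already processed. A pure-Python sketch of greedy decoding over an abstract step function; the real decoder.int8.onnx input/output names, prompt format and special-token ids are not documented here, so step, start_id and eos_id are placeholders and the default token limit is arbitrary:

```python
from typing import Any, Callable, List, Sequence, Tuple

# step(token_id, cache) -> (logits over the vocabulary, updated cache).
# In an ONNX setup this would wrap the decoder session; here it is a hypothetical callable.
StepFn = Callable[[int, Any], Tuple[Sequence[float], Any]]


def greedy_decode(step: StepFn, start_id: int, eos_id: int,
                  max_new_tokens: int = 256, cache: Any = None) -> List[int]:
    """Greedy autoregressive decoding that feeds only the newest token each step."""
    token, out = start_id, []
    for _ in range(max_new_tokens):
        # The cache carries keys/values for all earlier positions, so each step
        # processes one new position instead of re-running the whole prefix.
        logits, cache = step(token, cache)
        token = max(range(len(logits)), key=logits.__getitem__)  # argmax
        if token == eos_id:
            break
        out.append(token)
    return out
```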

Common features:

300ms delay on modifier key press to distinguish combos (e.g., Ctrl+C) from single triggers
Clipboard paste instead of key simulation, avoiding CJK encoding issues
The same CPU inference code path on macOS and Linux, with no platform-specific differences
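The 300 ms rule can be expressed as a small state machine: a hotkey press becomes a recording only if no other key arrives within the window. A sketch with explicit timestamps, so it can be driven by any event source (evdev, pynput) or by a test; the class and method names are hypothetical, and what the real tool does when another key is pressed mid-recording is not specified here:

```python
from dataclasses import dataclass, field
from typing import List, Optional

HOLD_DELAY_S = 0.3  # the 300 ms window from the README


@dataclass
class PushToTalk:
    """Hold-to-record state machine that ignores the hotkey when it is part of a combo."""
    hotkey: str
    delay_s: float = HOLD_DELAY_S
    pressed_at: Optional[float] = None
    combo: bool = False
    recording: bool = False
    events: List[str] = field(default_factory=list)

    def on_press(self, key: str, now: float) -> None:
        if key == self.hotkey:
            if self.pressed_at is None:  # ignore key auto-repeat while held
                self.pressed_at, self.combo = now, False
        elif self.pressed_at is not None and not self.recording:
            self.combo = True  # e.g. Ctrl+C: another key arrived before recording started

    def tick(self, now: float) -> None:
        """Called periodically (or by a timer); starts recording once the delay elapses."""
        if (self.pressed_at is not None and not self.combo and not self.recording
                and now - self.pressed_at >= self.delay_s):
            self.recording = True
            self.events.append("start")

    def on_release(self, key: str, now: float) -> None:
        if key != self.hotkey or self.pressed_at is None:
            return
        if self.recording:
            self.events.append("stop")  # the recorded audio would go to STT here
        self.pressed_at, self.combo, self.recording = None, False, False
```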

License

MIT

Project details


Release history

0.9.0 (Apr 25, 2026)
0.8.0 (Apr 25, 2026)
0.8.0a2, pre-release (Apr 22, 2026)
0.8.0a1, pre-release (Apr 22, 2026), this version
0.7.3 (Apr 21, 2026)
0.7.2 (Apr 19, 2026)
0.7.1 (Apr 17, 2026)
0.7.0 (Apr 17, 2026)
0.6.1 (Apr 17, 2026)
0.6.0 (Apr 16, 2026)
0.6.0b2, pre-release (Apr 16, 2026)
0.6.0b1, pre-release (Apr 16, 2026)
0.6.0a3, pre-release (Apr 16, 2026)
0.6.0a2, pre-release (Apr 16, 2026)
0.5.2 (Apr 16, 2026)
0.5.1, yanked (Apr 16, 2026); reason: failed CI
0.5.0 (Apr 15, 2026)

Download files


Source Distribution

whisper_input-0.8.0a1.tar.gz (1.3 MB)

Uploaded Apr 22, 2026 Source

Built Distribution


whisper_input-0.8.0a1-py3-none-any.whl (178.1 kB)

Uploaded Apr 22, 2026 Python 3

File details

Details for the file whisper_input-0.8.0a1.tar.gz.

File metadata

Download URL: whisper_input-0.8.0a1.tar.gz
Upload date: Apr 22, 2026
Size: 1.3 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for whisper_input-0.8.0a1.tar.gz
Algorithm	Hash digest
SHA256	`cdd6efe314cfdacf21747327810da5460ef46f06467578db1380bbcd333c7ccd`
MD5	`1f340cb12e655afbc370945f19f4ede5`
BLAKE2b-256	`7393186450070ecb1f50e183877a17ea173469d05eee9a7cfb209ff84d1e3b75`


Provenance

The following attestation bundles were made for whisper_input-0.8.0a1.tar.gz:

Publisher: release.yml on pkulijing/whisper-input

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: whisper_input-0.8.0a1.tar.gz
- Subject digest: cdd6efe314cfdacf21747327810da5460ef46f06467578db1380bbcd333c7ccd
- Sigstore transparency entry: 1355723080
- Sigstore integration time: Apr 22, 2026
Source repository:
- Permalink: pkulijing/whisper-input@95921f81234e120028de976bb51fc2143222e237
- Branch / Tag: refs/tags/v0.8.0a1
- Owner: https://github.com/pkulijing
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@95921f81234e120028de976bb51fc2143222e237
- Trigger Event: push

File details

Details for the file whisper_input-0.8.0a1-py3-none-any.whl.

File metadata

Download URL: whisper_input-0.8.0a1-py3-none-any.whl
Upload date: Apr 22, 2026
Size: 178.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for whisper_input-0.8.0a1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`adc171204718a6793bba5695d9ff2e05f59517b85f701bf3c42ea578cb8b584d`
MD5	`f8ebdfc48769a15cc2c286b41a947823`
BLAKE2b-256	`b7ad4ec991485c1481e563318c3790eecfede4cd8b652b6537d31ab06c2a6bac`


Provenance

The following attestation bundles were made for whisper_input-0.8.0a1-py3-none-any.whl:

Publisher: release.yml on pkulijing/whisper-input

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: whisper_input-0.8.0a1-py3-none-any.whl
- Subject digest: adc171204718a6793bba5695d9ff2e05f59517b85f701bf3c42ea578cb8b584d
- Sigstore transparency entry: 1355723099
- Sigstore integration time: Apr 22, 2026
Source repository:
- Permalink: pkulijing/whisper-input@95921f81234e120028de976bb51fc2143222e237
- Branch / Tag: refs/tags/v0.8.0a1
- Owner: https://github.com/pkulijing
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@95921f81234e120028de976bb51fc2143222e237
- Trigger Event: push

whisper-input 0.8.0a1
