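Cross-platform voice input tool: hold a hotkey to speak, release it to type the result automatically (local SenseVoice ONNX inference; supports mixed Chinese, English, Japanese, Korean, and Cantonese)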
Reason this release was yanked:
failed CI
Whisper Input
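A cross-platform voice input tool: hold a hotkey to speak, and when you release it the recognized text is typed into the currently focused window.
It uses the official DAMO Academy SenseVoice-Small quantized ONNX model, run directly through Microsoft onnxruntime. It works locally and offline, recognizes mixed Chinese, English, Japanese, Korean, and Cantonese speech, and has built-in punctuation, inverse text normalization (ITN), and casing. On first launch the model (~231 MB) is downloaded from the ModelScope CDN in mainland China; after that the tool runs fully offline.
Supports Linux (X11) and macOS.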
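Features
- Local speech recognition, usable offline
- Mixed multilingual input (Chinese, English, and others)
- Configurable hotkey (left and right modifier keys can be distinguished)
- Browser-based settings page plus a system tray icon
- Optional launch at login
- Automatic platform detection, with the matching backend selected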
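System requirements
General
Linux
- Ubuntu 24.04+ / Debian 13+ with an X11 desktop environment (older distributions cannot install it because they lack libgirepository-2.0-dev)
- Any x86_64 CPU (inference runs on onnxruntime CPU; RTF ≈ 0.1, so short utterances are recognized in under 1 second)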
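To make the RTF figure concrete: the real-time factor is processing time divided by audio duration, so RTF ≈ 0.1 implies roughly 0.1 s of compute per second of speech. A minimal sketch of that arithmetic follows; the function name is illustrative and not part of whisper-input.

```python
def estimated_latency_s(audio_seconds: float, rtf: float = 0.1) -> float:
    """Estimate recognition time from audio length and real-time factor.

    RTF = processing_time / audio_duration, so
    processing_time = RTF * audio_duration.
    """
    if audio_seconds < 0 or rtf <= 0:
        raise ValueError("audio_seconds must be >= 0 and rtf must be > 0")
    return audio_seconds * rtf


# A 5-second utterance at RTF 0.1 needs about half a second of compute.
print(f"{estimated_latency_s(5.0):.2f} s")
```

Actual latency also depends on the CPU and on fixed overheads that this estimate ignores.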
macOS
- macOS 12+ (Monterey or later)
- Apple Silicon (recommended) or Intel Mac; both use CPU ONNX inference
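Installation
macOS
# Install system dependencies
brew install portaudio
# Install the tool itself
uv tool install whisper-input
# or: pipx install whisper-input
# Run
whisper-input
On first run, grant the following permissions under "System Settings > Privacy & Security":
- Accessibility and Input Monitoring (global hotkey listening and text input)
- Microphone (audio recording; the system shows an authorization prompt on the first recording)
⚠️ Note: a whisper-input installed from PyPI actually runs as ~/.local/share/uv/tools/whisper-input/bin/python (with pipx the path is ~/.local/pipx/venvs/whisper-input/bin/python). The macOS permission prompts name this Python binary, not "Whisper Input.app". Add the Python binary at the corresponding path to the Accessibility / Input Monitoring allowlists.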
Linux
# Install system dependencies
sudo apt install xdotool xclip pulseaudio-utils libportaudio2 \
    libgirepository-2.0-dev libcairo2-dev gir1.2-gtk-3.0
# Add yourself to the input group (evdev needs it to read /dev/input/*)
sudo usermod -aG input $USER && newgrp input
# Install the tool itself
uv tool install whisper-input
# or: pipx install whisper-input
# Run
whisper-input
On first run, whisper-input automatically downloads the SenseVoice ONNX model (~231 MB) from the DAMO Academy ModelScope CDN via modelscope.snapshot_download and caches it in ~/.cache/modelscope/hub/. After one successful download it works offline permanently.
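To check whether that first download has already happened, one option is to look for ONNX files under the cache directory. The sketch below is an assumption-laden helper, not part of whisper-input: the function name is hypothetical, and it searches recursively because the exact subdirectory layout inside ~/.cache/modelscope/hub/ is not described here.

```python
from pathlib import Path


def find_cached_onnx_models(cache_root: Path) -> list[Path]:
    """Return every .onnx file found under the given cache directory."""
    if not cache_root.is_dir():
        return []
    # Recursive search, since the layout below the cache root is assumed unknown.
    return sorted(cache_root.rglob("*.onnx"))


# Cache location taken from the paragraph above.
root = Path.home() / ".cache" / "modelscope" / "hub"
models = find_cached_onnx_models(root)
print("model cached" if models else "model not downloaded yet")
```

Note that this reports any ONNX model in the ModelScope cache, not specifically the SenseVoice one.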
Install from source (contributors)
git clone https://github.com/pkulijing/whisper-input
cd whisper-input
bash scripts/setup_macos.sh # or setup_linux.sh
uv run whisper-input
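Run options
# Choose the hotkey
whisper-input -k KEY_FN # macOS: Fn/Globe key
whisper-input -k KEY_RIGHTALT # Linux: right Alt key
# More options
whisper-input --help
After startup the browser settings page opens automatically; it is also reachable from the system tray icon.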
Release process (maintainers)
PyPI distribution is triggered by a tag through GitHub Actions and uses Trusted Publishing (OIDC):
- Bump the version field in pyproject.toml
- git commit -am "release: v0.5.1" and push to master
- git tag v0.5.1 && git push --tags
- .github/workflows/release.yml triggers automatically: verify that the tag matches version → uv build → pypa/gh-action-pypi-publish publishes to PyPI → create a GitHub Release
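Usage
- After starting the program, hold the hotkey to begin recording
  - macOS default: Fn (Globe) key
  - Linux default: right Ctrl key
- Speak into the microphone
- Release the hotkey and wait for recognition to finish
- The recognized text is typed at the current cursor position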
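Configuration
The configuration file is config.yaml; settings can also be changed in the browser settings page:
| Option | Description | macOS default | Linux default |
|---|---|---|---|
| hotkey | Trigger hotkey | KEY_RIGHTMETA | KEY_RIGHTCTRL |
| sensevoice.language | Recognition language | auto | auto |
| sensevoice.use_itn | Inverse text normalization | true | true |
| input_method | Input method | clipboard | clipboard |
| sound.enabled | Recording cue sound | true | true |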
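The defaults in the table can be modeled as a per-platform dictionary with user overrides layered on top. The sketch below is an assumption about structure, not whisper-input's actual loader: it treats dotted option names such as sensevoice.language as nested keys, and default_config and merge are hypothetical helpers.

```python
import copy
import sys

# Defaults copied from the table above; dotted names are modeled as nesting.
_COMMON = {
    "sensevoice": {"language": "auto", "use_itn": True},
    "input_method": "clipboard",
    "sound": {"enabled": True},
}
_HOTKEYS = {"darwin": "KEY_RIGHTMETA", "linux": "KEY_RIGHTCTRL"}


def default_config(platform: str = sys.platform) -> dict:
    """Build the default settings for 'linux' or 'darwin'."""
    if platform not in _HOTKEYS:
        raise ValueError(f"unsupported platform: {platform}")
    cfg = copy.deepcopy(_COMMON)
    cfg["hotkey"] = _HOTKEYS[platform]
    return cfg


def merge(base: dict, overrides: dict) -> dict:
    """Recursively overlay overrides on base without mutating either."""
    out = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = copy.deepcopy(value)
    return out


# Turn off the cue sound while keeping every other default.
cfg = merge(default_config("linux"), {"sound": {"enabled": False}})
print(cfg["hotkey"], cfg["sound"]["enabled"])
```

A recursive merge lets a user file override a single nested option without restating its siblings.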
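Known limitations
- Linux supports X11 only; Wayland is not yet supported
- The Super/Win key is intercepted by the desktop under GNOME, so using it is not recommended
- macOS needs the Accessibility permission to listen for global hotkeys
- The first run must download the SenseVoice ONNX model (about 231 MB, directly from the official DAMO Academy ModelScope repository)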
Architecture
The project uses a src layout: all Python code lives under src/whisper_input/, and it is a real package installable with pip install -e . The entry point is the console script whisper-input (equivalent to python -m whisper_input).
Hold hotkey → HotkeyListener (whisper_input.backends) → AudioRecorder (sounddevice)
Release hotkey → stt.SenseVoiceSTT (onnxruntime) → InputMethod → text typed into the focused window
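The press/release flow above can be sketched as a small coordinator. The interfaces here are hypothetical: the real HotkeyListener, AudioRecorder, SenseVoiceSTT, and InputMethod APIs are not shown in this README, so the sketch uses stand-in protocols and fake components to stay runnable without a microphone or a model.

```python
from typing import Protocol


class Recorder(Protocol):
    def start(self) -> None: ...
    def stop(self) -> bytes: ...


class Transcriber(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class TextSink(Protocol):
    def type_text(self, text: str) -> None: ...


class PushToTalk:
    """Wires hotkey press/release events to record -> transcribe -> type."""

    def __init__(self, recorder: Recorder, stt: Transcriber, sink: TextSink) -> None:
        self._recorder, self._stt, self._sink = recorder, stt, sink
        self._recording = False

    def on_press(self) -> None:
        if not self._recording:  # ignore key auto-repeat
            self._recorder.start()
            self._recording = True

    def on_release(self) -> None:
        if not self._recording:
            return
        self._recording = False
        text = self._stt.transcribe(self._recorder.stop()).strip()
        if text:  # nothing recognized: type nothing
            self._sink.type_text(text)


# Stand-ins so the sketch runs on its own.
class FakeRecorder:
    def start(self) -> None:
        pass

    def stop(self) -> bytes:
        return b"\x00\x00"


class FakeSTT:
    def transcribe(self, audio: bytes) -> str:
        return " hello world "


class PrintSink:
    def type_text(self, text: str) -> None:
        print(text)


ptt = PushToTalk(FakeRecorder(), FakeSTT(), PrintSink())
ptt.on_press()
ptt.on_release()  # prints: hello world
```

Keeping the three stages behind narrow interfaces is what would let platform backends be swapped without touching the flow itself.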
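Platform backends (whisper_input.backends) are selected automatically at runtime based on sys.platform:
- Linux: evdev reads keyboard events, and xclip/xdotool paste via the clipboard
- macOS: pynput listens for global keyboard events, and pbcopy/pbpaste plus Cmd+V paste via the clipboard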
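Selection by sys.platform can be sketched as a lookup table. The tool names come from the list above; BackendInfo and select_backend are hypothetical names used for illustration, and the actual module layout inside whisper_input.backends may differ.

```python
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class BackendInfo:
    hotkey_listener: str
    clipboard_tools: tuple[str, ...]


# sys.platform is "linux" on Linux and "darwin" on macOS.
_BACKENDS = {
    "linux": BackendInfo("evdev", ("xclip", "xdotool")),
    "darwin": BackendInfo("pynput", ("pbcopy", "pbpaste")),
}


def select_backend(platform: str = sys.platform) -> BackendInfo:
    """Return the backend description for the given platform string."""
    try:
        return _BACKENDS[platform]
    except KeyError:
        raise RuntimeError(f"unsupported platform: {platform}") from None


print(select_backend("linux").hotkey_listener)   # evdev
print(select_backend("darwin").hotkey_listener)  # pynput
```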
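STT inference layer (whisper_input.stt):
- Model: the official DAMO Academy iic/SenseVoiceSmall-onnx (quantized), downloaded from the ModelScope CDN in mainland China via modelscope.snapshot_download and cached in ~/.cache/modelscope/hub/
- Runtime: the official Microsoft onnxruntime, with no dependency on torch
- Feature extraction, BPE decoding, and meta-tag post-processing: ported from the official DAMO Academy funasr_onnx package (MIT license, ~250 lines of pure Python), matching FunASR bit for bit
- Dependency tree: onnxruntime + kaldi-native-fbank + sentencepiece + numpy + modelscope (the modelscope base package is only 36 MB and includes neither torch nor transformers)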
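As a rough illustration of what meta-tag post-processing involves, the sketch below strips tags of the form <|...|> from a decoded string. The tag format and the sample string are assumptions about typical SenseVoice output rather than something stated in this README, and the ported funasr_onnx logic does more than this (the function name is hypothetical).

```python
import re

# Assumed tag shape: "<|" + short label + "|>", e.g. language or event markers.
_META_TAG = re.compile(r"<\|[^|>]*\|>")


def strip_meta_tags(raw: str) -> str:
    """Remove every <|...|> marker and trim surrounding whitespace."""
    return _META_TAG.sub("", raw).strip()


print(strip_meta_tags("<|en|><|NEUTRAL|><|Speech|><|withitn|>Hello, world."))
```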
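Shared behavior:
- After a modifier key is pressed there is a 300 ms delay, used to tell key combinations (such as Ctrl+C) apart from a standalone trigger
- Text is pasted via the clipboard rather than typed as simulated keystrokes, which avoids garbled Chinese input
- A single CPU inference path, with no code differences between macOS and Linux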
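One plausible reading of the 300 ms rule is a small gate: the trigger fires only if the modifier is still held after the delay and no other key arrived in the meantime. The sketch below is a guess at that logic under those assumptions, not the project's implementation; ModifierGate and the KEY_C name are hypothetical, and timestamps are passed in explicitly so the behavior is easy to test.

```python
from typing import Optional


class ModifierGate:
    """Decide whether a held modifier is a standalone trigger or a combo."""

    def __init__(self, hotkey: str, delay_s: float = 0.3) -> None:
        self.hotkey = hotkey
        self.delay_s = delay_s  # the 300 ms window mentioned above
        self.active = False     # True while the standalone trigger is live
        self._pressed_at: Optional[float] = None
        self._cancelled = False

    def key_down(self, key: str, now: float) -> None:
        if key == self.hotkey:
            if self._pressed_at is None:  # ignore auto-repeat events
                self._pressed_at, self._cancelled = now, False
        elif self._pressed_at is not None and not self.active:
            # Another key arrived before the trigger fired: treat as a combo.
            self._cancelled = True

    def tick(self, now: float) -> bool:
        """Call periodically; returns True at the moment the trigger fires."""
        if (
            self._pressed_at is not None
            and not self._cancelled
            and not self.active
            and now - self._pressed_at >= self.delay_s
        ):
            self.active = True
            return True
        return False

    def key_up(self, key: str) -> bool:
        """Returns True when an active trigger ends (recording should stop)."""
        if key != self.hotkey:
            return False
        was_active = self.active
        self._pressed_at, self._cancelled, self.active = None, False, False
        return was_active


gate = ModifierGate("KEY_RIGHTCTRL")
gate.key_down("KEY_RIGHTCTRL", now=0.0)
print(gate.tick(now=0.1), gate.tick(now=0.35))  # False True
```

The cost of this design is that recording starts up to 300 ms after the key goes down, in exchange for not hijacking ordinary shortcuts.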
License
MIT
File details
Details for the file whisper_input-0.5.1.tar.gz.
File metadata
- Download URL: whisper_input-0.5.1.tar.gz
- Upload date:
- Size: 621.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5fee4df9ddc1a73ec8ed72f932311a93d2208005ad0efa0f11c8cd71502d61ec |
| MD5 | 11448739cda0a72f3b5ae966584caed9 |
| BLAKE2b-256 | 821c6444ae73515135ff244b3e6478ed371774cd1756b6291d2e813716b88106 |
Provenance
The following attestation bundles were made for whisper_input-0.5.1.tar.gz:
Publisher: release.yml on pkulijing/whisper-input
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: whisper_input-0.5.1.tar.gz
- Subject digest: 5fee4df9ddc1a73ec8ed72f932311a93d2208005ad0efa0f11c8cd71502d61ec
- Sigstore transparency entry: 1315534896
- Sigstore integration time:
- Permalink: pkulijing/whisper-input@3b63958c9164630ade737afd3a27ae7afafcf835
- Branch / Tag: refs/tags/v0.5.1
- Owner: https://github.com/pkulijing
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@3b63958c9164630ade737afd3a27ae7afafcf835
- Trigger Event: push
File details
Details for the file whisper_input-0.5.1-py3-none-any.whl.
File metadata
- Download URL: whisper_input-0.5.1-py3-none-any.whl
- Upload date:
- Size: 61.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 52097f5c8067c513dfac72722f7d2b60a01cdeb4c07362068b5063bf860be44f |
| MD5 | 0e6c6b81f4828daad467d156a2bfb9f2 |
| BLAKE2b-256 | 31b0c4eed0fd28d630fbe9b017657863fe4f6afd13cccfd20062647a009fed5b |
Provenance
The following attestation bundles were made for whisper_input-0.5.1-py3-none-any.whl:
Publisher: release.yml on pkulijing/whisper-input
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: whisper_input-0.5.1-py3-none-any.whl
- Subject digest: 52097f5c8067c513dfac72722f7d2b60a01cdeb4c07362068b5063bf860be44f
- Sigstore transparency entry: 1315534975
- Sigstore integration time:
- Permalink: pkulijing/whisper-input@3b63958c9164630ade737afd3a27ae7afafcf835
- Branch / Tag: refs/tags/v0.5.1
- Owner: https://github.com/pkulijing
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@3b63958c9164630ade737afd3a27ae7afafcf835
- Trigger Event: push