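Cross-platform voice input tool: hold a hotkey and speak, release it and the text is typed automatically (local SenseVoice ONNX inference; supports mixed Chinese, English, Japanese, Korean, and Cantonese)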

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

pkuyplijing

These details have not been verified by PyPI

Project description

Whisper Input

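A cross-platform voice input tool: hold down a hotkey and speak; when you release it, the recognized text is typed into the window that currently has focus.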

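It uses DAMO Academy's official quantized SenseVoice-Small ONNX model, run directly with Microsoft onnxruntime. Recognition is local and works offline, handles mixed Chinese/English/Japanese/Korean/Cantonese speech, and outputs punctuation, inverse text normalization (ITN), and casing out of the box. On first launch the model (~231 MB) is downloaded from ModelScope's mainland-China CDN; after that the tool runs fully offline.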

Supports Linux (X11) and macOS.

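Features

Local speech recognition that works offline
Mixed multilingual input (Chinese, English, and more)
Configurable hotkey (left and right modifier keys are distinguished)
Browser-based settings page + system tray
Optional launch at login
Automatic platform detection with the matching backend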

System requirements

General

Python 3.12, plus either uv (recommended) or pipx installed beforehand. To install uv:
```
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Linux

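Ubuntu 24.04+ / Debian 13+ (X11 desktop environment; older distributions cannot install the tool because they lack libgirepository-2.0-dev)
Any x86_64 CPU (inference runs on onnxruntime's CPU backend with RTF ≈ 0.1; recognition latency for short utterances is under 1 second)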
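RTF (real-time factor) is inference time divided by audio duration, so RTF ≈ 0.1 means roughly 0.1 s of compute per second of speech. The short worked example below shows that arithmetic; the 5-second clip length is an illustrative value, not a measurement from this project.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent on inference / duration of the audio processed."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def expected_latency(audio_seconds: float, rtf: float = 0.1) -> float:
    """Estimated inference time for a clip of the given length at a given RTF."""
    return audio_seconds * rtf


# Illustrative only: a 5-second utterance at RTF 0.1
print(expected_latency(5.0))       # 0.5
print(real_time_factor(0.5, 5.0))  # 0.1
```

At RTF 0.1, any utterance shorter than about 10 seconds finishes inference in under a second, which is consistent with the sub-second latency quoted for short sentences.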

macOS

macOS 12+ (Monterey or later)
Apple Silicon (recommended) or Intel Mac; both use CPU ONNX inference

Installation

macOS

```
# Install system dependencies
brew install portaudio

# Install the tool itself
uv tool install whisper-input
# or: pipx install whisper-input

# Run
whisper-input
```

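On first run, grant the following permissions under System Settings > Privacy & Security:

Accessibility and Input Monitoring (for the global hotkey listener and text input)
Microphone (for voice recording; macOS shows an authorization dialog the first time you record)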

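⚠️ Note: the whisper-input installed from PyPI actually runs as ~/.local/share/uv/tools/whisper-input/bin/python (with pipx the path is ~/.local/pipx/venvs/whisper-input/bin/python). The macOS permission dialog therefore refers to this Python binary, not to "Whisper Input.app". Add the Python at the corresponding path to the Accessibility and Input Monitoring allowlists.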
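If you are unsure which interpreter to allowlist, one way to find it is to follow the `whisper-input` launcher on your PATH and read its shebang line. The Python sketch below assumes the launcher is (or symlinks to) a script whose first line is `#!/path/to/python`, which is typical of uv/pipx console-script entry points; the helper names are hypothetical. If the launcher turns out to be a wrapper without a plain interpreter shebang, inspect it by hand instead.

```python
import os
import shutil
from typing import Optional


def launcher_interpreter(launcher_path: str) -> Optional[str]:
    """Return the interpreter named in a script's shebang, or None if there is none."""
    # Follow symlinks, e.g. ~/.local/bin/whisper-input -> <tool venv>/bin/whisper-input
    real_path = os.path.realpath(launcher_path)
    with open(real_path, "rb") as f:
        first_line = f.readline().decode("utf-8", errors="replace").strip()
    if not first_line.startswith("#!"):
        return None
    # The interpreter is the first token after "#!"
    parts = first_line[2:].split()
    return parts[0] if parts else None


def find_whisper_input_python() -> Optional[str]:
    """Locate `whisper-input` on PATH and report the interpreter it runs under."""
    launcher = shutil.which("whisper-input")
    if launcher is None:
        return None
    return launcher_interpreter(launcher)


if __name__ == "__main__":
    print(find_whisper_input_python() or "whisper-input not found on PATH")
```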

Linux

```
# Install system dependencies (see the table below for what each package is for)
sudo apt install xdotool xclip pulseaudio-utils libportaudio2 \
                 libgirepository-2.0-dev libcairo2-dev gir1.2-gtk-3.0 \
                 gir1.2-ayatanaappindicator3-0.1

# Add yourself to the input group (evdev needs this to read /dev/input/*)
sudo usermod -aG input $USER && newgrp input

# Install the tool itself
uv tool install whisper-input
# or: pipx install whisper-input

# Run
whisper-input
```

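System dependencies:

Package	Feature	Notes
`xdotool`, `xclip`	Text input	xclip reads and writes the X11 clipboard; xdotool simulates Shift+Insert to trigger the paste
`libportaudio2`	Voice recording	PortAudio audio library; runtime dependency of the Python package `sounddevice`
`pulseaudio-utils`	Notification sounds	Provides the `paplay` command, which plays the recording start/stop sounds
`libgirepository-2.0-dev`, `libcairo2-dev`	Build dependencies	Headers needed to compile the C extensions of `pygobject` (the Python GTK bindings, used for the recording overlay) and `pycairo` (an underlying dependency of pygobject); not used after installation
`gir1.2-gtk-3.0`	Recording overlay	GTK 3 typelib that `pygobject` uses to call GTK and draw the recording-status overlay
`gir1.2-ayatanaappindicator3-0.1`	System tray icon	AppIndicator typelib; runtime dependency the Python package `pystray` uses to draw the tray icon on Linux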
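The xclip/xdotool paste path in the table can be sketched with `subprocess`. This is an illustrative assumption rather than the project's code: it writes the text to the CLIPBOARD selection with `xclip -selection clipboard` and then sends Shift+Insert with `xdotool key`. The command list is built by a pure function so it can be checked without an X server.

```python
import subprocess
from typing import List, Optional, Tuple

# One step: (argv, bytes to feed on stdin, or None)
Step = Tuple[List[str], Optional[bytes]]


def paste_steps(text: str) -> List[Step]:
    """Commands that put `text` on the X11 clipboard and then trigger a paste."""
    return [
        # xclip takes the new clipboard contents from stdin (UTF-8 keeps Chinese intact)
        (["xclip", "-selection", "clipboard"], text.encode("utf-8")),
        # Simulate Shift+Insert; --clearmodifiers releases held modifiers for the keystroke
        (["xdotool", "key", "--clearmodifiers", "shift+Insert"], None),
    ]


def paste_text(text: str) -> None:
    """Run the steps; requires an X11 session with xclip and xdotool installed."""
    for argv, stdin_data in paste_steps(text):
        subprocess.run(argv, input=stdin_data, check=True)
```

Routing the text through the clipboard means only one fixed keystroke is simulated, regardless of how many CJK characters the transcript contains.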

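On first run, whisper-input automatically downloads the SenseVoice ONNX model (~231 MB) from DAMO Academy's ModelScope CDN via modelscope.snapshot_download and caches it in ~/.cache/modelscope/hub/. After one successful download it works fully offline.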

Installing from source (contributors)

```
git clone https://github.com/pkulijing/whisper-input
cd whisper-input
bash scripts/setup_macos.sh   # or setup_linux.sh
uv run whisper-input
```

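Run options

```
# Specify the hotkey
whisper-input -k KEY_FN          # macOS: Fn/Globe key
whisper-input -k KEY_RIGHTALT    # Linux: right Alt key

# More options
whisper-input --help
```

On startup the browser settings page opens automatically; it can also be reached from the system tray icon.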
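As a rough illustration of how a `-k` option taking evdev-style key names might be parsed, here is an `argparse` sketch. The `--hotkey` long form, the name validation, and the "None means use config.yaml" default are assumptions for illustration; `whisper-input --help` lists the real options.

```python
import argparse
from typing import List, Optional


def key_name(value: str) -> str:
    """Accept evdev-style names such as KEY_FN or KEY_RIGHTALT (case-insensitive)."""
    name = value.strip().upper()
    if not name.startswith("KEY_") or len(name) <= len("KEY_"):
        raise argparse.ArgumentTypeError(f"expected a KEY_* name, got {value!r}")
    return name


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="whisper-input")
    # Hypothetical: None means "fall back to the hotkey from config.yaml"
    parser.add_argument("-k", "--hotkey", type=key_name, default=None,
                        help="key to hold while speaking, e.g. KEY_RIGHTALT")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    return build_parser().parse_args(argv)
```

For example, `parse(["-k", "key_fn"]).hotkey` normalizes to `"KEY_FN"`, while a name without the `KEY_` prefix makes argparse print an error and exit.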

Release process (maintainers)

PyPI releases are published by GitHub Actions, triggered by a tag, using Trusted Publishing (OIDC):

Bump the version field in pyproject.toml
git commit -am "release: v0.5.1" and push to master
git tag v0.5.1 && git push --tags
.github/workflows/release.yml then runs automatically: check that the tag matches the version → uv build → publish to PyPI with pypa/gh-action-pypi-publish → create a GitHub Release

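Usage

After starting the program, hold down the hotkey to start recording
- macOS default: Fn (Globe) key
- Linux default: right Ctrl key
Speak into the microphone
Release the hotkey and wait for recognition to finish
The recognized text is typed at the current cursor position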

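Configuration

Settings are stored in config.yaml and can also be changed from the browser settings page:

Option	Description	macOS default	Linux default
`hotkey`	Trigger hotkey	`KEY_RIGHTMETA`	`KEY_RIGHTCTRL`
`sensevoice.language`	Recognition language	`auto`	`auto`
`sensevoice.use_itn`	Inverse text normalization	`true`	`true`
`input_method`	Text input method	`clipboard`	`clipboard`
`sound.enabled`	Recording notification sounds	`true`	`true`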
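The table can be read as per-platform defaults that the user's config.yaml overrides. The minimal sketch below shows that merge with plain dictionaries, so it does not assume any particular YAML library; the nested layout mirrors the dotted option names, and the helper names and the fallback for unknown platforms are hypothetical.

```python
import copy
import sys
from typing import Any, Dict

# Defaults shared by both platforms (values from the configuration table)
COMMON_DEFAULTS: Dict[str, Any] = {
    "sensevoice": {"language": "auto", "use_itn": True},
    "input_method": "clipboard",
    "sound": {"enabled": True},
}
PLATFORM_HOTKEY = {"darwin": "KEY_RIGHTMETA", "linux": "KEY_RIGHTCTRL"}


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of base with override applied; nested dicts merge key by key."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = value
    return result


def effective_config(user: Dict[str, Any], platform: str = sys.platform) -> Dict[str, Any]:
    # Hypothetical fallback: unknown platforms get the Linux hotkey
    hotkey = PLATFORM_HOTKEY.get(platform, "KEY_RIGHTCTRL")
    defaults = dict(COMMON_DEFAULTS, hotkey=hotkey)
    return deep_merge(defaults, user)
```

Merging nested sections key by key means a config.yaml that sets only `sensevoice.language` still keeps the default `sensevoice.use_itn`.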

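Known limitations

Linux supports X11 only; Wayland is not yet supported
On GNOME the Super/Win key is intercepted by the desktop, so it is not recommended as a hotkey
macOS requires Accessibility permission to listen for global hotkeys
The SenseVoice ONNX model (about 231 MB) must be downloaded on first run, directly from DAMO Academy's official ModelScope repository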

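Technical architecture

The project uses a src layout: all Python code lives under src/whisper_input/, and it is a real package that can be installed in editable mode with pip install -e . The entry point is the console script whisper-input (equivalent to python -m whisper_input).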

Hold the hotkey → HotkeyListener (whisper_input.backends) → AudioRecorder (sounddevice)
Release the hotkey → stt.SenseVoiceSTT (onnxruntime) → InputMethod → text typed into the focused window

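The platform backend (whisper_input.backends) is selected automatically at runtime based on sys.platform:

Linux: evdev reads keyboard events + xclip/xdotool clipboard paste
macOS: pynput global keyboard listener + pbcopy/pbpaste + Cmd+V paste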
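Selecting a backend by `sys.platform` amounts to a small dispatch. The sketch below returns only descriptive labels taken from the two lines above; the actual class and module names inside `whisper_input.backends` are not given in this README, so the `Backend` type and `select_backend` function are hypothetical.

```python
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    hotkey_listener: str  # how global key events are read
    paste_method: str     # how recognized text reaches the focused window


def select_backend(platform: str = sys.platform) -> Backend:
    """Pick the backend for the running OS; sys.platform is 'linux' or 'darwin' here."""
    if platform.startswith("linux"):
        return Backend("evdev", "xclip + xdotool Shift+Insert")
    if platform == "darwin":
        return Backend("pynput", "pbcopy/pbpaste + Cmd+V")
    raise RuntimeError(f"unsupported platform: {platform}")
```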

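STT inference layer (whisper_input.stt):

Model: DAMO Academy's official iic/SenseVoiceSmall-onnx (quantized), downloaded from ModelScope's mainland-China CDN via modelscope.snapshot_download and cached in ~/.cache/modelscope/hub/
Runtime: Microsoft's official onnxruntime, with no dependency on torch
Feature extraction, BPE decoding, and meta-tag post-processing: ported from DAMO Academy's official funasr_onnx package (MIT license, ~250 lines of pure Python), producing bit-identical results to FunASR
Dependency tree: onnxruntime + kaldi-native-fbank + sentencepiece + numpy + modelscope (the modelscope base package is only 36 MB and excludes torch/transformers)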
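SenseVoice prefixes its transcript with meta tags for language, emotion, audio event, and ITN mode (for example `<|zh|><|NEUTRAL|><|Speech|><|withitn|>`). The regex below is a deliberately simplified illustration of meta-tag post-processing, not the ported funasr_onnx code, which may do more (FunASR's rich-transcription post-processing, for instance, can map emotion and event tags to emoji). The function name is hypothetical.

```python
import re
from typing import List, Tuple

# One SenseVoice meta tag, e.g. <|zh|>, <|NEUTRAL|>, <|Speech|>, <|withitn|>
TAG_RE = re.compile(r"<\|([^|]*)\|>")


def split_meta_tags(raw: str) -> Tuple[List[str], str]:
    """Return (tag names, plain transcript) for one raw SenseVoice output string."""
    tags = TAG_RE.findall(raw)          # captured names inside <| |>
    text = TAG_RE.sub("", raw).strip()  # transcript with every tag removed
    return tags, text


tags, text = split_meta_tags("<|zh|><|NEUTRAL|><|Speech|><|withitn|>你好，世界。")
print(tags)  # ['zh', 'NEUTRAL', 'Speech', 'withitn']
print(text)  # 你好，世界。
```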

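Shared characteristics:

After a modifier key is pressed there is a 300 ms delay, which distinguishes key combinations (such as Ctrl+C) from a standalone trigger
Text is pasted through the clipboard rather than typed via simulated keystrokes, which avoids garbled Chinese input
A single CPU inference path, so the inference code is identical on macOS and Linux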
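The 300 ms rule for modifier keys can be modelled as a small state machine: pressing the hotkey arms a timer, any other key pressed within the window marks it as part of a combination (such as Ctrl+C), and recording starts only if the window elapses untouched. Below is a time-injected Python sketch; the class and method names are hypothetical and not taken from `HotkeyListener`.

```python
from typing import Optional

HOLD_DELAY = 0.3  # seconds: the 300 ms modifier delay


class ModifierGate:
    """Decides whether a held modifier is a standalone trigger or part of a combo."""

    def __init__(self, delay: float = HOLD_DELAY) -> None:
        self.delay = delay
        self.pressed_at: Optional[float] = None
        self.combo = False
        self.recording = False

    def hotkey_down(self, now: float) -> None:
        self.pressed_at, self.combo, self.recording = now, False, False

    def other_key_down(self) -> None:
        # Another key while the modifier is held and not yet recording: it's a shortcut
        if self.pressed_at is not None and not self.recording:
            self.combo = True

    def tick(self, now: float) -> bool:
        """Call periodically; returns True exactly once, when recording should start."""
        if (self.pressed_at is not None and not self.combo and not self.recording
                and now - self.pressed_at >= self.delay):
            self.recording = True
            return True
        return False

    def hotkey_up(self) -> bool:
        """Returns True if a recording was in progress, i.e. transcription should run."""
        was_recording = self.recording
        self.pressed_at, self.combo, self.recording = None, False, False
        return was_recording
```

Passing `now` in explicitly, instead of reading a clock inside the class, keeps the timing logic deterministic and easy to test.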

License

MIT

Release history

0.6.0: Apr 16, 2026
0.6.0b2 (pre-release): Apr 16, 2026
0.6.0b1 (pre-release): Apr 16, 2026
0.6.0a3 (pre-release): Apr 16, 2026
0.6.0a2 (pre-release): Apr 16, 2026
0.5.2 (this version): Apr 16, 2026
0.5.1 (yanked; reason: failed CI): Apr 16, 2026
0.5.0: Apr 15, 2026

Download files

Source Distribution

whisper_input-0.5.2.tar.gz (622.0 kB)

Uploaded Apr 16, 2026 Source

Built Distribution

whisper_input-0.5.2-py3-none-any.whl (62.3 kB)

Uploaded Apr 16, 2026 Python 3

File details

Details for the file whisper_input-0.5.2.tar.gz.

File metadata

Download URL: whisper_input-0.5.2.tar.gz
Upload date: Apr 16, 2026
Size: 622.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for whisper_input-0.5.2.tar.gz
Algorithm	Hash digest
SHA256	`e8ab4259c37062cbf43cce4af8df2cbf45ca80673564aabc44bd20173d8a50da`
MD5	`044f7a153472e97c01f87d87e53f2ae1`
BLAKE2b-256	`fb8861b4d3ef1e8335fc563e0d8f9b7fe4ad20a567dc08d82a8b39c964b8517a`

Provenance

The following attestation bundles were made for whisper_input-0.5.2.tar.gz:

Publisher: release.yml on pkulijing/whisper-input

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: whisper_input-0.5.2.tar.gz
- Subject digest: e8ab4259c37062cbf43cce4af8df2cbf45ca80673564aabc44bd20173d8a50da
- Sigstore transparency entry: 1316915720
- Sigstore integration time: Apr 16, 2026
Source repository:
- Permalink: pkulijing/whisper-input@9c236f5d679c7db0b97b92d6fc560c4622960986
- Branch / Tag: refs/tags/v0.5.2
- Owner: https://github.com/pkulijing
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@9c236f5d679c7db0b97b92d6fc560c4622960986
- Trigger Event: push

File details

Details for the file whisper_input-0.5.2-py3-none-any.whl.

File metadata

Download URL: whisper_input-0.5.2-py3-none-any.whl
Upload date: Apr 16, 2026
Size: 62.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for whisper_input-0.5.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7a28884e676e1d88d1b98d9193170e34887030536b1df6a8d86fcd09e09c30bd`
MD5	`161ad33c603dbe5829b2fa69203225b7`
BLAKE2b-256	`950869ee6128026618f8331ddb627e9917ec096a6848945436022001b55f396e`

Provenance

The following attestation bundles were made for whisper_input-0.5.2-py3-none-any.whl:

Publisher: release.yml on pkulijing/whisper-input

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: whisper_input-0.5.2-py3-none-any.whl
- Subject digest: 7a28884e676e1d88d1b98d9193170e34887030536b1df6a8d86fcd09e09c30bd
- Sigstore transparency entry: 1316915728
- Sigstore integration time: Apr 16, 2026
Source repository:
- Permalink: pkulijing/whisper-input@9c236f5d679c7db0b97b92d6fc560c4622960986
- Branch / Tag: refs/tags/v0.5.2
- Owner: https://github.com/pkulijing
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@9c236f5d679c7db0b97b92d6fc560c4622960986
- Trigger Event: push
