Daobidao (叨逼叨) - cross-platform local voice input tool (Chinese, English, Japanese, Korean, Cantonese; local Qwen3-ASR ONNX inference)
Daobidao
🎉 Renamed: this project used to be called whisper-input. Starting with v1.0.0 it has been renamed to daobidao (叨逼叨, Chinese onomatopoeia for non-stop talking, which fits a voice input tool). The legacy package name still works (pip install whisper-input redirects to daobidao), but new releases land on the new name. Use uv tool install daobidao going forward. See docs/29-改名为daobidao/SUMMARY.md.
Cross-platform voice input tool — hold a hotkey, speak, release to have speech transcribed and typed into the focused window.
Uses Alibaba Qwen team's Qwen3-ASR as the STT engine — an encoder-decoder LLM-style ASR with strong multilingual coverage (Chinese, English, Japanese, Korean, Cantonese, and more), built-in punctuation, inverse text normalization, and casing. Direct inference via Microsoft onnxruntime, fully offline after first download. Two variants are available via the settings page: 0.6B (default, ~990 MB, ~1.5s for a 10s utterance on Apple Silicon) and 1.7B (~2.4 GB, highest accuracy).
Supports Linux (X11) and macOS.
Features
- Local speech recognition, works offline
- Multi-language mixed input (Chinese, English, etc.)
- Configurable hotkey (distinguishes left/right modifier keys)
- Browser-based settings UI + system tray
- Auto-start on login
- Automatic platform detection with matching backend
System Requirements
Linux
- Ubuntu 24.04+ / Debian 13+ (X11 desktop environment)
- Any x86_64 CPU (onnxruntime CPU inference, RTF ~0.1, latency < 1s for short utterances)
macOS
- macOS 12+ (Monterey or later)
- Apple Silicon (recommended) or Intel Mac, both use CPU ONNX inference
Installation
One-liner (recommended)
On macOS or Linux:
curl -LsSf https://raw.githubusercontent.com/pkulijing/daobidao/master/install.sh | sh
The script interactively picks a language (中文 / English), then installs uv, Python 3.12, required system libraries, and daobidao itself. It runs daobidao --init (pre-downloads the ~990 MB Qwen3-ASR 0.6B ONNX model; on macOS also installs ~/Applications/Daobidao.app) and finally asks whether to launch the app immediately. It's safe to re-run — already-installed pieces are skipped, and uv tool install --upgrade upgrades daobidao to the latest version.
On Linux the script will offer to add the current user to the input group (requires sudo; takes effect after a logout/login cycle).
Note: curl | sh trusts this repo. If you want to review the script first, download it with curl -LsSf <URL> -o install.sh and inspect it before running.
Manual installation
macOS
# Install system dependency
brew install portaudio
# Install the tool (--compile-bytecode skips the first-run .pyc compile step)
uv tool install --compile-bytecode daobidao
# One-time setup: install .app bundle + download STT model (~990 MB for Qwen3-ASR 0.6B)
daobidao --init
# Run
daobidao
First-run permissions required in System Settings > Privacy & Security:
- Accessibility (for global hotkey listening and text input)
- Microphone (for voice recording; the system will prompt on first recording)
Note: On first run (or via daobidao --init), the tool installs a minimal .app bundle at ~/Applications/Daobidao.app. macOS permission dialogs and System Settings entries will show "Daobidao"; grant Accessibility to that entry. To fully uninstall, run daobidao --uninstall before uv tool uninstall daobidao.
Linux
# Install system dependencies (see table below for details)
sudo apt install xdotool xclip pulseaudio-utils libportaudio2 \
libgirepository-2.0-dev libcairo2-dev gir1.2-gtk-3.0 \
gir1.2-ayatanaappindicator3-0.1
# Add yourself to the input group (evdev needs /dev/input/* access)
sudo usermod -aG input $USER && newgrp input
# Install the tool (--compile-bytecode skips the first-run .pyc compile step)
uv tool install --compile-bytecode daobidao
# One-time setup: download STT model (~990 MB for Qwen3-ASR 0.6B)
daobidao --init
# Run
daobidao
System dependency reference:
| Package | Purpose | Notes |
|---|---|---|
| xdotool, xclip | Text input | xclip for X11 clipboard, xdotool to simulate Shift+Insert paste |
| libportaudio2 | Audio recording | PortAudio library, runtime dependency of Python sounddevice |
| pulseaudio-utils | Sound notifications | Provides paplay for start/stop recording sounds |
| libgirepository-2.0-dev, libcairo2-dev | Build dependencies | Headers for compiling pygobject and pycairo C extensions |
| gir1.2-gtk-3.0 | Recording overlay | GTK 3 typelib for the recording status overlay |
| gir1.2-ayatanaappindicator3-0.1 | System tray icon | AppIndicator typelib, runtime dependency of pystray on Linux |
On first run, daobidao downloads the Qwen3-ASR ONNX model (~990 MB for the 0.6B default) via modelscope.snapshot_download to ~/.cache/modelscope/hub/. After one successful download, the app is fully offline. You can switch to the 1.7B variant later from the in-app settings page (pulls an additional ~2.4 GB).
From Source (Contributors)
git clone https://github.com/pkulijing/daobidao
cd daobidao
bash scripts/setup.sh
uv run daobidao
Usage
# Specify hotkey
daobidao -k KEY_FN # macOS: Fn/Globe key
daobidao -k KEY_RIGHTALT # Linux: Right Alt key
# More options
daobidao --help
A browser settings page opens automatically on startup; you can also access it via the system tray icon.
How to use
- Start the app, then hold the hotkey to begin recording
  - macOS default: Right Command key
  - Linux default: Right Ctrl key
- Speak into the microphone
- Release the hotkey, wait for recognition
- The recognized text is automatically typed at the cursor position
Release Flow (Maintainers)
PyPI distribution via GitHub Actions tag trigger + Trusted Publishing (OIDC):
- Bump version in pyproject.toml
- git commit -am "release: v0.5.1" and push to master
- git tag v0.5.1 && git push --tags
- .github/workflows/release.yml triggers automatically: verify tag matches version -> uv build -> publish to PyPI via pypa/gh-action-pypi-publish -> create GitHub Release
Configuration
Config file config.yaml, also editable via the browser settings UI:
| Setting | Description | macOS Default | Linux Default |
|---|---|---|---|
| hotkey | Trigger hotkey | KEY_RIGHTMETA | KEY_RIGHTCTRL |
| qwen3.variant | STT model size (0.6B / 1.7B) | 0.6B | 0.6B |
| sound.enabled | Recording sound notification | true | true |
| ui.language | Interface language (zh/en/fr) | zh | zh |
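As a rough illustration of how the per-platform defaults in the table could be resolved, here is a minimal Python sketch. The function default_config and the flat dotted keys are assumptions made for this example; the actual layout of config.yaml and the loader in daobidao may differ.

```python
import sys

# Defaults shared by both platforms, taken from the table above.
COMMON_DEFAULTS = {
    "qwen3.variant": "0.6B",
    "sound.enabled": True,
    "ui.language": "zh",
}

# Only the hotkey default differs between macOS and Linux.
HOTKEY_DEFAULTS = {
    "darwin": "KEY_RIGHTMETA",  # Right Command
    "linux": "KEY_RIGHTCTRL",   # Right Ctrl
}


def default_config(platform: str = sys.platform) -> dict:
    """Return the default settings for a sys.platform value (hypothetical helper)."""
    if platform not in HOTKEY_DEFAULTS:
        raise ValueError(f"unsupported platform: {platform}")
    return {"hotkey": HOTKEY_DEFAULTS[platform], **COMMON_DEFAULTS}
```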
Known Limitations
- Linux supports X11 only; Wayland is not yet supported
- Super/Win key is intercepted by GNOME desktop, not recommended as hotkey
- macOS requires Accessibility permission for global hotkey monitoring
- First run downloads the Qwen3-ASR 0.6B ONNX model (~990 MB from ModelScope); switching to 1.7B later pulls another ~2.4 GB
- Current flow is press-to-talk / release-to-transcribe (batch mode) — real-time streaming is planned for a future release
Technical Architecture
The project uses src layout with all Python code under src/daobidao/, installable as a standard package. The entry point is the daobidao console script (equivalent to python -m daobidao).
Hold hotkey -> HotkeyListener (daobidao.backends) -> AudioRecorder (sounddevice)
Release -> stt.Qwen3ASRSTT (onnxruntime) -> InputMethod -> Text typed into focused window
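The hold/release flow in the diagram can be sketched as a small controller that wires four callables together. The class PushToTalk and its parameter names are hypothetical and stand in for HotkeyListener, AudioRecorder, Qwen3ASRSTT, and InputMethod; the real classes are not shown in this README.

```python
from typing import Callable


class PushToTalk:
    """Hold to record, release to transcribe and type (illustrative only)."""

    def __init__(
        self,
        start_recording: Callable[[], None],
        stop_recording: Callable[[], bytes],
        transcribe: Callable[[bytes], str],
        type_text: Callable[[str], None],
    ) -> None:
        self._start = start_recording
        self._stop = stop_recording
        self._transcribe = transcribe
        self._type = type_text
        self._recording = False

    def on_press(self) -> None:
        # Ignore key auto-repeat while the hotkey is already held.
        if not self._recording:
            self._recording = True
            self._start()

    def on_release(self) -> None:
        # A release without a matching press does nothing.
        if not self._recording:
            return
        self._recording = False
        text = self._transcribe(self._stop())
        if text:
            self._type(text)
```

Passing plain callables keeps the controller independent of the platform backend, so the same logic can be exercised with fakes in a test.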
Platform backends (daobidao.backends) auto-select at runtime via sys.platform:
- Linux: evdev for keyboard events + xclip/xdotool clipboard paste
- macOS: pynput global keyboard listener + pbcopy/pbpaste + Cmd+V paste
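For the Linux path, clipboard paste with xclip and xdotool might look like the sketch below. The function name paste_text_x11 and the injectable run parameter are assumptions for illustration; the README does not show how daobidao invokes these tools, and saving or restoring the previous clipboard contents is left out.

```python
import subprocess
from typing import Callable


def paste_text_x11(text: str, run: Callable[..., object] = subprocess.run) -> None:
    """Copy text to the X11 clipboard, then paste it into the focused window."""
    # xclip reads the clipboard contents from stdin.
    run(
        ["xclip", "-selection", "clipboard"],
        input=text.encode("utf-8"),
        check=True,
    )
    # Shift+Insert triggers a paste in the focused window.
    run(["xdotool", "key", "shift+Insert"], check=True)
```

Going through the clipboard sends the text as one UTF-8 blob, which sidesteps per-character key simulation for CJK input.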
STT inference (daobidao.stt.qwen3):
- Model: Qwen3-ASR ONNX int8 from zengshuishui/Qwen3-ASR-onnx on ModelScope, downloaded via modelscope.snapshot_download to ~/.cache/modelscope/hub/. Two variants side-by-side (0.6B / 1.7B), switchable via the settings page
- Runtime: Microsoft official onnxruntime, no torch / transformers dependency
- 3-stage pipeline: conv_frontend.onnx → encoder.int8.onnx → decoder.int8.onnx (28-layer KV-cache autoregressive decoder)
- Log-mel feature extraction: ~100 lines of pure numpy, matching Whisper's reference extractor to within rtol=1e-4
- Tokenization: HuggingFace tokenizers (Rust byte-level BPE, ~10 MB) loading Qwen3-ASR's vocab.json + merges.txt directly, with no transformers dependency
- Dependency tree: onnxruntime + tokenizers + modelscope + numpy (modelscope base is only 36 MB, no torch/transformers)
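To give a feel for what a pure-numpy log-mel extractor involves, here is a simplified sketch in the Whisper style (16 kHz audio, 400-sample periodic Hann window, hop of 160, log10 with an 8 dB-style floor and (x + 4) / 4 scaling). It is not the project's extractor: the triangular HTK-style filterbank below is a stand-in for the Slaney-style filters Whisper uses, so the values will not match the reference, and the default of 80 mel bins is an assumption (the README does not state the number Qwen3-ASR expects).

```python
import numpy as np

SAMPLE_RATE = 16_000
N_FFT = 400   # 25 ms window
HOP = 160     # 10 ms hop


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + np.asarray(hz) / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel) / 2595.0) - 1.0)


def mel_filterbank(n_mels: int) -> np.ndarray:
    """Triangular HTK-style filters, shape (n_mels, N_FFT // 2 + 1)."""
    fft_freqs = np.linspace(0.0, SAMPLE_RATE / 2, N_FFT // 2 + 1)
    edges = mel_to_hz(
        np.linspace(hz_to_mel(0.0), hz_to_mel(SAMPLE_RATE / 2), n_mels + 2)
    )
    bank = np.zeros((n_mels, N_FFT // 2 + 1))
    for i in range(n_mels):
        left, center, right = edges[i : i + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def log_mel(audio, n_mels: int = 80) -> np.ndarray:
    """Return log-mel features of shape (n_mels, n_frames)."""
    audio = np.asarray(audio, dtype=np.float64)
    # Center the frames by reflect-padding half a window on each side.
    padded = np.pad(audio, N_FFT // 2, mode="reflect")
    window = np.hanning(N_FFT + 1)[:-1]  # periodic Hann window
    n_frames = 1 + (len(padded) - N_FFT) // HOP
    idx = np.arange(N_FFT)[None, :] + HOP * np.arange(n_frames)[:, None]
    power = np.abs(np.fft.rfft(padded[idx] * window, axis=1)) ** 2
    power = power[:-1]  # Whisper drops the final frame
    mel = mel_filterbank(n_mels) @ power.T
    log_spec = np.log10(np.maximum(mel, 1e-10))
    # Clamp the dynamic range to 8 log10 units below the peak, then rescale.
    log_spec = np.maximum(log_spec, log_spec.max() - 8.0)
    return (log_spec + 4.0) / 4.0
```

One second of audio yields 100 frames at this hop size, which is why the frame count is a convenient sanity check when comparing against a reference extractor.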
Common features:
- 300ms delay on modifier key press to distinguish combos (e.g., Ctrl+C) from single triggers
- Clipboard paste instead of key simulation, avoiding CJK encoding issues
- Unified CPU inference path, zero code difference between macOS/Linux
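The 300 ms modifier delay can be modelled as a small state machine: a modifier press only counts as a trigger if no other key arrives before the delay elapses. The class ModifierTrigger below is a hypothetical sketch of that idea with timestamps passed in explicitly; the README does not describe how daobidao implements it.

```python
from typing import Optional


class ModifierTrigger:
    """Tell a solo modifier hold apart from a combo such as Ctrl+C."""

    def __init__(self, delay: float = 0.3) -> None:
        self.delay = delay
        self._pressed_at: Optional[float] = None
        self._cancelled = False

    def on_modifier_press(self, now: float) -> None:
        self._pressed_at = now
        self._cancelled = False

    def on_other_key(self, now: float) -> None:
        # Another key inside the delay window means the user is typing a combo.
        if self._pressed_at is not None and now - self._pressed_at < self.delay:
            self._cancelled = True

    def on_modifier_release(self) -> None:
        self._pressed_at = None

    def should_start(self, now: float) -> bool:
        """True once the modifier has been held alone for the full delay."""
        return (
            self._pressed_at is not None
            and not self._cancelled
            and now - self._pressed_at >= self.delay
        )
```

In a real listener, now would come from time.monotonic() and should_start would be polled or driven by a timer.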
License
MIT