Voice dictation daemon using NVIDIA Parakeet on Apple Silicon

🦜 Wordbird

Contextual voice dictation for macOS. Powered by NVIDIA Parakeet running locally on Apple Silicon via MLX.

Press a hotkey, speak, and your words are transcribed and pasted into whatever app is focused. A small LLM post-processes the transcription to fix errors, using project-specific context from a WORDBIRD.md file.

Getting started

Requires macOS on Apple Silicon (M1+) and Python 3.10+.

# Run with uvx (no install needed)
uvx wordbird

# Or run in the background
uvx wordbird start     # launch the background process
uvx wordbird stop      # shut it down
uvx wordbird status    # report whether it is running

Context-aware correction

You can improve transcription accuracy with a WORDBIRD.md file, which lists project-specific terms that are likely to be misheard.

Either create a standard template, or ask Claude to analyze your project and create the file for you.

uvx wordbird init
# or
uvx wordbird init --claude # uses haiku by default; you can specify model via --claude {haiku,sonnet,opus}

Context detection works with:

  • Terminal.app — detects the focused tab's shell working directory
  • VS Code / VS Code Insiders — via the Wordbird extension, which works with local and remote (SSH) workspaces
  • Zed - detects the focused window's project directory out of the box, no extension needed

Transcription and pasting work in any app.

A WORDBIRD.md file looks like this:

---
transcription_model: mlx-community/parakeet-tdt-0.6b-v2
fix_model: mlx-community/Qwen2.5-1.5B-Instruct-4bit
---

{# Your correction prompt and examples here #}

{# Key terms: MyApp, some_function, PostgreSQL #}
{# Names: Alice, Bob #}
{# Misheard words: "bird word" should be "Birdword" #}

Input: "{{ transcript }}"
Output:

The file is a Jinja template. {{ transcript }} is replaced with the raw transcription. The YAML front matter lets you override models per-project.
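To make the two parts of the file concrete, here is a minimal sketch that splits the front matter from the body and fills in the transcript. It is not Wordbird's own loader: it uses only the standard library, parses the front matter as flat `key: value` pairs rather than full YAML, and replaces real Jinja rendering with two regular expressions. The names `split_front_matter` and `render` are hypothetical.

```python
import re

EXAMPLE = """---
transcription_model: mlx-community/parakeet-tdt-0.6b-v2
fix_model: mlx-community/Qwen2.5-1.5B-Instruct-4bit
---

{# Key terms: MyApp #}
Input: "{{ transcript }}"
Output:
"""


def split_front_matter(source: str) -> tuple[dict[str, str], str]:
    """Split '---'-delimited front matter from the template body.

    Simplification: only flat `key: value` lines are understood, not full YAML.
    """
    lines = source.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, source
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        return {}, source  # no closing delimiter: treat everything as body
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return meta, body


def render(body: str, transcript: str) -> str:
    """Stand-in for Jinja: drop {# ... #} comments, fill in {{ transcript }}."""
    body = re.sub(r"\{#.*?#\}", "", body, flags=re.DOTALL)
    # A function replacement keeps backslashes in the transcript literal.
    return re.sub(r"\{\{\s*transcript\s*\}\}", lambda _match: transcript, body)


meta, body = split_front_matter(EXAMPLE)
prompt = render(body, "open my app")
print(meta["fix_model"])
print(prompt)
```

A real template would go through Jinja itself, which also supports conditionals, loops, and filters that this stand-in ignores.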

Hotkey

Action                   Default
Toggle recording         Right ⌘ + Space
Transcribe and submit    Right ⌘ + Return (opt-in)

The submit shortcut transcribes, pastes, and presses Return — useful for chat and terminal workflows. Enable it in the dashboard settings.

Configurable via CLI flags or the dashboard settings:

--modifier-key KEY   Modifier key (default: rcmd). Options: rcmd, lcmd, ralt, lalt, rshift, lshift, rctrl, lctrl, fn
--toggle-key KEY     Toggle key (default: space). Options: space, return, tab, escape

Options

--model MODEL        Transcription model (default: mlx-community/parakeet-tdt-0.6b-v2)
--fix-model MODEL    Post-processor model (default: mlx-community/Qwen2.5-1.5B-Instruct-4bit)
--no-fix             Disable LLM post-processing
--no-server          Don't spawn the API server (run it separately)
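As a way to see how the hotkey flags and options combine, the sketch below declares them with `argparse` using the defaults and choices listed above. This is an illustration only and not Wordbird's actual argument parser; the function name `build_parser` is hypothetical, and the real CLI may validate or name things differently.

```python
import argparse

MODIFIER_KEYS = ["rcmd", "lcmd", "ralt", "lalt", "rshift", "lshift", "rctrl", "lctrl", "fn"]
TOGGLE_KEYS = ["space", "return", "tab", "escape"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(prog="wordbird")
    parser.add_argument("--modifier-key", default="rcmd", choices=MODIFIER_KEYS)
    parser.add_argument("--toggle-key", default="space", choices=TOGGLE_KEYS)
    parser.add_argument("--model", default="mlx-community/parakeet-tdt-0.6b-v2")
    parser.add_argument("--fix-model", default="mlx-community/Qwen2.5-1.5B-Instruct-4bit")
    parser.add_argument("--no-fix", action="store_true")
    parser.add_argument("--no-server", action="store_true")
    return parser


# Equivalent to: wordbird --modifier-key ralt --no-fix
args = build_parser().parse_args(["--modifier-key", "ralt", "--no-fix"])
print(args.modifier_key, args.toggle_key, args.no_fix)
```

With these arguments the hotkey becomes Right Option + Space, and the raw transcription is pasted without LLM post-processing.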

Dashboard

Wordbird runs a local web dashboard (default localhost:7870). Click the bird in the menu bar → Dashboard… to open it.

  • History — browse transcriptions with timestamps, app name, working directory, and duration. See both original and corrected text.
  • Settings — configure hotkey, models, and post-processing. Changes take effect within seconds.
  • Stats — words dictated, recording time, WPM, session count.

uvx wordbird history        # view history from the CLI
uvx wordbird config         # show or create the config file

Data

All data is stored in ~/.wordbird/:

File             Purpose
wordbird.toml    User configuration
wordbird.db      Transcription history (SQLite)
server.json      Server port discovery
wordbird.pid     Singleton lock
wordbird.log     Background mode logs
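Because the history lives in an ordinary SQLite file, it can be inspected with Python's standard library. The sketch below only lists the tables it finds; the schema of wordbird.db is not documented here, so nothing about table or column names is assumed. The helper name `list_tables` is hypothetical, and the file is opened read-only so a running Wordbird instance is not disturbed.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(db_path: Path) -> list[str]:
    """Return the table names in a SQLite file, opened read-only."""
    uri = db_path.resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    db = Path.home() / ".wordbird" / "wordbird.db"
    if db.exists():
        print(list_tables(db))
    else:
        print(f"{db} not found; run Wordbird at least once first")
```

From there, `PRAGMA table_info(<table>)` shows the columns of any table the listing turns up.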

Menu bar

Wordbird shows a bird icon in the menu bar:

  • White — idle
  • 🟡 Yellow — connecting mic
  • 🔴 Red — listening
  • Sparkles — transcribing

Permissions

Wordbird needs three macOS permissions, granted to your terminal app:

  • 🎤 Microphone — to record your voice
  • 🔐 Accessibility — to paste text
  • ⌨️ Input Monitoring — to detect the global hotkey

Wordbird checks these on startup and tells you what's missing.

Architecture

Wordbird runs as two sibling processes managed by a thin CLI:

  • Server (wordbird-server) — FastAPI app handling transcription, post-processing, history, config, and serving the React dashboard
  • Daemon (wordbird-daemon) — macOS-native process handling hotkeys, mic recording, overlay HUD, menu bar, and clipboard pasting

The daemon sends recorded audio to the server via HTTP. The server runs ML inference in a thread pool so the dashboard stays responsive during transcription.

uvx wordbird          # starts both (recommended)
uvx wordbird-server   # just the API server
uvx wordbird-daemon   # just the daemon (expects server running)
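The thread-pool point above can be illustrated with a small standard-library sketch. It is not Wordbird's server code: `fake_transcribe` is a hypothetical stand-in that sleeps instead of running a model, and `heartbeat` stands in for dashboard requests handled by the same event loop. The idea is that a blocking call handed to an executor does not stall the loop.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def fake_transcribe(audio: bytes) -> str:
    """Hypothetical stand-in for blocking ML inference."""
    time.sleep(0.2)  # simulate a slow, blocking model call
    return f"transcribed {len(audio)} bytes"


async def heartbeat(ticks: list[int], stop: asyncio.Event) -> None:
    """Stands in for dashboard requests served by the same event loop."""
    while not stop.is_set():
        ticks.append(1)
        await asyncio.sleep(0.01)


async def main() -> tuple[str, int]:
    loop = asyncio.get_running_loop()
    ticks: list[int] = []
    stop = asyncio.Event()
    beat = asyncio.create_task(heartbeat(ticks, stop))
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Offload the blocking call; the event loop keeps running meanwhile.
        text = await loop.run_in_executor(pool, fake_transcribe, b"\x00" * 1600)
    stop.set()
    await beat
    return text, len(ticks)


if __name__ == "__main__":
    text, tick_count = asyncio.run(main())
    print(text, f"({tick_count} heartbeat ticks during inference)")
```

If `fake_transcribe` were called directly inside `main` instead, the heartbeat would not tick at all until the call returned, which is the unresponsiveness the thread pool avoids.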

Development

make backend-dev      # API server with hot reload
make daemon-dev       # daemon only (expects server running)
make frontend-dev     # Vite dev server with API proxy
make dev              # backend + frontend + daemon (all three)
make wordbird         # build frontend + run everything
make backend-test     # run pytest

License

MIT

Download files

Source Distribution

wordbird-0.9.1.tar.gz (712.0 kB view details)

Uploaded Source

Built Distribution

wordbird-0.9.1-py3-none-any.whl (259.9 kB view details)

Uploaded Python 3

File details

Details for the file wordbird-0.9.1.tar.gz.

File metadata

  • Download URL: wordbird-0.9.1.tar.gz
  • Size: 712.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for wordbird-0.9.1.tar.gz
Algorithm      Hash digest
SHA256         fc00444aeb6da3117de0032d561b53b62e6f71b18300b5935dbbbfdb046ba68b
MD5            7311bfb8abf18d9f292c75536d865104
BLAKE2b-256    f403a9d26a8aa6cce4901c581af5287416eb3db1b78dd78de07be01235a5f0ae

Provenance

The following attestation bundles were made for wordbird-0.9.1.tar.gz:

Publisher: main.yaml on tillahoffmann/wordbird

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file wordbird-0.9.1-py3-none-any.whl.

File metadata

  • Download URL: wordbird-0.9.1-py3-none-any.whl
  • Size: 259.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for wordbird-0.9.1-py3-none-any.whl
Algorithm      Hash digest
SHA256         e57d191b6b4124c00e4f0f6e94c04e057b8fee05ceb60579fbda4cbb279ac505
MD5            6d747fb6f296c90d8d52946d041634c3
BLAKE2b-256    25d211ebc219647e1c8c4ca15d82cd44a68663728f5a5618d560ea1ccfcf4e16

Provenance

The following attestation bundles were made for wordbird-0.9.1-py3-none-any.whl:

Publisher: main.yaml on tillahoffmann/wordbird
