Skip to main content

Push-to-Talk Voice Dictation menu bar app for macOS using Apple Silicon MLX

Project description

Dictate

Push-to-talk voice dictation for macOS. Runs 100% on-device using Apple Silicon MLX models. No cloud, no API keys, no subscriptions.

Hold a key, speak, release — clean text appears wherever your cursor is.

Install

pip install dictate-mlx
dictate

That's it. Dictate launches in the background and appears in your menu bar. Close the terminal — it keeps running. Quit from the menu bar icon.

macOS will prompt for Accessibility and Microphone permissions on first run. Models download automatically in the background (~2-4GB total, cached in ~/.cache/huggingface/).

Install from source

git clone https://github.com/0xbrando/dictate.git
cd dictate
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
dictate

Requirements

  • macOS with Apple Silicon (any M-series chip)
  • Python 3.11+
  • ~4GB RAM minimum, ~6GB recommended

How It Works

Hold PTT → Speak → Release → Clean text pasted into active window

Under the hood:

  1. Push-to-talk captures audio via the microphone
  2. VAD segments speech from silence
  3. STT transcribes locally (Whisper or Parakeet)
  4. Smart skip detects clean short phrases and skips cleanup entirely
  5. LLM fixes grammar, punctuation, and formatting
  6. Auto-paste puts the result wherever your cursor is

Everything runs locally. Nothing leaves your machine.

Controls

Action Key
Record Hold Left Control
Lock recording (hands-free) Press Space while holding PTT
Stop locked recording Press PTT again

The PTT key is configurable from the menu bar: Left Control, Right Control, Right Command, or either Option key.

STT Engines

Both engines are included. Switch anytime from the menu bar.

Engine Speed Languages Notes
Parakeet TDT 0.6B ~50ms English Default. 4-8x faster than Whisper
Whisper Large V3 Turbo ~300ms 99+ Best for multilingual or non-English

Parakeet is the default for speed. Switch to Whisper from the menu bar if you need non-English STT.

Writing Styles

Style What it does
Clean Up Fixes punctuation and capitalization — keeps your words
Formal Rewrites in a professional tone
Bullet Points Distills your dictation into concise key points

Quality Presets

Preset Speed RAM Best for
Smart ~250ms 0 Auto-routes: fast local for short, API server for long
Speedy (1.5B) ~120ms 1GB Quick fixes, great for any chip
Fast (3B) ~250ms 2GB Quick cleanup, everyday use
Balanced (7B) ~350ms 5GB Longer dictation, formal rewriting
Quality (14B) ~500ms 9GB Best accuracy for bullet points and rewrites

Times measured on M3 Ultra. The app picks the best default for your chip — Ultra/Max get 3B, everything else gets 1.5B.

The Quality menu only shows models you've downloaded. To add a larger model:

python -c "from mlx_lm import load; load('mlx-community/Qwen2.5-7B-Instruct-4bit')"

Menu Bar

All settings accessible from the waveform icon in your menu bar:

Main menu:

  • Writing Style — Clean Up, Formal, or Bullet Points
  • Quality — model size (shows only downloaded models)
  • Input Device — select microphone
  • Recent — last 10 transcriptions, click to re-paste

Advanced settings:

  • STT Engine — Whisper or Parakeet
  • PTT Key — choose your push-to-talk modifier
  • Languages — input and output language (12 languages for translation)
  • Sounds — 6 tones or silent
  • LLM Endpoint — configure API server
  • LLM Cleanup — toggle on/off
  • Personal Dictionary — names, brands, technical terms always spelled correctly
  • Launch at Login — auto-start on boot

API Server Setup

If you run a local LLM server, Dictate can use it instead of loading its own model — zero additional RAM:

DICTATE_LLM_BACKEND=api DICTATE_LLM_API_URL=http://localhost:8005/v1/chat/completions dictate

Works with any OpenAI-compatible server: vllm-mlx, LM Studio, Ollama.

Smart Routing

The Smart preset auto-routes based on message length:

  • Short (15 words or fewer) → fast local model (~120ms)
  • Long (16+ words) → your API server for higher quality

Environment Variables

Variable Description Default
DICTATE_AUDIO_DEVICE Microphone device index System default
DICTATE_OUTPUT_MODE type or clipboard type
DICTATE_INPUT_LANGUAGE auto, en, ja, ko, etc. auto
DICTATE_OUTPUT_LANGUAGE Translation target (auto = same) auto
DICTATE_LLM_CLEANUP Enable LLM text cleanup true
DICTATE_LLM_MODEL qwen-1.5b, qwen, qwen-7b, qwen-14b qwen
DICTATE_LLM_BACKEND local or api local
DICTATE_LLM_API_URL OpenAI-compatible endpoint http://localhost:8005/v1/chat/completions
DICTATE_ALLOW_REMOTE_API Allow non-localhost API URLs unset

Agent Integration

Dictate works well as a voice input layer for AI assistants and agent frameworks. If you're building with tools like Claude Code, OpenClaw, or similar — Dictate gives your setup a local, private voice interface with zero cloud dependency.

Debugging

Run in the foreground to see logs:

dictate --foreground

Or check the background log:

tail -f ~/Library/Logs/Dictate/dictate.log

Security

  • All processing is local. Audio and text never leave your machine.
  • LLM endpoints are restricted to localhost by default. Set DICTATE_ALLOW_REMOTE_API=1 to override.
  • Preferences stored with 0o600 permissions (owner-only read/write).
  • No API keys, tokens, or accounts required.

License

MIT — See LICENSES.md for dependency licenses.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dictate_mlx-2.2.2.tar.gz (63.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

dictate_mlx-2.2.2-py3-none-any.whl (43.3 kB view details)

Uploaded Python 3

File details

Details for the file dictate_mlx-2.2.2.tar.gz.

File metadata

  • Download URL: dictate_mlx-2.2.2.tar.gz
  • Upload date:
  • Size: 63.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for dictate_mlx-2.2.2.tar.gz
Algorithm Hash digest
SHA256 5732cdac316609ab157b63d5ac665fdcdb1b0ae241d996c117300300be361206
MD5 2ad8110897d25aa296e142121a30994f
BLAKE2b-256 e959a1fa9ff61dc15123f48eaf330eb7ba5bcf5e5c4e1746dba04d8a7618ad7c

See more details on using hashes here.

File details

Details for the file dictate_mlx-2.2.2-py3-none-any.whl.

File metadata

  • Download URL: dictate_mlx-2.2.2-py3-none-any.whl
  • Upload date:
  • Size: 43.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for dictate_mlx-2.2.2-py3-none-any.whl
Algorithm Hash digest
SHA256 6599002a33f7fbef00a102f1d4c5935a2437b02624bab94a2e099491612f10ab
MD5 cdd64a2333004536ca211d785bfc2c09
BLAKE2b-256 774c884cde3fbea68b53e2cd77b41e77f0153b09ad6085f7e4756db77961b0a5

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page