Skip to main content

Push-to-Talk Voice Dictation menu bar app for macOS using Apple Silicon MLX

Project description

Dictate

Push-to-talk voice dictation for macOS. Runs 100% on-device using Apple Silicon MLX models. No cloud, no API keys, no subscriptions.

Hold a key, speak, release — clean text appears wherever your cursor is.

Dictate launch banner

Install

pip install dictate-mlx
dictate

That's it. Dictate launches in the background and appears in your menu bar. Close the terminal — it keeps running. Quit from the menu bar icon.

Dictate in the menu bar

macOS will prompt for Accessibility and Microphone permissions on first run. Models download automatically in the background (~2-4GB total, cached in ~/.cache/huggingface/).

Install from source

git clone https://github.com/0xbrando/dictate.git
cd dictate
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
dictate

Requirements

  • macOS with Apple Silicon (any M-series chip)
  • Python 3.11+
  • ~4GB RAM minimum, ~6GB recommended

How It Works

Hold PTT → Speak → Release → Clean text pasted into active window

Under the hood:

  1. Push-to-talk captures audio via the microphone
  2. VAD segments speech from silence
  3. STT transcribes locally (Whisper or Parakeet)
  4. Smart skip detects clean short phrases and skips cleanup entirely
  5. LLM fixes grammar, punctuation, and formatting
  6. Auto-paste puts the result wherever your cursor is

Everything runs locally. Nothing leaves your machine.

Controls

Action Key
Record Hold Left Control
Lock recording (hands-free) Press Space while holding PTT
Stop locked recording Press PTT again

The PTT key is configurable from the menu bar: Left Control, Right Control, Right Command, or either Option key.

STT Engines

Both engines are included. Switch anytime from the menu bar.

Engine Speed Languages Notes
Parakeet TDT 0.6B ~50ms English Default. 4-8x faster than Whisper
Whisper Large V3 Turbo ~300ms 99+ Best for multilingual or non-English

Parakeet is the default for speed. Switch to Whisper from the menu bar if you need non-English STT.

Writing Styles

Writing styles menu
Style What it does
Clean Up Fixes punctuation and capitalization — keeps your words
Formal Rewrites in a professional tone
Bullet Points Distills your dictation into concise key points

Quality Presets

Quality presets menu
Preset Speed RAM Best for
API Server varies 0 Use an external LLM server (LM Studio, Ollama, etc.)
Speedy (1.5B) ~120ms 1.0GB Quick fixes, great for any chip
Fast (3B) ~250ms 1.8GB Quick cleanup, everyday use
Balanced (7B) ~350ms 4.2GB Longer dictation, formal rewriting
Quality (14B) ~500ms 8.8GB Best accuracy for bullet points and rewrites

Smart routing auto-routes based on message length: short phrases go to the fast local model, longer dictation goes to your API server.

Times measured on M3 Ultra. The app picks the best default for your chip — Ultra/Max get 3B, everything else gets 1.5B.

The Quality menu only shows models you've downloaded. To add a larger model:

python -c "from mlx_lm import load; load('mlx-community/Qwen2.5-7B-Instruct-4bit')"

Menu Bar

Main menu Advanced settings

All settings accessible from the waveform icon in your menu bar:

Main menu:

  • Writing Style — Clean Up, Formal, or Bullet Points
  • Quality — model size (shows only downloaded models)
  • Input Device — select microphone
  • Recent — last 10 transcriptions, click to re-paste

Advanced settings:

  • STT Engine — Whisper or Parakeet
  • PTT Key — choose your push-to-talk modifier
  • Languages — input and output language (12 languages for translation)
  • Sounds — 6 tones or silent
  • LLM Endpoint — configure API server
  • LLM Cleanup — toggle on/off
  • Personal Dictionary — names, brands, technical terms always spelled correctly
  • Launch at Login — auto-start on boot

API Server Setup

If you run a local LLM server, Dictate can use it instead of loading its own model — zero additional RAM:

DICTATE_LLM_BACKEND=api DICTATE_LLM_API_URL=http://localhost:8005/v1/chat/completions dictate

Works with any OpenAI-compatible server: vllm-mlx, LM Studio, Ollama.

Smart Routing

The Smart preset auto-routes based on message length:

  • Short (15 words or fewer) → fast local model (~120ms)
  • Long (16+ words) → your API server for higher quality

Environment Variables

Variable Description Default
DICTATE_AUDIO_DEVICE Microphone device index System default
DICTATE_OUTPUT_MODE type or clipboard type
DICTATE_INPUT_LANGUAGE auto, en, ja, ko, etc. auto
DICTATE_OUTPUT_LANGUAGE Translation target (auto = same) auto
DICTATE_LLM_CLEANUP Enable LLM text cleanup true
DICTATE_LLM_MODEL qwen-1.5b, qwen, qwen-7b, qwen-14b qwen
DICTATE_LLM_BACKEND local or api local
DICTATE_LLM_API_URL OpenAI-compatible endpoint http://localhost:8005/v1/chat/completions
DICTATE_ALLOW_REMOTE_API Allow non-localhost API URLs unset

Agent Integration

Dictate works well as a voice input layer for AI assistants and agent frameworks. If you're building with tools like Claude Code, OpenClaw, or similar — Dictate gives your setup a local, private voice interface with zero cloud dependency.

Debugging

Run in the foreground to see logs:

dictate --foreground

Or check the background log:

tail -f ~/Library/Logs/Dictate/dictate.log

Security

  • All processing is local. Audio and text never leave your machine.
  • LLM endpoints are restricted to localhost by default. Set DICTATE_ALLOW_REMOTE_API=1 to override.
  • Preferences stored with 0o600 permissions (owner-only read/write).
  • No API keys, tokens, or accounts required.

License

MIT — See LICENSES.md for dependency licenses.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dictate_mlx-2.4.1.tar.gz (66.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

dictate_mlx-2.4.1-py3-none-any.whl (46.0 kB view details)

Uploaded Python 3

File details

Details for the file dictate_mlx-2.4.1.tar.gz.

File metadata

  • Download URL: dictate_mlx-2.4.1.tar.gz
  • Upload date:
  • Size: 66.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for dictate_mlx-2.4.1.tar.gz
Algorithm Hash digest
SHA256 85b94c6a7821ef4a6f82e6afa35f11c69566db18ee4adaa9f89697326381db3c
MD5 30c390394be9b1763ce6bd599afe5dd6
BLAKE2b-256 072a295ec133366c07ef5417dc2d60a15845fc4a611ed379a4cb8de79764f444

See more details on using hashes here.

File details

Details for the file dictate_mlx-2.4.1-py3-none-any.whl.

File metadata

  • Download URL: dictate_mlx-2.4.1-py3-none-any.whl
  • Upload date:
  • Size: 46.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for dictate_mlx-2.4.1-py3-none-any.whl
Algorithm Hash digest
SHA256 1eb8942f902847c7f96755ad64ac803d2b880dd26ed36eb68a6befd6c36f498d
MD5 a13ea50088fa5fefbc2722e7a26bb351
BLAKE2b-256 c4db6a7cdc3ea56b5477791a74f36eff04c18551966392f8a225a59c97f7ae45

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page