Skip to main content

Push-to-Talk Voice Dictation menu bar app for macOS using Apple Silicon MLX

Project description

Dictate

Push-to-talk voice dictation for macOS. Runs 100% on-device using Apple Silicon MLX models. No cloud, no API keys, no subscriptions.

Hold a key, speak, release — clean text appears wherever your cursor is.

Install

pip install dictate-mlx
dictate

That's it. Dictate launches in the background and appears in your menu bar. Close the terminal — it keeps running. Quit from the menu bar icon.

macOS will prompt for Accessibility and Microphone permissions on first run. Models download automatically in the background (~2-4GB total, cached in ~/.cache/huggingface/).

Install from source

git clone https://github.com/0xbrando/dictate.git
cd dictate
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
dictate

Requirements

  • macOS with Apple Silicon (any M-series chip)
  • Python 3.11+
  • ~4GB RAM minimum, ~6GB recommended

How It Works

Hold PTT → Speak → Release → Clean text pasted into active window

Under the hood:

  1. Push-to-talk captures audio via the microphone
  2. VAD segments speech from silence
  3. STT transcribes locally (Whisper or Parakeet)
  4. Smart skip detects clean short phrases and skips cleanup entirely
  5. LLM fixes grammar, punctuation, and formatting
  6. Auto-paste puts the result wherever your cursor is

Everything runs locally. Nothing leaves your machine.

Controls

Action Key
Record Hold Left Control
Lock recording (hands-free) Press Space while holding PTT
Stop locked recording Press PTT again

The PTT key is configurable from the menu bar: Left Control, Right Control, Right Command, or either Option key.

STT Engines

Both engines are included. Switch anytime from the menu bar.

Engine Speed Languages Notes
Parakeet TDT 0.6B ~50ms English Default. 4-8x faster than Whisper
Whisper Large V3 Turbo ~300ms 99+ Best for multilingual or non-English

Parakeet is the default for speed. Switch to Whisper from the menu bar if you need non-English STT.

Writing Styles

Style What it does
Clean Up Fixes punctuation and capitalization — keeps your words
Formal Rewrites in a professional tone
Bullet Points Distills your dictation into concise key points

Quality Presets

Preset Speed RAM Best for
Smart ~250ms 0 Auto-routes: fast local for short, API server for long
Speedy (1.5B) ~120ms 1GB Quick fixes, great for any chip
Fast (3B) ~250ms 2GB Quick cleanup, everyday use
Balanced (7B) ~350ms 5GB Longer dictation, formal rewriting
Quality (14B) ~500ms 9GB Best accuracy for bullet points and rewrites

Times measured on M3 Ultra. The app picks the best default for your chip — Ultra/Max get 3B, everything else gets 1.5B.

The Quality menu only shows models you've downloaded. To add a larger model:

python -c "from mlx_lm import load; load('mlx-community/Qwen2.5-7B-Instruct-4bit')"

Menu Bar

All settings accessible from the waveform icon in your menu bar:

Main menu:

  • Writing Style — Clean Up, Formal, or Bullet Points
  • Quality — model size (shows only downloaded models)
  • Input Device — select microphone
  • Recent — last 10 transcriptions, click to re-paste

Advanced settings:

  • STT Engine — Whisper or Parakeet
  • PTT Key — choose your push-to-talk modifier
  • Languages — input and output language (12 languages for translation)
  • Sounds — 6 tones or silent
  • LLM Endpoint — configure API server
  • LLM Cleanup — toggle on/off
  • Personal Dictionary — names, brands, technical terms always spelled correctly
  • Launch at Login — auto-start on boot

API Server Setup

If you run a local LLM server, Dictate can use it instead of loading its own model — zero additional RAM:

DICTATE_LLM_BACKEND=api DICTATE_LLM_API_URL=http://localhost:8005/v1/chat/completions dictate

Works with any OpenAI-compatible server: vllm-mlx, LM Studio, Ollama.

Smart Routing

The Smart preset auto-routes based on message length:

  • Short (15 words or fewer) → fast local model (~120ms)
  • Long (16+ words) → your API server for higher quality

Environment Variables

Variable Description Default
DICTATE_AUDIO_DEVICE Microphone device index System default
DICTATE_OUTPUT_MODE type or clipboard type
DICTATE_INPUT_LANGUAGE auto, en, ja, ko, etc. auto
DICTATE_OUTPUT_LANGUAGE Translation target (auto = same) auto
DICTATE_LLM_CLEANUP Enable LLM text cleanup true
DICTATE_LLM_MODEL qwen-1.5b, qwen, qwen-7b, qwen-14b qwen
DICTATE_LLM_BACKEND local or api local
DICTATE_LLM_API_URL OpenAI-compatible endpoint http://localhost:8005/v1/chat/completions
DICTATE_ALLOW_REMOTE_API Allow non-localhost API URLs unset

Agent Integration

Dictate works well as a voice input layer for AI assistants and agent frameworks. If you're building with tools like Claude Code, OpenClaw, or similar — Dictate gives your setup a local, private voice interface with zero cloud dependency.

Debugging

Run in the foreground to see logs:

dictate --foreground

Or check the background log:

tail -f ~/Library/Logs/Dictate/dictate.log

Security

  • All processing is local. Audio and text never leave your machine.
  • LLM endpoints are restricted to localhost by default. Set DICTATE_ALLOW_REMOTE_API=1 to override.
  • Preferences stored with 0o600 permissions (owner-only read/write).
  • No API keys, tokens, or accounts required.

License

MIT — See LICENSES.md for dependency licenses.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dictate_mlx-2.2.1.tar.gz (63.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

dictate_mlx-2.2.1-py3-none-any.whl (43.3 kB view details)

Uploaded Python 3

File details

Details for the file dictate_mlx-2.2.1.tar.gz.

File metadata

  • Download URL: dictate_mlx-2.2.1.tar.gz
  • Upload date:
  • Size: 63.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for dictate_mlx-2.2.1.tar.gz
Algorithm Hash digest
SHA256 d7b64fa6288807e369d86e3aba024efd7d3843a6c62a52f41aa3f1a2db899bbc
MD5 113fcc5bba6251f91774fd377dbca576
BLAKE2b-256 de173e388dd5db180cdcb298e390184852a392cc8fb5770ff3c761ffc5d770f5

See more details on using hashes here.

File details

Details for the file dictate_mlx-2.2.1-py3-none-any.whl.

File metadata

  • Download URL: dictate_mlx-2.2.1-py3-none-any.whl
  • Upload date:
  • Size: 43.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for dictate_mlx-2.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 2bf2f881e0c63a9713816f7f201589492da6c846b7e830e446bd742c5e3ba3f0
MD5 4db21b8a093d2d343c6c2d1523bb366b
BLAKE2b-256 ac521a0b99eda9e1ff65fcbe0c1f449ae96b2096488242c580633304376796ab

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page