65ms voice-to-text on Apple's Neural Engine. Push-to-talk dictation for macOS — 100% local, free, open source.
Project description
Push-to-talk voice dictation that runs entirely on your Mac.
No cloud. No API keys. No subscriptions.
Hold a key → Speak → Release → Clean text appears wherever your cursor is.
Why Dictate?
- 65ms voice-to-text on Apple's Neural Engine — faster than a keystroke
- Zero GPU RAM for STT — the Neural Engine has its own dedicated memory
- 100% local — audio and text never leave your Mac
- Free and open source — no subscriptions, no API keys, no accounts
- LLM text cleanup — local model fixes grammar and punctuation automatically
- 12 languages — real-time translation between any supported pair
Your M-series Mac has a 16-core Neural Engine doing nothing. Dictate puts it to work.
Install
pip install dictate-mlx
dictate
That's it. Dictate launches in the background and appears in your menu bar. Close the terminal — it keeps running.
macOS will prompt for Accessibility and Microphone permissions on first run. Models download automatically (~1-3GB depending on preset, cached in ~/.cache/huggingface/).
Install from source
git clone https://github.com/0xbrando/dictate.git
cd dictate
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
dictate
Requirements
- macOS with Apple Silicon (any M-series chip)
- Python 3.11+
- ~3GB RAM with ANE (STT runs on Neural Engine, only LLM needs GPU memory)
Features
🎙️ Push-to-Talk
Hold a key, speak, release. Text appears wherever your cursor is.
| Action | Key |
|---|---|
| Record | Hold Left Control |
| Lock recording (hands-free) | Press Space while holding PTT |
| Stop locked recording | Press PTT again |
The PTT key is configurable: Left Control, Right Control, Right Command, or either Option key.
🧠 LLM Text Cleanup
The thing that sets Dictate apart. Most dictation tools give you raw transcription. Dictate pipes through a local LLM that fixes grammar, adds punctuation, and formats properly.
Short phrases (≤15 words) skip cleanup for instant speed. Longer dictation gets the full treatment.
🗣️ Three STT Engines
All included. Switch anytime from the menu bar.
| Engine | Speed | Languages | Notes |
|---|---|---|---|
| ANE (Neural Engine) | ~65ms | 25 | Default — runs on Apple Neural Engine, frees GPU for LLM |
| Parakeet TDT 0.6B | ~50ms | 25 | Runs on GPU via MLX |
| Whisper Large V3 Turbo | ~300ms | 99+ | Auto-selected for Japanese, Chinese, Korean |
ANE is the default. It runs speech recognition on Apple's Neural Engine — a dedicated chip that sits idle during most tasks. This frees the GPU entirely for LLM text cleanup, so STT and LLM run concurrently with zero contention. The result: 65-106ms transcription on real speech.
When you select a language not supported by ANE/Parakeet (Japanese, Chinese, Korean), Dictate automatically switches to Whisper. Switch back to a supported language and it returns to ANE.
✍️ Writing Styles
| Style | What it does |
|---|---|
| Clean Up | Fixes punctuation and capitalization — keeps your words |
| Professional | Polished tone and grammar |
| Bullet Points | Rewrites as concise bullet points |
Toggle LLM cleanup off from the menu bar for raw transcription output.
🌐 Real-Time Translation
Speak in one language, get output in another. 12 languages supported: English, Spanish, French, German, Italian, Portuguese, Japanese, Korean, Chinese, Russian, Arabic, Hindi.
⚡ Quality Presets
| Preset | Speed | Size | Best for |
|---|---|---|---|
| Qwen2.5 1.5B | ~250ms | 950MB | Fast and reliable (default) |
| Qwen2.5 3B | ~400ms | 1.8GB | Best accuracy |
| API Server | varies | 0 | Use your own LLM server (LM Studio, Ollama, etc.) |
Short phrases (15 words or less) skip LLM cleanup entirely for instant output. The app picks the best default model for your chip.
End-to-End Pipeline
Full latency from voice → text on screen:
| Mode | GPU RAM | Latency |
|---|---|---|
| LLM off (raw transcription) | 0 | ~65ms |
| LLM on (Qwen2.5 1.5B) | ~950MB | ~315ms |
| LLM on (Qwen2.5 3B) | ~1.8GB | ~465ms |
With ANE, speech recognition runs on a dedicated chip with its own memory — zero GPU usage. Turn off LLM cleanup and the entire app uses no GPU RAM at all.
Menu Bar
Everything accessible from the waveform icon:
- Writing Style — Clean Up, Professional, Bullet Points
- Quality — Qwen2.5 1.5B or 3B (or API server)
- Input Device — select microphone
- Recent — last 10 transcriptions, click to re-paste
- STT Engine — ANE (default), Parakeet, or Whisper
- PTT Key — choose your push-to-talk modifier
- Languages — input and output language
- Sounds — 6 notification tones or silent
- Personal Dictionary — names, brands, technical terms always spelled correctly
- Launch at Login — auto-start on boot
ANE Engine Setup
The ANE (Apple Neural Engine) engine is the default and recommended STT engine. It requires a small Swift binary that Dictate calls behind the scenes. If the binary isn't installed, Dictate falls back to Parakeet (GPU-based STT).
# Build from source (requires Xcode command line tools)
cd swift-stt
swift build -c release
# The binary lands at swift-stt/.build/release/dictate-stt
# Either add it to your PATH or leave it — Dictate finds it automatically
First run: CoreML models download automatically (~2.7GB) and compile for your chip. This takes 1-2 minutes the first time. After that, models are cached and transcription starts instantly.
Requirements: macOS 14+ (Sonoma or later), Apple Silicon.
What it does: The dictate-stt binary uses FluidAudio to run Parakeet speech recognition on the Neural Engine via CoreML. All processing is local — no network calls after the initial model download.
How it works
When you select ANE in the menu bar, Dictate calls the dictate-stt binary as a subprocess:
- Dictate records audio and saves it as a temporary WAV file
- Calls
dictate-stt transcribe /tmp/audio.wav - The Swift binary runs the audio through CoreML on the Neural Engine
- Returns JSON to stdout:
{"text": "Hello world", "duration_ms": 68} - Dictate parses the result and pipes it through LLM cleanup as usual
The binary is a standalone executable with no Python dependency. You can also use it directly:
dictate-stt check # Verify ANE is available
dictate-stt transcribe recording.wav # Transcribe a WAV file
API Server
If you run a local LLM server, Dictate can use it instead of loading its own model — zero additional RAM:
DICTATE_LLM_BACKEND=api DICTATE_LLM_API_URL=http://localhost:8005/v1/chat/completions dictate
Works with any OpenAI-compatible server: vllm-mlx, LM Studio, Ollama.
The Smart preset auto-routes by length: short phrases → fast local model (~120ms), longer dictation → your API server.
Environment Variables
All environment variables
| Variable | Description | Default |
|---|---|---|
DICTATE_AUDIO_DEVICE |
Microphone device index | System default |
DICTATE_OUTPUT_MODE |
type or clipboard |
type |
DICTATE_STT_ENGINE |
ane, parakeet, or whisper |
ane |
DICTATE_INPUT_LANGUAGE |
auto, en, ja, ko, etc. |
auto |
DICTATE_OUTPUT_LANGUAGE |
Translation target (auto = same) |
auto |
DICTATE_LLM_CLEANUP |
Enable LLM text cleanup | true |
DICTATE_LLM_MODEL |
qwen25-1.5b, qwen-3b |
qwen25-1.5b |
DICTATE_LLM_BACKEND |
local or api |
local |
DICTATE_LLM_API_URL |
OpenAI-compatible endpoint | http://localhost:8005/v1/chat/completions |
DICTATE_ALLOW_REMOTE_API |
Allow non-localhost API URLs | unset |
Agent Integration
Dictate works well as a voice input layer for AI assistants and agent frameworks. If you're building with tools like Claude Code, OpenClaw, or similar — Dictate gives your setup a local, private voice interface with zero cloud dependency.
CLI Commands
dictate # Launch in menu bar (backgrounds automatically)
dictate config # View all preferences
dictate config set writing_style professional
dictate config set quality fast
dictate config set ptt_key cmd_r
dictate config set stt whisper
dictate config reset # Reset to defaults
dictate stats # Show usage statistics
dictate status # System info and model status
dictate doctor # Run diagnostic checks (troubleshooting)
dictate devices # List audio input devices
dictate update # Update to latest version
dictate -f # Run in foreground (debug)
dictate -V # Show version
Config Keys
| Key | Values |
|---|---|
writing_style |
clean, professional, bullets |
quality |
api, fast, quality |
stt |
ane, parakeet, whisper |
input_language |
auto, en, ja, de, fr, es, ... |
output_language |
auto, en, ja, de, fr, es, ... |
ptt_key |
ctrl_l, ctrl_r, cmd_r, alt_l, alt_r |
llm_cleanup |
on, off |
sound |
soft_pop, chime, warm, click, marimba, simple |
llm_endpoint |
host:port (for API backend) |
Shell Completions
Tab completions for bash and zsh:
# Bash — add to ~/.bashrc
source /path/to/dictate/completions/dictate.bash
# Zsh — copy to fpath dir, then reload
cp completions/dictate.zsh ~/.zsh/completions/_dictate
autoload -Uz compinit && compinit
Completes commands, config keys, and all valid values.
Debugging
# Run in foreground with logs
dictate --foreground
# Check background logs
tail -f ~/Library/Logs/Dictate/dictate.log
Security
- All processing is local. Audio and text never leave your machine.
- The ANE engine's
dictate-sttbinary is open source Swift code you build yourself fromswift-stt/. CoreML models download from Hugging Face on first run, then everything is cached locally. - LLM endpoints restricted to localhost by default (
DICTATE_ALLOW_REMOTE_API=1to override). - Preferences stored with
0o600permissions (owner-only). - No API keys, tokens, or accounts required.
Contributing
Issues and PRs welcome. Run the test suite before submitting:
python -m pytest tests/ -q
See CONTRIBUTING.md for guidelines.
License
MIT — See LICENSES.md for dependency licenses.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file dictate_mlx-2.5.0.tar.gz.
File metadata
- Download URL: dictate_mlx-2.5.0.tar.gz
- Upload date:
- Size: 132.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
15b91d58cc2b52207e69a927d93adff88cf4280195e60acc2dd41e7a218a2bbe
|
|
| MD5 |
a2ec34120008ee41bf728c009b6e08eb
|
|
| BLAKE2b-256 |
b6bbcd2ef2d5d90b8eaf249c42954f4b01370a048bc4e671236f893867dce387
|
File details
Details for the file dictate_mlx-2.5.0-py3-none-any.whl.
File metadata
- Download URL: dictate_mlx-2.5.0-py3-none-any.whl
- Upload date:
- Size: 62.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
ccbad7fd4bfc6cc418a58bb13aaa3ac1305dedf79bd7893d2d3471580aea1b6d
|
|
| MD5 |
4198495e911a60b66510d121329260b6
|
|
| BLAKE2b-256 |
68f680b76d51d768f6be73fa69c8f5176a66de04a558279b7c2fb807fa10161d
|