
Voice-first control for AI coding agents



DICTAre

Voice layer for AI coding agents.

Speak to your agent. No window focus required. 100% local.


dictare.io · OpenVIP™ Protocol

Watch the video

If you want to know how a poker game turned into a voice interaction system for coding agents... watch this →


Why dictare

Most voice tools (Wispr Flow, Superwhisper, etc.) simulate keystrokes: they type into whatever window has focus. Switch to your browser, and your browser gets the words you meant for your code.

Dictare uses a protocol instead. Your agent listens via SSE (Server-Sent Events) and receives transcriptions regardless of window focus. Your coding agent can sit behind three other windows and still get your words.
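To make the SSE side concrete, here is a minimal Python sketch of a client consuming a Server-Sent Events stream with only the standard library. It assumes the engine serves SSE over local HTTP and that each event's data field carries a JSON object with a "text" key; the URL http://127.0.0.1:8770/events and that payload shape are hypothetical placeholders, not the documented OpenVIP schema.

```python
import json
import urllib.request
from typing import Iterable, Iterator, List

# Hypothetical endpoint: the real OpenVIP host, port and path may differ.
EVENTS_URL = "http://127.0.0.1:8770/events"


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Group SSE lines into events; a blank line dispatches the current event."""
    event_type = "message"
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: dispatch only if at least one data field was seen.
            if data_lines:
                yield {"event": event_type, "data": "\n".join(data_lines)}
            event_type, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # SSE strips a single leading space
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)
        # "id" and "retry" fields are ignored in this sketch.
    # An event without a terminating blank line is discarded, as the SSE spec requires.


def listen(url: str = EVENTS_URL) -> None:
    """Print transcriptions as they arrive (assumes JSON payloads with a 'text' key)."""
    request = urllib.request.Request(url, headers={"Accept": "text/event-stream"})
    with urllib.request.urlopen(request) as response:
        # Iterating the response yields raw byte lines as the server flushes them.
        lines = (chunk.decode("utf-8") for chunk in response)
        for event in parse_sse(lines):
            payload = json.loads(event["data"])
            print(payload.get("text", ""))
```

Keeping the parser separate from the network loop means the framing logic can be tested without a running engine; a real client would also reconnect when the stream drops.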

Features

  • No focus required — agent receives voice even when its window is in the background
  • Agent-native — transcriptions go to the agent protocol, not a text field
  • 100% local — STT runs on-device, zero data leaves your machine
  • Multi-agent — switch agents by voice: "agent coding", "agent review"
  • Open protocol: OpenVIP, so any tool can implement the SSE endpoint
  • Bidirectional — STT (voice in) + TTS (voice out)

Install

macOS (full guide)

brew install dragfly/tap/dictare

Linux (full guide)

curl -fsSL https://raw.githubusercontent.com/dragfly/dictare/main/install.sh | bash
sudo usermod -aG input $USER   # required for hotkey (log out/in after)

Permissions

macOS — grant when prompted:

  1. Microphone — prompted on first launch
  2. Input Monitoring — System Settings → Privacy & Security → enable Dictare
  3. Accessibility — needed for keyboard mode (typing into other apps)

After granting all three, restart the service: dictare service restart

Linux — two steps:

  1. Input group (hotkey, X11 + Wayland): sudo usermod -aG input $USER — log out/in
  2. ydotool (keyboard mode on Wayland): sudo apt install ydotool
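The usermod change only reaches new login sessions, which is why step 1 says to log out and back in. A small Python check can tell "configured but not yet active" apart from "not configured at all": the grp module reads the group database that usermod edits, while os.getgroups() reports the groups of the running session. This is a sketch, not part of dictare; input_group_status is a hypothetical helper.

```python
import grp
import os
import pwd
from typing import Tuple


def input_group_status(group_name: str = "input") -> Tuple[bool, bool]:
    """Return (configured, active) membership of the current user in group_name."""
    try:
        group = grp.getgrnam(group_name)
    except KeyError:
        return (False, False)  # the group does not exist on this system
    try:
        entry = pwd.getpwuid(os.getuid())
    except KeyError:
        entry = None  # uid without a passwd entry (e.g. some containers)
    # configured: listed in the group database (what usermod -aG edits)
    configured = entry is not None and (
        entry.pw_name in group.gr_mem or entry.pw_gid == group.gr_gid
    )
    # active: the running session already carries the group ID
    active = group.gr_gid in os.getgroups() or os.getgid() == group.gr_gid
    return (configured, active)


if __name__ == "__main__":
    configured, active = input_group_status()
    if configured and not active:
        print("In 'input', but this session predates it: log out and back in.")
    elif not configured:
        print("Not in 'input' yet: run sudo usermod -aG input $USER")
    else:
        print("OK: the input group is active for this session.")
```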

Quick Start

dictare agent freddie       # starts the default profile (Claude Code)

That's it. The service starts automatically. Speak — your agent receives the transcription.

If you prefer a different coding agent:

dictare agent ozzy --profile codex      # OpenAI Codex
dictare agent gilmour --profile gemini  # Google Gemini CLI
dictare agent bowie --profile aider     # Aider

How It Works

  Microphone
      │
      ▼
  STT Module       Whisper (MLX / CTranslate2) or Parakeet (ONNX)
      │             all local, zero cold-start
      ▼
  Pipeline         submit detection, mute control, agent switching
      │
      ▼
  OpenVIP          HTTP / SSE — open protocol
      │
      ▼
  Agent            receives transcription, no window focus needed

The engine runs as a background service (launchd on macOS, systemd on Linux). STT models are preloaded at startup. Each agent connects in its own terminal.
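One way to picture "each agent connects in its own terminal" together with voice switching: the engine keeps one inbox per connected agent and delivers each transcription only to the currently active one. The sketch below illustrates that routing idea under that assumption; it is not dictare's code, and AgentRouter and its methods are hypothetical names.

```python
import queue
from typing import Dict, Optional


class AgentRouter:
    """Hypothetical sketch: one queue per connected agent, one active agent."""

    def __init__(self) -> None:
        self.queues: Dict[str, "queue.Queue[str]"] = {}
        self.active: Optional[str] = None

    def connect(self, name: str) -> "queue.Queue[str]":
        """Register an agent (e.g. a new terminal session) and return its inbox."""
        inbox: "queue.Queue[str]" = queue.Queue()
        self.queues[name] = inbox
        if self.active is None:
            self.active = name  # the first agent to connect becomes active
        return inbox

    def disconnect(self, name: str) -> None:
        self.queues.pop(name, None)
        if self.active == name:
            # fall back to any remaining agent, or to none
            self.active = next(iter(self.queues), None)

    def switch(self, name: str) -> bool:
        """Handle a voice switch such as "agent review"; False if unknown."""
        if name not in self.queues:
            return False
        self.active = name
        return True

    def deliver(self, text: str) -> bool:
        """Send a transcription to the active agent only."""
        if self.active is None:
            return False
        self.queues[self.active].put(text)
        return True
```

Because inactive agents simply receive nothing, switching is a pointer change rather than a reconnect, which fits the "no window focus needed" model.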

Agent Profiles

Profiles are predefined in ~/.config/dictare/config.toml:

[agent_profiles]
default = "claude"

[agent_profiles.claude]
command = ["claude"]
description = "Claude Code"

[agent_profiles.codex]
command = ["codex"]
description = "OpenAI Codex"

[agent_profiles.pi]
command = ["pi", "--provider", "ollama", "--model", "qwen3:8b"]
continue_args = ["-c"]
description = "Pi + Ollama local, agentic with tools"

Then connect:

dictare agent freddie                      # default profile (claude)
dictare agent ozzy --profile codex         # use codex profile
dictare agent -- claude --model opus       # explicit command override

Voice Commands

  • "ok, submit" / "ok, send" / "ok, invia" / "ja, senden" → Submit to agent (Enter)
  • "ok, mute" / "ok, hold on" → Mute (stop listening)
  • "ok, listen" / "ok, listen up" → Unmute (resume listening)
  • "agent coding" / "agent review" → Switch active agent

Submit triggers are multilingual (en, de, es, it, fr) and fully configurable.
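The trigger list maps naturally onto a small matcher. The sketch below is a hypothetical illustration, not dictare's pipeline: it normalizes an utterance (lowercase, punctuation dropped, whitespace collapsed) so that "OK, submit." and "ok submit" match the same trigger, and it assumes a command is spoken as a standalone utterance. A real implementation would load its triggers from configuration.

```python
import re
from typing import Optional, Tuple

# Trigger phrases from the list above, stored in normalized form.
SUBMIT = {"ok submit", "ok send", "ok invia", "ja senden"}
MUTE = {"ok mute", "ok hold on"}
UNMUTE = {"ok listen", "ok listen up"}


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def classify(utterance: str) -> Tuple[str, Optional[str]]:
    """Return (action, payload) for one transcribed utterance."""
    norm = normalize(utterance)
    if norm in SUBMIT:
        return ("submit", None)
    if norm in MUTE:
        return ("mute", None)
    if norm in UNMUTE:
        return ("unmute", None)
    switch = re.fullmatch(r"agent (\w+)", norm)
    if switch:
        return ("switch", switch.group(1))
    # Anything else is dictation and is passed through with its original casing.
    return ("text", utterance)
```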

Hotkey Cheat Sheet

Default hotkey: Right ⌘ (macOS) / Scroll Lock (Linux).

  • Single tap → Toggle listening on/off
  • Double tap → Submit (send Enter to agent)
  • Right Alt + hotkey → Switch mode: agents ↔ keyboard
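Telling a single tap from a double tap means waiting briefly after the first press to see whether a second one follows. A minimal sketch, assuming a hypothetical 0.3 s window (dictare's actual threshold is not documented here); it classifies a list of press timestamps offline so the logic is easy to test.

```python
from typing import Dict, List

DOUBLE_TAP_WINDOW = 0.3  # seconds; hypothetical threshold

# Gesture → action, following the list above.
ACTIONS: Dict[str, str] = {"single": "toggle listening", "double": "submit"}


def classify_taps(press_times: List[float],
                  window: float = DOUBLE_TAP_WINDOW) -> List[str]:
    """Turn sorted key-press timestamps into 'single' / 'double' gestures."""
    gestures: List[str] = []
    i = 0
    while i < len(press_times):
        nxt = i + 1
        if nxt < len(press_times) and press_times[nxt] - press_times[i] <= window:
            gestures.append("double")  # second press arrived inside the window
            i += 2
        else:
            gestures.append("single")  # window expired with no second press
            i += 1
    return gestures
```

The trade-off in a live listener is latency: a single tap can only be confirmed once the window has elapsed, so a shorter window feels snappier but makes double taps harder to hit.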

Service Management

dictare service install     # Install + enable (auto-starts at login)
dictare service start       # Start the service
dictare service stop        # Stop the service
dictare service restart     # Restart the service
dictare service status      # Show service and engine status
dictare service logs        # View recent logs
dictare service uninstall   # Remove the service

Keyboard Mode

No agent? Use dictare as a dictation tool — voice to keystrokes in any app.

dictare config set output.mode keyboard

Hotkey to toggle listening (configurable):

  • macOS: Right ⌘ by default
  • Linux: Scroll Lock by default

dictare config set hotkey.key KEY_RIGHTALT   # change hotkey

Text-to-Speech

dictare speak "Hello world"
dictare speak --engine piper "Hello"
echo "Hello" | dictare speak

Engines: espeak, say (macOS), piper, kokoro
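To trigger speech from a script (for instance, a hook that reads an agent's reply aloud), you can pipe text into dictare speak on stdin, just as the echo example does. A minimal sketch using subprocess; speak is a hypothetical helper, and combining --engine with stdin input is an assumption based on the two examples shown separately above.

```python
import shutil
import subprocess
from typing import Optional


def speak(text: str, engine: Optional[str] = None) -> bool:
    """Pipe text to `dictare speak`; return False if dictare is not on PATH."""
    exe = shutil.which("dictare")
    if exe is None:
        return False
    cmd = [exe, "speak"]
    if engine:
        cmd += ["--engine", engine]  # e.g. "piper"; assumed to work with stdin input
    # Equivalent to: echo "..." | dictare speak
    subprocess.run(cmd, input=text, text=True, check=True)
    return True
```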

Configuration

dictare config edit           # Open config in editor
dictare config list           # Show all settings
dictare config get stt.model
dictare config set stt.language it

Full configuration reference at dictare.io/docs/configuration.

Development

git clone https://github.com/dragfly/dictare && cd dictare

# macOS Apple Silicon (MLX GPU acceleration)
uv sync --python 3.11 --extra mlx

# macOS Intel / Linux
uv sync --python 3.11

# Run engine in foreground
uv run --python 3.11 dictare serve

# Tests
uv run --python 3.11 pytest tests/ -x

# Tests (parallel)
uv run --python 3.11 pytest tests/ -x -n auto

Ghostty users: add keybind = shift+enter=text:\n to your Ghostty config. See TERMINAL_COMPATIBILITY.md.

Protocol

dictare is the reference implementation of OpenVIP — an open protocol for voice input to AI agents. Any tool can implement the SSE endpoint and receive voice transcriptions from dictare.

License

MIT



Download files


Source Distribution

dictare-0.2.5rc3.tar.gz (2.1 MB)

Uploaded Source

Built Distribution


dictare-0.2.5rc3-py3-none-any.whl (1.9 MB)

Uploaded Python 3

File details

Details for the file dictare-0.2.5rc3.tar.gz.

File metadata

  • Download URL: dictare-0.2.5rc3.tar.gz
  • Upload date:
  • Size: 2.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for dictare-0.2.5rc3.tar.gz
  • SHA256: 17bf4fe2d03820f8033f766cc6064e5b2e36ce424a889bd792ba2939058fa251
  • MD5: 18c6052f3f8459f0d8197ca6f8ebe266
  • BLAKE2b-256: 5fbd24e6edee8409ae0cf62ff83f77b9c49de67d76b1169f2406c68b4815421f


Provenance

The following attestation bundles were made for dictare-0.2.5rc3.tar.gz:

Publisher: publish-pypi.yml on dragfly/dictare

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file dictare-0.2.5rc3-py3-none-any.whl.

File metadata

  • Download URL: dictare-0.2.5rc3-py3-none-any.whl
  • Upload date:
  • Size: 1.9 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for dictare-0.2.5rc3-py3-none-any.whl
  • SHA256: abec2aa046291a2a56b12379c317bfddcc54e31037fae18130f37ff09bf48bad
  • MD5: b6c6444449978ee761e2411db9382fc6
  • BLAKE2b-256: 9e5d128160d64b6d83097aca7f01dc36a067fe1e007188ae8967c9e2dabc7783


Provenance

The following attestation bundles were made for dictare-0.2.5rc3-py3-none-any.whl:

Publisher: publish-pypi.yml on dragfly/dictare

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
