
Voice-first control for AI coding agents

Reason this release was yanked: critical bug

Project description


DICTAre

Voice layer for AI coding agents.

Speak to your agent. No window focus required. 100% local.


dictare.io · OpenVIP™ Protocol

Watch the video to see how a poker game turned into a voice interaction system for coding agents.


Why dictare

Most voice tools (Wispr Flow, Superwhisper, etc.) simulate keystrokes: they type into whatever window has focus. Switch to your browser mid-sentence, and the browser gets the words meant for your code.

Dictare uses a protocol instead. Your agent listens on a Server-Sent Events (SSE) stream and receives transcriptions regardless of window focus. Your coding agent can sit behind three other windows and still get your words.
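To make the SSE side concrete, here is a minimal sketch of a client-side parser for the standard text/event-stream format (event: and data: fields, with a blank line ending each event). It does not connect to dictare; the transcription event name and JSON payload in the example are hypothetical placeholders, since the actual OpenVIP event schema is defined by the protocol, not here.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class SSEEvent:
    event: str  # event type; "message" when the stream does not set one
    data: str   # all data lines of the event, joined with "\n"


def parse_sse(lines: Iterable[str]) -> Iterator[SSEEvent]:
    """Yield events from text/event-stream lines (without trailing newlines)."""
    event_type = ""
    data_lines: list[str] = []
    for line in lines:
        if line == "":
            # A blank line dispatches the buffered event, if it carried any data.
            if data_lines:
                yield SSEEvent(event_type or "message", "\n".join(data_lines))
            event_type, data_lines = "", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the spec strips exactly one leading space
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)
        # "id" and "retry" are ignored in this sketch.


if __name__ == "__main__":
    sample = ["event: transcription", 'data: {"text": "refactor the parser"}', ""]
    for ev in parse_sse(sample):
        print(ev.event, ev.data)
```

With a live connection you would feed it the decoded lines of the streaming HTTP response body. An unterminated event at the end of the stream is discarded, as the SSE specification requires.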

Features

  • No focus required: the agent receives voice even when its window is in the background
  • Agent-native: transcriptions go to the agent through the protocol, not into a text field
  • 100% local: STT runs on-device, and no data leaves your machine
  • Multi-agent: switch agents by voice ("agent coding", "agent review")
  • Open protocol: OpenVIP lets any tool implement the SSE endpoint
  • Bidirectional: STT (voice in) + TTS (voice out)

Install

macOS (full guide)

brew install dragfly/tap/dictare

Linux (full guide)

curl -fsSL https://raw.githubusercontent.com/dragfly/dictare/main/install.sh | bash
sudo usermod -aG input $USER   # required for hotkey (log out/in after)

Permissions

macOS — grant when prompted:

  1. Microphone — prompted on first launch
  2. Input Monitoring — System Settings → Privacy & Security → enable Dictare
  3. Accessibility — needed for keyboard mode (typing into other apps)

After granting all three, restart the service: dictare service restart

Linux — two steps:

  1. Input group (hotkey, X11 + Wayland): sudo usermod -aG input $USER, then log out and back in
  2. ydotool (keyboard mode on Wayland): sudo apt install ydotool

Quick Start

dictare agent freddie       # starts the default profile (Claude Code)

That's it. The service starts automatically. Speak — your agent receives the transcription.

If you prefer a different coding agent:

dictare agent ozzy --profile codex      # OpenAI Codex
dictare agent gilmour --profile gemini  # Google Gemini CLI
dictare agent bowie --profile aider     # Aider

How It Works

  Microphone
      │
      ▼
  STT Module       Whisper (MLX / CTranslate2) or Parakeet (ONNX)
      │             all local, zero cold-start
      ▼
  Pipeline         submit detection, mute control, agent switching
      │
      ▼
  OpenVIP          HTTP / SSE — open protocol
      │
      ▼
  Agent            receives transcription, no window focus needed

The engine runs as a background service (launchd on macOS, systemd on Linux). STT models are preloaded at startup. Each agent connects in its own terminal.
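The routing idea (one engine, several agents each connected from its own terminal, transcriptions delivered only to the active one) can be sketched as follows. AgentRouter and its methods are hypothetical names for illustration, not dictare's internals, and the rule that the first agent to connect becomes active is an assumption.

```python
from __future__ import annotations

from collections import deque


class AgentRouter:
    """Hypothetical sketch: many connected agents, exactly one active at a time."""

    def __init__(self) -> None:
        self.queues: dict[str, deque[str]] = {}
        self.active: str | None = None

    def connect(self, name: str) -> None:
        # Each agent (one per terminal) gets its own queue of pending transcriptions.
        self.queues.setdefault(name, deque())
        if self.active is None:
            self.active = name  # assumption: the first agent to connect becomes active

    def switch(self, name: str) -> bool:
        # Voice command "agent <name>": only switch to an agent that is connected.
        if name not in self.queues:
            return False
        self.active = name
        return True

    def deliver(self, text: str) -> str | None:
        # Route text to the active agent; window focus plays no part in this.
        if self.active is None:
            return None
        self.queues[self.active].append(text)
        return self.active
```

Because delivery depends only on which agent is active, not on which window has focus, a background agent keeps receiving text until you switch away from it.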

Agent Profiles

Profiles are predefined in ~/.config/dictare/config.toml:

[agent_profiles]
default = "claude"

[agent_profiles.claude]
command = ["claude"]
description = "Claude Code"

[agent_profiles.codex]
command = ["codex"]
description = "OpenAI Codex"

[agent_profiles.pi]
command = ["pi", "--provider", "ollama", "--model", "qwen3:8b"]
continue_args = ["-c"]
description = "Pi + Ollama local, agentic with tools"

Then connect:

dictare agent freddie                      # default profile (claude)
dictare agent ozzy --profile codex         # use codex profile
dictare agent -- claude --model opus       # explicit command override

Voice Commands

Say                                                         Action
"ok, submit" / "ok, send" / "ok, invia" / "ja, senden"      Submit to agent (Enter)
"ok, mute" / "ok, hold on"                                  Mute (stop listening)
"ok, listen" / "ok, listen up"                              Unmute (resume listening)
"agent coding" / "agent review"                             Switch active agent

Submit triggers are multilingual (en, de, es, it, fr) and fully configurable.
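A simplified sketch of trigger matching, assuming whole-utterance matches after lowercasing and stripping punctuation. The phrase sets are copied from the table above; normalize and classify are hypothetical names, and the real pipeline's matching rules (and its configurable trigger lists) may differ.

```python
from __future__ import annotations

import re

# Phrases from the table above, in normalized form.
SUBMIT = {"ok submit", "ok send", "ok invia", "ja senden"}
MUTE = {"ok mute", "ok hold on"}
LISTEN = {"ok listen", "ok listen up"}


def normalize(text: str) -> str:
    """Lowercase, turn punctuation into spaces, collapse runs of whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def classify(utterance: str) -> tuple[str, str | None]:
    """Map one utterance to (action, argument); anything else is plain text."""
    phrase = normalize(utterance)
    if phrase in SUBMIT:
        return ("submit", None)
    if phrase in MUTE:
        return ("mute", None)
    if phrase in LISTEN:
        return ("listen", None)
    words = phrase.split()
    if len(words) == 2 and words[0] == "agent":
        return ("switch", words[1])  # "agent review" -> switch to "review"
    return ("text", utterance)
```

Normalizing first means "OK, submit!" and "ok submit" hit the same entry, so recognizer differences in punctuation and casing do not matter.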

Hotkey Cheat Sheet

Default hotkey: Right ⌘ (macOS) / Scroll Lock (Linux).

Gesture               Action
Single tap            Toggle listening on/off
Double tap            Submit (send Enter to agent)
Right Alt + hotkey    Switch mode: agents ↔ keyboard
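Telling a single tap from a double tap comes down to the gap between presses. This sketch classifies a recorded list of tap timestamps; classify_taps is a hypothetical helper and the 0.3 s window is an assumed value, not a documented dictare setting.

```python
DOUBLE_TAP_WINDOW = 0.3  # seconds; hypothetical value, not a dictare setting


def classify_taps(timestamps: list[float], window: float = DOUBLE_TAP_WINDOW) -> list[str]:
    """Turn ascending tap times into gestures: 'submit' (double tap) or 'toggle' (single)."""
    gestures: list[str] = []
    i = 0
    while i < len(timestamps):
        if i + 1 < len(timestamps) and timestamps[i + 1] - timestamps[i] <= window:
            gestures.append("submit")  # second tap arrived inside the window
            i += 2
        else:
            gestures.append("toggle")  # lone tap toggles listening on/off
            i += 1
    return gestures
```

A live implementation cannot fire "toggle" the instant a lone tap lands: it has to wait out the window to be sure no second tap follows, which is the usual latency trade-off of double-tap gestures.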

Service Management

dictare service install     # Install + enable (auto-starts at login)
dictare service start       # Start the service
dictare service stop        # Stop the service
dictare service restart     # Restart the service
dictare service status      # Show service and engine status
dictare service logs        # View recent logs
dictare service uninstall   # Remove the service

Keyboard Mode

No agent? Use dictare as a dictation tool — voice to keystrokes in any app.

dictare config set output.mode keyboard

Hotkey to toggle listening (configurable):

  • macOS: Right ⌘ by default
  • Linux: Scroll Lock by default

dictare config set hotkey.key KEY_RIGHTALT   # change hotkey

Text-to-Speech

dictare speak "Hello world"
dictare speak --engine piper "Hello"
echo "Hello" | dictare speak

Engines: espeak, say (macOS), piper, kokoro
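Scripts can reuse the same CLI. This sketch shells out to dictare speak with subprocess, passing the text as a positional argument exactly as in the examples above; speak is a hypothetical wrapper, and its run parameter exists only so the call can be swapped out in tests.

```python
from __future__ import annotations

import subprocess
from collections.abc import Callable
from typing import Any


def speak(text: str, engine: str | None = None,
          run: Callable[..., Any] = subprocess.run) -> None:
    """Hypothetical wrapper around: dictare speak [--engine ENGINE] TEXT."""
    argv = ["dictare", "speak"]
    if engine is not None:
        argv += ["--engine", engine]  # e.g. "espeak", "say", "piper", "kokoro"
    argv.append(text)
    # check=True raises CalledProcessError if dictare exits with a non-zero status.
    run(argv, check=True)
```

Passing an argument list (no shell) means quotes and spaces in the text need no escaping.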

Configuration

dictare config edit           # Open config in editor
dictare config list           # Show all settings
dictare config get stt.model
dictare config set stt.language it
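The dotted keys (stt.model, stt.language, hotkey.key) read like paths into nested TOML tables. Assuming that mapping (an inference from the key names, not documented here), get and set reduce to walking a nested dict; get_key and set_key are hypothetical helpers, and the starting dict is example data.

```python
from typing import Any


def get_key(config: dict[str, Any], dotted: str) -> Any:
    """Walk nested tables: get_key(cfg, "stt.language") -> cfg["stt"]["language"]."""
    node: Any = config
    for part in dotted.split("."):
        node = node[part]  # raises KeyError for a missing segment
    return node


def set_key(config: dict[str, Any], dotted: str, value: Any) -> None:
    """Set a leaf value, creating intermediate tables that do not exist yet."""
    *parents, leaf = dotted.split(".")
    node = config
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value


config: dict[str, Any] = {"stt": {"language": "en"}}  # example data
set_key(config, "stt.language", "it")          # like: dictare config set stt.language it
set_key(config, "hotkey.key", "KEY_RIGHTALT")  # like: dictare config set hotkey.key KEY_RIGHTALT
```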

Full configuration reference at dictare.io/docs/configuration.

Development

git clone https://github.com/dragfly/dictare && cd dictare

# macOS Apple Silicon (MLX GPU acceleration)
uv sync --python 3.11 --extra mlx

# macOS Intel / Linux
uv sync --python 3.11

# Run engine in foreground
uv run --python 3.11 dictare serve

# Tests
uv run --python 3.11 pytest tests/ -x

# Tests (parallel)
uv run --python 3.11 pytest tests/ -x -n auto

Ghostty users: add the line keybind = shift+enter=text:\n to your Ghostty config. See TERMINAL_COMPATIBILITY.md.

Protocol

dictare is the reference implementation of OpenVIP — an open protocol for voice input to AI agents. Any tool can implement the SSE endpoint and receive voice transcriptions from dictare.
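For a tool implementing the sending side of such an endpoint (the role dictare plays), each SSE event is just a few text lines followed by a blank line. A minimal framing sketch; format_sse is a hypothetical helper, and the transcription event name and JSON body are placeholders, since the OpenVIP payload format is defined by the protocol itself.

```python
from __future__ import annotations

import json


def format_sse(data: str, event: str | None = None, event_id: str | None = None) -> str:
    """Serialize one event in text/event-stream format."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    if event_id is not None:
        lines.append(f"id: {event_id}")
    # A multi-line payload needs one "data:" field per line.
    lines.extend(f"data: {chunk}" for chunk in data.split("\n"))
    return "\n".join(lines) + "\n\n"  # the blank line ends the event


# Hypothetical payload; the real OpenVIP message schema may differ.
frame = format_sse(json.dumps({"text": "run the tests"}), event="transcription")
```

Frames like this are written to a long-lived HTTP response served with Content-Type: text/event-stream, and clients reassemble them line by line.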

License

MIT



Download files


Source Distribution

dictare-0.3.0.tar.gz (2.1 MB)

Uploaded Source

Built Distribution


dictare-0.3.0-py3-none-any.whl (1.9 MB)

Uploaded Python 3

File details

Details for the file dictare-0.3.0.tar.gz.

File metadata

  • Download URL: dictare-0.3.0.tar.gz
  • Upload date:
  • Size: 2.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for dictare-0.3.0.tar.gz
Algorithm Hash digest
SHA256 9a56b945c92dfd54e35891a92a7f77f27bf952d8f8a988972488b817262559d9
MD5 825dbebd70973c0356f07fec02b7d597
BLAKE2b-256 4c67bae789525e22669ee207cfa1af315c8d9f0562623040acbeb16261fb3206


Provenance

The following attestation bundles were made for dictare-0.3.0.tar.gz:

Publisher: publish-pypi.yml on dragfly/dictare

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file dictare-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: dictare-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 1.9 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for dictare-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 44feabde4728996fa3ebe55119a11714ec3c3d3298a2d94dd500d84f5fe9e826
MD5 6d194f2a5bd6055e2fee9a1a1a57c55d
BLAKE2b-256 2a8b7cb59f93b5c49a461dd9196d0650c206c7afeee614c02a0b44d38c988aa7


Provenance

The following attestation bundles were made for dictare-0.3.0-py3-none-any.whl:

Publisher: publish-pypi.yml on dragfly/dictare

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
