voicekey

Open-source voice dictation for macOS.
Hold Option. Speak. Release. Text appears at your cursor.



No subscription and no intermediary servers: audio goes only to the transcription provider you choose. Bring your own API key, pick one hotkey, and you're dictating.

Uses OpenAI's gpt-4o-mini-transcribe by default, with a pluggable provider system so you can swap in other transcription backends. Audio goes straight from your mic to the API over HTTPS; text typically comes back in under a second and is pasted at your cursor. Costs about $0.003/minute with OpenAI.


Why this exists

Wispr Flow is good, but lots of companies (including OpenAI itself) block third-party services that process audio. If your employer won't let you install Wispr, or you'd rather not pay $10/month for dictation, this does the same thing with your own API key.


How it compares

|                  | voicekey                         | Wispr Flow                  | macOS Dictation             | whisper.cpp         |
|------------------|----------------------------------|-----------------------------|-----------------------------|---------------------|
| Cost             | ~$0.003/min (API)                | $10/month                   | Free                        | Free (local)        |
| Privacy          | Your API key, direct to provider | Audio goes to Wispr servers | Audio goes to Apple servers | Fully local         |
| Allowed at work  | Yes (your own key)               | Often blocked by IT         | Usually allowed             | Yes                 |
| Accuracy         | GPT-4o-mini-transcribe           | Proprietary                 | Apple ML                    | Open-source Whisper |
| Latency          | Sub-second                       | Sub-second                  | ~1-2s                       | Depends on hardware |
| Paste at cursor  | Yes                              | Yes                         | Yes                         | No (manual copy)    |
| Works in any app | Yes                              | Yes                         | Yes                         | No (terminal only)  |
| Open source      | Yes                              | No                          | No                          | Yes                 |
| Setup time       | 2 minutes                        | Account + install           | Built-in                    | Compile from source |
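
To put the Cost row in concrete terms, here is a small back-of-the-envelope calculation. It uses only the two prices quoted in this README (about $0.003/minute for the API and $10/month for a subscription); the function names are made up for illustration, and actual provider pricing may differ.

```python
def monthly_api_cost(minutes_per_day: float,
                     rate_per_minute: float = 0.003,
                     days: int = 30) -> float:
    """Estimated monthly API spend in dollars for a given daily dictation time."""
    return minutes_per_day * rate_per_minute * days


def break_even_minutes(subscription_per_month: float = 10.0,
                       rate_per_minute: float = 0.003) -> float:
    """Minutes of dictation per month at which API cost equals the subscription."""
    return subscription_per_month / rate_per_minute


if __name__ == "__main__":
    print(f"30 min/day: ${monthly_api_cost(30):.2f} per month")
    print(f"Break-even: {break_even_minutes():.0f} minutes per month")
```

At these rates, half an hour of dictation every day comes to under $3 a month, and the break-even point against a $10 subscription is a little over 3,300 minutes (roughly 55 hours) of speech per month.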

How it works

Hold Option → Recording → Release Option → Transcribe → Paste at cursor
  1. Hold the Option key. A red dot appears on screen, recording starts.
  2. Talk.
  3. Release Option. The audio is sent to the transcription API, and the returned text is pasted wherever your cursor is.
  4. Your clipboard is preserved: its contents are saved before the paste and restored afterward.
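
The save-paste-restore idea in step 4 can be sketched as follows. This is a minimal illustration, not voicekey's actual code: the Clipboard protocol, InMemoryClipboard, and paste_preserving_clipboard are hypothetical names, and the in-memory clipboard stands in for the macOS pasteboard so the example runs anywhere.

```python
from typing import Callable, Optional, Protocol


class Clipboard(Protocol):
    """Hypothetical minimal clipboard interface."""

    def get(self) -> Optional[str]: ...
    def set(self, text: Optional[str]) -> None: ...


class InMemoryClipboard:
    """Stand-in for the system pasteboard (assumption, for illustration only)."""

    def __init__(self, initial: Optional[str] = None) -> None:
        self._text = initial

    def get(self) -> Optional[str]:
        return self._text

    def set(self, text: Optional[str]) -> None:
        self._text = text


def paste_preserving_clipboard(clipboard: Clipboard,
                               text: str,
                               send_paste: Callable[[], None]) -> None:
    """Put text on the clipboard, trigger a paste, then restore the old contents."""
    saved = clipboard.get()          # remember what the user had copied
    clipboard.set(text)              # stage the transcript for pasting
    try:
        send_paste()                 # e.g. simulate Cmd+V in the real app
    finally:
        clipboard.set(saved)         # restore even if the paste fails


if __name__ == "__main__":
    cb = InMemoryClipboard("something I copied earlier")
    paste_preserving_clipboard(cb, "hello world", lambda: print("pasting:", cb.get()))
    print("clipboard afterwards:", cb.get())
```

The try/finally ensures the original contents come back even when the paste step raises. A real implementation would likely also need a short pause before restoring, since the target app reads the clipboard asynchronously after the simulated keystroke.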

Install

Prerequisites

  • macOS 13+
  • Python 3.11+
  • PortAudio: brew install portaudio

pipx (recommended)

brew install portaudio
pipx install voicekey
voicekey setup

From source with uv

brew install portaudio
git clone https://github.com/adamkhakhar/voicekey.git
cd voicekey
uv sync
uv run voicekey setup

Setup will ask for your API key (stored in the macOS Keychain, never written to a plain-text file) and walk you through Accessibility and Microphone permissions.

Run

voicekey          # if installed via pipx
uv run voicekey   # if installed via uv

A mic icon shows up in your menu bar. Hold Option to dictate.


Permissions

macOS needs two permissions granted:

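| Permission    | Why                                                 | How to grant                                                               |
|---------------|-----------------------------------------------------|----------------------------------------------------------------------------|
| Accessibility | Global hotkey detection + simulating Cmd+V to paste | System Settings → Privacy & Security → Accessibility → add your terminal |
| Microphone    | Recording audio                                     | Dialog pops up on first use                                                |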

voicekey setup checks both and opens the right Settings pane if either is missing.


Configuration

Config file is at ~/.config/voicekey/config.toml.

voicekey config                       # view all settings
voicekey config language en           # set language (ISO 639-1)
voicekey config model gpt-4o-transcribe   # higher-accuracy model (~$0.006/min)
voicekey config hotkey left_option    # trigger on left Option only
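
| Key      | Default                | Options                                   |
|----------|------------------------|-------------------------------------------|
| provider | openai                 | openai (more coming)                      |
| model    | gpt-4o-mini-transcribe | gpt-4o-mini-transcribe, gpt-4o-transcribe |
| hotkey   | option (either)        | option, left_option, right_option         |
| language | "" (auto-detect)       | Any ISO 639-1 code                        |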

Architecture

┌─────────────────────────────────────────────────────┐
│  Main thread (NSApp run loop)                       │
│  ┌──────────┐  ┌─────────┐  ┌────────────────────┐ │
│  │ CGEvent  │→ │ rumps   │  │ Overlay (NSWindow) │ │
│  │ tap      │  │ menubar │  │ red dot indicator  │ │
│  └──────────┘  └─────────┘  └────────────────────┘ │
├─────────────────────────────────────────────────────┤
│  Audio thread (sounddevice callback)                │
│  24kHz mono int16 PCM → WAV encoding                │
├─────────────────────────────────────────────────────┤
│  Transcription thread (per utterance)               │
│  Provider.transcribe() → paste at cursor            │
└─────────────────────────────────────────────────────┘
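
The audio format named in the diagram (24 kHz, mono, 16-bit PCM wrapped in a WAV container) can be illustrated with the standard-library wave module. This is a minimal sketch rather than voicekey's actual encoder; pcm_to_wav is a hypothetical name, and the input is assumed to be raw little-endian int16 samples such as those delivered by an audio callback.

```python
import io
import wave

SAMPLE_RATE = 24_000   # 24 kHz, as in the diagram
CHANNELS = 1           # mono
SAMPLE_WIDTH = 2       # int16 = 2 bytes per sample


def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap raw int16 mono PCM in a WAV container, entirely in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    return buffer.getvalue()


if __name__ == "__main__":
    one_second_of_silence = b"\x00\x00" * SAMPLE_RATE
    wav_bytes = pcm_to_wav(one_second_of_silence)
    print(len(wav_bytes), "bytes, starts with", wav_bytes[:4])
```

Encoding in memory means the recording never has to be written to a temporary file before it is uploaded.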

State machine: IDLE → RECORDING → TRANSCRIBING → INSERTING → IDLE
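
One way to model this cycle is a transition table keyed by state and event. The four state names come from the line above; the event names (hotkey_down, hotkey_up, transcript_ready, paste_done) and the choice to ignore events that do not apply to the current state are assumptions made for this sketch.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    RECORDING = auto()
    TRANSCRIBING = auto()
    INSERTING = auto()


# (current state, event) -> next state. Event names are hypothetical.
TRANSITIONS = {
    (State.IDLE, "hotkey_down"): State.RECORDING,
    (State.RECORDING, "hotkey_up"): State.TRANSCRIBING,
    (State.TRANSCRIBING, "transcript_ready"): State.INSERTING,
    (State.INSERTING, "paste_done"): State.IDLE,
}


class DictationMachine:
    """Tracks the current state and applies only the transitions listed above."""

    def __init__(self) -> None:
        self.state = State.IDLE

    def handle(self, event: str) -> State:
        # Events that do not apply in the current state leave it unchanged.
        self.state = TRANSITIONS.get((self.state, event), self.state)
        return self.state


if __name__ == "__main__":
    machine = DictationMachine()
    for event in ["hotkey_down", "hotkey_up", "transcript_ready", "paste_done"]:
        print(event, "->", machine.handle(event).name)
```

Ignoring out-of-order events means, for example, that pressing Option again while a transcription is still in flight does not start a second recording in this model.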

Providers are pluggable. The providers/ package defines a Protocol that any transcription backend can implement. OpenAI is the default. To add a new provider, implement transcribe() in a new module and register it.
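
A provider along these lines might look like the sketch below. Only the method name transcribe() comes from this README; its signature, the registry, and the register and get_provider helpers are assumptions for illustration, so check the providers/ package for the real interface before writing a backend.

```python
from typing import Callable, Dict, Protocol


class TranscriptionProvider(Protocol):
    """Assumed shape of a backend: WAV bytes in, text out."""

    def transcribe(self, wav_bytes: bytes, language: str = "") -> str: ...


# Hypothetical registry mapping a provider name to a factory.
_REGISTRY: Dict[str, Callable[[], TranscriptionProvider]] = {}


def register(name: str):
    """Class decorator that records a provider under the given name."""
    def decorator(factory):
        _REGISTRY[name] = factory
        return factory
    return decorator


def get_provider(name: str) -> TranscriptionProvider:
    """Instantiate a registered provider, or fail with a clear error."""
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(f"unknown provider: {name!r}") from None


@register("echo")
class EchoProvider:
    """Toy backend that reports the audio size instead of transcribing it."""

    def transcribe(self, wav_bytes: bytes, language: str = "") -> str:
        return f"<{len(wav_bytes)} bytes of audio>"


if __name__ == "__main__":
    print(get_provider("echo").transcribe(b"\x00" * 1024))
```

Because Protocol uses structural typing, EchoProvider does not need to inherit from anything; matching the method signature is enough.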

There's a 200ms debounce on the Option key so it doesn't fire when you're typing special characters (Option+E for accents, etc).
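
The README does not spell out how the debounce is implemented, so the following is one plausible reading, stated as an assumption: recording starts only once Option has been held on its own for 200 ms, and any other key pressed inside that window cancels the attempt. OptionDebouncer and its method names are hypothetical, and timestamps are passed in explicitly so the logic can be exercised without a real keyboard.

```python
from typing import Optional

DEBOUNCE_SECONDS = 0.2  # the 200 ms window mentioned above


class OptionDebouncer:
    """Decides whether a held Option key should start a recording."""

    def __init__(self, delay: float = DEBOUNCE_SECONDS) -> None:
        self.delay = delay
        self._pressed_at: Optional[float] = None
        self._cancelled = False

    def option_down(self, now: float) -> None:
        self._pressed_at = now
        self._cancelled = False

    def other_key_down(self, now: float) -> None:
        # A second key inside the window means Option is being used as a modifier.
        if self._pressed_at is not None and now - self._pressed_at < self.delay:
            self._cancelled = True

    def option_up(self) -> None:
        self._pressed_at = None
        self._cancelled = False

    def should_record(self, now: float) -> bool:
        return (
            self._pressed_at is not None
            and not self._cancelled
            and now - self._pressed_at >= self.delay
        )


if __name__ == "__main__":
    debouncer = OptionDebouncer()
    debouncer.option_down(now=0.0)
    print("after 100 ms:", debouncer.should_record(now=0.1))
    print("after 250 ms:", debouncer.should_record(now=0.25))
```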


Security and privacy

Your API key is stored in the macOS Keychain through the keyring library, not in a config file. Audio goes directly to your chosen provider's API over HTTPS with no middleman. There's no telemetry, no analytics, nothing phoning home. The clipboard gets saved before pasting transcribed text and restored right after. The whole thing is ~900 lines of Python you can read in 20 minutes.


Troubleshooting

"Failed to create event tap" - Your terminal doesn't have Accessibility permission. Add it in System Settings → Privacy & Security → Accessibility, then restart voicekey.

No audio captured - Check that your terminal has Microphone permission in System Settings → Privacy & Security → Microphone.

Text not appearing at cursor - A few apps with custom input handling don't respond to simulated Cmd+V. Most standard apps work.

PortAudio errors on install - Run brew install portaudio before installing voicekey.


Contributing

Contributions are welcome. Open an issue first if you want to discuss a change.

git clone https://github.com/adamkhakhar/voicekey.git
cd voicekey
uv sync
uv run pytest tests/ -v

License

MIT
