Voice dictation for macOS. Hold Option, speak, release. Text appears at your cursor.

voicekey

Open-source voice dictation for macOS.
Hold Option. Speak. Release. Text appears at your cursor.


No subscription, no intermediary servers. Your API key, one hotkey, and you're dictating.

Uses OpenAI's gpt-4o-mini-transcribe by default, with a pluggable provider system so you can swap in other transcription backends. Audio goes straight from your mic to the API over HTTPS; text comes back in under a second and is pasted at your cursor. Costs about $0.003/minute with OpenAI.
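
As a rough check on the cost figure, the sketch below turns the quoted per-minute rate into a monthly estimate. The usage numbers (30 minutes a day, 22 days a month) are illustrative assumptions, not measurements, and `monthly_cost` is a hypothetical helper, not part of voicekey.

```python
# Rough monthly cost estimate from the per-minute rate quoted in this README.
RATE_PER_MINUTE_USD = 0.003  # approximate gpt-4o-mini-transcribe rate


def monthly_cost(minutes_per_day: float, days_per_month: int = 22) -> float:
    """Return the estimated monthly API cost in USD, rounded to cents."""
    return round(RATE_PER_MINUTE_USD * minutes_per_day * days_per_month, 2)


if __name__ == "__main__":
    # Hypothetical usage: 30 minutes of dictation per working day.
    print(monthly_cost(30))
```

Under those assumptions the estimate stays well below a $10/month subscription; heavier use scales linearly.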


Why this exists

Wispr Flow is good, but lots of companies (including OpenAI itself) block third-party services that process audio. If your employer won't let you install Wispr, or you'd rather not pay $10/month for dictation, this does the same thing with your own API key.


How it compares

| | voicekey | Wispr Flow | macOS Dictation | whisper.cpp |
|---|---|---|---|---|
| Cost | ~$0.003/min (API) | $10/month | Free | Free (local) |
| Privacy | Your API key, direct to provider | Audio goes to Wispr servers | Audio goes to Apple servers | Fully local |
| Allowed at work | Yes (your own key) | Often blocked by IT | Usually allowed | Yes |
| Accuracy | GPT-4o-mini-transcribe | Proprietary | Apple ML | Open-source Whisper |
| Latency | Sub-second | Sub-second | ~1-2s | Depends on hardware |
| Paste at cursor | Yes | Yes | Yes | No (manual copy) |
| Works in any app | Yes | Yes | Yes | No (terminal only) |
| Open source | Yes | No | No | Yes |
| Setup time | 2 minutes | Account + install | Built-in | Compile from source |

How it works

Hold Option → Recording → Release Option → Transcribe → Paste at cursor
  1. Hold the Option key. A red dot appears on screen, recording starts.
  2. Talk.
  3. Release Option. The audio is sent to the transcription API, and the returned text is pasted wherever your cursor is.
  4. Your clipboard contents are preserved: they are saved before the paste and restored after.
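
The save-and-restore behaviour in step 4 can be sketched as a context manager. This is a minimal illustration, not voicekey's actual code: `FakeClipboard` is an in-memory stand-in for the macOS pasteboard, and `preserved`, `paste_transcript`, and the `send_paste` callback are hypothetical names.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


class FakeClipboard:
    """In-memory stand-in for the system clipboard (hypothetical)."""

    def __init__(self, text: str = "") -> None:
        self.text = text


@contextmanager
def preserved(clipboard: FakeClipboard) -> Iterator[FakeClipboard]:
    """Save the clipboard contents on entry and restore them on exit."""
    saved = clipboard.text
    try:
        yield clipboard
    finally:
        # Restore even if the paste step raises.
        clipboard.text = saved


def paste_transcript(
    clipboard: FakeClipboard,
    transcript: str,
    send_paste: Callable[[str], None],
) -> None:
    """Put the transcript on the clipboard, paste it, then restore the old contents."""
    with preserved(clipboard):
        clipboard.text = transcript
        send_paste(clipboard.text)  # stands in for simulating Cmd+V
```

Restoring in a `finally` block means a failed paste still leaves the user's clipboard as it was.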

Install

Prerequisites

  • macOS 13+
  • Python 3.11+
  • PortAudio: brew install portaudio

pipx (recommended)

brew install portaudio
pipx install voicekey
voicekey setup

From source with uv

brew install portaudio
git clone https://github.com/adamkhakhar/voicekey.git
cd voicekey
uv sync
uv run voicekey setup

Setup asks for your API key (stored in the macOS Keychain, never written to a config file) and walks you through granting Accessibility and Microphone permissions.

Run

voicekey          # if installed via pipx
uv run voicekey   # if installed via uv

A mic icon shows up in your menu bar. Hold Option to dictate.


Permissions

macOS needs two permissions granted:

| Permission | Why | How to grant |
|---|---|---|
| Accessibility | Global hotkey detection + simulating Cmd+V to paste | System Settings → Privacy & Security → Accessibility → add your terminal |
| Microphone | Recording audio | Dialog pops up on first use |

voicekey setup checks both and opens the right Settings pane if either is missing.


Configuration

Config file is at ~/.config/voicekey/config.toml.

voicekey config                       # view all settings
voicekey config language en           # set language (ISO 639-1)
voicekey config model gpt-4o-transcribe   # higher-accuracy model (~$0.006/min)
voicekey config hotkey left_option    # trigger on left Option only

| Key | Default | Options |
|---|---|---|
| provider | openai | openai (more coming) |
| model | gpt-4o-mini-transcribe | gpt-4o-mini-transcribe, gpt-4o-transcribe |
| hotkey | option (either) | option, left_option, right_option |
| language | "" (auto-detect) | Any ISO 639-1 code |

Architecture

┌─────────────────────────────────────────────────────┐
│  Main thread (NSApp run loop)                       │
│  ┌──────────┐  ┌─────────┐  ┌────────────────────┐ │
│  │ CGEvent  │→ │ rumps   │  │ Overlay (NSWindow) │ │
│  │ tap      │  │ menubar │  │ red dot indicator  │ │
│  └──────────┘  └─────────┘  └────────────────────┘ │
├─────────────────────────────────────────────────────┤
│  Audio thread (sounddevice callback)                │
│  24kHz mono int16 PCM → WAV encoding                │
├─────────────────────────────────────────────────────┤
│  Transcription thread (per utterance)               │
│  Provider.transcribe() → paste at cursor            │
└─────────────────────────────────────────────────────┘
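
The audio format in the diagram (24 kHz, mono, int16 PCM wrapped in a WAV container) can be produced with the standard library alone. The sketch below is an illustration of that encoding step, not voicekey's actual code; `pcm_to_wav` is a hypothetical helper name.

```python
import io
import wave

SAMPLE_RATE = 24_000  # Hz, as in the diagram
CHANNELS = 1          # mono
SAMPLE_WIDTH = 2      # bytes per int16 sample


def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap raw little-endian int16 PCM samples in an in-memory WAV file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    return buf.getvalue()


if __name__ == "__main__":
    one_second_of_silence = b"\x00\x00" * SAMPLE_RATE
    print(len(pcm_to_wav(one_second_of_silence)), "bytes")
```

Encoding into a `BytesIO` buffer keeps the recording in memory, so nothing needs to be written to a temporary file before upload.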

State machine: IDLE → RECORDING → TRANSCRIBING → INSERTING → IDLE
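
A minimal sketch of that cycle as a transition table. The four state names come from the line above; the event names (`hotkey_down`, `hotkey_up`, `transcript_ready`, `paste_done`) are hypothetical labels chosen for illustration.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    RECORDING = auto()
    TRANSCRIBING = auto()
    INSERTING = auto()


# (current state, event) -> next state. Event names are assumptions.
TRANSITIONS = {
    (State.IDLE, "hotkey_down"): State.RECORDING,
    (State.RECORDING, "hotkey_up"): State.TRANSCRIBING,
    (State.TRANSCRIBING, "transcript_ready"): State.INSERTING,
    (State.INSERTING, "paste_done"): State.IDLE,
}


def step(state: State, event: str) -> State:
    """Return the next state; events that don't apply leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)
```

Ignoring events that don't match the current state means, for instance, that a stray key release while idle has no effect.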

Providers are pluggable. The providers/ package defines a Protocol that any transcription backend can implement. OpenAI is the default. To add a new provider, implement transcribe() in a new module and register it.
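
To give a feel for what a provider might look like, here is a sketch of a Protocol with a small registry. Only the `transcribe()` method name comes from this README; its signature, the `register` decorator, `get_provider`, and the toy `EchoProvider` are assumptions for illustration, so check the providers/ package for the real interface.

```python
from typing import Callable, Protocol, runtime_checkable


@runtime_checkable
class TranscriptionProvider(Protocol):
    """Anything with a matching transcribe() method counts as a provider."""

    def transcribe(self, wav_bytes: bytes, language: str = "") -> str:
        ...


# Hypothetical registry mapping a provider name to a factory.
_REGISTRY: dict[str, Callable[[], TranscriptionProvider]] = {}


def register(name: str):
    """Class decorator that records a provider under the given name."""
    def decorator(cls):
        _REGISTRY[name] = cls
        return cls
    return decorator


@register("echo")
class EchoProvider:
    """Toy backend that reports the audio size instead of calling an API."""

    def transcribe(self, wav_bytes: bytes, language: str = "") -> str:
        return f"<{len(wav_bytes)} bytes of audio>"


def get_provider(name: str) -> TranscriptionProvider:
    """Instantiate the provider registered under the given name."""
    return _REGISTRY[name]()
```

Because `Protocol` uses structural typing, a backend does not need to inherit from anything; it only needs a compatible `transcribe()` method.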

There's a 200ms debounce on the Option key so it doesn't fire when you're typing special characters (Option+E for accents, etc).
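
One way such a debounce could work is sketched below: recording only starts once Option has been held alone for 200ms, and any other key press in that window cancels it. This is an assumed interpretation, not the actual implementation; `OptionDebouncer` and its method names are hypothetical, and timestamps are passed in explicitly to keep the logic easy to test.

```python
DEBOUNCE_SECONDS = 0.2  # the 200ms window described above


class OptionDebouncer:
    """Tracks whether Option has been held alone long enough to start recording."""

    def __init__(self) -> None:
        self._pressed_at: float | None = None

    def option_down(self, now: float) -> None:
        """Option was pressed at time `now` (in seconds)."""
        self._pressed_at = now

    def other_key_down(self) -> None:
        """Another key was pressed, so Option is acting as a modifier: cancel."""
        self._pressed_at = None

    def option_up(self) -> None:
        """Option was released."""
        self._pressed_at = None

    def should_record(self, now: float) -> bool:
        """True once Option has been held alone for the full debounce window."""
        if self._pressed_at is None:
            return False
        return now - self._pressed_at >= DEBOUNCE_SECONDS
```

With this approach, a quick Option+E for an accent never reaches the 200ms threshold as a lone Option hold, so no recording starts.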


Security and privacy

Your API key is stored in the macOS Keychain through the keyring library, not in a config file. Audio goes directly to your chosen provider's API over HTTPS with no middleman. There's no telemetry, no analytics, nothing phoning home. The clipboard gets saved before pasting transcribed text and restored right after. The whole thing is ~900 lines of Python you can read in 20 minutes.


Troubleshooting

"Failed to create event tap" - Your terminal doesn't have Accessibility permission. Add it in System Settings → Privacy & Security → Accessibility, then restart voicekey.

No audio captured - Check that your terminal has Microphone permission in System Settings → Privacy & Security → Microphone.

Text not appearing at cursor - A few apps with custom input handling don't respond to simulated Cmd+V. Most standard apps work.

PortAudio errors on install - Run brew install portaudio first, then retry the install.


Contributing

Contributions are welcome. Open an issue first if you want to discuss a change.

git clone https://github.com/adamkhakhar/voicekey.git
cd voicekey
uv sync
uv run pytest tests/ -v

License

MIT
