Hands-free voice input for Claude Code, Codex CLI, and any terminal — cross-platform

micoracle

Stop typing your AI prompts. Just say them.

Hands-free voice input for Claude Code, Codex CLI, and any terminal — macOS · Linux · Windows

Say "Micoracle, refactor this function" → transcribed → pasted into your terminal → Enter pressed. No push-to-talk. No cloud required.

Python 3.9+ · License: MIT · macOS · Linux · Windows


Demo

https://github.com/user-attachments/assets/8ab4fc80-8557-4b4e-9149-d6dfad434f70


Quick Install

git clone https://github.com/thepradip/micoracle.git
cd micoracle
pip install -r requirements.txt

Then pick your platform and run:

./run_hands_free.sh        # macOS / Linux
run_hands_free.bat         # Windows

Need a specific STT backend? Jump to the full install guide below.


Why micoracle?

| Without micoracle | With micoracle |
|---|---|
| Stop → think → type prompt → Enter | Say the prompt. Done. |
| Push-to-talk or browser extension | Always-on wake-word listener |
| Cloud-only transcription | 100% offline on Apple Silicon & CPU |
| Locked to one tool | Works with any terminal app |

Works with Claude Code · OpenAI Codex CLI · OpenCode · iTerm2 · Warp · VS Code terminal · Windows Terminal


Features

| Feature | Detail |
|---|---|
| 🌐 Cross-platform | Auto-selects macOS (AppleScript), Linux (xdotool / wtype), or Windows (pywin32 + pyautogui) |
| 🎙️ 11 STT backends | MLX Whisper · faster-whisper · OpenAI · Azure · OpenAI Realtime · 60dB · ElevenLabs · Deepgram · AssemblyAI · Groq · Gladia |
| 🔊 4 TTS backends | macOS say · pyttsx3 · OpenAI TTS · Azure Speech TTS |
| 🔉 Continuous listening | WebRTC VAD + 300 ms preroll buffer, so wake-word onsets are not clipped |
| 💬 Wake-word gate | "Claude, …" / "Codex, …" / "Micoracle, …" with fuzzy mishear tolerance |
| ⏱️ Two-step follow-up | Say wake word alone → hear "listening" → speak prompt within 8 s |
| 💰 Cost-guard | Cloud STT backends activate only after the wake word; continuous listening is always local & free |
| 🚫 Hallucination filter | Whisper artifacts like "Thank you." / "Amen." are silently dropped |
| 🔒 Target-aware dispatch | macOS / Windows reactivate the startup target; Linux dispatches to the focused window |
| 📋 Clipboard-conscious | Original clipboard contents restored immediately after each dispatch |
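
As a rough illustration of the hallucination filter row, here is a minimal Python sketch. The phrase set contains only the two examples named above; the function name and the normalisation rule are assumptions, not the project's actual filter.

```python
import string

# Only the two artifacts named in this README; a real list is likely longer.
HALLUCINATIONS = {"thank you", "amen"}

def is_hallucination(text):
    """Return True when the whole transcript is a known Whisper artifact."""
    # Lower-case and trim surrounding whitespace and punctuation before comparing.
    normalized = text.strip().lower().strip(string.punctuation + " ")
    return normalized in HALLUCINATIONS
```

Matching on the whole utterance, not on a substring, keeps a genuine prompt such as "Thank you for the review" from being dropped.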

STT Backends

Local (free, offline)

| Backend | --stt-backend | Best for | Install |
|---|---|---|---|
| MLX Whisper | mlx | Apple Silicon (fastest on-device) | pip install mlx-whisper |
| faster-whisper | faster | Cross-platform CPU / CUDA | pip install faster-whisper |

Cloud (post-wake-word only — never billed for continuous listening)

| Backend | --command-stt-backend | Latency | Extra install | Key env var |
|---|---|---|---|---|
| OpenAI Whisper | openai | ~1 s | pip install openai | OPENAI_API_KEY |
| Azure Whisper | azure | ~1 s | pip install openai | AZURE_OPENAI_KEY |
| OpenAI Realtime | realtime | ~600 ms | pip install openai websockets | OPENAI_API_KEY |
| 60dB.ai | 60db | ~600 ms | (none, stdlib only) | SIXTYDB_API_KEY |
| ElevenLabs Scribe | elevenlabs | ~400 ms | (none, stdlib only) | ELEVENLABS_API_KEY |
| Deepgram Nova-2 | deepgram | ~250 ms | (none, stdlib only) | DEEPGRAM_API_KEY |
| Groq Whisper | groq | ~200 ms | (none, stdlib only) | GROQ_API_KEY |
| AssemblyAI | assemblyai | ~3–5 s | (none, stdlib only) | ASSEMBLYAI_API_KEY |
| Gladia | gladia | ~3–5 s | (none, stdlib only) | GLADIA_API_KEY |

Cost-guard rule: only mlx, faster, and auto are allowed for continuous listening. Any cloud backend set as --stt-backend is automatically demoted to --command-stt-backend and a local backend handles the mic stream instead.
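
The demotion rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's code: the function name is hypothetical, the fallback to auto for the mic stream is a guess, and so is the choice to let an explicitly set command backend win over a demoted one.

```python
LOCAL_BACKENDS = {"auto", "mlx", "faster"}

def resolve_backends(stt_backend="auto", command_stt_backend=None):
    """Return (listening_backend, command_backend) after applying the cost-guard."""
    if stt_backend not in LOCAL_BACKENDS:
        # A cloud backend was requested for listening: demote it to command-only,
        # unless a command backend was already chosen explicitly (assumption).
        if command_stt_backend is None:
            command_stt_backend = stt_backend
        stt_backend = "auto"  # assumed local fallback for the mic stream
    if command_stt_backend is None:
        # Per the CLI reference, the command backend defaults to the listening one.
        command_stt_backend = stt_backend
    return stt_backend, command_stt_backend
```

For example, `resolve_backends("groq")` yields `("auto", "groq")`: the mic stream stays local and Groq is used only for the command.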


Wake Words

| Say | Example |
|---|---|
| Claude, … | "Claude, explain this function" |
| Codex, … | "Codex, refactor to async" |
| Micoracle, … | "Micoracle, write a SQL query" |

All three support fuzzy mishear tolerance: common STT mishears such as "Mic Oracle", "Mick Oracle", "meek oracle", and "Lord" (for Claude) are caught automatically.

Two-step mode: say the wake word alone → hear "listening" → speak your command within 8 s.
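
One way to picture the fuzzy gate is the sketch below, which uses Python's standard difflib. It is not the project's matcher: the function name, the 0.8 similarity cutoff, and the two-word lookahead are assumptions, and the mishear table holds only the examples listed above.

```python
import difflib
import re

WAKE_WORDS = ("claude", "codex", "micoracle")
# Mishears named in this README, mapped to the wake word they stand for.
KNOWN_MISHEARS = {
    "mic oracle": "micoracle",
    "mick oracle": "micoracle",
    "meek oracle": "micoracle",
    "lord": "claude",
}

def split_wake_word(transcript, cutoff=0.8):
    """Return (wake_word, prompt), or None if no wake word leads the transcript."""
    words = list(re.finditer(r"[A-Za-z0-9']+", transcript))
    # Try a two-word head first so "Mic Oracle" is seen as one unit.
    for n in (2, 1):
        if len(words) < n:
            continue
        head = " ".join(m.group().lower() for m in words[:n])
        wake = KNOWN_MISHEARS.get(head)
        if wake is None:
            close = difflib.get_close_matches(head, WAKE_WORDS, n=1, cutoff=cutoff)
            wake = close[0] if close else None
        if wake:
            # Keep the original casing of the prompt; drop separators after the wake word.
            prompt = transcript[words[n - 1].end():].lstrip(" ,.!?:;-")
            return wake, prompt
    return None
```

An empty prompt, as in `split_wake_word("Micoracle.")`, is the signal for two-step mode: the caller would play "listening" and wait up to 8 s for the follow-up utterance.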


Platform & Backend Matrix

| Platform | STT (listening) | STT (command) | TTS | Focus & paste |
|---|---|---|---|---|
| macOS Apple Silicon | mlx | your choice | say | AppleScript |
| macOS Intel | faster | your choice | say | AppleScript |
| Linux X11 | faster | your choice | pyttsx3 | xdotool type |
| Linux Wayland | faster | your choice | pyttsx3 | wtype + wl-copy |
| Windows 10/11 | faster | your choice | pyttsx3 | pywin32 + pyautogui |
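
The matrix can be read as a lookup keyed on the host platform. The sketch below shows that idea only; the function name and return shape are hypothetical, and the Wayland check via XDG_SESSION_TYPE / WAYLAND_DISPLAY is a common convention, not something this README states.

```python
import os
import platform

def pick_defaults(system=None, machine=None, env=None):
    """Map a platform to its row in the matrix above (illustrative only)."""
    system = system or platform.system()
    machine = machine or platform.machine()
    env = os.environ if env is None else env
    if system == "Darwin":
        # Apple Silicon reports "arm64"; Intel Macs report "x86_64".
        stt = "mlx" if machine == "arm64" else "faster"
        return {"stt": stt, "tts": "say", "dispatch": "AppleScript"}
    if system == "Linux":
        wayland = env.get("XDG_SESSION_TYPE") == "wayland" or bool(env.get("WAYLAND_DISPLAY"))
        dispatch = "wtype + wl-copy" if wayland else "xdotool type"
        return {"stt": "faster", "tts": "pyttsx3", "dispatch": dispatch}
    if system == "Windows":
        return {"stt": "faster", "tts": "pyttsx3", "dispatch": "pywin32 + pyautogui"}
    raise RuntimeError(f"unsupported platform: {system}")
```

Passing `system`, `machine`, and `env` explicitly makes the mapping easy to check on any machine, for example `pick_defaults("Darwin", "arm64")`.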

Install

Step 1 — Core dependencies (all platforms)

git clone https://github.com/thepradip/micoracle.git
cd micoracle
pip install -r requirements.txt

Step 2 — System packages

macOS:

brew install portaudio

Linux (X11):

sudo apt install xdotool portaudio19-dev python3-dev

Linux (Wayland):

sudo apt install wtype wl-clipboard portaudio19-dev python3-dev

Step 3 — Pick a local STT backend (for continuous listening)

| Platform | Command |
|---|---|
| macOS Apple Silicon | pip install mlx-whisper |
| macOS Intel / Linux / Windows | pip install faster-whisper |

Step 4 — Pick a cloud STT backend (for commands, optional)

| Backend | Command | Notes |
|---|---|---|
| OpenAI Whisper | pip install openai | Set OPENAI_API_KEY |
| Azure Whisper | pip install openai | Set AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_KEY |
| OpenAI Realtime | pip install openai websockets | Set OPENAI_API_KEY |
| 60dB.ai | (none) | Set SIXTYDB_API_KEY |
| ElevenLabs Scribe | (none) | Set ELEVENLABS_API_KEY |
| Deepgram Nova-2 | (none) | Set DEEPGRAM_API_KEY |
| Groq Whisper | (none) | Set GROQ_API_KEY |
| AssemblyAI | (none) | Set ASSEMBLYAI_API_KEY |
| Gladia | (none) | Set GLADIA_API_KEY |

Step 5 — Pick a TTS backend (optional, for status cues)

| Backend | Best for | Install |
|---|---|---|
| say | macOS (built-in) | nothing |
| pyttsx3 | Linux / Windows offline | pip install pyttsx3 (on Linux, also sudo apt install espeak) |
| openai | Cloud (OpenAI TTS) | pip install openai |
| azure | Cloud (Azure Speech) | Set AZURE_SPEECH_KEY and AZURE_SPEECH_REGION |

Step 6 — Windows dispatch packages

pip install pyperclip pyautogui pywin32 psutil

Step 7 — Configure

cp .env.example .env

Recommended .env for Apple Silicon + 60dB commands:

VOICE_AGENT_STT_BACKEND=mlx
VOICE_AGENT_COMMAND_STT_BACKEND=60db
SIXTYDB_API_KEY=sk_live_...

Recommended .env for Apple Silicon + Groq commands (fastest):

VOICE_AGENT_STT_BACKEND=mlx
VOICE_AGENT_COMMAND_STT_BACKEND=groq
GROQ_API_KEY=gsk_...

Quickstart

# Focus Claude Code, Codex CLI, or any terminal — then launch:
./run_hands_free.sh          # macOS / Linux
run_hands_free.bat           # Windows

One-shot: "Micoracle, write a Python hello world." → pasted with Enter.

Two-step: "Micoracle." → hear "listening" → say prompt within 8 s → pasted.

Override backends at launch:

./run_hands_free.sh --stt-backend mlx --command-stt-backend groq

Pin to a specific app (required on Wayland):

./run_hands_free.sh --target-app gnome-terminal

CLI Reference

| Flag | Default | Description |
|---|---|---|
| --device <id\|name> | system default mic | Audio input device |
| --list-devices | | Print available input devices and exit |
| --target-app <name> | frontmost app at startup | Lock the dispatch target |
| --stt-backend | auto | Local STT for continuous listening: auto / mlx / faster |
| --command-stt-backend | same as --stt-backend | Cloud STT for commands after wake-word: openai / azure / realtime / 60db / elevenlabs / deepgram / groq / assemblyai / gladia |
| --tts-backend | auto | auto / say / pyttsx3 / openai / azure / none |
| --no-speak | | Alias for --tts-backend none |
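
To make the defaults and the alias concrete, here is a hedged argparse sketch that mirrors the flags in the table. It is a stand-in, not the project's parser: the program name, the `parse` helper, and the way defaults are resolved after parsing are all assumptions.

```python
import argparse

TTS_CHOICES = ["auto", "say", "pyttsx3", "openai", "azure", "none"]

def build_parser():
    """Declare the documented flags; help strings paraphrase the table above."""
    p = argparse.ArgumentParser(prog="micoracle-sketch")
    p.add_argument("--device", help="audio input device id or name")
    p.add_argument("--list-devices", action="store_true", help="print input devices and exit")
    p.add_argument("--target-app", help="lock the dispatch target")
    p.add_argument("--stt-backend", default="auto", help="STT for continuous listening")
    p.add_argument("--command-stt-backend", default=None, help="STT for commands after wake-word")
    p.add_argument("--tts-backend", default="auto", choices=TTS_CHOICES)
    p.add_argument("--no-speak", action="store_true", help="alias for --tts-backend none")
    return p

def parse(argv):
    """Parse argv and resolve the two documented shortcuts."""
    args = build_parser().parse_args(argv)
    if args.no_speak:
        args.tts_backend = "none"  # --no-speak is an alias for --tts-backend none
    if args.command_stt_backend is None:
        args.command_stt_backend = args.stt_backend  # "same as --stt-backend"
    return args
```

For instance, `parse(["--stt-backend", "mlx", "--command-stt-backend", "groq"])` keeps the two backends separate, while `parse([])` leaves both at auto.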

Environment Variables

See .env.example for the full commented list.

Core

| Variable | Purpose |
|---|---|
| VOICE_AGENT_STT_BACKEND | Local STT for listening (auto / mlx / faster) |
| VOICE_AGENT_COMMAND_STT_BACKEND | Cloud STT for commands (60db / groq / deepgram / elevenlabs / assemblyai / gladia / openai / azure / realtime) |
| VOICE_AGENT_TTS_BACKEND | TTS for status cues (auto / say / pyttsx3 / openai / azure / none) |
| VOICE_AGENT_TARGET_APP | Default dispatch target app name |
| VOICE_AGENT_INPUT_DEVICE | Default microphone device (name fragment or numeric id) |

Local STT knobs

| Variable | Purpose |
|---|---|
| VOICE_AGENT_MLX_REPO | MLX Whisper HuggingFace repo (Apple Silicon) |
| VOICE_AGENT_FASTER_MODEL | faster-whisper model (tiny.en / base.en / small.en / medium.en / large-v3) |
| VOICE_AGENT_FASTER_DEVICE | faster-whisper device (auto / cpu / cuda) |
| VOICE_AGENT_FASTER_COMPUTE | faster-whisper compute type (int8 / float16 / int8_float16) |

Cloud STT keys & options

| Variable | Backend | Purpose |
|---|---|---|
| OPENAI_API_KEY | openai / realtime | OpenAI API key |
| VOICE_AGENT_OPENAI_STT_MODEL | openai | Model name (default: whisper-1) |
| VOICE_AGENT_REALTIME_MODEL | realtime | Realtime model (default: gpt-4o-transcribe) |
| AZURE_OPENAI_ENDPOINT | azure | Azure OpenAI endpoint URL |
| AZURE_OPENAI_KEY | azure | Azure OpenAI key |
| AZURE_WHISPER_DEPLOYMENT | azure | Deployment name (default: whisper) |
| SIXTYDB_API_KEY | 60db | 60dB.ai API key |
| VOICE_AGENT_SIXTYDB_LANGUAGE | 60db | Language code (default: en) |
| ELEVENLABS_API_KEY | elevenlabs | ElevenLabs API key |
| VOICE_AGENT_ELEVENLABS_MODEL | elevenlabs | Model (default: scribe_v2) |
| VOICE_AGENT_ELEVENLABS_LANGUAGE | elevenlabs | Language code (default: en) |
| DEEPGRAM_API_KEY | deepgram | Deepgram API key |
| VOICE_AGENT_DEEPGRAM_MODEL | deepgram | Model (default: nova-2) |
| VOICE_AGENT_DEEPGRAM_LANGUAGE | deepgram | Language code (default: en) |
| ASSEMBLYAI_API_KEY | assemblyai | AssemblyAI API key |
| VOICE_AGENT_ASSEMBLYAI_LANGUAGE | assemblyai | Language code (default: en) |
| GROQ_API_KEY | groq | Groq API key |
| VOICE_AGENT_GROQ_MODEL | groq | Model (default: whisper-large-v3-turbo) |
| VOICE_AGENT_GROQ_LANGUAGE | groq | Language code (default: en) |
| GLADIA_API_KEY | gladia | Gladia API key |

TTS keys & options

| Variable | Purpose |
|---|---|
| VOICE_AGENT_TTS_VOICE | macOS say voice name (e.g. Samantha) |
| VOICE_AGENT_OPENAI_TTS_VOICE | OpenAI TTS voice (alloy / echo / fable / onyx / nova / shimmer) |
| AZURE_SPEECH_KEY | Azure Speech TTS key |
| AZURE_SPEECH_REGION | Azure Speech TTS region (e.g. eastus) |
| VOICE_AGENT_AZURE_TTS_VOICE | Azure TTS voice (default: en-US-AriaNeural) |
| HF_HUB_ENABLE_HF_TRANSFER | Set to 1 for faster HuggingFace model downloads |

Architecture

How it works

  1. You speak a command — e.g. "Micoracle, refactor this function"
  2. micoracle listens for real speech — background noise is ignored via WebRTC VAD
  3. Wake word is checked locally — local STT transcribes the utterance; only Claude, Codex, or Micoracle pass the gate
  4. Command STT fires — if a cloud backend is configured, it re-transcribes for higher accuracy (paid API called only here)
  5. Clean prompt is sent — pasted into the target app, Enter pressed
  6. Status cue plays — e.g. "listening", "sent", or "error"
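
Steps 3 to 6 can be sketched as one function with the STT, dispatch, and TTS pieces passed in as callables. Everything here is hypothetical scaffolding: the function names, the exact-match gate (the real gate is fuzzy), and the fallback to the local transcript are assumptions made to keep the example short.

```python
WAKE_WORDS = ("claude", "codex", "micoracle")

def strip_wake_word(text):
    """Return the prompt after a leading wake word, or None if the gate stays closed."""
    head, _, rest = text.strip().partition(",")
    if head.strip(" .!?").lower() in WAKE_WORDS:
        return rest.strip()
    return None

def handle_utterance(audio, local_stt, dispatch, command_stt=None, speak=lambda cue: None):
    prompt = strip_wake_word(local_stt(audio))  # step 3: gate on the local transcript
    if prompt is None:
        return None  # no wake word: the cloud backend is never called
    if not prompt:
        speak("listening")  # wake word alone: wait for the follow-up prompt
        return ""
    if command_stt is not None:
        # Step 4: the paid API is called only here; fall back to the local text
        # if the cloud transcript lost the wake word (assumption).
        prompt = strip_wake_word(command_stt(audio)) or prompt
    dispatch(prompt)  # step 5: paste into the target app and press Enter
    speak("sent")     # step 6: status cue
    return prompt
```

Because the gate runs before `command_stt`, an utterance without a wake word costs nothing, which is the property the cost-guard relies on.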

Module overview

| Module | Responsibility |
|---|---|
| hands_free_voice.py | Main entry point: mic capture, VAD wiring, wake-word gate, dual-backend dispatch loop |
| segmenter.py | VADSegmenter: frame-by-frame VAD state machine, preroll ring buffer |
| stt.py | STTBackend ABC + 10 implementations + shared HTTP helpers + OS-aware auto factory |
| tts.py | TTSBackend ABC + 4 implementations + auto factory |
| platform_adapter.py | MacAdapter / LinuxAdapter / WindowsAdapter + factory |

VAD state machine

IDLE ──(speech frames ≥ 4)──▶ CAPTURING ──(silence ≥ 840 ms OR 18 s cap)──▶ EMIT utterance ──▶ IDLE
 ▲                                 │
 └──(speech_run decays on silence)─┘
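
The diagram can be turned into a small frame-driven class. The sketch below is not segmenter.py: the class name is hypothetical, the 30 ms frame size is an assumption (WebRTC VAD accepts 10, 20, or 30 ms frames), and the speech/silence decision is passed in as a boolean so the example has no audio dependencies. The thresholds (4 speech frames, 840 ms silence, 18 s cap, 300 ms preroll) come from this README.

```python
from collections import deque

FRAME_MS = 30  # assumed frame size

class VADSegmenterSketch:
    def __init__(self, start_frames=4, silence_ms=840, max_ms=18_000, preroll_ms=300):
        self.start_frames = start_frames
        self.silence_frames = silence_ms // FRAME_MS   # 28 frames
        self.max_frames = max_ms // FRAME_MS           # 600 frames
        self.preroll = deque(maxlen=preroll_ms // FRAME_MS)  # ring buffer, 10 frames
        self.capturing = False
        self.speech_run = 0
        self.silence_run = 0
        self.frames = []

    def push(self, frame, is_speech):
        """Feed one frame; return the utterance (list of frames) when it ends, else None."""
        if not self.capturing:
            self.preroll.append(frame)
            # speech_run grows on speech and decays on silence.
            self.speech_run = self.speech_run + 1 if is_speech else max(0, self.speech_run - 1)
            if self.speech_run >= self.start_frames:
                self.capturing = True
                self.frames = list(self.preroll)  # keep audio from before the onset
                self.preroll.clear()
                self.silence_run = 0
            return None
        self.frames.append(frame)
        self.silence_run = 0 if is_speech else self.silence_run + 1
        if self.silence_run >= self.silence_frames or len(self.frames) >= self.max_frames:
            utterance, self.frames = self.frames, []
            self.capturing = False
            self.speech_run = 0
            return utterance
        return None
```

Seeding the utterance from the preroll buffer is what keeps the first syllable of the wake word: the frames that arrived before the 4-frame threshold was met are already part of the capture.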

Troubleshooting

No input devices shown. Grant microphone permission to your terminal. macOS: Privacy & Security → Microphone. Linux: check PulseAudio / PipeWire. Windows: Settings → Privacy → Microphone.

Wake word never fires. Confirm the right mic with --list-devices. Say the wake word slowly — fuzzy matching covers common mishears, but very low mic gain can strip initial consonants.

Cloud backend not activating. Check that the API key env var is set in .env. Run with --command-stt-backend <name> to test explicitly.
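
A quick way to check for a missing key is a lookup from backend name to the variables it needs. The sketch below builds that mapping from the tables in this README; the helper name is hypothetical, and treating AZURE_OPENAI_ENDPOINT as required for azure is an assumption.

```python
import os

# Backend name -> environment variables it needs, per the tables in this README.
REQUIRED_KEYS = {
    "openai": ["OPENAI_API_KEY"],
    "realtime": ["OPENAI_API_KEY"],
    "azure": ["AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_KEY"],  # endpoint assumed required
    "60db": ["SIXTYDB_API_KEY"],
    "elevenlabs": ["ELEVENLABS_API_KEY"],
    "deepgram": ["DEEPGRAM_API_KEY"],
    "groq": ["GROQ_API_KEY"],
    "assemblyai": ["ASSEMBLYAI_API_KEY"],
    "gladia": ["GLADIA_API_KEY"],
}

def missing_keys(backend, env=None):
    """Return the variables that are unset or empty for the given backend."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS.get(backend, []) if not env.get(name)]
```

Note that this reads the process environment, so values that exist only in .env will show as missing unless they have been loaded first.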

[dispatch error] on Wayland. Wayland blocks programmatic window focus. Pass --target-app <name> and keep that window focused manually.

Windows: keystrokes go to the wrong window. Focus-stealing prevention can block SetForegroundWindow. Give the target window focus manually before speaking, or use AutoHotkey.

macOS: keystrokes ignored. Accessibility + Automation permissions missing. System Settings → Privacy & Security → Accessibility / Automation.


Privacy & Security

  • Local backends are fully on-device — MLX Whisper and faster-whisper make zero network calls
  • Cloud backends upload audio only after wake-word — continuous listening never touches cloud APIs
  • Clipboard temporarily overwritten per dispatch — original contents restored immediately
  • No telemetry. No analytics. No phone-home.
  • Accessibility permissions are powerful — review the source before granting
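
The clipboard behaviour described above follows a save, overwrite, restore pattern. The sketch below shows that pattern with the clipboard and paste operations injected as callables; the function and parameter names are hypothetical and no real clipboard library is involved.

```python
def paste_with_restore(text, get_clipboard, set_clipboard, send_paste):
    """Put text on the clipboard, trigger a paste, then restore the old contents."""
    saved = get_clipboard()
    try:
        set_clipboard(text)
        send_paste()
    finally:
        # Runs even if the paste fails, so the user's clipboard is not lost.
        set_clipboard(saved)
```

Putting the restore in `finally` means a failed dispatch still leaves the original clipboard contents in place.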

Future Scope

  • Stronger Linux target locking: closer to macOS / Windows target reactivation behaviour
  • Packaged installers: smoother setup with platform-specific dependency checks
  • Tray / menu bar control: pause, resume, backend selection, target status
  • Custom wake words: user-defined beyond the built-in three
  • Command history: optional local log of recent accepted prompts
  • Google Gemini STT: cloud transcription backend


License

MIT © 2026 Pradip Tivhale

