Hands-free voice input for Claude Code, Codex CLI, and any terminal — cross-platform
Stop typing your AI prompts. Just say them.
Hands-free voice input for Claude Code, Codex CLI, and any terminal — macOS · Linux · Windows
Say "Micoracle, refactor this function" → transcribed → pasted into your terminal → Enter pressed. No push-to-talk. No cloud required.
Demo
https://github.com/user-attachments/assets/8ab4fc80-8557-4b4e-9149-d6dfad434f70
Quick Install
git clone https://github.com/thepradip/micoracle.git
cd micoracle
pip install -r requirements.txt
Then pick your platform and run:
./run_hands_free.sh # macOS / Linux
run_hands_free.bat # Windows
Need a specific STT backend? Jump to the full install guide below.
Why micoracle?
| Without micoracle | With micoracle |
|---|---|
| Stop → think → type prompt → Enter | Say the prompt. Done. |
| Push-to-talk or browser extension | Always-on wake-word listener |
| Cloud-only transcription | 100% offline on Apple Silicon & CPU |
| Locked to one tool | Works with any terminal app |
Works with Claude Code · OpenAI Codex CLI · OpenCode · iTerm2 · Warp · VS Code terminal · Windows Terminal
Features
| | Feature | Detail |
|---|---|---|
| 🌐 | Cross-platform | Auto-selects macOS (AppleScript), Linux (xdotool / wtype), or Windows (pywin32 + pyautogui) |
| 🎙️ | 11 STT backends | MLX Whisper · faster-whisper · OpenAI · Azure · OpenAI Realtime · 60dB · ElevenLabs · Deepgram · AssemblyAI · Groq · Gladia |
| 🔊 | 4 TTS backends | macOS say · pyttsx3 · OpenAI TTS · Azure Speech TTS |
| 🔉 | Continuous listening | WebRTC VAD + 300 ms preroll buffer — wake words are never clipped at onset |
| 💬 | Wake-word gate | "Claude, …" / "Codex, …" / "Micoracle, …" with fuzzy mishear tolerance |
| ⏱️ | Two-step follow-up | Say wake word alone → hear "listening" → speak prompt within 8 s |
| 💰 | Cost-guard | Cloud STT backends activate only after wake-word — continuous listening is always local & free |
| 🚫 | Hallucination filter | Whisper artifacts like "Thank you." / "Amen." silently dropped |
| 🔒 | Target-aware dispatch | macOS / Windows reactivate the startup target; Linux dispatches to focused window |
| 📋 | Clipboard-conscious | Original clipboard contents restored immediately after each dispatch |
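The hallucination filter can be pictured as a normalise-and-compare step. This is a minimal sketch, not micoracle's actual code: `KNOWN_ARTIFACTS`, `normalize`, and `is_hallucination` are hypothetical names, and the phrase list contains only the two examples quoted in the table.

```python
import string

# Only the two phrases quoted in the feature table; the real list used by
# micoracle may be longer or matched differently (assumption).
KNOWN_ARTIFACTS = {"thank you", "amen"}


def normalize(text: str) -> str:
    """Lowercase the text and strip surrounding whitespace and punctuation."""
    return text.strip().lower().strip(string.punctuation + " ")


def is_hallucination(transcript: str) -> bool:
    """Return True when the entire transcript is a known Whisper artifact."""
    return normalize(transcript) in KNOWN_ARTIFACTS
```

Matching the whole utterance rather than a substring keeps a real prompt such as "Claude, thank you for the fix" from being dropped.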
STT Backends
Local (free, offline)
| Backend | --stt-backend | Best for | Install |
|---|---|---|---|
| MLX Whisper | mlx | Apple Silicon, fastest on-device | pip install mlx-whisper |
| faster-whisper | faster | Cross-platform CPU / CUDA | pip install faster-whisper |
Cloud (post-wake-word only — never billed for continuous listening)
| Backend | --command-stt-backend | Latency | Extra install | Key env var |
|---|---|---|---|---|
| OpenAI Whisper | openai | ~1 s | pip install openai | OPENAI_API_KEY |
| Azure Whisper | azure | ~1 s | pip install openai | AZURE_OPENAI_KEY |
| OpenAI Realtime | realtime | ~600 ms | pip install openai websockets | OPENAI_API_KEY |
| 60dB.ai | 60db | ~600 ms | (none, stdlib only) | SIXTYDB_API_KEY |
| ElevenLabs Scribe | elevenlabs | ~400 ms | (none, stdlib only) | ELEVENLABS_API_KEY |
| Deepgram Nova-2 | deepgram | ~250 ms | (none, stdlib only) | DEEPGRAM_API_KEY |
| Groq Whisper | groq | ~200 ms | (none, stdlib only) | GROQ_API_KEY |
| AssemblyAI | assemblyai | ~3–5 s | (none, stdlib only) | ASSEMBLYAI_API_KEY |
| Gladia | gladia | ~3–5 s | (none, stdlib only) | GLADIA_API_KEY |
Cost-guard rule: only mlx, faster, and auto are allowed for continuous listening. Any cloud backend set as --stt-backend is automatically demoted to --command-stt-backend and a local backend handles the mic stream instead.
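The cost-guard rule amounts to a small resolution step at startup. The sketch below is an illustration under assumptions: `resolve_backends` is a hypothetical name, falling back to `auto` for listening is assumed, and so is the choice to let an explicitly passed command backend win over the demoted one.

```python
from typing import Optional, Tuple

# Backends the rule above allows for continuous listening.
LOCAL_BACKENDS = {"auto", "mlx", "faster"}


def resolve_backends(stt: str, command_stt: Optional[str] = None) -> Tuple[str, str]:
    """Return (listening_backend, command_backend) after applying the rule."""
    if stt in LOCAL_BACKENDS:
        # Documented default: the command backend mirrors the listening one.
        return stt, command_stt or stt
    # A cloud backend was requested for listening: demote it to command use
    # and let a local backend handle the mic stream ("auto" is an assumption).
    return "auto", command_stt or stt
```

For example, `resolve_backends("deepgram")` gives `("auto", "deepgram")`, so the paid API is only reachable after the wake word.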
Wake Words
| Say | Example |
|---|---|
| Claude, … | "Claude, explain this function" |
| Codex, … | "Codex, refactor to async" |
| Micoracle, … | "Micoracle, write a SQL query" |
All three support fuzzy mishear tolerance — common STT splits like "Mic Oracle", "Mick Oracle", "meek oracle", "Lord" (for Claude) are all caught automatically.
Two-step mode: say the wake word alone → hear "listening" → speak your command within 8 s.
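One way to get this kind of mishear tolerance is an alias table plus a similarity ratio from the standard library. This is a sketch only: micoracle's real matcher is not shown in this README, and `ALIASES`, `THRESHOLD`, and `split_wake_word` are hypothetical. The alias entries are the mishears quoted above.

```python
import re
from difflib import SequenceMatcher
from typing import Optional, Tuple

WAKE_WORDS = ("claude", "codex", "micoracle")
# Mishears quoted in this section, mapped to their canonical wake word.
ALIASES = {
    "mic oracle": "micoracle",
    "mick oracle": "micoracle",
    "meek oracle": "micoracle",
    "lord": "claude",
}
THRESHOLD = 0.8  # hypothetical similarity cut-off


def split_wake_word(transcript: str) -> Optional[Tuple[str, str]]:
    """Return (wake_word, prompt), or None if no wake word leads the text."""
    tokens = list(re.finditer(r"[A-Za-z0-9']+", transcript))
    # Try the first word alone, then the first two words joined, so that
    # split mishears such as "mic oracle" are still recognised.
    for n in (1, 2):
        if len(tokens) < n:
            continue
        head = " ".join(t.group().lower() for t in tokens[:n])
        rest = transcript[tokens[n - 1].end():].lstrip(" ,.!?:;-")
        canonical = ALIASES.get(head)
        if canonical is None:
            squashed = head.replace(" ", "")
            for wake in WAKE_WORDS:
                if SequenceMatcher(None, squashed, wake).ratio() >= THRESHOLD:
                    canonical = wake
                    break
        if canonical is not None:
            return canonical, rest
    return None
```

An empty prompt in the result (for example from "Micoracle.") is the case that two-step mode handles.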
Platform & Backend Matrix
| Platform | STT (listening) | STT (command) | TTS | Focus & paste |
|---|---|---|---|---|
| macOS Apple Silicon | mlx | your choice | say | AppleScript |
| macOS Intel | faster | your choice | say | AppleScript |
| Linux X11 | faster | your choice | pyttsx3 | xdotool type |
| Linux Wayland | faster | your choice | pyttsx3 | wtype + wl-copy |
| Windows 10/11 | faster | your choice | pyttsx3 | pywin32 + pyautogui |
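The matrix can be read as a lookup keyed on OS, CPU architecture, and display server. The following sketch encodes it that way; `pick_defaults` and `is_wayland` are hypothetical helpers, and the Wayland check via `XDG_SESSION_TYPE` / `WAYLAND_DISPLAY` is an assumption about how detection might be done.

```python
import os
import platform
from typing import Dict


def is_wayland() -> bool:
    """Guess whether the current Linux session is Wayland (assumed heuristic)."""
    return (os.environ.get("XDG_SESSION_TYPE") == "wayland"
            or bool(os.environ.get("WAYLAND_DISPLAY")))


def pick_defaults(system: str, machine: str, wayland: bool = False) -> Dict[str, str]:
    """Map a platform to the listening STT, TTS, and dispatch columns above."""
    if system == "Darwin":
        stt = "mlx" if machine == "arm64" else "faster"
        return {"stt": stt, "tts": "say", "dispatch": "AppleScript"}
    if system == "Linux":
        dispatch = "wtype + wl-copy" if wayland else "xdotool type"
        return {"stt": "faster", "tts": "pyttsx3", "dispatch": dispatch}
    if system == "Windows":
        return {"stt": "faster", "tts": "pyttsx3", "dispatch": "pywin32 + pyautogui"}
    raise ValueError(f"unsupported platform: {system}")


if __name__ == "__main__":
    # Show what the table would select on the current machine.
    print(pick_defaults(platform.system(), platform.machine(), is_wayland()))
```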
Install
Step 1 — Core dependencies (all platforms)
git clone https://github.com/thepradip/micoracle.git
cd micoracle
pip install -r requirements.txt
Step 2 — System packages
macOS:
brew install portaudio
Linux (X11):
sudo apt install xdotool portaudio19-dev python3-dev
Linux (Wayland):
sudo apt install wtype wl-clipboard portaudio19-dev python3-dev
Step 3 — Pick a local STT backend (for continuous listening)
| Platform | Command |
|---|---|
| macOS Apple Silicon | pip install mlx-whisper |
| macOS Intel / Linux / Windows | pip install faster-whisper |
Step 4 — Pick a cloud STT backend (for commands, optional)
| Backend | Command | Notes |
|---|---|---|
| OpenAI Whisper | pip install openai | Set OPENAI_API_KEY |
| Azure Whisper | pip install openai | Set AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_KEY |
| OpenAI Realtime | pip install openai websockets | Set OPENAI_API_KEY |
| 60dB.ai | (none) | Set SIXTYDB_API_KEY |
| ElevenLabs Scribe | (none) | Set ELEVENLABS_API_KEY |
| Deepgram Nova-2 | (none) | Set DEEPGRAM_API_KEY |
| Groq Whisper | (none) | Set GROQ_API_KEY |
| AssemblyAI | (none) | Set ASSEMBLYAI_API_KEY |
| Gladia | (none) | Set GLADIA_API_KEY |
Step 5 — Pick a TTS backend (optional, for status cues)
| Backend | Best for | Install |
|---|---|---|
| say | macOS (built-in) | nothing |
| pyttsx3 | Linux / Windows offline | pip install pyttsx3 (on Linux, also sudo apt install espeak) |
| openai | Cloud (OpenAI TTS) | pip install openai |
| azure | Cloud (Azure Speech) | Set AZURE_SPEECH_KEY and AZURE_SPEECH_REGION |
Step 6 — Windows dispatch packages
pip install pyperclip pyautogui pywin32 psutil
Step 7 — Configure
cp .env.example .env
Recommended .env for Apple Silicon + 60dB commands:
VOICE_AGENT_STT_BACKEND=mlx
VOICE_AGENT_COMMAND_STT_BACKEND=60db
SIXTYDB_API_KEY=sk_live_...
Recommended .env for Apple Silicon + Groq commands (fastest):
VOICE_AGENT_STT_BACKEND=mlx
VOICE_AGENT_COMMAND_STT_BACKEND=groq
GROQ_API_KEY=gsk_...
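Before launching, it can help to check that the key for the chosen command backend is actually present. The sketch below parses `KEY=VALUE` text and looks up the matching key variable from the cloud backend table. `parse_env` and `missing_key` are hypothetical helpers; how micoracle itself loads `.env` is not described here.

```python
from typing import Dict, Optional

# Key env var per cloud backend, copied from the cloud STT table.
KEY_FOR_BACKEND = {
    "openai": "OPENAI_API_KEY",
    "azure": "AZURE_OPENAI_KEY",
    "realtime": "OPENAI_API_KEY",
    "60db": "SIXTYDB_API_KEY",
    "elevenlabs": "ELEVENLABS_API_KEY",
    "deepgram": "DEEPGRAM_API_KEY",
    "groq": "GROQ_API_KEY",
    "assemblyai": "ASSEMBLYAI_API_KEY",
    "gladia": "GLADIA_API_KEY",
}


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def missing_key(env: Dict[str, str]) -> Optional[str]:
    """Return the unset key variable for the command backend, or None."""
    backend = env.get("VOICE_AGENT_COMMAND_STT_BACKEND", "")
    needed = KEY_FOR_BACKEND.get(backend)  # local backends need no key
    if needed and not env.get(needed):
        return needed
    return None
```

A typical use would be `missing_key(parse_env(open(".env").read()))`, printing a hint when the result is not `None`.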
Quickstart
# Focus Claude Code, Codex CLI, or any terminal — then launch:
./run_hands_free.sh # macOS / Linux
run_hands_free.bat # Windows
One-shot: "Micoracle, write a Python hello world." → pasted with Enter.
Two-step: "Micoracle." → hear "listening" → say prompt within 8 s → pasted.
Override backends at launch:
./run_hands_free.sh --stt-backend mlx --command-stt-backend groq
Pin to a specific app (required on Wayland):
./run_hands_free.sh --target-app gnome-terminal
CLI Reference
| Flag | Default | Description |
|---|---|---|
| --device <id\|name> | system default mic | Audio input device |
| --list-devices | n/a | Print available input devices and exit |
| --target-app <name> | frontmost app at startup | Lock the dispatch target |
| --stt-backend | auto | Local STT for continuous listening: auto / mlx / faster |
| --command-stt-backend | same as --stt-backend | Cloud STT for commands after wake-word: openai / azure / realtime / 60db / elevenlabs / deepgram / groq / assemblyai / gladia |
| --tts-backend | auto | auto / say / pyttsx3 / openai / azure / none |
| --no-speak | n/a | Alias for --tts-backend none |
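For readers who want to see how these flags and defaults fit together, here is an `argparse` sketch that mirrors the table. It is not the parser shipped in `hands_free_voice.py`; `build_parser` and `parse` are hypothetical, and accepting every backend name for both STT flags is an assumption based on the cost-guard rule.

```python
import argparse
from typing import List, Optional

LOCAL = ["auto", "mlx", "faster"]
CLOUD = ["openai", "azure", "realtime", "60db", "elevenlabs",
         "deepgram", "groq", "assemblyai", "gladia"]
TTS = ["auto", "say", "pyttsx3", "openai", "azure", "none"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the flags listed in the CLI reference table."""
    p = argparse.ArgumentParser(description="Hands-free voice input (sketch)")
    p.add_argument("--device", default=None, help="input device id or name")
    p.add_argument("--list-devices", action="store_true")
    p.add_argument("--target-app", default=None)
    p.add_argument("--stt-backend", default="auto", choices=LOCAL + CLOUD)
    p.add_argument("--command-stt-backend", default=None, choices=LOCAL + CLOUD)
    p.add_argument("--tts-backend", default="auto", choices=TTS)
    p.add_argument("--no-speak", action="store_true")
    return p


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv and apply the two documented fallbacks."""
    args = build_parser().parse_args(argv)
    if args.no_speak:
        args.tts_backend = "none"  # --no-speak is an alias
    if args.command_stt_backend is None:
        args.command_stt_backend = args.stt_backend  # "same as --stt-backend"
    return args
```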
Environment Variables
See .env.example for the full commented list.
Core
| Variable | Purpose |
|---|---|
| VOICE_AGENT_STT_BACKEND | Local STT for listening (auto / mlx / faster) |
| VOICE_AGENT_COMMAND_STT_BACKEND | Cloud STT for commands (60db / groq / deepgram / elevenlabs / assemblyai / gladia / openai / azure / realtime) |
| VOICE_AGENT_TTS_BACKEND | TTS for status cues (auto / say / pyttsx3 / openai / azure / none) |
| VOICE_AGENT_TARGET_APP | Default dispatch target app name |
| VOICE_AGENT_INPUT_DEVICE | Default microphone device (name fragment or numeric id) |
Local STT knobs
| Variable | Purpose |
|---|---|
| VOICE_AGENT_MLX_REPO | MLX Whisper HuggingFace repo (Apple Silicon) |
| VOICE_AGENT_FASTER_MODEL | faster-whisper model (tiny.en / base.en / small.en / medium.en / large-v3) |
| VOICE_AGENT_FASTER_DEVICE | faster-whisper device (auto / cpu / cuda) |
| VOICE_AGENT_FASTER_COMPUTE | faster-whisper compute type (int8 / float16 / int8_float16) |
Cloud STT keys & options
| Variable | Backend | Purpose |
|---|---|---|
| OPENAI_API_KEY | openai / realtime | OpenAI API key |
| VOICE_AGENT_OPENAI_STT_MODEL | openai | Model name (default: whisper-1) |
| VOICE_AGENT_REALTIME_MODEL | realtime | Realtime model (default: gpt-4o-transcribe) |
| AZURE_OPENAI_ENDPOINT | azure | Azure OpenAI endpoint URL |
| AZURE_OPENAI_KEY | azure | Azure OpenAI key |
| AZURE_WHISPER_DEPLOYMENT | azure | Deployment name (default: whisper) |
| SIXTYDB_API_KEY | 60db | 60dB.ai API key |
| VOICE_AGENT_SIXTYDB_LANGUAGE | 60db | Language code (default: en) |
| ELEVENLABS_API_KEY | elevenlabs | ElevenLabs API key |
| VOICE_AGENT_ELEVENLABS_MODEL | elevenlabs | Model (default: scribe_v2) |
| VOICE_AGENT_ELEVENLABS_LANGUAGE | elevenlabs | Language code (default: en) |
| DEEPGRAM_API_KEY | deepgram | Deepgram API key |
| VOICE_AGENT_DEEPGRAM_MODEL | deepgram | Model (default: nova-2) |
| VOICE_AGENT_DEEPGRAM_LANGUAGE | deepgram | Language code (default: en) |
| ASSEMBLYAI_API_KEY | assemblyai | AssemblyAI API key |
| VOICE_AGENT_ASSEMBLYAI_LANGUAGE | assemblyai | Language code (default: en) |
| GROQ_API_KEY | groq | Groq API key |
| VOICE_AGENT_GROQ_MODEL | groq | Model (default: whisper-large-v3-turbo) |
| VOICE_AGENT_GROQ_LANGUAGE | groq | Language code (default: en) |
| GLADIA_API_KEY | gladia | Gladia API key |
TTS keys & options
| Variable | Purpose |
|---|---|
| VOICE_AGENT_TTS_VOICE | macOS say voice name (e.g. Samantha) |
| VOICE_AGENT_OPENAI_TTS_VOICE | OpenAI TTS voice (alloy / echo / fable / onyx / nova / shimmer) |
| AZURE_SPEECH_KEY | Azure Speech TTS key |
| AZURE_SPEECH_REGION | Azure Speech TTS region (e.g. eastus) |
| VOICE_AGENT_AZURE_TTS_VOICE | Azure TTS voice (default: en-US-AriaNeural) |
| HF_HUB_ENABLE_HF_TRANSFER | Set to 1 for faster HuggingFace model downloads |
Architecture
How it works
- You speak a command: e.g. "Micoracle, refactor this function"
- micoracle listens for real speech: background noise is ignored via WebRTC VAD
- Wake word is checked locally: local STT transcribes the utterance; only Claude, Codex, or Micoracle pass the gate
- Command STT fires: if a cloud backend is configured, it re-transcribes for higher accuracy (paid API called only here)
- Clean prompt is sent: pasted into the target app, Enter pressed
- Status cue plays: e.g. "listening", "sent", or "error"
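The gate in the steps above, together with the 8 s two-step window, can be modelled as a tiny state holder. This is an illustrative sketch with exact-prefix matching only (no fuzzy tolerance); `WakeGate` and its method names are hypothetical and the real dispatch loop may be organised differently.

```python
from typing import Optional

WAKE_WORDS = ("claude", "codex", "micoracle")
FOLLOW_UP_SECONDS = 8.0  # two-step window from the Wake Words section


class WakeGate:
    """Decide which transcripts become prompts."""

    def __init__(self) -> None:
        # Deadline for a follow-up prompt, or None when not armed.
        self._deadline: Optional[float] = None

    def handle(self, transcript: str, now: float) -> Optional[str]:
        """Return the prompt to dispatch, or None to drop the utterance."""
        text = transcript.strip()
        lowered = text.lower()
        for wake in WAKE_WORDS:
            tail = text[len(wake):]
            if not lowered.startswith(wake) or tail[:1].isalnum():
                continue  # not a wake word, or only a prefix of a longer word
            prompt = tail.lstrip(" ,.!?:;-")
            if prompt:
                self._deadline = None
                return prompt  # one-shot: wake word plus prompt
            self._deadline = now + FOLLOW_UP_SECONDS
            return None  # wake word alone: arm the follow-up window
        if text and self._deadline is not None and now <= self._deadline:
            self._deadline = None
            return text  # second step of two-step mode
        return None
```

Passing `now` in (for example from `time.monotonic()`) keeps the timing logic easy to test without sleeping.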
Module overview
| Module | Responsibility |
|---|---|
| hands_free_voice.py | Main entry point: mic capture, VAD wiring, wake-word gate, dual-backend dispatch loop |
| segmenter.py | VADSegmenter: frame-by-frame VAD state machine, preroll ring buffer |
| stt.py | STTBackend ABC + 11 implementations + shared HTTP helpers + OS-aware auto factory |
| tts.py | TTSBackend ABC + 4 implementations + auto factory |
| platform_adapter.py | MacAdapter / LinuxAdapter / WindowsAdapter + factory |
VAD state machine
IDLE ──(speech frames ≥ 4)──▶ CAPTURING ──(silence ≥ 840 ms OR 18 s cap)──▶ EMIT utterance ──▶ IDLE
▲ │
└──(speech_run decays on silence)─┘
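The diagram can be sketched as a frame-by-frame state machine with a preroll ring buffer. This is not the code in `segmenter.py`: `SketchSegmenter` is a hypothetical stand-in, the 30 ms frame size is an assumption (it makes 840 ms equal 28 frames and the 300 ms preroll equal 10 frames), and the per-frame speech decision is passed in as a boolean rather than computed by a VAD.

```python
from collections import deque
from typing import Deque, List, Optional

FRAME_MS = 30                            # assumed frame duration
START_FRAMES = 4                         # speech frames needed to start
END_SILENCE_FRAMES = 840 // FRAME_MS     # 840 ms of trailing silence
MAX_FRAMES = 18_000 // FRAME_MS          # 18 s hard cap
PREROLL_FRAMES = 300 // FRAME_MS         # 300 ms preroll buffer


class SketchSegmenter:
    """IDLE -> CAPTURING -> emit utterance -> IDLE, one frame at a time."""

    def __init__(self) -> None:
        self.preroll: Deque[bytes] = deque(maxlen=PREROLL_FRAMES)
        self.frames: List[bytes] = []
        self.capturing = False
        self.speech_run = 0
        self.silence_run = 0

    def push(self, frame: bytes, is_speech: bool) -> Optional[bytes]:
        """Feed one frame; return a finished utterance or None."""
        if not self.capturing:
            self.preroll.append(frame)
            # speech_run grows on speech and decays on silence while idle.
            self.speech_run = self.speech_run + 1 if is_speech else max(0, self.speech_run - 1)
            if self.speech_run >= START_FRAMES:
                self.capturing = True
                self.frames = list(self.preroll)  # keep audio before onset
                self.silence_run = 0
            return None
        self.frames.append(frame)
        self.silence_run = 0 if is_speech else self.silence_run + 1
        if self.silence_run >= END_SILENCE_FRAMES or len(self.frames) >= MAX_FRAMES:
            utterance = b"".join(self.frames)
            self.preroll.clear()
            self.frames = []
            self.capturing = False
            self.speech_run = 0
            self.silence_run = 0
            return utterance
        return None
```

In a real pipeline the `is_speech` flag would typically come from py-webrtcvad's `Vad.is_speech(frame, sample_rate)`, which accepts 10, 20, or 30 ms frames of 16-bit mono PCM.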
Troubleshooting
No input devices shown.
Grant microphone permission to your terminal. macOS: Privacy & Security → Microphone. Linux: check PulseAudio / PipeWire. Windows: Settings → Privacy → Microphone.
Wake word never fires.
Confirm the right mic with --list-devices. Say the wake word slowly — fuzzy matching covers common mishears, but very low mic gain can strip initial consonants.
Cloud backend not activating.
Check that the API key env var is set in .env. Run with --command-stt-backend <name> to test explicitly.
[dispatch error] on Wayland.
Wayland blocks programmatic window focus. Pass --target-app <name> and keep that window focused manually.
Windows: keystrokes go to the wrong window.
Focus-stealing prevention can block SetForegroundWindow. Give the target window focus manually before speaking, or use AutoHotkey.
macOS: keystrokes ignored.
Accessibility + Automation permissions missing. System Settings → Privacy & Security → Accessibility / Automation.
Privacy & Security
- Local backends are fully on-device — MLX Whisper and faster-whisper make zero network calls
- Cloud backends upload audio only after wake-word — continuous listening never touches cloud APIs
- Clipboard temporarily overwritten per dispatch — original contents restored immediately
- No telemetry. No analytics. No phone-home.
- Accessibility permissions are powerful — review the source before granting
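The clipboard behaviour described above follows a save, overwrite, paste, restore pattern. Here is a minimal sketch with the clipboard and paste actions injected as callables; `paste_with_restore` is a hypothetical name, and micoracle's platform adapters may implement this differently.

```python
from typing import Callable


def paste_with_restore(
    text: str,
    get_clipboard: Callable[[], str],
    set_clipboard: Callable[[str], None],
    send_paste: Callable[[], None],
) -> None:
    """Paste text via the clipboard, then put the original contents back."""
    original = get_clipboard()
    try:
        set_clipboard(text)
        send_paste()
    finally:
        # Restore even if the paste keystroke raised an error.
        set_clipboard(original)
```

On Windows, for instance, the callables could be `pyperclip.paste`, `pyperclip.copy`, and a function that calls `pyautogui.hotkey("ctrl", "v")`; that wiring is an assumption based on the packages listed in the install steps.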
Future Scope
- Stronger Linux target locking: closer to macOS / Windows target reactivation behaviour
- Packaged installers: smoother setup with platform-specific dependency checks
- Tray / menu bar control: pause, resume, backend selection, target status
- Custom wake words: user-defined beyond the built-in three
- Command history: optional local log of recent accepted prompts
- Google Gemini STT: cloud transcription backend
Related searches
voice input for Claude Code · speech to text for terminal · hands-free coding assistant · talk to Codex CLI · whisper voice paste terminal · dictate to terminal macOS Linux Windows · offline speech recognition CLI · voice control AI coding tool · Claude Code voice · Codex CLI voice input · AI terminal voice control
License
MIT © 2026 Pradip Tivhale
Acknowledgements
- MLX Whisper · faster-whisper · py-webrtcvad
- 60dB.ai · ElevenLabs · Deepgram · Groq · AssemblyAI · Gladia
- sounddevice · soundfile
- xdotool · wtype · pyautogui
- pyttsx3