Hands-free voice input for Claude Code, Codex CLI, and any terminal — cross-platform
micoracle
Hands-free voice input for Claude Code, Codex CLI, OpenCode, and any terminal — on macOS, Linux, and Windows. OS is auto-detected at launch; STT and TTS backends are pluggable (local or cloud).
Say "Codex, refactor this function" → your speech is captured, transcribed, and pasted into the focused terminal with Enter pressed. No push-to-talk, no cloud required.
Features
- Cross-platform: `platform_adapter.py` auto-selects macOS (AppleScript) / Linux (xdotool on X11, wtype on Wayland) / Windows (pywin32 + pyautogui).
- 4 STT backends: MLX Whisper (Apple Silicon), faster-whisper (cross-platform local), OpenAI Whisper API, Azure OpenAI Whisper.
- 4 TTS backends: macOS `say`, `pyttsx3` (cross-platform offline), OpenAI TTS, Azure Speech TTS. Or `none` to stay silent.
- Continuous listening with WebRTC VAD + 300 ms preroll buffer (wake words never get chopped by onset detection).
- Wake-word gate: `"Claude, …"` / `"Codex, …"` with fuzzy mishear tolerance.
- Two-step follow-up: say the wake word alone, then the prompt within 8 s.
- Silence-hallucination filter: Whisper's "Thank you." / "Amen." artifacts are dropped silently.
- Locked dispatch target: frontmost app captured at startup (or pin via `--target-app`), stays fixed regardless of where you click.
- Clipboard-safe: text is pasted via clipboard; your original contents are restored afterwards.
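The wake-word gate with mishear tolerance can be illustrated with the standard library alone. This is a minimal sketch, not micoracle's actual matcher: the function name `split_wake_word`, the regular expression, and the 0.7 similarity cutoff are all assumptions chosen for illustration.

```python
import difflib
import re

WAKE_WORDS = ("claude", "codex")  # wake words named in the feature list
CUTOFF = 0.7  # assumed similarity threshold, not a value taken from micoracle


def split_wake_word(transcript):
    """Return (wake_word, prompt) if the transcript opens with a wake word, else None."""
    # First word, then any separating punctuation/whitespace, then the rest.
    match = re.match(r"\s*([A-Za-z']+)[\s,.!?:;-]*(.*)", transcript, flags=re.DOTALL)
    if not match:
        return None
    first, rest = match.group(1).lower(), match.group(2).strip()
    # Fuzzy comparison lets near misses such as "Codecs" still count as "codex".
    close = difflib.get_close_matches(first, WAKE_WORDS, n=1, cutoff=CUTOFF)
    if not close:
        return None
    return close[0], rest
```

An empty prompt in the returned tuple corresponds to the wake word spoken on its own, which is the case the two-step follow-up handles.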
OS / backend matrix
| Platform | STT default | TTS default | Focus & paste | Notes |
|---|---|---|---|---|
| macOS (Apple Silicon) | `mlx` | `say` | AppleScript | Best local latency |
| macOS (Intel) | `faster` | `say` | AppleScript | |
| Linux X11 | `faster` | `pyttsx3` | xdotool + xclip | Full support |
| Linux Wayland | `faster` | `pyttsx3` | wtype + wl-copy | `--target-app` required; auto-focus limited |
| Windows 10/11 | `faster` | `pyttsx3` | pywin32 + pyautogui | Installed automatically by the PyPI package on Windows |
All defaults can be overridden via --stt-backend / --tts-backend or
environment variables.
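The defaults in the matrix amount to a small lookup keyed on OS and CPU architecture. The sketch below shows one way to express that; it is not micoracle's actual factory code. The function names are hypothetical, and treating an unset or `auto` variable as "use the platform default" is an assumption. Only the variable names `VOICE_AGENT_STT_BACKEND` and `VOICE_AGENT_TTS_BACKEND` come from this README.

```python
import os
import platform


def default_backends(system=None, machine=None):
    """Return (stt, tts) defaults following the OS / backend matrix."""
    system = system or platform.system()      # "Darwin", "Linux" or "Windows"
    machine = machine or platform.machine()   # "arm64" on Apple Silicon
    if system == "Darwin":
        stt = "mlx" if machine == "arm64" else "faster"
        return stt, "say"
    # Linux (X11 or Wayland) and Windows share the same defaults in the matrix.
    return "faster", "pyttsx3"


def resolve_backends(env=None, system=None, machine=None):
    """Apply environment overrides on top of the platform defaults (assumed behavior)."""
    env = os.environ if env is None else env
    stt, tts = default_backends(system, machine)
    stt_override = env.get("VOICE_AGENT_STT_BACKEND", "auto")
    tts_override = env.get("VOICE_AGENT_TTS_BACKEND", "auto")
    if stt_override not in ("", "auto"):
        stt = stt_override
    if tts_override not in ("", "auto"):
        tts = tts_override
    return stt, tts
```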
Install
Common (all platforms)
git clone https://github.com/thepradip/micoracle.git
cd micoracle
python3 -m pip install -r requirements.txt
Pick an STT backend
Apple Silicon Mac (best local):
pip install mlx-whisper
Cross-platform local:
pip install faster-whisper
Cloud (OpenAI or Azure):
pip install openai
Pick a TTS backend (optional — only if you want spoken status cues)
macOS: say is built-in, no install needed.
Linux / Windows offline:
pip install pyttsx3
# Linux also needs: sudo apt install espeak (or distro equivalent)
Cloud:
pip install openai # OpenAI TTS
Platform-specific
macOS:
brew install portaudio # required by sounddevice
Linux (X11):
sudo apt install xdotool xclip portaudio19-dev python3-dev
Linux (Wayland):
sudo apt install wtype wl-clipboard portaudio19-dev python3-dev
Windows:
pip install micoracle
Configure
cp .env.example .env
# Edit .env to set API keys (only if using cloud backends), pick backends, etc.
Quickstart
# Focus the app you want to dispatch to (e.g. Claude Code window), then:
./run_hands_free.sh # macOS / Linux
run_hands_free.bat # Windows
# Override STT/TTS at launch:
./run_hands_free.sh --stt-backend azure --tts-backend openai
# Pin the target app (required on Wayland):
./run_hands_free.sh --target-app gnome-terminal
Speak:
- One-shot: "Codex, write a Python hello world." → transcribed + pasted into the focused app.
- Two-step: "Codex." → you hear "listening" → within 8 s say the prompt → pasted.
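The one-shot and two-step flows can be modeled as a tiny state machine with an 8-second follow-up window. The following is a sketch under assumptions: the class name `WakeState`, its `handle` method, and the injectable clock are hypothetical and exist only to make the timing rule concrete and testable.

```python
import time

FOLLOW_UP_WINDOW_S = 8.0  # the "within 8 s" window described above


class WakeState:
    """Decide whether an utterance should be dispatched (illustrative only)."""

    def __init__(self, window_s=FOLLOW_UP_WINDOW_S, clock=time.monotonic):
        self._window_s = window_s
        self._clock = clock
        self._armed_at = None  # time at which a bare wake word was heard

    def handle(self, wake_word, prompt):
        """wake_word is a str or None; prompt is the remaining text. Return text to dispatch or None."""
        now = self._clock()
        armed = self._armed_at is not None and now - self._armed_at <= self._window_s
        self._armed_at = None  # every utterance consumes the armed state
        if wake_word and prompt:
            return prompt              # one-shot: "Codex, write ..."
        if wake_word:
            self._armed_at = now       # bare wake word: wait for the follow-up
            return None
        if armed and prompt:
            return prompt              # follow-up arrived inside the window
        return None                    # no wake word and not armed: ignore
```

Using `time.monotonic` as the default clock keeps the window immune to wall-clock adjustments; passing a fake clock makes the timeout easy to exercise in tests.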
CLI reference
| Flag | Default | Description |
|---|---|---|
| `--device <id\|name>` | system default mic | Audio input device. |
| `--list-devices` | (none) | Print available input devices and exit. |
| `--target-app <name>` | frontmost at startup | Lock dispatch target. |
| `--stt-backend` | `auto` | `auto` / `mlx` / `faster` / `openai` / `azure`. |
| `--tts-backend` | `auto` | `auto` / `say` / `pyttsx3` / `openai` / `azure` / `none`. |
| `--no-speak` | (none) | Alias for `--tts-backend none`. |
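For readers who want to see how these flags fit together, here is a minimal `argparse` sketch that mirrors the table. It is a reconstruction from the documented flags, not the project's real parser; the `prog` name and help strings are assumptions.

```python
import argparse

STT_CHOICES = ["auto", "mlx", "faster", "openai", "azure"]
TTS_CHOICES = ["auto", "say", "pyttsx3", "openai", "azure", "none"]


def build_parser():
    """Build a parser with the flags listed in the CLI reference."""
    parser = argparse.ArgumentParser(prog="hands_free_voice")  # prog name is assumed
    parser.add_argument("--device", metavar="<id|name>", default=None,
                        help="Audio input device (default: system default mic).")
    parser.add_argument("--list-devices", action="store_true",
                        help="Print available input devices and exit.")
    parser.add_argument("--target-app", metavar="<name>", default=None,
                        help="Lock dispatch target (default: frontmost at startup).")
    parser.add_argument("--stt-backend", choices=STT_CHOICES, default="auto")
    parser.add_argument("--tts-backend", choices=TTS_CHOICES, default="auto")
    # --no-speak writes "none" into the same destination as --tts-backend.
    # SUPPRESS keeps this alias from contributing a default of its own.
    parser.add_argument("--no-speak", action="store_const", const="none",
                        dest="tts_backend", default=argparse.SUPPRESS,
                        help="Alias for --tts-backend none.")
    return parser
```

Sharing one `dest` between `--tts-backend` and `--no-speak` means downstream code only ever inspects `args.tts_backend`.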
Environment variables
See .env.example for the full commented list. Highlights:
| Variable | Purpose |
|---|---|
| `VOICE_AGENT_STT_BACKEND` | Default STT backend |
| `VOICE_AGENT_TTS_BACKEND` | Default TTS backend |
| `VOICE_AGENT_TARGET_APP` | Default dispatch target |
| `VOICE_AGENT_INPUT_DEVICE` | Default mic |
| `VOICE_AGENT_MLX_REPO` | MLX Whisper HF repo |
| `VOICE_AGENT_FASTER_MODEL` | faster-whisper model name |
| `OPENAI_API_KEY` | For `openai` STT/TTS backends |
| `AZURE_OPENAI_ENDPOINT` / `AZURE_OPENAI_KEY` / `AZURE_WHISPER_DEPLOYMENT` | For `azure` STT |
| `AZURE_SPEECH_KEY` / `AZURE_SPEECH_REGION` | For `azure` TTS |
Architecture
    [ mic ] ──▶ sounddevice callback ──▶ audio_q ──▶ main loop
                                                         │
                                                         ▼  (VAD + preroll buffer)
                                                    utterance_q
                                                         │
                                                         ▼
    worker: STTBackend → wake/cleanup → PlatformAdapter → TTSBackend

- `stt.py`: `STTBackend` interface + 4 implementations + OS-aware factory.
- `tts.py`: `TTSBackend` interface + 4 implementations + factory.
- `platform_adapter.py`: `PlatformAdapter` interface + `MacAdapter`, `LinuxAdapter`, `WindowsAdapter` + `get_platform_adapter()` factory.
- `hands_free_voice.py`: mic capture, VAD state machine, wake-state, wiring.
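The "VAD + preroll buffer" stage is the part of this pipeline that keeps the start of a wake word from being clipped. A rough sketch of the idea follows, assuming 30 ms frames (one of the frame lengths WebRTC VAD accepts) and a hypothetical `hangover_frames` parameter for end-of-utterance detection. The speech detector is passed in as a plain callable so the example runs without audio hardware; none of these names come from micoracle's source.

```python
from collections import deque

FRAME_MS = 30     # assumed frame length
PREROLL_MS = 300  # preroll length stated in the feature list


def segment_utterances(frames, is_speech, hangover_frames=10):
    """Yield utterances (lists of frames), each prefixed with up to 300 ms of preroll."""
    preroll = deque(maxlen=PREROLL_MS // FRAME_MS)  # ring buffer of recent quiet frames
    current = None   # frames of the utterance in progress, or None while idle
    silence_run = 0  # consecutive non-speech frames inside an utterance
    for frame in frames:
        if current is None:
            if is_speech(frame):
                # Onset detected: prepend buffered audio so the first syllable survives.
                current = list(preroll) + [frame]
                preroll.clear()
                silence_run = 0
            else:
                preroll.append(frame)
        else:
            current.append(frame)
            silence_run = 0 if is_speech(frame) else silence_run + 1
            if silence_run >= hangover_frames:
                yield current
                current = None
    if current is not None:
        yield current  # flush a trailing utterance when the stream ends
```

In a live setup the `is_speech` callable would typically wrap a VAD library call, and each yielded utterance would be placed on `utterance_q` for the worker.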
Troubleshooting
No input devices shown. OS microphone permission for your terminal is missing. Grant it (macOS: Privacy & Security → Microphone; Linux: check PulseAudio / PipeWire; Windows: Settings → Privacy → Microphone) and relaunch the terminal.
Wake word never fires. Confirm the right mic via --list-devices. Say
"Codex" slowly and clearly — fuzzy matching covers most mishears but a low
input gain can strip initial consonants.
[dispatch error] on Wayland. Wayland blocks programmatic window focusing.
Pass --target-app and keep that window focused yourself.
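If you are unsure whether your session is X11 or Wayland, the environment usually says so. The snippet below is a small diagnostic sketch, not part of micoracle: the function names are hypothetical, and it relies on the conventional `XDG_SESSION_TYPE`, `WAYLAND_DISPLAY`, and `DISPLAY` variables, which some minimal setups may not export.

```python
import os


def linux_session_type(env=None):
    """Best-effort guess at the display server: 'wayland', 'x11' or 'unknown'."""
    env = os.environ if env is None else env
    session = env.get("XDG_SESSION_TYPE", "").lower()
    if session in ("wayland", "x11"):
        return session
    # Fall back to display variables; check Wayland first because
    # XWayland sessions usually export DISPLAY as well.
    if env.get("WAYLAND_DISPLAY"):
        return "wayland"
    if env.get("DISPLAY"):
        return "x11"
    return "unknown"


def paste_tools(session):
    """Map a session type to the focus/paste tools listed in the OS matrix."""
    tools = {"wayland": ("wtype", "wl-copy"), "x11": ("xdotool", "xclip")}
    return tools.get(session, ())


if __name__ == "__main__":
    detected = linux_session_type()
    print(detected, paste_tools(detected))
```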
Windows: keystrokes sent to the wrong window. Windows' focus-stealing
prevention can block SetForegroundWindow. Give the target window focus
manually, or use AutoHotkey to work around the focus-stealing restriction.
macOS: keystrokes ignored. The terminal is missing Accessibility and Automation permissions. Grant both under System Settings → Privacy & Security → Accessibility / Automation.
Privacy & security
- Local backends keep audio on-device. MLX Whisper and faster-whisper do not make any network calls at inference time.
- Cloud backends upload audio (OpenAI / Azure). Use them only if you are comfortable sending recordings to those providers.
- Clipboard is temporarily overwritten with each dispatch; original contents are restored immediately after.
- No telemetry. No analytics. No phone-home.
- Accessibility / Automation permissions are powerful — the agent types into the focused app and presses Enter. Review the source before granting.
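The clipboard behavior described in the list above (overwrite, paste, restore) maps naturally onto a context manager. This is a sketch of the pattern only: `borrowed_clipboard` is a hypothetical name, and the read/write callables are injected so the example stays platform-neutral. In practice they would wrap whatever clipboard tool the platform provides, such as `xclip` or `wl-copy` on Linux.

```python
from contextlib import contextmanager


@contextmanager
def borrowed_clipboard(read_clipboard, write_clipboard, text):
    """Place `text` on the clipboard for the duration of the block, then restore the original."""
    original = read_clipboard()
    write_clipboard(text)
    try:
        yield  # the caller sends the paste keystroke here
    finally:
        # Restore even if pasting raised, so user data is not lost.
        write_clipboard(original)
```

A usage sketch with an in-memory stand-in for the clipboard:

```python
clip = {"value": "user's original text"}

with borrowed_clipboard(lambda: clip["value"],
                        lambda s: clip.update(value=s),
                        "transcribed prompt"):
    pass  # paste into the target app here

# clip["value"] is back to the user's original text at this point
```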
License
MIT © 2026 Pradip Tivhale