Voice-activated terminal-aware prompt enhancer for AI coding assistants
Project description
PromptShell
"Great developers write great code, not great prompts. Just say 'fix it' - PromptShell sees your terminal and does the rest."
A voice-activated, terminal-aware prompt enhancer for AI coding assistants. Press a hotkey, speak a vague command like "fix the error" - PromptShell captures your full terminal context (screen buffer, working directory, recent commands, detected errors, git branch, screenshot) and rewrites it into a precise, actionable prompt ready for ChatGPT, Copilot, Claude, Cursor, or any other AI tool.
Enhancement is powered by Gemini 2.5 Flash Lite on Google Cloud Run, using the Google GenAI SDK with multimodal (text + screenshot) input.
Table of Contents
- Quick Start
- Platform Support
- Why PromptShell
- How It Works
- Architecture
- Tech Stack
- Features
- Configuration
- Google Cloud Deployment
- Development
- Documentation
- License
Quick Start
Prerequisites
- Python 3.11+
uv(recommended) or pip- macOS: Xcode Command Line Tools (
xcode-select --install) - Linux:
portaudio19-devfor microphone support (sudo apt install portaudio19-dev) - A Google AI Studio API key (free tier available)
Install
pip install prompt-shell-cli
Set up
# Set your Gemini API key
export GEMINI_API_KEY=your_key_here
# Generate default config at ~/.prompt-shell/config.yaml
prompt-shell init
# Install shell hook for richer terminal state (zsh/bash/fish)
prompt-shell install-hook
# Verify it works with a one-shot enhance
prompt-shell enhance "fix the build error"
Run the daemon
prompt-shell start
# Press Ctrl+Alt+E, speak your command, prompt lands in clipboard
Platform Support
Requires Python 3.11+. Shells supported: bash, zsh, fish.
| Platform | Support | Notes |
|---|---|---|
| macOS 12+ | Full | Primary platform - all features available |
| Linux Desktop (X11) | Full | Requires xclip + scrot |
| Linux Desktop (Wayland) | Full | Requires wl-copy + grim |
| Linux Headless / SSH | Partial | No hotkey or clipboard; use delivery.method: file |
| WSL2 | Full | Treated as Linux; clipboard works via interop |
| Windows (native) | Not supported | Use WSL2 |
Feature availability by platform
| Feature | macOS | Linux Desktop | Headless/SSH | WSL2 |
|---|---|---|---|---|
| Hotkey daemon | ✓ | ✓ | ✗ | ✓ |
| Screenshot capture | ✓ | ✓ | ✗ | ✓ |
| Voice + transcription | ✓ | ✓ | ✓ ¹ | ✓ |
| Apple Speech (local) | ✓ | ✗ | ✗ | ✗ |
| Clipboard delivery | ✓ | ✓ | ✗ | ✓ |
| File delivery | ✓ | ✓ | ✓ | ✓ |
| iTerm2 paste | ✓ | ✗ | ✗ | ✗ |
| Desktop notifications | ✓ | ✓ | ✗ | ✓ |
| Systemd service | ✗ | ✓ | ✓ | ✓ |
| Shell hooks (bash/zsh/fish) | ✓ | ✓ | ✓ | ✓ |
¹ Requires an audio device accessible over SSH.
System dependencies
| Platform | Required packages |
|---|---|
| macOS | brew install portaudio · xcode-select --install |
| Linux X11 | sudo apt install portaudio19-dev xclip scrot libnotify-bin |
| Linux Wayland | sudo apt install portaudio19-dev wl-clipboard grim libnotify-bin |
| Linux Headless | sudo apt install portaudio19-dev (voice only) |
| iTerm2 integration | pip install "prompt-shell[iterm2]" (macOS only) |
Why PromptShell
Without PromptShell, the typical AI debugging loop looks like this:
- Send vague command ("fix the error") → AI guesses wrong → wasted round
- Paste error manually, forget the file path → AI still misses → second wasted round
- Re-send with full context → finally get a useful answer → 3rd round is the first productive one
Each round burns input + output tokens. PromptShell collapses 3 rounds into 1 by front-loading all the context the AI needs to answer correctly on the first attempt.
| Without PromptShell | With PromptShell | |
|---|---|---|
| Rounds to resolution | 3–5 | 1 |
| Tokens per debug task | ~3,000–5,000 | ~1,100 |
| Token reduction | baseline | ~60–75% fewer |
The meta-prompt adds ~500 tokens of structured context - but eliminates the back-and-forth entirely. At $15/million tokens, a developer doing 30 AI interactions/day saves roughly $540/year in API credits while resolving issues faster.
Beyond API Credits
- Zero context-switching - no manual copy-paste of errors, paths, or branch names
- Voice-first - speak while looking at the terminal, hands stay on the keyboard
- Multimodal ground truth - screenshot catches UI dialogs and editor state the text buffer misses
- Privacy by default - Whisper transcription runs locally; voice never leaves the machine
- Works with any AI - output is plain text, paste into ChatGPT, Copilot, Claude, Cursor, or any tool
- Offline capable - falls back to Ollama when the cloud service is unreachable
- No lock-in - enhances whatever AI assistant you already use
How It Works
Input: you say "fix the error"
PromptShell captures:
- Screen buffer:
error TS2345 in src/auth/middleware.ts:42 - CWD:
~/project/backend - Last command:
npm run build(exit code 1) - Git branch:
feature/auth-refactor - Screenshot of the current terminal (PNG)
Pipeline: local client → Cloud Run → Gemini 2.5 Flash Lite (multimodal)
Output delivered to clipboard:
Fix the TypeScript compilation error TS2345 in src/auth/middleware.ts:42 -
Argument of type 'string' is not assignable to parameter of type 'AuthToken'.
The last command `npm run build` failed with exit code 1.
CWD: ~/project/backend, branch: feature/auth-refactor
Result: paste into any AI tool and get a precise answer on the first try.
Architecture
PromptShell runs as a multimodal AI pipeline - text, voice, and vision feed into a single agentic context capture step. The AI Orchestrator routes the request to Gemini Vision on a serverless Google Cloud backend, with graceful degradation to an offline AI model if the cloud is unreachable. Everything except the Gemini call runs on-device.
Full system design: docs/system_design.md.
Tech Stack
| Component | Technology | Role |
|---|---|---|
| Multimodal AI | Gemini 2.5 Flash Lite | Prompt enhancement (text + vision) |
| AI SDK | Google GenAI SDK | Gemini API integration |
| Cloud Backend | Google Cloud Run | Serverless hosting - scales to zero |
| Speech-to-Text | Whisper AI (faster-whisper) | Local, private voice transcription |
| Local AI Fallback | Ollama (litellm) | Offline inference when cloud is unreachable |
| Terminal Backends | tmux · iTerm2 · shell hook · generic | Terminal state capture (4 backends) |
| Vision Capture | screencapture · grim · scrot | Terminal screenshot for Gemini Vision |
| CLI Framework | Typer + Rich | CLI interface and hotkey daemon |
| Language | Python 3.11+ | Cross-platform (macOS + Linux) |
Features
Terminal Context Capture
Four pluggable backends, auto-detected at startup in priority order:
| Backend | Screen Buffer | CWD | Commands | Exit Code | Setup |
|---|---|---|---|---|---|
| tmux | Yes | Yes | Yes | Via hook | Be inside a tmux session |
| iterm2 | Yes | Yes | Yes | Yes | pip install .[iterm2] (macOS only) |
| shell_hook | No | Yes | Yes | Yes | prompt-shell install-hook |
| generic | No | Yes | History | No | None (always available) |
- Polls every 2s in idle mode; captures immediately on hotkey trigger
- Detects project type from manifest files (
package.json,Cargo.toml,go.mod,pyproject.toml) - Reads git branch from the working directory
- Captures a screenshot of the active display concurrently with voice recording
Error Detection
A regex engine scans the terminal output and extracts structured error info (type, code, file, line, message) for 12+ error families:
- Build errors - TypeScript (
TS*), ESLint, Rust (cargo), Go, Python - Runtime errors - Node.js stack traces, Python tracebacks, segfaults
- Test failures - Jest, pytest,
cargo test - Git conflicts - merge conflict markers
- Permission errors -
EACCES, sudo prompts - Network errors - HTTP 4xx/5xx,
ECONNREFUSED
Voice Capture and Transcription
- Records from the microphone with energy-based VAD; stops on 1s of silence
- Three transcription backends (auto-fallback):
- faster-whisper (local, offline, private) - default
- OpenAI Whisper API (cloud, most accurate for jargon)
- Apple Speech Framework (macOS native, lowest latency)
Screenshot Context (Multimodal)
On each hotkey trigger, PromptShell captures a PNG screenshot of the current screen and attaches it to the Gemini request alongside the text context. This makes the call truly multimodal - Gemini Vision sees the exact terminal state, catching errors and UI details that the text buffer alone may miss.
- macOS - uses the built-in
screencapture(no install needed) - Linux/GNOME - uses
gnome-screenshot(sudo apt install gnome-screenshot) - Linux/Wayland (wlroots) - uses
grim(sudo apt install grim) - Linux/X11 - uses
scrot(sudo apt install scrot)
Disable in config if not needed:
terminal:
capture_screenshot: false
Prompt Enhancement (Gemini on Cloud Run)
- Terminal context + voice transcript + screenshot sent to Cloud Run as a single request
- Gemini 2.5 Flash Lite receives text + PNG screenshot as multimodal input via the Google GenAI SDK
- Falls back to a local LLM or template if Cloud Run is unavailable
Delivery
| Method | Description |
|---|---|
clipboard |
Copies to system clipboard - pbcopy (macOS), xclip / wl-copy (Linux). Default. |
iterm_paste |
Pastes directly into the active iTerm2 session (macOS only). |
file |
Writes the prompt to ~/.prompt-shell/last_prompt.txt. Useful for SSH/headless environments. |
File delivery - read or pipe the output from another terminal:
cat ~/.prompt-shell/last_prompt.txt
cat ~/.prompt-shell/last_prompt.txt | aichat
Global Hotkeys
| Hotkey | Action |
|---|---|
Ctrl+Alt+E |
Voice capture → enhance → deliver to clipboard |
Ctrl+Alt+L |
Enhance last clipboard text with terminal context (no voice) |
Ctrl+Alt+R |
Re-enhance last prompt with updated terminal context |
Esc |
Cancel ongoing voice capture |
Customize hotkeys in
~/.prompt-shell/config.yamlunder thehotkeys:key. AvoidCtrl+Shift+E- reserved by GNOME/GTK for Unicode input.
CLI Reference
prompt-shell start # Start the daemon with global hotkeys
prompt-shell enhance "..." # One-shot: enhance a text prompt
prompt-shell context # Print current terminal context (debug)
prompt-shell install-hook # Install shell hook (zsh/bash/fish)
prompt-shell install-service # Install as a systemd user service (Linux)
prompt-shell init # Generate default config
Running as a Service (Linux)
Install PromptShell as a systemd user service so it starts automatically on login:
prompt-shell install-service
echo "GEMINI_API_KEY=your_key_here" >> ~/.prompt-shell/env
systemctl --user status prompt-shell # Check status
systemctl --user restart prompt-shell # Restart
systemctl --user stop prompt-shell # Stop
systemctl --user disable prompt-shell # Disable autostart
journalctl --user -u prompt-shell -f # Follow logs
Security
- Screen buffer is held in memory only - never persisted to disk
- All transcription is local by default (faster-whisper); cloud APIs are opt-in
GEMINI_API_KEYis read from the environment variable only; never stored in config files- Environment variable substitution (
${VAR}) is supported in config - Configurable redaction patterns for secrets in terminal output
Configuration
All settings live in ~/.prompt-shell/config.yaml. Generate the default:
prompt-shell init
terminal:
backend: auto # auto | tmux | iterm2 | shell_hook | generic
capture_screenshot: true
voice:
engine: whisper_local # whisper_local | whisper_api | apple_speech
whisper_model: base.en
llm:
provider: gemini
model: gemini-2.5-flash-lite
api_key: ${GEMINI_API_KEY}
cloud_run_url: ${CLOUD_RUN_URL}
delivery:
method: clipboard # clipboard | iterm_paste | file
show_notification: true
See config.example.yaml for all available options with annotations.
Google Cloud Deployment
The enhancement service runs on Cloud Run with Gemini 2.5 Flash Lite via the Google GenAI SDK. It scales to zero when idle - cost is effectively $0/month at personal/demo scale.
Automatic deployment (recommended)
Pushing a version tag triggers the full release pipeline:
# 1. Bump version in pyproject.toml, then commit
git commit -m "chore: bump version to 0.2.0"
# 2. Tag and push - pipeline handles the rest
git tag v0.2.0
git push origin v0.2.0
Pipeline: validate → test → build → publish-pypi → deploy-cloud-run / github-release / docker
Cloud Run deploys after PyPI publish, keeping the service and package version in sync. Each image is tagged with the version (e.g. v0.2.0) for easy rollback.
Required GitHub secrets
Add these to Settings → Secrets and variables → Actions before your first release:
| Secret | Value |
|---|---|
GCP_PROJECT_ID |
Your GCP project ID |
GCP_SA_KEY |
Service account JSON key, base64-encoded |
GEMINI_API_KEY |
Google AI Studio API key |
See docs/deployment.md for step-by-step GCP setup.
First-time / local deploy
export PROJECT_ID=your-gcp-project-id
export GEMINI_API_KEY=your-gemini-api-key
bash deploy.sh
Manual re-deploy (without a new release)
To redeploy without bumping the version - e.g. after a secret rotation:
GitHub → Actions → Deploy to Cloud Run (Manual) → Run workflow
Verify the deployment
curl $CLOUD_RUN_URL/health
# {"status":"ok"}
curl -X POST $CLOUD_RUN_URL/enhance \
-H "Content-Type: application/json" \
-d '{"voice_transcript": "fix the build error", "cwd": "/app"}'
Rollback
gcloud run deploy prompt-shell-enhancer \
--image gcr.io/$PROJECT_ID/prompt-shell-enhancer:v0.1.0 \
--platform managed --region us-central1
Cost
~$0/month at demo/personal scale. Cloud Run scales to zero between requests. Gemini 2.5 Flash Lite free tier covers 1,500 requests/day. See docs/deployment.md for a full cost breakdown.
Development
git clone https://github.com/VinnyBabuManjaly/PromptShell.git
cd PromptShell
uv sync --extra dev
# Lint and format
uv run ruff check src/ tests/
uv run ruff format src/ tests/
# Run tests
uv run pytest tests/ -v --tb=short
See docs/development-guide.md for the full contribution guide, branch conventions, and cross-platform notes.
Troubleshooting
| Symptom | Fix |
|---|---|
| No audio captured | Check microphone permissions (macOS: System Settings → Privacy → Microphone) |
portaudio not found |
sudo apt install portaudio19-dev (Linux) |
| Hotkey not triggering | Run prompt-shell start with sudo on Linux (required for global keyboard events) |
| Cloud Run returns 404 | Model name mismatch - verify gemini-2.5-flash-lite in your config and redeploy |
shell_hook not working |
Run prompt-shell install-hook then restart your shell |
| Clipboard empty after enhance | Check delivery.method: clipboard in config; try delivery.method: file as a fallback |
Documentation
| Document | Description |
|---|---|
| docs/development-guide.md | Development guide - prerequisites, build/test/lint, branch conventions, contribution workflow |
| docs/system_design.md | Full system design - Mermaid diagrams for architecture, pipeline sequence, data models, CI/CD |
| docs/architecture.md | ASCII architecture diagrams, module layout, data flow, error handling strategy, performance budget |
| docs/deployment.md | End-to-end deployment guide - GCP setup, GitHub secrets, release pipeline, rollback, cost breakdown |
| docs/spec.md | Technical specification - problem statement, requirements, API design, data models |
| docs/article.md | Published article - "I Built a Voice-Activated Prompt Enhancer That Reads Your Terminal" |
| docs/future_scope.md | Future scope - 10 planned features, 12 scalability and architectural improvements |
| config.example.yaml | Annotated configuration template with all available options |
License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file prompt_shell_cli-0.1.0.tar.gz.
File metadata
- Download URL: prompt_shell_cli-0.1.0.tar.gz
- Upload date:
- Size: 2.0 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
308de50cf7c74e6ed99f83f1b9c8ba751c627776f7313b4dc78e9a3f3d952988
|
|
| MD5 |
513e876ae86e6ed16eb0b25e99f15d8e
|
|
| BLAKE2b-256 |
db18f6126b3f373a0c49a3c8317eaba091007b717e51469f0105226be7fd856e
|
Provenance
The following attestation bundles were made for prompt_shell_cli-0.1.0.tar.gz:
Publisher:
release.yml on VinnyBabuManjaly/PromptShell
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
prompt_shell_cli-0.1.0.tar.gz -
Subject digest:
308de50cf7c74e6ed99f83f1b9c8ba751c627776f7313b4dc78e9a3f3d952988 - Sigstore transparency entry: 1111593663
- Sigstore integration time:
-
Permalink:
VinnyBabuManjaly/PromptShell@67f8564867c05e6614eddb8318554ca652b029a3 -
Branch / Tag:
refs/tags/v0.1.0 - Owner: https://github.com/VinnyBabuManjaly
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@67f8564867c05e6614eddb8318554ca652b029a3 -
Trigger Event:
push
-
Statement type:
File details
Details for the file prompt_shell_cli-0.1.0-py3-none-any.whl.
File metadata
- Download URL: prompt_shell_cli-0.1.0-py3-none-any.whl
- Upload date:
- Size: 45.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
7374d9cc2dc912581022cabfdc274e4c68fe46f83f69247217d756730b47097f
|
|
| MD5 |
81a6b3f47b47cc48ce26e127b2632fe5
|
|
| BLAKE2b-256 |
df9cc4cd97f2c0ff33cc1401a56ec31f5be55d3e15249a9ceabd9845026c0e77
|
Provenance
The following attestation bundles were made for prompt_shell_cli-0.1.0-py3-none-any.whl:
Publisher:
release.yml on VinnyBabuManjaly/PromptShell
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
prompt_shell_cli-0.1.0-py3-none-any.whl -
Subject digest:
7374d9cc2dc912581022cabfdc274e4c68fe46f83f69247217d756730b47097f - Sigstore transparency entry: 1111593736
- Sigstore integration time:
-
Permalink:
VinnyBabuManjaly/PromptShell@67f8564867c05e6614eddb8318554ca652b029a3 -
Branch / Tag:
refs/tags/v0.1.0 - Owner: https://github.com/VinnyBabuManjaly
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@67f8564867c05e6614eddb8318554ca652b029a3 -
Trigger Event:
push
-
Statement type: