
PromptShell


"Great developers write great code, not great prompts. Just say 'fix it' - PromptShell sees your terminal and does the rest."

A voice-activated, terminal-aware prompt enhancer for AI coding assistants. Press a hotkey, speak a vague command like "fix the error" - PromptShell captures your full terminal context (screen buffer, working directory, recent commands, detected errors, git branch, screenshot) and rewrites it into a precise, actionable prompt ready for ChatGPT, Copilot, Claude, Cursor, or any other AI tool.

Enhancement is powered by Gemini 2.5 Flash Lite on Google Cloud Run, using the Google GenAI SDK with multimodal (text + screenshot) input.




Quick Start

Prerequisites

  • Python 3.11+
  • uv (recommended) or pip
  • macOS: Xcode Command Line Tools (xcode-select --install)
  • Linux: portaudio19-dev for microphone support (sudo apt install portaudio19-dev)
  • A Google AI Studio API key (free tier available)

Install

pip install prompt-shell-cli

Set up

# Set your Gemini API key
export GEMINI_API_KEY=your_key_here

# Generate default config at ~/.prompt-shell/config.yaml
prompt-shell init

# Install shell hook for richer terminal state (zsh/bash/fish)
prompt-shell install-hook

# Verify it works with a one-shot enhance
prompt-shell enhance "fix the build error"

Run the daemon

prompt-shell start
# Press Ctrl+Alt+E, speak your command, prompt lands in clipboard

Platform Support

Requires Python 3.11+. Shells supported: bash, zsh, fish.

| Platform | Support | Notes |
|---|---|---|
| macOS 12+ | Full | Primary platform - all features available |
| Linux Desktop (X11) | Full | Requires xclip + scrot |
| Linux Desktop (Wayland) | Full | Requires wl-copy + grim |
| Linux Headless / SSH | Partial | No hotkey or clipboard; use delivery.method: file |
| WSL2 | Full | Treated as Linux; clipboard works via interop |
| Windows (native) | Not supported | Use WSL2 |

Feature availability by platform

| Feature | macOS | Linux Desktop | Headless/SSH | WSL2 |
|---|---|---|---|---|
| Hotkey daemon | ✓ | ✓ | ✗ | ✓ |
| Screenshot capture | ✓ | ✓ | ✗ | ✓ |
| Voice + transcription | ✓ | ✓ | ✓ ¹ | ✓ |
| Apple Speech (local) | ✓ | ✗ | ✗ | ✗ |
| Clipboard delivery | ✓ | ✓ | ✗ | ✓ |
| File delivery | ✓ | ✓ | ✓ | ✓ |
| iTerm2 paste | ✓ | ✗ | ✗ | ✗ |
| Desktop notifications | ✓ | ✓ | ✗ | ✓ |
| Systemd service | ✗ | ✓ | ✓ | ✓ |
| Shell hooks (bash/zsh/fish) | ✓ | ✓ | ✓ | ✓ |

¹ Requires an audio device accessible over SSH.

System dependencies

| Platform | Required packages |
|---|---|
| macOS | brew install portaudio · xcode-select --install |
| Linux X11 | sudo apt install portaudio19-dev xclip scrot libnotify-bin |
| Linux Wayland | sudo apt install portaudio19-dev wl-clipboard grim libnotify-bin |
| Linux Headless | sudo apt install portaudio19-dev (voice only) |
| iTerm2 integration | pip install "prompt-shell[iterm2]" (macOS only) |

Why PromptShell

Without PromptShell, the typical AI debugging loop looks like this:

  1. Send vague command ("fix the error") → AI guesses wrong → wasted round
  2. Paste error manually, forget the file path → AI still misses → second wasted round
  3. Re-send with full context → finally get a useful answer → the third round is the first productive one

Each round burns input + output tokens. PromptShell collapses 3 rounds into 1 by front-loading all the context the AI needs to answer correctly on the first attempt.

| | Without PromptShell | With PromptShell |
|---|---|---|
| Rounds to resolution | 3–5 | 1 |
| Tokens per debug task | ~3,000–5,000 | ~1,100 |
| Token reduction | baseline | ~60–75% fewer |

The meta-prompt adds ~500 tokens of structured context - but eliminates the back-and-forth entirely. At $15/million tokens, a developer doing 30 AI interactions/day saves roughly $540/year in API credits while resolving issues faster.
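The arithmetic behind that estimate can be checked quickly (all figures below are the approximate numbers from the table above, not measurements):

```python
# Back-of-envelope check of the savings estimate, using the rough
# figures quoted above. All numbers are approximations.
PRICE_PER_TOKEN = 15 / 1_000_000   # $15 per million tokens
TASKS_PER_DAY = 30
TOKENS_WITH = 1_100                # one enhanced round, incl. ~500-token meta-prompt

yearly_savings = {}
for tokens_without in (3_000, 5_000):   # the ~3,000-5,000 unenhanced range
    saved_per_task = tokens_without - TOKENS_WITH
    yearly_savings[tokens_without] = saved_per_task * TASKS_PER_DAY * 365 * PRICE_PER_TOKEN

print(yearly_savings)   # roughly {3000: 312, 5000: 641} - the ~$540 figure sits in this band
```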

Beyond API Credits

  • Zero context-switching - no manual copy-paste of errors, paths, or branch names
  • Voice-first - speak while looking at the terminal, hands stay on the keyboard
  • Multimodal ground truth - screenshot catches UI dialogs and editor state the text buffer misses
  • Privacy by default - Whisper transcription runs locally; voice never leaves the machine
  • Works with any AI - output is plain text, paste into ChatGPT, Copilot, Claude, Cursor, or any tool
  • Offline capable - falls back to Ollama when the cloud service is unreachable
  • No lock-in - enhances whatever AI assistant you already use

How It Works

Input: you say "fix the error"

PromptShell captures:

  • Screen buffer: error TS2345 in src/auth/middleware.ts:42
  • CWD: ~/project/backend
  • Last command: npm run build (exit code 1)
  • Git branch: feature/auth-refactor
  • Screenshot of the current terminal (PNG)

Pipeline: local client → Cloud Run → Gemini 2.5 Flash Lite (multimodal)

Output delivered to clipboard:

Fix the TypeScript compilation error TS2345 in src/auth/middleware.ts:42 -
Argument of type 'string' is not assignable to parameter of type 'AuthToken'.
The last command `npm run build` failed with exit code 1.
CWD: ~/project/backend, branch: feature/auth-refactor

Result: paste into any AI tool and get a precise answer on the first try.


Architecture

PromptShell Architecture

PromptShell runs as a multimodal AI pipeline - text, voice, and vision feed into a single agentic context capture step. The AI Orchestrator routes the request to Gemini Vision on a serverless Google Cloud backend, with graceful degradation to an offline AI model if the cloud is unreachable. Everything except the Gemini call runs on-device.

Full system design: docs/system_design.md.


Tech Stack

| Component | Technology | Role |
|---|---|---|
| Multimodal AI | Gemini 2.5 Flash Lite | Prompt enhancement (text + vision) |
| AI SDK | Google GenAI SDK | Gemini API integration |
| Cloud Backend | Google Cloud Run | Serverless hosting - scales to zero |
| Speech-to-Text | Whisper (faster-whisper) | Local, private voice transcription |
| Local AI Fallback | Ollama (litellm) | Offline inference when cloud is unreachable |
| Terminal Backends | tmux · iTerm2 · shell hook · generic | Terminal state capture (4 backends) |
| Vision Capture | screencapture · grim · scrot | Terminal screenshot for Gemini Vision |
| CLI Framework | Typer + Rich | CLI interface and hotkey daemon |
| Language | Python 3.11+ | Cross-platform (macOS + Linux) |

Features

Terminal Context Capture

Four pluggable backends, auto-detected at startup in priority order:

| Backend | Screen buffer | CWD | Commands | Exit code | Setup |
|---|---|---|---|---|---|
| tmux | Yes | Yes | Yes | Via hook | Be inside a tmux session |
| iterm2 | Yes | Yes | Yes | Yes | pip install .[iterm2] (macOS only) |
| shell_hook | No | Yes | Yes | Yes | prompt-shell install-hook |
| generic | No | Yes | History | No | None (always available) |

  • Polls every 2s in idle mode; captures immediately on hotkey trigger
  • Detects project type from manifest files (package.json, Cargo.toml, go.mod, pyproject.toml)
  • Reads git branch from the working directory
  • Captures a screenshot of the active display concurrently with voice recording

Error Detection

A regex engine scans the terminal output and extracts structured error info (type, code, file, line, message) for 12+ error families:

  • Build errors - TypeScript (TS*), ESLint, Rust (cargo), Go, Python
  • Runtime errors - Node.js stack traces, Python tracebacks, segfaults
  • Test failures - Jest, pytest, cargo test
  • Git conflicts - merge conflict markers
  • Permission errors - EACCES, sudo prompts
  • Network errors - HTTP 4xx/5xx, ECONNREFUSED

Voice Capture and Transcription

  • Records from the microphone with energy-based VAD; stops on 1s of silence
  • Three transcription backends (auto-fallback):
    • faster-whisper (local, offline, private) - default
    • OpenAI Whisper API (cloud, most accurate for jargon)
    • Apple Speech Framework (macOS native, lowest latency)
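The stop-on-silence behaviour can be sketched as a simple energy threshold over audio frames (a toy illustration of the idea, not the real VAD):

```python
# Toy sketch of energy-based voice-activity detection: stop once
# frame energy stays below a threshold for 1s of consecutive frames.
def frames_until_stop(frame_energies, threshold=0.01,
                      frame_ms=30, silence_ms=1000):
    needed = silence_ms // frame_ms          # quiet frames needed for ~1s
    quiet = 0
    for i, energy in enumerate(frame_energies):
        quiet = quiet + 1 if energy < threshold else 0
        if quiet >= needed:
            return i + 1                     # frames consumed before stopping
    return len(frame_energies)               # audio ended before silence did
```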

Screenshot Context (Multimodal)

On each hotkey trigger, PromptShell captures a PNG screenshot of the current screen and attaches it to the Gemini request alongside the text context. This makes the call truly multimodal - Gemini Vision sees the exact terminal state, catching errors and UI details that the text buffer alone may miss.

  • macOS - uses the built-in screencapture (no install needed)
  • Linux/GNOME - uses gnome-screenshot (sudo apt install gnome-screenshot)
  • Linux/Wayland (wlroots) - uses grim (sudo apt install grim)
  • Linux/X11 - uses scrot (sudo apt install scrot)

Disable in config if not needed:

terminal:
  capture_screenshot: false

Prompt Enhancement (Gemini on Cloud Run)

  • Terminal context + voice transcript + screenshot sent to Cloud Run as a single request
  • Gemini 2.5 Flash Lite receives text + PNG screenshot as multimodal input via the Google GenAI SDK
  • Falls back to a local LLM or template if Cloud Run is unavailable
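Conceptually, the request body looks something like the following. The voice_transcript and cwd fields match the curl example in the deployment section; the remaining field names and the base64 screenshot encoding are assumptions for illustration:

```python
import base64
import json

# Illustrative request body for the Cloud Run /enhance endpoint.
# voice_transcript and cwd match the documented curl example; the
# other fields and the base64 screenshot encoding are assumptions.
def build_enhance_payload(transcript, cwd, screenshot_png=None, **context):
    payload = {"voice_transcript": transcript, "cwd": cwd, **context}
    if screenshot_png is not None:
        payload["screenshot_b64"] = base64.b64encode(screenshot_png).decode("ascii")
    return json.dumps(payload)

body = build_enhance_payload(
    "fix the build error", "/app",
    screenshot_png=b"\x89PNG...",            # PNG bytes from the capture step
    git_branch="feature/auth-refactor",      # hypothetical extra context field
)
```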

Delivery

| Method | Description |
|---|---|
| clipboard | Copies to system clipboard - pbcopy (macOS), xclip / wl-copy (Linux). Default. |
| iterm_paste | Pastes directly into the active iTerm2 session (macOS only). |
| file | Writes the prompt to ~/.prompt-shell/last_prompt.txt. Useful for SSH/headless environments. |

File delivery - read or pipe the output from another terminal:

cat ~/.prompt-shell/last_prompt.txt
cat ~/.prompt-shell/last_prompt.txt | aichat

Global Hotkeys

| Hotkey | Action |
|---|---|
| Ctrl+Alt+E | Voice capture → enhance → deliver to clipboard |
| Ctrl+Alt+L | Enhance last clipboard text with terminal context (no voice) |
| Ctrl+Alt+R | Re-enhance last prompt with updated terminal context |
| Esc | Cancel ongoing voice capture |

Customize hotkeys in ~/.prompt-shell/config.yaml under the hotkeys: key. Avoid Ctrl+Shift+E - reserved by GNOME/GTK for Unicode input.
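A remapping might look something like this - the exact key names under hotkeys: are a guess for illustration; config.example.yaml has the authoritative schema:

```yaml
# Hypothetical example - the actual option names may differ;
# see config.example.yaml for the real schema.
hotkeys:
  enhance_voice: ctrl+alt+e      # voice capture -> enhance -> clipboard
  enhance_clipboard: ctrl+alt+l  # enhance last clipboard text
  re_enhance: ctrl+alt+r         # re-enhance with fresh context
```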

CLI Reference

prompt-shell start             # Start the daemon with global hotkeys
prompt-shell enhance "..."     # One-shot: enhance a text prompt
prompt-shell context           # Print current terminal context (debug)
prompt-shell install-hook      # Install shell hook (zsh/bash/fish)
prompt-shell install-service   # Install as a systemd user service (Linux)
prompt-shell init              # Generate default config

Running as a Service (Linux)

Install PromptShell as a systemd user service so it starts automatically on login:

prompt-shell install-service
echo "GEMINI_API_KEY=your_key_here" >> ~/.prompt-shell/env
systemctl --user status prompt-shell    # Check status
systemctl --user restart prompt-shell   # Restart
systemctl --user stop prompt-shell      # Stop
systemctl --user disable prompt-shell   # Disable autostart
journalctl --user -u prompt-shell -f    # Follow logs

Security

  • Screen buffer is held in memory only - never persisted to disk
  • All transcription is local by default (faster-whisper); cloud APIs are opt-in
  • GEMINI_API_KEY is read from the environment variable only; never stored in config files
  • Environment variable substitution (${VAR}) is supported in config
  • Configurable redaction patterns for secrets in terminal output
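A redaction pass of this kind might look like the following - the patterns shown are illustrative examples, not PromptShell's shipped rule set:

```python
import re

# Illustrative secret-redaction pass (patterns are examples only,
# not PromptShell's shipped rules).
REDACTION_PATTERNS = [
    re.compile(r"(?i)(api[_-]?key|token|password)\s*[=:]\s*\S+"),
    re.compile(r"AIza[0-9A-Za-z_-]{35}"),   # Google API key shape
]

def redact(text: str) -> str:
    for pattern in REDACTION_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

print(redact("export GEMINI_API_KEY=abc123secret"))
# -> export GEMINI_[REDACTED]
```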

Configuration

All settings live in ~/.prompt-shell/config.yaml. Generate the default:

prompt-shell init
terminal:
  backend: auto            # auto | tmux | iterm2 | shell_hook | generic
  capture_screenshot: true

voice:
  engine: whisper_local    # whisper_local | whisper_api | apple_speech
  whisper_model: base.en

llm:
  provider: gemini
  model: gemini-2.5-flash-lite
  api_key: ${GEMINI_API_KEY}
  cloud_run_url: ${CLOUD_RUN_URL}

delivery:
  method: clipboard        # clipboard | iterm_paste | file
  show_notification: true

See config.example.yaml for all available options with annotations.


Google Cloud Deployment

The enhancement service runs on Cloud Run with Gemini 2.5 Flash Lite via the Google GenAI SDK. It scales to zero when idle - cost is effectively $0/month at personal/demo scale.

Automatic deployment (recommended)

Pushing a version tag triggers the full release pipeline:

# 1. Bump version in pyproject.toml, then commit
git commit -m "chore: bump version to 0.2.0"

# 2. Tag and push - pipeline handles the rest
git tag v0.2.0
git push origin v0.2.0

Pipeline: validate → test → build → publish-pypi → deploy-cloud-run / github-release / docker

Cloud Run deploys after PyPI publish, keeping the service and package version in sync. Each image is tagged with the version (e.g. v0.2.0) for easy rollback.

Required GitHub secrets

Add these to Settings → Secrets and variables → Actions before your first release:

| Secret | Value |
|---|---|
| GCP_PROJECT_ID | Your GCP project ID |
| GCP_SA_KEY | Service account JSON key, base64-encoded |
| GEMINI_API_KEY | Google AI Studio API key |

See docs/deployment.md for step-by-step GCP setup.

First-time / local deploy

export PROJECT_ID=your-gcp-project-id
export GEMINI_API_KEY=your-gemini-api-key
bash deploy.sh

Manual re-deploy (without a new release)

To redeploy without bumping the version - e.g. after a secret rotation:

GitHub → Actions → Deploy to Cloud Run (Manual) → Run workflow

Verify the deployment

curl $CLOUD_RUN_URL/health
# {"status":"ok"}

curl -X POST $CLOUD_RUN_URL/enhance \
  -H "Content-Type: application/json" \
  -d '{"voice_transcript": "fix the build error", "cwd": "/app"}'

Rollback

gcloud run deploy prompt-shell-enhancer \
  --image gcr.io/$PROJECT_ID/prompt-shell-enhancer:v0.1.0 \
  --platform managed --region us-central1

Cost

~$0/month at demo/personal scale. Cloud Run scales to zero between requests. Gemini 2.5 Flash Lite free tier covers 1,500 requests/day. See docs/deployment.md for a full cost breakdown.


Development

git clone https://github.com/VinnyBabuManjaly/PromptShell.git
cd PromptShell
uv sync --extra dev

# Lint and format
uv run ruff check src/ tests/
uv run ruff format src/ tests/

# Run tests
uv run pytest tests/ -v --tb=short

See docs/development-guide.md for the full contribution guide, branch conventions, and cross-platform notes.


Troubleshooting

| Symptom | Fix |
|---|---|
| No audio captured | Check microphone permissions (macOS: System Settings → Privacy → Microphone) |
| portaudio not found | sudo apt install portaudio19-dev (Linux) |
| Hotkey not triggering | Run prompt-shell start with sudo on Linux (required for global keyboard events) |
| Cloud Run returns 404 | Model name mismatch - verify gemini-2.5-flash-lite in your config and redeploy |
| shell_hook not working | Run prompt-shell install-hook, then restart your shell |
| Clipboard empty after enhance | Check delivery.method: clipboard in config; try delivery.method: file as a fallback |

Documentation

| Document | Description |
|---|---|
| docs/development-guide.md | Development guide - prerequisites, build/test/lint, branch conventions, contribution workflow |
| docs/system_design.md | Full system design - Mermaid diagrams for architecture, pipeline sequence, data models, CI/CD |
| docs/architecture.md | ASCII architecture diagrams, module layout, data flow, error handling strategy, performance budget |
| docs/deployment.md | End-to-end deployment guide - GCP setup, GitHub secrets, release pipeline, rollback, cost breakdown |
| docs/spec.md | Technical specification - problem statement, requirements, API design, data models |
| docs/article.md | Published article - "I Built a Voice-Activated Prompt Enhancer That Reads Your Terminal" |
| docs/future_scope.md | Future scope - 10 planned features, 12 scalability and architectural improvements |
| config.example.yaml | Annotated configuration template with all available options |

License

MIT
