VoxKage - OS-level Agentic AI Assistant. Autonomous, Persistent, Aware.
VoxKage
The Living Agentic OS Framework
Utilizing the Gemini CLI interface to power an untethered, system-wide AI brain.
VoxKage is a massive evolution beyond standard coding assistants. It is a living Agentic OS Framework designed to break the AI out of its IDE prison. By hijacking the Gemini CLI to use as its conversational frontend, VoxKage deploys a complex "honeycomb" of intertwined MCP capabilities to gain real-time, autonomous, and untethered access to the whole internet, your file system, and your operating system.
[View Architecture] • [Explore Capabilities] • [Get Started] • [Update / Upgrade]
The Innovation:
The Imprisonment Limitation:-
Modern AI CLIs (like Claude Code, Cursor, or the base Gemini CLI) are incredibly powerful text generators, but they suffer from a fundamental limitation: Imprisonment. They are strictly confined to the directory they are launched in.
If you ask a normal CLI assistant to "Diagnose why my locally hosted web app isn't rendering properly, cross-reference the frontend CSS with the network tab, download the correct backend dependency from the official site, and install it", they fail. They don't have eyes, they can't orchestrate multi-domain research, and they can't interact with your operating system on a holistic level.
The Challenge: How do we transform a static text-generation tool into a proactive, self-healing, system-wide orchestrator without compromising security or relying on paid API credits?
The "VoxKage" OS Evolution:-
VoxKage solves this by treating the official Gemini CLI merely as a "mouthpiece" for its own highly complex brain. VoxKage is not a wrapper; it is an independent entity that mounts 18 specialized Model Context Protocol (MCP) servers into the runtime state, creating an interwoven web of tools.
VoxKage doesn't just "plug and play" a web search tool. It utilizes its honeycomb architecture to combine tools autonomously: it spins up a Playwright browser, takes a screenshot of a broken webpage, extracts the DOM computed styles, uses semantic web search to find a solution, writes a step-by-step repair plan, and executes it via the native OS shell, all in one fluid, self-correcting thought loop.
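The self-correcting loop described above can be pictured as a plan of verifiable steps with a repair hook. The following is a minimal sketch, not VoxKage's actual implementation: the `Step` class, `execute_plan`, `render_checklist`, and the retry limit are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    description: str
    run: Callable[[], bool]   # performs the step and returns True if verification passes
    status: str = "pending"   # pending -> done | failed
    attempts: int = 0


def execute_plan(steps: list[Step], repair: Callable[[Step], None], max_attempts: int = 3) -> bool:
    """Run steps in order; on failure, call repair() and retry up to max_attempts."""
    for step in steps:
        while step.attempts < max_attempts:
            step.attempts += 1
            if step.run():
                step.status = "done"
                break
            step.status = "failed"
            repair(step)  # hypothetical hook, e.g. research the error and apply a fix
        if step.status != "done":
            return False  # stop early: later steps may depend on this one
    return True


def render_checklist(steps: list[Step]) -> str:
    """Render the plan as a Markdown checklist."""
    marks = {"done": "x", "failed": "!", "pending": " "}
    return "\n".join(f"- [{marks[s.status]}] {s.description}" for s in steps)
```

A step that fails once is marked "failed", handed to `repair`, and retried; the plan aborts only when a step exhausts its attempts.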
The Architectural Breakdown:-
graph TD
subgraph "VoxKage Agentic Brain (Honeycomb Architecture)"
A((VoxKage Core Directive)) --> B{Agentic Reasoning Loop}
subgraph "Self-Healing & Orchestration"
B <--> C(ACE: Dynamic Planning)
B <--> D(DOM Verification & GUI Thinking)
B <--> E(Autonomous Research & Fallback)
end
end
subgraph "The Interconnected MCP Web"
C <--> |AST Skeletons| F[Codebase Index & RAG]
D <--> |Screenshot/Compute CSS| G[Playwright DOM Engine]
E <--> |Cross-Reference| H[Web Search & Download Automation]
C <--> |Native Commands| I[OS Shell & FileOps]
end
subgraph "External Control Layers"
B <--> J[Telegram Remote Bridge]
B <--> K[API Plugins: Gmail/Spotify/GitHub]
end
subgraph "Interface Hook"
L[Official Gemini CLI] --> |Frontend IO| A
end
style A fill:#0ea5e9,stroke:#fff,stroke-width:2px,color:#fff
style B fill:#8b5cf6,stroke:#fff,stroke-width:2px,color:#fff
style C fill:#10b981,stroke:#fff,stroke-width:2px,color:#fff
style D fill:#f59e0b,stroke:#fff,stroke-width:2px,color:#fff
VoxKage operates using a deeply interconnected web of capabilities. Here is how the brain actually works:
1. ACE Coding Engine & Autonomous Self-Correction
VoxKage forces a strict 5-phase developer pipeline (The Agentic Coding Engine). It does not guess.
- RAG Awareness: Indexes the codebase into a vector store before typing.
- Planning: Generates a persistent `active_plan.md` step-by-step checklist.
- AST Skeletons: Extracts 40-line structural metadata from 2000-line files, for a token reduction of about 95%.
- Self-Healing Verification: Runs compilation or DOM checks after editing. If a step fails, VoxKage automatically flags it as "failed", researches the error, fixes it, and updates the plan.
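To make the AST Skeleton idea concrete, here is a minimal sketch using Python's standard `ast` module. It is an assumption about how such a skeleton could be built, not VoxKage's actual extractor; the `skeleton` function and the `SAMPLE` source are hypothetical.

```python
import ast


def skeleton(source: str) -> str:
    """Return a compact outline: class names, function signatures, first docstring line."""
    lines: list[str] = []

    def visit(node: ast.AST, depth: int) -> None:
        for child in ast.iter_child_nodes(node):
            indent = "    " * depth
            if isinstance(child, ast.ClassDef):
                lines.append(f"{indent}class {child.name}:")
                visit(child, depth + 1)  # descend into methods
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                prefix = "async def" if isinstance(child, ast.AsyncFunctionDef) else "def"
                lines.append(f"{indent}{prefix} {child.name}({ast.unparse(child.args)}): ...")
                doc = ast.get_docstring(child)
                if doc:
                    lines.append(f"{indent}    # {doc.splitlines()[0]}")
                # Function bodies are deliberately dropped; that is where the savings come from.

    visit(ast.parse(source), 0)
    return "\n".join(lines)


SAMPLE = '''
class Cart:
    def add(self, item, qty=1):
        """Add an item to the cart."""
        self.items.append((item, qty))

def total(cart):
    return sum(q for _, q in cart.items)
'''

print(skeleton(SAMPLE))
```

The outline keeps names and signatures, which is usually enough for a model to decide which file to open in full.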
2. GUI Thinking & Deep Web Automation
VoxKage uses the entire internet as its playground. It spins up an invisible Playwright browser to:
- Take visual screenshots and perform OCR verification.
- Extract `computed CSS` to debug animations and layouts.
- Automatically navigate official software pages, find the correct `.exe` for your OS, verify it, and execute the installation.
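As an illustration of the computed-CSS step, the sketch below reads computed style values through Playwright's Python sync API and compares them with expected values. The function names, and the idea of diffing against an expected dictionary, are assumptions for illustration; they are not taken from VoxKage's code. `fetch_computed_styles` needs the `playwright` package and a Chromium install, while `diff_styles` is pure Python.

```python
from __future__ import annotations


def diff_styles(expected: dict[str, str], actual: dict[str, str]) -> dict[str, tuple[str, str | None]]:
    """Return {property: (expected, actual)} for every property that does not match."""
    return {
        prop: (want, actual.get(prop))
        for prop, want in expected.items()
        if actual.get(prop) != want
    }


def fetch_computed_styles(url: str, selector: str, props: list[str]) -> dict[str, str]:
    """Load a page in headless Chromium and read computed CSS values for one element."""
    # Imported lazily so diff_styles() works even when Playwright is not installed.
    from playwright.sync_api import sync_playwright

    js = """(el, props) => {
        const cs = window.getComputedStyle(el);
        return Object.fromEntries(props.map(p => [p, cs.getPropertyValue(p)]));
    }"""
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url)
            # The element is passed as the first JS argument, props as the second.
            return page.locator(selector).evaluate(js, props)
        finally:
            browser.close()
```

A layout check could then call `diff_styles({"display": "flex"}, fetch_computed_styles(url, "#nav", ["display"]))` and treat a non-empty result as a failed verification.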
3. The Omnipresent Bridges
You can walk away from your PC and text your VoxKage Telegram bot. Ask it: "Hey, my CI/CD pipeline failed on GitHub. Find the error log, write a patch locally on my PC, test it, and push the fix." VoxKage coordinates the Telegram API, GitHub API, local Git shell, and ACE engine to do it while you're grabbing coffee.
Core Capabilities & Engineering Specs:-
The VoxKage Advantage vs Industry Standards
| Metric | Standard AI IDEs (Cursor/Cline) | VoxKage Framework |
|---|---|---|
| Execution Scope | Imprisoned (Single Project) | |
| Token Efficiency | Reads full files (High Burn Rate) | |
| Operating Cost (OPEX) | $20/mo + API Usage Costs | |
| Model Amplification | Depends on strictly paid models | |
| Web & GUI Logic | Text Scraping / No Visuals | |
| Remote Access | Requires physical PC access | |
| Installation | Complex repo cloning + setup | |
[!TIP] Model Amplification: Because VoxKage enforces structured "Agent Thinking Loops" and reduces context payloads using AST Skeletons, it allows free-tier models (like `gemini-3-flash-preview` or `gemini-2.5-flash-lite`) to execute tasks with the accuracy and reliability typically reserved for heavy, expensive Pro models.
Getting Started: Install in 60 Seconds
VoxKage is a globally installable Python package. No cloning, no manually managed virtual environments, no setup scripts.
Prerequisites
Before installing VoxKage, ensure the following are on your system:
| Requirement | Minimum Version | Check command |
|---|---|---|
| Python | 3.10+ | python --version |
| pipx | any | pipx --version |
| Gemini CLI | any | gemini --version |
| Node.js | 18+ | node --version |
Install pipx if you don't have it:
pip install pipx
pipx ensurepath
Install Gemini CLI (the AI frontend VoxKage hijacks):
npm install -g @google/gemini-cli
gemini # Run once to authenticate with your Google account
Step 1: Install VoxKage
pipx install voxkage
That's it. VoxKage is now globally available as the voxkage command from any directory on your machine. The core install is about 80 MB and takes under a minute on a decent connection.
Step 2: Run the Setup Wizard
voxkage init
The wizard will:
- Create your `~/.voxkage` data directory (stores memory, credentials, config)
- Scaffold your `.env` secrets file for Telegram, Spotify, GitHub, Gmail
- Inject the VoxKage personality directives into your Gemini CLI settings
- Register all 18 MCP servers into Gemini CLI's `settings.json`
- Prompt you to install optional capability packs
Expected output:
✦ VoxKage v1.1.0 - First-Time Setup
VoxKage supercharges your Gemini CLI into a living OS AI.
This takes about 2 minutes.

✓ Platform: Windows
✓ Data directory: C:\Users\YourName\.voxkage
✓ MCP servers registered: 18
✓ Gemini CLI settings patched
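For readers curious what "registering MCP servers" involves, Gemini CLI reads MCP server definitions from an `mcpServers` object in its `settings.json`. The sketch below merges entries into that object without discarding existing settings. It only illustrates the idea: the helper names, the server name `voxkage-fileops`, and its launch command are hypothetical, and the wizard's real entries may differ.

```python
import json
from pathlib import Path


def register_mcp_servers(settings: dict, servers: dict[str, dict]) -> dict:
    """Return a copy of settings with servers merged under the "mcpServers" key."""
    merged = dict(settings)
    existing = dict(merged.get("mcpServers", {}))
    existing.update(servers)  # new entries win on name clashes
    merged["mcpServers"] = existing
    return merged


def patch_settings_file(path: Path, servers: dict[str, dict]) -> None:
    """Read settings.json (if present), merge the servers, and write it back."""
    current = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(register_mcp_servers(current, servers), indent=2), encoding="utf-8")


# Hypothetical entry; VoxKage's real server names and commands are not shown in this README.
EXAMPLE_SERVERS = {
    "voxkage-fileops": {"command": "python", "args": ["-m", "voxkage_fileops_server"]},
}
```

Merging rather than overwriting keeps any MCP servers and unrelated settings you configured before running the wizard.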
Step 3: Install Capability Packs (Optional but Recommended)
The core VoxKage is immediately powerful. Heavy ML packs are opt-in to keep the base install fast. Install them anytime using:
voxkage install <pack>
| Pack | What it unlocks | Size |
|---|---|---|
| `browser` | Playwright web automation, DOM inspection, screenshot analysis, PDF reading | ~80 MB pkg + ~150 MB Chromium |
| `rag` | ChromaDB semantic memory, full codebase indexing, document RAG | ~500 MB |
| `vision` | OpenCV + RapidOCR for screen reading and image analysis | ~250 MB |
| `docs_plus` | Word/PDF/Excel format conversion and document intelligence | ~80 MB |
| `full` | Everything above in one command | ~910 MB |
Install the browser engine (highly recommended, since it powers web search and automation):
voxkage install browser
Install everything at once:
voxkage install full
Step 4: Configure Your Integrations
Edit your secrets file to connect VoxKage to external services:
# Open the secrets file (Windows)
notepad C:\Users\YourName\.voxkage\.env
# macOS / Linux
nano ~/.voxkage/.env
# -- Telegram Remote Control --
TELEGRAM_BOT_TOKEN=your_token_from_@BotFather
TELEGRAM_CHAT_ID=your_personal_chat_id
# -- Spotify Music Control --
SPOTIFY_CLIENT_ID=your_client_id
SPOTIFY_CLIENT_SECRET=your_client_secret
SPOTIFY_REDIRECT_URI=http://localhost:8888/callback
# -- GitHub Integration --
GITHUB_PAT=your_personal_access_token
# -- Gmail (uses OAuth; run voxkage plugins add gmail) --
# No token needed here; handled by OAuth flow
Check your connection status at any time:
voxkage status
SYSTEM HEALTH
✓ VoxKage Core        v1.1.0
✓ MCP Servers         18 registered
CAPABILITY PACKS
✓ Core AI + OS Control (always on)
✓ RAG Memory          installed
✓ Vision & OCR        installed
✗ Browser Engine      voxkage install browser
✓ PDF Conversion      installed
INTEGRATIONS
✓ Telegram            Connected
✗ Spotify             Add SPOTIFY_CLIENT_ID to .env
✓ GitHub              Connected
✓ Gmail               Connected
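The integration lines in the status output can be understood as a check of which keys are present in `.env`. The sketch below is one plausible way to derive them, assuming a simple `KEY=value` file format. The `REQUIRED` mapping mirrors the keys listed in Step 4, while `parse_env` and `integration_status` are hypothetical helpers, not VoxKage functions.

```python
REQUIRED = {
    "Telegram": ["TELEGRAM_BOT_TOKEN", "TELEGRAM_CHAT_ID"],
    "Spotify": ["SPOTIFY_CLIENT_ID", "SPOTIFY_CLIENT_SECRET", "SPOTIFY_REDIRECT_URI"],
    "GitHub": ["GITHUB_PAT"],
}


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blanks, comments, and malformed lines."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip().strip("'\"")
    return env


def integration_status(env: dict[str, str]) -> dict[str, str]:
    """Report 'Connected' or a hint naming the first missing key per integration."""
    report = {}
    for name, keys in REQUIRED.items():
        missing = [k for k in keys if not env.get(k)]
        report[name] = "Connected" if not missing else f"Add {missing[0]} to .env"
    return report
```

This only checks that the keys are set. A real status check would presumably also validate the credentials against each service, and untouched placeholder values would pass this simple test.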
Step 5: Wake Up VoxKage
voxkage
You are now inside a fully agentic OS session. VoxKage is running with all 18 MCP tools mounted and ready.
Step 6 (Optional): System Tray + Telegram Remote Mode
Launch the persistent background daemon that puts VoxKage in your system tray and starts listening for Telegram messages:
voxkage tray
From this point, you can close the terminal. VoxKage is alive in the background. Text it from your phone via Telegram to command your PC remotely from anywhere in the world.
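Remote control of this kind is commonly built on the Telegram Bot API's long-polling `getUpdates` method. The sketch below shows that pattern with only the standard library, restricting commands to the chat ID configured in `.env`. It is an assumption about the general approach, not VoxKage's daemon code; `call`, `authorized_commands`, and `poll_once` are hypothetical names.

```python
import json
import urllib.parse
import urllib.request

API = "https://api.telegram.org/bot{token}/{method}"


def call(token: str, method: str, **params) -> dict:
    """POST a Bot API method and return the decoded JSON reply."""
    data = urllib.parse.urlencode(params).encode()
    url = API.format(token=token, method=method)
    with urllib.request.urlopen(url, data=data, timeout=40) as resp:
        return json.load(resp)


def authorized_commands(updates: list[dict], chat_id: str) -> list[str]:
    """Keep only text messages that come from the configured chat."""
    texts = []
    for update in updates:
        message = update.get("message") or {}
        if str(message.get("chat", {}).get("id")) == str(chat_id) and "text" in message:
            texts.append(message["text"])
    return texts


def poll_once(token: str, chat_id: str, offset: int = 0) -> tuple[list[str], int]:
    """Long-poll for up to 30 s; return accepted commands and the next offset."""
    updates = call(token, "getUpdates", offset=offset, timeout=30).get("result", [])
    # Acknowledge every update, including ignored ones, by moving past the highest ID.
    next_offset = max((u["update_id"] for u in updates), default=offset - 1) + 1
    return authorized_commands(updates, chat_id), next_offset
```

Filtering on the chat ID matters here: anyone who finds the bot can message it, so only the owner's chat should be able to issue commands to the PC.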
Directory Structure
After initialization, VoxKage creates this layout:
C:\Users\YourName\.voxkage\        # Core data directory
├── .gemini\
│   ├── GEMINI.md                  # VoxKage personality & tool awareness directives
│   └── settings.json              # All 18 MCP server registrations
├── data\                          # Credentials, Gmail OAuth tokens
├── rag\                           # ChromaDB vector store (if RAG installed)
├── logs\                          # Session traces and health logs
├── .env                           # Your integration secrets
└── config.json                    # Model selection and agentic loop config
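If you want to check this layout from a script, a small `pathlib` helper is enough. The sketch below is illustrative only: the entries are copied from the listing above, `rag` is left out of the required set because it exists only when the RAG pack is installed, and both function names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Entries taken from the directory listing; "rag" is optional and therefore omitted.
REQUIRED_ENTRIES = [
    ".gemini/GEMINI.md",
    ".gemini/settings.json",
    "data",
    "logs",
    ".env",
    "config.json",
]


def data_dir(home: Path | None = None) -> Path:
    """Resolve the VoxKage data directory under the user's home on any OS."""
    return (home or Path.home()) / ".voxkage"


def missing_entries(root: Path) -> list[str]:
    """List required files or folders that do not exist under root."""
    return [rel for rel in REQUIRED_ENTRIES if not (root / rel).exists()]
```

An empty list from `missing_entries(data_dir())` suggests the wizard finished; otherwise re-running `voxkage init` is the documented way to recreate the layout.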
Updating & Upgrading VoxKage
Standard Upgrade (Recommended)
To update VoxKage to the latest release from PyPI:
pipx upgrade voxkage
Check your installed version vs the latest:
voxkage --version
pip index versions voxkage # Lists all available versions
If pipx upgrade Fails or Gets Stuck
This can happen if a previous VoxKage process (tray, watcher) is still running and has locked the Python executable. Follow this sequence:
Step 1: Kill any running VoxKage processes:
# Windows PowerShell
Get-Process -Name "pythonw","python" -ErrorAction SilentlyContinue | `
Where-Object { $_.Path -like "*pipx*voxkage*" } | `
Stop-Process -Force
Start-Sleep -Seconds 2
Step 2: Force reinstall the latest version:
pipx install voxkage --force
Step 3: If permission errors still appear (e.g., [Errno 13] Permission denied):
# Remove the broken venv and reinstall cleanly
pipx uninstall voxkage
pipx install voxkage
Step 4: Verify the upgrade worked:
voxkage --version
Pinning to a Specific Version
If you need to test or roll back to a specific version:
pipx install voxkage==1.1.0 --force
Upgrading Optional Packs After a VoxKage Upgrade
Optional capability packs (RAG, Vision, Browser) are injected into VoxKage's isolated pipx venv. After upgrading VoxKage itself, re-inject them if any are missing:
# Re-inject individual packs (using exact packages from pyproject.toml)
pipx inject voxkage playwright PyMuPDF # browser
pipx inject voxkage chromadb sentence-transformers numpy pyarrow # rag
pipx inject voxkage opencv-python rapidocr-onnxruntime # vision
pipx inject voxkage docx2pdf pdf2docx # docs_plus
# After injecting the browser pack, also install the Chromium binary:
pipx run --spec voxkage playwright install chromium
# Or simply:
voxkage install browser # the CLI handles the playwright install chromium step automatically
# Or install all packs in one shot via the VoxKage CLI
voxkage install full
Completely Uninstalling VoxKage
# Remove the package
pipx uninstall voxkage
# Optionally remove all stored data, memory, and configs
# Windows:
Remove-Item -Recurse -Force "$env:USERPROFILE\.voxkage"
# macOS / Linux:
rm -rf ~/.voxkage
Command Reference
| Command | Description |
|---|---|
| `voxkage` | Start a VoxKage agentic session |
| `voxkage init` | Run the first-time setup wizard (safe to re-run) |
| `voxkage status` | Check system health, pack status, and integration connections |
| `voxkage tray` | Launch the background system tray daemon + Telegram watcher |
| `voxkage install <pack>` | Install an optional capability pack (rag, browser, vision, docs_plus, full) |
| `voxkage plugins` | List all registered plugins and their connection state |
| `voxkage plugins add <name>` | Configure a plugin interactively (telegram, spotify, github, gmail) |
| `voxkage --version` | Print the installed version |
| `voxkage --help` | Show all available commands |
Roadmap & Future Evolutions
- Shipped: `pipx install voxkage`, a single-command global installation
- Shipped: Native tkinter Settings Dashboard (zero extra deps, instant-open from tray)
- Shipped: Core-First lean install (~80 MB) with optional heavy packs
- Shipped: Telegram Remote Control, to command your OS from your phone
- Shipped: `voxkage init` intelligence, which detects already-installed packs and skips redundant prompts
- In Progress: Finalizing the `[project.entry-points."voxkage.plugins"]` API to allow the community to publish custom plugins (e.g., Jira, AWS, Docker orchestrators) via PyPI that VoxKage automatically detects and mounts into its honeycomb.
- Planned: macOS and Linux System Tray parity.
- Planned: VoxKage Cloud Sync, for encrypted cross-device memory persistence.
Contributing
VoxKage is an open-source initiative designed to push the boundaries of local AI orchestration. If you want to contribute a new MCP server or refine the ACE logic:
- Fork the repository.
- Create your feature branch (`git checkout -b feature/AdvancedRAG`).
- Commit your changes (`git commit -m 'Implement advanced semantic search'`).
- Push to the branch (`git push origin feature/AdvancedRAG`).
- Open a Pull Request.
VoxKage
File details
Details for the file voxkage-1.1.4-py3-none-any.whl.
File metadata
- Download URL: voxkage-1.1.4-py3-none-any.whl
- Upload date:
- Size: 357.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5c3b36740fb6f1bb8769bda2aa0fcd96264b48b173d0e058a270b32562ff2f62 |
| MD5 | 3d3e2e80b17ff7af8afa4ff9ed466770 |
| BLAKE2b-256 | ce8b7fc9d400a43d75b1c320a67c4c052b094679749deab4924ebd1d10a21f4e |