VoxKage — OS-level Agentic AI Assistant. Autonomous · Persistent · Aware.
VoxKage
The Living Agentic OS Framework
Utilizing the Gemini CLI interface to power an untethered, system-wide AI brain.
VoxKage is a massive evolution beyond standard coding assistants. It is a living Agentic OS Framework designed to break the AI out of its IDE prison. By hijacking the Gemini CLI as its conversational frontend, VoxKage deploys a complex "honeycomb" of intertwined MCP capabilities to gain real-time, autonomous, and untethered access to the whole internet, your file system, and your operating system.
[Explore Capabilities] • [View Architecture] • [Get Started]
🧠 The Innovation:
The Imprisonment Limitation:-
Modern AI coding tools (like Claude Code, Cursor, or the base Gemini CLI) are incredibly powerful text generators, but they share a fundamental limitation: Imprisonment. They are strictly confined to the directory they are launched in.
If you ask a normal CLI assistant to "Diagnose why my locally hosted web app isn't rendering properly, cross-reference the frontend CSS with the network tab, download the correct backend dependency from the official site, and install it", it fails. It has no eyes, it can't orchestrate multi-domain research, and it can't interact with your operating system on a holistic level.
The Challenge: How do we transform a static text-generation tool into a proactive, self-healing, system-wide orchestrator without compromising security or relying on paid API credits?
The "VoxKage" OS Evolution:-
VoxKage solves this by treating the official Gemini CLI merely as a "mouthpiece" for its own highly complex brain. VoxKage is not a wrapper—it is an independent entity that mounts 18 specialized Model Context Protocol (MCP) servers into the runtime state, creating an interwoven web of tools.
VoxKage doesn't just "plug and play" a web search tool. It utilizes its honeycomb architecture to combine tools autonomously: it spins up a Playwright browser, takes a screenshot of a broken webpage, extracts the DOM computed styles, uses semantic web search to find a solution, writes a step-by-step repair plan, and executes it via the native OS shell—all in one fluid, self-correcting thought loop.
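The plan-then-execute loop described above can be sketched in a few lines. This is a minimal illustration, not VoxKage's implementation: `Step`, `run_plan`, `render_checklist`, and the `repair` callback are hypothetical names, and the real tool chain (browser, search, shell) is replaced by plain Python callables.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    """One entry in a plan (hypothetical structure, for illustration only)."""
    name: str
    action: Callable[[], None]  # raises an exception when the step fails
    status: str = "pending"     # pending -> done | failed


def run_plan(steps: List[Step],
             repair: Optional[Callable[[Step, Exception], None]] = None,
             max_attempts: int = 2) -> bool:
    """Run steps in order; on failure, flag the step, call repair, and retry."""
    for step in steps:
        for _attempt in range(max_attempts):
            try:
                step.action()
                step.status = "done"
                break
            except Exception as exc:
                step.status = "failed"
                if repair is not None:
                    repair(step, exc)  # stand-in for "research the error and fix it"
        if step.status != "done":
            return False  # stop at the first step that could not be repaired
    return True


def render_checklist(steps: List[Step]) -> str:
    """Render the plan as a Markdown checklist."""
    marks = {"done": "x", "pending": " ", "failed": "!"}
    return "\n".join(f"- [{marks[s.status]}] {s.name}" for s in steps)


# Demo: the second step fails once, gets repaired, then succeeds on retry.
state = {"dependency_installed": False}


def build_app() -> None:
    if not state["dependency_installed"]:
        raise RuntimeError("missing dependency")


def install_dependency(step: Step, exc: Exception) -> None:
    state["dependency_installed"] = True


plan = [Step("capture screenshot", lambda: None), Step("build app", build_app)]
succeeded = run_plan(plan, repair=install_dependency)
print(succeeded)
print(render_checklist(plan))
```

Capping retries with `max_attempts` keeps a step that cannot be repaired from looping forever; how VoxKage bounds its own loop is not stated in this README.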
The Architectural Breakdown:-
graph TD
subgraph "VoxKage Agentic Brain (Honeycomb Architecture)"
A((VoxKage Core Directive)) --> B{Agentic Reasoning Loop}
subgraph "Self-Healing & Orchestration"
B <--> C(ACE: Dynamic Planning)
B <--> D(DOM Verification & GUI Thinking)
B <--> E(Autonomous Research & Fallback)
end
end
subgraph "The Interconnected MCP Web"
C <--> |AST Skeletons| F[Codebase Index & RAG]
D <--> |Screenshot/Compute CSS| G[Playwright DOM Engine]
E <--> |Cross-Reference| H[Web Search & Download Automation]
C <--> |Native Commands| I[OS Shell & FileOps]
end
subgraph "External Control Layers"
B <--> J[Telegram Remote Bridge]
B <--> K[API Plugins: Gmail/Spotify/GitHub]
end
subgraph "Interface Hook"
L[Official Gemini CLI] --> |Frontend IO| A
end
style A fill:#0ea5e9,stroke:#fff,stroke-width:2px,color:#fff
style B fill:#8b5cf6,stroke:#fff,stroke-width:2px,color:#fff
style C fill:#10b981,stroke:#fff,stroke-width:2px,color:#fff
style D fill:#f59e0b,stroke:#fff,stroke-width:2px,color:#fff
VoxKage operates using a deeply interconnected web of capabilities. Here is how the brain actually works:
⚙️ 1. ACE Coding Engine & Autonomous Self-Correction
VoxKage enforces a strict 5-phase developer pipeline (The Agentic Coding Engine). It does not guess.
- RAG Awareness: Indexes the codebase into a vector store before writing any code.
- Planning: Generates a persistent `active_plan.md` step-by-step checklist.
- AST Skeletons: Extracts 40-line structural metadata from 2000-line files, cutting token usage by about 95%.
- Self-Healing Verification: Runs compilation or DOM checks after editing. If a step fails, VoxKage automatically flags it as "failed", researches the error, fixes it, and updates the plan.
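To make the AST Skeleton idea concrete, here is a small sketch that reduces Python source to class and function signatures using the standard `ast` module. It is an assumption about what such a skeleton could look like: the README does not document VoxKage's actual extractor or its output format, the function name `skeleton` is hypothetical, and this version only handles Python files.

```python
import ast


def skeleton(source: str) -> str:
    """Return a signature-only outline of Python source code."""
    lines = []

    def visit(node: ast.AST, depth: int) -> None:
        indent = "    " * depth
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                lines.append(f"{indent}class {child.name}:")
                visit(child, depth + 1)  # descend to pick up methods
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                keyword = "async def" if isinstance(child, ast.AsyncFunctionDef) else "def"
                arguments = ast.unparse(child.args)  # requires Python 3.9+
                lines.append(f"{indent}{keyword} {child.name}({arguments}): ...")

    visit(ast.parse(source), 0)
    return "\n".join(lines)


sample = '''
class Cache:
    def get(self, key: str) -> str:
        return self.store[key]

    def put(self, key: str, value: str) -> None:
        self.store[key] = value

def main(argv=None):
    return 0
'''

outline = skeleton(sample)
print(outline)
```

Function bodies are dropped entirely, which is where the savings come from: by line count, a 40-line outline of a 2000-line file is 2% of the original.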
🌐 2. GUI Thinking & Deep Web Automation
VoxKage uses the entire internet as its playground. It spins up an invisible Playwright browser to:
- Take visual screenshots and perform OCR verification.
- Extract `computed CSS` to debug animations and layouts.
- Automatically navigate official software pages, find the correct `.exe` or `.dmg` for your specific OS, verify it, and execute the installation.
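The "find the correct installer, verify it" step can be illustrated without a browser. The sketch below assumes a list of candidate file names has already been scraped from a download page. `pick_installer`, `sha256_matches`, and the extension mapping are hypothetical, and a SHA-256 comparison is only one plausible form of verification, since the README does not say how VoxKage verifies downloads.

```python
import hashlib
import platform
from typing import Iterable, Optional

# Assumed mapping from platform.system() values to the installer
# extensions named in the README (.exe for Windows, .dmg for macOS).
INSTALLER_EXTENSIONS = {
    "Windows": (".exe",),
    "Darwin": (".dmg",),
}


def pick_installer(filenames: Iterable[str],
                   system: Optional[str] = None) -> Optional[str]:
    """Return the first file name whose extension matches the given OS."""
    system = system or platform.system()
    extensions = INSTALLER_EXTENSIONS.get(system, ())
    for name in filenames:
        if name.lower().endswith(extensions):
            return name
    return None  # no known installer format for this platform


def sha256_matches(payload: bytes, expected_hex: str) -> bool:
    """Compare the SHA-256 digest of downloaded bytes with a published value."""
    return hashlib.sha256(payload).hexdigest() == expected_hex.lower()


candidates = ["tool-1.2.0.dmg", "tool-1.2.0.exe", "tool-1.2.0.tar.gz"]
print(pick_installer(candidates, system="Windows"))
print(pick_installer(candidates, system="Darwin"))
```

On platforms missing from the mapping (Linux, for example) the function returns `None` rather than guessing a format.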
🌉 3. The Omnipresent Bridges
You can walk away from your PC and text your VoxKage Telegram bot. Ask it: "Hey, my CI/CD pipeline failed on GitHub. Find the error log, write a patch locally on my PC, test it, and push the fix." VoxKage coordinates the Telegram API, GitHub API, local Git shell, and ACE engine to do it while you're grabbing coffee.
✨ Core Capabilities & Engineering Specs:-
📈 The VoxKage Advantage vs Industry Standards
| Metric | Standard AI IDEs (Cursor/Cline) | VoxKage Framework |
|---|---|---|
| Execution Scope | Imprisoned (Single Project) | |
| Token Efficiency | Reads full files (High Burn Rate) | |
| Operating Cost (OPEX) | $20/mo + API Usage Costs | |
| Model Amplification | Depends on strictly paid models | |
| Web & GUI Logic | Text Scraping / No Visuals | |
[!TIP] Model Amplification: Because VoxKage enforces structured "Agent Thinking Loops" and reduces context payloads using AST Skeletons, it allows free-tier models (like `gemini-3-flash` or `flash-lite`) to execute tasks with the accuracy and reliability typically reserved for heavy, expensive Pro models.
🛠️ Getting Started: Initialize Your Assistant
Note: VoxKage is currently preparing for its `pipx` standalone release. For now, it is installed directly from source.
1. Directory Structure Overview
VoxKage organizes its brains and user settings into a clean architectural layout:
C:\Users\YourName\.voxkage\ # The core configuration isolated from the host
├── .gemini/
│ ├── GEMINI.md # The injected "VoxKage Personality" directives
│ └── settings.json # Dynamic theme & MCP connection registry
├── data/ # Persistent RAG embeddings and memory snippets
├── logs/ # System health and execution traces
├── .env # Encrypted plugin credentials
└── config.json # Agentic loop constraints and model selection
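As a way to read the layout above, this sketch recreates the same tree of empty files and folders in a temporary directory. It is not what `voxkage init` does internally: the function name `scaffold` is hypothetical, and the real files have contents this sketch does not attempt to produce.

```python
import tempfile
from pathlib import Path

# Directory and file names copied from the layout shown above.
DIRECTORIES = (".gemini", "data", "logs")
FILES = (".gemini/GEMINI.md", ".gemini/settings.json", ".env", "config.json")


def scaffold(root: Path) -> None:
    """Create the empty directory skeleton under root."""
    for directory in DIRECTORIES:
        (root / directory).mkdir(parents=True, exist_ok=True)
    for file in FILES:
        (root / file).touch(exist_ok=True)


# Build the tree in a throwaway location so nothing in $HOME is touched.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / ".voxkage"
    scaffold(root)
    created = sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))

for entry in created:
    print(entry)
```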
2. Environment Preparation
Ensure you have Python 3.10+, Node.js (for the Gemini CLI backend), and Git installed.
# Clone the repository
git clone https://github.com/ayushdwivedi001/VoxKage.git
cd VoxKage
# Initialize the virtual environment
python -m venv venv
# Windows:
.\venv\Scripts\activate
# macOS/Linux:
source venv/bin/activate
# Install dependencies and link the package
pip install -r requirements.txt
pip install -e .
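Before running the commands above, the stated prerequisites (Python 3.10+, Node.js, Git) can be checked with a short script. This is a convenience sketch, not part of VoxKage: `check_prerequisites` is a hypothetical helper, and it assumes the executables are named `node` and `git` on your `PATH`.

```python
import shutil
import sys
from typing import Dict, Tuple


def check_prerequisites(min_python: Tuple[int, int] = (3, 10)) -> Dict[str, bool]:
    """Report whether each prerequisite appears to be available."""
    return {
        "python": sys.version_info >= min_python,
        "node": shutil.which("node") is not None,  # assumed executable name
        "git": shutil.which("git") is not None,    # assumed executable name
    }


report = check_prerequisites()
for name, available in report.items():
    print(f"{'ok' if available else 'missing':8} {name}")
```

The check only confirms that the tools can be found, not that their versions are compatible with the Gemini CLI.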
3. The Setup Wizard
Run the automated configuration wizard. This establishes your ~/.voxkage environment, scaffolds your .env secrets file, and links the system correctly.
voxkage init
Expected Terminal Output:
┌────────────────────────────────────────────────────────────┐
│ ✦ VoxKage vX.X.X — First-Time Setup │
│ ──────────────────────────────────────────────────────── │
│ VoxKage supercharges your Gemini CLI into a living OS AI. │
│ This takes about 2 minutes. │
└────────────────────────────────────────────────────────────┘
✓ Platform: Windows
✓ Data directory: C:\Users\YourName\.voxkage
...
4. Command Reference & CLI Visuals
Once initialized, transforming your terminal into an agentic OS is a single command away:
voxkage
System Management Commands:
To check system health, memory persistence, and plugin connection status:
voxkage status
Output preview:
SYSTEM HEALTH
✓ VoxKage Core v1.0.0-rc
✓ Agent Memory 12 MB active
INTEGRATIONS
✓ Telegram Connected
✓ GitHub Connected
✗ Spotify voxkage plugins add spotify
To install heavy capability payloads modularly (like Playwright browsers or Document OCR):
voxkage install <pack>
# Available packs: rag, browser, vision, docs_plus, full
To list available native integrations or configure them interactively:
voxkage plugins
voxkage plugins add <name>
(Windows Only) Launch the background system tray listener for persistent hotkey access:
voxkage tray
🗺️ Roadmap & Future Evolutions
- In Progress: Transitioning to a globally accessible `pipx install voxkage` architecture.
- In Progress: Finalizing the `[project.entry-points."voxkage.plugins"]` API to allow the open-source community to publish custom plugins (e.g., Jira, AWS, Docker orchestrators) via PyPI that VoxKage automatically detects and mounts into its honeycomb.
- In Progress: GUI automation parity for macOS and Linux System Tray modules.
🤝 Contributing
VoxKage is an open-source initiative designed to push the boundaries of local AI orchestration. If you want to contribute a new MCP server or refine the ACE logic:
- Fork the repository.
- Create your feature branch (`git checkout -b feature/AdvancedRAG`).
- Commit your changes (`git commit -m 'Implement advanced semantic search'`).
- Push to the branch (`git push origin feature/AdvancedRAG`).
- Open a Pull Request.
— VoxKage
File details
Details for the file voxkage-1.0.2-py3-none-any.whl.
File metadata
- Download URL: voxkage-1.0.2-py3-none-any.whl
- Upload date:
- Size: 347.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7dc367873b342ccc622867259e9d5867703c250fec0b93d5edd0afa08e5469c8 |
| MD5 | 69d6c2703212967d5cc016ade5842da2 |
| BLAKE2b-256 | f244c3e142ea6f3293bbe0d3b059ed96623dc6faea947511ec8a8f3120ef7a00 |