Hexa - Voice Controlled AI Desktop Assistant
Hardik Agent - Voice Controlled AI Desktop Assistant
Overview
Hardik Agent is a production-grade voice-controlled AI desktop assistant that allows users to control their entire operating system using natural language voice commands. No typing required - just speak and your computer responds.
Features
- Voice controlled OS automation
- Natural language understanding using Ollama + Llama3
- Open any application by voice
- Open any website by voice
- Send WhatsApp messages by voice
- Search Google and YouTube by voice
- System monitoring (battery, CPU, RAM, disk)
- Screenshot capture by voice
- Volume control by voice
- File and folder management by voice
- Window management by voice
- Secure user authentication
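- Local AI processing works offline after setup (web-based commands such as search, websites, and WhatsApp still need a connection)
- Cross-platform: Windows, Linux, and macOS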
Tech Stack
| Component | Technology |
|---|---|
| Language | Python 3.12 |
| Voice STT | OpenAI Whisper |
| AI Brain | Ollama + Llama3.2 |
| Agent Framework | LangChain + LangGraph |
| TTS | pyttsx3 |
| GUI Automation | pyautogui |
| CLI | Typer |
| Auth | bcrypt |
| Memory | ChromaDB |
Architecture
Voice Input → Whisper STT → LangGraph Runtime → Ollama AI Brain → LangChain Tools → OS Automation → Voice Response
Installation
Requirements
- Python 3.12+
- Windows 10/11 (Linux and macOS also supported)
- Microphone
- 8GB RAM minimum
- 10GB free disk space
- Internet connection for initial setup only
Install
    pip install hardik-agent
Setup
    hardik-agent setup
This automatically:
- Detects your operating system
- Installs FFmpeg
- Installs Ollama
- Downloads Llama3 AI model
- Creates your secure account
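The first setup steps (detecting the OS and checking for FFmpeg and Ollama) could look roughly like the sketch below. The function names `detect_os` and `missing_tools` are hypothetical and are not taken from the project's `setup_manager.py`; only standard-library calls are used.

```python
import platform
import shutil


def detect_os() -> str:
    """Map platform.system() to a short OS label (hypothetical helper)."""
    names = {"Windows": "windows", "Linux": "linux", "Darwin": "macos"}
    return names.get(platform.system(), "unknown")


def missing_tools(tools: tuple[str, ...] = ("ffmpeg", "ollama")) -> list[str]:
    """Return the executables from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    print(f"Detected OS: {detect_os()}")
    print(f"Still to install: {missing_tools() or 'nothing'}")
```

A setup routine could use the returned list to decide which installers to run for the detected platform.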
Start
    hardik-agent start
Usage
Voice Commands
Open Applications
- "open chrome"
- "open spotify"
- "open whatsapp"
- "open telegram"
- "open vs code"
- "open notepad"
- "open calculator"
- "open file manager"
- "open task manager"
Open Websites
- "open youtube"
- "open netflix"
- "open chatgpt"
- "open instagram"
- "open gmail"
- "open whatsapp web"
Search
- "search python tutorials on google"
- "search lofi music on youtube"
- "search weather in Mumbai"
Send WhatsApp Message
- "send whatsapp message to John hello how are you"
- "message mom I am coming home"
System Information
- "what time is it"
- "check my battery"
- "check CPU usage"
- "check RAM usage"
- "check disk space"
- "what is my IP address"
System Control
- "take a screenshot"
- "increase volume"
- "decrease volume"
- "mute volume"
- "minimize window"
- "maximize window"
- "close window"
- "show desktop"
- "lock screen"
- "shutdown computer"
File Management
- "create a folder called projects"
- "create new folder"
Project Structure
hardik-agent/
│
├── main.py
├── requirements.txt
├── .env
│
├── cli/
│   ├── commands.py
│   └── setup_manager.py
│
├── config/
│   └── settings.py
│
├── auth/
│   ├── signup.py
│   ├── login.py
│   ├── password_manager.py
│   └── session.py
│
├── voice/
│   ├── microphone.py
│   ├── speech_to_text.py
│   └── text_to_speech.py
│
├── agent/
│   ├── brain.py
│   ├── langchain_brain.py
│   └── langgraph_brain.py
│
├── tools/
│   ├── app_tools.py
│   └── tool_registry.py
│
├── automation/
│   ├── gui_automation.py
│   ├── browser_automation.py
│   ├── system_automation.py
│   └── whatsapp.py
│
├── runtime/
│   ├── event_loop.py
│   ├── startup.py
│   └── tray.py
│
├── memory/
│   └── memory_manager.py
│
├── security/
│   ├── permissions.py
│   └── safe_execution.py
│
└── logs/
    └── agent.log
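The layout includes a `tools/tool_registry.py` module. Its contents are not shown in this README, so the following is only a sketch of one common registry pattern: a decorator that maps tool names to functions. Every name in it (`TOOLS`, `tool`, `run_tool`, `open_app`) is hypothetical.

```python
from typing import Callable

ToolFunc = Callable[[str], str]

# Name-to-function table that the agent can look tools up in.
TOOLS: dict[str, ToolFunc] = {}


def tool(name: str) -> Callable[[ToolFunc], ToolFunc]:
    """Decorator that registers a function under `name`."""
    def decorator(func: ToolFunc) -> ToolFunc:
        if name in TOOLS:
            raise ValueError(f"tool already registered: {name}")
        TOOLS[name] = func
        return func
    return decorator


@tool("open_app")
def open_app(target: str) -> str:
    # Placeholder body: a real tool would launch the application here.
    return f"Opening {target}"


def run_tool(name: str, argument: str) -> str:
    """Run a registered tool by name, or report that it is unknown."""
    func = TOOLS.get(name)
    if func is None:
        return f"Unknown tool: {name}"
    return func(argument)
```

Restricting execution to names present in the table is one way to avoid running arbitrary code chosen by the model.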
Development Phases
- Phase 1 - CLI Foundation
- Phase 2 - Authentication System
- Phase 3 - Voice Pipeline
- Phase 4 - Tool Execution
- Phase 5 - Voice Controls OS
- Phase 6 - Ollama AI Brain
- Phase 7 - Continuous Runtime
- Phase 8 - GUI Automation
- Phase 9 - LangChain + LangGraph
- Phase 10 - Memory System (coming soon)
- Phase 11 - Wake Word (coming soon)
- Phase 12 - Background Service (coming soon)
- Phase 13 - GUI Dashboard (coming soon)
- Phase 14 - PyPI Package (coming soon)
How It Works
- User speaks a command
- Whisper converts speech to text
- LangGraph manages the workflow
- Ollama AI understands the intent
- LangChain selects the right tool
- Tool executes on the operating system
- Result is spoken back to user
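The steps above can be sketched as a chain of functions, each feeding the next. Every stage here is a stub standing in for the real component (Whisper, Ollama, the tools, pyttsx3), so the sketch shows only the data flow, not the project's LangGraph implementation. All function names are assumptions.

```python
from typing import Any, Callable

Stage = Callable[[Any], Any]


def transcribe(audio: bytes) -> str:
    # Stand-in for Whisper: pretend the audio bytes are already text.
    return audio.decode("utf-8")


def understand(text: str) -> dict[str, str]:
    # Stand-in for the AI step: first word is the action, the rest its target.
    action, _, target = text.strip().lower().partition(" ")
    return {"action": action, "target": target}


def execute(intent: dict[str, str]) -> str:
    # Stand-in for tool execution: report what would have been done.
    return f"{intent['action']} -> {intent['target']}"


def speak(result: str) -> str:
    # Stand-in for text-to-speech: return the sentence instead of voicing it.
    return f"Done: {result}"


def run_pipeline(audio: bytes, stages: list[Stage]) -> Any:
    """Pass the input through each stage in order and return the final value."""
    value: Any = audio
    for stage in stages:
        value = stage(value)
    return value


STAGES: list[Stage] = [transcribe, understand, execute, speak]
print(run_pipeline(b"Open Chrome", STAGES))  # prints Done: open -> chrome
```

Keeping each stage a plain function makes it possible to swap a stub for the real component one at a time.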
Privacy
- All AI processing runs locally on your machine
- No data sent to external servers
- No API keys required
- No internet needed for AI processing after setup
- Your commands never leave your computer
Security
- Passwords stored as salted bcrypt hashes (one-way hashing, not reversible encryption)
- Local profile stored at ~/.hardik-agent/
- Dangerous commands require confirmation
- No arbitrary code execution
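The confirmation rule for dangerous commands can be illustrated with a small gate around the action. This is a sketch under assumptions: the set of dangerous commands and the `execute_safely` function are hypothetical and do not come from the project's `safe_execution.py`.

```python
from typing import Callable

# Hypothetical list; the project's actual set of dangerous commands is not documented here.
DANGEROUS_COMMANDS = {"shutdown computer", "lock screen"}


def execute_safely(
    command: str,
    action: Callable[[], str],
    confirm: Callable[[str], bool],
) -> str:
    """Run `action`, asking `confirm` first when the command is dangerous."""
    normalized = " ".join(command.lower().split())
    if normalized in DANGEROUS_COMMANDS and not confirm(normalized):
        return "Cancelled"
    return action()
```

In a voice assistant, `confirm` would typically speak a question such as "Are you sure?" and listen for a yes or no. Passing it in as a callable keeps the gate testable without a microphone.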
Future Features
- Wake word detection
- Background runtime service
- System tray application
- GUI dashboard
- Plugin system
- Memory and personalization
- Multi language support
- MCP integration
- Computer vision
- Autonomous workflows
Author
Hardik Yerne
License
MIT License
Download files
File details
Details for the file hexa_agent-1.0.0.tar.gz.
File metadata
- Download URL: hexa_agent-1.0.0.tar.gz
- Upload date:
- Size: 27.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 25d29f1a52b3195272cdf8363a9976f4355eaa9197b0bebe26fa53d7263e4dc6 |
| MD5 | a1f45dede036763f18b7b5d1ee729f8f |
| BLAKE2b-256 | a6ec500a0c76e3c736becd15153aed0bd041a1f153084bc2cb50938b5316eb06 |
File details
Details for the file hexa_agent-1.0.0-py3-none-any.whl.
File metadata
- Download URL: hexa_agent-1.0.0-py3-none-any.whl
- Upload date:
- Size: 33.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d508ee15a15631809567dc43092aeaf30c12272e9d106c89ec1b12a16ca9a2bc |
| MD5 | e0d6b0bd23646e9a8b88e5b68b1467b5 |
| BLAKE2b-256 | 9107e7d1d1712845cfcacd36c2acb5cc4e50d252eec375a3b419e8d946f6e39a |