Hexa - Voice Controlled AI Desktop Assistant

Hardik Agent - Voice Controlled AI Desktop Assistant

Overview

Hardik Agent is a production-grade, voice-controlled AI desktop assistant that lets users control their operating system with natural-language voice commands. No typing is required: speak a command and the computer carries it out.

Features

  • Voice-controlled OS automation
  • Natural language understanding using Ollama + Llama3
  • Open any application by voice
  • Open any website by voice
  • Send WhatsApp messages by voice
  • Search Google and YouTube by voice
  • System monitoring (battery, CPU, RAM, disk)
  • Screenshot capture by voice
  • Volume control by voice
  • File and folder management by voice
  • Window management by voice
  • Secure user authentication
  • AI processing works offline after setup (commands that open websites, search, or send messages still need a connection)
  • Cross-platform: Windows, Linux, macOS

Tech Stack

Component | Technology
Language | Python 3.12
Voice STT | OpenAI Whisper
AI Brain | Ollama + Llama3.2
Agent Framework | LangChain + LangGraph
TTS | pyttsx3
GUI Automation | pyautogui
CLI | Typer
Auth | bcrypt
Memory | ChromaDB

Architecture

Voice Input
    ↓
Whisper STT
    ↓
LangGraph Runtime
    ↓
Ollama AI Brain
    ↓
LangChain Tools
    ↓
OS Automation
    ↓
Voice Response
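
The flow above can be read as a chain of stages, each consuming the previous stage's output. The following is a minimal sketch of that idea only, not the project's actual code: every function name here is hypothetical, and each body is a placeholder standing in for the real component (Whisper, LangGraph and Ollama, LangChain tools, pyttsx3).

```python
from typing import Callable

# Each stage is a plain function from str to str. All bodies are placeholders.
Stage = Callable[[str], str]


def speech_to_text(audio: str) -> str:
    # Stand-in for Whisper STT: the "audio" here is already a transcript.
    return audio.strip().lower()


def decide_action(text: str) -> str:
    # Stand-in for the LangGraph runtime and the Ollama model.
    return f"action:{text}"


def run_tool(action: str) -> str:
    # Stand-in for a LangChain tool performing OS automation.
    return f"done {action.removeprefix('action:')}"


def speak(result: str) -> str:
    # Stand-in for text-to-speech: returns the sentence that would be spoken.
    return result.capitalize()


PIPELINE: list[Stage] = [speech_to_text, decide_action, run_tool, speak]


def handle(audio: str) -> str:
    """Pass the input through every stage in order."""
    value = audio
    for stage in PIPELINE:
        value = stage(value)
    return value
```

Keeping the stages in a list makes the order explicit and lets a single stage be swapped out, for example when testing without a microphone.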

Installation

Requirements

  • Python 3.12+
  • Windows 10/11 (Linux and macOS supported)
  • Microphone
  • 8GB RAM minimum
  • 10GB free disk space
  • Internet connection for initial setup only

Install

    pip install hardik-agent

Setup

    hardik-agent setup

This automatically:

  • Detects your operating system
  • Installs FFmpeg
  • Installs Ollama
  • Downloads the Llama3 AI model
  • Creates your secure account

Start

    hardik-agent start

Usage

Voice Commands

Open Applications

  • "open chrome"
  • "open spotify"
  • "open whatsapp"
  • "open telegram"
  • "open vs code"
  • "open notepad"
  • "open calculator"
  • "open file manager"
  • "open task manager"

Open Websites

  • "open youtube"
  • "open netflix"
  • "open chatgpt"
  • "open instagram"
  • "open gmail"
  • "open whatsapp web"

Search

  • "search python tutorials on google"
  • "search lofi music on youtube"
  • "search weather in Mumbai"

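One way a search command could be turned into a browser action is sketched below. This is an illustration under stated assumptions, not the package's implementation: the function names and the rule "default to Google when no engine is named" are hypothetical, while `urllib.parse.quote_plus` and `webbrowser.open` are standard-library calls.

```python
import webbrowser
from urllib.parse import quote_plus

# Public search URL patterns for the two engines named in the examples.
SEARCH_URLS = {
    "google": "https://www.google.com/search?q={query}",
    "youtube": "https://www.youtube.com/results?search_query={query}",
}


def build_search_url(command: str, default_engine: str = "google") -> str:
    """Turn 'search <query> on <engine>' into a search URL."""
    text = command.strip()
    if text.lower().startswith("search "):
        text = text[len("search "):]
    engine = default_engine
    for name in SEARCH_URLS:
        suffix = f" on {name}"
        if text.lower().endswith(suffix):
            engine = name
            text = text[: -len(suffix)]
            break
    return SEARCH_URLS[engine].format(query=quote_plus(text.strip()))


def run_search(command: str) -> bool:
    # Opens the URL in the default browser; returns True if a browser launched.
    return webbrowser.open(build_search_url(command))
```

Separating URL construction from the browser call keeps the parsing logic testable without opening a window.
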
Send WhatsApp Message

  • "send whatsapp message to John hello how are you"
  • "message mom I am coming home"

System Information

  • "what time is it"
  • "check my battery"
  • "check CPU usage"
  • "check RAM usage"
  • "check disk space"
  • "what is my IP address"

System Control

  • "take a screenshot"
  • "increase volume"
  • "decrease volume"
  • "mute volume"
  • "minimize window"
  • "maximize window"
  • "close window"
  • "show desktop"
  • "lock screen"
  • "shutdown computer"

File Management

  • "create a folder called projects"
  • "create new folder"

Project Structure

hardik-agent/
│
├── main.py
├── requirements.txt
├── .env
│
├── cli/
│   ├── commands.py
│   └── setup_manager.py
│
├── config/
│   └── settings.py
│
├── auth/
│   ├── signup.py
│   ├── login.py
│   ├── password_manager.py
│   └── session.py
│
├── voice/
│   ├── microphone.py
│   ├── speech_to_text.py
│   └── text_to_speech.py
│
├── agent/
│   ├── brain.py
│   ├── langchain_brain.py
│   └── langgraph_brain.py
│
├── tools/
│   ├── app_tools.py
│   └── tool_registry.py
│
├── automation/
│   ├── gui_automation.py
│   ├── browser_automation.py
│   ├── system_automation.py
│   └── whatsapp.py
│
├── runtime/
│   ├── event_loop.py
│   ├── startup.py
│   └── tray.py
│
├── memory/
│   └── memory_manager.py
│
├── security/
│   ├── permissions.py
│   └── safe_execution.py
│
└── logs/
    └── agent.log

Development Phases

  • Phase 1 - CLI Foundation
  • Phase 2 - Authentication System
  • Phase 3 - Voice Pipeline
  • Phase 4 - Tool Execution
  • Phase 5 - Voice Controls OS
  • Phase 6 - Ollama AI Brain
  • Phase 7 - Continuous Runtime
  • Phase 8 - GUI Automation
  • Phase 9 - LangChain + LangGraph
  • Phase 10 - Memory System (coming soon)
  • Phase 11 - Wake Word (coming soon)
  • Phase 12 - Background Service (coming soon)
  • Phase 13 - GUI Dashboard (coming soon)
  • Phase 14 - PyPI Package (coming soon)

How It Works

  1. User speaks a command
  2. Whisper converts speech to text
  3. LangGraph manages the workflow
  4. Ollama AI understands the intent
  5. LangChain selects the right tool
  6. Tool executes on the operating system
  7. The result is spoken back to the user
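
Steps 4 to 6 amount to mapping recognized text to a named tool and calling it. The sketch below illustrates that dispatch with a plain dictionary registry. It is not the LangChain or LangGraph API: the `tool` decorator, the tool names, and the keyword rules in `understand` (a stand-in for the language model) are all hypothetical.

```python
from typing import Callable

# Registry mapping an intent name to the function that carries it out.
TOOLS: dict[str, Callable[[str], str]] = {}


def tool(name: str):
    """Decorator that registers a function under an intent name."""
    def register(func: Callable[[str], str]) -> Callable[[str], str]:
        TOOLS[name] = func
        return func
    return register


@tool("open_app")
def open_app(argument: str) -> str:
    # Placeholder: a real tool would launch the application here.
    return f"Opening {argument}"


@tool("take_screenshot")
def take_screenshot(argument: str) -> str:
    # Placeholder: a real tool would capture and save the screen here.
    return "Screenshot saved"


def understand(text: str) -> tuple[str, str]:
    # Stand-in for step 4: simple keyword rules instead of a language model.
    text = text.lower().strip()
    if text.startswith("open "):
        return "open_app", text[len("open "):]
    if "screenshot" in text:
        return "take_screenshot", ""
    return "unknown", text


def handle_command(text: str) -> str:
    intent, argument = understand(text)
    handler = TOOLS.get(intent)
    if handler is None:
        return "Sorry, I did not understand that"
    return handler(argument)
```

The string returned by `handle_command` is what step 7 would pass to text-to-speech.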

Privacy

  • All AI processing runs locally on your machine
  • No data sent to external servers
  • No API keys required
  • No internet needed for AI processing after setup
  • Your commands never leave your computer

Security

  • Passwords stored as salted bcrypt hashes
  • Local profile stored at ~/.hardik-agent/
  • Dangerous commands require confirmation
  • No arbitrary code execution
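
The confirmation rule for dangerous commands can be pictured as a small gate in front of the executor. The sketch below is an assumption about how such a gate might look, not the contents of `safe_execution.py`: the keyword list and both function names are hypothetical.

```python
from typing import Callable

# Hypothetical list of words that mark a command as dangerous.
DANGEROUS_KEYWORDS = ("shutdown", "restart", "delete", "format")


def is_dangerous(command: str) -> bool:
    """True if any dangerous keyword appears as a whole word."""
    words = command.lower().split()
    return any(keyword in words for keyword in DANGEROUS_KEYWORDS)


def execute_safely(
    command: str,
    run: Callable[[str], str],
    confirm: Callable[[str], bool],
) -> str:
    # Dangerous commands run only when the confirm callback returns True.
    if is_dangerous(command) and not confirm(command):
        return "Cancelled"
    return run(command)
```

Passing `confirm` in as a callback means the same gate works whether confirmation comes from a spoken "yes" or a typed prompt.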

Future Features

  • Wake word detection
  • Background runtime service
  • System tray application
  • GUI dashboard
  • Plugin system
  • Memory and personalization
  • Multi-language support
  • MCP integration
  • Computer vision
  • Autonomous workflows

Author

Hardik Yerne

License

MIT License
