A terminal-focused AI agent with RAG and local tool capabilities
Helium Agent
Important: Voice support is still under development, so please use TEXT mode.
Helium is a local-first AI assistant with a voice pipeline, tool-calling agent loop, structured memory, RAG support and an optional web chat UI. It is designed for macOS and Apple Silicon, with local STT through MLX Whisper, wake-word detection through OpenWakeWord, TTS through Kokoro, and an LLM brain served from your own llama.cpp or Ollama-compatible local stack.
If you want to try Helium right away, go to the Docker section.
What It Can Do
- Answer everyday questions: Like any other agent, it can respond to whatever mundane queries you might have. It won't judge you.
- Tool calling: Helium can call the tools it has to perform complex operations when responding to your queries.
- Research: For queries that need in-depth knowledge and information retrieval, Helium uses its research tool to provide the most accurate response, with proper citations.
- Web Search: It can use the DuckDuckGoSearch API to get web results and, if necessary, playwright to dig deeper into complex websites, all to make sure you get the best answer.
- RAG: Currently a simple RAG pipeline is integrated where only 1 file at a time can be given to Helium, and it will respond accordingly. [Future plans to scale this]
- Bash execution: Helium can perform safe bash operations in its terminal.
- Long-term memory: It uses an in-memory sqlite database, currently session-scoped, to remember important facts.
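As a rough illustration of what session-scoped memory means, the sketch below stores facts in an in-memory SQLite database using Python's standard sqlite3 module. The SessionMemory class, the table name, and the columns are hypothetical and are not taken from Helium's memory code; the point is only that the data disappears when the process exits.

```python
import sqlite3


class SessionMemory:
    """Hypothetical session-scoped fact store (not Helium's actual API)."""

    def __init__(self) -> None:
        # ":memory:" keeps the database in RAM; it is discarded on close.
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE facts (id INTEGER PRIMARY KEY, topic TEXT, fact TEXT)"
        )

    def remember(self, topic: str, fact: str) -> None:
        # Parameterized query so user text is never interpolated into SQL.
        self._db.execute(
            "INSERT INTO facts (topic, fact) VALUES (?, ?)", (topic, fact)
        )
        self._db.commit()

    def recall(self, topic: str) -> list[str]:
        # Return facts for a topic in insertion order.
        rows = self._db.execute(
            "SELECT fact FROM facts WHERE topic = ? ORDER BY id", (topic,)
        )
        return [fact for (fact,) in rows]


memory = SessionMemory()
memory.remember("preferences", "User prefers concise responses.")
print(memory.recall("preferences"))
```

Persisting across sessions would only require swapping ":memory:" for a file path, which is one reason SQLite is a convenient starting point.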
Project Structure
Helium/
├── main.py # Voice assistant entry point
├── assistant.py # Assistant-facing orchestration helpers
├── requirements.txt # Python dependencies
├── requirements-rag.txt # Heavy RAG dependencies
├── docker-compose.yml # Terminal + RAG containers
├── Dockerfile.api # FastAPI backend image
├── Dockerfile.frontend # React frontend image
├── Dockerfile.terminal # Terminal UI image
├── api/
│ └── main.py # FastAPI WebSocket chat API
├── config/
│ ├── settings.py # Typed defaults and settings loader
│ └── settings.toml # Local service, wake, speech, and assistant settings
├── core/
│ ├── llm.py # LLM response generation and tool loop
│ └── orchestrator.py # Assistant orchestration layer
├── engine/
│ ├── stt.py # Speech-to-text handling
│ ├── tts.py # Text-to-speech handling
│ └── wake_word.py # Wake-word detection
├── frontend/
│ ├── src/ # React chat interface
│ ├── nginx/ # Static app server config
│ └── package.json # Vite scripts and frontend dependencies
├── memory/
│ └── graph.py # Local memory graph support
├── rag_service/ # standalone document intelligence FastAPI service
├── tools/
│ ├── registry.py # Tool definitions and prompt context
│ ├── file_ops.py # File creation tool
│ ├── memory_ops.py # Memory tools
│ ├── system_ops.py # System tools
│ ├── web_search.py # Web-search tool entry point
│ ├── search/ # Search providers, planning, ranking, fetching, extraction
│ └── research/ # Research planner, models, pipeline, execution
├── utils/
│ ├── audio.py # macOS sound cues
│ ├── health.py # Service health checks
│ ├── history.py # Command/conversation history helpers
│ └── parser.py # Robust JSON/tool-call parsing
├── tests/ # Unit tests for parser, tools, search, memory, and wake word logic
└── .env.example # Example env file
Prerequisites
Helium is optimized for macOS on Apple Silicon because the voice pipeline uses mlx-whisper and macOS audio cues. Some server-only pieces can run in containers, but microphone capture and local audio playback are best run directly on macOS.
You will need:
- Python 3.11+
- A working microphone with terminal/app permission [Not needed currently]
- PortAudio dependencies for pyaudio and sounddevice [Not needed currently]
- A local LLM service, usually llama.cpp
- If you have an API endpoint to any LLM you can use that too.
- Optional local SearxNG for local-first web search [No longer needed]
- Node/Bun only if you are developing the frontend outside Docker
Default service URLs are configured in config/settings.py and can be overridden in config/settings.toml.
Docker
Use this if you just want to chat without worrying about the technical complexities, but make sure your env is configured accordingly. It will take care of the RAG pipeline automatically.
You can build and run the entire terminal application using:
docker compose up --build
You might need to wait for a bit. So, go have a coffee while it is building.
To run the image:
docker compose run --rm --service-ports helium
The API container is configured to reach host services through host.docker.internal. Keep the llama.cpp instance running on the host, and update docker-compose.yml if your ports differ.
Dev Installation
IMPORTANT: Use this only if you want to run it manually; otherwise go to the Docker section.
- Clone the repository:
git clone <repository-url>
cd helium-agent
- Create and activate a virtual environment:
python -m venv .venv
source .venv/bin/activate
- Install Python dependencies:
pip install -r requirements.txt
pip install -r requirements-rag.txt
- Run the doctor command to check the RAG setup:
python -m rag_service doctor
If audio dependencies fail to build, install PortAudio first, then rerun the Python dependency install.
Local Services
Note: You can either use llama.cpp or any LLM provider API.
Start llama.cpp
Run a compatible instruction-tuned GGUF model on port 3000:
./llama-server -m /path/to/your/model.gguf -c 4096 --port 3000
Helium expects the default completion endpoint to be the OpenAI-compatible one:
http://127.0.0.1:3000/v1/chat/completions
Use LLM API
If you have API access to an LLM provider, you can use it directly by adding the API key to a .env file in the project directory.
LLM_API_KEY=your-llm-url-llm-api-key
LLM_API_URL=your-llm-url
Look at .env.example for more detail.
Start Playwright
Helium comes with playwright support. If you want more in-depth results from the web, turn this feature on by setting use_playwright=true in config/settings.toml.
Then install playwright and chromium:
pip install playwright
playwright install chromium
These are not included in requirements.txt because Helium aims to be lightweight. But you can do whatever you want!
Note: Playwright is heavy because it downloads chromium, so it can take up a fair amount of memory. Use with caution.
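Because playwright is an optional install, code that depends on it usually guards the import. The sketch below shows one common way to do that with importlib; the function names are hypothetical and this is not necessarily how Helium checks the use_playwright setting. It only detects the Python package, not whether the chromium download has completed.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


def playwright_enabled(use_playwright: bool) -> bool:
    # Honor the setting only when the optional package is actually present.
    return use_playwright and is_installed("playwright")


print(playwright_enabled(True))
```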
Start RAG pipeline
Helium comes with its own RAG pipeline. To attach a file, prefix its path with @ in your message. Then you can ask anything about that file.
Currently it is good enough to answer what is inside the file, summarize it, and handle other basic questions. Later I intend to deepen its understanding of the file using local embeddings.
This is an optional feature. See the rag_service directory for more detail.
Run The Assistant
Note: Only TEXT mode is ready for use.
- Confirm the LLM service is running.
- Confirm your web services are running if you want better results.
- Start Helium:
python main.py --mode text
- Wait for the animation to load and the welcome message to be shown.
- Type your query and enjoy Helium.
Example requests:
What is the latest news on AI?
Remember that I prefer concise responses.
Create a file named hello.txt that says hi.
Open Safari.
Compare India and China GDP in 2025.
Why is the Indian Rupee falling recently?
Give me a report on the latest AI regulation changes in the EU.
RAG request example:
@README.md what does this project do?
@docs/plan.pdf summarize the risks
Run The Web UI
The web UI has two parts:
- FastAPI backend: WebSocket endpoint at ws://localhost:8080/ws/chat
- React/Vite frontend: browser chat interface under frontend/
Backend
uvicorn api.main:app --host 0.0.0.0 --port 8080
Frontend
cd frontend
bun install
bun run dev
The frontend opens a WebSocket to port 8080, so keep the API running while using the browser UI.
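If the browser UI cannot connect, a quick way to confirm that something is listening on port 8080 is a plain TCP check. The helper below is a generic sketch using the standard socket module, not part of Helium; it only tells you that the port accepts connections, not that the WebSocket route itself is healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


print(port_is_open("localhost", 8080))
```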
Configuration
Most runtime behavior lives in config/settings.toml:
- services.llama_cpp_url
- wake_word.threshold
- wake_word.push_to_talk
- speech.whisper_model
- speech.timeout_seconds
- speech.follow_up_timeout_seconds
- assistant.tts_voice
- assistant.follow_up_mode
- assistant.confirm_risky_tools
- assistant.persona
When a key is missing, Helium falls back to defaults in config/settings.py.
Testing
Run the test suite from the repository root:
python -m unittest discover -s tests
For the frontend:
cd frontend
bun run lint
bun run build
Troubleshooting
- No wake detection: Check microphone permissions, input device selection, and wake_word.threshold.
- False wakes: Increase wake_word.threshold or wake_word.required_hits.
- No transcription: Confirm mlx-whisper, microphone access, and PortAudio dependencies are working.
- No LLM response: Confirm the llama.cpp completion endpoint matches services.llama_cpp_url.
- Search is weak or failing: Start SearxNG or verify services.searxng_url; DDGS fallback may be less consistent.
- Web UI cannot connect: Make sure the FastAPI backend is running on port 8080.
- Tool call JSON errors: Check terminal logs. utils/parser.py includes recovery logic, but malformed model output can still skip a tool step.