A terminal-focused AI agent with RAG and local tool capabilities

Helium Agent

Important: Voice support is still under development, so please use TEXT mode.

Helium is a local-first AI assistant with a voice pipeline, tool-calling agent loop, structured memory, RAG support and an optional web chat UI. It is designed for macOS and Apple Silicon, with local STT through MLX Whisper, wake-word detection through OpenWakeWord, TTS through Kokoro, and an LLM brain served from your own llama.cpp or Ollama-compatible local stack.

If you just want to try Helium, skip to the Docker section.

What It Can Do

Answer everyday questions: Like any other agent, it can respond to whatever mundane queries you might have. It won't judge you.
Tool calling: Helium can call its tools to perform complex operations when responding to your queries.
Research: For queries that need in-depth knowledge and information retrieval, Helium uses its research tool to give the most accurate response it can, with proper citations.
Web Search: It can use the DuckDuckGoSearch API to get web results and, if necessary, Playwright to dig deeper into complex websites, so that you get the best answer it can find.
RAG: A simple RAG pipeline is currently integrated. Helium accepts only one file at a time and answers questions about it. [Future plans to scale this]
Bash execution: Helium can perform safe bash operations in its terminal.
Long-term memory: It uses an in-memory SQLite database, currently session-scoped, to remember important facts.

Project Structure

Helium/
├── main.py                 # Voice assistant entry point
├── assistant.py            # Assistant-facing orchestration helpers
├── requirements.txt        # Python dependencies
├── requirements-rag.txt    # Heavy RAG dependencies
├── docker-compose.yml      # Terminal + RAG containers
├── Dockerfile.api          # FastAPI backend image
├── Dockerfile.frontend     # React frontend image
├── Dockerfile.terminal     # Terminal UI image
├── api/
│   └── main.py             # FastAPI WebSocket chat API
├── config/
│   ├── settings.py         # Typed defaults and settings loader
│   └── settings.toml       # Local service, wake, speech, and assistant settings
├── core/
│   ├── llm.py              # LLM response generation and tool loop
│   └── orchestrator.py     # Assistant orchestration layer
├── engine/
│   ├── stt.py              # Speech-to-text handling
│   ├── tts.py              # Text-to-speech handling
│   └── wake_word.py        # Wake-word detection
├── frontend/
│   ├── src/                # React chat interface
│   ├── nginx/              # Static app server config
│   └── package.json        # Vite scripts and frontend dependencies
├── memory/
│   └── graph.py            # Local memory graph support
├── rag_service/            # standalone document intelligence FastAPI service
├── tools/
│   ├── registry.py         # Tool definitions and prompt context
│   ├── file_ops.py         # File creation tool
│   ├── memory_ops.py       # Memory tools
│   ├── system_ops.py       # System tools
│   ├── web_search.py       # Web-search tool entry point
│   ├── search/             # Search providers, planning, ranking, fetching, extraction
│   └── research/           # Research planner, models, pipeline, execution
├── utils/
│   ├── audio.py            # macOS sound cues
│   ├── health.py           # Service health checks
│   ├── history.py          # Command/conversation history helpers
│   └── parser.py           # Robust JSON/tool-call parsing
├── tests/                  # Unit tests for parser, tools, search, memory, and wake word logic
└── .env.example            # Example env file

Prerequisites

Helium is optimized for macOS on Apple Silicon because the voice pipeline uses mlx-whisper and macOS audio cues. Some server-only pieces can run in containers, but microphone capture and local audio playback are best run directly on macOS.

You will need:

Python 3.11+
A working microphone with terminal/app permission [Not needed currently]
PortAudio dependencies for pyaudio and sounddevice [Not needed currently]
A local LLM service, usually llama.cpp
If you have an API endpoint for any LLM, you can use that instead.
Optional local SearxNG for local-first web search [No longer needed]
Node/Bun only if you are developing the frontend outside Docker

Default service URLs are configured in config/settings.py and can be overridden in config/settings.toml.

Docker

Use this if you just want to chat without worrying about the technical details, but make sure your env is configured accordingly.

It takes care of the RAG pipeline automatically.

You can build and run the entire terminal application using:

docker compose up --build

The build may take a while, so go have a coffee while it runs.

This will run the image:

docker compose run --rm --service-ports helium

The API container is configured to reach host services through host.docker.internal. Keep the llama.cpp instance running on the host, and update docker-compose.yml if your ports differ.
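
If the container cannot reach the model server, a quick TCP check can tell you whether the host port is open at all. The sketch below is not Helium's own health check (that lives in utils/health.py); the function name is hypothetical and it only tests that something is listening.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False
```

Inside the container you might call is_reachable("host.docker.internal", 3000), assuming llama.cpp is on port 3000 as described in this README. A True result only means the port is open, not that the model is loaded.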

Dev Installation

IMPORTANT: Use this only if you want to run Helium manually; otherwise, go to the Docker section.

Clone the repository:

git clone <repository-url>
cd helium-agent

Create and activate a virtual environment:

python -m venv .venv
source .venv/bin/activate

Install Python dependencies:

pip install -r requirements.txt
pip install -r requirements-rag.txt

Run the doctor command to check the RAG setup:
```
python -m rag_service doctor
```

If audio dependencies fail to build, install PortAudio first, then rerun the Python dependency install.

Local Services

Note: You can use either llama.cpp or any LLM provider API.

Start llama.cpp

Run a compatible instruction-tuned GGUF model on port 3000:

./llama-server -m /path/to/your/model.gguf -c 4096 --port 3000

By default, Helium expects an OpenAI-compatible chat completion endpoint:

http://127.0.0.1:3000/v1/chat/completions

Use LLM API

If you have API access to any LLM provider, you can use it directly by adding the API key and URL to a .env file in the project directory.

LLM_API_KEY=your-llm-url-llm-api-key
LLM_API_URL=your-llm-url

Look at .env.example for more detail.

Start Playwright

Helium supports Playwright. If you want more in-depth results from the web, turn this feature on by setting use_playwright=true in config/settings.toml.

Then install Playwright and Chromium:

pip install playwright
playwright install chromium

These are not included in requirements.txt because Helium aims to be lightweight. But you can do whatever you want!

Note: Playwright is heavy because it downloads Chromium, so it can use a noticeable amount of disk space and memory. Use with caution.

Start RAG pipeline

Helium comes with its own RAG pipeline. To attach a file, prefix its path with @. You can then ask anything about that file.

Currently it is good enough to answer what is inside the file, summarize it, and handle other basic questions. Later I intend to deepen its understanding of the file using local embeddings.

This is an optional feature. See the rag_service directory for more detail.

Run The Assistant

Note: Only TEXT mode is ready for use.

Confirm the LLM service is running.
Confirm your web services are running if you want better results.
Start Helium:
```
python main.py --mode text
```
Wait for the animation to load and the welcome message to appear.

Type your query and enjoy Helium.

Example requests:

What is the latest news on AI?
Remember that I prefer concise responses.
Create a file named hello.txt that says hi.
Open Safari.
Compare India and China GDP in 2025.
Why is the Indian Rupee falling recently?
Give me a report on the latest AI regulation changes in the EU.

RAG request example:

@README.md what does this project do?
@docs/plan.pdf summarize the risks

Run The Web UI

The web UI has two parts:

FastAPI backend: WebSocket endpoint at ws://localhost:8080/ws/chat
React/Vite frontend: browser chat interface under frontend/

Backend

uvicorn api.main:app --host 0.0.0.0 --port 8080

Frontend

cd frontend
bun install
bun run dev

The frontend opens a WebSocket to port 8080, so keep the API running while using the browser UI.

Configuration

Most runtime behavior lives in config/settings.toml:

services.llama_cpp_url
wake_word.threshold
wake_word.push_to_talk
speech.whisper_model
speech.timeout_seconds
speech.follow_up_timeout_seconds
assistant.tts_voice
assistant.follow_up_mode
assistant.confirm_risky_tools
assistant.persona

When a key is missing, Helium falls back to defaults in config/settings.py.

Testing

Run the test suite from the repository root:

python -m unittest discover -s tests

For the frontend:

cd frontend
bun run lint
bun run build

Troubleshooting

No wake detection: Check microphone permissions, input device selection, and wake_word.threshold.
False wakes: Increase wake_word.threshold or wake_word.required_hits.
No transcription: Confirm mlx-whisper, microphone access, and PortAudio dependencies are working.
No LLM response: Confirm the llama.cpp completion endpoint matches services.llama_cpp_url.
Search is weak or failing: Start SearxNG or verify services.searxng_url; DDGS fallback may be less consistent.
Web UI cannot connect: Make sure the FastAPI backend is running on port 8080.
Tool call JSON errors: Check terminal logs. utils/parser.py includes recovery logic, but malformed model output can still skip a tool step.

Project details

Release history

0.1.3

May 28, 2026

0.1.2

May 28, 2026

0.1.1

May 28, 2026

This version

0.1.0

May 28, 2026

Download files

Source Distribution

helium_agent-0.1.0.tar.gz (80.9 kB)

Uploaded May 28, 2026 Source

Built Distribution

helium_agent-0.1.0-py3-none-any.whl (67.9 kB)

Uploaded May 28, 2026 Python 3

File details

Details for the file helium_agent-0.1.0.tar.gz.

File metadata

Download URL: helium_agent-0.1.0.tar.gz
Upload date: May 28, 2026
Size: 80.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for helium_agent-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`aee9a4892e239547917b66ebe5d724c5255ae9ea8eb98572d8f553c86f5d21ea`
MD5	`755c21f4d28875d6bca87fdb417ddcdb`
BLAKE2b-256	`e96b0e0b3172032a1feef11afc34ae024efcea737663adb28eb1ee1ac8a1c5a8`

File details

Details for the file helium_agent-0.1.0-py3-none-any.whl.

File metadata

Download URL: helium_agent-0.1.0-py3-none-any.whl
Upload date: May 28, 2026
Size: 67.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for helium_agent-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5280fd4e09a761d566b0d6e5e90640b5759830c25e8a90cebd74259cefe212a9`
MD5	`85b23ec32494c8535fed82a1afc1ea2a`
BLAKE2b-256	`b99cdba08ebcfdc046d791408672a38d6d49dd7371d1e84131a508d4f8554d0f`
