
A terminal-focused AI agent with RAG and local tool capabilities



Helium Agent

Important: Voice support is still under development, so please use TEXT mode.

Helium is a local-first AI assistant with a voice pipeline, a tool-calling agent loop, structured memory, RAG support, and an optional web chat UI. It is designed for macOS and Apple Silicon, with local STT through MLX Whisper, wake-word detection through OpenWakeWord, TTS through Kokoro, and an LLM brain served from your own llama.cpp or Ollama-compatible local stack.

If you just want to try it out, go to the Docker section.

What It Can Do

  • Answer everyday questions: Like any other assistant, it can respond to the mundane queries you might have. It won't judge you.
  • Tool calling: Helium can call its tools to perform complex operations when answering your queries.
  • Research: For queries that require in-depth knowledge and information retrieval, Helium uses its research tool to provide the most accurate response, with proper citations.
  • Web search: It uses the DuckDuckGo search API to get web results and, when necessary (and if enabled), Playwright to dig deeper into complex websites, all to make sure you get the best answer.
  • RAG: A simple RAG pipeline is currently integrated. You can give Helium one file at a time, and it will answer questions about it. (Scaling this is planned.)
  • Bash execution: Helium can perform safe bash operations in its terminal.
  • Long-term memory: It uses an in-memory SQLite database, currently session-scoped, to remember important facts.

Project Structure

Helium/
├── main.py                 # Voice assistant entry point
├── assistant.py            # Assistant-facing orchestration helpers
├── requirements.txt        # Python dependencies
├── requirements-rag.txt    # Heavy RAG dependencies
├── docker-compose.yml      # Terminal + RAG containers
├── Dockerfile.api          # FastAPI backend image
├── Dockerfile.frontend     # React frontend image
├── Dockerfile.terminal     # Terminal UI image
├── api/
│   └── main.py             # FastAPI WebSocket chat API
├── config/
│   ├── settings.py         # Typed defaults and settings loader
│   └── settings.toml       # Local service, wake, speech, and assistant settings
├── core/
│   ├── llm.py              # LLM response generation and tool loop
│   └── orchestrator.py     # Assistant orchestration layer
├── engine/
│   ├── stt.py              # Speech-to-text handling
│   ├── tts.py              # Text-to-speech handling
│   └── wake_word.py        # Wake-word detection
├── frontend/
│   ├── src/                # React chat interface
│   ├── nginx/              # Static app server config
│   └── package.json        # Vite scripts and frontend dependencies
├── memory/
│   └── graph.py            # Local memory graph support
├── rag_service/            # Standalone document-intelligence FastAPI service
├── tools/
│   ├── registry.py         # Tool definitions and prompt context
│   ├── file_ops.py         # File creation tool
│   ├── memory_ops.py       # Memory tools
│   ├── system_ops.py       # System tools
│   ├── web_search.py       # Web-search tool entry point
│   ├── search/             # Search providers, planning, ranking, fetching, extraction
│   └── research/           # Research planner, models, pipeline, execution
├── utils/
│   ├── audio.py            # macOS sound cues
│   ├── health.py           # Service health checks
│   ├── history.py          # Command/conversation history helpers
│   └── parser.py           # Robust JSON/tool-call parsing
├── tests/                  # Unit tests for parser, tools, search, memory, and wake word logic
└── .env.example            # Example env file
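tools/registry.py holds the tool definitions and the prompt context given to the model. As a minimal sketch, assuming tool calls arrive as JSON shaped like {"tool": ..., "args": {...}}, a registry that both renders prompt context and dispatches calls might look like this. The Tool class, the register decorator, dispatch, and the dry-run create_file tool are hypothetical names, not Helium's API.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Tool:
    name: str
    description: str
    func: Callable[..., str]


# Hypothetical registry mapping tool names to their definitions.
REGISTRY: dict[str, Tool] = {}


def register(name: str, description: str) -> Callable[[Callable[..., str]], Callable[..., str]]:
    """Decorator that adds a function to REGISTRY under the given name."""
    def decorator(func: Callable[..., str]) -> Callable[..., str]:
        REGISTRY[name] = Tool(name, description, func)
        return func
    return decorator


def prompt_context() -> str:
    """Render one line per tool so the LLM knows what it may call."""
    return "\n".join(f"- {t.name}: {t.description}" for t in REGISTRY.values())


def dispatch(tool_call_json: str) -> str:
    """Look up the requested tool and call it with the given arguments."""
    call: dict[str, Any] = json.loads(tool_call_json)
    tool = REGISTRY.get(call.get("tool", ""))
    if tool is None:
        return f"error: unknown tool {call.get('tool')!r}"
    return tool.func(**call.get("args", {}))


@register("create_file", "Create a text file with the given content.")
def create_file(path: str, content: str) -> str:
    # Dry run for illustration; a real tool would validate the path and write the file.
    return f"would write {len(content)} characters to {path}"
```

Returning an error string for unknown tools, instead of raising, lets the agent loop feed the message back to the model so it can correct itself.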

Prerequisites

Helium is optimized for macOS on Apple Silicon because the voice pipeline uses mlx-whisper and macOS audio cues. Some server-only pieces can run in containers, but microphone capture and local audio playback are best run directly on macOS.

You will need:

  • Python 3.11+
  • A working microphone with terminal/app permission (not currently needed)
  • PortAudio dependencies for pyaudio and sounddevice (not currently needed)
  • A local LLM service, usually llama.cpp, or an API endpoint for any LLM provider
  • Optional: a local SearxNG instance for local-first web search (no longer required)
  • Node/Bun, only if you are developing the frontend outside Docker

Default service URLs are configured in config/settings.py and can be overridden in config/settings.toml.

Docker

Use this if you just want to chat without worrying about the technical details, but make sure your .env file is configured first.

It sets up the RAG pipeline automatically.

You can build and run the entire terminal application using:

docker compose up --build

The first build can take a while, so go have a coffee.

To run the terminal app interactively (the container is removed when you exit):

docker compose run --rm --service-ports helium

The API container is configured to reach host services through host.docker.internal. Keep your llama.cpp instance running on the host, and update docker-compose.yml if your ports differ.
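Inside a container, 127.0.0.1 refers to the container itself, which is why host services are reached through host.docker.internal. A minimal sketch for checking that the host's llama.cpp server is reachable, assuming the default port 3000 used in this README; llm_base_url and is_reachable are hypothetical helpers, and utils/health.py may do this differently.

```python
import urllib.error
import urllib.request


def llm_base_url(in_docker: bool, port: int = 3000) -> str:
    # Docker Desktop provides host.docker.internal automatically; on Linux it
    # typically needs an extra_hosts entry in docker-compose.yml.
    host = "host.docker.internal" if in_docker else "127.0.0.1"
    return f"http://{host}:{port}"


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET to url gets any response at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # HTTPError must be caught first: the server answered, just with an error status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout.
        return False
```

For example, is_reachable(llm_base_url(in_docker=True) + "/health") probes llama-server's /health route from inside the container.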

Dev Installation

IMPORTANT: Use this only if you want to run Helium manually; otherwise, use the Docker section.

  1. Clone the repository:

    git clone <repository-url>
    cd helium-agent
    
  2. Create and activate a virtual environment:

    python -m venv .venv
    source .venv/bin/activate
    
  3. Install Python dependencies:

    pip install -r requirements.txt
    pip install -r requirements-rag.txt
    
  4. Check the RAG setup with the doctor command:

    python -m rag_service doctor
    

If audio dependencies fail to build, install PortAudio first, then rerun the Python dependency install.
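A "doctor" command usually reports which dependencies can be found before anything heavy is loaded. As a minimal sketch of that idea (not the actual checks in rag_service doctor, whose module list is not documented here), importlib.util.find_spec can test for a top-level module without importing it. The function names and module list are hypothetical.

```python
import importlib.util


def missing_modules(modules: list[str]) -> list[str]:
    """Return the top-level modules from `modules` that cannot be found."""
    # find_spec locates a module without executing it, so heavy packages stay unloaded.
    return [name for name in modules if importlib.util.find_spec(name) is None]


def doctor(modules: list[str]) -> bool:
    """Print one status line per module and return True if all are present."""
    missing = missing_modules(modules)
    for name in modules:
        status = "missing" if name in missing else "ok"
        print(f"{name:<24} {status}")
    return not missing
```

Passing a placeholder list such as doctor(["sqlite3", "json"]) prints one line per module; a real check would list the packages from requirements-rag.txt.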

Local Services

Note: You can use either llama.cpp or any LLM provider's API.

Start llama.cpp

Run a compatible instruction-tuned GGUF model on port 3000:

./llama-server -m /path/to/your/model.gguf -c 4096 --port 3000

Helium expects the default completion endpoint to be OpenAI-compatible:

http://127.0.0.1:3000/v1/chat/completions

Use LLM API

If you have API access to an LLM provider, you can use it directly by adding your API key and endpoint URL to a .env file in the project directory:

LLM_API_KEY=your-llm-api-key
LLM_API_URL=your-llm-url

See .env.example for more details.

Start Playwright

Helium supports Playwright. If you want more in-depth results from the web, enable it by setting use_playwright=true in config/settings.toml.

Then install Playwright and Chromium:

pip install playwright
playwright install chromium

These are not included in requirements.txt because Helium aims to stay lightweight, but feel free to add them.

Note: Playwright is heavy because it downloads Chromium, which takes up disk space and memory. Use with caution.

Start RAG pipeline

Helium comes with its own RAG pipeline. To attach a file, prefix its path with @ in your message, then ask anything about that file.

Currently it is good enough to answer what is inside a file, summarize it, and handle other basic questions. Later, I intend to deepen its understanding of files using local embeddings.

This is an optional feature. See the rag_service directory for more details.
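Splitting a message into file references and the actual question is a small parsing step. A minimal sketch, assuming whitespace-separated tokens and paths without spaces; parse_file_mentions is a hypothetical helper, not Helium's implementation.

```python
def parse_file_mentions(message: str) -> tuple[list[str], str]:
    """Split '@path/to/file question...' into (paths, question)."""
    paths: list[str] = []
    words: list[str] = []
    for token in message.split():
        # A token such as "@docs/plan.pdf" is treated as a file reference;
        # a lone "@" is kept as ordinary text.
        if token.startswith("@") and len(token) > 1:
            paths.append(token[1:])
        else:
            words.append(token)
    return paths, " ".join(words)
```

For "@docs/plan.pdf summarize the risks", this yields the path docs/plan.pdf and the question "summarize the risks".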

Run The Assistant

Note: Only TEXT mode is ready for use.

  1. Confirm the LLM service is running.

  2. If you use optional web services, confirm they are running for better results.

  3. Start Helium:

    python main.py --mode text
    
  4. Wait for the animation to load and the welcome message to appear.

  5. Type your query and enjoy Helium.

Example requests:

What is the latest news on AI?
Remember that I prefer concise responses.
Create a file named hello.txt that says hi.
Open Safari.
Compare India and China GDP in 2025.
Why is the Indian Rupee falling recently?
Give me a report on the latest AI regulation changes in the EU.

RAG request example:

@README.md what does this project do?
@docs/plan.pdf summarize the risks

Run The Web UI

The web UI has two parts:

  • FastAPI backend: WebSocket endpoint at ws://localhost:8080/ws/chat
  • React/Vite frontend: browser chat interface under frontend/

Backend

uvicorn api.main:app --host 0.0.0.0 --port 8080

Frontend

cd frontend
bun install
bun run dev

The frontend opens a WebSocket to port 8080, so keep the API running while using the browser UI.
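You can also talk to the backend without the browser UI. A minimal client sketch using the third-party websockets library; the {"message": ...} payload shape is a hypothetical assumption (check api/main.py for the schema it actually expects), and only the first reply frame is read, so a streaming backend would need a loop.

```python
import json

WS_URL = "ws://localhost:8080/ws/chat"


def build_message(text: str) -> str:
    # Hypothetical payload shape; api/main.py defines the real one.
    return json.dumps({"message": text})


async def ask(text: str, url: str = WS_URL) -> str:
    """Send one chat message and return the first frame the server replies with."""
    import websockets  # third-party client library: pip install websockets

    async with websockets.connect(url) as ws:
        await ws.send(build_message(text))
        reply = await ws.recv()
    # Frames may arrive as text (str) or binary (bytes).
    return reply if isinstance(reply, str) else reply.decode("utf-8")
```

With the API running, asyncio.run(ask("What is the latest news on AI?")) returns the first frame of the reply.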

Configuration

Most runtime behavior lives in config/settings.toml:

  • services.llama_cpp_url
  • wake_word.threshold
  • wake_word.push_to_talk
  • speech.whisper_model
  • speech.timeout_seconds
  • speech.follow_up_timeout_seconds
  • assistant.tts_voice
  • assistant.follow_up_mode
  • assistant.confirm_risky_tools
  • assistant.persona

When a key is missing, Helium falls back to defaults in config/settings.py.

Testing

Run the test suite from the repository root:

python -m unittest discover -s tests

For the frontend:

cd frontend
bun run lint
bun run build

Troubleshooting

  • No wake detection: Check microphone permissions, input device selection, and wake_word.threshold.
  • False wakes: Increase wake_word.threshold or wake_word.required_hits.
  • No transcription: Confirm mlx-whisper, microphone access, and PortAudio dependencies are working.
  • No LLM response: Confirm the llama.cpp completion endpoint matches services.llama_cpp_url.
  • Search is weak or failing: Start SearxNG or verify services.searxng_url; DDGS fallback may be less consistent.
  • Web UI cannot connect: Make sure the FastAPI backend is running on port 8080.
  • Tool call JSON errors: Check terminal logs. utils/parser.py includes recovery logic, but malformed model output can still skip a tool step.



Download files


Source Distribution

helium_agent-0.1.2.tar.gz (102.2 kB)

Uploaded Source

Built Distribution


helium_agent-0.1.2-py3-none-any.whl (99.2 kB)

Uploaded Python 3

File details

Details for the file helium_agent-0.1.2.tar.gz.

File metadata

  • Download URL: helium_agent-0.1.2.tar.gz
  • Size: 102.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for helium_agent-0.1.2.tar.gz
Algorithm Hash digest
SHA256 af491f6fd078961f15360d8fc151cd8e99b51a4dfe62040e46f45292f1924f9c
MD5 5023c036f71f55e1473cc52129f16869
BLAKE2b-256 6083cbeab0154ec54482ceddc35676f595b1ecc936bf36d4efebc0b4c8105962


File details

Details for the file helium_agent-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: helium_agent-0.1.2-py3-none-any.whl
  • Size: 99.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for helium_agent-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 db8b6a2cee1d08e20acee0fd80cf23e4986dc240b149e92cf6b791c730912b10
MD5 522e73f02eb1c8945fddffd5d3be42eb
BLAKE2b-256 c9e235ea9066e4097d21d5740a410c51425610ffa509c437958df183d8954834

