Local AI coding agent — Ollama + any model
Project description
LLMai
A local-first AI coding agent with three layers of awareness.
Runs on your own Ollama. Remembers across sessions. Searches your org's knowledge before writing code. Observable end-to-end.
What LLMai is
A privacy-first AI coding agent that runs the model locally and gives it three distinct kinds of awareness — not just one giant chat window:
| Layer | Backed by | What it gives the agent |
|---|---|---|
| Operational | Dynatrace (OpenTelemetry → Bindplane) | Every tool call traced: latency, token count, permission outcome, success/error |
| Personal | MongoDB Atlas (per-workspace) | Recalls your past sessions, decisions, and extracted facts. Boots warm |
| Organizational | Elastic (per-org) | Hybrid search over GitLab issues, CI failure logs, docs — checks "have we seen this before?" before writing code |
All three are opt-in. The core agent runs 100% locally with no external dependencies. Each integration is one config flip away.
Live demo: ll-mai.vercel.app
Stack at a glance
┌─ search_knowledge ──► Elastic (issues + docs)
│
Browser / CLI ──► Agent Loop ────────┼─ recall_memory ─────► MongoDB Atlas
│ │ (per-workspace sessions
│ │ + summaries + knowledge)
│ │
│ └─ query_logs ────────► Elastic ES|QL
│ (pipeline + agent self-logs)
│
▼
OTel spans
│
▼
Bindplane ──┬──► Dynatrace (traces + metrics)
│
└──► Elastic llmai-agent-logs
(so the agent can query itself)
LLM backend: Ollama localhost:11434 — Gemini / Groq optional
Two interfaces
| How to run | Best for | |
|---|---|---|
| CLI REPL | llmai |
Terminal power users |
| Web UI | llmai-server → http://localhost:7777 |
Browser, streaming, permission cards |
Quick start (core only, no partner integrations)
Install from PyPI (the distribution is named llmai-agent because bare
llmai was already taken; the import path is still import llmai):
pip install llmai-agent
ollama serve
ollama pull qwen2.5-coder
llmai-server # opens http://localhost:7777
Or install from source for development:
git clone https://github.com/sechan9999/LLMai.git
cd LLMai
pip install -e .
ollama serve
ollama pull qwen2.5-coder
llmai-server # opens http://localhost:7777
That's it. The agent has 8 core tools (read/write/edit files, run shell, search code, list files, fetch URL, mkdir) plus 11 GitLab tools when GITLAB_TOKEN is set. Read-only ops auto-approve; writes and shell commands prompt for permission.
Adding the three layers (10-15 min each)
1. Dynatrace — observe every tool call
make install-telemetry # adds opentelemetry packages
# in .env (copy from .env.example):
DT_ENDPOINT=https://<your>.live.dynatrace.com
DT_API_TOKEN=dt0c01.YOUR_TOKEN
make bindplane-up # starts the OTel collector locally
export LLMAI_OTEL_ENABLED=true
export LLMAI_OTEL_ENDPOINT=http://localhost:4318
llmai-server
Spans: agent.turn → agent.iteration → llm.chat + tool.invocation. Metrics: tool invocation counts, LLM latency, token histograms.
Full guide: docs/dynatrace-setup.md
2. MongoDB Atlas — remember across sessions
make install-memory # adds pymongo
ollama pull nomic-embed-text # embedding model
# in config.json or env:
LLMAI_MEMORY_ENABLED=true
LLMAI_MEMORY_URI=mongodb+srv://USER:PASS@cluster.mongodb.net/
python scripts/setup_atlas_indexes.py # one-time vector index bootstrap
llmai-server
Three collections: sessions, summaries (vector-embedded), knowledge (extracted facts, vector-embedded). New tool: recall_memory for the agent. On each new session, the 3 most recent prior summaries are auto-injected as a system message.
Full guide: docs/atlas-setup.md
3. Elastic — search org knowledge before writing code
make install-elastic # adds elasticsearch client
make elastic-up # local ES + Kibana via docker
make elastic-setup # creates llmai-* indices
# optional: pull org knowledge in
export GITLAB_TOKEN=glpat-...
export GITLAB_PROJECT=group/project
make elastic-ingest # pulls last 500 issues + 50 failed pipelines
export LLMAI_ELASTIC_ENABLED=true
export LLMAI_ELASTIC_URL=http://localhost:9200
llmai-server
Two tools: search_knowledge (hybrid keyword + dense vector, RRF-fused, auto-approved) and query_logs (raw ES|QL, permission-gated). System prompt nudges the model to call search_knowledge before writing code that touches error paths or external APIs.
Full guide: docs/elastic-setup.md
One-command demo bootstrap
make demo-up # starts Elastic + Kibana + Bindplane
make demo-bootstrap # pulls embed model + creates ES indices
make demo-status # health check across the whole stack
Key features
- 100% local by default — no API keys, no cloud, nothing leaves your machine
- Agentic loop — observe → judge → act, up to 20 iterations per turn
- Permission gates — read-only auto-approves; writes and shell prompt
- Three-layer awareness — operational (Dynatrace), personal (Atlas), org (Elastic) — all opt-in
- MCP-compatible tool shapes —
recall_memory,search_knowledge,query_logsmirror the official MCP server contracts - Dual tool-calling modes — native OpenAI function calling for capable models; XML fallback for
gemma3,phi3, etc. - Context compression — auto-summarizes when conversation exceeds ~50k tokens
- Workspace sandboxing — file ops restricted to
WORKSPACE_ROOT; dangerous command patterns blocked - Provider-agnostic — Ollama, Gemini, Groq (cloud fallback for the hosted demo)
- Dual interface — CLI REPL + WebSocket-streaming Web UI
Tools (when their layer is enabled)
| Category | Tools | Default permission |
|---|---|---|
| Core (8) | read_file, write_file, edit_file, run_command, list_files, search_code, fetch_url, create_directory |
reads allow, writes/shell ask |
| GitLab (11) | gitlab_list_issues, gitlab_get_mr, gitlab_get_job_log, … |
reads allow, mutations ask |
| Memory (1) | recall_memory |
allow |
| Elastic (2) | search_knowledge, query_logs |
allow / ask |
Tools are registered conditionally — the model never sees a tool whose backend isn't connected.
Configuration
config.json (or env vars — env always wins):
{
"ollama_url": "http://localhost:11434",
"model": "qwen2.5-coder",
"permissions": { ... },
"telemetry": { "enabled": false, "endpoint": "http://localhost:4318", ... },
"memory": { "enabled": false, "uri": "mongodb+srv://...", ... },
"elastic": { "enabled": false, "url": "http://localhost:9200", ... }
}
Full example: config.example.json.
CLI commands
| Command | Description |
|---|---|
/reset |
Clear conversation context (and finalize session memory) |
/model <name> |
Switch model |
/models |
List Ollama models locally |
/tokens |
Show estimated token count |
/perms |
Show current permission settings |
/compress |
Force context compression now |
/exit |
Quit |
Supported models
Native tool calling (recommended): Qwen 2.5 / 2.5-Coder / 3, Llama 3.1+, Mistral NeMo, FireFunction, Command-R(+).
XML fallback: gemma3, phi3, mistral — anything OpenAI-incompatible. Slightly lower fidelity but functional.
Cloud backends (optional)
The agent speaks the OpenAI /v1/chat/completions contract.
# Gemini (via AI Studio key — also works for Vertex AI compat endpoint)
export GEMINI_API_KEY=...
# Any OpenAI-compat endpoint (LM Studio, vLLM, custom):
# set "ollama_url" in config.json
Project structure
LLMai/
├── llmai/ # Python package (formerly vixcode)
│ ├── agent.py # Sync CLI agent loop
│ ├── main.py # CLI REPL entry point
│ ├── llm.py # OpenAI-compat HTTP client
│ ├── tools.py # 8 core tools + conditional registration
│ ├── gitlab_tools.py # 11 GitLab tools
│ ├── permissions.py # allow / ask / deny system
│ ├── telemetry.py # OpenTelemetry init + span context managers
│ ├── memory/ # MongoDB Atlas persistent memory
│ │ ├── store.py
│ │ ├── embeddings.py
│ │ └── recall_tool.py
│ └── elastic/ # Elasticsearch knowledge search + log analytics
│ ├── client.py
│ ├── search_tool.py
│ └── query_tool.py
├── server/ # FastAPI + WebSocket Web UI
│ ├── app.py
│ ├── agent_ws.py # Async agent loop (native + XML modes)
│ └── static/index.html
├── website/ # Landing page (Vercel)
│ └── index.html
├── api/chat.js # Vercel serverless cloud fallback (Groq)
├── scripts/ # Bootstrap scripts for partner integrations
│ ├── setup_atlas_indexes.py
│ ├── elastic_setup_indexes.py
│ ├── elastic_ingest_gitlab.py
│ └── elastic_ingest_logs.py
├── bindplane/config.yaml # OTel collector → Dynatrace + Elastic
├── docker-compose.bindplane.yml # Bindplane container
├── docker-compose.elastic.yml # Elasticsearch + Kibana
├── Makefile # Common dev / demo tasks
├── docs/
│ ├── dynatrace-setup.md
│ ├── atlas-setup.md
│ └── elastic-setup.md
└── tests/ # 103 passing
Documentation
- Dynatrace observability — OpenTelemetry spans, metrics, Bindplane pipeline
- MongoDB Atlas memory — cross-session continuity, semantic recall
- Elastic knowledge search — hybrid search, ES|QL, agent self-logs
- Local LLM Setup — best practices for Ollama
- Permission System — allow/ask/deny configuration
- Tool System — tool definitions and sandboxing
- OSS Comparison — vs. Claude Code, Aider, Open Interpreter, Goose
License
MIT
Acknowledgments
- Ollama — Local LLM runtime
- FastAPI — Web framework
- Rich — Terminal formatting
- OpenTelemetry — Observability standard
- Bindplane — OTel collector
- Dynatrace, MongoDB Atlas, Elastic — Hackathon partner backends
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file llmai_agent-0.2.2.tar.gz.
File metadata
- Download URL: llmai_agent-0.2.2.tar.gz
- Upload date:
- Size: 74.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
f65de472ee03db1ed6ccf9bb1be9a134ca0a3c3de8c1014d0133565ce84ad29b
|
|
| MD5 |
1529733a6c4e017b18a5de3b9701b7d1
|
|
| BLAKE2b-256 |
231e675e742f0b91379736355b6b16f20a78c72f69d9bb74fa51e13477a1bbfc
|
Provenance
The following attestation bundles were made for llmai_agent-0.2.2.tar.gz:
Publisher:
release.yml on sechan9999/LLMai
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
llmai_agent-0.2.2.tar.gz -
Subject digest:
f65de472ee03db1ed6ccf9bb1be9a134ca0a3c3de8c1014d0133565ce84ad29b - Sigstore transparency entry: 1644518343
- Sigstore integration time:
-
Permalink:
sechan9999/LLMai@f2d028967ed6cead3283c0ea8f403137022b741a -
Branch / Tag:
refs/tags/v0.2.2 - Owner: https://github.com/sechan9999
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@f2d028967ed6cead3283c0ea8f403137022b741a -
Trigger Event:
push
-
Statement type:
File details
Details for the file llmai_agent-0.2.2-py3-none-any.whl.
File metadata
- Download URL: llmai_agent-0.2.2-py3-none-any.whl
- Upload date:
- Size: 69.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
8a058fa0551a5e1a358aabf096dc28ab27b816ff5f99770c8d9be76595de4768
|
|
| MD5 |
9809e2e2d1beb7249c4f94353ae63a83
|
|
| BLAKE2b-256 |
c5423a97f529222b0abcfea09543b24524bd698e707e20d6ce5be69486706620
|
Provenance
The following attestation bundles were made for llmai_agent-0.2.2-py3-none-any.whl:
Publisher:
release.yml on sechan9999/LLMai
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
llmai_agent-0.2.2-py3-none-any.whl -
Subject digest:
8a058fa0551a5e1a358aabf096dc28ab27b816ff5f99770c8d9be76595de4768 - Sigstore transparency entry: 1644518423
- Sigstore integration time:
-
Permalink:
sechan9999/LLMai@f2d028967ed6cead3283c0ea8f403137022b741a -
Branch / Tag:
refs/tags/v0.2.2 - Owner: https://github.com/sechan9999
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@f2d028967ed6cead3283c0ea8f403137022b741a -
Trigger Event:
push
-
Statement type: