Local AI coding agent — Ollama + any model

Project description

LLMai

A local-first AI coding agent with three layers of awareness.
Runs on your own Ollama. Remembers across sessions. Searches your org's knowledge before writing code. Observable end-to-end.

What LLMai is

A privacy-first AI coding agent that runs the model locally and gives it three distinct kinds of awareness — not just one giant chat window:

Layer	Backed by	What it gives the agent
Operational	Dynatrace (OpenTelemetry → Bindplane)	Every tool call traced: latency, token count, permission outcome, success/error
Personal	MongoDB Atlas (per-workspace)	Recalls your past sessions, decisions, and extracted facts. Boots warm
Organizational	Elastic (per-org)	Hybrid search over GitLab issues, CI failure logs, docs — checks "have we seen this before?" before writing code

All three are opt-in. The core agent runs 100% locally with no external dependencies. Each integration is one config flip away.

Live demo: ll-mai.vercel.app

Stack at a glance

                                        ┌─ search_knowledge ──► Elastic (issues + docs)
                                        │
   Browser / CLI ──► Agent Loop ────────┼─ recall_memory ─────► MongoDB Atlas
                          │             │                       (per-workspace sessions
                          │             │                        + summaries + knowledge)
                          │             │
                          │             └─ query_logs ────────► Elastic ES|QL
                          │                                     (pipeline + agent self-logs)
                          │
                          ▼
                     OTel spans
                          │
                          ▼
                  Bindplane ──┬──► Dynatrace (traces + metrics)
                              │
                              └──► Elastic llmai-agent-logs
                                   (so the agent can query itself)

LLM backend: Ollama localhost:11434 — Gemini / Groq optional

Two interfaces

	How to run	Best for
CLI REPL	`llmai`	Terminal power users
Web UI	`llmai-server` → http://localhost:7777	Browser, streaming, permission cards

Quick start (core only, no partner integrations)

Install from PyPI (the distribution is named llmai-agent because bare llmai was already taken; the import path is still import llmai):

pip install llmai-agent
ollama serve
ollama pull qwen2.5-coder
llmai-server   # opens http://localhost:7777

Or install from source for development:

git clone https://github.com/sechan9999/LLMai.git
cd LLMai
pip install -e .

ollama serve
ollama pull qwen2.5-coder

llmai-server   # opens http://localhost:7777

That's it. The agent has 8 core tools (read/write/edit files, run shell, search code, list files, fetch URL, mkdir) plus 11 GitLab tools when GITLAB_TOKEN is set. Read-only ops auto-approve; writes and shell commands prompt for permission.

Adding the three layers (10-15 min each)

1. Dynatrace — observe every tool call

make install-telemetry          # adds opentelemetry packages

# in .env (copy from .env.example):
DT_ENDPOINT=https://<your>.live.dynatrace.com
DT_API_TOKEN=dt0c01.YOUR_TOKEN

make bindplane-up               # starts the OTel collector locally

export LLMAI_OTEL_ENABLED=true
export LLMAI_OTEL_ENDPOINT=http://localhost:4318
llmai-server

Spans: agent.turn → agent.iteration → llm.chat + tool.invocation. Metrics: tool invocation counts, LLM latency, token histograms.

Full guide: docs/dynatrace-setup.md

2. MongoDB Atlas — remember across sessions

make install-memory             # adds pymongo
ollama pull nomic-embed-text    # embedding model

# in config.json or env:
LLMAI_MEMORY_ENABLED=true
LLMAI_MEMORY_URI=mongodb+srv://USER:PASS@cluster.mongodb.net/

python scripts/setup_atlas_indexes.py   # one-time vector index bootstrap
llmai-server

Three collections: sessions, summaries (vector-embedded), knowledge (extracted facts, vector-embedded). New tool: recall_memory for the agent. On each new session, the 3 most recent prior summaries are auto-injected as a system message.

Full guide: docs/atlas-setup.md

3. Elastic — search org knowledge before writing code

make install-elastic            # adds elasticsearch client
make elastic-up                 # local ES + Kibana via docker
make elastic-setup              # creates llmai-* indices

# optional: pull org knowledge in
export GITLAB_TOKEN=glpat-...
export GITLAB_PROJECT=group/project
make elastic-ingest             # pulls last 500 issues + 50 failed pipelines

export LLMAI_ELASTIC_ENABLED=true
export LLMAI_ELASTIC_URL=http://localhost:9200
llmai-server

Two tools: search_knowledge (hybrid keyword + dense vector, RRF-fused, auto-approved) and query_logs (raw ES|QL, permission-gated). System prompt nudges the model to call search_knowledge before writing code that touches error paths or external APIs.

Full guide: docs/elastic-setup.md

One-command demo bootstrap

make demo-up           # starts Elastic + Kibana + Bindplane
make demo-bootstrap    # pulls embed model + creates ES indices
make demo-status       # health check across the whole stack

Key features

100% local by default — no API keys, no cloud, nothing leaves your machine
Agentic loop — observe → judge → act, up to 20 iterations per turn
Permission gates — read-only auto-approves; writes and shell prompt
Three-layer awareness — operational (Dynatrace), personal (Atlas), org (Elastic) — all opt-in
MCP-compatible tool shapes — recall_memory, search_knowledge, query_logs mirror the official MCP server contracts
Dual tool-calling modes — native OpenAI function calling for capable models; XML fallback for gemma3, phi3, etc.
Context compression — auto-summarizes when conversation exceeds ~50k tokens
Workspace sandboxing — file ops restricted to WORKSPACE_ROOT; dangerous command patterns blocked
Provider-agnostic — Ollama, Gemini, Groq (cloud fallback for the hosted demo)
Dual interface — CLI REPL + WebSocket-streaming Web UI

Tools (when their layer is enabled)

Category	Tools	Default permission
Core (8)	`read_file`, `write_file`, `edit_file`, `run_command`, `list_files`, `search_code`, `fetch_url`, `create_directory`	reads `allow`, writes/shell `ask`
GitLab (11)	`gitlab_list_issues`, `gitlab_get_mr`, `gitlab_get_job_log`, …	reads `allow`, mutations `ask`
Memory (1)	`recall_memory`	`allow`
Elastic (2)	`search_knowledge`, `query_logs`	`allow` / `ask`

Tools are registered conditionally — the model never sees a tool whose backend isn't connected.

Configuration

config.json (or env vars — env always wins):

{
  "ollama_url": "http://localhost:11434",
  "model": "qwen2.5-coder",
  "permissions": { ... },
  "telemetry": { "enabled": false, "endpoint": "http://localhost:4318", ... },
  "memory":    { "enabled": false, "uri": "mongodb+srv://...", ... },
  "elastic":   { "enabled": false, "url": "http://localhost:9200", ... }
}

Full example: config.example.json.

CLI commands

Command	Description
`/reset`	Clear conversation context (and finalize session memory)
`/model <name>`	Switch model
`/models`	List Ollama models locally
`/tokens`	Show estimated token count
`/perms`	Show current permission settings
`/compress`	Force context compression now
`/exit`	Quit

Supported models

Native tool calling (recommended): Qwen 2.5 / 2.5-Coder / 3, Llama 3.1+, Mistral NeMo, FireFunction, Command-R(+).

XML fallback: gemma3, phi3, mistral — anything OpenAI-incompatible. Slightly lower fidelity but functional.

Cloud backends (optional)

The agent speaks the OpenAI /v1/chat/completions contract.

# Gemini (via AI Studio key — also works for Vertex AI compat endpoint)
export GEMINI_API_KEY=...

# Any OpenAI-compat endpoint (LM Studio, vLLM, custom):
# set "ollama_url" in config.json

Project structure

LLMai/
├── llmai/                # Python package (formerly vixcode)
│   ├── agent.py          # Sync CLI agent loop
│   ├── main.py           # CLI REPL entry point
│   ├── llm.py            # OpenAI-compat HTTP client
│   ├── tools.py          # 8 core tools + conditional registration
│   ├── gitlab_tools.py   # 11 GitLab tools
│   ├── permissions.py    # allow / ask / deny system
│   ├── telemetry.py      # OpenTelemetry init + span context managers
│   ├── memory/           # MongoDB Atlas persistent memory
│   │   ├── store.py
│   │   ├── embeddings.py
│   │   └── recall_tool.py
│   └── elastic/          # Elasticsearch knowledge search + log analytics
│       ├── client.py
│       ├── search_tool.py
│       └── query_tool.py
├── server/               # FastAPI + WebSocket Web UI
│   ├── app.py
│   ├── agent_ws.py       # Async agent loop (native + XML modes)
│   └── static/index.html
├── website/              # Landing page (Vercel)
│   └── index.html
├── api/chat.js           # Vercel serverless cloud fallback (Groq)
├── scripts/              # Bootstrap scripts for partner integrations
│   ├── setup_atlas_indexes.py
│   ├── elastic_setup_indexes.py
│   ├── elastic_ingest_gitlab.py
│   └── elastic_ingest_logs.py
├── bindplane/config.yaml             # OTel collector → Dynatrace + Elastic
├── docker-compose.bindplane.yml      # Bindplane container
├── docker-compose.elastic.yml        # Elasticsearch + Kibana
├── Makefile                          # Common dev / demo tasks
├── docs/
│   ├── dynatrace-setup.md
│   ├── atlas-setup.md
│   └── elastic-setup.md
└── tests/                # 103 passing

Documentation

Dynatrace observability — OpenTelemetry spans, metrics, Bindplane pipeline
MongoDB Atlas memory — cross-session continuity, semantic recall
Elastic knowledge search — hybrid search, ES|QL, agent self-logs
Local LLM Setup — best practices for Ollama
Permission System — allow/ask/deny configuration
Tool System — tool definitions and sandboxing
OSS Comparison — vs. Claude Code, Aider, Open Interpreter, Goose

License

MIT

Acknowledgments

Ollama — Local LLM runtime
FastAPI — Web framework
Rich — Terminal formatting
OpenTelemetry — Observability standard
Bindplane — OTel collector
Dynatrace, MongoDB Atlas, Elastic — Hackathon partner backends

Project details

Release history Release notifications | RSS feed

0.2.3

May 27, 2026

This version

0.2.2

May 27, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llmai_agent-0.2.2.tar.gz (74.4 kB view details)

Uploaded May 27, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

llmai_agent-0.2.2-py3-none-any.whl (69.7 kB view details)

Uploaded May 27, 2026 Python 3

File details

Details for the file llmai_agent-0.2.2.tar.gz.

File metadata

Download URL: llmai_agent-0.2.2.tar.gz
Upload date: May 27, 2026
Size: 74.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for llmai_agent-0.2.2.tar.gz
Algorithm	Hash digest
SHA256	`f65de472ee03db1ed6ccf9bb1be9a134ca0a3c3de8c1014d0133565ce84ad29b`
MD5	`1529733a6c4e017b18a5de3b9701b7d1`
BLAKE2b-256	`231e675e742f0b91379736355b6b16f20a78c72f69d9bb74fa51e13477a1bbfc`

See more details on using hashes here.

Provenance

The following attestation bundles were made for llmai_agent-0.2.2.tar.gz:

Publisher: release.yml on sechan9999/LLMai

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: llmai_agent-0.2.2.tar.gz
- Subject digest: f65de472ee03db1ed6ccf9bb1be9a134ca0a3c3de8c1014d0133565ce84ad29b
- Sigstore transparency entry: 1644518343
- Sigstore integration time: May 27, 2026
Source repository:
- Permalink: sechan9999/LLMai@f2d028967ed6cead3283c0ea8f403137022b741a
- Branch / Tag: refs/tags/v0.2.2
- Owner: https://github.com/sechan9999
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@f2d028967ed6cead3283c0ea8f403137022b741a
- Trigger Event: push

File details

Details for the file llmai_agent-0.2.2-py3-none-any.whl.

File metadata

Download URL: llmai_agent-0.2.2-py3-none-any.whl
Upload date: May 27, 2026
Size: 69.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for llmai_agent-0.2.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8a058fa0551a5e1a358aabf096dc28ab27b816ff5f99770c8d9be76595de4768`
MD5	`9809e2e2d1beb7249c4f94353ae63a83`
BLAKE2b-256	`c5423a97f529222b0abcfea09543b24524bd698e707e20d6ce5be69486706620`

See more details on using hashes here.

Provenance

The following attestation bundles were made for llmai_agent-0.2.2-py3-none-any.whl:

Publisher: release.yml on sechan9999/LLMai

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: llmai_agent-0.2.2-py3-none-any.whl
- Subject digest: 8a058fa0551a5e1a358aabf096dc28ab27b816ff5f99770c8d9be76595de4768
- Sigstore transparency entry: 1644518423
- Sigstore integration time: May 27, 2026
Source repository:
- Permalink: sechan9999/LLMai@f2d028967ed6cead3283c0ea8f403137022b741a
- Branch / Tag: refs/tags/v0.2.2
- Owner: https://github.com/sechan9999
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@f2d028967ed6cead3283c0ea8f403137022b741a
- Trigger Event: push

llmai-agent 0.2.2

Navigation

Verified details

Maintainers

Unverified details

Meta

Project description

LLMai

What LLMai is

Stack at a glance

Two interfaces

Quick start (core only, no partner integrations)

Adding the three layers (10-15 min each)

1. Dynatrace — observe every tool call

2. MongoDB Atlas — remember across sessions

3. Elastic — search org knowledge before writing code

One-command demo bootstrap

Key features

Tools (when their layer is enabled)

Configuration

CLI commands

Supported models

Cloud backends (optional)

Project structure

Documentation

License

Acknowledgments

Project details

Verified details

Maintainers

Unverified details

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance