Mnemo AI: a local agentic AI assistant (LangGraph + MCP) that learns and remembers, with multi-provider model support.

Mnemo AI

PyPI Python Version License: MIT Code style: black

A local agentic AI assistant with MCP (Model Context Protocol) integration, RAG capabilities, and intelligent conversation management. Built on LangGraph with LangChain for multi-provider LLM support (Ollama, Amazon Bedrock, OpenAI, Anthropic, Amazon SageMaker AI, LiteLLM).

✨ Key Features

  • 🤖 Multi-Model Support: Ollama (local), Amazon Bedrock, OpenAI, Anthropic (Claude), Amazon SageMaker AI, LiteLLM (100+ providers)
  • 🔧 MCP Tool System: Extensible tool architecture via Model Context Protocol
  • 📚 RAG (Retrieval-Augmented Generation): Automatic document indexing and semantic search (if enabled)
  • 💬 Advanced Chat Interface: Multiline input, command system, conversation save/load
  • 🧠 User Profile Learning: Automatic learning from interactions for personalized responses
  • 🧩 Episodic Memory: Learns from successful task completions and retrieves similar solutions
  • 📖 ACE Playbook: Learns strategies from successes AND failures via Agentic Context Engineering
  • 📊 Training Data Collection: Mark high-quality responses for SFT training
  • 🔍 Web Search: Integrated Brave Search API (if available)
  • 🌐 Web Crawler: Extract and index content from web pages
  • 🖼️ Vision Support: Image analysis with vision models (if available)
  • 📁 File Operations: Read/write/edit with support for text, CSV, JSON, PDF, DOCX
  • ✏️ Precise File Editing: Safe string replacement with validation and uniqueness checking
  • 🔎 Fast Search Tools: Glob pattern matching and ripgrep content search (10-100x faster than traditional grep)
  • 📋 Todo Tracking: Multi-step task management with real-time progress updates
  • ⚡ Bash Execution: Direct shell command execution with intelligent error handling
  • 🛡️ Git Safety: Protection against dangerous git operations with smart warnings
  • 📝 Plan Mode: Implementation planning workflow for complex tasks
  • 🔄 Background Tasks: Run long operations in parallel without blocking

📖 Project Structure

mnemoai/                                    # repo root
├── pyproject.toml                          # Packaging + `mnemoai` CLI entry point
├── requirements.txt                        # Dependencies
├── README.md                               # This file
├── pytest.ini                              # Pytest configuration
├── requirements-dev.txt                    # Dev/test dependencies
│
├── src/mnemoai/                            # The single package (src layout)
│   ├── __init__.py
│   ├── __main__.py                         # `python -m mnemoai`
│   ├── main.py                             # Entry point (cli())
│   │
│   ├── client/                             # Client layer
│   │   ├── client.py                       # LangGraphClient facade (lifecycle, MCP, query)
│   │   ├── mcp_tool_wrapper.py             # MCP→LangChain adapter + MultiMCPClient (built-in + external servers)
│   │   ├── mcp_config.py                   # Loads external MCP servers from mcp.json
│   │   ├── agent/                          # Agent loop
│   │   │   ├── agent.py                    # LangGraph StateGraph agent with streaming
│   │   │   ├── router.py                   # Query classifier and routing
│   │   │   ├── orchestrator.py             # Task decomposition and worker orchestration
│   │   │   └── reasoning_utils.py          # Reasoning/thinking helpers for aux LLM calls
│   │   ├── ui/                             # User interface
│   │   │   ├── chat_interface.py           # Chat loop
│   │   │   └── spinner.py                  # Loading animations
│   │   ├── managers/                       # Business logic
│   │   │   ├── agent_conversation_manager.py  # Conversation state and token tracking
│   │   │   └── user_profile_manager.py     # User profiling and learning
│   │   └── memory/                         # Memory systems
│   │       ├── episodic_memory.py          # Episodic memory manager
│   │       ├── reflector.py                # ACE Reflector - extracts strategies
│   │       ├── playbook_store.py           # ACE Playbook - stores learned strategies
│   │       ├── faiss_store.py              # FAISS episodic store
│   │       └── chroma_store.py             # ChromaDB episodic store
│   │
│   ├── server/                             # MCP server layer
│   │   ├── server.py                       # FastMCP server (run as a subprocess)
│   │   ├── error_handler.py                # @tool_error_handler decorator (shared)
│   │   └── tools/                          # Tool implementations
│   │       ├── tools_manager.py            # Tool registration
│   │       ├── fs_read.py / fs_write.py / file_edit.py / file_search.py
│   │       ├── execute_bash.py / git_safety.py / todo_manager.py / plan_mode.py
│   │       ├── background_tasks.py / web_crawler.py / web_search.py
│   │       ├── describe_image.py / rag_tool.py
│   │       ├── rag/                        # RAG system (session, vector_store_controller, stores)
│   │       └── readers/                    # File readers (csv/json/pdf/docx/line/dir/search + chunking)
│   │
│   ├── models/                             # Model layer
│   │   ├── provider_params.py              # Single source of truth: per-provider config keys
│   │   ├── mantle_factory.py               # Bedrock Mantle model factory (multi-protocol)
│   │   ├── controllers/                    # Provider-dispatching controllers
│   │   │   ├── base_model_controller.py    # Minimal shared base
│   │   │   ├── llm_controller.py           # LLM initialization
│   │   │   ├── vision_model_controller.py  # Vision model initialization
│   │   │   └── embeddings_controller.py    # Embeddings initialization
│   │   └── chat_models/                    # Concrete LangChain ChatModel subclasses
│   │       ├── chat_ollama_wrapper.py      # Ollama model with penalty support
│   │       └── sagemaker_chat.py           # SageMaker ChatModel for LangChain
│   │
│   └── utils/                              # Utilities
│       ├── config.py                       # Config loader
│       ├── configurator.py                 # First-run setup + /config & /model flows
│       ├── paths.py                        # Central path helper (~/.mnemoai)
│       ├── logger.py                       # Logging utilities
│       ├── bm25.py                         # Lightweight BM25 (hybrid search)
│       ├── config.yaml.example             # Config templates (also .bedrock / .bedrock.mantle)
│       ├── mcp.json.example                # External MCP servers template
│       └── formatting/                     # Text formatting (code/url/response)
│
├── tests/                                  # Test suite (pytest)
│   ├── conftest.py                         # Puts src/ on sys.path
│   ├── unit/                               # Fast, deterministic, no deps
│   └── integration/                        # Live agent + Ollama + MCP
│
├── docs/                                   # ARCHITECTURE.md (detailed file map)
└── bash/                                   # Helper scripts
    ├── system-command-app/                 # `mnemoai` wrapper script
    ├── ollama-freeup-vram/                 # VRAM management
    └── ollama-env-mac/                     # Ollama config

🏗️ Architecture

High-Level Overview

┌─────────────────────────────────────────────────────────────┐
│                         main.py                             │
│                    (Application Entry)                      │
└─────────────────────────────┬───────────────────────────────┘
                              │
              ┌───────────────┴───────────────┐
              │                               │
              ▼                               ▼
      ┌─────────────────┐            ┌──────────────────┐
      │ LangGraphClient │◄──────────►│  MCP Server      │
      │  (client.py)    │            │  (server.py)     │
      └────────┬────────┘            └────────┬─────────┘
               │                              │
          ┌────┴─────┐                        ▼
          │          │                   ┌──────────┐
          ▼          ▼                   │  Tools   │
      ┌────────┐ ┌──────────┐            └────┬─────┘
      │  UI    │ │ Managers │                 │
      └────────┘ └──────────┘            ┌────┴────┐
          │          │                   │         │
          └────┬─────┘                   ▼         ▼
               ▼                    ┌──────────┐ ┌─────┐
          ┌──────────┐              │ Readers  │ │ RAG │
          │LangGraph │              └──────────┘ └─────┘
          │  Agent   │
          └──────────┘

🚀 Quick Start

Prerequisites

Required:

  • Python 3.11+
  • At least one LLM provider configured and accessible (see below)

LLM Providers (choose at least one):

Provider | Requirements
--- | ---
Ollama (local, recommended for getting started) | Install Ollama, then pull a model: ollama pull qwen3:4b
Amazon Bedrock | AWS CLI configured (aws configure) with Bedrock access in your region
Amazon SageMaker AI | AWS CLI configured with a deployed SageMaker endpoint
OpenAI | Set OPENAI_API_KEY environment variable
Anthropic (Claude API) | Set ANTHROPIC_API_KEY environment variable
LiteLLM | Depends on the underlying provider (see LiteLLM docs)

Optional:

  • ripgrep: 10-100x faster content search (see installation below)
  • Embedding model: Required if you enable RAG, Episodic Memory, or ACE Playbook (see Feature Toggles)
  • Vision model: Required for image analysis (describe_image tool)
  • Brave Search API key: Required for web search

Installation

Recommended: install from PyPI

The published package is mnemoai-assistant (the import name and the CLI command are both mnemoai). No clone needed: install it into an isolated environment and get the mnemoai command on your PATH:

uv tool install mnemoai-assistant     # or: pipx install mnemoai-assistant

Or into the current environment with pip:

pip install mnemoai-assistant

Then configure a user config (see step 4 below) and run:

mnemoai            # verbose (shows thinking)
mnemoai --no-verbose

To upgrade: uv tool upgrade mnemoai-assistant (or pip install -U mnemoai-assistant). To remove: uv tool uninstall mnemoai-assistant.

This is the best choice if you just want to use the assistant. Install from a checkout (below) instead if you plan to edit the source.

Install from a checkout

  1. Clone the repository:
git clone https://github.com/brunopistone/mnemoai.git
cd mnemoai
  2. Install the assistant (choose one):

Option 1: install as a CLI command (uv tool install)

This installs the project into its own isolated environment and puts mnemoai on your PATH, so you can run it from any directory (macOS and Linux) without activating anything:

uv tool install .        # or: pipx install .

Then configure a user config (see step 4) and run:

mnemoai            # verbose (shows thinking)
mnemoai --no-verbose

To upgrade after pulling changes, run uv tool install --force . (the trailing dot is the path to the checkout). To remove, run uv tool uninstall mnemoai.

Pick "run from a checkout" below instead if you plan to actively edit the code, since that runs your working tree directly with no reinstall step.

Option 2: run from a checkout

Set up an environment (choose one), which lets you run the assistant directly from the repo while editing the source live. Because the code uses a src/ layout, run it as a module with src/ on the path:

PYTHONPATH=src python -m mnemoai            # verbose
PYTHONPATH=src python -m mnemoai --no-verbose

(Or pip install -e . once, then just mnemoai.)

Option A: venv

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Option B: uv

uv venv
uv pip install -r requirements.txt

Option C: conda

conda create -n mnemoai python=3.11
conda activate mnemoai
pip install -r requirements.txt

Get the mnemoai command for a checkout install

So you don't have to cd into the repo every time, symlink the bundled wrapper script onto your PATH. It activates the project environment, then runs the app (PYTHONPATH=src python -m mnemoai):

chmod +x bash/system-command-app/mnemoai-wrapper.sh
ln -sf "$(pwd)/bash/system-command-app/mnemoai-wrapper.sh" /usr/local/bin/mnemoai

Now mnemoai works from any directory and always reflects your latest edits. The wrapper auto-activates a project-local .venv (Options A and B) if present, otherwise it falls back to a conda env named mnemoai (Option C); edit the script if your environment differs.

  3. Install ripgrep (optional but recommended for fast search):

Ripgrep provides 10-100x faster content search than traditional grep and is required for the grep_search tool.

macOS:

brew install ripgrep

Ubuntu/Debian:

sudo apt install ripgrep

Fedora/RHEL:

sudo dnf install ripgrep

Windows (via Chocolatey):

choco install ripgrep

From source:

cargo install ripgrep

Verify installation:

rg --version  # Should show ripgrep version

If ripgrep is not installed, the assistant will automatically fall back to using execute_bash with standard grep, but performance will be significantly slower.
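The fallback described above can be pictured with a small sketch. This is not the project's code: pick_search_command is a hypothetical helper that only detects rg with shutil.which and builds an argument list for whichever tool is available. The real assistant routes the fallback through its execute_bash tool.

```python
import shutil


def pick_search_command(pattern: str, path: str = ".") -> list[str]:
    """Build a content-search command, preferring ripgrep when it is on PATH.

    Hypothetical helper for illustration only.
    """
    if shutil.which("rg"):
        # -n prints line numbers; -e marks the next argument as the pattern.
        return ["rg", "-n", "-e", pattern, path]
    # grep: -r recurses, -n prints line numbers, -E enables extended regex.
    return ["grep", "-rnE", "-e", pattern, path]
```

Passing the result to subprocess.run would execute the search; the sketch stops at building the command so it stays side-effect free.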

  4. Configure the application:

First-run setup (easiest). If you start the assistant and no config is found, an interactive configurator runs automatically. It walks you through:

  • The LLM provider (Ollama / Bedrock / Mantle / OpenAI / Anthropic / Amazon SageMaker AI / LiteLLM) plus chat model
  • Connection details: Ollama host/port; AWS region; for Mantle, the API protocol (chat_completions / responses / anthropic); SageMaker region + input format; LiteLLM API base/key; OpenAI uses OPENAI_API_KEY; Anthropic uses ANTHROPIC_API_KEY with an optional base URL
  • Optional max output tokens (blank or none uses the provider default) and a mandatory max context window (defaults to 65536)
  • The vision model (reusing the chat model's host/region, with its own Mantle protocol and optional max output tokens)
  • Your profile name
  • An optional Brave Search key
  • Each feature toggle (RAG, episodic memory, ACE playbook, web crawler, query routing, orchestration, user profiling)

Every prompt is pre-filled with the template's default, so you can press Enter through the ones you don't care about. It then writes a ready-to-use ~/.mnemoai/config/config.yaml from the matching template. Just run:

mnemoai      # or, from a checkout: PYTHONPATH=src python -m mnemoai

and follow the prompts. You can re-edit the generated file any time to fine-tune models, prompts, and feature toggles.

Manual setup. Prefer to write it yourself? Copy a template (they live inside the package, under src/mnemoai/utils/):

cp src/mnemoai/utils/config.yaml.example src/mnemoai/utils/config.yaml

Edit that config.yaml with your settings. This file is git-ignored to protect your API keys. At minimum, configure your LLM provider.

The config file is resolved in this order (first match wins):

  1. $MNEMOAI_CONFIG: explicit path (handy for switching between provider configs)
  2. ~/.mnemoai/config/config.yaml: user config used by the installed mnemoai command
  3. ~/.mnemoai/config.yaml: legacy pre-subfolder location (still read if present)
  4. <package>/utils/config.yaml: package-relative fallback (used when running from a checkout)
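As a rough illustration of the lookup order above, here is a minimal sketch. The function name resolve_config_path and its package_dir and home parameters are assumptions made for the example, not the loader's real API. It also assumes that a $MNEMOAI_CONFIG value pointing at a missing file falls through to the next candidate, which the real loader may handle differently.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_config_path(package_dir: Path, home: Optional[Path] = None) -> Optional[Path]:
    """Return the first existing config file in the documented order.

    Hypothetical helper: names and fall-through behavior are assumptions.
    """
    home = home or Path.home()
    candidates: list[Path] = []
    explicit = os.environ.get("MNEMOAI_CONFIG")
    if explicit:
        # 1. Explicit path from the environment.
        candidates.append(Path(explicit).expanduser())
    candidates += [
        home / ".mnemoai" / "config" / "config.yaml",  # 2. user config
        home / ".mnemoai" / "config.yaml",             # 3. legacy location
        package_dir / "utils" / "config.yaml",         # 4. package fallback
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate  # first match wins
    return None
```

Returning None when nothing matches corresponds to the case where the first-run configurator would start.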

On first run mnemoai seeds ~/.mnemoai/config/ and ~/.mnemoai/mcp/ with copies of the bundled examples (config.yaml*.example, mcp.json.example) so you have them to read right next to your live files. If you installed the CLI with uv tool install (the recommended option), put your config in the user location:

# Examples are auto-copied on first run; just copy one to config.yaml and edit:
cp ~/.mnemoai/config/config.yaml.example        ~/.mnemoai/config/config.yaml
# or, for Bedrock / Mantle:
# cp ~/.mnemoai/config/config.yaml.bedrock.example        ~/.mnemoai/config/config.yaml
# cp ~/.mnemoai/config/config.yaml.bedrock.mantle.example ~/.mnemoai/config/config.yaml

At minimum, configure your LLM provider:

For Ollama (quickest setup):

# Pull a model first
ollama pull qwen3:4b
# utils/config.yaml (minimal)
MODEL_ID:
  NAME: qwen3:4b
  TYPE: ollama
  HOST: localhost
  PORT: 11434
  TEMPERATURE: 0.6

# Profile name (used for session data isolation)
PROFILE:
  NAME: default

# Everything else can be left at defaults or disabled
ENABLE_RAG: false
ENABLE_EPISODIC_MEMORY: false
ENABLE_PLAYBOOK: false
ENABLE_WEB_SEARCH: false
ENABLE_WEB_CRAWL: false

See Configuration for all options and Feature Toggles for enabling advanced features.

  5. Run the assistant:

If you installed with uv tool install (recommended), run the command from anywhere:

mnemoai

If you set up a checkout and symlinked the wrapper, the same command works. Otherwise, run it from the repo directory:

PYTHONPATH=src python -m mnemoai

See bash/system-command-app/README.md for details on the wrapper script.

🔀 Feature Toggles

All advanced features can be independently enabled or disabled in your local utils/config.yaml (copied from config.yaml.example). Here is a quick reference:

Feature | Config Key | Default | Dependencies
--- | --- | --- | ---
RAG (document indexing & search) | ENABLE_RAG: true | true | Embedding model (RAG.EMBED_MODEL_ID)
Episodic Memory (learn from past tasks) | ENABLE_EPISODIC_MEMORY: true | true | Embedding model (RAG.EMBED_MODEL_ID)
ACE Playbook (learn strategies from success/failure) | ENABLE_PLAYBOOK: true | true | None (embeddings optional for refinement)
User Profiling (personalized responses) | PROFILE.USE_PROFILING: true | true | Activates after 5+ interactions
Web Search | ENABLE_WEB_SEARCH: true | true | BRAVE_API_KEY configured
Web Crawler | ENABLE_WEB_CRAWL: true | true | None
Vision (image analysis) | Configure VISION_MODEL_ID | Disabled if not set | Vision-capable model
Bash Confirmation (prompt before each shell command) | REQUIRE_BASH_CONFIRMATION: true | true | None (auto-skips when non-interactive)
Write Confirmation (prompt before each file write) | REQUIRE_WRITE_CONFIRMATION: true | true | None (auto-skips when non-interactive)
Verbose Mode (show thinking process) | CLI flag --no-verbose | Enabled | Supported by model

Dependency note: RAG, Episodic Memory, and ACE Playbook refinement all require a working embedding model. If the embedding model is unavailable, the system falls back to SHA256-based deterministic embeddings with degraded semantic search quality. Configure RAG.EMBED_MODEL_ID in config.yaml to use a real embedding model (see Embeddings Model).
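To see why hash-based embeddings degrade semantic search, consider this minimal sketch of a SHA256-derived vector. It is only an illustration: the function name sha256_embedding, the dimension, and the byte-to-float mapping are assumptions, and the project's actual fallback may differ.

```python
import hashlib
import math


def sha256_embedding(text: str, dim: int = 64) -> list[float]:
    """Map text to a deterministic unit-length vector derived from SHA-256.

    Illustrative only: dimension and byte-to-float mapping are assumptions.
    """
    values: list[float] = []
    counter = 0
    while len(values) < dim:
        # Hash the text with a counter prefix to get as many bytes as needed.
        digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
        # Each digest byte (0..255) becomes a float in [-1.0, 1.0].
        values.extend(b / 127.5 - 1.0 for b in digest)
        counter += 1
    values = values[:dim]
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values]
```

Identical strings always map to the same vector, so exact repeats still match. A one-character change produces an unrelated vector, so similarity between such vectors carries no meaning about how related two texts are.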

💡 Usage

Basic Chat

Simply type your questions and press Enter. The assistant will respond using available tools when needed.

You: What files are in the current directory?
Assistant: [Uses fs_read tool to list directory contents]

You: Read the README.md file
Assistant: [Uses fs_read tool and displays content]

Commands

Command | Description
--- | ---
/exit or /quit | Exit the application
/clear | Clear conversation history and RAG index
/save | Save current conversation
/load <path> | Load a saved conversation
/good | Mark last response as good (for SFT training)
/compact [focus] | Summarize older turns to shrink context (optional focus instructions)
/config | Re-run the interactive configurator (overwrites config.yaml, then restarts the app in place to apply)
/model | Override just one model (chat/LLM, vision, or embeddings), leaving the rest of config.yaml untouched, then restart in place
/params | Tune a model's inference parameters (temperature, top_p, top_k, penalties, reasoning, stop, stream, …); only the params the chosen provider supports are offered, then restart in place
/mcp | List the configured MCP servers (built-in + any from mcp.json), their connection status, and tool counts

Keyboard Shortcuts

  • Ctrl+J: Insert new line in input
  • Enter: Submit message
  • Ctrl+C: Interrupt operation (press twice to exit)

Verbose Mode

Control thinking process visibility:

mnemoai              # Verbose mode (shows thinking)
mnemoai --no-verbose # Hide thinking process
# from a checkout: PYTHONPATH=src python -m mnemoai [--no-verbose]

Component Breakdown

1. Client Layer (client/)

The client manages the conversation flow and user interaction.

  • client.py: Core LangGraph client
    • Initializes MCP connection
    • Manages conversation state
    • Handles model configuration
    • Coordinates managers (profile, conversation)
  • agent.py: LangGraph agent implementation
    • State graph with agent and tools nodes
    • Streaming support with reasoning display
    • Code syntax highlighting
  • router.py: Query classifier and routing
    • Classifies queries into categories (simple_qa, code, research, knowledge, full)
    • Routes each category to a specialized tool subset
    • Configurable classifier prompt via ROUTING_PROMPT in config
  • orchestrator.py: Task decomposition and worker orchestration
    • Decomposes complex tasks into ordered subtasks with category assignments
    • Configurable orchestrator and aggregator prompts via config
  • reasoning_utils.py: Shared reasoning/thinking helpers
    • Temporarily disables reasoning for auxiliary LLM calls (routing, task decomposition) so output lands in the response content
    • Extracts visible text from <think> tags and Bedrock thinking blocks
  • mcp_tool_wrapper.py: MCP to LangChain adapter
    • Wraps MCP tools as LangChain BaseTool
    • Handles async/sync conversion
  • ui/: User interface components
    • chat_interface.py: Interactive chat loop with command handling
    • spinner.py: Loading animations
  • managers/: Business logic
    • agent_conversation_manager.py: Conversation state and token tracking
    • user_profile_manager.py: Automatic user profiling and learning
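For the reasoning_utils.py entry above, the <think> handling can be sketched as follows. This assumes "visible text" means the content outside the <think> tags. The helper name visible_text is hypothetical, and Bedrock thinking blocks, which arrive as structured content rather than tags, are not covered.

```python
import re

# Non-greedy match so several <think> sections are removed independently;
# DOTALL lets a section span multiple lines.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def visible_text(raw: str) -> str:
    """Drop <think>...</think> sections and return the remaining text."""
    return _THINK_BLOCK.sub("", raw).strip()
```

For example, visible_text("<think>plan</think>\nDone.") leaves only the final answer text.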

2. Server Layer (server/)

MCP server that provides tools to the LLM.

  • server.py: FastMCP server initialization
  • error_handler.py: @tool_error_handler decorator (shared by all tools)
  • tools/: Tool implementations
    • tools_manager.py: Centralized tool registration and utilities
    • fs_read.py: File reading (text, CSV, JSON, PDF, DOCX)
    • fs_write.py: File writing (dry-run preview); writes are hard-gated client-side by REQUIRE_WRITE_CONFIRMATION
    • file_edit.py: Precise string replacement with validation and uniqueness checking
    • execute_bash.py: Shell command execution with intelligent error handling
    • file_search.py: Fast file/content search (glob patterns + ripgrep)
    • todo_manager.py: Todo list management for multi-step tasks
    • web_search.py: Brave Search integration
    • web_crawler.py: Web page content extraction with RAG integration
    • describe_image.py: Vision model image analysis
    • rag_tool.py: RAG tools registration
    • rag/: RAG system
      • session.py: Session-scoped RAG management with hybrid search
      • vector_store_controller.py: Vector store abstraction layer
      • faiss_store.py: FAISS vector store implementation
      • chroma_store.py: ChromaDB vector store implementation
    • readers/: Specialized file readers
      • line_reader.py, directory_reader.py, search_reader.py
      • csv_reader.py, json_reader.py
      • pdf_reader.py, docx_reader.py
      • chunking_helper.py: Document chunking for RAG

3. Models Layer (models/)

Model controllers and custom implementations.

  • provider_params.py: Single source of truth for the config keys each provider consumes (per modality); controllers build their client kwargs from it via build_kwargs, and /model prunes unsupported keys from it
  • mantle_factory.py: Bedrock Mantle factory (chat_completions / responses / anthropic protocols), shared by the LLM and vision controllers
  • controllers/ (provider-dispatching model initialization):
    • base_model_controller.py: Minimal shared base type for the controllers
    • llm_controller.py: LLM model initialization (Bedrock, Mantle, Ollama, OpenAI, Anthropic, SageMaker AI, LiteLLM)
    • vision_model_controller.py: Vision model initialization
    • embeddings_controller.py: Embedding model initialization for RAG
  • chat_models/ (concrete LangChain ChatModel subclasses):
    • chat_ollama_wrapper.py: Extends ChatOllama with presence_penalty and frequency_penalty support
    • sagemaker_chat.py: Full LangChain BaseChatModel for SageMaker endpoints (streaming, tool calling, reasoning)

4. Utils Layer (utils/)

Shared utilities and configuration.

  • config.py: Configuration loader
  • configurator.py: First-run interactive setup (when no config resolves) and the /config (full reconfigure) and /model (override one model section) chat commands
  • paths.py: Central path helper, the single source of truth for the app home (~/.mnemoai, override with $MNEMOAI_HOME) and all runtime subdirectories (config, plans, tasks, per-profile, per-model)
  • config.yaml.example: Configuration template (copy to config.yaml and add your settings; .bedrock and .bedrock.mantle variants also provided)
  • bm25.py: Lightweight BM25 implementation for hybrid (semantic + keyword) search
  • logger.py: Logging utilities (stderr output)
  • formatting/: Text formatting
    • code_formatter.py: Code syntax highlighting
    • url_formatter.py: URL highlighting
    • response_parser.py: Response processing

Data Flow

  1. User Input → ChatInterface → LangGraphClient
  2. Client → Invokes LangGraph agent with MCP tools
  3. Classifier → Routes query to a category (simple_qa, code, research, knowledge, full) (if routing enabled)
  4. Orchestrator → For full tasks: decomposes into subtasks, spawns workers, aggregates results (if orchestration enabled)
  5. LangGraph → Executes agent node with route-specific tools, decides to use tools
  6. MCP Server → Executes tool (e.g., fs_read, web_search, RAG)
  7. Tool Result → Returned to agent via tools node
  8. LangGraph → Continues agent loop until response complete
  9. Response → Displayed to user via ChatInterface

Session Management

Each chat session has a unique ID used for:

  • RAG document indexing (session-scoped)
  • Chunk caching for file summarization
  • Training data collection (SFT markers)

Session data is stored in ~/.mnemoai/{profile_name}/:

~/.mnemoai/
└── {profile_name}/
    ├── conversations/           # Saved conversations
    ├── profiles/                # User profiles
    ├── todos/                   # Todo list data
    ├── rag_session_id.txt       # Current RAG session
    ├── rag_store_*.faiss        # FAISS vector index (or ChromaDB directory)
    ├── chunk_cache_*.db         # SQLite chunk cache
    └── models/                  # Per-model memory (isolated by chat model)
        └── {sanitized_model}/   # e.g. global.anthropic.claude-fable-5
            ├── episodic_memory/ # Episodic memory store (FAISS or ChromaDB)
            └── playbook/        # ACE playbook strategies and metrics

Model-scoped memory: episodic memory and the playbook live under models/{model}/ so trying a different chat model doesn't contaminate the memory/strategies learned with another. Conversations, todos, RAG, and the user profile remain shared across models.
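The layout above can be expressed with pathlib. This is a sketch under assumptions: app_home and model_memory_dirs are hypothetical names, and the model name is expected to be sanitized already, since the sanitization rule is not described here.

```python
import os
from pathlib import Path


def app_home() -> Path:
    """App home: $MNEMOAI_HOME if set, otherwise ~/.mnemoai."""
    override = os.environ.get("MNEMOAI_HOME")
    return Path(override).expanduser() if override else Path.home() / ".mnemoai"


def model_memory_dirs(profile: str, sanitized_model: str) -> dict[str, Path]:
    """Return the per-model memory directories for one profile.

    Hypothetical helper; expects an already-sanitized model name.
    """
    base = app_home() / profile / "models" / sanitized_model
    return {
        "episodic_memory": base / "episodic_memory",
        "playbook": base / "playbook",
    }
```

Because both directories hang off models/{sanitized_model}/, switching the chat model yields a different base path, while conversations and todos stay at the profile level.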

Context Compaction

To keep long conversations within the model's context window, the assistant compacts history by summarizing it:

  • Automatic: after a turn pushes the conversation past MAX_CONVERSATION_TOKENS, older messages are summarized into the system prompt while the most recent LLM.KEEP_RECENT_MESSAGES turns are kept verbatim.
  • Manual: run /compact any time (optionally /compact <focus instructions> to steer what the summary emphasizes). Manual compaction keeps a smaller recent window (LLM.MANUAL_COMPACT_KEEP_RECENT).

The kept-verbatim window is bounded by both a message count and a token budget (LLM.KEEP_RECENT_TOKEN_BUDGET, default 25% of MAX_CONVERSATION_TOKENS). Walking newest→oldest, a message that would exceed the budget is summarized instead of kept, so a single oversized recent message (e.g. a pasted document that alone fills the context window) cannot survive compaction verbatim.
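The window selection just described can be sketched in a few lines. The function name, the plain-string messages, and the whitespace token counter are assumptions for illustration. The sketch also assumes the walk stops at the first message that does not fit, so the kept window stays contiguous; the real implementation may differ.

```python
from typing import Callable


def split_for_compaction(
    messages: list[str],
    keep_recent: int,
    token_budget: int,
    count_tokens: Callable[[str], int] = lambda text: len(text.split()),
) -> tuple[list[str], list[str]]:
    """Return (to_summarize, kept_verbatim), both in chronological order."""
    kept: list[str] = []
    used = 0
    # Walk newest -> oldest, keeping messages while both limits hold.
    for message in reversed(messages):
        cost = count_tokens(message)
        if len(kept) >= keep_recent or used + cost > token_budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()
    # Everything older than the kept window goes to the summarizer.
    return messages[: len(messages) - len(kept)], kept
```

With a budget of 4 tokens, a newest message of 6 tokens is not kept at all, which mirrors the oversized-message case above.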

The summary preserves topics, decisions, and tool calls/results (which tools ran, their inputs, and outcomes), so the agent retains actionable context after compacting.

🚀 Productivity Tools

The assistant includes specialized tools for efficient code and file manipulation:

📋 Todo List Management

Track multi-step tasks with automatic status management:

Tools:

  • todo_write(todos): Update the todo list
  • todo_read(): View current todos
  • todo_clear(): Clear all todos

Features:

  • Three states: pending, in_progress, completed
  • Enforces exactly ONE task in progress at a time
  • Real-time progress tracking
  • Stored in ~/.mnemoai/{profile}/todos/current_todos.json
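A minimal sketch of the single-in-progress rule follows. The todo schema (dicts with content and status keys) and the function name validate_todos are assumptions, and the check is read here as "no more than one task in progress", which also permits a list where everything is pending or completed.

```python
VALID_STATES = {"pending", "in_progress", "completed"}


def validate_todos(todos: list[dict[str, str]]) -> None:
    """Raise ValueError if a state is unknown or several tasks are in progress.

    Hypothetical schema: each todo has "content" and "status" keys.
    """
    for todo in todos:
        if todo.get("status") not in VALID_STATES:
            raise ValueError(f"unknown status: {todo.get('status')!r}")
    active = [t.get("content", "") for t in todos if t["status"] == "in_progress"]
    if len(active) > 1:
        raise ValueError(f"only one task may be in_progress, got: {active}")
```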

Usage Example:

You: Implement user authentication
Assistant: [Creates todos for: database setup, API endpoints, frontend integration, testing]
Assistant: [Marks first todo as in_progress]
Assistant: [Completes each step, updating todos in real-time]

🔎 Fast Search Tools

High-performance file and content searching:

Glob Search (File Names)

Find files by name patterns:

glob_search(pattern="**/*.py")  # All Python files recursively
glob_search(pattern="src/**/*.ts", max_results=100)  # TypeScript in src/
glob_search(pattern="test_*.py", sort_by_mtime=False)  # Unsorted for speed

Parameters:

  • pattern: Glob pattern (e.g., **/*.py, *.{yaml,json})
  • path: Directory to search (default: current directory)
  • max_results: Limit results (default: 1000, use 0 for unlimited)
  • sort_by_mtime: Sort by modification time (default: True)

Performance: Best for project/codebase searches. For system-wide searches (entire home directory), the assistant automatically uses the find command instead.

Grep Search (File Content)

Search within file contents using ripgrep:

grep_search(pattern="class Foo")  # Find class definitions
grep_search(pattern="TODO|FIXME", file_pattern="*.py", case_insensitive=True)
grep_search(pattern="import React", output_mode="content")  # Show matched lines

Parameters:

  • pattern: Regex pattern to search for
  • path: Directory to search (default: current directory)
  • file_pattern: Filter by file type (e.g., *.py, *.{ts,tsx})
  • case_insensitive: Case-insensitive search (default: False)
  • output_mode: files_with_matches (default), content, or count
  • context_lines: Lines of context around matches
  • max_results: Maximum matches per file (default: 100)

Requirements: ripgrep must be installed (see Installation section).

Performance: 10-100x faster than traditional grep for large codebases.

✏️ Precise File Editing

Safe string replacement with validation:

file_edit(
    file_path="/path/to/file.py",
    old_string="def old_function():\n    pass",
    new_string="def new_function():\n    return True",
    replace_all=False  # Requires uniqueness (default)
)

Safety Features:

  • Validates file exists before editing
  • Checks that old_string exists in file
  • Enforces uniqueness (prevents accidental multiple replacements)
  • Provides detailed error messages with troubleshooting steps
  • Returns line count changes

Best Practice Workflow:

  1. Read the file first with fs_read
  2. Copy the EXACT text you want to replace (including whitespace)
  3. Create the new version with your changes
  4. Call file_edit with exact strings

Error Handling: If the string isn't unique, the tool provides the line numbers where it appears so you can add more context.
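
The uniqueness check described above can be sketched in a few lines. This is an illustration only: `replace_unique` is a hypothetical helper that works on an in-memory string rather than a file, and the error wording is invented.

```python
from typing import List, Tuple


def replace_unique(text: str, old: str, new: str,
                   replace_all: bool = False) -> Tuple[str, List[int]]:
    """Return (new_text, start_lines). Raises ValueError when `old` is missing
    or ambiguous, reporting the 1-based lines where matches start."""
    if not old:
        raise ValueError("old string must not be empty")
    lines = []
    start = text.find(old)
    while start != -1:
        # Line number = newlines before the match + 1.
        lines.append(text.count("\n", 0, start) + 1)
        start = text.find(old, start + len(old))
    if not lines:
        raise ValueError("old string not found")
    if len(lines) > 1 and not replace_all:
        raise ValueError(f"old string is not unique; found at lines {lines}")
    return text.replace(old, new), lines
```

Reporting the line numbers lets the caller widen `old` with surrounding context until exactly one match remains.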

🛡️ Enhanced Error Handling

All tools now provide intelligent error messages with troubleshooting guidance:

Example Error Response:

{
  "error": true,
  "error_type": "FileNotFoundError",
  "message": "File or directory not found: /path/to/file.txt",
  "next_steps": [
    "Verify the file path is correct",
    "Use glob_search to find files by pattern",
    "Check with execute_bash('ls -la /parent/dir')",
    "Ensure you have read permissions"
  ],
  "original_error": "..."
}

Handled Error Types:

  • FileNotFoundError
  • PermissionError
  • IsADirectoryError
  • JSONDecodeError
  • Encoding errors
  • Command execution errors
  • Timeout errors
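
One plausible way to produce responses of this shape is a decorator that converts exceptions into the structured dictionary. The sketch below assumes that approach; `with_error_guidance`, `NEXT_STEPS`, and `read_text` are hypothetical names, and the hint lists are abbreviated.

```python
import functools
from typing import Any, Callable, Dict, List

# Hypothetical hints, keyed by exception class name.
NEXT_STEPS: Dict[str, List[str]] = {
    "FileNotFoundError": ["Verify the file path is correct",
                          "Use glob_search to find files by pattern"],
    "PermissionError": ["Ensure you have read permissions"],
}


def with_error_guidance(func: Callable[..., Any]) -> Callable[..., Any]:
    """Turn raised exceptions into a structured error dict."""
    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            error_type = type(exc).__name__
            return {
                "error": True,
                "error_type": error_type,
                "message": str(exc),
                "next_steps": NEXT_STEPS.get(error_type, []),
                "original_error": repr(exc),
            }
    return wrapper


@with_error_guidance
def read_text(path: str) -> str:
    """Example tool body: read a UTF-8 text file."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()
```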

Action Confirmation (bash & file writes)

Destructive tools ask for explicit confirmation before they run (Claude Code-style). This covers shell commands (execute_bash) and file modifications (fs_write, file_edit):

▶ Run shell command?
  rm -rf build/
  Proceed? (y/N):

▶ Write to file?
  create ~/script.py
  Proceed? (y/N):

Only an explicit y/yes proceeds; anything else (including Enter) declines, and the model is told the user declined. This is a hard gate enforced client-side: the prompt always fires regardless of what the model does, because the client owns the terminal (the MCP server is a piped subprocess and can't prompt). For fs_write only the actual write is gated, not its dry_run preview.

  • Toggles: REQUIRE_BASH_CONFIRMATION and REQUIRE_WRITE_CONFIRMATION (both default true). Set either to false for trusted/automation setups.
  • Non-interactive runs (no TTY, such as tests, pipes, or CI) auto-proceed so they don't hang.
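
The gate logic described above fits in one small function. This is a sketch, not the client's code: `confirm` and its parameters are hypothetical, `enabled` stands in for the two toggles, and accepting upper-case answers is an assumption.

```python
import sys
from typing import Callable


def confirm(action: str, detail: str, enabled: bool = True,
            ask: Callable[[str], str] = input,
            is_tty: Callable[[], bool] = lambda: sys.stdin.isatty()) -> bool:
    """Return True when the action may proceed."""
    if not enabled:        # e.g. REQUIRE_BASH_CONFIRMATION: false
        return True
    if not is_tty():       # tests, pipes, CI: auto-proceed instead of hanging
        return True
    answer = ask(f"{action}\n  {detail}\n  Proceed? (y/N): ")
    # Only an explicit y/yes proceeds; Enter or anything else declines.
    return answer.strip().lower() in {"y", "yes"}
```

Passing `ask` and `is_tty` as parameters keeps the function testable without a real terminal.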

🛡️ Git Safety

Safe git operations with protection against common mistakes:

Tools:

  • git_safe(command="...") - Execute git commands with safety checks
  • git_status_safe() - Comprehensive status with warnings
  • git_commit_safe(message="...", add_all=True) - Safe commits with staging

Protected Operations:

| Operation | Protection |
| --- | --- |
| Force push to main/master | Blocked |
| git reset --hard | Warning + confirmation required |
| git push --force | Warning (use --force-with-lease) |
| git commit --amend | Checks if already pushed |
| Skip hooks (--no-verify) | Warning |
| Force delete branch (-D) | Warning |

Example:

# Safe - uses git_safe with protections
git_safe(command="push origin feature-branch")

# Dangerous - requires confirmation
git_safe(command="reset --hard HEAD~1", allow_dangerous=True, reason="Discarding failed experiment")
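
The protection rules can be thought of as a classifier over the command's arguments. The sketch below is an assumption about how such checks might look, not the logic inside git_safe: `classify_git_command` and its verdict names are hypothetical, and the --amend check is omitted because it needs a real repository.

```python
import shlex
from typing import Tuple


def classify_git_command(command: str) -> Tuple[str, str]:
    """Return (verdict, reason); verdict is allow, warn, confirm, or block."""
    args = shlex.split(command)
    if not args:
        return "allow", ""
    sub = args[0]
    force = "--force" in args or "-f" in args
    if sub == "push" and force and any(a in ("main", "master") for a in args):
        return "block", "force push to main/master"
    if sub == "push" and force:
        return "warn", "prefer --force-with-lease"
    if sub == "reset" and "--hard" in args:
        return "confirm", "reset --hard discards local changes"
    if "--no-verify" in args:
        return "warn", "skipping hooks"
    if sub == "branch" and "-D" in args:
        return "warn", "force-deleting a branch"
    return "allow", ""
```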

Plan Mode

Implementation planning workflow for complex tasks:

Workflow:

  1. enter_plan_mode(task_description="Add user authentication")
  2. Explore codebase with search tools
  3. add_plan_step(step_number=1, title="Create user model", description="...")
  4. add_plan_file(file_path="models/user.py", action="create")
  5. add_plan_risk(risk="Migration needed", mitigation="Add migration script")
  6. present_plan() - Show user for approval
  7. approve_plan() + exit_plan_mode() - Start implementing

When to Use:

  • New feature with multiple files
  • Architectural decisions needed
  • Multi-step refactoring
  • Unclear requirements

Plan Storage: ~/.mnemoai/plans/current_plan.json

Task Output: ~/.mnemoai/tasks/

🔄 Background Tasks

Run long operations in parallel without blocking:

Tools:

  • start_background_task(command="...", description="...") - Start task
  • get_task_status(task_id="...") - Check progress
  • get_task_output(task_id="...") - Get output
  • list_background_tasks() - See all tasks
  • cancel_background_task(task_id="...") - Stop task
  • wait_for_task(task_id="...", timeout_seconds=300) - Wait for completion

When to Use:

  • Running full test suites
  • Building large projects
  • Installing dependencies
  • Running linters on entire codebase
  • Any command > 30 seconds

Example:

# Start tests in background
result = start_background_task(command="pytest", description="Running tests")
# Returns: {"task_id": "abc123", ...}

# Check status later
get_task_status(task_id="abc123")

# Get output when done
get_task_output(task_id="abc123", tail_lines=50)

Task Storage: Output logs saved to ~/.mnemoai/tasks/
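
For readers curious how a non-blocking task runner of this kind can work, here is a minimal thread-based sketch. It is not the project's implementation: `start_task`, `wait_for`, the in-memory `_tasks` registry, and the status strings are all assumptions, and output is held in memory instead of being written to a log file.

```python
import subprocess
import threading
import uuid
from typing import Dict, List

_tasks: Dict[str, dict] = {}  # hypothetical in-memory task registry


def start_task(argv: List[str]) -> str:
    """Run argv in a worker thread and return a task id immediately."""
    task_id = uuid.uuid4().hex[:8]
    task = {"status": "running", "output": "", "done": threading.Event()}
    _tasks[task_id] = task

    def run() -> None:
        try:
            proc = subprocess.run(argv, capture_output=True, text=True)
            task["output"] = proc.stdout + proc.stderr
            task["status"] = "completed" if proc.returncode == 0 else "failed"
        except OSError as exc:  # e.g. the executable does not exist
            task["output"] = str(exc)
            task["status"] = "failed"
        finally:
            task["done"].set()

    threading.Thread(target=run, daemon=True).start()
    return task_id


def wait_for(task_id: str, timeout_seconds: float = 300) -> dict:
    """Block until the task finishes or the timeout elapses."""
    task = _tasks[task_id]
    task["done"].wait(timeout_seconds)
    return {"status": task["status"], "output": task["output"]}
```

If the timeout elapses first, the returned status is still "running", so the caller can poll again later.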

🔧 Configuration

Model Configuration

The assistant supports multiple model types:

Amazon Bedrock

MODEL_ID:
  NAME: us.amazon.nova-pro-v1:0
  TYPE: bedrock
  REGION: us-east-1
  TEMPERATURE: 0.1

Note: Newer Claude models on Bedrock reject temperature as deprecated. Omit TEMPERATURE for those; it is only sent when explicitly configured.

Using a named AWS profile (Bedrock, SageMaker, Mantle). These providers use the standard boto3 credential chain (default profile / env vars / instance role). To select a specific named profile instead, set AWS_PROFILE via the config ENV: section. Values there are exported as environment variables at startup, and boto3 picks them up automatically. No model-level config key is needed:

ENV:
  AWS_PROFILE: my-bedrock-profile
  # AWS_REGION: us-east-1   # any AWS env var works here too
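
Exporting the ENV: section amounts to copying its keys into the process environment before any AWS client is created. A minimal sketch, assuming the config has already been parsed into a dictionary; `export_env` is a hypothetical name, and whether existing variables are overwritten in the real app is not stated here (this sketch overwrites them).

```python
import os
from typing import Any, Mapping, MutableMapping


def export_env(config: Mapping[str, Any],
               environ: MutableMapping[str, str] = os.environ) -> None:
    """Copy every key under ENV into the environment as a string."""
    for key, value in (config.get("ENV") or {}).items():
        environ[str(key)] = str(value)
```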

Using a Bedrock API key (instead of AWS credentials). Bedrock supports short-term API keys (a bedrock-api-key-... value from the console). For standard Bedrock (TYPE: bedrock), set it as AWS_BEARER_TOKEN_BEDROCK; langchain-aws reads it automatically, so no model config is needed:

ENV:
  AWS_BEARER_TOKEN_BEDROCK: bedrock-api-key-XXXXXXXX

(For Mantle, the same key is supplied differently; see the Mantle section below.)

Amazon Bedrock Mantle

Bedrock Mantle is an OpenAI-compatible API (not the Bedrock Converse API). By default it authenticates with a short-lived bearer token minted from your standard AWS credentials via aws-bedrock-token-generator, so your normal aws configure / SSO setup works with no extra keys to manage. Use TYPE: mantle and a bare model ID from the Mantle catalog.

MODEL_ID:
  NAME: qwen.qwen3-32b # bare Mantle model id (e.g. anthropic.claude-opus-4-8)
  TYPE: mantle
  REGION: us-east-1
  MAX_TOKENS: 8192

Authenticating with a Bedrock API key (no AWS credentials). Instead of minting a token, you can supply a short-term Bedrock API key directly. Mantle reads it from the BEDROCK_API_KEY environment variable (set it via the config ENV: section), or from a per-model API_KEY field. When a key is present it's used as-is; otherwise the app falls back to minting from AWS credentials. (Note: standard Bedrock uses AWS_BEARER_TOKEN_BEDROCK for the same key, while Mantle uses BEDROCK_API_KEY.)

# Option A: environment variable (applies to all Mantle calls)
ENV:
  BEDROCK_API_KEY: bedrock-api-key-XXXXXXXX

# Option B: per-model key
MODEL_ID:
  NAME: qwen.qwen3-32b
  TYPE: mantle
  REGION: us-east-1
  API_KEY: bedrock-api-key-XXXXXXXX

API protocols. Mantle serves models under three protocols. Select with API_PROTOCOL (works for both chat and vision):

  • chat_completions (default): base /v1, OpenAI Chat Completions API. Most models (Qwen, Gemma, GPT-OSS, DeepSeek, …).
  • responses: base /openai/v1, OpenAI Responses API. Required by models that only expose Responses, such as openai.gpt-5.4.
  • anthropic: base /anthropic, Anthropic Messages API. For Claude models (e.g. anthropic.claude-haiku-4-5).

# OpenAI Responses model (e.g. GPT-5.4)
MODEL_ID:
  NAME: openai.gpt-5.4
  TYPE: mantle
  REGION: us-west-2 # gpt-5.4 is in us-west-2, not us-east-1
  API_PROTOCOL: responses
  MAX_TOKENS: 8192

# Anthropic Claude model
MODEL_ID:
  NAME: anthropic.claude-haiku-4-5
  TYPE: mantle
  REGION: us-east-1
  API_PROTOCOL: anthropic
  MAX_TOKENS: 8192

  • ENDPOINT_URL is optional; it defaults to https://bedrock-mantle.<REGION>.api.aws/{v1 | openai/v1 | anthropic} depending on the protocol.
  • The Mantle catalog (Qwen, Mistral, DeepSeek, GLM, Gemma, Claude, GPT-5.4, …) differs from standard Bedrock and varies by account/region.
  • TYPE: mantle works for both MODEL_ID (chat) and VISION_MODEL_ID (image description); vision-capable models like qwen.qwen3-vl-235b-a22b-instruct are supported.
  • Caveats: Pick the right API_PROTOCOL per model (using the wrong one returns a 400 "does not support the '/v1/…' API" error). anthropic requires the langchain-anthropic package (in requirements.txt). Models like anthropic.claude-fable-5 also require the account's data-retention mode to be provider_data_share, otherwise they report unavailable.
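
The default endpoint rule stated above (region plus a protocol-specific path) can be written out as a small helper. This is an illustration of that rule only; `mantle_endpoint` and `PROTOCOL_PATHS` are hypothetical names, not part of the app.

```python
PROTOCOL_PATHS = {
    "chat_completions": "v1",
    "responses": "openai/v1",
    "anthropic": "anthropic",
}


def mantle_endpoint(region: str = "us-east-1",
                    api_protocol: str = "chat_completions") -> str:
    """Build the default Mantle base URL for a region and API protocol."""
    try:
        path = PROTOCOL_PATHS[api_protocol]
    except KeyError:
        raise ValueError(f"unknown API_PROTOCOL: {api_protocol!r}") from None
    return f"https://bedrock-mantle.{region}.api.aws/{path}"
```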

For standard Bedrock (Converse API), ENDPOINT_URL is also accepted on MODEL_ID/VISION_MODEL_ID with TYPE: bedrock to override the default endpoint.

Ollama (Local)

MODEL_ID:
  NAME: qwen3-4b-thinking-2507-q6-k:latest
  TYPE: ollama
  HOST: localhost
  PORT: 11434
  REPETITION_PENALTY: 1.1
  PRESENCE_PENALTY: 1.5
  TEMPERATURE: 0.1
  TOP_P: 0.95

OpenAI

MODEL_ID:
  NAME: gpt-5-mini-2025-08-07
  TYPE: openai
  STREAM: true
  REASONING_EFFORT: medium
# Requires OPENAI_API_KEY environment variable

Anthropic (Claude API)

The direct Anthropic API (api.anthropic.com) via langchain-anthropic. This is distinct from the Bedrock Mantle anthropic protocol (which reaches Claude through Bedrock); TYPE: anthropic talks to Anthropic directly. STOP maps to Anthropic's stop_sequences, and extended thinking is enabled with REASONING (+ optional REASONING_EFFORT / THINKING_TOKENS).

MODEL_ID:
  NAME: claude-opus-4-8
  TYPE: anthropic
  MAX_TOKENS: 4096
  TEMPERATURE: 0.4
  # REASONING: true          # enable extended thinking
  # REASONING_EFFORT: high   # low | medium | high | max
  # ENDPOINT_URL: https://...  # optional custom base URL
# Requires ANTHROPIC_API_KEY env var, or set MODEL_ID.API_KEY

Amazon SageMaker AI

MODEL_ID:
  NAME: your-endpoint-name
  TYPE: sagemaker
  REGION: us-east-1
  REPETITION_PENALTY: 1.1
  PRESENCE_PENALTY: 1.5
  TEMPERATURE: 0.1
  MAX_TOKENS: 4096

LiteLLM (100+ Providers)

MODEL_ID:
  NAME: openai/your-model-name
  TYPE: litellm
  API_BASE: http://localhost:8000/v1
  API_KEY: your-api-key
  TEMPERATURE: 0.1
  MAX_TOKENS: 4096

Vision Model Configuration

For Bedrock:

VISION_MODEL_ID:
  NAME: global.anthropic.claude-haiku-4-5-20251001-v1:0
  TYPE: bedrock
  REGION: us-east-1
  TEMPERATURE: 0.3

For Ollama:

VISION_MODEL_ID:
  NAME: qwen3-vl:2b
  TYPE: ollama
  HOST: localhost
  PORT: 11434
  TEMPERATURE: 0.3

For OpenAI:

VISION_MODEL_ID:
  NAME: gpt-5-mini-2025-08-07
  TYPE: openai
  STREAM: true
  REASONING_EFFORT: medium

For Anthropic (Claude is multimodal):

VISION_MODEL_ID:
  NAME: claude-opus-4-8
  TYPE: anthropic
  MAX_TOKENS: 1500
  TEMPERATURE: 0.3
# Requires ANTHROPIC_API_KEY env var, or set VISION_MODEL_ID.API_KEY

For SageMaker AI (endpoint must serve a vision-capable model accepting the OpenAI image format):

VISION_MODEL_ID:
  NAME: your-endpoint-name
  TYPE: sagemaker
  REGION: us-east-1
  INPUT_FORMAT: openai_chat
  TEMPERATURE: 0.3

For LiteLLM (any of its vision-capable models):

VISION_MODEL_ID:
  NAME: openai/gpt-4o # provider-prefixed model id
  TYPE: litellm
  API_BASE: http://localhost:4000 # optional (proxy / self-hosted)
  API_KEY: your-api-key # optional (else the provider's env var)

Model Parameters

This is the full reference for what you can put under MODEL_ID, VISION_MODEL_ID, and RAG.EMBED_MODEL_ID. Only NAME and TYPE are required; everything else is optional, and omitted keys fall back to the provider/model default. The interactive configurator (/config, /model) sets the common ones; use this reference to hand-tune config.yaml for anything else a provider or model supports.

Identity, connection & auth

| Parameter | Applies to TYPE | Description |
| --- | --- | --- |
| NAME | all (required) | Model id / Ollama model / Bedrock model id / Mantle bare id / SageMaker endpoint name |
| TYPE | all (required) | ollama, bedrock, mantle, openai, anthropic, sagemaker, litellm (embeddings: ollama, bedrock, openai, sagemaker, litellm) |
| HOST | ollama | Ollama host (default localhost) |
| PORT | ollama | Ollama port (default 11434) |
| REGION | bedrock, mantle, sagemaker | AWS region (default us-east-1) |
| API_PROTOCOL | mantle | chat_completions (default), responses, or anthropic |
| ENDPOINT_URL | bedrock, mantle, anthropic | Override the default endpoint URL (Anthropic: custom base URL) |
| API_KEY | mantle, anthropic, litellm | Mantle: Bedrock API key (else BEDROCK_API_KEY env / minted token). Anthropic: else ANTHROPIC_API_KEY env. LiteLLM: provider key |
| API_BASE | litellm | LiteLLM API base URL |
| INPUT_FORMAT | sagemaker | openai_chat (default) or huggingface |

Standard Bedrock also reads the AWS_BEARER_TOKEN_BEDROCK env var, and all AWS providers honor AWS_PROFILE; see the API-key/profile notes under Amazon Bedrock.

Inference parameters

Optional generation settings. The Honored by column lists the providers that actually send each one (others ignore it). These apply to MODEL_ID and VISION_MODEL_ID; EMBED_MODEL_ID takes none of them (embeddings only use NAME/TYPE + connection).

This table is derived from models/provider_params.py, the single source of truth that the controllers build their client kwargs from, so it reflects exactly what each provider's init path forwards. (mantle reads TEMPERATURE/MAX_TOKENS/TOP_P via the Mantle factory.)

| Parameter | Description | Honored by (MODEL_ID) |
| --- | --- | --- |
| MAX_TOKENS | Max output tokens to generate | ollama, bedrock, mantle, openai, anthropic, sagemaker, litellm |
| TEMPERATURE | Sampling temperature | ollama, bedrock, mantle, openai, anthropic, sagemaker, litellm |
| TOP_P | Top-p (nucleus) sampling | ollama, bedrock, mantle, openai, anthropic, sagemaker, litellm |
| TOP_K | Top-k sampling | ollama, anthropic, sagemaker |
| STOP | Stop sequences (YAML list) | ollama, bedrock, anthropic, sagemaker, litellm |
| STREAM | Stream tokens (default true) | mantle, openai, anthropic, litellm |
| PRESENCE_PENALTY | Presence penalty | ollama, openai |
| FREQUENCY_PENALTY | Frequency penalty | ollama |
| REPETITION_PENALTY | Repetition penalty | ollama, litellm |
| REASONING | Enable extended thinking (boolean) | bedrock, anthropic |
| THINKING_TOKENS | Thinking token budget (default 2048) | bedrock, anthropic |
| REASONING_EFFORT | low/medium/high/max | openai, anthropic (also maps to Bedrock thinking budget) |

VISION_MODEL_ID supports the same seven providers as MODEL_ID. It accepts a subset of params: MAX_TOKENS/TEMPERATURE/TOP_P across providers, plus TOP_K on ollama/anthropic/sagemaker and STOP on ollama/sagemaker. Connection keys follow the provider (host/port, region, Mantle protocol, SageMaker INPUT_FORMAT, LiteLLM/Anthropic API_BASE/API_KEY/base URL).

Provider-appropriate tuning matters. Newer Claude and GPT models reject TEMPERATURE outright; STOP, penalties, and TOP_K are largely Ollama/SageMaker concepts. When /model switches a section's provider, it automatically drops the keys the new provider doesn't consume; for everything else, edit config.yaml to match what your specific provider/model accepts.

The context window is set separately, at the top level (it's not part of a model section): MAX_CONVERSATION_TOKENS (see General Parameters below).

General Parameters

# Context window size (passed to model as num_ctx for Ollama)
MAX_CONVERSATION_TOKENS: 65536

# Maximum tokens when reading documents (CSV, JSON, text files)
DOC_MAX_TOKENS: 16384

# Profile configuration
PROFILE:
  NAME: default # Used for session data isolation (~/.mnemoai/{NAME}/)
  USE_PROFILING: true # Enable automatic user profiling

Embeddings Configuration

Embeddings settings are nested under the RAG section:

RAG:
  EMBEDDINGS:
    CACHE_ENABLED: true # LRU cache for embedding vectors (avoids re-embedding same text)
    CACHE_SIZE: 1000 # Maximum cached embeddings
    FALLBACK_ENABLED: true # Fall back to SHA256 if embedding model unavailable
    FALLBACK_TYPE: "sha256" # Fallback type (sha256, random, zeros)

LLM Interaction Configuration

LLM:
  ENABLE_THINKING: true # Enable thinking tags (verbose mode)
  RETRY_ENABLED: true # Retry failed LLM calls
  MAX_RETRIES: 3 # Maximum retry attempts
  RETRY_DELAY: 1.0 # Seconds between retries
  RETRY_BACKOFF: 2.0 # Exponential backoff multiplier
  SUMMARIZATION_THINK: false # Include thinking in summarization
  TOKEN_COUNTING:
    OLLAMA_APPROXIMATION: 1.3 # Chars-to-tokens multiplier for Ollama
    FALLBACK_MODEL: "gpt-4" # Tiktoken model for fallback counting
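
As an illustration of how the retry settings combine, the sketch below waits RETRY_DELAY before the first retry and multiplies the wait by RETRY_BACKOFF after each failure. `call_with_retry` is a hypothetical helper, and treating MAX_RETRIES as the number of retries after the initial attempt is an assumption.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(call: Callable[[], T], max_retries: int = 3,
                    retry_delay: float = 1.0, retry_backoff: float = 2.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `call`, retrying up to max_retries times with growing delays."""
    delay = retry_delay
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            sleep(delay)
            delay *= retry_backoff
    raise RuntimeError("unreachable")
```

With the defaults shown in the configuration, the waits between attempts would be 1, 2, and 4 seconds.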

System Prompt

The system prompt in config.yaml defines the assistant's behavior. Customize the SYSTEM_PROMPT field to change the assistant's personality, instructions, and tool usage patterns. Key sections in the default prompt:

  • <identity>: Basic identity and core principles
  • <reasoning_discipline>: Thinking rules and loop detection
  • <output_format>: Response formatting requirements
  • <information_sources>: RAG vs web vs internal knowledge decision tree
  • <file_operations>: Read/write/edit workflow rules
  • <search_tools>: Glob and grep usage guidance
  • <git_operations>: Git safety rules
  • <task_management>: Todo, plan mode, and background task rules
  • <error_handling>: Error response guidelines
  • <communication>: Style and security rules

RAG Configuration

ENABLE_RAG: true # Master toggle for RAG system
RAG:
  MAX_TOKENS: 8192 # Threshold: documents above this are ingested into RAG
  CHUNK_TOKENS: 1024 # Chunk size in tokens (recommended: 512-2048)
  SEARCH:
    SEMANTIC_WEIGHT: 0.5 # Semantic similarity weight (0-1)
    KEYWORD_WEIGHT: 0.5 # BM25 keyword weight (0-1)
  VECTOR_STORE:
    TYPE: chromadb # Vector store backend: "faiss" or "chromadb"
  EMBEDDINGS:
    CACHE_ENABLED: true
    CACHE_SIZE: 1000
    FALLBACK_ENABLED: true
    FALLBACK_TYPE: "sha256"

Requires: An embedding model configured via RAG.EMBED_MODEL_ID (see Embeddings Model).

Episodic Memory Configuration

ENABLE_EPISODIC_MEMORY: true
EPISODIC_MEMORY:
  STORE_TYPE: chromadb # or faiss
  # Similarity Thresholds
  DUPLICATE_THRESHOLD: 0.95 # Higher = stricter duplicate detection
  RETRIEVAL_THRESHOLD: 0.7 # Minimum similarity to retrieve episodes
  FOLLOW_UP_THRESHOLD: 0.4 # Similarity to detect follow-up questions (skips injection)
  REDUNDANCY_THRESHOLD: 0.5 # Filter episodes redundant with conversation
  # Hybrid Search Weights
  SEMANTIC_WEIGHT: 0.7 # Semantic similarity weight (0-1)
  KEYWORD_WEIGHT: 0.3 # Keyword matching weight (0-1)
  # Token and Size Limits
  MAX_TOKENS_PER_EPISODE: 400 # Max tokens for episode text
  MAX_EPISODES: 1000 # Maximum stored episodes
  MAX_AGE_DAYS: 90 # Maximum episode age in days
  # Success Detection
  SUCCESS_MARKERS: # Phrases that indicate task success
    - thanks
    - perfect
    - great
    - worked
  CORRECTION_MARKERS: # Phrases that indicate errors
    - wrong
    - error
    - fix
    - actually
  # Storage Behavior
  IMMEDIATE_STORAGE: true # Store episodes immediately
  MIN_TOOLS_OR_LENGTH: 300 # Min response length if no tools used
  # Query Enhancement
  ENABLE_QUERY_EXPANSION: true # Expand queries with synonyms
  QUERY_EXPANSION_TERMS: 3 # Max terms to add per query

Requires: An embedding model configured via RAG.EMBED_MODEL_ID (see Embeddings Model).

How it works:

  • Automatically stores successful task completions with full conversation context
  • Uses hybrid search (70% semantic + 30% BM25) to find similar past tasks
  • Conversation-aware injection: Only injects episodic memory when relevant
    • Detects follow-up questions and skips injection (uses conversation context instead)
    • Filters out episodes redundant with current conversation
    • Uses semantic similarity (with embeddings) or Jaccard similarity (fallback)
  • Injects compact context showing: task → tools used → outcome
  • Automatic cleanup: keeps max 1000 episodes, removes entries older than 90 days

Success detection:

  • User feedback: "thanks", "perfect", "great"
  • No error markers in response
  • All tools executed successfully
  • Filters out simple greetings and short responses

Embeddings Model

All embedding configuration is nested under RAG:

For Bedrock:

RAG:
  EMBED_MODEL_ID:
    NAME: amazon.titan-embed-text-v2:0
    TYPE: bedrock
    REGION: us-east-1

For Ollama:

RAG:
  EMBED_MODEL_ID:
    NAME: mxbai-embed-large
    TYPE: ollama
    HOST: localhost
    PORT: 11434

For OpenAI:

RAG:
  EMBED_MODEL_ID:
    NAME: text-embedding-ada-002
    TYPE: openai

For SageMaker:

RAG:
  EMBED_MODEL_ID:
    NAME: your-endpoint-name
    TYPE: sagemaker
    REGION: us-east-1

For LiteLLM (any of its 100+ providers via one OpenAI-style API):

RAG:
  EMBED_MODEL_ID:
    NAME: openai/text-embedding-3-small # provider-prefixed model id
    TYPE: litellm
    API_BASE: http://localhost:4000 # optional (proxy / self-hosted)
    API_KEY: your-api-key # optional (else the provider's env var)

Vector Store Options:

  • ChromaDB (default): Persistent vector database with built-in metadata support
  • FAISS: Fast, in-memory vector search with disk persistence

Switch between stores by changing RAG.VECTOR_STORE.TYPE in config. The system uses a controller pattern, so all RAG functionality works identically regardless of the store.

📚 Advanced Features

Query Routing

When enabled, the assistant classifies each query before processing it and routes it to a specialized tool subset. This reduces noise for the model and improves response quality.

Categories:

| Route | Description | Tools Available |
| --- | --- | --- |
| simple_qa | Greetings, explanations, general knowledge | None (direct LLM answer) |
| code | File ops, code editing, git, shell commands | fs_read, fs_write, file_edit, bash, git, search, etc. |
| research | Web search, URL fetching | web_search, web_crawler |
| knowledge | Document reading, indexing, RAG queries | pdf/csv/docx/json readers, RAG tools, fs_read |
| full | Multi-category or ambiguous tasks | All tools (fallback) |

How it works:

  1. A lightweight LLM call classifies the query into one of the categories above
  2. The agent node binds only the tools for that category
  3. If a query spans multiple categories, it routes to full (all tools)
  4. The classifier prompt is customizable via ROUTING_PROMPT in config.yaml
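
Step 2 above (binding only the tools for the chosen category) boils down to a lookup followed by a filter. The sketch below shows the idea; `ROUTE_TOOLS` and `tools_for_route` are hypothetical, and the tool subsets are abbreviated examples rather than the app's real lists.

```python
from typing import Dict, List

# Illustrative subsets only; the real per-route tool lists live in the app.
ROUTE_TOOLS: Dict[str, List[str]] = {
    "simple_qa": [],
    "code": ["fs_read", "fs_write", "file_edit", "execute_bash", "git_safe"],
    "research": ["web_search", "web_crawler"],
    "knowledge": ["fs_read", "search_in_documents", "list_documents"],
}


def tools_for_route(route: str, all_tools: List[str]) -> List[str]:
    """Return the tools to bind; 'full' or an unknown route gets everything."""
    if route not in ROUTE_TOOLS:
        return list(all_tools)
    allowed = set(ROUTE_TOOLS[route])
    return [tool for tool in all_tools if tool in allowed]
```

Falling back to every tool for an unrecognized label mirrors the role of the full route as a safe default.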

Configuration:

ENABLE_ROUTING: true
ROUTING_PROMPT: |
  # Custom classifier prompt (optional, has a sensible default)
  ...

Orchestrator-Workers

When enabled alongside routing, tasks classified as full (spanning multiple categories) are automatically decomposed into focused subtasks executed by specialized workers.

How it works:

  1. Orchestrator: An LLM call decomposes the complex query into ordered subtasks, each assigned a category (code, research, knowledge, etc.)
  2. Workers: Each subtask is executed by a worker agent with only the tools for its category. Workers run sequentially; each receives context from previously completed subtasks.
  3. Aggregator: If there were multiple subtasks, a final LLM call synthesizes all worker results into a single coherent response.

Example flow for "Read this PDF and write a summary to a file":

Orchestrator decomposes into:
  [Step 1/2: Read and summarize the PDF document]        → knowledge worker
  [Step 2/2: Write the summary to summary.md]            → code worker
  [Synthesizing results...]                               → aggregator

Configuration:

ENABLE_ROUTING: true # Required
ENABLE_ORCHESTRATION: true # Activates orchestrator for 'full' route
# ORCHESTRATOR_PROMPT: |      # Optional: customize decomposition prompt
# AGGREGATOR_PROMPT: |        # Optional: customize synthesis prompt

When orchestration is disabled, full routes use all tools in a single agent loop (the previous behavior). No regression.

Web Search Configuration

This tool uses the Brave Search API. Obtain an API key from Brave Search Developer Portal.

BRAVE_API_KEY: your-api-key-here # For web search

Web Crawler Configuration

Enable web page content extraction with automatic RAG integration:

ENABLE_WEB_CRAWL: true

When enabled, the web_crawler tool:

  • Extracts content from web pages as markdown
  • Automatically ingests large pages (>8K tokens) into RAG (if enabled)
  • Uses the same chunking configuration as PDF/DOCX readers

Browser dependency. Crawling uses a headless Chromium via Playwright, whose browser binary is a separate ~260MB download not pulled in by pip / uv tool install. The tool installs it automatically on the first crawl after a fresh install/upgrade. If that auto-install fails (e.g. offline), run it manually in the same environment: python -m playwright install chromium (for an installed CLI: ~/.local/share/uv/tools/mnemoai/bin/python -m playwright install chromium).

External MCP Servers

mnemoai always runs its own built-in MCP server (file ops, bash, git, web, RAG, vision, planning). You can add more MCP servers by creating ~/.mnemoai/mcp/mcp.json with the standard mcpServers schema (an mcp.json.example is seeded there on first run). Their tools are merged with the built-in ones and made available to the agent.

{
  "mcpServers": {
    "brave-search": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-brave-search"],
      "env": { "BRAVE_API_KEY": "your_brave_api_key" }
    },
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/dir"],
      "disabled": true
    }
  }
}

Per-server fields: command (required), args (optional list), env (optional; merged over the process environment), and disabled (optional; true skips the server). A template ships at ~/.mnemoai/mcp/mcp.json.example (seeded on first run from the bundled src/mnemoai/utils/mcp.json.example).

Behavior:

  • Additive: the built-in server is always on; external servers run alongside it. Tools from all servers are merged into one list.
  • Resilient: if an external server fails to start (bad command, missing binary, crash), it's logged in red and skipped; the app still runs with the built-in server and any others that connected.
  • No shadowing: if an external tool's name collides with a built-in one, the external tool is exposed as servername__tool so core tools are never overridden (the server is still called with the original tool name).
  • Works with routing & orchestration: external tools are appended to every non-empty query route, and when orchestration is enabled the task decomposer is told which external tools exist and steers subtasks that need them to the full category (which binds every tool). So external tools stay reachable whether routing/orchestration is on or off.
  • Run /mcp in the chat to see configured servers, status, and tool counts.
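
The no-shadowing rule can be illustrated with a small merge function that records, for each exposed name, which server to call and with which original tool name. This is a sketch only: `merge_tools` is hypothetical, and prefixing on a collision with any already-registered tool (not just built-in ones) is an assumption.

```python
from typing import Dict, Iterable, Tuple


def merge_tools(builtin: Iterable[str],
                external: Dict[str, Iterable[str]]) -> Dict[str, Tuple[str, str]]:
    """Map exposed tool name -> (server, original tool name)."""
    merged = {name: ("builtin", name) for name in builtin}
    for server, tools in external.items():
        for tool in tools:
            # On a name collision, expose the external tool as server__tool.
            exposed = f"{server}__{tool}" if tool in merged else tool
            merged[exposed] = (server, tool)
    return merged
```

Keeping the original name in the mapping is what allows the external server to be called with the tool name it actually advertises.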

RAG (Retrieval-Augmented Generation)

The RAG system automatically indexes documents for semantic search with hybrid search (semantic embeddings + BM25 keyword scoring).

How it works:

  1. Read a PDF/DOCX file → Automatically chunked and indexed
  2. Ask questions → Assistant searches indexed documents first using hybrid search
  3. Session-scoped → Cleared on /clear or exit

RAG Tools:

  • list_documents(): Show indexed documents
  • search_in_documents(query, top_k): Hybrid semantic + BM25 search
  • clear_documents(): Clear RAG index

Configuration:

  • RAG.CHUNK_TOKENS: Chunk size (recommended: 512-2048)
  • RAG.VECTOR_STORE.TYPE: Choose between faiss or chromadb
  • RAG.SEARCH.SEMANTIC_WEIGHT / RAG.SEARCH.KEYWORD_WEIGHT: Configurable hybrid weights
  • Recursive chunking with 10% overlap
  • Hybrid search: BM25 (Okapi BM25 with TF-IDF, term saturation, length normalization) + semantic similarity
  • Independent candidate retrieval from both BM25 and embeddings, merged and re-ranked

Vector Store Options:

  • ChromaDB: Persistent vector database with metadata support (default)
  • FAISS: Fast in-memory search with disk persistence

The system uses a VectorStoreController for easy switching between stores. All functionality (indexing, searching, clearing) works identically regardless of the chosen store.

User Profile Learning

After 5+ interactions, the assistant builds a profile:

  • Cognitive style: Analytical, creative, pragmatic, systematic
  • Domain expertise: Python, AWS, DevOps, ML, etc.
  • Learning style: Visual, hands-on, theoretical
  • Communication patterns: Tone, complexity, question styles
  • Code preferences: Testing, documentation, type hints

Profile is automatically injected into system prompt for personalization.

Episodic Memory

The episodic memory system learns from successful task completions and retrieves similar solutions for future queries.

How it works:

  1. Automatic Storage: After each successful interaction, stores:

    • Initial user query
    • Full conversation context
    • Tools used with arguments
    • Final solution
    • Timestamp
  2. Hybrid Search: Retrieves similar episodes using:

    • 70% semantic similarity (task intent)
    • 30% BM25 keyword scoring (tool names, action verbs)
  3. Context Injection: Before processing queries, injects compact context:

    [Episodic Memory - Similar Past Tasks]
    1. "read DOCX about ML" → fs_read → success (similarity: 0.85)
    2. "analyze PDF report" → fs_read, web_search → success (similarity: 0.78)
    
  4. Automatic Cleanup: Maintains bounded memory:

    • Max 1000 episodes
    • Removes entries older than 90 days
    • Runs on startup
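
The cleanup rule in step 4 can be expressed as an age filter followed by a size cap. This is a sketch under assumptions: `prune_episodes` is a hypothetical helper, episodes are plain dictionaries with a `timestamp` field, and keeping the newest entries when over the cap is a guess at the eviction order.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def prune_episodes(episodes: List[Dict], now: datetime,
                   max_episodes: int = 1000,
                   max_age_days: int = 90) -> List[Dict]:
    """Drop episodes older than max_age_days, then keep the newest max_episodes."""
    cutoff = now - timedelta(days=max_age_days)
    fresh = [e for e in episodes if e["timestamp"] >= cutoff]
    fresh.sort(key=lambda e: e["timestamp"], reverse=True)
    return fresh[:max_episodes]
```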

Success Detection:

  • User feedback: "thanks", "perfect", "great", "worked"
  • No error markers in response
  • All tools executed successfully
  • Filters out greetings and simple acknowledgments (<300 chars, no tools)

Storage Location:

  • FAISS: ~/.mnemoai/{profile}/models/{model}/episodic_memory/episodic.index
  • ChromaDB: ~/.mnemoai/{profile}/models/{model}/episodic_memory/

Configuration:

ENABLE_EPISODIC_MEMORY: true
EPISODIC_MEMORY:
  STORE_TYPE: chromadb # or faiss
RAG:
  EMBED_MODEL_ID: # Required for both stores
    NAME: mxbai-embed-large
    TYPE: ollama

ACE Playbook (Agentic Context Engineering)

The ACE Playbook learns strategies from both successes AND failures, implementing the Agentic Context Engineering framework for continuous improvement.

How it works:

  1. Reflector: After each interaction, analyzes tool executions:

    • Detects failure patterns (file not found, string not found, permission denied, etc.)
    • Identifies successful strategies for specific tools (file_edit, execute_bash)
    • Extracts specific, actionable insights (not generic summaries)
    • Tracks metrics (success/failure rates, failure types) in metrics.json
  2. Playbook Store: Maintains structured strategy entries:

    {
      "context": "editing python files",
      "strategy": "Read the file first to get exact string including whitespace before using str_replace",
      "source": "Failed file_edit on 2026-02-01: string_not_found",
      "outcome": "failure",
      "tools": ["file_edit"],
      "confidence": 0.9
    }
    
  3. Context Injection: Injects relevant strategies into the system prompt at startup:

    [Playbook - Learned Strategies]
    Avoid these patterns:
      ✗ [editing files]: Read the file first to get exact string before str_replace
    Effective strategies:
      ✓ [searching files]: Use glob_search instead of find for better performance
    
  4. Lazy Refinement: Only deduplicates when hitting token limits, using semantic similarity if embeddings are configured.

What gets stored:

  • Failures: Specific patterns like string_not_found, file_not_found, permission_denied, command_failed, etc.
  • Successes: Only for tools with reusable patterns (file_edit, execute_bash with specific commands)
  • Not stored: Generic successes without actionable strategies
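One way to map raw tool errors onto the failure types listed above is a small table of regular expressions. This is a sketch under assumptions: the patterns, the `classify_failure` name, and the use of `command_failed` as a catch-all are illustrative choices, not the Reflector's actual rules.

```python
import re

# Hypothetical mapping from failure type to a regex over tool error text
FAILURE_PATTERNS = {
    "file_not_found": re.compile(r"no such file|file not found", re.I),
    "string_not_found": re.compile(r"string not found", re.I),
    "permission_denied": re.compile(r"permission denied", re.I),
}


def classify_failure(error_text: str) -> str:
    """Return the first matching failure type, or 'command_failed' as a catch-all."""
    for name, pattern in FAILURE_PATTERNS.items():
        if pattern.search(error_text):
            return name
    return "command_failed"
```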

Key Differences from Episodic Memory:

Feature | Episodic Memory | ACE Playbook
Stores | Full task completions | Granular strategies
Learns from | Successes only | Successes AND failures
Format | Conversation context | Structured rules
Retrieval | Semantic similarity | Context + tool matching

Configuration:

ENABLE_PLAYBOOK: true
PLAYBOOK:
  MAX_ENTRIES: 500 # Maximum entries before refinement
  SIMILARITY_THRESHOLD: 0.85 # Threshold for merging similar strategies
  MAX_INJECT: 10 # Maximum entries to inject per query
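To illustrate how MAX_ENTRIES and SIMILARITY_THRESHOLD could interact, here is a minimal sketch of lazy deduplication. It assumes refinement is triggered once the entry count exceeds MAX_ENTRIES, and it swaps in `difflib.SequenceMatcher` as a stand-in for embedding-based similarity so the example runs with the standard library alone. The function names are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    # Stand-in for embedding cosine similarity (assumption): character-level ratio
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def refine(strategies, max_entries=500, threshold=0.85):
    """Deduplicate only when the playbook has outgrown max_entries."""
    if len(strategies) <= max_entries:
        return list(strategies)  # lazy: nothing to do yet
    kept = []
    for s in strategies:
        # Keep a strategy only if it is not too similar to one already kept
        if all(similarity(s, k) < threshold for k in kept):
            kept.append(s)
    return kept
```

Deferring the pairwise comparison until the limit is reached avoids paying a quadratic cost on every interaction.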

Storage Location:

  • Strategies: ~/.mnemoai/{profile}/models/{model}/playbook/playbook.json
  • Metrics: ~/.mnemoai/{profile}/models/{model}/playbook/metrics.json

Training Data Collection

Supervised Fine-Tuning (SFT)

  • Use /good to mark high-quality responses
  • Saved conversations include quality markers
  • Extract labeled interactions for training
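Extracting labeled interactions might look like the sketch below. The saved-conversation format is not documented in this section, so the message shape used here (dicts with `role`, `content`, and a `quality` marker set by /good) is purely an assumption, as are the function names.

```python
import json


def extract_sft_pairs(messages):
    """Yield prompt/completion pairs for assistant turns marked as good."""
    last_user = None
    for msg in messages:
        if msg["role"] == "user":
            last_user = msg["content"]
        elif (
            msg["role"] == "assistant"
            and msg.get("quality") == "good"
            and last_user is not None
        ):
            yield {"prompt": last_user, "completion": msg["content"]}


def to_jsonl(pairs):
    # One JSON object per line, a common layout for SFT datasets
    return "\n".join(json.dumps(p, ensure_ascii=False) for p in pairs)
```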

📦 Dependencies

All Python dependencies are listed in requirements.txt. The new productivity tools use only standard library features:

Tool | Python Packages | External Tools
TodoWrite | Standard library only | None
Edit Tool | Standard library only | None
Glob Search | Standard library (glob) | None
Grep Search | Standard library (subprocess) | ripgrep (optional)
Error Handler | Standard library (functools) | None
Git Safety | Standard library (subprocess) | git
Plan Mode | Standard library (json, os) | None
Background Tasks | Standard library (threading) | None

External Tools:

  • ripgrep: Optional but recommended for the grep_search tool. Install it via your system package manager (see the Installation section). If it is not installed, the assistant automatically falls back to slower alternatives.
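The fallback behavior can be sketched as follows. This is not the project's implementation: the function names are hypothetical, and the pure-Python scan is just one possible "slower alternative". Note that ripgrep and Python's `re` use slightly different regex dialects, so complex patterns may behave differently between the two paths.

```python
import re
import shutil
import subprocess
from pathlib import Path


def grep_search(pattern: str, root: str = ".") -> list:
    """Return 'path:line_no:text' matches, preferring ripgrep when available."""
    if shutil.which("rg"):
        # --no-heading and --line-number give one 'path:line:text' row per match
        proc = subprocess.run(
            ["rg", "--no-heading", "--line-number", "--color", "never", pattern, root],
            capture_output=True,
            text=True,
        )
        return proc.stdout.splitlines()
    return python_grep(pattern, root)


def python_grep(pattern: str, root: str = ".") -> list:
    """Slower pure-Python fallback: scan every readable text file under root."""
    regex = re.compile(pattern)
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for no, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append(f"{path}:{no}:{line}")
    return hits
```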

Core Python Packages:

  • langgraph: Agent orchestration framework
  • langchain, langchain-core: LLM abstraction layer
  • langchain-ollama: Ollama integration
  • langchain-aws: AWS Bedrock integration
  • langchain-openai: OpenAI integration (also used for Bedrock Mantle OpenAI/Responses protocols)
  • langchain-anthropic: Anthropic integration (Bedrock Mantle anthropic protocol)
  • aws-bedrock-token-generator: Bearer-token auth for Bedrock Mantle
  • mcp, mcp[cli]: Model Context Protocol
  • ollama: Local LLM support
  • boto3: AWS Bedrock/SageMaker
  • tiktoken: Token counting
  • chromadb, faiss-cpu: Vector stores for RAG
  • PyPDF2, python-docx: Document readers
  • Pygments: Code syntax highlighting
  • prompt_toolkit: Interactive CLI
  • brave-search-python-client: Web search
  • crawl4ai: Web crawling

🛠️ Development

Testing

The test suite uses pytest and is split into two tiers under tests/:

  • tests/unit/: fast, deterministic tests for pure logic (BM25, reasoning helpers, response parsing, subtask parsing, the tool error handler, git-safety command classification, file editing/search, bash timeout handling, and episodic-memory heuristics). No LLM, Ollama, or network required, so they run in seconds and don't need a config.yaml.
  • tests/integration/: end-to-end tests that drive the real agent against a live Ollama server and the MCP subprocess (routing, tool calls, bash timeout, no silent empty turns). Marked with @pytest.mark.integration and auto-skipped unless a runtime utils/config.yaml exists and the configured Ollama host is reachable.
# Install test dependencies
pip install -r requirements-dev.txt

# Run everything (integration auto-skips if Ollama/config aren't available)
python -m pytest

# Unit tier only (fast; good for CI and pre-commit)
python -m pytest tests/unit

# Integration tier only (requires Ollama running + a real config.yaml)
python -m pytest -m integration

# Run a single file
python -m pytest tests/unit/test_bm25.py

When adding new code, keep import-time side effects independent of config.yaml so the module stays unit-testable.
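As an illustration of this guideline, the sketch below defers reading the config until a function actually needs it, so importing the module never touches the filesystem. The helper names are hypothetical; only the utils/config.yaml path comes from the description above.

```python
from functools import lru_cache
from pathlib import Path

# Nothing is read at import time; this is only a path constant
CONFIG_PATH = Path("utils/config.yaml")


@lru_cache(maxsize=1)
def get_config_text(path: Path = CONFIG_PATH) -> str:
    """Read the config on first use rather than when the module is imported."""
    return path.read_text(encoding="utf-8")


def word_count(text: str) -> int:
    # Pure logic: importable and testable with no config.yaml on disk
    return len(text.split())
```

A unit test can import this module and call `word_count` on a machine with no config file, while code paths that need configuration call `get_config_text()` explicitly.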

Adding New Tools

  1. Create tool file in server/tools/:
from mcp.server.fastmcp import FastMCP

def register_your_tool(mcp: FastMCP):
    @mcp.tool()
    async def your_tool(param: str) -> str:
        """Tool description for the LLM."""
        # Replace this placeholder with the tool's real logic
        result = f"Received: {param}"
        return result
  2. Register in tools_manager.py:
from .your_tool import register_your_tool
register_your_tool(mcp)

Adding New File Readers

  1. Create reader in server/tools/readers/:
async def read_your_format(path: str) -> str:
    """Read your custom format."""
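    # Placeholder: read the file as UTF-8 text; replace with format-specific parsing
    with open(path, encoding="utf-8") as f:
        content = f.read()
    return content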
  2. Register in fs_read.py:
from .readers.your_reader import read_your_format
# Add to file type detection logic
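The "file type detection logic" is not shown here, but one common shape for it is an extension-to-reader lookup. The sketch below assumes that pattern; `READERS`, `read_any`, `read_text_file`, and the `.yourfmt` extension are hypothetical names, not the contents of fs_read.py.

```python
from pathlib import Path


async def read_text_file(path: str) -> str:
    # Stand-in reader (assumption) so the example is self-contained
    return Path(path).read_text(encoding="utf-8")


# Hypothetical registry: lowercase extension -> async reader
READERS = {".txt": read_text_file, ".yourfmt": read_text_file}


async def read_any(path: str) -> str:
    """Pick a reader by file extension and await it."""
    suffix = Path(path).suffix.lower()
    reader = READERS.get(suffix)
    if reader is None:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
    return await reader(path)
```

With a registry like this, adding a format means adding one dictionary entry; outside an event loop the function can be called with `asyncio.run(read_any("notes.txt"))`.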

Switching Model Providers

The application uses controller classes for centralized model management. To switch providers, just update config.yaml:

For LLM:

MODEL_ID:
  NAME: your-model-name
  TYPE: ollama # or bedrock, sagemaker

For Vision:

VISION_MODEL_ID:
  NAME: your-vision-model
  TYPE: ollama # or sagemaker

For Embeddings:

RAG:
  EMBED_MODEL_ID:
    NAME: mxbai-embed-large
    TYPE: ollama

The controllers (llm_controller.py, vision_model_controller.py, embeddings_controller.py) handle all provider-specific initialization automatically.

Adding New Model Providers

  1. Update the appropriate controller in models/:
def initialize_model(self):
    if self.model_type == "your_provider":
        # Your provider initialization
        self.model = YourProviderModel(...)
  2. Add configuration in config.yaml

🔧 Ollama Utilities (Optional)

The bash/ directory contains helper scripts for Ollama users on macOS and Linux.

Ollama Environment Setup (macOS)

Sets Ollama performance environment variables at boot and launches the Ollama app:

# Variables set: OLLAMA_FLASH_ATTENTION=1, OLLAMA_KV_CACHE_TYPE=q8_0, OLLAMA_NUM_GPU=999

Setup:

  1. Edit bash/ollama-env-mac/ollama.environment.plist (no changes needed for defaults)
  2. Copy to LaunchAgents:
cp bash/ollama-env-mac/ollama.environment.plist ~/Library/LaunchAgents/
launchctl load ~/Library/LaunchAgents/ollama.environment.plist

VRAM Cleaner

Automatically unloads idle Ollama models from VRAM to free GPU memory. Useful when running multiple models or when GPU memory is limited.
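For readers curious about the mechanism, unloading can be done through Ollama's HTTP API: GET /api/ps lists loaded models, and a POST to /api/generate with `keep_alive` set to 0 asks Ollama to unload a model. The Python sketch below shows that idea only; the bundled scripts live in bash/ and their exact logic (including how idleness is judged) is not reproduced here. The function names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address


def loaded_models(base_url: str = OLLAMA_URL) -> list:
    """List model names currently held in memory, via GET /api/ps."""
    with urllib.request.urlopen(f"{base_url}/api/ps", timeout=5) as resp:
        return [m["name"] for m in json.load(resp).get("models", [])]


def unload_request(model: str, base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a request that asks Ollama to unload a model immediately."""
    body = json.dumps({"model": model, "keep_alive": 0}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def unload_all(base_url: str = OLLAMA_URL) -> list:
    """Unload every loaded model; a real cleaner would first check idleness."""
    names = loaded_models(base_url)
    for name in names:
        urllib.request.urlopen(unload_request(name, base_url), timeout=30).close()
    return names
```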

macOS (LaunchAgent, runs every 60 seconds):

  1. Edit bash/ollama-freeup-vram/com.ollama.vramcleaner.plist:
    • Replace <PATH_TO_FOLDER> with the actual path to this repository
    • Replace <PATH_TO_USER_HOME> with your home directory
  2. Install:
cp bash/ollama-freeup-vram/com.ollama.vramcleaner.plist ~/Library/LaunchAgents/
launchctl load ~/Library/LaunchAgents/com.ollama.vramcleaner.plist

Linux (systemd):

  1. Edit bash/ollama-freeup-vram/ollama-vram-cleaner.service:
    • Replace <PATH_TO_FOLDER> with the actual path
  2. Install:
sudo cp bash/ollama-freeup-vram/ollama-vram-cleaner.service /etc/systemd/system/
sudo systemctl enable ollama-vram-cleaner
sudo systemctl start ollama-vram-cleaner

See bash/ollama-freeup-vram/README.md and bash/ollama-env-mac/README.md for more details.

🐛 Troubleshooting

Common Issues

MCP Connection Errors

  • Verify Python path in client.py matches your environment
  • Check server path is correct
  • Ensure all dependencies are installed (pip install -r requirements.txt)

Model Loading Issues

  • Verify model name and type in config.yaml
  • For Ollama: Ensure Ollama is running (ollama serve) and model is pulled (ollama pull model-name)
  • For AWS Bedrock: Check credentials (aws sts get-caller-identity), region, and model access
  • For OpenAI: Ensure OPENAI_API_KEY environment variable is set

RAG / Episodic Memory Not Working

  • Ensure ENABLE_RAG: true (or ENABLE_EPISODIC_MEMORY: true) in config
  • Verify embedding model is configured and available (RAG.EMBED_MODEL_ID in config)
  • For Ollama embeddings: ensure the embedding model is pulled (ollama pull mxbai-embed-large)
  • Check logs for "fallback embeddings" warnings; these mean the real model is unreachable
  • Verify documents are being indexed with list_documents()

Permission Errors

  • Ensure write permissions for ~/.mnemoai/ (the app home: config, plans, tasks, per-profile state)
  • Check file paths in configuration

Import Errors on Startup

  • Some dependencies (chromadb, faiss-cpu, crawl4ai) can be tricky to install. Check platform-specific instructions.
  • On Apple Silicon: faiss-cpu may require pip install faiss-cpu --no-cache-dir

Logging

Logs are written to stderr. Set the level with the LOG_LEVEL environment variable:

LOG_LEVEL=DEBUG mnemoai  # Detailed logs
LOG_LEVEL=INFO mnemoai   # Normal logs (default)
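If you extend the project and want your own modules to honor the same variable, a setup along these lines would work. It is a sketch, not the application's actual logging code; `configure_logging` is a hypothetical name, and falling back to INFO on an unknown level is an assumption.

```python
import logging
import os
import sys


def configure_logging(default: str = "INFO") -> int:
    """Send logs to stderr at the level named by LOG_LEVEL."""
    name = os.environ.get("LOG_LEVEL", default).upper()
    # Fall back to INFO when the variable holds an unknown level name
    level = getattr(logging, name, None)
    if not isinstance(level, int):
        level = logging.INFO
    # force=True replaces any handlers configured earlier (Python 3.8+)
    logging.basicConfig(stream=sys.stderr, level=level, force=True)
    return level
```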

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🤝 Contributing

This is a personal development project. If you'd like to use or extend it, feel free to fork the repository and adapt it to your needs!

If you use this code in your own projects, attribution to the original repository is appreciated but not required.

🙏 Acknowledgments
