mnemoai-assistant

Mnemo AI — a local agentic AI assistant (LangGraph + MCP) that learns and remembers, with multi-provider model support.

Mnemo AI

A local agentic AI assistant with MCP (Model Context Protocol) integration, RAG capabilities, and intelligent conversation management. Built on LangGraph with LangChain for multi-provider LLM support (Ollama, Amazon Bedrock, OpenAI, Amazon SageMaker AI, LiteLLM).

Demo

📑 Table of Contents

✨ Key Features
📖 Project Structure
🏗️ Architecture
🚀 Quick Start
🔀 Feature Toggles
💡 Usage
🚀 Productivity Tools
🔧 Configuration
📚 Advanced Features
📦 Dependencies
🛠️ Development
🔧 Ollama Utilities (Optional)
- Ollama Environment Setup (macOS)
- VRAM Cleaner
🐛 Troubleshooting
- Common Issues
- Logging
📄 License
🤝 Contributing
🙏 Acknowledgments

✨ Key Features

🤖 Multi-Model Support: Ollama (local), Amazon Bedrock, OpenAI, Amazon SageMaker AI, LiteLLM (100+ providers)
🔧 MCP Tool System: Extensible tool architecture via Model Context Protocol
📚 RAG (Retrieval-Augmented Generation): Automatic document indexing and semantic search (if enabled)
💬 Advanced Chat Interface: Multiline input, command system, conversation save/load
🧠 User Profile Learning: Automatic learning from interactions for personalized responses
🧩 Episodic Memory: Learns from successful task completions and retrieves similar solutions
📖 ACE Playbook: Learns strategies from successes AND failures via Agentic Context Engineering
📊 Training Data Collection: Mark high-quality responses for SFT training
🔍 Web Search: Integrated Brave Search API (if available)
🌐 Web Crawler: Extract and index content from web pages
🖼️ Vision Support: Image analysis with vision models (if available)
📁 File Operations: Read/write/edit with support for text, CSV, JSON, PDF, DOCX
✏️ Precise File Editing: Safe string replacement with validation and uniqueness checking
🔎 Fast Search Tools: Glob pattern matching and ripgrep content search (10-100x faster than traditional grep)
📋 Todo Tracking: Multi-step task management with real-time progress updates
⚡ Bash Execution: Direct shell command execution with intelligent error handling
🛡️ Git Safety: Protection against dangerous git operations with smart warnings
📝 Plan Mode: Implementation planning workflow for complex tasks
🔄 Background Tasks: Run long operations in parallel without blocking

📖 Project Structure

mnemoai/                      # repo root
├── pyproject.toml                          # Packaging + `mnemoai` CLI entry point
├── requirements.txt                        # Dependencies
├── README.md                               # This file
├── pytest.ini                              # Pytest configuration
├── requirements-dev.txt                    # Dev/test dependencies
│
├── src/mnemoai/              # The single package (src layout)
│   ├── __init__.py
│   ├── __main__.py                         # `python -m mnemoai`
│   ├── main.py                             # Entry point (cli())
│   │
│   ├── client/                             # Client layer
│   │   ├── client.py                       # LangGraphClient facade (lifecycle, MCP, query)
│   │   ├── mcp_tool_wrapper.py             # MCP to LangChain tool adapter
│   │   ├── agent/                          # Agent loop
│   │   │   ├── agent.py                    # LangGraph StateGraph agent with streaming
│   │   │   ├── router.py                   # Query classifier and routing
│   │   │   ├── orchestrator.py             # Task decomposition and worker orchestration
│   │   │   └── reasoning_utils.py          # Reasoning/thinking helpers for aux LLM calls
│   │   ├── ui/                             # User interface
│   │   │   ├── chat_interface.py           # Chat loop
│   │   │   └── spinner.py                  # Loading animations
│   │   ├── managers/                       # Business logic
│   │   │   ├── agent_conversation_manager.py  # Conversation state and token tracking
│   │   │   └── user_profile_manager.py     # User profiling and learning
│   │   └── memory/                         # Memory systems
│   │       ├── episodic_memory.py          # Episodic memory manager
│   │       ├── reflector.py                # ACE Reflector - extracts strategies
│   │       ├── playbook_store.py           # ACE Playbook - stores learned strategies
│   │       ├── faiss_store.py              # FAISS episodic store
│   │       └── chroma_store.py             # ChromaDB episodic store
│   │
│   ├── server/                             # MCP server layer
│   │   ├── server.py                       # FastMCP server (run as a subprocess)
│   │   ├── error_handler.py                # @tool_error_handler decorator (shared)
│   │   └── tools/                          # Tool implementations
│   │       ├── tools_manager.py            # Tool registration
│   │       ├── fs_read.py / fs_write.py / file_edit.py / file_search.py
│   │       ├── execute_bash.py / git_safety.py / todo_manager.py / plan_mode.py
│   │       ├── background_tasks.py / web_crawler.py / web_search.py
│   │       ├── describe_image.py / rag_tool.py
│   │       ├── rag/                        # RAG system (session, vector_store_controller, stores)
│   │       └── readers/                    # File readers (csv/json/pdf/docx/line/dir/search + chunking)
│   │
│   ├── models/                             # Model layer
│   │   ├── provider_params.py              # Single source of truth: per-provider config keys
│   │   ├── mantle_factory.py               # Bedrock Mantle model factory (multi-protocol)
│   │   ├── controllers/                    # Provider-dispatching controllers
│   │   │   ├── base_model_controller.py    # Minimal shared base
│   │   │   ├── llm_controller.py           # LLM initialization
│   │   │   ├── vision_model_controller.py  # Vision model initialization
│   │   │   └── embeddings_controller.py    # Embeddings initialization
│   │   └── chat_models/                    # Concrete LangChain ChatModel subclasses
│   │       ├── chat_ollama_wrapper.py      # Ollama model with penalty support
│   │       └── sagemaker_chat.py           # SageMaker ChatModel for LangChain
│   │
│   └── utils/                              # Utilities
│       ├── config.py                       # Config loader
│       ├── configurator.py                 # First-run setup + /config & /model flows
│       ├── paths.py                        # Central path helper (~/.mnemoai)
│       ├── logger.py                       # Logging utilities
│       ├── bm25.py                         # Lightweight BM25 (hybrid search)
│       ├── config.yaml.example             # Config templates (also .bedrock / .bedrock.mantle)
│       └── formatting/                     # Text formatting (code/url/response)
│
├── tests/                                  # Test suite (pytest)
│   ├── conftest.py                         # Puts src/ on sys.path
│   ├── unit/                               # Fast, deterministic, no deps
│   └── integration/                        # Live agent + Ollama + MCP
│
├── docs/                                   # ARCHITECTURE.md (detailed file map)
└── bash/                                   # Helper scripts
    ├── system-command-app/                 # `mnemoai` wrapper script
    ├── ollama-freeup-vram/                 # VRAM management
    └── ollama-env-mac/                     # Ollama config

🏗️ Architecture

High-Level Overview

┌─────────────────────────────────────────────────────────────┐
│                         main.py                             │
│                    (Application Entry)                      │
└─────────────────────────────┬───────────────────────────────┘
                              │
              ┌───────────────┴───────────────┐
              │                               │
              ▼                               ▼
      ┌─────────────────┐            ┌──────────────────┐
      │ LangGraphClient │◄──────────►│  MCP Server      │
      │  (client.py)    │            │  (server.py)     │
      └────────┬────────┘            └────────┬─────────┘
               │                              │
          ┌────┴─────┐                        ▼
          │          │                   ┌──────────┐
          ▼          ▼                   │  Tools   │
      ┌────────┐ ┌──────────┐            └────┬─────┘
      │  UI    │ │ Managers │                 │
      └────────┘ └──────────┘            ┌────┴────┐
          │          │                   │         │
          └────┬─────┘                   ▼         ▼
               ▼                    ┌──────────┐ ┌─────┐
          ┌──────────┐              │ Readers  │ │ RAG │
          │LangGraph │              └──────────┘ └─────┘
          │  Agent   │
          └──────────┘

🚀 Quick Start

Prerequisites

Required:

Python 3.11+
At least one LLM provider configured and accessible (see below)

LLM Providers (choose at least one):

Provider	Requirements
Ollama (local, recommended for getting started)	Install Ollama, then pull a model: `ollama pull qwen3:4b`
Amazon Bedrock	AWS CLI configured (`aws configure`) with Bedrock access in your region
Amazon SageMaker AI	AWS CLI configured with a deployed SageMaker endpoint
OpenAI	Set `OPENAI_API_KEY` environment variable
LiteLLM	Depends on the underlying provider (see LiteLLM docs)

Optional:

ripgrep — 10-100x faster content search (see installation below)
Embedding model — Required if you enable RAG, Episodic Memory, or ACE Playbook (see Feature Toggles)
Vision model — Required for image analysis (describe_image tool)
Brave Search API key — Required for web search (get one here)

Installation

Recommended: install from PyPI

The published package is mnemoai-assistant (the import name and the CLI command are both mnemoai). No clone needed — install it into an isolated environment and get the mnemoai command on your PATH:

uv tool install mnemoai-assistant     # or: pipx install mnemoai-assistant

Or into the current environment with pip:

pip install mnemoai-assistant

Then configure a user config (see step 4 below) and run:

mnemoai            # verbose (shows thinking)
mnemoai --no-verbose

To upgrade: uv tool upgrade mnemoai-assistant (or pip install -U mnemoai-assistant). To remove: uv tool uninstall mnemoai-assistant.

This is the best choice if you just want to use the assistant. Install from a checkout (below) instead if you plan to edit the source.

Install from a checkout

Clone the repository:

git clone https://github.com/brunopistone/mnemoai.git
cd mnemoai

Install the assistant (choose one):

Option 1: install as a CLI command (`uv tool install`)

This installs the project into its own isolated environment and puts mnemoai on your PATH, so you can run it from any directory (macOS and Linux) without activating anything:

uv tool install .        # or: pipx install .

Then configure a user config (see step 4) and run:

mnemoai            # verbose (shows thinking)
mnemoai --no-verbose

To upgrade after pulling changes, run uv tool install --force . (note the trailing dot, which points at the checkout). To remove: uv tool uninstall mnemoai.

Pick "run from a checkout" below instead if you plan to actively edit the code, since that runs your working tree directly with no reinstall step.

Option 2: run from a checkout

Set up an environment (choose one), which lets you run the assistant directly from the repo while editing the source live. Because the code uses a src/ layout, run it as a module with src/ on the path:

PYTHONPATH=src python -m mnemoai            # verbose
PYTHONPATH=src python -m mnemoai --no-verbose

(Or pip install -e . once, then just mnemoai.)

Option A: venv

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Option B: uv

uv venv
uv pip install -r requirements.txt

Option C: conda

conda create -n mnemoai python=3.11
conda activate mnemoai
pip install -r requirements.txt

Get the mnemoai command for a checkout install

So you don't have to cd into the repo every time, symlink the bundled wrapper script onto your PATH. It activates the project environment, then runs the app (PYTHONPATH=src python -m mnemoai):

chmod +x bash/system-command-app/mnemoai-wrapper.sh
ln -sf "$(pwd)/bash/system-command-app/mnemoai-wrapper.sh" /usr/local/bin/mnemoai

Now mnemoai works from any directory and always reflects your latest edits. The wrapper auto-activates a project-local .venv (Options A and B) if present, otherwise it falls back to a conda env named mnemoai (Option C) — edit the script if your environment differs.

Install ripgrep (optional but recommended for fast search):

ripgrep provides 10-100x faster content search than traditional grep, and the grep_search tool depends on it.

macOS:

brew install ripgrep

Ubuntu/Debian:

sudo apt install ripgrep

Fedora/RHEL:

sudo dnf install ripgrep

Windows (via Chocolatey):

choco install ripgrep

From source:

cargo install ripgrep

Verify installation:

rg --version  # Should show ripgrep version

If ripgrep is not installed, the assistant will automatically fall back to using execute_bash with standard grep, but performance will be significantly slower.

Configure the application:

First-run setup (easiest). If you start the assistant and no config is found, an interactive configurator runs automatically. It walks you through:

- The LLM provider (Ollama / Bedrock / Mantle / OpenAI / Amazon SageMaker AI / LiteLLM) and chat model.
- Connection details: Ollama host/port; AWS region; for Mantle, the API protocol (chat_completions / responses / anthropic); SageMaker region and input format; LiteLLM API base/key. OpenAI uses OPENAI_API_KEY.
- Optional max output tokens (blank or none uses the provider default) and a mandatory max context window (defaults to 65536).
- The vision model (reusing the chat model's host/region, with its own Mantle protocol and optional max output tokens).
- Your profile name.
- An optional Brave Search key.
- Each feature toggle (RAG, episodic memory, ACE playbook, web crawler, query routing, orchestration, user profiling).

Every prompt is pre-filled with the template's default, so you can press Enter through the ones you don't care about. The configurator then writes a ready-to-use ~/.mnemoai/config.yaml from the matching template. Just run:

mnemoai      # or, from a checkout: PYTHONPATH=src python -m mnemoai

and follow the prompts. You can re-edit the generated file any time to fine-tune models, prompts, and feature toggles.

Manual setup. Prefer to write it yourself? Copy a template (they live inside the package, under src/mnemoai/utils/):

cp src/mnemoai/utils/config.yaml.example src/mnemoai/utils/config.yaml

Edit that config.yaml with your settings. This file is git-ignored to protect your API keys. At minimum, configure your LLM provider.

The config file is resolved in this order (first match wins):

$MNEMOAI_CONFIG — explicit path (handy for switching between provider configs)
~/.mnemoai/config.yaml — user config used by the installed mnemoai command
<package>/utils/config.yaml — package-relative fallback (used when running from a checkout)

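The resolution order above can be sketched in a few lines of Python. This is only an illustration of "first match wins", not the actual code in config.py: the function name `resolve_config_path` and its parameters are hypothetical, and the sketch assumes that a `$MNEMOAI_CONFIG` value pointing at a missing file falls through to the next candidate, which the real loader may handle differently.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_config_path(
    package_dir: Path,
    env: Mapping[str, str] = os.environ,
    home: Optional[Path] = None,
) -> Optional[Path]:
    """Return the first existing config file, or None if none resolves.

    Hypothetical helper that mirrors the documented order only.
    """
    home = home or Path.home()
    candidates = []
    explicit = env.get("MNEMOAI_CONFIG")
    if explicit:
        # 1. Explicit path from the environment.
        candidates.append(Path(explicit).expanduser())
    # 2. User config used by the installed command.
    candidates.append(home / ".mnemoai" / "config.yaml")
    # 3. Package-relative fallback (checkout runs).
    candidates.append(package_dir / "utils" / "config.yaml")
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

A `None` result corresponds to the case where no config resolves, which is when the first-run configurator would start.
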
If you installed the CLI with uv tool install (the recommended option), put your config in the user location instead:

mkdir -p ~/.mnemoai
cp src/mnemoai/utils/config.yaml.example ~/.mnemoai/config.yaml
# or, for Bedrock / Mantle:
# cp src/mnemoai/utils/config.yaml.bedrock.example        ~/.mnemoai/config.yaml
# cp src/mnemoai/utils/config.yaml.bedrock.mantle.example ~/.mnemoai/config.yaml

At minimum, configure your LLM provider:

For Ollama (quickest setup):

# Pull a model first
ollama pull qwen3:4b

# utils/config.yaml (minimal)
MODEL_ID:
  NAME: qwen3:4b
  TYPE: ollama
  HOST: localhost
  PORT: 11434
  TEMPERATURE: 0.6

# Profile name (used for session data isolation)
PROFILE:
  NAME: default

# Everything else can be left at defaults or disabled
ENABLE_RAG: false
ENABLE_EPISODIC_MEMORY: false
ENABLE_PLAYBOOK: false
ENABLE_WEB_SEARCH: false
ENABLE_WEB_CRAWL: false

See Configuration for all options and Feature Toggles for enabling advanced features.

Run the assistant:

If you installed with uv tool install (recommended), run the command from anywhere:

mnemoai

If you set up a checkout and symlinked the wrapper, the same command works. Otherwise, run it from the repo directory:

PYTHONPATH=src python -m mnemoai

See bash/system-command-app/README.md for details on the wrapper script.

🔀 Feature Toggles

All advanced features can be independently enabled or disabled in your local utils/config.yaml (copied from config.yaml.example). Here is a quick reference:

Feature	Config Key	Default	Dependencies
RAG (document indexing & search)	`ENABLE_RAG: true`	`true`	Embedding model (`RAG.EMBED_MODEL_ID`)
Episodic Memory (learn from past tasks)	`ENABLE_EPISODIC_MEMORY: true`	`true`	Embedding model (`RAG.EMBED_MODEL_ID`)
ACE Playbook (learn strategies from success/failure)	`ENABLE_PLAYBOOK: true`	`true`	None (embeddings optional for refinement)
User Profiling (personalized responses)	`PROFILE.USE_PROFILING: true`	`true`	Activates after 5+ interactions
Web Search	`ENABLE_WEB_SEARCH: true`	`true`	`BRAVE_API_KEY` configured
Web Crawler	`ENABLE_WEB_CRAWL: true`	`true`	None
Vision (image analysis)	Configure `VISION_MODEL_ID`	Disabled if not set	Vision-capable model
Verbose Mode (show thinking process)	CLI flag `--no-verbose`	Enabled	Supported by model

Dependency note: RAG, Episodic Memory, and ACE Playbook refinement all require a working embedding model. If the embedding model is unavailable, the system falls back to SHA256-based deterministic embeddings with degraded semantic search quality. Configure RAG.EMBED_MODEL_ID in config.yaml to use a real embedding model (see Embeddings Model).
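
To see why the SHA256 fallback degrades search quality, here is a minimal sketch of a hash-based deterministic embedding. It is not the project's implementation: the function name `hash_embedding`, the 64-dimension default, and the byte-to-float mapping are all assumptions made for illustration.

```python
import hashlib
import math
from typing import List


def hash_embedding(text: str, dim: int = 64) -> List[float]:
    """Map text to a deterministic unit-length vector derived from SHA-256."""
    values: List[float] = []
    counter = 0
    while len(values) < dim:
        # Hash the text with a counter to get as many bytes as needed.
        digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
        # Each byte becomes one component in [-1.0, 1.0].
        values.extend(byte / 127.5 - 1.0 for byte in digest)
        counter += 1
    values = values[:dim]
    norm = math.sqrt(sum(v * v for v in values)) or 1.0
    return [v / norm for v in values]
```

Identical text always maps to the same vector, so exact repeats can still be found. Texts that are merely similar in meaning get unrelated vectors, which is why a real embedding model is needed for useful semantic search.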

💡 Usage

Basic Chat

Simply type your questions and press Enter. The assistant will respond using available tools when needed.

You: What files are in the current directory?
Assistant: [Uses fs_read tool to list directory contents]

You: Read the README.md file
Assistant: [Uses fs_read tool and displays content]

Commands

Command	Description
`/exit` or `/quit`	Exit the application
`/clear`	Clear conversation history and RAG index
`/save`	Save current conversation
`/load <path>`	Load a saved conversation
`/good`	Mark last response as good (for SFT training)
`/compact [focus]`	Summarize older turns to shrink context (optional focus instructions)
`/config`	Re-run the interactive configurator (overwrites `config.yaml`, then restarts the app in place to apply)
`/model`	Override just one model — chat (LLM), vision, or embeddings — leaving the rest of `config.yaml` untouched, then restart in place
`/params`	Tune a model's inference parameters (temperature, top_p, top_k, penalties, reasoning, stop, stream, …) — only the params the chosen provider supports are offered, then restart in place

Keyboard Shortcuts

Ctrl+J: Insert new line in input
Enter: Submit message
Ctrl+C: Interrupt operation (press twice to exit)

Verbose Mode

Control thinking process visibility:

mnemoai              # Verbose mode (shows thinking)
mnemoai --no-verbose # Hide thinking process
# from a checkout: PYTHONPATH=src python -m mnemoai [--no-verbose]

Component Breakdown

1. Client Layer (`client/`)

The client manages the conversation flow and user interaction.

client.py: Core LangGraph client
- Initializes MCP connection
- Manages conversation state
- Handles model configuration
- Coordinates managers (profile, conversation)
agent.py: LangGraph agent implementation
- State graph with agent and tools nodes
- Streaming support with reasoning display
- Code syntax highlighting
router.py: Query classifier and routing
- Classifies queries into categories (simple_qa, code, research, knowledge, full)
- Routes each category to a specialized tool subset
- Configurable classifier prompt via ROUTING_PROMPT in config
orchestrator.py: Task decomposition and worker orchestration
- Decomposes complex tasks into ordered subtasks with category assignments
- Configurable orchestrator and aggregator prompts via config
reasoning_utils.py: Shared reasoning/thinking helpers
- Temporarily disables reasoning for auxiliary LLM calls (routing, task decomposition) so output lands in the response content
- Extracts visible text from <think> tags and Bedrock thinking blocks
mcp_tool_wrapper.py: MCP to LangChain adapter
- Wraps MCP tools as LangChain BaseTool
- Handles async/sync conversion
ui/: User interface components
- chat_interface.py: Interactive chat loop with command handling
- spinner.py: Loading animations
managers/: Business logic
- agent_conversation_manager.py: Conversation state and token tracking
- user_profile_manager.py: Automatic user profiling and learning

2. Server Layer (`server/`)

MCP server that provides tools to the LLM.

server.py: FastMCP server initialization
error_handler.py: @tool_error_handler decorator (shared by all tools)
tools/: Tool implementations
- tools_manager.py: Centralized tool registration and utilities
- fs_read.py: File reading (text, CSV, JSON, PDF, DOCX)
- fs_write.py: File writing with mandatory user confirmation (dry-run preview)
- file_edit.py: Precise string replacement with validation and uniqueness checking
- execute_bash.py: Shell command execution with intelligent error handling
- file_search.py: Fast file/content search (glob patterns + ripgrep)
- todo_manager.py: Todo list management for multi-step tasks
- web_search.py: Brave Search integration
- web_crawler.py: Web page content extraction with RAG integration
- describe_image.py: Vision model image analysis
- rag_tool.py: RAG tools registration
- rag/: RAG system
  - session.py: Session-scoped RAG management with hybrid search
  - vector_store_controller.py: Vector store abstraction layer
  - faiss_store.py: FAISS vector store implementation
  - chroma_store.py: ChromaDB vector store implementation
- readers/: Specialized file readers
  - line_reader.py, directory_reader.py, search_reader.py
  - csv_reader.py, json_reader.py
  - pdf_reader.py, docx_reader.py
  - chunking_helper.py: Document chunking for RAG

3. Models Layer (`models/`)

Model controllers and custom implementations.

provider_params.py: Single source of truth for the config keys each provider consumes (per modality); controllers build their client kwargs from it via build_kwargs, and /model prunes unsupported keys from it
mantle_factory.py: Bedrock Mantle factory (chat_completions / responses / anthropic protocols), shared by the LLM and vision controllers
controllers/ (provider-dispatching model initialization):
- base_model_controller.py: Minimal shared base type for the controllers
- llm_controller.py: LLM model initialization (Bedrock, Mantle, Ollama, OpenAI, SageMaker AI, LiteLLM)
- vision_model_controller.py: Vision model initialization
- embeddings_controller.py: Embedding model initialization for RAG
chat_models/ (concrete LangChain ChatModel subclasses):
- chat_ollama_wrapper.py: Extends ChatOllama with presence_penalty and frequency_penalty support
- sagemaker_chat.py: Full LangChain BaseChatModel for SageMaker endpoints (streaming, tool calling, reasoning)

4. Utils Layer (`utils/`)

Shared utilities and configuration.

config.py: Configuration loader
configurator.py: First-run interactive setup (when no config resolves) and the /config (full reconfigure) and /model (override one model section) chat commands
paths.py: Central path helper — single source of truth for the app home (~/.mnemoai, override with $MNEMOAI_HOME) and all runtime subdirectories (config, plans, tasks, per-profile, per-model)
config.yaml.example: Configuration template (copy to config.yaml and add your settings; .bedrock and .bedrock.mantle variants also provided)
bm25.py: Lightweight BM25 implementation for hybrid (semantic + keyword) search
logger.py: Logging utilities (stderr output)
formatting/: Text formatting
- code_formatter.py: Code syntax highlighting
- url_formatter.py: URL highlighting
- response_parser.py: Response processing

Data Flow

User Input → ChatInterface → LangGraphClient
Client → Invokes LangGraph agent with MCP tools
Classifier → Routes query to a category (simple_qa, code, research, knowledge, full) (if routing enabled)
Orchestrator → For full tasks: decomposes into subtasks, spawns workers, aggregates results (if orchestration enabled)
LangGraph → Executes agent node with route-specific tools, decides to use tools
MCP Server → Executes tool (e.g., fs_read, web_search, RAG)
Tool Result → Returned to agent via tools node
LangGraph → Continues agent loop until response complete
Response → Displayed to user via ChatInterface

Session Management

Each chat session has a unique ID used for:

RAG document indexing (session-scoped)
Chunk caching for file summarization
Training data collection (SFT markers)

Session data is stored in ~/.mnemoai/{profile_name}/:

~/.mnemoai/
└── {profile_name}/
    ├── conversations/           # Saved conversations
    ├── profiles/                # User profiles
    ├── todos/                   # Todo list data
    ├── rag_session_id.txt       # Current RAG session
    ├── rag_store_*.faiss        # FAISS vector index (or ChromaDB directory)
    ├── chunk_cache_*.db         # SQLite chunk cache
    └── models/                  # Per-model memory (isolated by chat model)
        └── {sanitized_model}/   # e.g. global.anthropic.claude-fable-5
            ├── episodic_memory/ # Episodic memory store (FAISS or ChromaDB)
            └── playbook/        # ACE playbook strategies and metrics

Model-scoped memory: episodic memory and the playbook live under models/{model}/ so trying a different chat model doesn't contaminate the memory/strategies learned with another. Conversations, todos, RAG, and the user profile remain shared across models.
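
As an illustration of the layout above, the following sketch builds the per-model memory directory. The helper names are hypothetical and are not taken from paths.py. The sanitization rule (keep letters, digits, dots, underscores, and hyphens; replace everything else with an underscore) is an assumption, chosen only because it leaves the example name global.anthropic.claude-fable-5 unchanged.

```python
import os
import re
from pathlib import Path
from typing import Mapping


def app_home(env: Mapping[str, str] = os.environ) -> Path:
    """App home: $MNEMOAI_HOME if set, otherwise ~/.mnemoai."""
    override = env.get("MNEMOAI_HOME")
    return Path(override).expanduser() if override else Path.home() / ".mnemoai"


def sanitize_model_name(name: str) -> str:
    # Assumed rule: anything outside [A-Za-z0-9._-] becomes "_".
    return re.sub(r"[^A-Za-z0-9._-]+", "_", name)


def model_memory_dir(
    profile: str,
    model: str,
    kind: str,
    env: Mapping[str, str] = os.environ,
) -> Path:
    """Directory for one memory kind ("episodic_memory" or "playbook")."""
    return app_home(env) / profile / "models" / sanitize_model_name(model) / kind
```

Because the model name is part of the path, switching chat models simply selects a different directory and leaves the other model's memory untouched.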

Context Compaction

To keep long conversations within the model's context window, the assistant compacts history by summarizing it:

Automatic — after a turn pushes the conversation past MAX_CONVERSATION_TOKENS, older messages are summarized into the system prompt while the most recent LLM.KEEP_RECENT_MESSAGES turns are kept verbatim.
Manual — run /compact any time (optionally /compact <focus instructions> to steer what the summary emphasizes). Manual compaction keeps a smaller recent window (LLM.MANUAL_COMPACT_KEEP_RECENT).

The kept-verbatim window is bounded by both a message count and a token budget (LLM.KEEP_RECENT_TOKEN_BUDGET, default 25% of MAX_CONVERSATION_TOKENS). Walking newest→oldest, a message that would exceed the budget is summarized instead of kept — so a single oversized recent message (e.g. a pasted document that alone fills the context window) cannot survive compaction verbatim.
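
The double bound (message count plus token budget) can be sketched as follows. This is a simplified illustration, not the project's code: `split_for_compaction` and the `count_tokens` callback are hypothetical names, messages are plain strings, and the sketch assumes the walk stops at the first message that breaks either limit so that the kept window stays contiguous.

```python
from typing import Callable, List, Sequence, Tuple


def split_for_compaction(
    messages: Sequence[str],
    keep_recent: int,
    token_budget: int,
    count_tokens: Callable[[str], int],
) -> Tuple[List[str], List[str]]:
    """Return (to_summarize, kept_verbatim), both in chronological order."""
    kept: List[str] = []
    used = 0
    # Walk newest -> oldest; stop at the first message that breaks a limit.
    for message in reversed(messages):
        cost = count_tokens(message)
        if len(kept) >= keep_recent or used + cost > token_budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()
    to_summarize = list(messages[: len(messages) - len(kept)])
    return to_summarize, kept
```

With this rule, an oversized newest message fails the budget check immediately, so nothing is kept verbatim and the whole history goes to the summarizer.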

The summary preserves topics, decisions, and tool calls/results (which tools ran, their inputs, and outcomes), so the agent retains actionable context after compacting.

🚀 Productivity Tools

The assistant includes specialized tools for efficient code and file manipulation:

📋 Todo List Management

Track multi-step tasks with automatic status management:

Tools:

todo_write(todos): Update the todo list
todo_read(): View current todos
todo_clear(): Clear all todos

Features:

Three states: pending, in_progress, completed
Enforces exactly ONE task in progress at a time
Real-time progress tracking
Stored in ~/.mnemoai/{profile}/todos/current_todos.json
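
A minimal sketch of the state rules listed above. The dictionary shape (a `status` key per todo) and the function name `validate_todos` are assumptions for illustration; the sketch also assumes that zero tasks in progress is acceptable, for example once everything is completed.

```python
from typing import Dict, List

VALID_STATES = {"pending", "in_progress", "completed"}


def validate_todos(todos: List[Dict[str, str]]) -> None:
    """Raise ValueError if the list breaks the documented rules."""
    for todo in todos:
        if todo.get("status") not in VALID_STATES:
            raise ValueError(f"Unknown status: {todo.get('status')!r}")
    in_progress = sum(1 for todo in todos if todo["status"] == "in_progress")
    if in_progress > 1:
        raise ValueError(f"{in_progress} tasks are in_progress; only one is allowed")
```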

Usage Example:

You: Implement user authentication
Assistant: [Creates todos for: database setup, API endpoints, frontend integration, testing]
Assistant: [Marks first todo as in_progress]
Assistant: [Completes each step, updating todos in real-time]

🔎 Fast Search Tools

High-performance file and content searching:

Glob Search (File Names)

Find files by name patterns:

glob_search(pattern="**/*.py")  # All Python files recursively
glob_search(pattern="src/**/*.ts", max_results=100)  # TypeScript in src/
glob_search(pattern="test_*.py", sort_by_mtime=False)  # Unsorted for speed

Parameters:

pattern: Glob pattern (e.g., **/*.py, *.{yaml,json})
path: Directory to search (default: current directory)
max_results: Limit results (default: 1000, use 0 for unlimited)
sort_by_mtime: Sort by modification time (default: True)

Performance: Best for project/codebase searches. For system-wide searches (such as the entire home directory), the assistant automatically uses the find command instead.

Grep Search (File Content)

Search within file contents using ripgrep:

grep_search(pattern="class Foo")  # Find class definitions
grep_search(pattern="TODO|FIXME", file_pattern="*.py", case_insensitive=True)
grep_search(pattern="import React", output_mode="content")  # Show matched lines

Parameters:

pattern: Regex pattern to search for
path: Directory to search (default: current directory)
file_pattern: Filter by file type (e.g., *.py, *.{ts,tsx})
case_insensitive: Case-insensitive search (default: False)
output_mode: files_with_matches (default), content, or count
context_lines: Lines of context around matches
max_results: Maximum matches per file (default: 100)

Requirements: ripgrep must be installed (see the Installation section).

Performance: 10-100x faster than traditional grep for large codebases.

✏️ Precise File Editing

Safe string replacement with validation:

file_edit(
    file_path="/path/to/file.py",
    old_string="def old_function():\n    pass",
    new_string="def new_function():\n    return True",
    replace_all=False  # Requires uniqueness (default)
)

Safety Features:

Validates file exists before editing
Checks that old_string exists in file
Enforces uniqueness (prevents accidental multiple replacements)
Provides detailed error messages with troubleshooting steps
Returns line count changes

Best Practice Workflow:

Read the file first with fs_read
Copy the EXACT text you want to replace (including whitespace)
Create the new version with your changes
Call file_edit with exact strings

Error Handling: If the string isn't unique, the tool provides the line numbers where it appears so you can add more context.
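
The uniqueness check can be sketched on an in-memory string. This is not the file_edit implementation: `find_match_lines` and `replace_unique` are hypothetical names, and the error wording is invented for the example.

```python
from typing import List


def find_match_lines(text: str, old_string: str) -> List[int]:
    """Return the 1-based line number where each occurrence starts."""
    lines: List[int] = []
    start = text.find(old_string)
    while start != -1:
        lines.append(text.count("\n", 0, start) + 1)
        # Skip past this match so occurrences are counted like str.replace.
        start = text.find(old_string, start + len(old_string))
    return lines


def replace_unique(
    text: str, old_string: str, new_string: str, replace_all: bool = False
) -> str:
    """Replace old_string, refusing ambiguous edits unless replace_all is set."""
    if not old_string:
        raise ValueError("old_string must not be empty")
    matches = find_match_lines(text, old_string)
    if not matches:
        raise ValueError("old_string not found in file")
    if len(matches) > 1 and not replace_all:
        raise ValueError(f"old_string is not unique; it starts on lines {matches}")
    return text.replace(old_string, new_string)
```

Reporting the line numbers gives the caller enough information to extend old_string with surrounding context until it matches exactly once.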

🛡️ Enhanced Error Handling

All tools now provide intelligent error messages with troubleshooting guidance:

Example Error Response:

{
  "error": true,
  "error_type": "FileNotFoundError",
  "message": "File or directory not found: /path/to/file.txt",
  "next_steps": [
    "Verify the file path is correct",
    "Use glob_search to find files by pattern",
    "Check with execute_bash('ls -la /parent/dir')",
    "Ensure you have read permissions"
  ],
  "original_error": "..."
}

Handled Error Types:

FileNotFoundError
PermissionError
IsADirectoryError
JSONDecodeError
Encoding errors
Command execution errors
Timeout errors
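
A decorator along these lines could produce the response shape shown above. This is a hypothetical stand-in for @tool_error_handler, not its source: the name `error_handler_sketch`, the `NEXT_STEPS` table, and the default hint are assumptions, and only two error types are mapped to keep the example short.

```python
import functools
from typing import Any, Callable, Dict, List

# Hypothetical hints; the real messages live in error_handler.py.
NEXT_STEPS: Dict[type, List[str]] = {
    FileNotFoundError: [
        "Verify the file path is correct",
        "Use glob_search to find files by pattern",
    ],
    PermissionError: ["Ensure you have read permissions"],
}


def error_handler_sketch(func: Callable[..., Any]) -> Callable[..., Any]:
    """Turn exceptions raised by a tool into a structured error dict."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            # Pick the hints for the first matching exception type.
            steps = next(
                (hints for exc_type, hints in NEXT_STEPS.items() if isinstance(exc, exc_type)),
                ["Check the tool arguments and retry"],
            )
            return {
                "error": True,
                "error_type": type(exc).__name__,
                "message": str(exc),
                "next_steps": steps,
                "original_error": repr(exc),
            }

    return wrapper


@error_handler_sketch
def read_text(path: str) -> str:
    """Example tool: read a UTF-8 text file."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()
```

Returning a dict instead of raising lets the agent read the next_steps and choose a follow-up tool call rather than stopping on a traceback.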

📁 File Write Confirmation

fs_write requires explicit user confirmation before it writes anything.

Two tool calls, with user approval in between:

Preview (dry_run=True): Shows what will happen
Confirm: User explicitly approves
Execute (confirmed=True): Actually performs the operation

This prevents accidental file overwrites and gives users control over file system modifications.
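
The preview-then-execute flow can be sketched as below. The `dry_run` and `confirmed` parameter names come from the description above; everything else (the function name `write_file_sketch`, the shape of the returned dict, and the choice to treat an unconfirmed call as a preview) is an assumption for illustration.

```python
from pathlib import Path
from typing import Dict


def write_file_sketch(
    path: str, content: str, dry_run: bool = True, confirmed: bool = False
) -> Dict[str, object]:
    """Preview a write by default; touch the disk only when confirmed."""
    target = Path(path)
    preview = {
        "path": str(target),
        "overwrites_existing": target.exists(),
        "bytes": len(content.encode("utf-8")),
    }
    if dry_run or not confirmed:
        # Nothing is written until the user has approved the preview.
        return {"written": False, "preview": preview}
    target.write_text(content, encoding="utf-8")
    return {"written": True, "preview": preview}
```

Defaulting `dry_run` to True means a call that forgets both flags can never modify a file.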

🛡️ Git Safety

Safe git operations with protection against common mistakes:

Tools:

git_safe(command="...") - Execute git commands with safety checks
git_status_safe() - Comprehensive status with warnings
git_commit_safe(message="...", add_all=True) - Safe commits with staging

Protected Operations:

Operation	Protection
Force push to main/master	Blocked
`git reset --hard`	Warning + confirmation required
`git push --force`	Warning (use `--force-with-lease`)
`git commit --amend`	Checks if already pushed
Skip hooks (`--no-verify`)	Warning
Force delete branch (`-D`)	Warning
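
Most rows of the table above can be expressed as a small classifier over the command arguments. This sketch is an assumption about how such checks might look, not the git_safety.py logic: the function name and the level labels (`ok`, `warn`, `confirm`, `block`) are invented here, and the `git commit --amend` check is left out because it needs to inspect repository state.

```python
import shlex
from typing import Tuple

PROTECTED_BRANCHES = {"main", "master"}


def classify_git_command(command: str) -> Tuple[str, str]:
    """Return (level, reason); level is 'ok', 'warn', 'confirm', or 'block'."""
    args = shlex.split(command)
    if not args:
        return "ok", ""
    sub, rest = args[0], args[1:]
    if sub == "push" and ("--force" in rest or "-f" in rest):
        if PROTECTED_BRANCHES & set(rest):
            return "block", "force push to a protected branch"
        return "warn", "prefer --force-with-lease"
    if sub == "reset" and "--hard" in rest:
        return "confirm", "reset --hard discards local changes"
    if "--no-verify" in rest:
        return "warn", "hooks will be skipped"
    if sub == "branch" and "-D" in rest:
        return "warn", "force-deleting a branch"
    return "ok", ""
```

The command string is given without the leading `git`, matching the `git_safe(command="...")` examples.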

Example:

# Safe - uses git_safe with protections
git_safe(command="push origin feature-branch")

# Dangerous - requires confirmation
git_safe(command="reset --hard HEAD~1", allow_dangerous=True, reason="Discarding failed experiment")

📝 Plan Mode

Implementation planning workflow for complex tasks:

Workflow:

enter_plan_mode(task_description="Add user authentication")
Explore codebase with search tools
add_plan_step(step_number=1, title="Create user model", description="...")
add_plan_file(file_path="models/user.py", action="create")
add_plan_risk(risk="Migration needed", mitigation="Add migration script")
present_plan() - Show user for approval
approve_plan() + exit_plan_mode() - Start implementing

When to Use:

New feature with multiple files
Architectural decisions needed
Multi-step refactoring
Unclear requirements

Plan Storage: ~/.mnemoai/plans/current_plan.json
Task Output: ~/.mnemoai/tasks/

🔄 Background Tasks

Run long operations in parallel without blocking:

Tools:

start_background_task(command="...", description="...") - Start task
get_task_status(task_id="...") - Check progress
get_task_output(task_id="...") - Get output
list_background_tasks() - See all tasks
cancel_background_task(task_id="...") - Stop task
wait_for_task(task_id="...", timeout_seconds=300) - Wait for completion

When to Use:

Running full test suites
Building large projects
Installing dependencies
Running linters on entire codebase
Any command > 30 seconds

Example:

# Start tests in background
result = start_background_task(command="pytest", description="Running tests")
# Returns: {"task_id": "abc123", ...}

# Check status later
get_task_status(task_id="abc123")

# Get output when done
get_task_output(task_id="abc123", tail_lines=50)

Task Storage: Output logs saved to ~/.mnemoai/tasks/
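
For readers curious about the mechanics, the pattern behind these tools can be sketched with a worker thread and a subprocess. This is an assumption-laden illustration, not the project's implementation: `start_task`, `wait_for`, and the in-memory `TASKS` registry are hypothetical names, and the sketch keeps output in memory rather than writing log files.

```python
import subprocess
import sys
import threading
import uuid

TASKS: dict[str, dict] = {}  # in-memory registry used only by this sketch


def start_task(command: list[str]) -> str:
    """Launch a command on a worker thread and return a task id immediately."""
    task_id = uuid.uuid4().hex[:8]
    TASKS[task_id] = {"status": "running", "output": ""}

    def run() -> None:
        proc = subprocess.run(command, capture_output=True, text=True)
        TASKS[task_id]["output"] = proc.stdout + proc.stderr
        TASKS[task_id]["status"] = "completed" if proc.returncode == 0 else "failed"

    thread = threading.Thread(target=run, daemon=True)
    TASKS[task_id]["thread"] = thread
    thread.start()
    return task_id


def wait_for(task_id: str, timeout_seconds: float = 300) -> dict:
    """Block until the task finishes or the timeout elapses."""
    TASKS[task_id]["thread"].join(timeout_seconds)
    return TASKS[task_id]


task_id = start_task([sys.executable, "-c", "print('tests passed')"])
result = wait_for(task_id, timeout_seconds=30)
print(result["status"], result["output"].strip())
```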

🔧 Configuration

Model Configuration

The assistant supports multiple model types:

Amazon Bedrock

MODEL_ID:
  NAME: us.amazon.nova-pro-v1:0
  TYPE: bedrock
  REGION: us-east-1
  TEMPERATURE: 0.1

Note: Newer Claude models on Bedrock reject temperature as deprecated. Omit TEMPERATURE for those — it is only sent when explicitly configured.

Using a named AWS profile (Bedrock, SageMaker, Mantle). These providers use the standard boto3 credential chain (default profile / env vars / instance role). To select a specific named profile instead, set AWS_PROFILE via the config ENV: section — values there are exported as environment variables at startup, and boto3 picks them up automatically. No model-level config key is needed:
ENV:
  AWS_PROFILE: my-bedrock-profile
  # AWS_REGION: us-east-1   # any AWS env var works here too
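
Conceptually, exporting the ENV: section amounts to copying a mapping into the process environment before any AWS client is created. The sketch below assumes the config has already been parsed into a dict; `export_env_section` is a hypothetical helper, and whether config values override variables already set in the shell is an assumption.

```python
import os


def export_env_section(config: dict) -> list[str]:
    """Copy the ENV mapping from a parsed config into os.environ."""
    exported = []
    for key, value in (config.get("ENV") or {}).items():
        os.environ[key] = str(value)  # assumption: config values win over the shell
        exported.append(key)
    return exported


config = {"ENV": {"AWS_PROFILE": "my-bedrock-profile"}}
exported = export_env_section(config)
print(exported, os.environ["AWS_PROFILE"])
```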

Using a Bedrock API key (instead of AWS credentials). Bedrock supports short-term API keys (a bedrock-api-key-... value from the console). For standard Bedrock (TYPE: bedrock), set it as AWS_BEARER_TOKEN_BEDROCK — langchain-aws reads it automatically, no model config needed:
ENV:
  AWS_BEARER_TOKEN_BEDROCK: bedrock-api-key-XXXXXXXX
(For Mantle, the same key is supplied differently — see the Mantle section below.)

Amazon Bedrock Mantle

Bedrock Mantle is an OpenAI-compatible API (not the Bedrock Converse API). By default it authenticates with a short-lived bearer token minted from your standard AWS credentials via aws-bedrock-token-generator, so your normal aws configure / SSO setup works — no extra keys to manage. Use TYPE: mantle and a bare model ID from the Mantle catalog.

MODEL_ID:
  NAME: qwen.qwen3-32b # bare Mantle model id (e.g. anthropic.claude-opus-4-8)
  TYPE: mantle
  REGION: us-east-1
  MAX_TOKENS: 8192

Authenticating with a Bedrock API key (no AWS credentials). Instead of minting a token, you can supply a short-term Bedrock API key directly. Mantle reads it from the BEDROCK_API_KEY environment variable (set it via the config ENV: section), or from a per-model API_KEY field. When a key is present it's used as-is; otherwise the app falls back to minting from AWS credentials. (Note: standard Bedrock uses AWS_BEARER_TOKEN_BEDROCK for the same key — Mantle uses BEDROCK_API_KEY.)

# Option A — environment variable (applies to all Mantle calls)
ENV:
  BEDROCK_API_KEY: bedrock-api-key-XXXXXXXX

# Option B — per-model key
MODEL_ID:
  NAME: qwen.qwen3-32b
  TYPE: mantle
  REGION: us-east-1
  API_KEY: bedrock-api-key-XXXXXXXX

API protocols. Mantle serves models under three protocols. Select with API_PROTOCOL (works for both chat and vision):

chat_completions (default) — base /v1, OpenAI Chat Completions API. Most models (Qwen, Gemma, GPT-OSS, DeepSeek, …).
responses — base /openai/v1, OpenAI Responses API. Required by models that only expose Responses, such as openai.gpt-5.4.
anthropic — base /anthropic, Anthropic Messages API. For Claude models (e.g. anthropic.claude-haiku-4-5).

# OpenAI Responses model (e.g. GPT-5.4)
MODEL_ID:
  NAME: openai.gpt-5.4
  TYPE: mantle
  REGION: us-west-2 # gpt-5.4 is in us-west-2, not us-east-1
  API_PROTOCOL: responses
  MAX_TOKENS: 8192

# Anthropic Claude model
MODEL_ID:
  NAME: anthropic.claude-haiku-4-5
  TYPE: mantle
  REGION: us-east-1
  API_PROTOCOL: anthropic
  MAX_TOKENS: 8192

ENDPOINT_URL is optional; it defaults to https://bedrock-mantle.<REGION>.api.aws/{v1 | openai/v1 | anthropic} depending on the protocol.
The Mantle catalog (Qwen, Mistral, DeepSeek, GLM, Gemma, Claude, GPT-5.4, …) differs from standard Bedrock and varies by account/region.
TYPE: mantle works for both MODEL_ID (chat) and VISION_MODEL_ID (image description) — vision-capable models like qwen.qwen3-vl-235b-a22b-instruct are supported.
Caveats: Pick the right API_PROTOCOL per model (using the wrong one returns a 400 "does not support the '/v1/…' API" error). anthropic requires the langchain-anthropic package (in requirements.txt). Models like anthropic.claude-fable-5 also require the account's data-retention mode to be provider_data_share, otherwise they report unavailable.
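
The default endpoint rule above can be written out as a small helper. This is a sketch based only on the URL pattern documented here; `default_mantle_endpoint` and `PROTOCOL_PATHS` are hypothetical names, not part of the package.

```python
PROTOCOL_PATHS = {
    "chat_completions": "v1",
    "responses": "openai/v1",
    "anthropic": "anthropic",
}


def default_mantle_endpoint(region: str, api_protocol: str = "chat_completions") -> str:
    """Build the default Mantle base URL for a region and API protocol."""
    if api_protocol not in PROTOCOL_PATHS:
        raise ValueError(f"unknown API_PROTOCOL: {api_protocol!r}")
    return f"https://bedrock-mantle.{region}.api.aws/{PROTOCOL_PATHS[api_protocol]}"


print(default_mantle_endpoint("us-east-1"))
print(default_mantle_endpoint("us-west-2", "responses"))
```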

For standard Bedrock (Converse API), ENDPOINT_URL is also accepted on MODEL_ID/VISION_MODEL_ID with TYPE: bedrock to override the default endpoint.

Ollama (Local)

MODEL_ID:
  NAME: qwen3-4b-thinking-2507-q6-k:latest
  TYPE: ollama
  HOST: localhost
  PORT: 11434
  REPETITION_PENALTY: 1.1
  PRESENCE_PENALTY: 1.5
  TEMPERATURE: 0.1
  TOP_P: 0.95

OpenAI

MODEL_ID:
  NAME: gpt-5-mini-2025-08-07
  TYPE: openai
  STREAM: true
  REASONING_EFFORT: medium
# Requires OPENAI_API_KEY environment variable

Amazon SageMaker AI

MODEL_ID:
  NAME: your-endpoint-name
  TYPE: sagemaker
  REGION: us-east-1
  REPETITION_PENALTY: 1.1
  PRESENCE_PENALTY: 1.5
  TEMPERATURE: 0.1
  MAX_TOKENS: 4096

LiteLLM (100+ Providers)

MODEL_ID:
  NAME: openai/your-model-name
  TYPE: litellm
  API_BASE: http://localhost:8000/v1
  API_KEY: your-api-key
  TEMPERATURE: 0.1
  MAX_TOKENS: 4096

Vision Model Configuration

For Bedrock:

VISION_MODEL_ID:
  NAME: global.anthropic.claude-haiku-4-5-20251001-v1:0
  TYPE: bedrock
  REGION: us-east-1
  TEMPERATURE: 0.3

For Ollama:

VISION_MODEL_ID:
  NAME: qwen3-vl:2b
  TYPE: ollama
  HOST: localhost
  PORT: 11434
  TEMPERATURE: 0.3

For OpenAI:

VISION_MODEL_ID:
  NAME: gpt-5-mini-2025-08-07
  TYPE: openai
  STREAM: true
  REASONING_EFFORT: medium

For SageMaker AI (endpoint must serve a vision-capable model accepting the OpenAI image format):

VISION_MODEL_ID:
  NAME: your-endpoint-name
  TYPE: sagemaker
  REGION: us-east-1
  INPUT_FORMAT: openai_chat
  TEMPERATURE: 0.3

For LiteLLM (any of its vision-capable models):

VISION_MODEL_ID:
  NAME: openai/gpt-4o # provider-prefixed model id
  TYPE: litellm
  API_BASE: http://localhost:4000 # optional (proxy / self-hosted)
  API_KEY: your-api-key # optional (else the provider's env var)

Model Parameters

This is the full reference for what you can put under MODEL_ID, VISION_MODEL_ID, and RAG.EMBED_MODEL_ID. Only NAME and TYPE are required; everything else is optional and omitted keys fall back to the provider/model default. The interactive configurator (/config, /model) sets the common ones — use this reference to hand-tune config.yaml for anything else a provider or model supports.

Identity, connection & auth

Parameter	Applies to `TYPE`	Description
`NAME`	all (required)	Model id / Ollama model / Bedrock model id / Mantle bare id / SageMaker endpoint name
`TYPE`	all (required)	`ollama`, `bedrock`, `mantle`, `openai`, `sagemaker`, `litellm` (embeddings: `ollama`, `bedrock`, `openai`, `sagemaker`, `litellm`)
`HOST`	`ollama`	Ollama host (default `localhost`)
`PORT`	`ollama`	Ollama port (default `11434`)
`REGION`	`bedrock`, `mantle`, `sagemaker`	AWS region (default `us-east-1`)
`API_PROTOCOL`	`mantle`	`chat_completions` (default), `responses`, or `anthropic`
`ENDPOINT_URL`	`bedrock`, `mantle`	Override the default endpoint URL
`API_KEY`	`mantle`, `litellm`	Mantle: Bedrock API key (else `BEDROCK_API_KEY` env / minted token). LiteLLM: provider key
`API_BASE`	`litellm`	LiteLLM API base URL
`INPUT_FORMAT`	`sagemaker`	`openai_chat` (default) or `huggingface`

Standard Bedrock also reads the AWS_BEARER_TOKEN_BEDROCK env var, and all AWS providers honor AWS_PROFILE — see the API-key/profile notes under Amazon Bedrock.

Inference parameters

Optional generation settings. The Honored by column lists the providers that actually send each one (others ignore it). These apply to MODEL_ID and VISION_MODEL_ID; EMBED_MODEL_ID takes none of them (embeddings only use NAME/TYPE + connection).

This table is derived from models/provider_params.py — the single source of truth that the controllers build their client kwargs from — so it reflects exactly what each provider's init path forwards. (mantle reads TEMPERATURE/MAX_TOKENS/TOP_P via the Mantle factory.)

Parameter	Description	Honored by (`MODEL_ID`)
`MAX_TOKENS`	Max output tokens to generate	ollama, bedrock, mantle, openai, sagemaker, litellm
`TEMPERATURE`	Sampling temperature	ollama, bedrock, mantle, openai, sagemaker, litellm
`TOP_P`	Top-p (nucleus) sampling	ollama, bedrock, mantle, openai, sagemaker, litellm
`TOP_K`	Top-k sampling	ollama, sagemaker
`STOP`	Stop sequences (YAML list)	ollama, bedrock, sagemaker, litellm
`STREAM`	Stream tokens (default `true`)	mantle, openai, litellm
`PRESENCE_PENALTY`	Presence penalty	ollama, openai
`FREQUENCY_PENALTY`	Frequency penalty	ollama
`REPETITION_PENALTY`	Repetition penalty	ollama, litellm
`REASONING`	Enable extended thinking (boolean)	bedrock
`THINKING_TOKENS`	Thinking token budget (default `2048`)	bedrock
`REASONING_EFFORT`	`low`/`medium`/`high`/`max`	openai (also maps to Bedrock thinking budget)

VISION_MODEL_ID supports the same six providers as MODEL_ID. It accepts a subset of params: MAX_TOKENS/TEMPERATURE/TOP_P across providers, plus TOP_K/STOP on ollama and sagemaker. Connection keys follow the provider (host/port, region, Mantle protocol, SageMaker INPUT_FORMAT, LiteLLM API_BASE/API_KEY).

Provider-appropriate tuning matters. Newer Claude and GPT models reject TEMPERATURE outright; STOP, penalties, and TOP_K are largely Ollama/SageMaker concepts. When /model switches a section's provider it drops the keys the new provider doesn't consume for you, but for everything else edit config.yaml to match what your specific provider/model accepts.
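
As an illustration of the key-dropping behavior described above, the "Honored by" table can be encoded as a lookup and applied to a model section. This is a sketch, not the contents of models/provider_params.py; `HONORED_BY` and `drop_unsupported` are hypothetical names, and listing bedrock under REASONING_EFFORT reflects the table's note that it maps to the Bedrock thinking budget.

```python
ALL = {"ollama", "bedrock", "mantle", "openai", "sagemaker", "litellm"}

# Transcribed from the inference parameters table above.
HONORED_BY = {
    "MAX_TOKENS": ALL,
    "TEMPERATURE": ALL,
    "TOP_P": ALL,
    "TOP_K": {"ollama", "sagemaker"},
    "STOP": {"ollama", "bedrock", "sagemaker", "litellm"},
    "STREAM": {"mantle", "openai", "litellm"},
    "PRESENCE_PENALTY": {"ollama", "openai"},
    "FREQUENCY_PENALTY": {"ollama"},
    "REPETITION_PENALTY": {"ollama", "litellm"},
    "REASONING": {"bedrock"},
    "THINKING_TOKENS": {"bedrock"},
    "REASONING_EFFORT": {"openai", "bedrock"},
}


def drop_unsupported(section: dict) -> dict:
    """Keep connection keys and only the inference keys the provider honors."""
    provider = section["TYPE"]
    kept = {}
    for key, value in section.items():
        honored = HONORED_BY.get(key)
        # Keys missing from the table (NAME, TYPE, REGION, ...) pass through untouched.
        if honored is None or provider in honored:
            kept[key] = value
    return kept


section = {
    "NAME": "us.amazon.nova-pro-v1:0",
    "TYPE": "bedrock",
    "REGION": "us-east-1",
    "TEMPERATURE": 0.1,
    "TOP_K": 40,
    "PRESENCE_PENALTY": 1.5,
}
cleaned = drop_unsupported(section)
print(cleaned)
```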

The context window is set separately, at the top level (it's not part of a model section): MAX_CONVERSATION_TOKENS (see General Parameters below).

General Parameters

# Context window size (passed to model as num_ctx for Ollama)
MAX_CONVERSATION_TOKENS: 65536

# Maximum tokens when reading documents (CSV, JSON, text files)
DOC_MAX_TOKENS: 16384

# Profile configuration
PROFILE:
  NAME: default # Used for session data isolation (~/.mnemoai/{NAME}/)
  USE_PROFILING: true # Enable automatic user profiling

Embeddings Configuration

Embeddings settings are nested under the RAG section:

RAG:
  EMBEDDINGS:
    CACHE_ENABLED: true # LRU cache for embedding vectors (avoids re-embedding same text)
    CACHE_SIZE: 1000 # Maximum cached embeddings
    FALLBACK_ENABLED: true # Fall back to SHA256 if embedding model unavailable
    FALLBACK_TYPE: "sha256" # Fallback type (sha256, random, zeros)

LLM Interaction Configuration

LLM:
  ENABLE_THINKING: true # Enable thinking tags (verbose mode)
  RETRY_ENABLED: true # Retry failed LLM calls
  MAX_RETRIES: 3 # Maximum retry attempts
  RETRY_DELAY: 1.0 # Seconds between retries
  RETRY_BACKOFF: 2.0 # Exponential backoff multiplier
  SUMMARIZATION_THINK: false # Include thinking in summarization
  TOKEN_COUNTING:
    OLLAMA_APPROXIMATION: 1.3 # Chars-to-tokens multiplier for Ollama
    FALLBACK_MODEL: "gpt-4" # Tiktoken model for fallback counting
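
To show how the retry settings interact, here is a minimal sketch of exponential backoff using the default values above. The helper names are hypothetical, and the sketch assumes MAX_RETRIES counts retries after the first attempt; the actual retry loop may count attempts differently.

```python
import time


def retry_delays(max_retries: int = 3, retry_delay: float = 1.0, retry_backoff: float = 2.0) -> list[float]:
    """Delay before each retry: RETRY_DELAY * RETRY_BACKOFF ** attempt."""
    return [retry_delay * retry_backoff ** attempt for attempt in range(max_retries)]


def call_with_retry(func, max_retries: int = 3, retry_delay: float = 1.0,
                    retry_backoff: float = 2.0, sleep=time.sleep):
    """Call func, sleeping with exponential backoff between failed attempts."""
    delays = retry_delays(max_retries, retry_delay, retry_backoff)
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            sleep(delays[attempt])


print(retry_delays())
```

With the defaults, the waits before the first, second, and third retry are 1, 2, and 4 seconds.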

System Prompt

The system prompt in config.yaml defines the assistant's behavior. Customize the SYSTEM_PROMPT field to change the assistant's personality, instructions, and tool usage patterns. Key sections in the default prompt:

<identity>: Basic identity and core principles
<reasoning_discipline>: Thinking rules and loop detection
<output_format>: Response formatting requirements
<information_sources>: RAG vs web vs internal knowledge decision tree
<file_operations>: Read/write/edit workflow rules
<search_tools>: Glob and grep usage guidance
<git_operations>: Git safety rules
<task_management>: Todo, plan mode, and background task rules
<error_handling>: Error response guidelines
<communication>: Style and security rules

RAG Configuration

ENABLE_RAG: true # Master toggle for RAG system
RAG:
  MAX_TOKENS: 8192 # Threshold: documents above this are ingested into RAG
  CHUNK_TOKENS: 1024 # Chunk size in tokens (recommended: 512-2048)
  SEARCH:
    SEMANTIC_WEIGHT: 0.5 # Semantic similarity weight (0-1)
    KEYWORD_WEIGHT: 0.5 # BM25 keyword weight (0-1)
  VECTOR_STORE:
    TYPE: chromadb # Vector store backend: "faiss" or "chromadb"
  EMBEDDINGS:
    CACHE_ENABLED: true
    CACHE_SIZE: 1000
    FALLBACK_ENABLED: true
    FALLBACK_TYPE: "sha256"

Requires: An embedding model configured via RAG.EMBED_MODEL_ID (see Embeddings Model).

Episodic Memory Configuration

ENABLE_EPISODIC_MEMORY: true
EPISODIC_MEMORY:
  STORE_TYPE: chromadb # or faiss
  # Similarity Thresholds
  DUPLICATE_THRESHOLD: 0.95 # Higher = stricter duplicate detection
  RETRIEVAL_THRESHOLD: 0.7 # Minimum similarity to retrieve episodes
  FOLLOW_UP_THRESHOLD: 0.4 # Similarity to detect follow-up questions (skips injection)
  REDUNDANCY_THRESHOLD: 0.5 # Filter episodes redundant with conversation
  # Hybrid Search Weights
  SEMANTIC_WEIGHT: 0.7 # Semantic similarity weight (0-1)
  KEYWORD_WEIGHT: 0.3 # Keyword matching weight (0-1)
  # Token and Size Limits
  MAX_TOKENS_PER_EPISODE: 400 # Max tokens for episode text
  MAX_EPISODES: 1000 # Maximum stored episodes
  MAX_AGE_DAYS: 90 # Maximum episode age in days
  # Success Detection
  SUCCESS_MARKERS: # Phrases that indicate task success
    - thanks
    - perfect
    - great
    - worked
  CORRECTION_MARKERS: # Phrases that indicate errors
    - wrong
    - error
    - fix
    - actually
  # Storage Behavior
  IMMEDIATE_STORAGE: true # Store episodes immediately
  MIN_TOOLS_OR_LENGTH: 300 # Min response length if no tools used
  # Query Enhancement
  ENABLE_QUERY_EXPANSION: true # Expand queries with synonyms
  QUERY_EXPANSION_TERMS: 3 # Max terms to add per query

Requires: An embedding model configured via RAG.EMBED_MODEL_ID (see Embeddings Model).

How it works:

Automatically stores successful task completions with full conversation context
Uses hybrid search (70% semantic + 30% BM25) to find similar past tasks
Conversation-aware injection: Only injects episodic memory when relevant
- Detects follow-up questions and skips injection (uses conversation context instead)
- Filters out episodes redundant with current conversation
- Uses semantic similarity (with embeddings) or Jaccard similarity (fallback)
Injects compact context showing: task → tools used → outcome
Automatic cleanup: keeps max 1000 episodes, removes entries older than 90 days
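
The Jaccard fallback mentioned above is simple enough to sketch directly. The tokenization (lowercased whitespace split) and the helper names are assumptions; only the 0.4 follow-up threshold comes from the configuration shown earlier.

```python
FOLLOW_UP_THRESHOLD = 0.4  # mirrors EPISODIC_MEMORY.FOLLOW_UP_THRESHOLD


def jaccard_similarity(a: str, b: str) -> float:
    """Size of the shared token set divided by the size of the combined token set."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def is_follow_up(query: str, previous_turn: str) -> bool:
    """Treat a query as a follow-up when it overlaps enough with the last turn."""
    return jaccard_similarity(query, previous_turn) >= FOLLOW_UP_THRESHOLD


score = jaccard_similarity("read the pdf report", "summarize the pdf report")
print(score, is_follow_up("read the pdf report", "summarize the pdf report"))
```

Here three of the five distinct tokens are shared, so the score is 0.6 and the query would be treated as a follow-up.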

Success detection:

User feedback: "thanks", "perfect", "great", "worked"
No error markers in response
All tools executed successfully
Filters out simple greetings and short responses

Embeddings Model

All embedding configuration is nested under RAG:

For Bedrock:

RAG:
  EMBED_MODEL_ID:
    NAME: amazon.titan-embed-text-v2:0
    TYPE: bedrock
    REGION: us-east-1

For Ollama:

RAG:
  EMBED_MODEL_ID:
    NAME: mxbai-embed-large
    TYPE: ollama
    HOST: localhost
    PORT: 11434

For OpenAI:

RAG:
  EMBED_MODEL_ID:
    NAME: text-embedding-ada-002
    TYPE: openai

For SageMaker:

RAG:
  EMBED_MODEL_ID:
    NAME: your-endpoint-name
    TYPE: sagemaker
    REGION: us-east-1

For LiteLLM (any of its 100+ providers via one OpenAI-style API):

RAG:
  EMBED_MODEL_ID:
    NAME: openai/text-embedding-3-small # provider-prefixed model id
    TYPE: litellm
    API_BASE: http://localhost:4000 # optional (proxy / self-hosted)
    API_KEY: your-api-key # optional (else the provider's env var)

Vector Store Options:

ChromaDB (default): Persistent vector database with built-in metadata support
FAISS: Fast, in-memory vector search with disk persistence

Switch between stores by changing RAG.VECTOR_STORE.TYPE in config. The system uses a controller pattern, so all RAG functionality works identically regardless of the store.

📚 Advanced Features

Query Routing

When enabled, the assistant classifies each query before processing it and routes it to a specialized tool subset. This reduces noise for the model and improves response quality.

Categories:

Route	Description	Tools Available
`simple_qa`	Greetings, explanations, general knowledge	None (direct LLM answer)
`code`	File ops, code editing, git, shell commands	fs_read, fs_write, file_edit, bash, git, search, etc
`research`	Web search, URL fetching	web_search, web_crawler
`knowledge`	Document reading, indexing, RAG queries	pdf/csv/docx/json readers, RAG tools, fs_read
`full`	Multi-category or ambiguous tasks	All tools (fallback)

How it works:

A lightweight LLM call classifies the query into one of the categories above
The agent node binds only the tools for that category
If a query spans multiple categories, it routes to full (all tools)
The classifier prompt is customizable via ROUTING_PROMPT in config.yaml
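
The binding step can be pictured as a lookup from route to tool subset, with full as the fallback. This is a sketch: the tool lists are an illustrative subset taken from names used elsewhere in this document, and `ROUTE_TOOLS` and `tools_for_route` are hypothetical names.

```python
# Illustrative subsets only; the real route-to-tool mapping is larger.
ROUTE_TOOLS = {
    "simple_qa": [],
    "code": ["fs_read", "fs_write", "file_edit", "execute_bash", "git_safe", "glob_search", "grep_search"],
    "research": ["web_search", "web_crawler"],
    "knowledge": ["fs_read", "search_in_documents", "list_documents"],
}
ALL_TOOLS = sorted({tool for tools in ROUTE_TOOLS.values() for tool in tools})


def tools_for_route(route: str) -> list[str]:
    """Return the tool subset for a route; 'full' or unknown labels get everything."""
    if route in ROUTE_TOOLS:
        return ROUTE_TOOLS[route]
    return ALL_TOOLS


print(tools_for_route("research"))
print(len(tools_for_route("full")))
```

Falling back to all tools for an unrecognized label means a classifier mistake degrades to the unrouted behavior rather than leaving the agent without tools.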

Configuration:

ENABLE_ROUTING: true
ROUTING_PROMPT: |
  # Custom classifier prompt (optional, has a sensible default)
  ...

Orchestrator-Workers

When enabled alongside routing, tasks classified as full (spanning multiple categories) are automatically decomposed into focused subtasks executed by specialized workers.

How it works:

Orchestrator: An LLM call decomposes the complex query into ordered subtasks, each assigned a category (code, research, knowledge, etc.)
Workers: Each subtask is executed by a worker agent with only the tools for its category. Workers run sequentially — each receives context from previously completed subtasks.
Aggregator: If there were multiple subtasks, a final LLM call synthesizes all worker results into a single coherent response.

Example flow for "Read this PDF and write a summary to a file":

Orchestrator decomposes into:
  [Step 1/2: Read and summarize the PDF document]        → knowledge worker
  [Step 2/2: Write the summary to summary.md]            → code worker
  [Synthesizing results...]                               → aggregator
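
The sequential hand-off between workers can be sketched without any LLM calls. In this illustration the workers are plain stand-in functions and `run_subtasks` and `aggregate` are hypothetical names; in the real system each step is an LLM-driven agent.

```python
from typing import Callable

Subtask = tuple[str, str]  # (category, description)
Worker = Callable[[str, list[str]], str]  # (description, prior results) -> result


def run_subtasks(subtasks: list[Subtask], workers: dict[str, Worker]) -> list[str]:
    """Run subtasks in order, passing each worker the results produced so far."""
    results: list[str] = []
    for category, description in subtasks:
        worker = workers[category]
        results.append(worker(description, list(results)))
    return results


def aggregate(results: list[str]) -> str:
    """Skip synthesis for a single result; otherwise join the results in order."""
    if len(results) == 1:
        return results[0]
    return "\n".join(f"{index}. {text}" for index, text in enumerate(results, start=1))


workers = {
    "knowledge": lambda task, context: "summary text",
    "code": lambda task, context: f"wrote {len(context)} prior result(s) to summary.md",
}
plan = [
    ("knowledge", "Read and summarize the PDF document"),
    ("code", "Write the summary to summary.md"),
]
results = run_subtasks(plan, workers)
print(aggregate(results))
```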

Configuration:

ENABLE_ROUTING: true # Required
ENABLE_ORCHESTRATION: true # Activates orchestrator for 'full' route
# ORCHESTRATOR_PROMPT: |      # Optional: customize decomposition prompt
# AGGREGATOR_PROMPT: |        # Optional: customize synthesis prompt

When orchestration is disabled, full routes use all tools in a single agent loop, which is the previous behavior, so existing setups are unaffected.

Web Search Configuration

This tool uses the Brave Search API. Obtain an API key from Brave Search Developer Portal.

BRAVE_API_KEY: your-api-key-here # For web search

Web Crawler Configuration

Enable web page content extraction with automatic RAG integration:

ENABLE_WEB_CRAWL: true

When enabled, the web_crawler tool:

Extracts content from web pages as markdown
Automatically ingests large pages (>8K tokens) into RAG (if enabled)
Uses the same chunking configuration as PDF/DOCX readers

Browser dependency. Crawling uses a headless Chromium via Playwright, whose browser binary is a separate ~260MB download not pulled in by pip / uv tool install. The tool installs it automatically on the first crawl after a fresh install/upgrade. If that auto-install fails (e.g. offline), run it manually in the same environment: python -m playwright install chromium (for an installed CLI: ~/.local/share/uv/tools/mnemoai/bin/python -m playwright install chromium).

RAG (Retrieval-Augmented Generation)

The RAG system automatically indexes documents for semantic search with hybrid search (semantic embeddings + BM25 keyword scoring).

How it works:

Read a PDF/DOCX file → Automatically chunked and indexed
Ask questions → Assistant searches indexed documents first using hybrid search
Session-scoped → Cleared on /clear or exit

RAG Tools:

list_documents(): Show indexed documents
search_in_documents(query, top_k): Hybrid semantic + BM25 search
clear_documents(): Clear RAG index

Configuration:

RAG.CHUNK_TOKENS: Chunk size (recommended: 512-2048)
RAG.VECTOR_STORE.TYPE: Choose between faiss or chromadb
RAG.SEARCH.SEMANTIC_WEIGHT / RAG.SEARCH.KEYWORD_WEIGHT: Configurable hybrid weights
Recursive chunking with 10% overlap
Hybrid search: BM25 (Okapi BM25 with TF-IDF, term saturation, length normalization) + semantic similarity
Independent candidate retrieval from both BM25 and embeddings, merged and re-ranked
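
One common way to merge two independently retrieved candidate lists is to normalize each score set and take a weighted sum. The sketch below assumes min-max normalization, which this document does not specify, and uses hypothetical helper names; only the two weights correspond to the configuration keys above.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Scale scores into [0, 1] so the two signals are comparable."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc: 1.0 for doc in scores}
    return {doc: (value - low) / (high - low) for doc, value in scores.items()}


def hybrid_rank(semantic: dict[str, float], bm25: dict[str, float],
                semantic_weight: float = 0.5, keyword_weight: float = 0.5) -> list[tuple[str, float]]:
    """Merge both candidate sets and rank by the weighted sum of normalized scores."""
    sem, kw = min_max(semantic), min_max(bm25)
    combined = {
        doc: semantic_weight * sem.get(doc, 0.0) + keyword_weight * kw.get(doc, 0.0)
        for doc in set(sem) | set(kw)
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


ranked = hybrid_rank(
    semantic={"a": 0.9, "b": 0.5, "c": 0.1},
    bm25={"b": 4.0, "c": 2.0, "d": 0.0},
)
print([doc for doc, _ in ranked])
```

A chunk that scores moderately on both signals (b here) can outrank one that scores highly on only one (a), which is the main reason to combine them.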

Vector Store Options:

ChromaDB: Persistent vector database with metadata support (default)
FAISS: Fast in-memory search with disk persistence

The system uses a VectorStoreController for easy switching between stores. All functionality (indexing, searching, clearing) works identically regardless of the chosen store.

User Profile Learning

After 5+ interactions, the assistant builds a profile:

Cognitive style: Analytical, creative, pragmatic, systematic
Domain expertise: Python, AWS, DevOps, ML, etc.
Learning style: Visual, hands-on, theoretical
Communication patterns: Tone, complexity, question styles
Code preferences: Testing, documentation, type hints

Profile is automatically injected into system prompt for personalization.

Episodic Memory

The episodic memory system learns from successful task completions and retrieves similar solutions for future queries.

How it works:

Automatic Storage: After each successful interaction, stores:
- Initial user query
- Full conversation context
- Tools used with arguments
- Final solution
- Timestamp
Hybrid Search: Retrieves similar episodes using:
- 70% semantic similarity (task intent)
- 30% BM25 keyword scoring (tool names, action verbs)

Context Injection: Before processing queries, injects compact context:

[Episodic Memory - Similar Past Tasks]
1. "read DOCX about ML" → fs_read → success (similarity: 0.85)
2. "analyze PDF report" → fs_read, web_search → success (similarity: 0.78)

Automatic Cleanup: Maintains bounded memory:
- Max 1000 episodes
- Removes entries older than 90 days
- Runs on startup
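
The cleanup rule can be sketched as an age filter followed by a size cap. The episode shape (a dict with a `timestamp` field) and the choice to keep the newest episodes when over the cap are assumptions made for this illustration; `prune_episodes` is a hypothetical name.

```python
from datetime import datetime, timedelta


def prune_episodes(episodes: list[dict], now: datetime,
                   max_episodes: int = 1000, max_age_days: int = 90) -> list[dict]:
    """Drop episodes older than the age limit, then keep only the newest ones."""
    cutoff = now - timedelta(days=max_age_days)
    fresh = [episode for episode in episodes if episode["timestamp"] >= cutoff]
    fresh.sort(key=lambda episode: episode["timestamp"], reverse=True)
    return fresh[:max_episodes]


now = datetime(2026, 6, 1)
episodes = [
    {"id": "recent", "timestamp": now - timedelta(days=1)},
    {"id": "older", "timestamp": now - timedelta(days=30)},
    {"id": "expired", "timestamp": now - timedelta(days=120)},
]
kept = prune_episodes(episodes, now)
print([episode["id"] for episode in kept])
```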

Success Detection:

User feedback: "thanks", "perfect", "great", "worked"
No error markers in response
All tools executed successfully
Filters out greetings and simple acknowledgments (<300 chars, no tools)

Storage Location:

FAISS: ~/.mnemoai/{profile}/models/{model}/episodic_memory/episodic.index
ChromaDB: ~/.mnemoai/{profile}/models/{model}/episodic_memory/

Configuration:

ENABLE_EPISODIC_MEMORY: true
EPISODIC_MEMORY:
  STORE_TYPE: chromadb # or faiss
RAG:
  EMBED_MODEL_ID: # Required for both stores
    NAME: mxbai-embed-large
    TYPE: ollama

ACE Playbook (Agentic Context Engineering)

The ACE Playbook learns strategies from both successes AND failures, implementing the Agentic Context Engineering framework for continuous improvement.

How it works:

Reflector: After each interaction, analyzes tool executions:
- Detects failure patterns (file not found, string not found, permission denied, etc.)
- Identifies successful strategies for specific tools (file_edit, execute_bash)
- Extracts specific, actionable insights (not generic summaries)
- Tracks metrics (success/failure rates, failure types) in metrics.json

Playbook Store: Maintains structured strategy entries:

{
  "context": "editing python files",
  "strategy": "Read the file first to get exact string including whitespace before using str_replace",
  "source": "Failed file_edit on 2026-02-01: string_not_found",
  "outcome": "failure",
  "tools": ["file_edit"],
  "confidence": 0.9
}

Context Injection: Injects relevant strategies into the system prompt at startup:

[Playbook - Learned Strategies]
Avoid these patterns:
  ✗ [editing files]: Read the file first to get exact string before str_replace
Effective strategies:
  ✓ [searching files]: Use glob_search instead of find for better performance

Lazy Refinement: Only deduplicates when hitting token limits, using semantic similarity if embeddings are configured.

What gets stored:

Failures: Specific patterns like string_not_found, file_not_found, permission_denied, command_failed, etc.
Successes: Only for tools with reusable patterns (file_edit, execute_bash with specific commands)
Not stored: Generic successes without actionable strategies

Key Differences from Episodic Memory:

Feature	Episodic Memory	ACE Playbook
Stores	Full task completions	Granular strategies
Learns from	Successes only	Successes AND failures
Format	Conversation context	Structured rules
Retrieval	Semantic similarity	Context + tool matching

Configuration:

ENABLE_PLAYBOOK: true
PLAYBOOK:
  MAX_ENTRIES: 500 # Maximum entries before refinement
  SIMILARITY_THRESHOLD: 0.85 # Threshold for merging similar strategies
  MAX_INJECT: 10 # Maximum entries to inject per query

Storage Location:

Strategies: ~/.mnemoai/{profile}/models/{model}/playbook/playbook.json
Metrics: ~/.mnemoai/{profile}/models/{model}/playbook/metrics.json

Training Data Collection

Supervised Fine-Tuning (SFT)

Use /good to mark high-quality responses
Saved conversations include quality markers
Extract labeled interactions for training

📦 Dependencies

All Python dependencies are listed in requirements.txt. The new productivity tools use only standard library features:

Tool	Python Packages	External Tools
TodoWrite	Standard library only	None
Edit Tool	Standard library only	None
Glob Search	Standard library (`glob`)	None
Grep Search	Standard library (`subprocess`)	ripgrep (optional)
Error Handler	Standard library (`functools`)	None
Git Safety	Standard library (`subprocess`)	git
Plan Mode	Standard library (`json`, `os`)	None
Background Tasks	Standard library (`threading`)	None

External Tools:

ripgrep: Optional but recommended for the grep_search tool. Install via your system package manager (see Installation section). If it is not installed, the assistant automatically falls back to slower alternatives.

Core Python Packages:

langgraph: Agent orchestration framework
langchain, langchain-core: LLM abstraction layer
langchain-ollama: Ollama integration
langchain-aws: AWS Bedrock integration
langchain-openai: OpenAI integration (also used for Bedrock Mantle OpenAI/Responses protocols)
langchain-anthropic: Anthropic integration (Bedrock Mantle anthropic protocol)
aws-bedrock-token-generator: Bearer-token auth for Bedrock Mantle
mcp, mcp[cli]: Model Context Protocol
ollama: Local LLM support
boto3: AWS Bedrock/SageMaker
tiktoken: Token counting
chromadb, faiss-cpu: Vector stores for RAG
PyPDF2, python-docx: Document readers
Pygments: Code syntax highlighting
prompt_toolkit: Interactive CLI
brave-search-python-client: Web search
crawl4ai: Web crawling

🛠️ Development

Testing

The test suite uses pytest and is split into two tiers under tests/:

tests/unit/ — fast, deterministic tests for pure logic (BM25, reasoning helpers, response parsing, subtask parsing, the tool error handler, git-safety command classification, file editing/search, bash timeout handling, and episodic-memory heuristics). No LLM, Ollama, or network required, so they run in seconds and don't need a config.yaml.
tests/integration/ — end-to-end tests that drive the real agent against a live Ollama server and the MCP subprocess (routing, tool calls, bash timeout, no silent empty turns). Marked with @pytest.mark.integration and auto-skipped unless a runtime utils/config.yaml exists and the configured Ollama host is reachable.

# Install test dependencies
pip install -r requirements-dev.txt

# Run everything (integration auto-skips if Ollama/config aren't available)
python -m pytest

# Unit tier only (fast — good for CI and pre-commit)
python -m pytest tests/unit

# Integration tier only (requires Ollama running + a real config.yaml)
python -m pytest -m integration

# Run a single file
python -m pytest tests/unit/test_bm25.py

When adding new code, keep import-time side effects independent of config.yaml so the module stays unit-testable.

Adding New Tools

Create tool file in server/tools/:

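from mcp.server.fastmcp import FastMCP

def register_your_tool(mcp: FastMCP):
    @mcp.tool()
    async def your_tool(param: str) -> str:
        """Tool description for the LLM."""
        # Implementation: replace this placeholder with the tool's real logic
        result = f"Processed: {param}"
        return result

# Then, in the module that creates the FastMCP server instance, register the tool:
from .your_tool import register_your_tool
register_your_tool(mcp)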

Adding New File Readers

Create reader in server/tools/readers/:

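async def read_your_format(path: str) -> str:
    """Read your custom format."""
    # Implementation: replace this plain-text read with format-specific parsing
    with open(path, encoding="utf-8") as handle:
        content = handle.read()
    return content

# Then import the reader where file types are detected:
from .readers.your_reader import read_your_format
# Add to file type detection logic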

Switching Model Providers

The application uses controller classes for centralized model management. To switch providers, just update config.yaml:

For LLM:

MODEL_ID:
  NAME: your-model-name
  TYPE: ollama # or bedrock, mantle, openai, sagemaker, litellm

For Vision:

VISION_MODEL_ID:
  NAME: your-vision-model
  TYPE: ollama # or bedrock, mantle, openai, sagemaker, litellm

For Embeddings:

RAG:
  EMBED_MODEL_ID:
    NAME: mxbai-embed-large
    TYPE: ollama

The controllers (llm_controller.py, vision_model_controller.py, embeddings_controller.py) handle all provider-specific initialization automatically.

Adding New Model Providers

Update the appropriate controller in models/:

def initialize_model(self):
    if self.model_type == "your_provider":
        # Your provider initialization
        self.model = YourProviderModel(...)

Add configuration in config.yaml

🔧 Ollama Utilities (Optional)

The bash/ directory contains helper scripts for Ollama users on macOS and Linux.

Ollama Environment Setup (macOS)

Sets Ollama performance environment variables at boot and launches the Ollama app:

# Variables set: OLLAMA_FLASH_ATTENTION=1, OLLAMA_KV_CACHE_TYPE=q8_0, OLLAMA_NUM_GPU=999

Setup:

Edit bash/ollama-env-mac/ollama.environment.plist (no changes needed for defaults)
Copy to LaunchAgents:

cp bash/ollama-env-mac/ollama.environment.plist ~/Library/LaunchAgents/
launchctl load ~/Library/LaunchAgents/ollama.environment.plist

VRAM Cleaner

Automatically unloads idle Ollama models from VRAM to free GPU memory. Useful when running multiple models or when GPU memory is limited.

macOS (LaunchAgent, runs every 60 seconds):

Edit bash/ollama-freeup-vram/com.ollama.vramcleaner.plist:
- Replace <PATH_TO_FOLDER> with the actual path to this repository
- Replace <PATH_TO_USER_HOME> with your home directory
Install:

cp bash/ollama-freeup-vram/com.ollama.vramcleaner.plist ~/Library/LaunchAgents/
launchctl load ~/Library/LaunchAgents/com.ollama.vramcleaner.plist

Linux (systemd):

Edit bash/ollama-freeup-vram/ollama-vram-cleaner.service:
- Replace <PATH_TO_FOLDER> with the actual path
Install:

sudo cp bash/ollama-freeup-vram/ollama-vram-cleaner.service /etc/systemd/system/
sudo systemctl enable ollama-vram-cleaner
sudo systemctl start ollama-vram-cleaner

See bash/ollama-freeup-vram/README.md and bash/ollama-env-mac/README.md for more details.

🐛 Troubleshooting

Common Issues

MCP Connection Errors

Verify Python path in client.py matches your environment
Check server path is correct
Ensure all dependencies are installed (pip install -r requirements.txt)

Model Loading Issues

Verify model name and type in config.yaml
For Ollama: Ensure Ollama is running (ollama serve) and model is pulled (ollama pull model-name)
For AWS Bedrock: Check credentials (aws sts get-caller-identity), region, and model access
For OpenAI: Ensure OPENAI_API_KEY environment variable is set

RAG / Episodic Memory Not Working

Ensure ENABLE_RAG: true (or ENABLE_EPISODIC_MEMORY: true) in config
Verify embedding model is configured and available (RAG.EMBED_MODEL_ID in config)
For Ollama embeddings: ensure the embedding model is pulled (ollama pull mxbai-embed-large)
Check logs for "fallback embeddings" warnings; they mean the configured embedding model is unreachable
Verify documents are being indexed with list_documents()
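
If the logs are long, a small scan can pull out the fallback warnings mentioned above. This sketch assumes only that the warning text contains the phrase "fallback embeddings"; the exact message format is not documented here, and the helper name is hypothetical.

```python
def find_fallback_warnings(lines, marker="fallback embeddings"):
    """Return (line_number, text) pairs for lines that mention the marker."""
    needle = marker.lower()
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if needle in line.lower()
    ]
```

One way to use it: capture stderr to a file of your choosing (for instance `LOG_LEVEL=DEBUG mnemoai 2> mnemoai.log`), then pass the open file to `find_fallback_warnings`.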

Permission Errors

Ensure write permissions for ~/.mnemoai/ (the app home: config, plans, tasks, per-profile state)
Check file paths in configuration
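
A quick way to test write access is to create and discard a temporary file in the directory. The sketch below does that without leaving anything behind; `can_write` is a hypothetical helper, not part of Mnemo AI.

```python
import tempfile
from pathlib import Path


def can_write(directory):
    """True if the directory exists and a temporary file can be created in it."""
    path = Path(directory).expanduser()
    if not path.is_dir():
        return False
    try:
        # The temporary file is deleted automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError:
        return False
    return True
```

`print(can_write("~/.mnemoai"))` returns False both when the directory is missing and when it is not writable, so check that it exists first if the result is unexpected.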

Import Errors on Startup

Some dependencies (chromadb, faiss-cpu, crawl4ai) can be tricky to install. Check platform-specific instructions.
On Apple Silicon: faiss-cpu may require pip install faiss-cpu --no-cache-dir

Logging

Logs are written to stderr at a configurable level:

LOG_LEVEL=DEBUG mnemoai  # Detailed logs
LOG_LEVEL=INFO mnemoai   # Normal logs (default)
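
For readers extending the code, this is roughly how an environment-driven log level can be wired up with the standard library. It is an illustrative sketch, not Mnemo AI's actual logging setup; the function name and format string are assumptions.

```python
import logging
import os
import sys


def configure_logging(env=os.environ):
    """Send logs to stderr at the level named by LOG_LEVEL (INFO if unset or invalid)."""
    level_name = env.get("LOG_LEVEL", "INFO").upper()
    level = getattr(logging, level_name, None)
    if not isinstance(level, int):
        level = logging.INFO
    logging.basicConfig(
        stream=sys.stderr,
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,  # replace handlers installed by earlier calls
    )
    return level
```

Falling back to INFO on an unrecognized value keeps a typo in `LOG_LEVEL` from silencing output or crashing startup.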

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🤝 Contributing

This is a personal development project. If you'd like to use or extend it, feel free to fork the repository and adapt it to your needs!

If you use this code in your own projects, attribution to the original repository is appreciated but not required.

🙏 Acknowledgments

Built with LangGraph and LangChain
Uses FastMCP for Model Context Protocol
Powered by Ollama, Amazon Bedrock, and Amazon SageMaker AI
