Mnemo AI: a local agentic AI assistant (LangGraph + MCP) that learns and remembers, with multi-provider model support.
Mnemo AI
A local agentic AI assistant with MCP (Model Context Protocol) integration, RAG capabilities, and intelligent conversation management. Built on LangGraph with LangChain for multi-provider LLM support (Ollama, Amazon Bedrock, OpenAI, Amazon SageMaker AI, LiteLLM).
Table of Contents
- Key Features
- Project Structure
- Architecture
- Quick Start
- Feature Toggles
- Usage
- Productivity Tools
- Configuration
- Advanced Features
- Dependencies
- Development
- Ollama Utilities (Optional)
- Troubleshooting
- License
- Contributing
- Acknowledgments
Key Features
- Multi-Model Support: Ollama (local), Amazon Bedrock (including Bedrock Mantle), OpenAI, Amazon SageMaker AI, LiteLLM (100+ providers)
- MCP Tool System: Extensible tool architecture via Model Context Protocol
- RAG (Retrieval-Augmented Generation): Automatic document indexing and semantic search (if enabled)
- Advanced Chat Interface: Multiline input, command system, conversation save/load
- User Profile Learning: Automatic learning from interactions for personalized responses
- Episodic Memory: Learns from successful task completions and retrieves similar solutions
- ACE Playbook: Learns strategies from successes AND failures via Agentic Context Engineering
- Training Data Collection: Mark high-quality responses for SFT training
- Web Search: Integrated Brave Search API (if an API key is configured)
- Web Crawler: Extract and index content from web pages
- Vision Support: Image analysis with vision models (if a vision model is configured)
- File Operations: Read/write/edit with support for text, CSV, JSON, PDF, DOCX
- Precise File Editing: Safe string replacement with validation and uniqueness checking
- Fast Search Tools: Glob pattern matching and ripgrep content search (ripgrep is 10-100x faster than traditional grep)
- Todo Tracking: Multi-step task management with real-time progress updates
- Bash Execution: Direct shell command execution with intelligent error handling
- Git Safety: Protection against dangerous git operations with smart warnings
- Plan Mode: Implementation planning workflow for complex tasks
- Background Tasks: Run long operations in parallel without blocking
Project Structure
mnemoai/                              # repo root
├── pyproject.toml                    # Packaging + `mnemoai` CLI entry point
├── requirements.txt                  # Dependencies
├── README.md                         # This file
├── pytest.ini                        # Pytest configuration
├── requirements-dev.txt              # Dev/test dependencies
│
├── src/mnemoai/                      # The single package (src layout)
│   ├── __init__.py
│   ├── __main__.py                   # `python -m mnemoai`
│   ├── main.py                       # Entry point (cli())
│   │
│   ├── client/                       # Client layer
│   │   ├── client.py                 # LangGraphClient facade (lifecycle, MCP, query)
│   │   ├── mcp_tool_wrapper.py       # MCP to LangChain tool adapter
│   │   ├── agent/                    # Agent loop
│   │   │   ├── agent.py              # LangGraph StateGraph agent with streaming
│   │   │   ├── router.py             # Query classifier and routing
│   │   │   ├── orchestrator.py       # Task decomposition and worker orchestration
│   │   │   └── reasoning_utils.py    # Reasoning/thinking helpers for aux LLM calls
│   │   ├── ui/                       # User interface
│   │   │   ├── chat_interface.py     # Chat loop
│   │   │   └── spinner.py            # Loading animations
│   │   ├── managers/                 # Business logic
│   │   │   ├── agent_conversation_manager.py   # Conversation state and token tracking
│   │   │   └── user_profile_manager.py         # User profiling and learning
│   │   └── memory/                   # Memory systems
│   │       ├── episodic_memory.py    # Episodic memory manager
│   │       ├── reflector.py          # ACE Reflector - extracts strategies
│   │       ├── playbook_store.py     # ACE Playbook - stores learned strategies
│   │       ├── faiss_store.py        # FAISS episodic store
│   │       └── chroma_store.py       # ChromaDB episodic store
│   │
│   ├── server/                       # MCP server layer
│   │   ├── server.py                 # FastMCP server (run as a subprocess)
│   │   ├── error_handler.py          # @tool_error_handler decorator (shared)
│   │   └── tools/                    # Tool implementations
│   │       ├── tools_manager.py      # Tool registration
│   │       ├── fs_read.py / fs_write.py / file_edit.py / file_search.py
│   │       ├── execute_bash.py / git_safety.py / todo_manager.py / plan_mode.py
│   │       ├── background_tasks.py / web_crawler.py / web_search.py
│   │       ├── describe_image.py / rag_tool.py
│   │       ├── rag/                  # RAG system (session, vector_store_controller, stores)
│   │       └── readers/              # File readers (csv/json/pdf/docx/line/dir/search + chunking)
│   │
│   ├── models/                       # Model layer
│   │   ├── provider_params.py        # Single source of truth: per-provider config keys
│   │   ├── mantle_factory.py         # Bedrock Mantle model factory (multi-protocol)
│   │   ├── controllers/              # Provider-dispatching controllers
│   │   │   ├── base_model_controller.py     # Minimal shared base
│   │   │   ├── llm_controller.py            # LLM initialization
│   │   │   ├── vision_model_controller.py   # Vision model initialization
│   │   │   └── embeddings_controller.py     # Embeddings initialization
│   │   └── chat_models/              # Concrete LangChain ChatModel subclasses
│   │       ├── chat_ollama_wrapper.py       # Ollama model with penalty support
│   │       └── sagemaker_chat.py            # SageMaker ChatModel for LangChain
│   │
│   └── utils/                        # Utilities
│       ├── config.py                 # Config loader
│       ├── configurator.py           # First-run setup + /config & /model flows
│       ├── paths.py                  # Central path helper (~/.mnemoai)
│       ├── logger.py                 # Logging utilities
│       ├── bm25.py                   # Lightweight BM25 (hybrid search)
│       ├── config.yaml.example       # Config templates (also .bedrock / .bedrock.mantle)
│       └── formatting/               # Text formatting (code/url/response)
│
├── tests/                            # Test suite (pytest)
│   ├── conftest.py                   # Puts src/ on sys.path
│   ├── unit/                         # Fast, deterministic, no deps
│   └── integration/                  # Live agent + Ollama + MCP
│
├── docs/                             # ARCHITECTURE.md (detailed file map)
└── bash/                             # Helper scripts
    ├── system-command-app/           # `mnemoai` wrapper script
    ├── ollama-freeup-vram/           # VRAM management
    └── ollama-env-mac/               # Ollama config
Architecture
High-Level Overview
                    ┌─────────────────────────┐
                    │         main.py         │
                    │   (Application Entry)   │
                    └────────────┬────────────┘
                                 │
                 ┌───────────────┴───────────────┐
                 │                               │
                 ▼                               ▼
        ┌─────────────────┐             ┌─────────────────┐
        │ LangGraphClient │◄───────────►│   MCP Server    │
        │   (client.py)   │             │   (server.py)   │
        └────────┬────────┘             └────────┬────────┘
                 │                               │
           ┌─────┴──────┐                        │
           ▼            ▼                        ▼
       ┌───────┐  ┌───────────┐            ┌───────────┐
       │  UI   │  │ Managers  │            │   Tools   │
       └───┬───┘  └─────┬─────┘            └─────┬─────┘
           └─────┬──────┘                  ┌─────┴─────┐
                 ▼                         ▼           ▼
          ┌─────────────┐             ┌─────────┐  ┌───────┐
          │  LangGraph  │             │ Readers │  │  RAG  │
          │    Agent    │             └─────────┘  └───────┘
          └─────────────┘
Quick Start
Prerequisites
Required:
- Python 3.11+
- At least one LLM provider configured and accessible (see below)
LLM Providers (choose at least one):
| Provider | Requirements |
|---|---|
| Ollama (local, recommended for getting started) | Install Ollama, then pull a model: ollama pull qwen3:4b |
| Amazon Bedrock | AWS CLI configured (aws configure) with Bedrock access in your region |
| Amazon SageMaker AI | AWS CLI configured with a deployed SageMaker endpoint |
| OpenAI | Set OPENAI_API_KEY environment variable |
| LiteLLM | Depends on the underlying provider (see LiteLLM docs) |
Optional:
- ripgrep: 10-100x faster content search than traditional grep (see installation below)
- Embedding model: required if you enable RAG or Episodic Memory, and for ACE Playbook refinement (see Feature Toggles)
- Vision model: required for image analysis (describe_image tool)
- Brave Search API key: required for web search (obtain one from the Brave Search API)
Installation
Recommended: install from PyPI
The published package is mnemoai-assistant (the import name and the CLI command are both mnemoai). No clone is needed: install it into an isolated environment and get the mnemoai command on your PATH:
uv tool install mnemoai-assistant # or: pipx install mnemoai-assistant
Or into the current environment with pip:
pip install mnemoai-assistant
Then configure a user config (see "Configure the application" below) and run:
mnemoai # verbose (shows thinking)
mnemoai --no-verbose
To upgrade: uv tool upgrade mnemoai-assistant (or pip install -U mnemoai-assistant). To remove: uv tool uninstall mnemoai-assistant.
This is the best choice if you just want to use the assistant. Install from a checkout (below) instead if you plan to edit the source.
Install from a checkout
- Clone the repository:
git clone https://github.com/brunopistone/mnemoai.git
cd mnemoai
- Install the assistant (choose one):
Option 1: install as a CLI command (uv tool install)
This installs the project into its own isolated environment and puts mnemoai on your PATH, so you can run it from any directory (macOS and Linux) without activating anything:
uv tool install . # or: pipx install .
Then configure a user config (see "Configure the application" below) and run:
mnemoai # verbose (shows thinking)
mnemoai --no-verbose
To upgrade after pulling changes, run uv tool install --force . from the repo root. To remove: uv tool uninstall mnemoai.
Pick "run from a checkout" below instead if you plan to actively edit the code, since that runs your working tree directly with no reinstall step.
Option 2: run from a checkout
Set up an environment (choose one of the options below). This lets you run the assistant directly from the repo while editing the source live. Because the code uses a src/ layout, run it as a module with src/ on the path:
PYTHONPATH=src python -m mnemoai # verbose
PYTHONPATH=src python -m mnemoai --no-verbose
(Or pip install -e . once, then just mnemoai.)
Option A: venv
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Option B: uv
uv venv
source .venv/bin/activate
uv pip install -r requirements.txt
Option C: conda
conda create -n mnemoai python=3.11
conda activate mnemoai
pip install -r requirements.txt
Get the mnemoai command for a checkout install
So you don't have to cd into the repo every time, symlink the bundled wrapper script onto your PATH. It activates the project environment, then runs the app (PYTHONPATH=src python -m mnemoai):
chmod +x bash/system-command-app/mnemoai-wrapper.sh
ln -sf "$(pwd)/bash/system-command-app/mnemoai-wrapper.sh" /usr/local/bin/mnemoai
Now mnemoai works from any directory and always reflects your latest edits. The wrapper auto-activates a project-local .venv (Options A and B) if present; otherwise it falls back to a conda env named mnemoai (Option C). Edit the script if your environment differs.
- Install ripgrep (optional but recommended for fast search):
Ripgrep provides 10-100x faster content search than traditional grep and is required by the grep_search tool.
macOS:
brew install ripgrep
Ubuntu/Debian:
sudo apt install ripgrep
Fedora/RHEL:
sudo dnf install ripgrep
Windows (via Chocolatey):
choco install ripgrep
From source:
cargo install ripgrep
Verify installation:
rg --version # Should show ripgrep version
If ripgrep is not installed, the assistant will automatically fall back to using execute_bash with standard grep, but performance will be significantly slower.
- Configure the application:
First-run setup (easiest). If you start the assistant and no config is found, an interactive configurator runs automatically. It walks you through:
  - The LLM provider (Ollama / Bedrock / Mantle / OpenAI / Amazon SageMaker AI / LiteLLM) and the chat model
  - Connection details: Ollama host/port; AWS region; for Mantle, the API protocol (chat_completions / responses / anthropic); SageMaker region and input format; LiteLLM API base/key (OpenAI uses OPENAI_API_KEY)
  - Optional max output tokens (blank or none uses the provider default) and a mandatory max context window (defaults to 65536)
  - The vision model (reusing the chat model's host/region, with its own Mantle protocol and optional max output tokens)
  - Your profile name
  - An optional Brave Search key
  - Each feature toggle (RAG, episodic memory, ACE playbook, web crawler, query routing, orchestration, user profiling)
Every prompt is pre-filled with the template's default, so you can press Enter through the ones you don't care about. It then writes a ready-to-use ~/.mnemoai/config.yaml from the matching template. Just run:
mnemoai # or, from a checkout: PYTHONPATH=src python -m mnemoai
and follow the prompts. You can re-edit the generated file any time to fine-tune models, prompts, and feature toggles.
Manual setup. Prefer to write it yourself? Copy a template (they live inside the package, under src/mnemoai/utils/):
cp src/mnemoai/utils/config.yaml.example src/mnemoai/utils/config.yaml
Edit that config.yaml with your settings. This file is git-ignored to protect your API keys. At minimum, configure your LLM provider.
The config file is resolved in this order (first match wins):
1. $MNEMOAI_CONFIG: explicit path (handy for switching between provider configs)
2. ~/.mnemoai/config.yaml: user config used by the installed mnemoai command
3. <package>/utils/config.yaml: package-relative fallback (used when running from a checkout)
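The lookup order above can be sketched in Python as follows. This is a minimal illustration, not the loader in utils/config.py: the function name resolve_config_path is hypothetical, and treating a set-but-missing $MNEMOAI_CONFIG as "fall through to the next candidate" is an assumption.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_config_path(package_dir: Path) -> Optional[Path]:
    """Return the first existing config file, following the documented order.

    Hypothetical helper; the real loader may handle edge cases differently.
    """
    candidates = []
    explicit = os.environ.get("MNEMOAI_CONFIG")
    if explicit:
        candidates.append(Path(explicit).expanduser())           # 1. explicit path
    candidates.append(Path.home() / ".mnemoai" / "config.yaml")  # 2. user config
    candidates.append(package_dir / "utils" / "config.yaml")     # 3. package fallback
    for path in candidates:
        if path.is_file():
            return path
    return None  # nothing found: this is when the first-run configurator would run
```

Because the user config sits before the package fallback, an installed mnemoai command ignores any config.yaml left inside the package directory once ~/.mnemoai/config.yaml exists.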
If you installed the CLI with uv tool install (the recommended option), put your config in the user location instead:
mkdir -p ~/.mnemoai
cp src/mnemoai/utils/config.yaml.example ~/.mnemoai/config.yaml
# or, for Bedrock / Mantle:
# cp src/mnemoai/utils/config.yaml.bedrock.example ~/.mnemoai/config.yaml
# cp src/mnemoai/utils/config.yaml.bedrock.mantle.example ~/.mnemoai/config.yaml
At minimum, configure your LLM provider:
For Ollama (quickest setup):
# Pull a model first
ollama pull qwen3:4b
# config.yaml (minimal; ~/.mnemoai/config.yaml, or src/mnemoai/utils/config.yaml in a checkout)
MODEL_ID:
  NAME: qwen3:4b
  TYPE: ollama
  HOST: localhost
  PORT: 11434
  TEMPERATURE: 0.6
# Profile name (used for session data isolation)
PROFILE:
  NAME: default
# Everything else can be left at defaults or disabled
ENABLE_RAG: false
ENABLE_EPISODIC_MEMORY: false
ENABLE_PLAYBOOK: false
ENABLE_WEB_SEARCH: false
ENABLE_WEB_CRAWL: false
See Configuration for all options and Feature Toggles for enabling advanced features.
- Run the assistant:
If you installed with uv tool install (recommended), run the command from anywhere:
mnemoai
If you set up a checkout and symlinked the wrapper, the same command works. Otherwise, run it from the repo directory:
PYTHONPATH=src python -m mnemoai
See bash/system-command-app/README.md for details on the wrapper script.
Feature Toggles
All advanced features can be independently enabled or disabled in your config.yaml (~/.mnemoai/config.yaml, or utils/config.yaml in a checkout, both created from config.yaml.example). Here is a quick reference:
| Feature | Config Key | Default | Dependencies |
|---|---|---|---|
| RAG (document indexing & search) | ENABLE_RAG: true | true | Embedding model (RAG.EMBED_MODEL_ID) |
| Episodic Memory (learn from past tasks) | ENABLE_EPISODIC_MEMORY: true | true | Embedding model (RAG.EMBED_MODEL_ID) |
| ACE Playbook (learn strategies from success/failure) | ENABLE_PLAYBOOK: true | true | None (embeddings optional for refinement) |
| User Profiling (personalized responses) | PROFILE.USE_PROFILING: true | true | Activates after 5+ interactions |
| Web Search | ENABLE_WEB_SEARCH: true | true | BRAVE_API_KEY configured |
| Web Crawler | ENABLE_WEB_CRAWL: true | true | None |
| Vision (image analysis) | Configure VISION_MODEL_ID | Disabled if not set | Vision-capable model |
| Verbose Mode (show thinking process) | CLI flag --no-verbose | Enabled | Supported by model |
Dependency note: RAG, Episodic Memory, and ACE Playbook refinement all require a working embedding model. If the embedding model is unavailable, the system falls back to SHA256-based deterministic embeddings with degraded semantic search quality. Configure RAG.EMBED_MODEL_ID in config.yaml to use a real embedding model (see Embeddings Model).
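To see why the fallback degrades search quality, here is a minimal sketch of a SHA256-based deterministic embedding. The function name, the 384-dimension size, and the byte-to-float mapping are assumptions, not the project's implementation. Identical texts always get identical vectors, but texts with similar meaning get unrelated vectors, so similarity search can only reliably recognize identical text.

```python
import hashlib
import math
from typing import List


def sha256_embedding(text: str, dim: int = 384) -> List[float]:
    """Deterministic pseudo-embedding derived from SHA-256 (illustrative sketch)."""
    values: List[float] = []
    counter = 0
    while len(values) < dim:
        # Hash the text with a counter to stretch 32-byte digests into `dim` floats.
        digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
        values.extend((b / 255.0) * 2.0 - 1.0 for b in digest)  # bytes -> [-1, 1]
        counter += 1
    values = values[:dim]
    # Normalize to unit length so cosine similarity is just a dot product.
    norm = math.sqrt(sum(v * v for v in values)) or 1.0
    return [v / norm for v in values]
```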
Usage
Basic Chat
Simply type your questions and press Enter. The assistant will respond using available tools when needed.
You: What files are in the current directory?
Assistant: [Uses fs_read tool to list directory contents]
You: Read the README.md file
Assistant: [Uses fs_read tool and displays content]
Commands
| Command | Description |
|---|---|
| /exit or /quit | Exit the application |
| /clear | Clear conversation history and RAG index |
| /save | Save current conversation |
| /load <path> | Load a saved conversation |
| /good | Mark last response as good (for SFT training) |
| /compact [focus] | Summarize older turns to shrink context (optional focus instructions) |
| /config | Re-run the interactive configurator (overwrites config.yaml, then restarts the app in place to apply) |
| /model | Override just one model (chat LLM, vision, or embeddings), leaving the rest of config.yaml untouched, then restart in place |
| /params | Tune a model's inference parameters (temperature, top_p, top_k, penalties, reasoning, stop, stream, …); only the params the chosen provider supports are offered, then the app restarts in place |
Keyboard Shortcuts
- Ctrl+J: Insert new line in input
- Enter: Submit message
- Ctrl+C: Interrupt operation (press twice to exit)
Verbose Mode
Control thinking process visibility:
mnemoai # Verbose mode (shows thinking)
mnemoai --no-verbose # Hide thinking process
# from a checkout: PYTHONPATH=src python -m mnemoai [--no-verbose]
Component Breakdown
1. Client Layer (client/)
The client manages the conversation flow and user interaction.
- client.py: Core LangGraph client
  - Initializes MCP connection
  - Manages conversation state
  - Handles model configuration
  - Coordinates managers (profile, conversation)
- agent.py: LangGraph agent implementation
  - State graph with agent and tools nodes
  - Streaming support with reasoning display
  - Code syntax highlighting
- router.py: Query classifier and routing
  - Classifies queries into categories (simple_qa, code, research, knowledge, full)
  - Routes each category to a specialized tool subset
  - Configurable classifier prompt via ROUTING_PROMPT in config
- orchestrator.py: Task decomposition and worker orchestration
  - Decomposes complex tasks into ordered subtasks with category assignments
  - Configurable orchestrator and aggregator prompts via config
- reasoning_utils.py: Shared reasoning/thinking helpers
  - Temporarily disables reasoning for auxiliary LLM calls (routing, task decomposition) so output lands in the response content
  - Extracts visible text from <think> tags and Bedrock thinking blocks
- mcp_tool_wrapper.py: MCP to LangChain adapter
  - Wraps MCP tools as LangChain BaseTool
  - Handles async/sync conversion
- ui/: User interface components
  - chat_interface.py: Interactive chat loop with command handling
  - spinner.py: Loading animations
- managers/: Business logic
  - agent_conversation_manager.py: Conversation state and token tracking
  - user_profile_manager.py: Automatic user profiling and learning
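The <think>-tag handling mentioned for reasoning_utils.py can be illustrated with a small regex sketch. The function name split_think is hypothetical, and this covers only inline tags; the Bedrock thinking blocks the project also handles are structured content and are not modeled here.

```python
import re
from typing import Tuple

# Matches a complete <think>...</think> block, including across newlines.
_THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_think(text: str) -> Tuple[str, str]:
    """Return (visible_text, reasoning) for a model reply with inline <think> tags."""
    reasoning = "\n".join(m.strip() for m in _THINK_BLOCK.findall(text))
    visible = _THINK_BLOCK.sub("", text).strip()
    return visible, reasoning
```

For a routing call, the visible part is what would be parsed as the category label, which is why reasoning output must not leak into it.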
2. Server Layer (server/)
MCP server that provides tools to the LLM.
- server.py: FastMCP server initialization
- error_handler.py: @tool_error_handler decorator (shared by all tools)
- tools/: Tool implementations
  - tools_manager.py: Centralized tool registration and utilities
  - fs_read.py: File reading (text, CSV, JSON, PDF, DOCX)
  - fs_write.py: File writing with mandatory user confirmation (dry-run preview)
  - file_edit.py: Precise string replacement with validation and uniqueness checking
  - execute_bash.py: Shell command execution with intelligent error handling
  - file_search.py: Fast file/content search (glob patterns + ripgrep)
  - todo_manager.py: Todo list management for multi-step tasks
  - web_search.py: Brave Search integration
  - web_crawler.py: Web page content extraction with RAG integration
  - describe_image.py: Vision model image analysis
  - rag_tool.py: RAG tools registration
  - rag/: RAG system
    - session.py: Session-scoped RAG management with hybrid search
    - vector_store_controller.py: Vector store abstraction layer
    - faiss_store.py: FAISS vector store implementation
    - chroma_store.py: ChromaDB vector store implementation
  - readers/: Specialized file readers
    - line_reader.py, directory_reader.py, search_reader.py
    - csv_reader.py, json_reader.py
    - pdf_reader.py, docx_reader.py
    - chunking_helper.py: Document chunking for RAG
3. Models Layer (models/)
Model controllers and custom implementations.
- provider_params.py: Single source of truth for the config keys each provider consumes (per modality); controllers build their client kwargs from it via build_kwargs, and /model uses it to prune unsupported keys
- mantle_factory.py: Bedrock Mantle factory (chat_completions / responses / anthropic protocols), shared by the LLM and vision controllers
- controllers/ (provider-dispatching model initialization):
  - base_model_controller.py: Minimal shared base type for the controllers
  - llm_controller.py: LLM model initialization (Bedrock, Mantle, Ollama, OpenAI, SageMaker AI, LiteLLM)
  - vision_model_controller.py: Vision model initialization
  - embeddings_controller.py: Embedding model initialization for RAG
- chat_models/ (concrete LangChain ChatModel subclasses):
  - chat_ollama_wrapper.py: Extends ChatOllama with presence_penalty and frequency_penalty support
  - sagemaker_chat.py: Full LangChain BaseChatModel for SageMaker endpoints (streaming, tool calling, reasoning)
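The "single source of truth" idea behind provider_params.py can be sketched as a key whitelist per provider plus a filter. The key lists and the signature below are illustrative assumptions; the real table is also keyed by modality and its build_kwargs may behave differently.

```python
from typing import Any, Dict, FrozenSet, Mapping

# Hypothetical per-provider key lists (the real table lives in provider_params.py).
PROVIDER_KEYS: Dict[str, FrozenSet[str]] = {
    "ollama": frozenset({"TEMPERATURE", "TOP_P", "TOP_K",
                         "PRESENCE_PENALTY", "FREQUENCY_PENALTY"}),
    "bedrock": frozenset({"TEMPERATURE", "TOP_P", "MAX_TOKENS"}),
}


def build_kwargs(provider: str, section: Mapping[str, Any]) -> Dict[str, Any]:
    """Keep only the keys this provider consumes, lower-cased for client constructors.

    Unset (None) values are skipped, so a parameter is only sent when configured.
    """
    allowed = PROVIDER_KEYS.get(provider, frozenset())
    return {key.lower(): value for key, value in section.items()
            if key in allowed and value is not None}
```

Keeping the whitelist in one place means the controllers and a /model-style switch cannot disagree about which keys a provider accepts.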
4. Utils Layer (utils/)
Shared utilities and configuration.
- config.py: Configuration loader
- configurator.py: First-run interactive setup (when no config resolves) and the /config (full reconfigure) and /model (override one model section) chat commands
- paths.py: Central path helper; single source of truth for the app home (~/.mnemoai, override with $MNEMOAI_HOME) and all runtime subdirectories (config, plans, tasks, per-profile, per-model)
- config.yaml.example: Configuration template (copy to config.yaml and add your settings; .bedrock and .bedrock.mantle variants also provided)
- bm25.py: Lightweight BM25 implementation for hybrid (semantic + keyword) search
- logger.py: Logging utilities (stderr output)
- formatting/: Text formatting
  - code_formatter.py: Code syntax highlighting
  - url_formatter.py: URL highlighting
  - response_parser.py: Response processing
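For readers unfamiliar with the keyword half of hybrid search, here is a sketch of standard Okapi BM25 scoring with commonly used defaults (k1=1.5, b=0.75). The function name, parameters, and the non-negative IDF variant are assumptions; how bm25.py tokenizes text and how session.py blends these scores with vector similarity are not documented here.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], docs: List[List[str]],
                k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of each tokenized document for a tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n if n else 0.0
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # IDF with "+1" inside the log keeps it non-negative for common terms.
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation (k1) and document-length normalization (b).
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores
```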
Data Flow
- User Input → ChatInterface → LangGraphClient
- Client → Invokes LangGraph agent with MCP tools
- Classifier → Routes query to a category (simple_qa, code, research, knowledge, full) (if routing enabled)
- Orchestrator → For full tasks: decomposes into subtasks, spawns workers, aggregates results (if orchestration enabled)
- LangGraph → Executes agent node with route-specific tools, decides to use tools
- MCP Server → Executes tool (e.g., fs_read, web_search, RAG)
- Tool Result → Returned to agent via tools node
- LangGraph → Continues agent loop until response complete
- Response → Displayed to user via ChatInterface
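The classifier step can be pictured as "label the query, then bind only that label's tools". In this sketch classify() stands in for the LLM call driven by ROUTING_PROMPT; the category-to-tool mapping is a hypothetical example (rag_search is a placeholder name), and falling back to the full tool set for full or unexpected labels is an assumption.

```python
from typing import Callable, Dict, List

# Hypothetical subsets; the real mapping lives in router.py.
TOOL_SUBSETS: Dict[str, List[str]] = {
    "simple_qa": [],
    "code": ["fs_read", "file_edit", "glob_search", "grep_search", "execute_bash"],
    "research": ["web_search", "fs_read"],
    "knowledge": ["rag_search", "fs_read"],  # "rag_search" is a placeholder name
}


def route(query: str, classify: Callable[[str], str], all_tools: List[str]) -> List[str]:
    """Return the tools to bind for this query."""
    label = classify(query).strip().lower()  # classify() stands in for the LLM call
    if label not in TOOL_SUBSETS:            # "full" or any unexpected label
        return list(all_tools)
    return [tool for tool in TOOL_SUBSETS[label] if tool in all_tools]
```

Binding fewer tools shrinks the prompt and reduces wrong tool picks, which is the usual motivation for routing before the agent loop.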
Session Management
Each chat session has a unique ID used for:
- RAG document indexing (session-scoped)
- Chunk caching for file summarization
- Training data collection (SFT markers)
Session data is stored in ~/.mnemoai/{profile_name}/:
~/.mnemoai/
└── {profile_name}/
    ├── conversations/            # Saved conversations
    ├── profiles/                 # User profiles
    ├── todos/                    # Todo list data
    ├── rag_session_id.txt        # Current RAG session
    ├── rag_store_*.faiss         # FAISS vector index (or ChromaDB directory)
    ├── chunk_cache_*.db          # SQLite chunk cache
    └── models/                   # Per-model memory (isolated by chat model)
        └── {sanitized_model}/    # e.g. global.anthropic.claude-fable-5
            ├── episodic_memory/  # Episodic memory store (FAISS or ChromaDB)
            └── playbook/         # ACE playbook strategies and metrics
Model-scoped memory: episodic memory and the playbook live under models/{model}/, so trying a different chat model doesn't contaminate the memory and strategies learned with another. Conversations, todos, RAG, and the user profile remain shared across models.
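The layout above can be expressed as a few path helpers. This sketch assumes the $MNEMOAI_HOME override described for paths.py; the function names and the sanitize rule (replace anything other than letters, digits, dots, underscores, and hyphens with an underscore) are hypothetical.

```python
import os
import re
from pathlib import Path


def app_home() -> Path:
    """App home: $MNEMOAI_HOME if set, otherwise ~/.mnemoai."""
    return Path(os.environ.get("MNEMOAI_HOME", "~/.mnemoai")).expanduser()


def sanitize_model(model_id: str) -> str:
    """Hypothetical rule: make a model id safe to use as a directory name."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", model_id)


def model_memory_dir(profile: str, model_id: str) -> Path:
    # Episodic memory and the playbook are scoped per chat model.
    return app_home() / profile / "models" / sanitize_model(model_id)


def shared_dir(profile: str, name: str) -> Path:
    # Conversations, todos, RAG data, and profiles are shared across models.
    return app_home() / profile / name
```

Under this assumed rule, an Ollama id such as qwen3:4b becomes qwen3_4b, while a Bedrock id made only of letters, digits, dots, and hyphens is left unchanged.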
Context Compaction
To keep long conversations within the model's context window, the assistant compacts history by summarizing it:
- Automatic: after a turn pushes the conversation past MAX_CONVERSATION_TOKENS, older messages are summarized into the system prompt while the most recent LLM.KEEP_RECENT_MESSAGES messages are kept verbatim.
- Manual: run /compact at any time (optionally /compact <focus instructions> to steer what the summary emphasizes). Manual compaction keeps a smaller recent window (LLM.MANUAL_COMPACT_KEEP_RECENT).
The kept-verbatim window is bounded by both a message count and a token budget (LLM.KEEP_RECENT_TOKEN_BUDGET, default 25% of MAX_CONVERSATION_TOKENS). Walking from newest to oldest, a message that would exceed the budget is summarized instead of kept, so a single oversized recent message (e.g. a pasted document that alone fills the context window) cannot survive compaction verbatim.
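The window selection just described can be sketched as a single newest-to-oldest pass. The function name and signature are hypothetical, and so is one detail: here the first message that would overflow the budget ends the kept window, so everything older is summarized too and the kept messages stay a contiguous suffix of the conversation.

```python
from typing import Callable, List, Tuple


def split_for_compaction(messages: List[str], keep_recent: int, token_budget: int,
                         count_tokens: Callable[[str], int]) -> Tuple[List[str], List[str]]:
    """Return (to_summarize, keep_verbatim), walking from newest to oldest."""
    kept = 0
    used = 0
    for msg in reversed(messages):
        cost = count_tokens(msg)
        # Stop at the message-count limit or at the first message that overflows the budget.
        if kept >= keep_recent or used + cost > token_budget:
            break
        kept += 1
        used += cost
    cut = len(messages) - kept
    return messages[:cut], messages[cut:]
```

With the documented default, token_budget would be max_conversation_tokens // 4. If the newest message alone exceeds the budget, nothing is kept verbatim and it goes into the summary.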
The summary preserves topics, decisions, and tool calls/results (which tools ran, their inputs, and outcomes), so the agent retains actionable context after compacting.
Productivity Tools
The assistant includes specialized tools for efficient code and file manipulation:
Todo List Management
Track multi-step tasks with automatic status management:
Tools:
- todo_write(todos): Update the todo list
- todo_read(): View current todos
- todo_clear(): Clear all todos
Features:
- Three states: pending, in_progress, completed
- Enforces exactly ONE task in progress at a time
- Real-time progress tracking
- Stored in ~/.mnemoai/{profile}/todos/current_todos.json
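The single-in-progress rule can be enforced with a small validation step before a todo list is saved. This sketch assumes each item is a dict with "content" and "status" fields (hypothetical field names), and it allows zero in-progress items, for example when every task is completed.

```python
from typing import Dict, List

VALID_STATES = {"pending", "in_progress", "completed"}


def validate_todos(todos: List[Dict[str, str]]) -> None:
    """Raise ValueError for an unknown status or more than one in_progress item."""
    for item in todos:
        if item.get("status") not in VALID_STATES:
            raise ValueError(f"invalid status: {item.get('status')!r}")
    active = [t for t in todos if t["status"] == "in_progress"]
    if len(active) > 1:
        raise ValueError(f"{len(active)} tasks in progress; only one is allowed")
```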
Usage Example:
You: Implement user authentication
Assistant: [Creates todos for: database setup, API endpoints, frontend integration, testing]
Assistant: [Marks first todo as in_progress]
Assistant: [Completes each step, updating todos in real-time]
Fast Search Tools
High-performance file and content searching:
Glob Search (File Names)
Find files by name patterns:
glob_search(pattern="**/*.py") # All Python files recursively
glob_search(pattern="src/**/*.ts", max_results=100) # TypeScript in src/
glob_search(pattern="test_*.py", sort_by_mtime=False) # Unsorted for speed
Parameters:
- pattern: Glob pattern (e.g., **/*.py, *.{yaml,json})
- path: Directory to search (default: current directory)
- max_results: Limit results (default: 1000, use 0 for unlimited)
- sort_by_mtime: Sort by modification time (default: True)
Performance: Best for project/codebase searches. For system-wide searches (e.g. the entire home directory), the assistant automatically uses the find command instead.
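As an illustration of how these parameters interact, here is a pathlib-based sketch. It is not the tool's implementation: sorting newest-first is an assumption, and pathlib's glob does not expand braces, so a pattern like *.{yaml,json} would need to be split into two calls here.

```python
from pathlib import Path
from typing import List


def glob_search(pattern: str, path: str = ".", max_results: int = 1000,
                sort_by_mtime: bool = True) -> List[str]:
    """Sketch of the documented parameters using pathlib."""
    matches = [p for p in Path(path).glob(pattern) if p.is_file()]
    if sort_by_mtime:
        # Most recently modified files first.
        matches.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    if max_results > 0:  # 0 means unlimited, as documented
        matches = matches[:max_results]
    return [str(p) for p in matches]
```

Skipping the mtime sort avoids one stat() call per match, which is why sort_by_mtime=False is the faster option on large trees.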
Grep Search (File Content)
Search within file contents using ripgrep:
grep_search(pattern="class Foo") # Find class definitions
grep_search(pattern="TODO|FIXME", file_pattern="*.py", case_insensitive=True)
grep_search(pattern="import React", output_mode="content") # Show matched lines
Parameters:
- pattern: Regex pattern to search for
- path: Directory to search (default: current directory)
- file_pattern: Filter by file type (e.g., *.py, *.{ts,tsx})
- case_insensitive: Case-insensitive search (default: False)
- output_mode: files_with_matches (default), content, or count
- context_lines: Lines of context around matches
- max_results: Maximum matches per file (default: 100)
Requirements: ripgrep must be installed (see Installation section).
Performance: 10-100x faster than traditional grep for large codebases.
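One way to picture grep_search is as a translation of its parameters into a ripgrep command line. The function below is a hypothetical sketch (the tool's real argument handling may differ), but every flag it emits is a standard ripgrep option: --max-count, --ignore-case, --glob, --files-with-matches, --count, --line-number, and --context.

```python
from typing import List, Optional


def build_rg_command(pattern: str, path: str = ".", file_pattern: Optional[str] = None,
                     case_insensitive: bool = False, output_mode: str = "files_with_matches",
                     context_lines: int = 0, max_results: int = 100) -> List[str]:
    """Translate grep_search-style parameters into a ripgrep argv list."""
    cmd = ["rg", "--max-count", str(max_results)]  # per-file match limit
    if case_insensitive:
        cmd.append("--ignore-case")
    if file_pattern:
        cmd += ["--glob", file_pattern]
    if output_mode == "files_with_matches":
        cmd.append("--files-with-matches")
    elif output_mode == "count":
        cmd.append("--count")
    elif output_mode == "content":
        cmd.append("--line-number")
        if context_lines:
            cmd += ["--context", str(context_lines)]
    else:
        raise ValueError(f"unknown output_mode: {output_mode}")
    cmd += ["--", pattern, path]  # "--" so a pattern starting with '-' isn't read as a flag
    return cmd
```

The list can be passed to subprocess.run(cmd, capture_output=True, text=True); note that ripgrep exits with status 1 when nothing matches, which is not an error.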
Precise File Editing
Safe string replacement with validation:
file_edit(
    file_path="/path/to/file.py",
    old_string="def old_function():\n    pass",
    new_string="def new_function():\n    return True",
    replace_all=False  # Requires uniqueness (default)
)
Safety Features:
- Validates file exists before editing
- Checks that old_string exists in file
- Enforces uniqueness (prevents accidental multiple replacements)
- Provides detailed error messages with troubleshooting steps
- Returns line count changes
Best Practice Workflow:
- Read the file first with fs_read
- Copy the EXACT text you want to replace (including whitespace)
- Create the new version with your changes
- Call file_edit with exact strings
Error Handling: If the string isn't unique, the tool provides the line numbers where it appears so you can add more context.
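The safety checks listed above can be sketched in plain Python. This is an illustrative reimplementation, not the tool's code; the return dict keys (replacements, line_delta) and the error wording are assumptions.

```python
from pathlib import Path
from typing import Dict, List


def file_edit(file_path: str, old_string: str, new_string: str,
              replace_all: bool = False) -> Dict[str, int]:
    """Uniqueness-checked string replacement (illustrative sketch)."""
    path = Path(file_path)
    if not path.is_file():
        raise FileNotFoundError(f"File not found: {file_path}")
    if not old_string:
        raise ValueError("old_string must not be empty")
    text = path.read_text(encoding="utf-8")
    count = text.count(old_string)
    if count == 0:
        raise ValueError("old_string not found; copy it exactly, including whitespace")
    if count > 1 and not replace_all:
        # Report the 1-based line of each (non-overlapping) occurrence.
        lines: List[int] = []
        start = text.find(old_string)
        while start != -1:
            lines.append(text.count("\n", 0, start) + 1)
            start = text.find(old_string, start + len(old_string))
        raise ValueError(f"old_string appears {count} times (lines {lines}); add more context")
    new_text = text.replace(old_string, new_string, -1 if replace_all else 1)
    path.write_text(new_text, encoding="utf-8")
    return {"replacements": count if replace_all else 1,
            "line_delta": new_text.count("\n") - text.count("\n")}
```

Refusing ambiguous matches by default pushes the caller to include enough surrounding context, which is cheaper than undoing an edit applied in the wrong place.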
Enhanced Error Handling
All tools now provide intelligent error messages with troubleshooting guidance:
Example Error Response:
{
  "error": true,
  "error_type": "FileNotFoundError",
  "message": "File or directory not found: /path/to/file.txt",
  "next_steps": [
    "Verify the file path is correct",
    "Use glob_search to find files by pattern",
    "Check with execute_bash('ls -la /parent/dir')",
    "Ensure you have read permissions"
  ],
  "original_error": "..."
}
Handled Error Types:
- FileNotFoundError
- PermissionError
- IsADirectoryError
- JSONDecodeError
- Encoding errors
- Command execution errors
- Timeout errors
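A decorator is a natural way to produce responses in this shape for every tool. The sketch below shares its name with the project's @tool_error_handler but is a simplified, synchronous stand-in: the hint table is hypothetical and much shorter than the real one, and MCP tools may be async, which this version does not handle.

```python
import functools
from typing import Any, Callable, Dict, List

# Hypothetical hints per exception type (checked in order with isinstance).
NEXT_STEPS: Dict[type, List[str]] = {
    FileNotFoundError: ["Verify the file path is correct",
                        "Use glob_search to find files by pattern"],
    PermissionError: ["Ensure you have read permissions"],
}


def tool_error_handler(func: Callable[..., Any]) -> Callable[..., Any]:
    """Return a structured error dict instead of letting the tool call raise."""
    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        try:
            return func(*args, **kwargs)
        except Exception as exc:  # tools report errors to the LLM rather than crash
            steps = next((hints for exc_type, hints in NEXT_STEPS.items()
                          if isinstance(exc, exc_type)), ["Inspect the error and retry"])
            return {"error": True, "error_type": type(exc).__name__,
                    "message": str(exc), "next_steps": steps,
                    "original_error": repr(exc)}
    return wrapper
```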
File Write Confirmation
fs_write now requires mandatory user confirmation:
Two-Step Process (preview, then execute, with explicit user approval in between):
- Preview (dry_run=True): Shows what will happen
- Confirm: User explicitly approves
- Execute (confirmed=True): Actually performs the operation
This prevents accidental file overwrites and gives users control over file system modifications.
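The preview/execute split can be sketched as a function that refuses to touch the disk unless it is explicitly confirmed. The parameter names dry_run and confirmed come from the description above; the return fields and the rule that an unconfirmed call always previews are assumptions.

```python
from pathlib import Path
from typing import Dict


def fs_write(path: str, content: str, dry_run: bool = False,
             confirmed: bool = False) -> Dict[str, object]:
    """Preview-then-confirm write (illustrative sketch)."""
    target = Path(path)
    action = "overwrite" if target.exists() else "create"
    if dry_run or not confirmed:
        # Preview step: describe the change and write nothing.
        return {"preview": True, "action": action, "path": str(target),
                "bytes": len(content.encode("utf-8"))}
    # Execute step: only reached after the user approved the preview.
    target.write_text(content, encoding="utf-8")
    return {"preview": False, "action": action, "path": str(target)}
```

Reporting "overwrite" versus "create" in the preview is what lets the user notice that an existing file is about to be replaced.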
Git Safety
Safe git operations with protection against common mistakes:
Tools:
- git_safe(command="...") - Execute git commands with safety checks
- git_status_safe() - Comprehensive status with warnings
- git_commit_safe(message="...", add_all=True) - Safe commits with staging
Protected Operations:
| Operation | Protection |
|---|---|
| Force push to main/master | Blocked |
| git reset --hard | Warning + confirmation required |
| git push --force | Warning (use --force-with-lease) |
| git commit --amend | Checks if already pushed |
| Skip hooks (--no-verify) | Warning |
| Force delete branch (-D) | Warning |
Example:
# Safe - uses git_safe with protections
git_safe(command="push origin feature-branch")
# Dangerous - requires confirmation
git_safe(command="reset --hard HEAD~1", allow_dangerous=True, reason="Discarding failed experiment")
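The protection table can be read as a classifier over the command string. The sketch below mirrors the table's rows but is hypothetical: it inspects only the arguments, so it would miss a refspec such as HEAD:main, and the real git_commit_safe/git_safe check whether an amended commit was already pushed, which this sketch reduces to a warning.

```python
import shlex
from typing import Tuple

PROTECTED_BRANCHES = {"main", "master"}


def check_git_command(command: str) -> Tuple[str, str]:
    """Classify a git command (without the leading 'git') as allow/warn/confirm/block."""
    args = shlex.split(command)
    if not args:
        return "block", "empty command"
    sub, rest = args[0], args[1:]
    if sub == "push" and ("--force" in rest or "-f" in rest):
        if PROTECTED_BRANCHES & set(rest):
            return "block", "force push to a protected branch"
        return "warn", "prefer --force-with-lease"
    if sub == "reset" and "--hard" in rest:
        return "confirm", "discards uncommitted changes"
    if "--no-verify" in rest:
        return "warn", "skips git hooks"
    if sub == "branch" and "-D" in rest:
        return "warn", "force-deletes a branch even if unmerged"
    if sub == "commit" and "--amend" in rest:
        return "warn", "check whether the commit was already pushed"
    return "allow", ""
```

Because arguments are compared as whole tokens, --force-with-lease is not mistaken for --force and passes through as allowed.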
Plan Mode
Implementation planning workflow for complex tasks:
Workflow:
- enter_plan_mode(task_description="Add user authentication")
- Explore codebase with search tools
- add_plan_step(step_number=1, title="Create user model", description="...")
- add_plan_file(file_path="models/user.py", action="create")
- add_plan_risk(risk="Migration needed", mitigation="Add migration script")
- present_plan() - Show user for approval
- approve_plan() + exit_plan_mode() - Start implementing
When to Use:
- New feature with multiple files
- Architectural decisions needed
- Multi-step refactoring
- Unclear requirements
Plan Storage: ~/.mnemoai/plans/current_plan.json
Task Output: ~/.mnemoai/tasks/
Background Tasks
Run long operations in parallel without blocking:
Tools:
- start_background_task(command="...", description="...") - Start task
- get_task_status(task_id="...") - Check progress
- get_task_output(task_id="...") - Get output
- list_background_tasks() - See all tasks
- cancel_background_task(task_id="...") - Stop task
- wait_for_task(task_id="...", timeout_seconds=300) - Wait for completion
When to Use:
- Running full test suites
- Building large projects
- Installing dependencies
- Running linters on entire codebase
- Any command > 30 seconds
Example:
# Start tests in background
result = start_background_task(command="pytest", description="Running tests")
# Returns: {"task_id": "abc123", ...}
# Check status later
get_task_status(task_id="abc123")
# Get output when done
get_task_output(task_id="abc123", tail_lines=50)
Task Storage: Output logs saved to ~/.mnemoai/tasks/
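The tool set above maps naturally onto subprocess.Popen with a log file per task. This sketch keeps tasks in an in-memory dict and writes logs to a temporary directory (both assumptions; the real tool stores output under ~/.mnemoai/tasks/), and its status strings are illustrative.

```python
import subprocess
import tempfile
import uuid
from collections import deque
from dataclasses import dataclass
from pathlib import Path
from typing import Dict

LOG_DIR = Path(tempfile.gettempdir()) / "mnemoai-bg-sketch"  # hypothetical location


@dataclass
class Task:
    proc: subprocess.Popen
    log_path: Path
    description: str


TASKS: Dict[str, Task] = {}


def start_background_task(command: str, description: str = "") -> Dict[str, str]:
    """Launch a shell command without waiting; stdout and stderr go to a log file."""
    LOG_DIR.mkdir(parents=True, exist_ok=True)
    task_id = uuid.uuid4().hex[:8]
    log_path = LOG_DIR / f"{task_id}.log"
    with open(log_path, "w") as log:
        # The child keeps its own copy of the file descriptor after we close ours.
        proc = subprocess.Popen(command, shell=True, stdout=log, stderr=subprocess.STDOUT)
    TASKS[task_id] = Task(proc, log_path, description)
    return {"task_id": task_id, "status": "running"}


def get_task_status(task_id: str) -> str:
    code = TASKS[task_id].proc.poll()  # None while the process is still running
    if code is None:
        return "running"
    return "completed" if code == 0 else f"failed ({code})"


def get_task_output(task_id: str, tail_lines: int = 50) -> str:
    """Return the last `tail_lines` lines of the task's log."""
    with open(TASKS[task_id].log_path) as log:
        return "".join(deque(log, maxlen=tail_lines))


def cancel_background_task(task_id: str) -> None:
    TASKS[task_id].proc.terminate()


def wait_for_task(task_id: str, timeout_seconds: float = 300) -> str:
    """Block until the task exits; raises subprocess.TimeoutExpired on timeout."""
    TASKS[task_id].proc.wait(timeout=timeout_seconds)
    return get_task_status(task_id)
```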
Configuration
Model Configuration
The assistant supports multiple model types:
Amazon Bedrock
MODEL_ID:
  NAME: us.amazon.nova-pro-v1:0
  TYPE: bedrock
  REGION: us-east-1
  TEMPERATURE: 0.1
Note: Newer Claude models on Bedrock reject temperature as deprecated. Omit TEMPERATURE for those; it is only sent when explicitly configured.
Using a named AWS profile (Bedrock, SageMaker, Mantle). These providers use the standard boto3 credential chain (default profile / env vars / instance role). To select a specific named profile instead, set AWS_PROFILE via the config ENV: section. Values there are exported as environment variables at startup, and boto3 picks them up automatically. No model-level config key is needed:
ENV:
  AWS_PROFILE: my-bedrock-profile
  # AWS_REGION: us-east-1   # any AWS env var works here too
Using a Bedrock API key (instead of AWS credentials). Bedrock supports short-term API keys (a bedrock-api-key-... value from the console). For standard Bedrock (TYPE: bedrock), set it as AWS_BEARER_TOKEN_BEDROCK; langchain-aws reads it automatically, so no model config is needed:
ENV:
  AWS_BEARER_TOKEN_BEDROCK: bedrock-api-key-XXXXXXXX
(For Mantle, the same key is supplied differently; see the Mantle section below.)
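The ENV: mechanism amounts to copying a config mapping into os.environ before any AWS client is created. A minimal sketch, assuming the config has already been parsed into a dict (e.g. with yaml.safe_load) and that config values override variables already set in the shell; the function name export_env is hypothetical.

```python
import os
from typing import Any, Mapping


def export_env(config: Mapping[str, Any]) -> None:
    """Export every key under the top-level ENV: section as an environment variable."""
    # "or {}" also covers an ENV: key that is present but empty (parsed as None).
    for key, value in (config.get("ENV") or {}).items():
        if value is not None:
            os.environ[str(key)] = str(value)
```

Since boto3 reads AWS_PROFILE when a session is created, the export has to happen at startup, before the model controllers build their clients.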
Amazon Bedrock Mantle
Bedrock Mantle is an OpenAI-compatible API (not the Bedrock Converse API). By default it authenticates with a short-lived bearer token minted from your standard AWS credentials via aws-bedrock-token-generator, so your normal aws configure / SSO setup works โ no extra keys to manage. Use TYPE: mantle and a bare model ID from the Mantle catalog.
```yaml
MODEL_ID:
  NAME: qwen.qwen3-32b   # bare Mantle model id (e.g. anthropic.claude-opus-4-8)
  TYPE: mantle
  REGION: us-east-1
  MAX_TOKENS: 8192
```
Authenticating with a Bedrock API key (no AWS credentials). Instead of minting a token, you can supply a short-term Bedrock API key directly. Mantle reads it from the `BEDROCK_API_KEY` environment variable (set it via the config `ENV:` section) or from a per-model `API_KEY` field. When a key is present it is used as-is; otherwise the app falls back to minting a token from AWS credentials. (Note: standard Bedrock uses `AWS_BEARER_TOKEN_BEDROCK` for the same key, whereas Mantle uses `BEDROCK_API_KEY`.)
```yaml
# Option A: environment variable (applies to all Mantle calls)
ENV:
  BEDROCK_API_KEY: bedrock-api-key-XXXXXXXX

# Option B: per-model key
MODEL_ID:
  NAME: qwen.qwen3-32b
  TYPE: mantle
  REGION: us-east-1
  API_KEY: bedrock-api-key-XXXXXXXX
```
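The resolution order described above, and in the `API_KEY` row of the parameter reference (per-model key, else `BEDROCK_API_KEY`, else a minted token), can be sketched as follows. `resolve_mantle_token` and the injected `mint_token` callable are hypothetical; in the real app minting goes through `aws-bedrock-token-generator`, whose API is deliberately not reproduced here.

```python
import os
from typing import Callable, Mapping


def resolve_mantle_token(model_cfg: Mapping[str, str],
                         mint_token: Callable[[str], str],
                         env: Mapping[str, str] = os.environ) -> str:
    """Pick the bearer token for a Mantle model section."""
    # 1. An explicit per-model API_KEY wins
    if model_cfg.get("API_KEY"):
        return model_cfg["API_KEY"]
    # 2. Otherwise a BEDROCK_API_KEY exported through the ENV: section
    if env.get("BEDROCK_API_KEY"):
        return env["BEDROCK_API_KEY"]
    # 3. Otherwise mint a short-lived token from standard AWS credentials
    return mint_token(model_cfg.get("REGION", "us-east-1"))
```

Injecting `mint_token` keeps the precedence logic testable without AWS credentials.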
API protocols. Mantle serves models under three protocols. Select one with `API_PROTOCOL` (works for both chat and vision):
- `chat_completions` (default): base `/v1`, OpenAI Chat Completions API. Most models (Qwen, Gemma, GPT-OSS, DeepSeek, …).
- `responses`: base `/openai/v1`, OpenAI Responses API. Required by models that only expose Responses, such as `openai.gpt-5.4`.
- `anthropic`: base `/anthropic`, Anthropic Messages API. For Claude models (e.g. `anthropic.claude-haiku-4-5`).
```yaml
# OpenAI Responses model (e.g. GPT-5.4)
MODEL_ID:
  NAME: openai.gpt-5.4
  TYPE: mantle
  REGION: us-west-2        # gpt-5.4 is in us-west-2, not us-east-1
  API_PROTOCOL: responses
  MAX_TOKENS: 8192

# Anthropic Claude model
MODEL_ID:
  NAME: anthropic.claude-haiku-4-5
  TYPE: mantle
  REGION: us-east-1
  API_PROTOCOL: anthropic
  MAX_TOKENS: 8192
```
- `ENDPOINT_URL` is optional; it defaults to `https://bedrock-mantle.<REGION>.api.aws/{v1 | openai/v1 | anthropic}` depending on the protocol.
- The Mantle catalog (Qwen, Mistral, DeepSeek, GLM, Gemma, Claude, GPT-5.4, …) differs from standard Bedrock and varies by account/region.
- `TYPE: mantle` works for both `MODEL_ID` (chat) and `VISION_MODEL_ID` (image description); vision-capable models like `qwen.qwen3-vl-235b-a22b-instruct` are supported.
- Caveats: pick the right `API_PROTOCOL` per model (using the wrong one returns a 400 "does not support the '/v1/…' API" error). `anthropic` requires the `langchain-anthropic` package (in `requirements.txt`). Models like `anthropic.claude-fable-5` also require the account's data-retention mode to be `provider_data_share`; otherwise they report `unavailable`.
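The default endpoint rule above can be expressed as a small lookup. This is a sketch of the documented URL pattern only; `mantle_endpoint` and `_PROTOCOL_PATHS` are hypothetical names, not the app's code.

```python
# Path suffix per API_PROTOCOL, as documented above
_PROTOCOL_PATHS = {
    "chat_completions": "v1",
    "responses": "openai/v1",
    "anthropic": "anthropic",
}


def mantle_endpoint(model_cfg: dict) -> str:
    """Return ENDPOINT_URL if set, else the documented per-protocol default."""
    if model_cfg.get("ENDPOINT_URL"):
        return model_cfg["ENDPOINT_URL"]
    region = model_cfg.get("REGION", "us-east-1")
    protocol = model_cfg.get("API_PROTOCOL", "chat_completions")
    if protocol not in _PROTOCOL_PATHS:
        raise ValueError(f"Unknown API_PROTOCOL: {protocol!r}")
    return f"https://bedrock-mantle.{region}.api.aws/{_PROTOCOL_PATHS[protocol]}"


print(mantle_endpoint({"NAME": "openai.gpt-5.4", "REGION": "us-west-2",
                       "API_PROTOCOL": "responses"}))
```

This also shows why a protocol mismatch yields a 400: the same model id is sent to a different base path.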
For standard Bedrock (Converse API), `ENDPOINT_URL` is also accepted on `MODEL_ID`/`VISION_MODEL_ID` with `TYPE: bedrock` to override the default endpoint.
Ollama (Local)
```yaml
MODEL_ID:
  NAME: qwen3-4b-thinking-2507-q6-k:latest
  TYPE: ollama
  HOST: localhost
  PORT: 11434
  REPETITION_PENALTY: 1.1
  PRESENCE_PENALTY: 1.5
  TEMPERATURE: 0.1
  TOP_P: 0.95
```
OpenAI
```yaml
MODEL_ID:
  NAME: gpt-5-mini-2025-08-07
  TYPE: openai
  STREAM: true
  REASONING_EFFORT: medium
# Requires OPENAI_API_KEY environment variable
```
Amazon SageMaker AI
```yaml
MODEL_ID:
  NAME: your-endpoint-name
  TYPE: sagemaker
  REGION: us-east-1
  REPETITION_PENALTY: 1.1
  PRESENCE_PENALTY: 1.5
  TEMPERATURE: 0.1
  MAX_TOKENS: 4096
```
LiteLLM (100+ Providers)
```yaml
MODEL_ID:
  NAME: openai/your-model-name
  TYPE: litellm
  API_BASE: http://localhost:8000/v1
  API_KEY: your-api-key
  TEMPERATURE: 0.1
  MAX_TOKENS: 4096
```
Vision Model Configuration
For Bedrock:
```yaml
VISION_MODEL_ID:
  NAME: global.anthropic.claude-haiku-4-5-20251001-v1:0
  TYPE: bedrock
  REGION: us-east-1
  TEMPERATURE: 0.3
```
For Ollama:
```yaml
VISION_MODEL_ID:
  NAME: qwen3-vl:2b
  TYPE: ollama
  HOST: localhost
  PORT: 11434
  TEMPERATURE: 0.3
```
For OpenAI:
```yaml
VISION_MODEL_ID:
  NAME: gpt-5-mini-2025-08-07
  TYPE: openai
  STREAM: true
  REASONING_EFFORT: medium
```
For SageMaker AI (endpoint must serve a vision-capable model accepting the OpenAI image format):
```yaml
VISION_MODEL_ID:
  NAME: your-endpoint-name
  TYPE: sagemaker
  REGION: us-east-1
  INPUT_FORMAT: openai_chat
  TEMPERATURE: 0.3
```
For LiteLLM (any of its vision-capable models):
```yaml
VISION_MODEL_ID:
  NAME: openai/gpt-4o             # provider-prefixed model id
  TYPE: litellm
  API_BASE: http://localhost:4000 # optional (proxy / self-hosted)
  API_KEY: your-api-key           # optional (else the provider's env var)
```
Model Parameters
This is the full reference for what you can put under MODEL_ID,
VISION_MODEL_ID, and RAG.EMBED_MODEL_ID. Only NAME and TYPE are
required; everything else is optional and omitted keys fall back to the
provider/model default. The interactive configurator (/config, /model)
sets the common ones; use this reference to hand-tune config.yaml for
anything else a provider or model supports.
Identity, connection & auth
| Parameter | Applies to TYPE | Description |
|---|---|---|
| `NAME` | all (required) | Model id / Ollama model / Bedrock model id / Mantle bare id / SageMaker endpoint name |
| `TYPE` | all (required) | `ollama`, `bedrock`, `mantle`, `openai`, `sagemaker`, `litellm` (embeddings: `ollama`, `bedrock`, `openai`, `sagemaker`) |
| `HOST` | ollama | Ollama host (default `localhost`) |
| `PORT` | ollama | Ollama port (default `11434`) |
| `REGION` | bedrock, mantle, sagemaker | AWS region (default `us-east-1`) |
| `API_PROTOCOL` | mantle | `chat_completions` (default), `responses`, or `anthropic` |
| `ENDPOINT_URL` | bedrock, mantle | Override the default endpoint URL |
| `API_KEY` | mantle, litellm | Mantle: Bedrock API key (else `BEDROCK_API_KEY` env / minted token). LiteLLM: provider key |
| `API_BASE` | litellm | LiteLLM API base URL |
| `INPUT_FORMAT` | sagemaker | `openai_chat` (default) or `huggingface` |
Standard Bedrock also reads the `AWS_BEARER_TOKEN_BEDROCK` env var, and all AWS providers honor `AWS_PROFILE`; see the API-key/profile notes under Amazon Bedrock.
Inference parameters
Optional generation settings. The Honored by column lists the providers that
actually send each one (others ignore it). These apply to MODEL_ID and
VISION_MODEL_ID; EMBED_MODEL_ID takes none of them (embeddings only use
NAME/TYPE + connection).
This table is derived from models/provider_params.py (the single source of
truth that the controllers build their client kwargs from), so it reflects
exactly what each provider's init path forwards. (mantle reads
TEMPERATURE/MAX_TOKENS/TOP_P via the Mantle factory.)
| Parameter | Description | Honored by (MODEL_ID) |
|---|---|---|
| `MAX_TOKENS` | Max output tokens to generate | ollama, bedrock, mantle, openai, sagemaker, litellm |
| `TEMPERATURE` | Sampling temperature | ollama, bedrock, mantle, openai, sagemaker, litellm |
| `TOP_P` | Top-p (nucleus) sampling | ollama, bedrock, mantle, openai, sagemaker, litellm |
| `TOP_K` | Top-k sampling | ollama, sagemaker |
| `STOP` | Stop sequences (YAML list) | ollama, bedrock, sagemaker, litellm |
| `STREAM` | Stream tokens (default `true`) | mantle, openai, litellm |
| `PRESENCE_PENALTY` | Presence penalty | ollama, openai |
| `FREQUENCY_PENALTY` | Frequency penalty | ollama |
| `REPETITION_PENALTY` | Repetition penalty | ollama, litellm |
| `REASONING` | Enable extended thinking (boolean) | bedrock |
| `THINKING_TOKENS` | Thinking token budget (default 2048) | bedrock |
| `REASONING_EFFORT` | `low`/`medium`/`high`/`max` | openai (also maps to Bedrock thinking budget) |
VISION_MODEL_ID supports the same six providers as MODEL_ID. It accepts a
subset of params: MAX_TOKENS/TEMPERATURE/TOP_P across providers, plus
TOP_K/STOP on ollama and sagemaker. Connection keys follow the provider
(host/port, region, Mantle protocol, SageMaker INPUT_FORMAT, LiteLLM
API_BASE/API_KEY).
Provider-appropriate tuning matters. Newer Claude and GPT models reject `TEMPERATURE` outright; `STOP`, penalties, and `TOP_K` are largely Ollama/SageMaker concepts. When `/model` switches a section's provider, it drops the keys the new provider doesn't consume for you; for everything else, edit `config.yaml` to match what your specific provider/model accepts.
The context window is set separately, at the top level (it's not part of a model
section): MAX_CONVERSATION_TOKENS (see General Parameters below).
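The Honored by column can be read as a lookup table. The sketch below shows how dropping unsupported inference keys on a provider switch could work; `HONORED` and `drop_unsupported` are hypothetical names transcribed from the table above, not the contents of `models/provider_params.py`, and connection keys such as `HOST`/`PORT` are left untouched here.

```python
ALL = {"ollama", "bedrock", "mantle", "openai", "sagemaker", "litellm"}

# Inference parameter -> providers that actually send it (from the table above)
HONORED = {
    "MAX_TOKENS": ALL,
    "TEMPERATURE": ALL,
    "TOP_P": ALL,
    "TOP_K": {"ollama", "sagemaker"},
    "STOP": {"ollama", "bedrock", "sagemaker", "litellm"},
    "STREAM": {"mantle", "openai", "litellm"},
    "PRESENCE_PENALTY": {"ollama", "openai"},
    "FREQUENCY_PENALTY": {"ollama"},
    "REPETITION_PENALTY": {"ollama", "litellm"},
    "REASONING": {"bedrock"},
    "THINKING_TOKENS": {"bedrock"},
    "REASONING_EFFORT": {"openai", "bedrock"},  # Bedrock maps it to a thinking budget
}


def drop_unsupported(section: dict, new_type: str) -> dict:
    """Switch a model section to `new_type`, dropping inference keys it ignores."""
    kept = {k: v for k, v in section.items()
            if k not in HONORED or new_type in HONORED[k]}
    kept["TYPE"] = new_type
    return kept
```

For example, switching an Ollama section with `TOP_K` and penalties to Bedrock keeps only `NAME` and `TEMPERATURE`.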
General Parameters
```yaml
# Context window size (passed to model as num_ctx for Ollama)
MAX_CONVERSATION_TOKENS: 65536

# Maximum tokens when reading documents (CSV, JSON, text files)
DOC_MAX_TOKENS: 16384

# Profile configuration
PROFILE:
  NAME: default          # Used for session data isolation (~/.mnemoai/{NAME}/)
  USE_PROFILING: true    # Enable automatic user profiling
```
Embeddings Configuration
Embeddings settings are nested under the RAG section:
```yaml
RAG:
  EMBEDDINGS:
    CACHE_ENABLED: true      # LRU cache for embedding vectors (avoids re-embedding same text)
    CACHE_SIZE: 1000         # Maximum cached embeddings
    FALLBACK_ENABLED: true   # Fall back to SHA256 if embedding model unavailable
    FALLBACK_TYPE: "sha256"  # Fallback type (sha256, random, zeros)
```
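The document does not specify how the SHA256 fallback turns text into a vector. One plausible scheme, sketched here purely as an assumption, expands SHA-256 digests into a fixed-size deterministic pseudo-vector, with an LRU cache in front sized like `CACHE_SIZE`. Such vectors only match identical text and carry no semantic meaning, so retrieval quality drops while the fallback is active. `sha256_embedding`, `CachedEmbedder`, and `DIM` are hypothetical.

```python
import hashlib
from functools import lru_cache

DIM = 64  # hypothetical vector size; a real embedding model's dimension differs


def sha256_embedding(text: str, dim: int = DIM) -> list[float]:
    """Deterministic pseudo-embedding: same text -> same vector, values in [0, 1]."""
    values: list[float] = []
    counter = 0
    while len(values) < dim:
        digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
        values.extend(b / 255.0 for b in digest)  # 32 bytes -> 32 floats
        counter += 1
    return values[:dim]


class CachedEmbedder:
    """LRU cache in front of an embedding function (CACHE_ENABLED / CACHE_SIZE)."""

    def __init__(self, embed_fn, cache_size: int = 1000):
        self.calls = 0  # number of cache misses, i.e. real embedding calls
        self._embed_fn = embed_fn
        self._cached = lru_cache(maxsize=cache_size)(self._compute)

    def _compute(self, text: str) -> tuple[float, ...]:
        self.calls += 1
        return tuple(self._embed_fn(text))  # immutable, safe to share between callers

    def embed(self, text: str) -> list[float]:
        return list(self._cached(text))
```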
LLM Interaction Configuration
```yaml
LLM:
  ENABLE_THINKING: true        # Enable thinking tags (verbose mode)
  RETRY_ENABLED: true          # Retry failed LLM calls
  MAX_RETRIES: 3               # Maximum retry attempts
  RETRY_DELAY: 1.0             # Seconds between retries
  RETRY_BACKOFF: 2.0           # Exponential backoff multiplier
  SUMMARIZATION_THINK: false   # Include thinking in summarization
  TOKEN_COUNTING:
    OLLAMA_APPROXIMATION: 1.3  # Chars-to-tokens multiplier for Ollama
    FALLBACK_MODEL: "gpt-4"    # Tiktoken model for fallback counting
```
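A sketch of how `MAX_RETRIES`, `RETRY_DELAY`, and `RETRY_BACKOFF` typically combine, assuming `MAX_RETRIES` counts retries after the first attempt and the delay is multiplied by the backoff after each failure (1.0 s, 2.0 s, 4.0 s with the defaults). `call_with_retry` is a hypothetical helper, not the app's code; `sleep` is injectable so the schedule can be checked without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(fn: Callable[[], T], max_retries: int = 3, delay: float = 1.0,
                    backoff: float = 2.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn; on failure wait delay, delay*backoff, ... for up to max_retries retries."""
    wait = delay
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            sleep(wait)
            wait *= backoff
    raise AssertionError("unreachable")
```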
System Prompt
The system prompt in config.yaml defines the assistant's behavior. Customize the SYSTEM_PROMPT field to change the assistant's personality, instructions, and tool usage patterns. Key sections in the default prompt:
- `<identity>`: Basic identity and core principles
- `<reasoning_discipline>`: Thinking rules and loop detection
- `<output_format>`: Response formatting requirements
- `<information_sources>`: RAG vs web vs internal knowledge decision tree
- `<file_operations>`: Read/write/edit workflow rules
- `<search_tools>`: Glob and grep usage guidance
- `<git_operations>`: Git safety rules
- `<task_management>`: Todo, plan mode, and background task rules
- `<error_handling>`: Error response guidelines
- `<communication>`: Style and security rules
RAG Configuration
```yaml
ENABLE_RAG: true             # Master toggle for RAG system
RAG:
  MAX_TOKENS: 8192           # Threshold: documents above this are ingested into RAG
  CHUNK_TOKENS: 1024         # Chunk size in tokens (recommended: 512-2048)
  SEARCH:
    SEMANTIC_WEIGHT: 0.5     # Semantic similarity weight (0-1)
    KEYWORD_WEIGHT: 0.5      # BM25 keyword weight (0-1)
  VECTOR_STORE:
    TYPE: chromadb           # Vector store backend: "faiss" or "chromadb"
  EMBEDDINGS:
    CACHE_ENABLED: true
    CACHE_SIZE: 1000
    FALLBACK_ENABLED: true
    FALLBACK_TYPE: "sha256"
```
Requires: An embedding model configured via RAG.EMBED_MODEL_ID (see Embeddings Model).
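A simplified sketch of how `CHUNK_TOKENS` and the 10% overlap mentioned under RAG below interact, operating on an already-tokenized list. The project describes its chunker as recursive (splitting on document structure first); this plain sliding window only illustrates the size/overlap arithmetic, and `chunk_tokens` is a hypothetical name.

```python
def chunk_tokens(tokens: list[str], chunk_tokens: int = 1024,
                 overlap_ratio: float = 0.10) -> list[list[str]]:
    """Split tokens into windows of chunk_tokens, each sharing ~10% with the previous."""
    if chunk_tokens <= 0:
        raise ValueError("chunk_tokens must be positive")
    overlap = int(chunk_tokens * overlap_ratio)  # 1024 -> 102 shared tokens
    step = chunk_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_tokens])
        if start + chunk_tokens >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one chunk, at the cost of slightly more stored vectors.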
Episodic Memory Configuration
```yaml
ENABLE_EPISODIC_MEMORY: true
EPISODIC_MEMORY:
  STORE_TYPE: chromadb           # or faiss

  # Similarity Thresholds
  DUPLICATE_THRESHOLD: 0.95      # Higher = stricter duplicate detection
  RETRIEVAL_THRESHOLD: 0.7       # Minimum similarity to retrieve episodes
  FOLLOW_UP_THRESHOLD: 0.4       # Similarity to detect follow-up questions (skips injection)
  REDUNDANCY_THRESHOLD: 0.5      # Filter episodes redundant with conversation

  # Hybrid Search Weights
  SEMANTIC_WEIGHT: 0.7           # Semantic similarity weight (0-1)
  KEYWORD_WEIGHT: 0.3            # Keyword matching weight (0-1)

  # Token and Size Limits
  MAX_TOKENS_PER_EPISODE: 400    # Max tokens for episode text
  MAX_EPISODES: 1000             # Maximum stored episodes
  MAX_AGE_DAYS: 90               # Maximum episode age in days

  # Success Detection
  SUCCESS_MARKERS:               # Phrases that indicate task success
    - thanks
    - perfect
    - great
    - worked
  CORRECTION_MARKERS:            # Phrases that indicate errors
    - wrong
    - error
    - fix
    - actually

  # Storage Behavior
  IMMEDIATE_STORAGE: true        # Store episodes immediately
  MIN_TOOLS_OR_LENGTH: 300       # Min response length if no tools used

  # Query Enhancement
  ENABLE_QUERY_EXPANSION: true   # Expand queries with synonyms
  QUERY_EXPANSION_TERMS: 3       # Max terms to add per query
```
Requires: An embedding model configured via RAG.EMBED_MODEL_ID (see Embeddings Model).
How it works:
- Automatically stores successful task completions with full conversation context
- Uses hybrid search (70% semantic + 30% BM25) to find similar past tasks
- Conversation-aware injection: Only injects episodic memory when relevant
- Detects follow-up questions and skips injection (uses conversation context instead)
- Filters out episodes redundant with current conversation
- Uses semantic similarity (with embeddings) or Jaccard similarity (fallback)
- Injects compact context showing: task → tools used → outcome
- Automatic cleanup: keeps max 1000 episodes, removes entries older than 90 days
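When no embedding model is available, similarity falls back to Jaccard. The sketch below pairs that fallback with the threshold checks from the config above. How the real code orders these checks is an assumption: here a follow-up question skips injection entirely, and the remaining episodes must clear `RETRIEVAL_THRESHOLD` without being redundant with the conversation. `jaccard` and `select_episodes` are hypothetical names.

```python
from typing import Callable


def jaccard(a: str, b: str) -> float:
    """Word-set Jaccard similarity: |A & B| / |A | B|."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def select_episodes(query: str, conversation: str, episodes: list[str],
                    retrieval: float = 0.7, follow_up: float = 0.4,
                    redundancy: float = 0.5,
                    sim: Callable[[str, str], float] = jaccard) -> list[str]:
    """Return episodes worth injecting for `query`, given the recent conversation."""
    if conversation and sim(query, conversation) >= follow_up:
        return []  # follow-up question: rely on the conversation context instead
    selected = []
    for episode in episodes:
        if sim(query, episode) < retrieval:
            continue  # not similar enough to the current task
        if conversation and sim(episode, conversation) >= redundancy:
            continue  # already covered by the conversation
        selected.append(episode)
    return selected
```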
Success detection:
- User feedback: "thanks", "perfect", "great"
- No error markers in response
- All tools executed successfully
- Filters out simple greetings and short responses
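A sketch of these success-detection rules as a single function, using the `SUCCESS_MARKERS`, `CORRECTION_MARKERS`, and `MIN_TOOLS_OR_LENGTH` values from the config above. The response-side `ERROR_MARKERS` list and the choice to report positive feedback as a separate "confirmed" flag rather than a requirement are assumptions; `detect_success` is hypothetical. Markers are matched as whole words so that, for example, "prefix" does not trigger "fix".

```python
import re

SUCCESS_MARKERS = ["thanks", "perfect", "great", "worked"]
CORRECTION_MARKERS = ["wrong", "error", "fix", "actually"]
ERROR_MARKERS = ["error", "traceback", "failed"]  # hypothetical response-side markers


def _has_marker(text: str, markers: list[str]) -> bool:
    return any(re.search(rf"\b{re.escape(m)}\b", text.lower()) for m in markers)


def detect_success(feedback: str, response: str, tool_ok: list[bool],
                   min_length: int = 300) -> tuple[bool, bool]:
    """Return (is_success, user_confirmed) for one interaction."""
    confirmed = _has_marker(feedback, SUCCESS_MARKERS)
    if _has_marker(feedback, CORRECTION_MARKERS):
        return False, False           # the user is correcting the assistant
    if tool_ok and not all(tool_ok):
        return False, confirmed       # at least one tool call failed
    if _has_marker(response, ERROR_MARKERS):
        return False, confirmed       # the response itself reports an error
    if not tool_ok and len(response) < min_length:
        return False, confirmed       # greeting or short acknowledgment, no tools
    return True, confirmed
```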
Embeddings Model
All embedding configuration is nested under `RAG:`.
For Bedrock:
```yaml
RAG:
  EMBED_MODEL_ID:
    NAME: amazon.titan-embed-text-v2:0
    TYPE: bedrock
    REGION: us-east-1
```
For Ollama:
```yaml
RAG:
  EMBED_MODEL_ID:
    NAME: mxbai-embed-large
    TYPE: ollama
    HOST: localhost
    PORT: 11434
```
For OpenAI:
```yaml
RAG:
  EMBED_MODEL_ID:
    NAME: text-embedding-ada-002
    TYPE: openai
```
For SageMaker:
```yaml
RAG:
  EMBED_MODEL_ID:
    NAME: your-endpoint-name
    TYPE: sagemaker
    REGION: us-east-1
```
For LiteLLM (any of its 100+ providers via one OpenAI-style API):
```yaml
RAG:
  EMBED_MODEL_ID:
    NAME: openai/text-embedding-3-small  # provider-prefixed model id
    TYPE: litellm
    API_BASE: http://localhost:4000      # optional (proxy / self-hosted)
    API_KEY: your-api-key                # optional (else the provider's env var)
```
Vector Store Options:
- ChromaDB (default): Persistent vector database with built-in metadata support
- FAISS: Fast, in-memory vector search with disk persistence
Switch between stores by changing RAG.VECTOR_STORE.TYPE in config. The system uses a controller pattern, so all RAG functionality works identically regardless of the store.
Advanced Features
Query Routing
When enabled, the assistant classifies each query before processing it and routes it to a specialized tool subset. This reduces noise for the model and improves response quality.
Categories:
| Route | Description | Tools Available |
|---|---|---|
| `simple_qa` | Greetings, explanations, general knowledge | None (direct LLM answer) |
| `code` | File ops, code editing, git, shell commands | fs_read, fs_write, file_edit, bash, git, search, etc. |
| `research` | Web search, URL fetching | web_search, web_crawler |
| `knowledge` | Document reading, indexing, RAG queries | pdf/csv/docx/json readers, RAG tools, fs_read |
| `full` | Multi-category or ambiguous tasks | All tools (fallback) |
How it works:
- A lightweight LLM call classifies the query into one of the categories above
- The agent node binds only the tools for that category
- If a query spans multiple categories, it routes to `full` (all tools)
- The classifier prompt is customizable via `ROUTING_PROMPT` in `config.yaml`
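The routing step can be sketched as a label parser plus a route-to-tools table. The tool lists follow the table above where it names tools; the knowledge readers are left as a comment because their tool names are not given here. `ROUTE_TOOLS`, `parse_route`, and `tools_for_route` are hypothetical, and in the app the label comes from an LLM call rather than a string argument.

```python
from typing import Optional

# Route -> allowed tool names; None means "bind every tool" (the `full` fallback)
ROUTE_TOOLS: dict[str, Optional[list[str]]] = {
    "simple_qa": [],
    "code": ["fs_read", "fs_write", "file_edit", "bash", "git", "search"],
    "research": ["web_search", "web_crawler"],
    # plus the PDF/CSV/DOCX/JSON readers, whose tool names are not listed here
    "knowledge": ["fs_read", "list_documents", "search_in_documents", "clear_documents"],
    "full": None,
}


def parse_route(classifier_output: str) -> str:
    """Map the classifier's raw text to a known route; anything unclear -> 'full'."""
    label = classifier_output.strip().lower().strip(".\"'` ")
    return label if label in ROUTE_TOOLS else "full"


def tools_for_route(route: str, all_tools: list[str]) -> list[str]:
    allowed = ROUTE_TOOLS[route]
    if allowed is None:
        return list(all_tools)
    return [t for t in all_tools if t in allowed]
```

Falling back to `full` on any unparseable label means a misbehaving classifier degrades to the old all-tools behavior instead of leaving the agent without tools.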
Configuration:
```yaml
ENABLE_ROUTING: true
ROUTING_PROMPT: |
  # Custom classifier prompt (optional, has a sensible default)
  ...
```
Orchestrator-Workers
When enabled alongside routing, tasks classified as full (spanning multiple categories) are automatically decomposed into focused subtasks executed by specialized workers.
How it works:
- Orchestrator: An LLM call decomposes the complex query into ordered subtasks, each assigned a category (code, research, knowledge, etc.)
- Workers: Each subtask is executed by a worker agent with only the tools for its category. Workers run sequentially โ each receives context from previously completed subtasks.
- Aggregator: If there were multiple subtasks, a final LLM call synthesizes all worker results into a single coherent response.
Example flow for "Read this PDF and write a summary to a file":
```
Orchestrator decomposes into:
  [Step 1/2: Read and summarize the PDF document]  → knowledge worker
  [Step 2/2: Write the summary to summary.md]      → code worker
  [Synthesizing results...]                        → aggregator
```
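The orchestrator-workers control flow can be sketched with plain callables standing in for the LLM calls: workers run sequentially, each receives the results of earlier steps as context, and the aggregator runs only when there is more than one subtask. `Subtask`, `run_orchestrated`, and the context format are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Subtask:
    description: str
    category: str  # e.g. "code", "research", "knowledge"


# (subtask description, context from earlier steps) -> worker result
Worker = Callable[[str, str], str]


def run_orchestrated(subtasks: list[Subtask], workers: dict[str, Worker],
                     aggregate: Callable[[list[str]], str]) -> str:
    results: list[str] = []
    for task in subtasks:
        # Each worker sees what the previous workers produced
        context = "\n".join(f"[Step {j}] {r}" for j, r in enumerate(results, start=1))
        results.append(workers[task.category](task.description, context))
    if len(results) == 1:
        return results[0]  # single subtask: no synthesis call needed
    return aggregate(results)
```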
Configuration:
```yaml
ENABLE_ROUTING: true          # Required
ENABLE_ORCHESTRATION: true    # Activates orchestrator for 'full' route
# ORCHESTRATOR_PROMPT: |      # Optional: customize decomposition prompt
# AGGREGATOR_PROMPT: |        # Optional: customize synthesis prompt
```
When orchestration is disabled, full routes use all tools in a single agent loop (the previous behavior). No regression.
Web Search Configuration
This tool uses the Brave Search API. Obtain an API key from Brave Search Developer Portal.
```yaml
BRAVE_API_KEY: your-api-key-here  # For web search
```
Web Crawler Configuration
Enable web page content extraction with automatic RAG integration:
```yaml
ENABLE_WEB_CRAWL: true
```
When enabled, the web_crawler tool:
- Extracts content from web pages as markdown
- Automatically ingests large pages (>8K tokens) into RAG (if enabled)
- Uses the same chunking configuration as PDF/DOCX readers
Browser dependency. Crawling uses a headless Chromium via Playwright, whose browser binary is a separate ~260MB download not pulled in by `pip`/`uv tool install`. The tool installs it automatically on the first crawl after a fresh install or upgrade. If that auto-install fails (e.g. offline), run it manually in the same environment: `python -m playwright install chromium` (for an installed CLI: `~/.local/share/uv/tools/mnemoai/bin/python -m playwright install chromium`).
RAG (Retrieval-Augmented Generation)
The RAG system automatically indexes documents for semantic search with hybrid search (semantic embeddings + BM25 keyword scoring).
How it works:
- Read a PDF/DOCX file โ Automatically chunked and indexed
- Ask questions โ Assistant searches indexed documents first using hybrid search
- Session-scoped: cleared on `/clear` or exit
RAG Tools:
- `list_documents()`: Show indexed documents
- `search_in_documents(query, top_k)`: Hybrid semantic + BM25 search
- `clear_documents()`: Clear RAG index
Configuration:
- `RAG.CHUNK_TOKENS`: Chunk size (recommended: 512-2048)
- `RAG.VECTOR_STORE.TYPE`: Choose between `faiss` or `chromadb`
- `RAG.SEARCH.SEMANTIC_WEIGHT` / `RAG.SEARCH.KEYWORD_WEIGHT`: Configurable hybrid weights
- Recursive chunking with 10% overlap
- Hybrid search: BM25 (Okapi BM25 with TF-IDF, term saturation, length normalization) + semantic similarity
- Independent candidate retrieval from both BM25 and embeddings, merged and re-ranked
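A self-contained sketch of the hybrid search described above: Okapi BM25 scores (term-frequency saturation via `k1`, length normalization via `b`, the always-positive Lucene-style IDF), max-normalized so they share a 0-1 scale with semantic similarity, then independent top-k candidates from both signals merged and re-ranked by the weighted sum. The `k1=1.5`/`b=0.75` defaults, the normalization, and the function names are assumptions; `semantic` stands in for precomputed embedding similarities.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n if n else 0.0
    df = Counter(term for toks in tokenized for term in set(toks))  # document frequency
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in set(query.lower().split()):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


def hybrid_rank(query: str, docs: list[str], semantic: list[float], top_k: int = 3,
                semantic_weight: float = 0.5, keyword_weight: float = 0.5) -> list[int]:
    """Return doc indices ranked by weighted semantic + normalized BM25 score."""
    kw = bm25_scores(query, docs)
    max_kw = max(kw, default=0.0) or 1.0
    kw_norm = [s / max_kw for s in kw]
    # Independent candidate retrieval from each signal, then merge
    by_kw = sorted(range(len(docs)), key=lambda i: kw_norm[i], reverse=True)[:top_k]
    by_sem = sorted(range(len(docs)), key=lambda i: semantic[i], reverse=True)[:top_k]
    combined = {i: semantic_weight * semantic[i] + keyword_weight * kw_norm[i]
                for i in set(by_kw) | set(by_sem)}
    return sorted(combined, key=combined.get, reverse=True)[:top_k]
```

Shifting weight toward `SEMANTIC_WEIGHT` lets a conceptually related document outrank an exact keyword match.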
Vector Store Options:
- ChromaDB: Persistent vector database with metadata support (default)
- FAISS: Fast in-memory search with disk persistence
The system uses a VectorStoreController for easy switching between stores. All functionality (indexing, searching, clearing) works identically regardless of the chosen store.
User Profile Learning
After 5+ interactions, the assistant builds a profile:
- Cognitive style: Analytical, creative, pragmatic, systematic
- Domain expertise: Python, AWS, DevOps, ML, etc.
- Learning style: Visual, hands-on, theoretical
- Communication patterns: Tone, complexity, question styles
- Code preferences: Testing, documentation, type hints
The profile is automatically injected into the system prompt for personalization.
Episodic Memory
The episodic memory system learns from successful task completions and retrieves similar solutions for future queries.
How it works:
- Automatic Storage: After each successful interaction, stores:
  - Initial user query
  - Full conversation context
  - Tools used with arguments
  - Final solution
  - Timestamp
- Hybrid Search: Retrieves similar episodes using:
  - 70% semantic similarity (task intent)
  - 30% BM25 keyword scoring (tool names, action verbs)
- Context Injection: Before processing queries, injects compact context:
  ```
  [Episodic Memory - Similar Past Tasks]
  1. "read DOCX about ML" → fs_read → success (similarity: 0.85)
  2. "analyze PDF report" → fs_read, web_search → success (similarity: 0.78)
  ```
- Automatic Cleanup: Maintains bounded memory:
  - Max 1000 episodes
  - Removes entries older than 90 days
  - Runs on startup
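The startup cleanup can be sketched as an age filter followed by a size cap. Keeping the newest episodes when over `MAX_EPISODES` is an assumption (the text only gives the bounds), and the episode shape and `prune_episodes` are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def prune_episodes(episodes: list[dict], max_episodes: int = 1000, max_age_days: int = 90,
                   now: Optional[datetime] = None) -> list[dict]:
    """Drop episodes older than max_age_days, then keep at most max_episodes newest."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    fresh = [e for e in episodes if e["timestamp"] >= cutoff]
    fresh.sort(key=lambda e: e["timestamp"], reverse=True)  # newest first
    return fresh[:max_episodes]
```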
Success Detection:
- User feedback: "thanks", "perfect", "great", "worked"
- No error markers in response
- All tools executed successfully
- Filters out greetings and simple acknowledgments (<300 chars, no tools)
Storage Location:
- FAISS: `~/.mnemoai/{profile}/models/{model}/episodic_memory/episodic.index`
- ChromaDB: `~/.mnemoai/{profile}/models/{model}/episodic_memory/`
Configuration:
```yaml
ENABLE_EPISODIC_MEMORY: true
EPISODIC_MEMORY:
  STORE_TYPE: chromadb   # or faiss
RAG:
  EMBED_MODEL_ID:        # Required for both stores
    NAME: mxbai-embed-large
    TYPE: ollama
```
ACE Playbook (Agentic Context Engineering)
The ACE Playbook learns strategies from both successes AND failures, implementing the Agentic Context Engineering framework for continuous improvement.
How it works:
- Reflector: After each interaction, analyzes tool executions:
  - Detects failure patterns (file not found, string not found, permission denied, etc.)
  - Identifies successful strategies for specific tools (file_edit, execute_bash)
  - Extracts specific, actionable insights (not generic summaries)
  - Tracks metrics (success/failure rates, failure types) in `metrics.json`
- Playbook Store: Maintains structured strategy entries:
  ```json
  {
    "context": "editing python files",
    "strategy": "Read the file first to get exact string including whitespace before using str_replace",
    "source": "Failed file_edit on 2026-02-01: string_not_found",
    "outcome": "failure",
    "tools": ["file_edit"],
    "confidence": 0.9
  }
  ```
- Context Injection: Injects relevant strategies into the system prompt at startup:
  ```
  [Playbook - Learned Strategies]
  Avoid these patterns:
  - [editing files]: Read the file first to get exact string before str_replace
  Effective strategies:
  - [searching files]: Use glob_search instead of find for better performance
  ```
- Lazy Refinement: Only deduplicates when hitting token limits, using semantic similarity if embeddings are configured.
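A sketch of lazy refinement over entries shaped like the JSON above. It triggers on entry count (`MAX_ENTRIES`) rather than a token budget, and "merging" is approximated by keeping the higher-confidence entry of any pair whose similarity reaches `SIMILARITY_THRESHOLD`; both simplifications are assumptions. `PlaybookEntry` and `refine` are hypothetical, and `similarity` would be embedding similarity when configured.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PlaybookEntry:
    context: str
    strategy: str
    source: str
    outcome: str                      # "success" or "failure"
    tools: list[str] = field(default_factory=list)
    confidence: float = 0.5


def refine(entries: list[PlaybookEntry], similarity: Callable[[str, str], float],
           max_entries: int = 500, threshold: float = 0.85) -> list[PlaybookEntry]:
    """Lazy refinement: deduplicate only once the playbook exceeds max_entries."""
    if len(entries) <= max_entries:
        return entries  # under the limit: do no work
    kept: list[PlaybookEntry] = []
    for entry in sorted(entries, key=lambda e: e.confidence, reverse=True):
        # Keep an entry only if no higher-confidence entry already says the same thing
        if all(similarity(entry.strategy, k.strategy) < threshold for k in kept):
            kept.append(entry)
    return kept[:max_entries]
```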
What gets stored:
- Failures: Specific patterns like `string_not_found`, `file_not_found`, `permission_denied`, `command_failed`, etc.
- Successes: Only for tools with reusable patterns (file_edit, execute_bash with specific commands)
- Not stored: Generic successes without actionable strategies
Key Differences from Episodic Memory:
| Feature | Episodic Memory | ACE Playbook |
|---|---|---|
| Stores | Full task completions | Granular strategies |
| Learns from | Successes only | Successes AND failures |
| Format | Conversation context | Structured rules |
| Retrieval | Semantic similarity | Context + tool matching |
Configuration:
```yaml
ENABLE_PLAYBOOK: true
PLAYBOOK:
  MAX_ENTRIES: 500             # Maximum entries before refinement
  SIMILARITY_THRESHOLD: 0.85   # Threshold for merging similar strategies
  MAX_INJECT: 10               # Maximum entries to inject per query
```
Storage Location:
- Strategies: `~/.mnemoai/{profile}/models/{model}/playbook/playbook.json`
- Metrics: `~/.mnemoai/{profile}/models/{model}/playbook/metrics.json`
Training Data Collection
Supervised Fine-Tuning (SFT)
- Use `/good` to mark high-quality responses
- Saved conversations include quality markers
- Extract labeled interactions for training
Dependencies
All Python dependencies are listed in requirements.txt. The new productivity tools use only standard library features:
| Tool | Python Packages | External Tools |
|---|---|---|
| TodoWrite | Standard library only | None |
| Edit Tool | Standard library only | None |
| Glob Search | Standard library (glob) | None |
| Grep Search | Standard library (subprocess) | ripgrep (optional) |
| Error Handler | Standard library (functools) | None |
| Git Safety | Standard library (subprocess) | git |
| Plan Mode | Standard library (json, os) | None |
| Background Tasks | Standard library (threading) | None |
External Tools:
- ripgrep: Used by the `grep_search` tool. Install it via your system package manager (see Installation section). If it is not installed, the assistant automatically falls back to slower alternatives.
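A sketch of the ripgrep-or-fallback pattern, assuming the fallback is a pure-Python regex walk over files (the actual fallback is not specified). It uses only real ripgrep flags (`--line-number`, `--no-heading`, `--with-filename`, `--color never`) and treats exit code 1 as "no matches" rather than an error. `grep_search` here is illustrative, not the tool's implementation; note that ripgrep also honors `.gitignore`, which this fallback does not.

```python
import re
import shutil
import subprocess
from pathlib import Path


def grep_search(pattern: str, root: str = ".") -> list[str]:
    """Return 'path:line:text' matches, using ripgrep when it is on PATH."""
    rg = shutil.which("rg")
    if rg:
        proc = subprocess.run(
            [rg, "--line-number", "--no-heading", "--with-filename",
             "--color", "never", pattern, root],
            capture_output=True, text=True)
        if proc.returncode in (0, 1):  # 1 means "no matches", not a failure
            return proc.stdout.splitlines()
    # Slower pure-Python fallback
    regex = re.compile(pattern)
    matches = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            lines = path.read_text(encoding="utf-8").splitlines()
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for lineno, line in enumerate(lines, start=1):
            if regex.search(line):
                matches.append(f"{path}:{lineno}:{line}")
    return matches
```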
Core Python Packages:
- `langgraph`: Agent orchestration framework
- `langchain`, `langchain-core`: LLM abstraction layer
- `langchain-ollama`: Ollama integration
- `langchain-aws`: AWS Bedrock integration
- `langchain-openai`: OpenAI integration (also used for Bedrock Mantle OpenAI/Responses protocols)
- `langchain-anthropic`: Anthropic integration (Bedrock Mantle `anthropic` protocol)
- `aws-bedrock-token-generator`: Bearer-token auth for Bedrock Mantle
- `mcp`, `mcp[cli]`: Model Context Protocol
- `ollama`: Local LLM support
- `boto3`: AWS Bedrock/SageMaker
- `tiktoken`: Token counting
- `chromadb`, `faiss-cpu`: Vector stores for RAG
- `PyPDF2`, `python-docx`: Document readers
- `Pygments`: Code syntax highlighting
- `prompt_toolkit`: Interactive CLI
- `brave-search-python-client`: Web search
- `crawl4ai`: Web crawling
Development
Testing
The test suite uses pytest and is split into two tiers under tests/:
- `tests/unit/`: fast, deterministic tests for pure logic (BM25, reasoning helpers, response parsing, subtask parsing, the tool error handler, git-safety command classification, file editing/search, bash timeout handling, and episodic-memory heuristics). No LLM, Ollama, or network is required, so they run in seconds and don't need a `config.yaml`.
- `tests/integration/`: end-to-end tests that drive the real agent against a live Ollama server and the MCP subprocess (routing, tool calls, bash timeout, no silent empty turns). Marked with `@pytest.mark.integration` and auto-skipped unless a runtime `utils/config.yaml` exists and the configured Ollama host is reachable.
```bash
# Install test dependencies
pip install -r requirements-dev.txt

# Run everything (integration auto-skips if Ollama/config aren't available)
python -m pytest

# Unit tier only (fast, good for CI and pre-commit)
python -m pytest tests/unit

# Integration tier only (requires Ollama running + a real config.yaml)
python -m pytest -m integration

# Run a single file
python -m pytest tests/unit/test_bm25.py
```
When adding new code, keep import-time side effects independent of config.yaml so the module stays unit-testable.
Adding New Tools
- Create a tool file in `server/tools/`:
```python
from mcp.server.fastmcp import FastMCP


def register_your_tool(mcp: FastMCP):
    @mcp.tool()
    async def your_tool(param: str) -> str:
        """Tool description for the LLM."""
        # Implementation: replace with real logic; return a string for the model
        result = f"Processed: {param}"
        return result
```
- Register it in `tools_manager.py`:
```python
from .your_tool import register_your_tool

register_your_tool(mcp)  # pass the existing FastMCP server instance
```
Adding New File Readers
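- Create a reader in `server/tools/readers/`:
```python
from pathlib import Path


async def read_your_format(path: str) -> str:
    """Read your custom format."""
    # Implementation: replace with real parsing; here the file is read as UTF-8 text
    content = Path(path).read_text(encoding="utf-8")
    return content
```
- Register it in `fs_read.py`:
```python
from .readers.your_reader import read_your_format
# Add to file type detection logic
```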
Switching Model Providers
The application uses controller classes for centralized model management. To switch providers, just update config.yaml:
For LLM:
```yaml
MODEL_ID:
  NAME: your-model-name
  TYPE: ollama   # or bedrock, mantle, openai, sagemaker, litellm
```
For Vision:
```yaml
VISION_MODEL_ID:
  NAME: your-vision-model
  TYPE: ollama   # or bedrock, mantle, openai, sagemaker, litellm
```
For Embeddings:
```yaml
RAG:
  EMBED_MODEL_ID:
    NAME: mxbai-embed-large
    TYPE: ollama
```
The controllers (llm_controller.py, vision_model_controller.py, embeddings_controller.py) handle all provider-specific initialization automatically.
Adding New Model Providers
- Update the appropriate controller in `models/`:
```python
def initialize_model(self):
    if self.model_type == "your_provider":
        # Your provider initialization
        self.model = YourProviderModel(...)
```
- Add configuration in `config.yaml`
Ollama Utilities (Optional)
The bash/ directory contains helper scripts for Ollama users on macOS and Linux.
Ollama Environment Setup (macOS)
Sets Ollama performance environment variables at boot and launches the Ollama app:
```bash
# Variables set: OLLAMA_FLASH_ATTENTION=1, OLLAMA_KV_CACHE_TYPE=q8_0, OLLAMA_NUM_GPU=999
```
Setup:
- Edit `bash/ollama-env-mac/ollama.environment.plist` (no changes needed for defaults)
- Copy to LaunchAgents:
```bash
cp bash/ollama-env-mac/ollama.environment.plist ~/Library/LaunchAgents/
launchctl load ~/Library/LaunchAgents/ollama.environment.plist
```
VRAM Cleaner
Automatically unloads idle Ollama models from VRAM to free GPU memory. Useful when running multiple models or when GPU memory is limited.
macOS (LaunchAgent, runs every 60 seconds):
- Edit `bash/ollama-freeup-vram/com.ollama.vramcleaner.plist`:
  - Replace `<PATH_TO_FOLDER>` with the actual path to this repository
  - Replace `<PATH_TO_USER_HOME>` with your home directory
- Install:
```bash
cp bash/ollama-freeup-vram/com.ollama.vramcleaner.plist ~/Library/LaunchAgents/
launchctl load ~/Library/LaunchAgents/com.ollama.vramcleaner.plist
```
Linux (systemd):
- Edit `bash/ollama-freeup-vram/ollama-vram-cleaner.service`:
  - Replace `<PATH_TO_FOLDER>` with the actual path
- Install:
```bash
sudo cp bash/ollama-freeup-vram/ollama-vram-cleaner.service /etc/systemd/system/
sudo systemctl enable ollama-vram-cleaner
sudo systemctl start ollama-vram-cleaner
```
See bash/ollama-freeup-vram/README.md and bash/ollama-env-mac/README.md for more details.
Troubleshooting
Common Issues
MCP Connection Errors
- Verify the Python path in `client.py` matches your environment
- Check that the server path is correct
- Ensure all dependencies are installed (`pip install -r requirements.txt`)
Model Loading Issues
- Verify the model name and type in `config.yaml`
- For Ollama: ensure Ollama is running (`ollama serve`) and the model is pulled (`ollama pull model-name`)
- For AWS Bedrock: check credentials (`aws sts get-caller-identity`), region, and model access
- For OpenAI: ensure the `OPENAI_API_KEY` environment variable is set
RAG / Episodic Memory Not Working
- Ensure `ENABLE_RAG: true` (or `ENABLE_EPISODIC_MEMORY: true`) is set in the config
- Verify the embedding model is configured and available (`RAG.EMBED_MODEL_ID` in config)
- For Ollama embeddings: ensure the embedding model is pulled (`ollama pull mxbai-embed-large`)
- Check logs for "fallback embeddings" warnings; these mean the real model is unreachable
- Verify documents are being indexed with `list_documents()`
Permission Errors
- Ensure write permissions for `~/.mnemoai/` (the app home: config, plans, tasks, per-profile state)
- Check file paths in configuration
Import Errors on Startup
- Some dependencies (chromadb, faiss-cpu, crawl4ai) can be tricky to install. Check platform-specific instructions.
- On Apple Silicon: `faiss-cpu` may require `pip install faiss-cpu --no-cache-dir`
Logging
Logs are output to stderr with configurable level:
```bash
LOG_LEVEL=DEBUG mnemoai   # Detailed logs
LOG_LEVEL=INFO mnemoai    # Normal logs (default)
```
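A sketch of how a `LOG_LEVEL` environment variable can map onto Python's standard `logging` with output on stderr; the app's actual setup may differ, and `configure_logging` is a hypothetical name. Unknown level names fall back to `INFO` instead of raising.

```python
import logging
import os
import sys


def configure_logging() -> int:
    """Send logs to stderr at the level named by LOG_LEVEL (default INFO)."""
    name = os.environ.get("LOG_LEVEL", "INFO").upper()
    level = getattr(logging, name, None)
    if not isinstance(level, int):
        level = logging.INFO  # unknown names fall back to INFO
    logging.basicConfig(stream=sys.stderr, level=level,
                        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
                        force=True)  # replace any handlers configured earlier
    return level
```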
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contributing
This is a personal development project. If you'd like to use or extend it, feel free to fork the repository and adapt it to your needs!
If you use this code in your own projects, attribution to the original repository is appreciated but not required.
Acknowledgments
- Built with LangGraph and LangChain
- Uses FastMCP for Model Context Protocol
- Powered by Ollama, Amazon Bedrock, and Amazon SageMaker AI
Download files
File details
Details for the file mnemoai_assistant-0.2.0.tar.gz.
File metadata
- Download URL: mnemoai_assistant-0.2.0.tar.gz
- Upload date:
- Size: 29.2 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d9ff8478e34ed6f213bdab3ed90ef8d3fe7a11e205a48f4a1fec671437125931 |
| MD5 | 7c02ada7423c3546a2a1f86b1e2536a2 |
| BLAKE2b-256 | 4cb985a85ed6d974ff8acddddd3ab86a0bb0e19a26f9d8c7c69d7b03f363b85f |
File details
Details for the file mnemoai_assistant-0.2.0-py3-none-any.whl.
File metadata
- Download URL: mnemoai_assistant-0.2.0-py3-none-any.whl
- Upload date:
- Size: 219.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c074a14a2e0da3806f243faf28fdbc89919175a5601812efdb0e361592bcd1e9 |
| MD5 | b1a9cf63d8fa7b8536fb9789a159ea09 |
| BLAKE2b-256 | 8bbb0d6d72148b697abbd4e233c2a9cb8c6615596a490fe488b7976b9f8bcd99 |