Local-first context engineering and style adaptation for LLM-assisted coding
memory-layer
A local-first context engineering and style adaptation system for LLM-assisted coding.
memory-layer runs alongside your codebase, maintains a compact semantic graph of your code, and feeds that compressed representation to any LLM on every prompt. A weekly LoRA fine-tuning job adapts a local base model to your style — your formatting conventions, type annotation patterns, and error-handling idioms.
Install
pipx install memory-layer
Requires Python 3.11+ and Ollama.
No Python? Download a pre-built binary for your platform from the Releases page (Linux x86_64, macOS arm64, Windows x86_64).
Quick start
# Interactive setup: detects Ollama, suggests a model based on your GPU,
# writes ~/.memory-layer/config.toml
memory-layer init
# Start the file watcher + REST API (port 8000)
memory-layer api
# In a separate terminal — start the MCP server for editor integration
memory-layer mcp
The interactive init wizard:
- Detects whether Ollama is running (and tells you how to start it if not)
- Reads your GPU VRAM via nvidia-smi and suggests the best model size
- Writes ~/.memory-layer/config.toml (no .env file required)
Skip all prompts with --yes for CI / Docker:
memory-layer init --yes --watch /path/to/project
Verify
curl http://localhost:8000/health # → {"status":"ok"}
curl http://localhost:8000/context # → {"context":"# Memory Layer Context ..."}
memory-layer status # active project, model, token-saving %
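The same checks can be scripted. The following is a minimal sketch, assuming the API is on its default port and returns the JSON shapes shown in the comments above; the helper names (is_healthy, context_text, fetch) are illustrative and not part of memory-layer.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # default port from the quick start


def is_healthy(body: str) -> bool:
    """Return True when a /health body decodes to {"status": "ok"}."""
    try:
        return json.loads(body).get("status") == "ok"
    except (json.JSONDecodeError, AttributeError):
        # Not JSON, or JSON that is not an object.
        return False


def context_text(body: str) -> str:
    """Extract the context block from a /context response body."""
    return json.loads(body)["context"]


def fetch(path: str, timeout: float = 5.0) -> str:
    """GET a path from the local API and return the decoded body."""
    with urlopen(BASE_URL + path, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With memory-layer api running, is_healthy(fetch("/health")) should be true, and context_text(fetch("/context")) returns the context block as a string.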
What it does
| Layer | Responsibility |
|---|---|
| Layer 1 — Grounding Feed | Watches your files, extracts AST surfaces via tree-sitter, summarises each module with a local Ollama model |
| Layer 2 — Memory Engine | Maintains the project state: entity graph, change history, semantic embeddings, developer profile |
| Layer 3 — Context Delivery | Assembles a token-budgeted context block from the DB and delivers it over MCP (stdio) or HTTP (FastAPI) |
| Layer 4 — Style Adapter | Runs a weekly LoRA fine-tune on your accepted/corrected completions to adapt the model to your coding style |
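To make "token-budgeted" in Layer 3 concrete, here is a hypothetical sketch of greedy packing under a budget. It is not the project's actual algorithm: the four-characters-per-token estimate, the entry format, and the function names are all assumptions made for illustration.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: about four characters per token (an assumption)."""
    return max(1, len(text) // 4)


def assemble_context(summaries: list[tuple[str, str]], budget: int) -> str:
    """Pack (path, summary) pairs in priority order, skipping entries that do not fit."""
    header = "# Memory Layer Context"
    parts = [header]
    used = estimate_tokens(header)
    for path, summary in summaries:
        entry = f"## {path}\n{summary}"
        cost = estimate_tokens(entry)
        if used + cost > budget:
            continue  # too large for the remaining budget; try the next entry
        parts.append(entry)
        used += cost
    return "\n\n".join(parts)
```

Skipping an oversized entry rather than stopping lets smaller, lower-priority summaries still use the remaining budget.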
What it does not do
- Layer 4 does not teach the model facts about your codebase. Facts live in Layer 2 and are retrieved on every prompt by Layer 3.
- The system does not claim the model "remembers your codebase through training." Layer 4 learns your style, not your code.
- Cloud sync, team sharing, and VS Code extensions are V4 features in a separate repo.
Multi-project support (V3)
V3 manages multiple projects automatically. Each repo gets its own isolated database under <repo>/.memory-layer/. A global registry at ~/.memory-layer/registry.db tracks which repos are active.
memory-layer projects # list all registered repos
memory-layer register /path/to/other/repo # manual registration
memory-layer unregister <project-id> # remove from registry
When your editor sends an MCP roots/list_changed notification (Cursor, Claude Desktop, Continue.dev), memory-layer mcp switches the active project automatically.
Editor integration (MCP)
Claude Desktop
Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):
{
"mcpServers": {
"memory-layer": {
"command": "memory-layer",
"args": ["mcp"]
}
},
"systemPromptAppend": "Before answering coding questions, call the memory-layer get_context tool to load the current project state. This is required for every new session."
}
Claude Desktop auto-surfaces the memory://current/context resource to the model. The systemPromptAppend is belt-and-suspenders for sessions where the resource is not picked up automatically.
Run memory-layer api in a separate terminal so the MCP server sees live file changes.
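If claude_desktop_config.json already lists other MCP servers, the memory-layer entry has to be merged in rather than pasted over them. A minimal sketch of that merge, assuming a plain JSON file; the function names are illustrative and this helper is not shipped with memory-layer.

```python
import json
from pathlib import Path


def add_memory_layer(config: dict) -> dict:
    """Return a copy of config with the memory-layer MCP server registered."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["memory-layer"] = {"command": "memory-layer", "args": ["mcp"]}
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Read, merge, and rewrite a JSON config file, creating it if absent."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_memory_layer(existing), indent=2) + "\n")
```

Existing servers and other top-level keys are preserved; only the memory-layer entry is added or replaced. Back up the file before running anything that rewrites it.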
Cursor
Add to .cursor/mcp.json in your project root (or ~/.cursor/mcp.json for global):
{
"mcpServers": {
"memory-layer": {
"command": "memory-layer",
"args": ["mcp"]
}
}
}
Continue.dev
In ~/.continue/config.json:
{
"experimental": {
"modelContextProtocolServers": [
{
"transport": {
"type": "stdio",
"command": "memory-layer",
"args": ["mcp"]
}
}
]
}
}
Cline (VS Code extension)
In Cline's MCP settings (gear icon → MCP Servers → Add):
{
"memory-layer": {
"command": "memory-layer",
"args": ["mcp"],
"disabled": false,
"autoApprove": ["get_context", "semantic_search"]
}
}
Gemini Code Assist (Standard / Enterprise)
In your workspace .gemini/settings.json:
{
"mcpServers": {
"memory-layer": {
"command": "memory-layer",
"args": ["mcp"]
}
}
}
Verifying auto-context
To confirm the AI is reading the context block automatically — without you calling get_context manually:
- Start memory-layer api (watches files, writes DB) and memory-layer mcp (serves MCP).
- Open your project in Claude Desktop or another resource-aware client.
- Ask: "What files are in my project?" Do not mention get_context or @memory-layer.
- The AI should answer with actual file names and summaries from your codebase.
How it works: On startup, resource-aware clients call resources/list; the MCP server returns memory://current/context and memory://current/developer-profile. The client injects those into the model's context window automatically. When you switch projects (your editor sends notifications/roots/list_changed), the server re-registers the new project and emits notifications/resources/list_changed so the client refreshes.
If the AI doesn't know your files:
| Symptom | Fix |
|---|---|
| AI has no project knowledge at all | Check memory-layer status — has the project been indexed? Run memory-layer api first. |
| AI knows files after calling get_context but not before | Your client may not auto-surface MCP resources. Add systemPromptAppend (Claude Desktop) or equivalent. |
| AI loses context when switching repos | Ensure roots/list_changed notifications are enabled in your client settings. |
Configuration
memory-layer init writes ~/.memory-layer/config.toml. You can edit it directly:
[ollama]
url = "http://localhost:11434"
summarizer_model = "qwen2.5-coder:7b" # verify with: ollama list
embedding_model = "nomic-embed-text"
[watch]
dirs = ["/path/to/your/project"]
[api]
host = "127.0.0.1" # use 0.0.0.0 to expose on LAN
port = 8000
# [layer4]
# base_model = "" # HF model ID or local path; leave blank to skip LoRA
Environment variables override config.toml values. See .env.example for the full list.
Process model
Three separate processes; never combined:
Process A memory-layer api watcher + FastAPI on :8000 + discovery service
Process B memory-layer train scheduler + LoRA trainer (no network surface)
Process C memory-layer mcp MCP stdio server (reads DB read-only)
The MCP server reads from the same SQLite file that Process A writes to (WAL mode). Run memory-layer api in a separate terminal.
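The one-writer, read-only-reader arrangement can be reproduced with the standard sqlite3 module. This is a sketch of the pattern only: the entities table and its columns are made up for the demo and do not reflect memory-layer's schema.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_writer(db_path: Path) -> sqlite3.Connection:
    """Process A's role: enable WAL and own all writes."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")
    # Hypothetical table, for illustration only.
    conn.execute("CREATE TABLE IF NOT EXISTS entities (path TEXT, summary TEXT)")
    conn.commit()
    return conn


def open_reader(db_path: Path) -> sqlite3.Connection:
    """Process C's role: open the same file read-only via a URI."""
    return sqlite3.connect(f"{db_path.resolve().as_uri()}?mode=ro", uri=True)


def demo() -> list[tuple[str, str]]:
    """Write through one connection and read the committed row through another."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = Path(tmp) / "memory.db"
        writer = open_writer(db_path)
        reader = open_reader(db_path)
        writer.execute("INSERT INTO entities VALUES (?, ?)", ("app.py", "entry point"))
        writer.commit()
        rows = reader.execute("SELECT path, summary FROM entities").fetchall()
        reader.close()
        writer.close()
        return rows
```

In WAL mode readers do not block the writer and the writer does not block readers, which is why the MCP server can stay read-only while the watcher keeps writing.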
CLI reference
| Command | Description |
|---|---|
| memory-layer init [--yes] [--watch DIR] | Interactive setup wizard |
| memory-layer api | Start watcher + REST API (port 8000) |
| memory-layer mcp | Start MCP stdio server |
| memory-layer status [--language LANG] | Show active project, model, token savings |
| memory-layer projects | List registered projects |
| memory-layer register PATH | Manually register a project |
| memory-layer unregister PROJECT_ID | Remove project from registry |
| memory-layer migrate | Apply pending DB migrations |
| memory-layer reindex [--project ID] | Force re-embedding of all entities |
| memory-layer train [--force] | Run LoRA training once |
| memory-layer eval | Evaluate current vs previous adapter |
| memory-layer report [--days N] | Print token-saving impact report |
| memory-layer collect | Interactive completion logger |
REST API
| Method | Path | Description |
|---|---|---|
| GET | /health | Liveness probe (no auth) |
| GET | /context | Return the current compressed context block |
| POST | /completion | Log a completion and update developer profile |
| POST | /telemetry | Record interaction metrics |
| GET | /report | Aggregated telemetry stats |
| GET | /eval | Held-out eval results for current vs previous adapter |
Interactive docs: http://localhost:8000/docs
MCP tools
| Tool | Description |
|---|---|
| get_context | Return the context block; inject as system prompt |
| search_entities | Fuzzy search active code entities by path or function name |
| semantic_search | Vector search over entity summaries (for concept-level queries) |
| log_completion | Log a completion outcome for Style Adapter training |
| mark_reprompt | Flag that a prior get_context response was insufficient; call with the session_id to mark it for telemetry analysis |
Migration guide: V2 → V3
V3 is backward-compatible at the DB level. The migration runner (memory-layer migrate) handles schema upgrades automatically on startup.
Key changes:
- Config location. V2 used a per-project .env. V3 adds ~/.memory-layer/config.toml as the user-level config. Your .env still works (env vars override config.toml), but memory-layer init generates config.toml for new installs.
- Per-repo databases. V3 stores project data in <repo>/.memory-layer/memory.db rather than a single memory.db. Run memory-layer migrate once; the runner adds project_id to all tables.
- Developer profile moved to global registry. Your coding-style profile now lives in ~/.memory-layer/registry.db and is shared across all projects. Existing data is migrated automatically.
- Semantic search requires nomic-embed-text. Pull it with ollama pull nomic-embed-text, then run memory-layer reindex to build embeddings for existing entities.
- MCP tool names unchanged. No editor config changes needed.
Style Adapter (Layer 4)
The Style Adapter is a LoRA fine-tune of your local base model, trained on completions you have accepted or corrected. It learns output style — not codebase facts.
Training gates (all required before a run is attempted):
| Gate | Default | Rationale |
|---|---|---|
| Minimum total samples | 500 | Below this, LoRA produces noise or memorisation |
| Minimum new since last run | 100 | Avoid retraining on tiny deltas |
| Frequency | Weekly (Sun 02:00 UTC) | Amortises the cost of a 30B model run |
Promotion gate: the new adapter is only promoted if its held-out loss improves on the previous adapter by ≥ 0.02. Failed runs are logged with status failed_eval; the previous adapter stays active.
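The sample gates and the promotion gate reduce to two comparisons. The sketch below uses the default thresholds from the table; the names are illustrative, the weekly schedule is not modelled, and the trainer's actual code may differ.

```python
from dataclasses import dataclass

MIN_TOTAL_SAMPLES = 500      # default from the gates table
MIN_NEW_SAMPLES = 100        # default from the gates table
MIN_LOSS_IMPROVEMENT = 0.02  # promotion threshold on held-out loss


@dataclass
class SampleCounts:
    total: int
    new_since_last_run: int


def should_train(counts: SampleCounts) -> bool:
    """Both sample gates must pass before a run is attempted."""
    return (
        counts.total >= MIN_TOTAL_SAMPLES
        and counts.new_since_last_run >= MIN_NEW_SAMPLES
    )


def should_promote(previous_loss: float, candidate_loss: float) -> bool:
    """Promote only if held-out loss drops by at least the threshold."""
    return previous_loss - candidate_loss >= MIN_LOSS_IMPROVEMENT
```

With 600 total samples but only 50 new ones, should_train returns False. A candidate that lowers held-out loss from 1.00 to 0.99 is not promoted, so the previous adapter stays active.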
Development
git clone https://github.com/yadu9989/memory-layer
cd memory-layer
pip install -e ".[dev]"
pytest
ruff check .
Optional extras:
pip install -e ".[train]" # LoRA training (torch, transformers, peft)
pip install -e ".[cpu-embeddings]" # sentence-transformers fallback if Ollama unavailable
pip install -e ".[unsloth]" # faster training via unsloth
Requirements
- Python 3.11+
- Ollama running locally (for file summaries and embeddings)
- SQLite (bundled with Python)
- (optional, for training) pip install -e ".[train]" (torch, transformers, peft, accelerate)
License
MIT