
memory-layer

A local-first context engineering and style adaptation system for LLM-assisted coding.

memory-layer runs alongside your codebase, maintains a compact semantic graph of your code, and feeds that compressed representation to any LLM on every prompt. A weekly LoRA fine-tuning job adapts a local base model to your style — your formatting conventions, type annotation patterns, and error-handling idioms.


Install

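pipx install memory-layer-yadu9989

The package is published on PyPI as memory-layer-yadu9989 (see the distribution files below); it provides the memory-layer command used throughout this page.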

Requires Python 3.11+ and Ollama.

No Python? Download a pre-built binary for your platform from the Releases page (Linux x86_64, macOS arm64, Windows x86_64).


Quick start

# Interactive setup: detects Ollama, suggests a model based on your GPU,
# writes ~/.memory-layer/config.toml
memory-layer init

# Start the file watcher + REST API (port 8000)
memory-layer api

# In a separate terminal — start the MCP server for editor integration
memory-layer mcp

The interactive init wizard:

  • Detects whether Ollama is running (and tells you how to start it if not)
  • Reads your GPU VRAM via nvidia-smi and suggests the best model size
  • Writes ~/.memory-layer/config.toml (no .env file required)

Skip all prompts with --yes for CI / Docker:

memory-layer init --yes --watch /path/to/project

Verify

curl http://localhost:8000/health          # → {"status":"ok"}
curl http://localhost:8000/context         # → {"context":"# Memory Layer Context ..."}
memory-layer status                        # active project, model, token-saving %
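The same checks can be scripted, for example as a CI smoke test. This is a minimal standard-library sketch that assumes the default host and port from config.toml and relies only on the two response shapes shown above; the helper names are illustrative, not part of memory-layer.

```python
import json
import urllib.error
import urllib.request

# Default [api] host/port from config.toml; change if you edited them.
BASE_URL = "http://127.0.0.1:8000"


def get_json(path: str, base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """GET base_url + path and decode the JSON response body."""
    with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def is_healthy(payload: dict) -> bool:
    """A /health payload is healthy if it matches {"status": "ok"}."""
    return payload.get("status") == "ok"


def has_context(payload: dict) -> bool:
    """A /context payload is useful only if its "context" string is non-empty."""
    context = payload.get("context")
    return isinstance(context, str) and bool(context.strip())


def check_api(base_url: str = BASE_URL) -> bool:
    """Return True only if the API is up and already serving a context block."""
    try:
        return is_healthy(get_json("/health", base_url)) and has_context(
            get_json("/context", base_url)
        )
    except (urllib.error.URLError, TimeoutError, json.JSONDecodeError):
        # Connection refused, timeouts and non-JSON bodies all count as "not ready".
        return False
```

Call check_api() after starting memory-layer api; False means the server is unreachable or the context block is still empty.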

What it does

Layer                        Responsibility
Layer 1 (Grounding Feed)     Watches your files, extracts AST surfaces via tree-sitter, and summarises each module with a local Ollama model
Layer 2 (Memory Engine)      Maintains the project state: entity graph, change history, semantic embeddings, developer profile
Layer 3 (Context Delivery)   Assembles a token-budgeted context block from the DB and delivers it over MCP (stdio) or HTTP (FastAPI)
Layer 4 (Style Adapter)      Runs a weekly LoRA fine-tune on your accepted/corrected completions to adapt the model to your coding style
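Layer 3's token-budgeted context block can be pictured as a packing problem: fit the most relevant module summaries into a fixed token allowance. The sketch below is a hypothetical greedy version, not memory-layer's actual algorithm; the relevance score, the rough 4-characters-per-token estimate, and the ModuleSummary type are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ModuleSummary:
    path: str
    summary: str
    score: float  # hypothetical relevance score (e.g. recency or similarity)


def estimate_tokens(text: str) -> int:
    # Rough heuristic of ~4 characters per token; a real tokenizer will differ.
    return max(1, len(text) // 4)


def assemble_context(modules: list[ModuleSummary], budget: int) -> str:
    """Greedily pack the highest-scoring summaries into a token budget."""
    parts = ["# Memory Layer Context"]
    used = estimate_tokens(parts[0])
    for mod in sorted(modules, key=lambda m: m.score, reverse=True):
        entry = f"## {mod.path}\n{mod.summary}"
        cost = estimate_tokens(entry)
        if used + cost > budget:
            continue  # skip oversized entries; smaller ones may still fit
        parts.append(entry)
        used += cost
    return "\n\n".join(parts)
```

Skipping an entry that does not fit, rather than stopping at the first one, lets smaller lower-ranked summaries use the remaining budget.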

What it does not do

  • Layer 4 does not teach the model facts about your codebase. Facts live in Layer 2 and are retrieved by Layer 3 on every prompt.
  • The system does not claim the model "remembers your codebase through training." Layer 4 learns your style, not your code.
  • Cloud sync, team sharing, and a dedicated memory-layer VS Code extension are V4 features that live in a separate repo.

Multi-project support (V3)

V3 manages multiple projects automatically. Each repo gets its own isolated database under <repo>/.memory-layer/. A global registry at ~/.memory-layer/registry.db tracks which repos are active.

memory-layer projects                        # list all registered repos
memory-layer register /path/to/other/repo   # manual registration
memory-layer unregister <project-id>        # remove from registry

When your editor sends the MCP notifications/roots/list_changed message (Cursor, Claude Desktop, Continue.dev), memory-layer mcp switches the active project automatically.
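Because each repo keeps its data under <repo>/.memory-layer/, switching projects amounts to mapping a workspace directory to the right database. A minimal sketch of that lookup, assuming the server walks up from the editor's root directory; the resolution strategy and the find_project_db name are hypothetical. MCP roots arrive as file:// URIs, so a real server would convert them to paths first.

```python
from pathlib import Path
from typing import Optional


def find_project_db(start: Path) -> Optional[Path]:
    """Return the nearest <repo>/.memory-layer/memory.db at or above `start`.

    Hypothetical helper: walks from `start` up to the filesystem root and
    stops at the first directory that holds a per-repo database.
    """
    start = start.resolve()
    for directory in (start, *start.parents):
        candidate = directory / ".memory-layer" / "memory.db"
        if candidate.is_file():
            return candidate
    return None
```

With this approach a nested repo resolves to the innermost database, much as git picks the nearest .git directory.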


Editor integration (MCP)

Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):

{
  "mcpServers": {
    "memory-layer": {
      "command": "memory-layer",
      "args": ["mcp"]
    }
  },
  "systemPromptAppend": "Before answering coding questions, call the memory-layer get_context tool to load the current project state. This is required for every new session."
}

Claude Desktop auto-surfaces the memory://current/context resource to the model. The systemPromptAppend is belt-and-suspenders for sessions where the resource is not picked up automatically.

Run memory-layer api in a separate terminal so the MCP server sees live file changes.
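If claude_desktop_config.json already lists other servers, editing it by hand risks clobbering them. A small sketch that merges the memory-layer entry shown above into an existing file; the helper names are hypothetical, it covers only the macOS and Windows locations given above, and it does not add systemPromptAppend.

```python
import json
import os
import sys
from pathlib import Path


def claude_config_path() -> Path:
    """Config location for the platforms documented above (macOS, Windows)."""
    if sys.platform == "win32":
        return Path(os.environ["APPDATA"]) / "Claude" / "claude_desktop_config.json"
    return (
        Path.home() / "Library" / "Application Support" / "Claude"
        / "claude_desktop_config.json"
    )


def add_memory_layer_server(config_path: Path) -> dict:
    """Insert the memory-layer MCP entry while keeping existing servers."""
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    servers = config.setdefault("mcpServers", {})
    servers["memory-layer"] = {"command": "memory-layer", "args": ["mcp"]}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Usage: add_memory_layer_server(claude_config_path()). You will likely need to restart Claude Desktop for it to pick up the change.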

Cursor

Add to .cursor/mcp.json in your project root (or ~/.cursor/mcp.json for global):

{
  "mcpServers": {
    "memory-layer": {
      "command": "memory-layer",
      "args": ["mcp"]
    }
  }
}

Continue.dev

In ~/.continue/config.json:

{
  "experimental": {
    "modelContextProtocolServers": [
      {
        "transport": {
          "type": "stdio",
          "command": "memory-layer",
          "args": ["mcp"]
        }
      }
    ]
  }
}

Cline (VS Code extension)

In Cline's MCP settings (gear icon → MCP Servers → Add):

{
  "memory-layer": {
    "command": "memory-layer",
    "args": ["mcp"],
    "disabled": false,
    "autoApprove": ["get_context", "semantic_search"]
  }
}

Gemini Code Assist (Standard / Enterprise)

In your workspace .gemini/settings.json:

{
  "mcpServers": {
    "memory-layer": {
      "command": "memory-layer",
      "args": ["mcp"]
    }
  }
}

Verifying auto-context

To confirm the AI is reading the context block automatically — without you calling get_context manually:

  1. Start memory-layer api (watches files, writes DB) and memory-layer mcp (serves MCP).
  2. Open your project in Claude Desktop or another resource-aware client.
  3. Ask: "What files are in my project?" — do not mention get_context or @memory-layer.
  4. The AI should answer with actual file names and summaries from your codebase.

How it works: On startup, resource-aware clients call resources/list; the MCP server returns memory://current/context and memory://current/developer-profile. The client injects those into the model's context window automatically. When you switch projects (your editor sends notifications/roots/list_changed), the server re-registers the new project and emits notifications/resources/list_changed so the client refreshes.
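The messages in that exchange are plain JSON-RPC 2.0. This sketch builds the resources/list result and the list_changed notification described above and frames them for the stdio transport, where each message is one JSON object on its own line. The two URIs come from this page; the name and mimeType values are guesses. Under the MCP spec, a server must also declare the listChanged capability for resources during initialization before it sends that notification.

```python
import json

# URIs documented above; "name" and "mimeType" values are illustrative guesses.
RESOURCES = [
    {"uri": "memory://current/context", "name": "context", "mimeType": "text/markdown"},
    {"uri": "memory://current/developer-profile", "name": "developer-profile",
     "mimeType": "text/markdown"},
]


def resources_list_result(request_id: int) -> dict:
    """JSON-RPC response to a client's resources/list request."""
    return {"jsonrpc": "2.0", "id": request_id, "result": {"resources": RESOURCES}}


def resources_changed_notification() -> dict:
    """Sent after a project switch; notifications carry no id."""
    return {"jsonrpc": "2.0", "method": "notifications/resources/list_changed"}


def frame(message: dict) -> str:
    """MCP stdio framing: one JSON object per line, no embedded newlines."""
    return json.dumps(message, separators=(",", ":")) + "\n"
```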

If the AI doesn't know your files:

Symptom                                                  Fix
AI has no project knowledge at all                       Run memory-layer status to check whether the project has been indexed; if it has not, start memory-layer api first.
AI knows files after calling get_context but not before  Your client may not auto-surface MCP resources. Add systemPromptAppend (Claude Desktop) or your client's equivalent.
AI loses context when switching repos                    Ensure notifications/roots/list_changed is enabled in your client settings.

Configuration

memory-layer init writes ~/.memory-layer/config.toml. You can edit it directly:

[ollama]
url              = "http://localhost:11434"
summarizer_model = "qwen2.5-coder:7b"   # verify with: ollama list
embedding_model  = "nomic-embed-text"

[watch]
dirs = ["/path/to/your/project"]

[api]
host = "127.0.0.1"   # use 0.0.0.0 to expose on LAN
port = 8000

# [layer4]
# base_model = ""   # HF model ID or local path; leave blank to skip LoRA

Environment variables override config.toml values. See .env.example for the full list.


Process model

memory-layer runs as three separate processes; they are never combined into one:

Process A  memory-layer api    watcher + FastAPI on :8000 + discovery service
Process B  memory-layer train  scheduler + LoRA trainer (no network surface)
Process C  memory-layer mcp    MCP stdio server (opens the DB read-only)

The MCP server reads the same SQLite file that Process A writes to. The database runs in WAL mode, which lets readers proceed while the writer commits. Run memory-layer api in a separate terminal.
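SQLite's WAL journal mode allows readers to proceed while a single writer commits, which is what lets Process C read alongside Process A. A minimal sketch of the two connection styles with the standard sqlite3 module; the function names are illustrative, and memory-layer's own connection code may differ.

```python
import sqlite3
from pathlib import Path


def open_writer(db_path: Path) -> sqlite3.Connection:
    """Writer side (like Process A): switch the database to WAL mode."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")  # persistent once set on the file
    return conn


def open_reader(db_path: Path) -> sqlite3.Connection:
    """Reader side (like Process C): a read-only connection via a URI filename."""
    # as_uri() percent-encodes spaces and other special characters in the path.
    return sqlite3.connect(f"{db_path.resolve().as_uri()}?mode=ro", uri=True)
```

Any write attempted through the read-only connection fails with sqlite3.OperationalError, so the MCP server cannot corrupt state owned by the writer.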


CLI reference

Command                                  Description
memory-layer init [--yes] [--watch DIR]  Interactive setup wizard
memory-layer api                         Start watcher + REST API (port 8000)
memory-layer mcp                         Start MCP stdio server
memory-layer status [--language LANG]    Show active project, model, token savings
memory-layer projects                    List registered projects
memory-layer register PATH               Manually register a project
memory-layer unregister PROJECT_ID       Remove project from registry
memory-layer migrate                     Apply pending DB migrations
memory-layer reindex [--project ID]      Force re-embedding of all entities
memory-layer train [--force]             Run LoRA training once
memory-layer eval                        Evaluate current vs previous adapter
memory-layer report [--days N]           Print token-saving impact report
memory-layer collect                     Interactive completion logger

REST API

Method  Path         Description
GET     /health      Liveness probe (no auth)
GET     /context     Return the current compressed context block
POST    /completion  Log a completion and update developer profile
POST    /telemetry   Record interaction metrics
GET     /report      Aggregated telemetry stats
GET     /eval        Held-out eval results for current vs previous adapter

Interactive docs: http://localhost:8000/docs
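Outside an MCP client, completions can be logged over HTTP. This sketch builds a POST /completion request with the standard library. The payload field names are hypothetical, so check the real request schema in the interactive docs (FastAPI apps normally also serve it as JSON at /openapi.json).

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"


def completion_request(payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /completion request with a JSON body."""
    return urllib.request.Request(
        f"{base_url}/completion",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 5.0) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# HYPOTHETICAL field names; confirm them at http://localhost:8000/docs.
example = {
    "prompt": "def parse_config(path):",
    "completion": "    return tomllib.loads(path.read_text())",
    "outcome": "accepted",
}
```

Usage, with memory-layer api running: send(completion_request(example)).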


MCP tools

Tool             Description
get_context      Return the context block; inject it as the system prompt
search_entities  Fuzzy search active code entities by path or function name
semantic_search  Vector search over entity summaries (for concept-level queries)
log_completion   Log a completion outcome for Style Adapter training
mark_reprompt    Flag that a prior get_context response was insufficient; call with the session_id to mark it for telemetry analysis

Migration guide: V2 → V3

V3 is backward-compatible at the DB level. Schema upgrades are applied automatically on startup; you can also apply them explicitly with memory-layer migrate.

Key changes:

  1. Config location. V2 used a per-project .env. V3 adds ~/.memory-layer/config.toml as the user-level config. Your .env still works (env vars override config.toml), but memory-layer init generates config.toml for new installs.

  2. Per-repo databases. V3 stores project data in <repo>/.memory-layer/memory.db rather than a single memory.db. Run memory-layer migrate once; the runner adds project_id to all tables.

  3. Developer profile moved to global registry. Your coding-style profile now lives in ~/.memory-layer/registry.db and is shared across all projects. Existing data is migrated automatically.

  4. Semantic search requires nomic-embed-text. Pull it with ollama pull nomic-embed-text, then run memory-layer reindex to build embeddings for existing entities.

  5. MCP tool names unchanged. No editor config changes needed.
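Item 4 matters because semantic_search ranks entities by embedding similarity. A rough sketch of that idea: embed text through Ollama's /api/embed endpoint and rank stored entity vectors by cosine similarity. It assumes a running Ollama with nomic-embed-text pulled; memory-layer's real storage, scoring, and result count may differ, and embed, cosine, and rank are hypothetical helpers.

```python
import json
import math
import urllib.request

OLLAMA_URL = "http://localhost:11434"


def embed(texts: list[str], model: str = "nomic-embed-text") -> list[list[float]]:
    """Embed a batch of texts with Ollama's /api/embed endpoint."""
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/embed",
        data=json.dumps({"model": model, "input": texts}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))["embeddings"]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # zero vectors match nothing


def rank(query_vec: list[float], entity_vecs: dict[str, list[float]], k: int = 5) -> list[tuple[str, float]]:
    """Return the k entities whose vectors are most similar to the query."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in entity_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

Entities indexed before the embedding model was pulled have no vectors, which is why memory-layer reindex is needed after the upgrade.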


Style Adapter (Layer 4)

The Style Adapter is a LoRA fine-tune of your local base model, trained on completions you have accepted or corrected. It learns output style — not codebase facts.

Training gates (all required before a run is attempted):

Gate                        Default                 Rationale
Minimum total samples       500                     Below this, LoRA produces noise or memorisation
Minimum new since last run  100                     Avoid retraining on tiny deltas
Frequency                   Weekly (Sun 02:00 UTC)  Amortises the cost of a 30B model run

Promotion gate: the new adapter is only promoted if its held-out loss improves on the previous adapter by ≥ 0.02. Failed runs are logged with status failed_eval; the previous adapter stays active.
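The training and promotion gates reduce to a few comparisons. A sketch using the documented defaults; the Gates type and function names are hypothetical, the weekly schedule is left to the scheduler, and a small tolerance is added (an implementation choice, not from the source) so that a drop of exactly 0.02 is not rejected by floating-point rounding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gates:
    min_total: int = 500            # minimum total samples
    min_new: int = 100              # minimum new samples since last run
    min_improvement: float = 0.02   # required held-out loss reduction


def should_train(total_samples: int, new_since_last: int, gates: Gates = Gates()) -> bool:
    """Sample-count gates; the weekly schedule is checked elsewhere."""
    return total_samples >= gates.min_total and new_since_last >= gates.min_new


def should_promote(new_loss: float, prev_loss: float, gates: Gates = Gates()) -> bool:
    """Promote only if held-out loss drops by at least min_improvement."""
    # 1e-9 absorbs rounding, e.g. 0.30 - 0.28 evaluates slightly below 0.02.
    return prev_loss - new_loss >= gates.min_improvement - 1e-9
```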


Development

git clone https://github.com/yadu9989/memory-layer
cd memory-layer
pip install -e ".[dev]"
pytest
ruff check .

Optional extras:

pip install -e ".[train]"          # LoRA training (torch, transformers, peft)
pip install -e ".[cpu-embeddings]" # sentence-transformers fallback if Ollama unavailable
pip install -e ".[unsloth]"        # faster training via unsloth

Requirements

  • Python 3.11+
  • Ollama running locally (for file summaries and embeddings)
  • SQLite (bundled with Python)
  • (optional, for training) pip install -e ".[train]" — torch, transformers, peft, accelerate

License

MIT

Download files

Source distribution: memory_layer_yadu9989-0.3.1.tar.gz (13.2 MB, tag: Source), uploaded via twine/6.2.0 CPython/3.13.12 (Trusted Publishing: no)

Algorithm     Hash digest
SHA256        f2b17e72f9cc48863c2ef626b5a76a476df52944353d97e678f4706a633769bb
MD5           767b66a479ceffa86e9c4953fa334b2f
BLAKE2b-256   dda38c533810a7ca7cf39ea79539655553b36cd693e3b2c12e5b7a9a204b60cc

Built distribution: memory_layer_yadu9989-0.3.1-py3-none-any.whl (76.8 kB, Python 3)

Algorithm     Hash digest
SHA256        6dc668c65d9cfbcf84bc9e25cc7ad8c0d103756290612a259ba017c3eec4ed57
MD5           a0edcc616467f4e811724a19ba8e4c34
BLAKE2b-256   edfc7350289c8614cdb49562f0b472f81e61a2154628b47ea6c45e3f8d5d1e83
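To check a downloaded file against the SHA256 digest listed for it above, hash it locally and compare. A standard-library sketch; sha256_of and verify are illustrative names.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Compare against a published digest, ignoring case and stray whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, pass the downloaded memory_layer_yadu9989-0.3.1.tar.gz together with its SHA256 value from the list above.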
