Local-first context engineering and style adaptation for LLM-assisted coding

memory-layer

A local-first context engineering and style adaptation system for LLM-assisted coding.

memory-layer runs alongside your codebase, maintains a compact semantic graph of your code, and feeds that compressed representation to any LLM on every prompt. A weekly LoRA fine-tuning job adapts a local base model to your style — your formatting conventions, type annotation patterns, and error-handling idioms.

Install

pipx install memory-layer

Requires Python 3.11+ and Ollama.

No Python? Download a pre-built binary for your platform from the Releases page (Linux x86_64, macOS arm64, Windows x86_64).

Quick start

# Interactive setup: detects Ollama, suggests a model based on your GPU,
# writes ~/.memory-layer/config.toml
memory-layer init

# Start the file watcher + REST API (port 8000)
memory-layer api

# In a separate terminal — start the MCP server for editor integration
memory-layer mcp

The interactive init wizard:

Detects whether Ollama is running (and tells you how to start it if not)
Reads your GPU VRAM via nvidia-smi and suggests the best model size
Writes ~/.memory-layer/config.toml (no .env file required)

Skip all prompts with --yes for CI / Docker:

memory-layer init --yes --watch /path/to/project

Verify

curl http://localhost:8000/health          # → {"status":"ok"}
curl http://localhost:8000/context         # → {"context":"# Memory Layer Context ..."}
memory-layer status                        # active project, model, token-saving %
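
The same checks can be scripted. The following is a minimal sketch that assumes the default port 8000 and the response shapes shown above; the helper names `fetch_json` and `is_healthy` are illustrative and not part of memory-layer.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default port from the Quick start


def fetch_json(path: str, base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """GET a JSON document from the memory-layer REST API."""
    with urllib.request.urlopen(f"{base_url}{path}", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def is_healthy(payload: dict) -> bool:
    """True when the payload matches the documented {"status": "ok"} body."""
    return payload.get("status") == "ok"


def context_preview(payload: dict, limit: int = 200) -> str:
    """Return the first `limit` characters of the "context" field, if present."""
    return str(payload.get("context", ""))[:limit]
```

With `memory-layer api` running, `is_healthy(fetch_json("/health"))` should be true and `context_preview(fetch_json("/context"))` should start with the context header.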

What it does

Layer	Responsibility
Layer 1 — Grounding Feed	Watches your files, extracts AST surfaces via tree-sitter, summarises each module with a local Ollama model
Layer 2 — Memory Engine	Maintains the project state: entity graph, change history, semantic embeddings, developer profile
Layer 3 — Context Delivery	Assembles a token-budgeted context block from the DB and delivers it over MCP (stdio) or HTTP (FastAPI)
Layer 4 — Style Adapter	Runs a weekly LoRA fine-tune on your accepted/corrected completions to adapt the model to your coding style
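
To make "token-budgeted" concrete, here is a rough sketch of how a context block could be packed under a budget. It is not the project's implementation: the `Summary` type, the `priority` score, and the four-characters-per-token estimate are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    path: str
    text: str
    priority: float  # hypothetical ranking score; higher means more relevant


def estimate_tokens(text: str) -> int:
    """Rough heuristic (about 4 characters per token); not a real tokenizer."""
    return max(1, len(text) // 4)


def assemble_context(summaries: list[Summary], budget: int) -> str:
    """Greedily pack the highest-priority summaries that fit the budget."""
    header = "# Memory Layer Context"
    used = estimate_tokens(header)
    parts = [header]
    for s in sorted(summaries, key=lambda s: s.priority, reverse=True):
        block = f"## {s.path}\n{s.text}"
        cost = estimate_tokens(block)
        if used + cost > budget:
            continue  # skip what does not fit; a smaller entry may still fit
        parts.append(block)
        used += cost
    return "\n\n".join(parts)
```

Greedy packing is the simplest choice here; a real assembler would likely use the model's own tokenizer and a smarter ranking.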

What it does not do

Layer 4 does not teach the model facts about your codebase. Facts live in Layer 2 and are retrieved on every prompt by Layer 3.
The system does not claim the model "remembers your codebase through training." Layer 4 learns your style, not your code.
Cloud sync, team sharing, and VS Code extensions are V4 features in a separate repo.

Multi-project support (V3)

V3 manages multiple projects automatically. Each repo gets its own isolated database under <repo>/.memory-layer/. A global registry at ~/.memory-layer/registry.db tracks which repos are active.

memory-layer projects                        # list all registered repos
memory-layer register /path/to/other/repo   # manual registration
memory-layer unregister <project-id>        # remove from registry

When your editor sends an MCP notifications/roots/list_changed notification (Cursor, Claude Desktop, Continue.dev), memory-layer mcp switches the active project automatically.

Editor integration (MCP)

Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):

{
  "mcpServers": {
    "memory-layer": {
      "command": "memory-layer",
      "args": ["mcp"]
    }
  },
  "systemPromptAppend": "Before answering coding questions, call the memory-layer get_context tool to load the current project state. This is required for every new session."
}

Claude Desktop auto-surfaces the memory://current/context resource to the model. The systemPromptAppend is belt-and-suspenders for sessions where the resource is not picked up automatically.

Run memory-layer api in a separate terminal so the MCP server sees live file changes.

Cursor

Add to .cursor/mcp.json in your project root (or ~/.cursor/mcp.json for global):

{
  "mcpServers": {
    "memory-layer": {
      "command": "memory-layer",
      "args": ["mcp"]
    }
  }
}
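
If the config file already lists other MCP servers, the entry should be merged rather than pasted over them. A small sketch of that merge, assuming the `mcpServers` layout shown above (the helper `add_mcp_server` is illustrative and not shipped with memory-layer):

```python
import json
from pathlib import Path

# Server entry exactly as documented for Cursor.
SERVER_ENTRY = {"command": "memory-layer", "args": ["mcp"]}


def add_mcp_server(config_path: Path, name: str = "memory-layer") -> dict:
    """Add the server entry to an mcpServers-style JSON file, keeping other entries."""
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    config.setdefault("mcpServers", {})[name] = {
        "command": SERVER_ENTRY["command"],
        "args": list(SERVER_ENTRY["args"]),
    }
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For a project-level setup this would be called as `add_mcp_server(Path(".cursor/mcp.json"))`. An existing file that is empty or not valid JSON raises `json.JSONDecodeError` instead of being overwritten.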

Continue.dev

In ~/.continue/config.json:

{
  "experimental": {
    "modelContextProtocolServers": [
      {
        "transport": {
          "type": "stdio",
          "command": "memory-layer",
          "args": ["mcp"]
        }
      }
    ]
  }
}

Cline (VS Code extension)

In Cline's MCP settings (gear icon → MCP Servers → Add):

{
  "memory-layer": {
    "command": "memory-layer",
    "args": ["mcp"],
    "disabled": false,
    "autoApprove": ["get_context", "semantic_search"]
  }
}

Gemini Code Assist (Standard / Enterprise)

In your workspace .gemini/settings.json:

{
  "mcpServers": {
    "memory-layer": {
      "command": "memory-layer",
      "args": ["mcp"]
    }
  }
}

Verifying auto-context

To confirm the AI is reading the context block automatically — without you calling get_context manually:

Start memory-layer api (watches files, writes DB) and memory-layer mcp (serves MCP).
Open your project in Claude Desktop or another resource-aware client.
Ask: "What files are in my project?" — do not mention get_context or @memory-layer.
The AI should answer with actual file names and summaries from your codebase.

How it works: On startup, resource-aware clients call resources/list; the MCP server returns memory://current/context and memory://current/developer-profile. The client injects those into the model's context window automatically. When you switch projects (your editor sends notifications/roots/list_changed), the server re-registers the new project and emits notifications/resources/list_changed so the client refreshes.
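
On the wire, this exchange is plain JSON-RPC 2.0. The sketch below builds the three messages involved. The method names and resource URIs come from the description above; the `name` values in the response are placeholders, since the real server may label its resources differently.

```python
import json


def resources_list_request(request_id: int) -> dict:
    """Client -> server: ask which resources are available."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "resources/list"}


def resources_list_response(request_id: int) -> dict:
    """Server -> client: the two resources described above (names are placeholders)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "result": {
            "resources": [
                {"uri": "memory://current/context", "name": "context"},
                {"uri": "memory://current/developer-profile", "name": "developer-profile"},
            ]
        },
    }


def resources_list_changed() -> dict:
    """Server -> client notification; it has no "id" because no reply is expected."""
    return {"jsonrpc": "2.0", "method": "notifications/resources/list_changed"}


def encode(message: dict) -> str:
    """Serialise a message as a single line of JSON."""
    return json.dumps(message, separators=(",", ":"))
```

A client that receives `notifications/resources/list_changed` is expected to issue `resources/list` again, which is how the refreshed project context reaches the model after a repo switch.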

If the AI doesn't know your files:

Symptom	Fix
AI has no project knowledge at all	Check `memory-layer status` — has the project been indexed? Run `memory-layer api` first.
AI knows files after calling `get_context` but not before	Your client may not auto-surface MCP resources. Add `systemPromptAppend` (Claude Desktop) or equivalent.
AI loses context when switching repos	Ensure `notifications/roots/list_changed` notifications are enabled in your client settings.

Configuration

memory-layer init writes ~/.memory-layer/config.toml. You can edit it directly:

[ollama]
url              = "http://localhost:11434"
summarizer_model = "qwen2.5-coder:7b"   # verify with: ollama list
embedding_model  = "nomic-embed-text"

[watch]
dirs = ["/path/to/your/project"]

[api]
host = "127.0.0.1"   # use 0.0.0.0 to expose on LAN
port = 8000

# [layer4]
# base_model = ""   # HF model ID or local path; leave blank to skip LoRA

Environment variables override config.toml values. See .env.example for the full list.

Process model

memory-layer runs as three separate processes, which are never combined:

Process A  memory-layer api    watcher + FastAPI on :8000 + discovery service
Process B  memory-layer train  scheduler + LoRA trainer (no network surface)
Process C  memory-layer mcp    MCP stdio server (reads DB read-only)

The MCP server reads from the same SQLite file that Process A writes to (WAL mode). Run memory-layer api in a separate terminal.
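
The one-writer, read-only-reader arrangement can be reproduced with Python's built-in `sqlite3` module. This is a sketch of the general SQLite pattern, not memory-layer's own code; the function names are illustrative.

```python
import sqlite3
from pathlib import Path


def open_writer(db_path: str) -> sqlite3.Connection:
    """Writer side (Process A in the table above): switch the file to WAL mode."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")  # readers no longer block the writer
    return conn


def open_reader(db_path: str) -> sqlite3.Connection:
    """Reader side (Process C): mode=ro makes SQLite reject writes on this connection."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

In WAL mode a reader sees the last committed snapshot while the writer keeps appending, which is why the MCP server can serve context while the watcher is indexing.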

CLI reference

Command	Description
`memory-layer init [--yes] [--watch DIR]`	Interactive setup wizard
`memory-layer api`	Start watcher + REST API (port 8000)
`memory-layer mcp`	Start MCP stdio server
`memory-layer status [--language LANG]`	Show active project, model, token savings
`memory-layer projects`	List registered projects
`memory-layer register PATH`	Manually register a project
`memory-layer unregister PROJECT_ID`	Remove project from registry
`memory-layer migrate`	Apply pending DB migrations
`memory-layer reindex [--project ID]`	Force re-embedding of all entities
`memory-layer train [--force]`	Run LoRA training once
`memory-layer eval`	Evaluate current vs previous adapter
`memory-layer report [--days N]`	Print token-saving impact report
`memory-layer collect`	Interactive completion logger

REST API

Method	Path	Description
`GET`	`/health`	Liveness probe (no auth)
`GET`	`/context`	Return the current compressed context block
`POST`	`/completion`	Log a completion and update developer profile
`POST`	`/telemetry`	Record interaction metrics
`GET`	`/report`	Aggregated telemetry stats
`GET`	`/eval`	Held-out eval results for current vs previous adapter

Interactive docs: http://localhost:8000/docs

MCP tools

Tool	Description
`get_context`	Return the context block — inject as system prompt
`search_entities`	Fuzzy search active code entities by path or function name
`semantic_search`	Vector search over entity summaries (for concept-level queries)
`log_completion`	Log a completion outcome for Style Adapter training
`mark_reprompt`	Flag that a prior `get_context` response was insufficient; call with the `session_id` to mark it for telemetry analysis
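
For readers unfamiliar with vector search, the idea behind `semantic_search` can be sketched as a cosine-similarity ranking over stored embeddings. This is a generic illustration in pure Python: the in-memory `index` dict and the tiny two-dimensional vectors are assumptions, and the real tool works on embeddings produced by the configured embedding model.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_search(
    query_vec: list[float],
    index: dict[str, list[float]],
    top_k: int = 3,
) -> list[tuple[str, float]]:
    """Rank entity names by similarity of their summary embedding to the query."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

`search_entities` matches on names and paths, while a ranking like this one matches on meaning, so a query about "authentication" can surface a function whose name never contains that word.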

Migration guide: V2 → V3

V3 is backward-compatible at the DB level. The migration runner (memory-layer migrate) handles schema upgrades automatically on startup.

Key changes:

Config location. V2 used a per-project .env. V3 adds ~/.memory-layer/config.toml as the user-level config. Your .env still works (env vars override config.toml), but memory-layer init generates config.toml for new installs.
Per-repo databases. V3 stores project data in <repo>/.memory-layer/memory.db rather than a single memory.db. Run memory-layer migrate once; the runner adds project_id to all tables.
Developer profile moved to global registry. Your coding-style profile now lives in ~/.memory-layer/registry.db and is shared across all projects. Existing data is migrated automatically.
Semantic search requires nomic-embed-text. Pull it with ollama pull nomic-embed-text, then run memory-layer reindex to build embeddings for existing entities.
MCP tool names unchanged. No editor config changes needed.
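
For a sense of what adding project_id to all tables involves, here is a hedged sketch using `sqlite3`. It is not the actual migration runner: the function name, the TEXT column type, and the backfill step are assumptions.

```python
import sqlite3


def add_project_id(conn: sqlite3.Connection, project_id: str) -> list[str]:
    """Add a project_id column to every user table that lacks one; idempotent."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )
    ]
    changed = []
    for table in tables:
        columns = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
        if "project_id" in columns:
            continue  # already migrated
        conn.execute(f'ALTER TABLE "{table}" ADD COLUMN project_id TEXT')
        # Backfill existing rows so old data belongs to the current project.
        conn.execute(f'UPDATE "{table}" SET project_id = ?', (project_id,))
        changed.append(table)
    conn.commit()
    return changed
```

Because tables that already have the column are skipped, running the function a second time changes nothing.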

Style Adapter (Layer 4)

The Style Adapter is a LoRA fine-tune of your local base model, trained on completions you have accepted or corrected. It learns output style — not codebase facts.

Training gates (all required before a run is attempted):

Gate	Default	Rationale
Minimum total samples	500	Below this, LoRA produces noise or memorisation
Minimum new since last run	100	Avoid retraining on tiny deltas
Frequency	Weekly (Sun 02:00 UTC)	Amortises the cost of a 30B model run

Promotion gate: the new adapter is only promoted if its held-out loss improves on the previous adapter by ≥ 0.02. Failed runs are logged with status failed_eval; the previous adapter stays active.
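
The gates reduce to two small predicates. The sketch below encodes the defaults from the table and the 0.02 promotion threshold. It treats the threshold as an absolute difference in held-out loss, which is an assumption, and it leaves out the weekly schedule check; none of the names are taken from the project's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingGates:
    min_total_samples: int = 500      # default from the gates table
    min_new_samples: int = 100        # default from the gates table
    min_loss_improvement: float = 0.02


def should_train(total: int, new_since_last: int,
                 gates: TrainingGates = TrainingGates()) -> bool:
    """Both sample gates must pass before a run is attempted."""
    return (total >= gates.min_total_samples
            and new_since_last >= gates.min_new_samples)


def should_promote(previous_loss: float, new_loss: float,
                   gates: TrainingGates = TrainingGates()) -> bool:
    """Promote only if held-out loss dropped by at least the threshold."""
    return (previous_loss - new_loss) >= gates.min_loss_improvement
```

With these defaults, `should_train(480, 150)` is false because the total is below 500, and `should_promote(1.0, 0.99)` is false because the improvement of 0.01 is under the threshold, so the previous adapter would stay active.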

Development

git clone https://github.com/yadu9989/memory-layer
cd memory-layer
pip install -e ".[dev]"
pytest
ruff check .

Optional extras:

pip install -e ".[train]"          # LoRA training (torch, transformers, peft)
pip install -e ".[cpu-embeddings]" # sentence-transformers fallback if Ollama unavailable
pip install -e ".[unsloth]"        # faster training via unsloth

Requirements

Python 3.11+
Ollama running locally (for file summaries and embeddings)
SQLite (bundled with Python)
(optional, for training) pip install -e ".[train]" — torch, transformers, peft, accelerate

License

MIT

This version

0.3.1

May 29, 2026

Download files

Source Distribution

memory_layer_yadu9989-0.3.1.tar.gz (13.2 MB)

Uploaded May 29, 2026 Source

Built Distribution

memory_layer_yadu9989-0.3.1-py3-none-any.whl (76.8 kB)

Uploaded May 29, 2026 Python 3

File details

Details for the file memory_layer_yadu9989-0.3.1.tar.gz.

File metadata

Download URL: memory_layer_yadu9989-0.3.1.tar.gz
Upload date: May 29, 2026
Size: 13.2 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for memory_layer_yadu9989-0.3.1.tar.gz
Algorithm	Hash digest
SHA256	`f2b17e72f9cc48863c2ef626b5a76a476df52944353d97e678f4706a633769bb`
MD5	`767b66a479ceffa86e9c4953fa334b2f`
BLAKE2b-256	`dda38c533810a7ca7cf39ea79539655553b36cd693e3b2c12e5b7a9a204b60cc`

File details

Details for the file memory_layer_yadu9989-0.3.1-py3-none-any.whl.

File metadata

Download URL: memory_layer_yadu9989-0.3.1-py3-none-any.whl
Upload date: May 29, 2026
Size: 76.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for memory_layer_yadu9989-0.3.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6dc668c65d9cfbcf84bc9e25cc7ad8c0d103756290612a259ba017c3eec4ed57`
MD5	`a0edcc616467f4e811724a19ba8e4c34`
BLAKE2b-256	`edfc7350289c8614cdb49562f0b472f81e61a2154628b47ea6c45e3f8d5d1e83`

memory-layer-yadu9989 0.3.1
