Mimir

A plastic memory well for coding agents.

Python 3.10+ · License: Apache-2.0

Unlike vector notebooks, Mimir reshapes its own embedding space as you work. One matrix. Zero cloud. Always local.


Why Mimir?

Most agent memory tools are filing cabinets: they store text, then search it. They don't learn. They don't adapt. They just retrieve.

Mimir is different.

It maintains a small, fixed-size prototype matrix that is updated on every interaction via Hebbian-style local learning. The base embedding model's weights stay frozen; Mimir overlays a fast, plastic layer that bends toward your domain, your project, and your habits.

The result: a memory system that feels less like a database and more like a second brain that gets sharper the more you use it.
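To make the Hebbian-style update concrete, here is a minimal pure-Python sketch of one Oja-style step on a prototype matrix. It is an illustration under stated assumptions, not Mimir's actual code: the function name oja_update, the learning rate, and the tiny matrix size are all hypothetical.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def oja_update(prototypes, x, top_k=2, lr=0.1):
    """Apply one Oja-style step to the top_k most activated prototype rows."""
    x = normalize(x)
    activations = [dot(w, x) for w in prototypes]
    winners = sorted(range(len(prototypes)),
                     key=lambda i: activations[i], reverse=True)[:top_k]
    updated = [list(w) for w in prototypes]
    for i in winners:
        y = activations[i]
        # Hebbian term (y * x) minus a decay term (y^2 * w) that keeps the row bounded.
        # For a positive activation this pulls the row toward the observation.
        updated[i] = [w_j + lr * y * (x_j - y * w_j)
                      for w_j, x_j in zip(prototypes[i], x)]
    return updated, winners

# Tiny sizes for illustration only (assumed values, not Mimir defaults).
random.seed(0)
num_prototypes, dim = 32, 16
prototypes = [normalize([random.gauss(0, 1) for _ in range(dim)])
              for _ in range(num_prototypes)]
observation = normalize([random.gauss(0, 1) for _ in range(dim)])
updated, winners = oja_update(prototypes, observation)

for i in winners:
    print(i, round(dot(prototypes[i], observation), 3),
          "->", round(dot(updated[i], observation), 3))
```

Only the winning rows change and the matrix keeps its shape, which is why the state stays at a fixed size no matter how many observations are learned.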

What makes Mimir unique

| | Other memory tools | Mimir |
|---|---|---|
| Core model | Store and retrieve discrete facts | Learn a plastic prototype matrix |
| Learning | Heuristic scoring, TTL decay, or no learning at all | Hebbian / Oja local updates + Markov prediction |
| State size | Often grows with history (SQLite/vector DB) | Fixed [k × dim] matrix, typically < 100 MB |
| Offline | Needs cloud API or hosted vector DB | Runs locally with llama-server, sentence-transformers, or fake backend |
| Inference | Calls DB/index on every recall | Pure matrix operation, no locks, predictable latency |
| Hook noise | Stores every short reply, including "ok" and "continue" | Language-aware small-talk filtering + importance gating |
| Security | Stores secrets verbatim | Automatic API key / token / password redaction |
| Project context | Manual copy-paste of instruction files | Auto-discovers AGENTS.md, CLAUDE.md, .cursorrules |
| Quality | Duplicate memories accumulate | Duplicate blocking + contradiction hints at store time |

Secret redaction patterns can be customized or disabled entirely via MimirConfig(redaction_patterns=...) (None = defaults, [] = disabled).
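As a rough illustration of how a custom pattern list might behave, the sketch below applies a list of regexes to a string. The patterns, the redact helper, and the "[REDACTED]" placeholder are assumptions made for this example; Mimir's built-in defaults and replacement token are not listed here.

```python
import re

# Hypothetical patterns for illustration; not Mimir's built-in defaults.
EXAMPLE_PATTERNS = [
    r"sk-[A-Za-z0-9]{20,}",            # API-key-like token with an "sk-" prefix
    r"(?i)password\s*[=:]\s*\S+",      # password=... or password: ...
]

def redact(text, patterns, placeholder="[REDACTED]"):
    """Replace every match of every pattern with the placeholder."""
    for pattern in patterns:
        text = re.sub(pattern, placeholder, text)
    return text

sample = "export KEY=sk-abcdefghijklmnopqrstuvwx password: hunter2"
print(redact(sample, EXAMPLE_PATTERNS))  # export KEY=[REDACTED] [REDACTED]
print(redact(sample, []))                # empty list: text is left unchanged
```

A list in this shape is what you would pass as redaction_patterns; the empty list mirrors the "[] = disabled" case.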


Quick Start

pip install mimir-core

Use it from Python

from mimir import Mimir, MimirConfig

config = MimirConfig(
    base_model="all-MiniLM-L6-v2",
    num_prototypes=64,
    top_k=4,
)
mimir = Mimir(config)

# Encode
emb = mimir.encode("hello world")
print(emb.shape)  # (1, 384)

# Learn
report = mimir.learn("hello world", importance=1.0)
print(report)

# Save / load
mimir.save("checkpoint.pt")
mimir.load("checkpoint.pt")

Plug it into your coding agent

Mimir exposes an MCP server and Agent CLI hooks for Kimi Code, Claude Code, Codex, and OpenCode. Each workspace is isolated under ~/.mimir/workspaces/.

Start the MCP server

# Local embedding backend (recommended)
llama-server \
  --model Qwen3-Embedding-8B-Q4_K_M.gguf \
  --embeddings \
  --port 11435

# Or use sentence-transformer if you don't have llama-server
mimir mcp --backend sentence-transformer

Configure Kimi Code / Claude Code / Codex (.mcp.json)

{
  "mcpServers": {
    "mimir": {
      "command": "mimir",
      "args": ["mcp", "--backend", "sentence-transformer"]
    }
  }
}

Configure OpenCode (.opencode/opencode.jsonc)

{
  "mcp": {
    "mimir": {
      "type": "local",
      "command": ["mimir", "mcp", "--backend", "sentence-transformer"]
    }
  }
}

One-shot setup with mimir setup

Instead of editing config files by hand, you can install hooks automatically:

mimir setup kimi-code
mimir setup claude-code
mimir setup codex

Use --base-dir to write the configuration somewhere other than the agent's default config directory (useful for custom dotfiles layouts or CI):

mimir setup kimi-code --base-dir ~/my-configs/kimi-code

This writes the correct hook definitions for each agent CLI and is safe to run multiple times.

Manual hook configuration

# ~/.kimi-code/config.toml
[[hooks]]
event = "UserPromptSubmit"
command = "python3 -m mimir.hooks.mimir_turn"
timeout = 10

[[hooks]]
event = "Stop"
command = "python3 -m mimir.hooks.mimir_turn"
timeout = 10

Both the MCP store() tool and the hook Stop path apply the same redaction, filtering, and duplicate-blocking pipeline, so text matching the redaction patterns is scrubbed before it is persisted, regardless of how a memory is captured.

See docs/mcp-user-guide.md for the full hook guide.
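The filtering step for automatic hook capture could look roughly like the sketch below: drop known small-talk phrases in several languages, drop very short replies, and gate on importance. The phrase list, thresholds, and the should_capture name are hypothetical; Mimir's actual rules live in the hook guide.

```python
# Hypothetical phrase list and thresholds, for illustration only.
SMALL_TALK = {
    "ok", "okay", "continue", "thanks", "yes", "no",   # English
    "好的", "继续", "谢谢",                               # Chinese
    "はい", "ありがとう",                                 # Japanese
}

def should_capture(text, importance=1.0, min_importance=0.5, min_chars=4):
    """Return True if a hook-captured message looks worth storing."""
    normalized = text.strip().lower().rstrip(".!?。！？")
    if normalized in SMALL_TALK:
        return False              # known filler reply
    if len(normalized) < min_chars:
        return False              # too short to carry context
    return importance >= min_importance

print(should_capture("Continue."))                      # False
print(should_capture("继续"))                            # False
print(should_capture("Use pytest for all new tests"))   # True
```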


How It Works

Text
  │
  ▼
[Base embedding model] ───────┐
  │                           │
  ▼                           │
Base embedding                │
  │                           │
  ▼                           │
[Prototype matrix lookup]     │
  │                           │
  ▼                           │
Sparse prototype activation   │
  │                           │
  ▼                           │
Residual modulation ◄─────────┘
  │
  ▼
Mimir embedding

  • Slow weights: the frozen base embedding model gives stable semantic priors.
  • Fast weights: a fixed-capacity prototype matrix encodes your local domain.
  • Learning: each input activates the nearest prototypes and nudges them toward the new observation.
  • Prediction: a first-order Markov transition matrix predicts the next prototype and emits a surprise_score.
  • Forgetting: prototype strength decays exponentially; weak prototypes are overwritten when capacity is full.
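The encode path in the diagram above can be sketched in a few lines of pure Python. This is a sketch under assumptions, not the library's implementation: the softmax weighting over the selected prototypes, the mixing factor alpha, and the function name are all hypothetical choices.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def plastic_encode(base, prototypes, top_k=2, alpha=0.2):
    """Blend a base embedding with its top_k nearest prototypes."""
    base = normalize(base)
    sims = [dot(p, base) for p in prototypes]
    top = sorted(range(len(prototypes)), key=lambda i: sims[i], reverse=True)[:top_k]
    # Sparse activation: softmax over the selected rows, weight 0 for all others.
    exps = [math.exp(sims[i]) for i in top]
    weights = [e / sum(exps) for e in exps]
    # Residual modulation: add a scaled prototype mixture back onto the base embedding.
    mixture = [sum(w * prototypes[i][j] for w, i in zip(weights, top))
               for j in range(len(base))]
    return normalize([b + alpha * m for b, m in zip(base, mixture)]), top

prototypes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
embedding, active = plastic_encode([0.9, 0.1, 0.0], prototypes)
print(active)      # [0, 1]
print(embedding)
```

Because the base embedding enters through the residual connection, an untrained or empty prototype matrix would leave the output close to the base model's embedding.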

This design is inspired by Prototype Theory in cognitive psychology and Predictive Coding in neuroscience: memory is not a pile of events, but a compressed set of typical examples that continuously updates itself.
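The prediction and forgetting ideas can be illustrated with a small standalone sketch: a first-order transition table over prototype indices whose surprise is the negative log-probability of the observed transition, plus exponential strength decay. The class and function names, the additive smoothing, and the half-life parameter are assumptions for this example, not Mimir's API.

```python
import math
from collections import defaultdict

class TransitionModel:
    """First-order Markov model over prototype indices (illustrative only)."""

    def __init__(self, num_prototypes, smoothing=1.0):
        self.k = num_prototypes
        self.smoothing = smoothing
        self.counts = defaultdict(lambda: defaultdict(float))

    def prob(self, prev, nxt):
        row = self.counts[prev]
        total = sum(row.values()) + self.smoothing * self.k
        return (row.get(nxt, 0.0) + self.smoothing) / total

    def surprise(self, prev, nxt):
        # High when the transition was unlikely under what has been seen so far.
        return -math.log(self.prob(prev, nxt))

    def observe(self, prev, nxt):
        self.counts[prev][nxt] += 1.0

def decay_strengths(strengths, half_life_steps=100.0):
    """Exponential decay: a strength halves every half_life_steps calls."""
    factor = 0.5 ** (1.0 / half_life_steps)
    return [s * factor for s in strengths]

def weakest_slot(strengths):
    """Index of the prototype to overwrite when capacity is full."""
    return min(range(len(strengths)), key=lambda i: strengths[i])

model = TransitionModel(num_prototypes=4)
before = model.surprise(0, 1)
for _ in range(5):
    model.observe(0, 1)
after = model.surprise(0, 1)
print(round(before, 3), "->", round(after, 3))  # 1.386 -> 0.405
print(weakest_slot([0.9, 0.1, 0.5]))            # 1
```

A repeated transition becomes less surprising over time, which is the predictive-coding intuition: only deviations from the expected sequence carry a strong signal.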


MCP Tools

| Tool | Purpose |
|---|---|
| store(text, importance=1.0) | Store and learn from text. Secrets are redacted before storage; the response text field is the redacted form. With async store enabled, returns "pending" and processes in the background. |
| recall(query, top_k=5, min_score=0.0) | Hybrid vector + BM25 recall, reranked by lifecycle metadata |
| consolidate() | Consolidate the working memory buffer |
| forget() | Clear the current session's working memory |
| checkpoint(name) | Save a named checkpoint |
| restore(name) | Restore to a named checkpoint |
| status() | Show session stats. When async store is enabled, includes async_store with enabled, pending_count, and worker_alive. |

store() may also include reason, similar_memory, or contradictions in its response when a memory is rejected or appears to contradict an existing memory.
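To show what "hybrid vector + BM25" recall means in practice, the sketch below scores documents with a small BM25 implementation and blends that with precomputed vector similarities. It is an illustration only: the equal weighting, the max-normalization, and the example similarity values are assumptions, and lifecycle reranking is not modeled.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenized document against the query with BM25."""
    n = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n
    df = Counter(term for d in docs_tokens for term in set(d))
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores

def hybrid_rank(vector_scores, keyword_scores, vector_weight=0.5):
    """Blend vector similarity with max-normalized keyword scores; best first."""
    top = max(keyword_scores) or 1.0   # avoid dividing by zero when nothing matches
    combined = [vector_weight * v + (1 - vector_weight) * (kw / top)
                for v, kw in zip(vector_scores, keyword_scores)]
    return sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)

docs = [
    "use pytest for unit tests",
    "deploy with docker compose",
    "pytest fixtures live in conftest",
]
keyword = bm25_scores("pytest fixtures".split(), [d.split() for d in docs])
vector = [0.6, 0.1, 0.7]   # hypothetical cosine similarities from an embedding model
ranking = hybrid_rank(vector, keyword)
print(ranking)  # [2, 0, 1]
```

The keyword component helps with exact identifiers such as file or function names, where embeddings alone can be fuzzy.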


Quality Gate

store() runs a lightweight quality gate before learning:

  • Duplicate blocking: near-duplicate memories (cosine similarity ≥ 0.95) are rejected instead of accumulating.
  • Contradiction hints: simple negation/polarity checks flag pairs like "I use Python" vs "I don't use Python". The memory is still stored, but the result includes the hint so the agent can ask before acting on stale context.

Both checks can be disabled via MimirConfig:

MimirConfig(quality_gate_enabled=False)

Or tune the thresholds:

MimirConfig(
    quality_gate_enabled=True,
    quality_gate_duplicate_threshold=0.95,
    quality_gate_contradiction_threshold=0.85,
)
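The two checks can be sketched as a single function over (text, vector) pairs. This is a hedged illustration, not the library's logic: the negation word list, the "duplicate" reason string, the toy two-dimensional vectors, and the reading of the contradiction threshold as a minimum similarity are all assumptions.

```python
import math
import re

# Hypothetical negation markers used for a crude polarity check.
NEGATIONS = {"not", "no", "never", "don't", "doesn't", "can't", "cannot", "won't"}

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den

def is_negated(text):
    return any(tok in NEGATIONS for tok in re.findall(r"[a-z0-9']+", text.lower()))

def quality_gate(new_text, new_vec, memories,
                 duplicate_threshold=0.95, contradiction_threshold=0.85):
    """memories is a list of (text, vector) pairs already stored."""
    result = {"stored": True, "contradictions": []}
    for text, vec in memories:
        sim = cosine(new_vec, vec)
        if sim >= duplicate_threshold:
            # Near-duplicate: reject instead of accumulating.
            return {"stored": False, "reason": "duplicate", "similar_memory": text}
        if sim >= contradiction_threshold and is_negated(new_text) != is_negated(text):
            # Similar topic, opposite polarity: store anyway but attach a hint.
            result["contradictions"].append(text)
    return result

memories = [("I use Python", [1.0, 0.0])]
print(quality_gate("I use Python.", [1.0, 0.01], memories))
print(quality_gate("I don't use Python", [0.9, 0.4], memories))
```

The first call is rejected as a near-duplicate; the second is stored but carries a contradiction hint pointing at the older memory.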

Async store

For MCP / agent integrations where the embedding backend is slow, you can defer store() to a background worker so the tool returns immediately:

MimirConfig(
    async_store_enabled=True,
    async_store_queue_size=1000,
    async_store_flush_timeout=5.0,
)

When async storage is enabled, store() returns one of:

  • {"stored": "pending", "text": ..., "memory_count": ..., "pending_count": ...} when the item is enqueued successfully.
  • {"stored": False, "text": ..., "memory_count": ..., "reason": "queue_full"} when the queue is at capacity.

The background worker performs duplicate checks, learning, and persistence. On MCP server shutdown the queue is flushed so pending memories are not lost.

status() includes an async_store dictionary with enabled, pending_count, and worker_alive so you can monitor the queue health.
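The behaviour described above maps naturally onto a bounded queue with one worker thread. The sketch below uses only the standard library and is an assumption about the general shape, not Mimir's implementation; the AsyncStore class and its process callback are hypothetical, and the response dictionaries are trimmed to the fields relevant here.

```python
import queue
import threading

class AsyncStore:
    """Bounded background queue for store() calls (illustrative sketch)."""

    def __init__(self, process, queue_size=1000):
        self._queue = queue.Queue(maxsize=queue_size)
        self._process = process          # callable that embeds, learns, persists
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            item = self._queue.get()
            if item is None:             # sentinel: stop after draining earlier items
                return
            self._process(item)

    def store(self, text):
        try:
            self._queue.put_nowait(text)
        except queue.Full:
            return {"stored": False, "text": text, "reason": "queue_full"}
        return {"stored": "pending", "text": text,
                "pending_count": self._queue.qsize()}

    def status(self):
        return {"enabled": True,
                "pending_count": self._queue.qsize(),
                "worker_alive": self._worker.is_alive()}

    def flush(self, timeout=5.0):
        """Process everything queued so far, then stop; True if the worker finished in time."""
        self._queue.put(None)
        self._worker.join(timeout)
        return not self._worker.is_alive()

processed = []
async_store = AsyncStore(processed.append, queue_size=2)
print(async_store.store("tests live in tests/")["stored"])  # pending
print(async_store.flush())
print(processed)
```

The sentinel is enqueued behind any pending items, so a flush on shutdown processes them in order before the worker exits.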

Other useful configuration fields

| Field | Default | Purpose |
|---|---|---|
| redaction_enabled | True | Enable secret redaction |
| redaction_patterns | None | Custom regex list (None = defaults, [] = disabled) |
| project_context_enabled | True | Auto-ingest AGENTS.md / CLAUDE.md / .cursorrules |
| project_context_importance | 1.5 | Importance assigned to project context memories |
| async_store_enabled | False | Defer embedding/learning to background worker |
| async_store_queue_size | 1000 | Max pending items for async store |
| async_store_flush_timeout | 5.0 | Seconds to wait for flush on shutdown |

Programming Interface

If you want to use Mimir inside your own Python code:

from mimir.adapters.agents import InMemoryAgentAdapter, Message
from mimir.core.config import MimirConfig

adapter = InMemoryAgentAdapter(
    config=MimirConfig(base_model="all-MiniLM-L6-v2", top_k=4),
)

adapter.observe([
    Message(role="user", content="Please write quicksort in Python"),
    Message(role="assistant", content="..."),
])
adapter.consolidate()
memories = adapter.recall("Python sorting", top_k=3)

print(adapter.memory_count)
adapter.clear_memories()

The adapter also exposes encode(texts), defined on AgentMemoryInterface, which returns embeddings for a list of texts. This is useful for custom duplicate checks or integrations:

embeddings = adapter.encode(["hello world", "goodbye world"])

See docs/agent-integration.md for the adapter API.


CLI

# Encode
mimir encode --backend sentence-transformer "hello world"

# Learn
mimir learn --backend llama-server "important context"

# Evaluate
python -m mimir.eval --backend llama-server --top-k 4

Status & Roadmap

Mimir is currently v0.3.0.

  • MVP: encode / learn / save / load
  • Top-k sparse prototype activation
  • EventBus + PredictionPolicy + surprise score
  • MCP server + Agent CLI hooks
  • BM25 + lifecycle hybrid recall
  • Multi-language small-talk filtering for automatic hook capture
  • Secret redaction for API keys, tokens, and passwords
  • Project context discovery (AGENTS.md, CLAUDE.md, .cursorrules)
  • mimir setup <agent> one-shot configuration
  • Duplicate blocking and contradiction hints
  • Async embedding queue
  • SQLite-backed memory metadata
  • HITL preview before storing high-impact memories

See docs/roadmap.md for the full roadmap.


Embedding Backend Performance

Mimir supports multiple embedding backends. Choose based on your hardware and latency requirements.

Measured on Apple Silicon (M-series), 128-sample batch:

| Backend | Model | Dim | Cold start | Short text | Long text | Batch-128 | Throughput | Memory |
|---|---|---|---|---|---|---|---|---|
| sentence-transformer | all-MiniLM-L6-v2 | 384 | ~20.6 s* | 4.9 ms | 6.1 ms | 23 ms | ~4,500/s | ~350 MB |
| llama-server | Qwen3-Embedding-8B-Q4_K_M | 4096 | ~43 ms | 32 ms | 641 ms | 10,509 ms | ~12/s | ~288 MB |

* Cold start includes one-time model download/load. Subsequent runs are fast.

Recommendation

  • Use sentence-transformer for local development and everyday use.
  • Use llama-server when you need higher-quality embeddings and can accept higher latency.

Run the benchmark yourself:

python scripts/benchmark_embedding_backends.py

License

Apache-2.0


Mimir is named after the Norse guardian of the well of wisdom — a source of knowledge that deepens with every visit.
