
Local memory for coding agents: deterministic zero-token writes, a first-class failure guard (never repeat a mistake), and a token-efficient briefing. Local CPU semantic recall — no API keys, no cloud.


🧠 Memir

The memoir your coding agent writes for itself. Long-term memory for coding agents that doesn't eat your context window. Local-first. Zero API keys. And it never lets the agent repeat a mistake.


Memir = memory + memoir + Mímir, the Norse keeper of memory and wisdom. Your agent keeps a memoir; Memir is where it lives.


The problem

Every coding agent forgets. Switch chats, hit the context limit, start a new session — and it re-asks what it already knew, re-decides what you already decided, and repeats the exact mistake it made an hour ago.

The common "fix" is to stuff the whole history back into the prompt every turn. That burns thousands of tokens per request and still overflows as the project grows.

The idea

Memir stores three kinds of memory in a tiny local SQLite file:

kind      what it captures
fact      durable truths about the project (endpoints, conventions, constraints)
decision  a choice and the reason for it, so it isn't relitigated
failure   what was tried, why it failed, and the lesson (a first-class memory)

Then it hands the agent a token-budgeted briefing at the start of a session and a failure guard it can check before trying something.

Writing a memory costs 0 tokens and 0 API calls: it's a local insert that completes in under a millisecond. Compare that to memory frameworks that run an LLM extraction call on every add.
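To make the "local insert" idea concrete, here is a minimal sketch using only the standard library's sqlite3 module. The table name, columns, and the open_store and write_memory helpers are assumptions made for illustration; Memir's actual schema and internals may look quite different.

```python
import sqlite3
import time

# Hypothetical schema for illustration only; Memir's real tables may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS memory (
    id      INTEGER PRIMARY KEY,
    kind    TEXT NOT NULL CHECK (kind IN ('fact', 'decision', 'failure')),
    body    TEXT NOT NULL,
    reason  TEXT,
    lesson  TEXT,
    created REAL NOT NULL
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the store and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def write_memory(conn, kind, body, reason=None, lesson=None) -> int:
    """Insert one memory row. No model and no network call is involved."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO memory (kind, body, reason, lesson, created) "
            "VALUES (?, ?, ?, ?, ?)",
            (kind, body, reason, lesson, time.time()),
        )
    return cur.lastrowid


conn = open_store()
write_memory(conn, "fact", "The billing job must run in UTC.")
write_memory(
    conn,
    "failure",
    "POSTed the whole batch in one request.",
    reason="Gateway drops bodies over 256KB.",
    lesson="Chunk uploads under 256KB.",
)
kinds = [row[0] for row in conn.execute("SELECT kind FROM memory ORDER BY id")]
print(kinds)
```

The write path is a single parameterised INSERT, which is why it can be deterministic and cost no tokens; anything smarter (embedding, dedup) would sit around this step rather than replace it.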


Why it's different

  • First-class FAILURE memory. Most memory tools remember facts. Memir remembers mistakes — what failed, why, and what to do instead — and blocks the repeat. That's the load-bearing feature for a coding agent.
  • 0-token, deterministic writes. No LLM in the capture path. Nothing to bill, nothing to rate-limit, nothing to go down.
  • Local semantic recall by default. Ships with a retrieval-tuned, numpy-only embedding model (model2vec potion-retrieval-32M) that runs in ~1 ms on a CPU, with no GPU, no torch, and no API. Paraphrase matching works out of the box, and recall falls back gracefully to keyword search if the model can't be loaded.
  • Anti-bloat. Write-time dedup plus a safe consolidate() that only merges genuine restatements — it will never collapse two distinct facts.
  • Bounded context cost. A naive "replay everything" agent grows without bound; the briefing here stays inside a token budget you set.
  • Plugs into any agent via MCP. One stdio server works with Cursor, Claude Desktop, Cline, VS Code, Windsurf, …
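The "bounded context cost" point above can be illustrated with a small sketch of budgeted selection. This is not Memir's briefing algorithm: the Item class, its priority field, the greedy packing strategy, and the rough estimate of four characters per token are all assumptions chosen to keep the example short.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Item:
    kind: str
    text: str
    priority: int  # hypothetical ranking; higher means more important


def estimate_tokens(text: str) -> int:
    """Rough assumption: about 4 characters per token, rounded up."""
    return max(1, (len(text) + 3) // 4)


def build_briefing(items: Iterable[Item], budget_tokens: int) -> str:
    """Greedily pack the highest-priority items that still fit the budget."""
    lines, used = [], 0
    for item in sorted(items, key=lambda i: -i.priority):
        line = f"[{item.kind}] {item.text}"
        cost = estimate_tokens(line)
        if used + cost > budget_tokens:
            continue  # skip what does not fit; a smaller item may still fit
        lines.append(line)
        used += cost
    return "\n".join(lines)


items = [
    Item("failure", "Chunk uploads under 256KB.", 3),
    Item("fact", "Billing runs in UTC.", 2),
    Item("decision", "Use SQLite FTS5 for recall because scanning was too slow on large stores.", 1),
]
print(build_briefing(items, budget_tokens=20))
```

However many memories accumulate, the output of a function shaped like this never exceeds the budget, which is the property that distinguishes it from replaying the full history.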

Install

pip install memir

The base install is light — model2vec + numpy only (no torch, no GPU, no API key). Optional local CPU accuracy features are opt-in extras:

pip install "memir[rerank]"      # cross-encoder reranker (precision second stage)
pip install "memir[nli]"         # NLI reasoner (contradiction detection)
pip install "memir[contextual]"  # contextual BGE encoder (stronger multi-hop)
pip install "memir[ml]"          # all of the above

Warm the model cache once so later runs work fully offline, then start the server:

memir setup      # downloads the CPU model, initialises the store
memir start      # starts the MCP server (stdio) for your editor/agent

Quickstart (Python)

from memir import Memir

brain = Memir("my-project")          # local model loads automatically (CPU)

# remember things (0 tokens, sub-ms, no API)
brain.remember_fact("The billing job MUST run in UTC; our partner reports in UTC.")
brain.record_decision("Use SQLite FTS5 for recall.", reason="100x faster than scanning.")

# record a failure so it's never repeated
brain.record_failure(
    attempt="POSTed the whole batch to /v1/ingest in one request.",
    reason="The gateway silently drops bodies >256KB and returns an empty 200.",
    lesson="Chunk uploads under 256KB and verify each ack id.",
)

# BEFORE trying something, check the failure guard
if hits := brain.check_failure("send the full batch to /v1/ingest at once"):
    print("⛔", hits[0].lesson)   # -> Chunk uploads under 256KB and verify each ack id.

# start a fresh session fully informed, within a token budget
print(brain.briefing(budget_tokens=300))
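One way to apply the check-before-acting pattern consistently is to wrap risky actions in a helper. The sketch below relies only on the check_failure and record_failure calls shown in the quickstart; the guarded function, the BlockedByMemory exception, and the lesson_on_error parameter are hypothetical names introduced here, not part of Memir.

```python
from typing import Callable, Protocol, Sequence, TypeVar

T = TypeVar("T")


class FailureHit(Protocol):
    lesson: str


class FailureMemory(Protocol):
    # Only the two methods used in the quickstart are assumed here.
    def check_failure(self, attempt: str) -> Sequence[FailureHit]: ...
    def record_failure(self, *, attempt: str, reason: str, lesson: str) -> object: ...


class BlockedByMemory(RuntimeError):
    """Hypothetical exception: a recorded failure matches this attempt."""


def guarded(
    brain: FailureMemory,
    description: str,
    action: Callable[[], T],
    lesson_on_error: str = "Investigate before retrying.",
) -> T:
    """Run action() unless the failure guard reports a match for description."""
    hits = brain.check_failure(description)
    if hits:
        raise BlockedByMemory(hits[0].lesson)
    try:
        return action()
    except Exception as exc:
        # Store the new failure so the next attempt is blocked, then re-raise.
        brain.record_failure(
            attempt=description,
            reason=f"{type(exc).__name__}: {exc}",
            lesson=lesson_on_error,
        )
        raise
```

With a store like the brain object above, a call might look like guarded(brain, "send the full batch to /v1/ingest at once", send_batch), where send_batch stands in for whatever function performs the upload.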

Auto-capture from real errors

# parses the traceback, decides it's non-obvious, stores it as a failure:
brain.capture_error("RuntimeError: gateway returned 200 with empty body — silently dropped")

# a textbook typo is judged self-evident and skipped (no bloat):
brain.capture_error("NameError: name 'foo' is not defined")   # -> None
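The criteria capture_error uses to judge an error "self-evident" are not documented here. As a rough illustration of the idea only, the sketch below parses the last line of an error into a type and message, then skips a small, assumed set of exception types whose message already states the fix. The SELF_EVIDENT set and both helper functions are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumption: exception types whose message usually states the fix already.
SELF_EVIDENT = {"NameError", "SyntaxError", "IndentationError"}

# Matches a final traceback line such as "ValueError: bad input".
_LAST_LINE = re.compile(
    r"^(?P<type>[A-Za-z_][\w.]*(?:Error|Exception|Warning))(?::\s*(?P<msg>.*))?$"
)


def parse_error(text: str) -> Optional[Tuple[str, str]]:
    """Return (exception type, message) from the last non-empty line, or None."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines:
        return None
    match = _LAST_LINE.match(lines[-1])
    if match is None:
        return None
    exc_type = match.group("type").rsplit(".", 1)[-1]  # drop any module prefix
    return exc_type, match.group("msg") or ""


def worth_recording(text: str) -> bool:
    """True when the text parses as an error that is not in SELF_EVIDENT."""
    parsed = parse_error(text)
    if parsed is None:
        return False
    exc_type, _message = parsed
    return exc_type not in SELF_EVIDENT


print(worth_recording("RuntimeError: gateway returned 200 with empty body"))
print(worth_recording("NameError: name 'foo' is not defined"))
```

A type-based filter like this is crude; a real implementation would likely also weigh the message content, but the shape (parse, classify, then store or skip) is the same.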

Command line

memir remember "Deploys go out via the blue/green pipeline only."
memir decision "Adopt SQLite WAL" --why "concurrent reads during writes"
memir failure  "Dropped the prod table" --why "ran migration w/o backup" \
               --lesson "snapshot before every migration"
memir recall   "how do we deploy?"
memir check    "run the migration now"      # failure guard
memir briefing --budget 300
memir stats
memir doctor                                 # environment + MCP config check

Use it from your editor (MCP)

Memir ships a Model Context Protocol server (stdio JSON-RPC).

Cursor / Windsurf / Claude Desktop — add to your MCP config:

{
  "mcpServers": {
    "memir": {
      "command": "memir",
      "args": ["start"],
      "env": { "MEMIR_PROJECT": "my-project", "MEMIR_DB": ".memir/memir.db" }
    }
  }
}

VS Code (.vscode/mcp.json):

{ "servers": { "memir": { "command": "memir", "args": ["start"] } } }

Tools exposed: remember_fact, record_decision, record_failure, recall, check_failure, briefing, capture_error, stats, consolidate.

Set MEMIR_EMBED=0 to force keyword-only mode (skips the model entirely). Override the model with MEMIR_MODEL=minishlab/potion-base-8M (30 MB, faster).


Benchmarks

Memir is measured against real memory backends (mem0, cognee, graphiti, letta) and an oracle upper bound under a single uniform LLM judge. The honest headline: Memir matches the top accuracy tier while injecting ~10× fewer memory tokens per turn, locally and with zero-token writes. See docs/BENCHMARKS.md for the full tables, the competitor head-to-heads, and the documented limits (relational multi-hop at large scale).


License

PolyForm Noncommercial 1.0.0: free for research, personal, educational, and other noncommercial use. Commercial use is reserved; for a commercial license, open an issue. (The author holds the copyright and may relicense more permissively in the future.)

