Local memory for coding agents: deterministic zero-token writes, a first-class failure guard (never repeat a mistake), and a token-efficient briefing. Local CPU semantic recall — no API keys, no cloud.

🧠 Memir

The memoir your coding agent writes for itself. Long-term memory for coding agents that doesn't eat your context window. Local-first. Zero API keys. And it never lets the agent repeat a mistake.

Memir = memory + memoir + Mímir, the Norse keeper of memory and wisdom. Your agent keeps a memoir; Memir is where it lives.

The problem

Every coding agent forgets. Switch chats, hit the context limit, start a new session — and it re-asks what it already knew, re-decides what you already decided, and repeats the exact mistake it made an hour ago.

The common "fix" is to stuff the whole history back into the prompt every turn. That burns thousands of tokens per request and still overflows as the project grows.

The idea

Memir stores three kinds of memory in a tiny local SQLite file:

| kind | what it captures |
| --- | --- |
| fact | durable truths about the project (endpoints, conventions, constraints) |
| decision | a choice and the reason for it, so it isn't relitigated |
| failure | what was tried, why it failed, and the lesson; a first-class memory |

Then it hands the agent a token-budgeted briefing at the start of a session and a failure guard it can check before trying something.

Writing a memory costs 0 tokens and 0 API calls: it's a local, sub-millisecond insert. Compare that to memory frameworks that run an LLM extraction call on every add.
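To make the "local insert" claim concrete, here is a minimal sketch of what a three-kind store could look like with nothing but the standard library. The table name, column names, and helper functions are hypothetical and chosen for illustration; Memir's actual schema may differ.

```python
import sqlite3
import time

# Hypothetical schema: table and column names are illustrative, not Memir's.
SCHEMA = """
CREATE TABLE IF NOT EXISTS memories (
    id      INTEGER PRIMARY KEY,
    kind    TEXT NOT NULL CHECK (kind IN ('fact', 'decision', 'failure')),
    content TEXT NOT NULL,
    reason  TEXT,
    lesson  TEXT,
    created REAL NOT NULL
)
"""

def open_store(path=":memory:"):
    """Open (or create) the SQLite file and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn

def write_memory(conn, kind, content, reason=None, lesson=None):
    """A plain local INSERT: no model call and no network, so no tokens are spent."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO memories (kind, content, reason, lesson, created) "
            "VALUES (?, ?, ?, ?, ?)",
            (kind, content, reason, lesson, time.time()),
        )
    return cur.lastrowid

conn = open_store()
write_memory(conn, "fact", "The billing job MUST run in UTC.")
write_memory(conn, "decision", "Use SQLite FTS5 for recall.", reason="Faster than scanning.")
write_memory(
    conn,
    "failure",
    "POSTed the whole batch to /v1/ingest in one request.",
    reason="The gateway silently drops bodies >256KB.",
    lesson="Chunk uploads under 256KB and verify each ack id.",
)
kinds = [row[0] for row in conn.execute("SELECT kind FROM memories ORDER BY id")]
print(kinds)
```

The `CHECK` constraint is one way to keep the three kinds closed, so a typo in `kind` fails loudly instead of creating a fourth category.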

Why it's different

First-class FAILURE memory. Most memory tools remember facts. Memir remembers mistakes — what failed, why, and what to do instead — and blocks the repeat. That's the load-bearing feature for a coding agent.
0-token, deterministic writes. No LLM in the capture path. Nothing to bill, nothing to rate-limit, nothing to go down.
Local semantic recall by default. Ships with a retrieval-tuned, numpy-only embedding model (model2vec potion-retrieval-32M) that runs in ~1 ms on a CPU, with no GPU, no torch, and no API. Paraphrase matching works out of the box, and recall falls back gracefully to keyword matching if the model can't be loaded.
Anti-bloat. Write-time dedup plus a safe consolidate() that only merges genuine restatements — it will never collapse two distinct facts (verified in the benchmark below).
Bounded context cost. A naive "replay everything" agent grows without bound; the briefing here stays inside a token budget you set.
Plugs into any agent via MCP. One stdio server works with Cursor, Claude Desktop, Cline, VS Code, Windsurf, …
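The "falls back to keyword recall" behaviour described above can be sketched as a loader that is tried once, with a keyword scorer used if it raises. This is an illustration of the pattern only: `build_scorer`, `keyword_score`, and `broken_loader` are hypothetical names, and the Jaccard overlap used here is a crude stand-in for whatever keyword matching Memir really does.

```python
import re

def tokens(text):
    """Lower-cased alphanumeric words of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def keyword_score(query, doc):
    """Jaccard overlap of the two word sets (a stand-in for keyword recall)."""
    q, d = tokens(query), tokens(doc)
    return len(q & d) / len(q | d) if q | d else 0.0

def build_scorer(load_model):
    """Return (mode, scorer). If the model loader raises, use keywords instead."""
    try:
        return "semantic", load_model()
    except Exception:
        return "keyword", keyword_score

def recall(query, docs, scorer, k=3):
    """Rank docs by score against the query and keep the top k."""
    return sorted(docs, key=lambda d: scorer(query, d), reverse=True)[:k]

def broken_loader():
    # Simulates a machine where the embedding model cannot be loaded.
    raise ImportError("embedding model unavailable")

mode, scorer = build_scorer(broken_loader)
docs = [
    "Deploys go out via the blue/green pipeline only.",
    "The billing job must run in UTC.",
]
top = recall("which pipeline do deploys use", docs, scorer, k=1)
print(mode, top)
```

The caller never needs to know which mode is active; it only sees a scorer with the same signature either way.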

Install & run (one command)

pip install memir

That pulls everything — including the local model deps. To grab the model and warm the cache up front:

memir setup      # downloads the CPU model, initialises the store
memir start      # starts the MCP server (stdio) for your editor/agent

Prefer a fully scripted bootstrap that creates a venv and self-installs?

./scripts/start.sh        # macOS / Linux  — first run sets everything up
.\scripts\start.ps1       # Windows PowerShell

Run it once and it self-installs; run it again and it just starts.

Quickstart (Python)

from memir import Memir

brain = Memir("my-project")          # local model loads automatically (CPU)

# remember things (0 tokens, sub-ms, no API)
brain.remember_fact("The billing job MUST run in UTC; our partner reports in UTC.")
brain.record_decision("Use SQLite FTS5 for recall.", reason="100x faster than scanning.")

# record a failure so it's never repeated
brain.record_failure(
    attempt="POSTed the whole batch to /v1/ingest in one request.",
    reason="The gateway silently drops bodies >256KB and returns an empty 200.",
    lesson="Chunk uploads under 256KB and verify each ack id.",
)

# BEFORE trying something, check the failure guard
if hits := brain.check_failure("send the full batch to /v1/ingest at once"):
    print("⛔", hits[0].lesson)   # -> Chunk uploads under 256KB and verify each ack id.

# start a fresh session fully informed, within a token budget
print(brain.briefing(budget_tokens=300))
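For intuition about what a failure guard does, the sketch below matches a planned action against recorded failures and returns the lessons that apply. It is an assumption-laden toy: it uses keyword overlap with a hand-picked threshold, whereas Memir's own guard is described as using semantic recall. `Failure`, `matching_failures`, and the stopword list are hypothetical.

```python
import re
from dataclasses import dataclass

@dataclass
class Failure:
    attempt: str
    reason: str
    lesson: str

# Tiny illustrative stopword list; a real system would use something richer.
STOPWORDS = {"the", "a", "an", "to", "in", "at", "of", "it", "and"}

def keywords(text):
    """Lower-cased words (slashes kept so paths like /v1/ingest stay whole)."""
    return set(re.findall(r"[a-z0-9/]+", text.lower())) - STOPWORDS

def matching_failures(plan, failures, threshold=0.2):
    """Return recorded failures whose attempt overlaps the plan, best match first."""
    plan_kw = keywords(plan)
    scored = []
    for failure in failures:
        attempt_kw = keywords(failure.attempt)
        union = plan_kw | attempt_kw
        score = len(plan_kw & attempt_kw) / len(union) if union else 0.0
        if score >= threshold:
            scored.append((score, failure))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [failure for _, failure in scored]

known = [
    Failure(
        attempt="POSTed the whole batch to /v1/ingest in one request.",
        reason="The gateway silently drops bodies >256KB and returns an empty 200.",
        lesson="Chunk uploads under 256KB and verify each ack id.",
    )
]
hits = matching_failures("send the full batch to /v1/ingest at once", known)
misses = matching_failures("rename the billing module", known)
print([h.lesson for h in hits], misses)
```

Keyword overlap misses paraphrases that share no words, which is exactly the gap an embedding model closes.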

Auto-capture from real errors

# parses the traceback, decides it's non-obvious, stores it as a failure:
brain.capture_error("RuntimeError: gateway returned 200 with empty body — silently dropped")

# a textbook typo is judged self-evident and skipped (no bloat):
brain.capture_error("NameError: name 'foo' is not defined")   # -> None

# or wrap risky work and capture whatever blows up
# (`client` and `batch` stand in for your own objects):
with brain.watch("uploading batch to /v1/ingest"):
    client.upload(batch)
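The two behaviours above (skip self-evident errors, capture the rest from a `with` block) can be sketched with `contextlib`. Everything here is an assumption made for illustration: which exception types count as self-evident, the shape of the stored record, and the choice to re-raise after capturing are guesses, not Memir's documented behaviour.

```python
from contextlib import contextmanager

# Assumption: these types are treated as "self-evident" for the sketch only.
SELF_EVIDENT = (NameError, SyntaxError, ImportError)

def capture(exc, context, log):
    """Store a non-obvious error in `log`; return None for self-evident ones."""
    if isinstance(exc, SELF_EVIDENT):
        return None
    record = {"attempt": context, "reason": f"{type(exc).__name__}: {exc}"}
    log.append(record)
    return record

@contextmanager
def watch(context, log):
    """Run the wrapped block; if it raises, capture the error and re-raise."""
    try:
        yield
    except Exception as exc:
        capture(exc, context, log)
        raise  # assumption: the caller still sees the original exception

log = []
try:
    with watch("uploading batch to /v1/ingest", log):
        raise RuntimeError("gateway returned 200 with empty body")
except RuntimeError:
    pass

try:
    with watch("running a script with a typo", log):
        raise NameError("name 'foo' is not defined")
except NameError:
    pass

print(log)
```

Only the `RuntimeError` ends up in `log`; the `NameError` passes through untouched, which mirrors the "no bloat" intent.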

Command line

memir remember "Deploys go out via the blue/green pipeline only."
memir decision "Adopt SQLite WAL" --why "concurrent reads during writes"
memir failure  "Dropped the prod table" --why "ran migration w/o backup" \
               --lesson "snapshot before every migration"
memir recall   "how do we deploy?"
memir check    "run the migration now"      # failure guard
memir briefing --budget 300
memir stats
memir doctor                                 # environment + MCP config check

Use it from your editor (MCP)

Memir ships a Model Context Protocol server (stdio JSON-RPC, no SDK).

Cursor / Windsurf / Claude Desktop — add to your MCP config:

{
  "mcpServers": {
    "memir": {
      "command": "memir",
      "args": ["start"],
      "env": {
        "MEMIR_PROJECT": "my-project",
        "MEMIR_DB": ".memir/memir.db"
      }
    }
  }
}

VS Code (.vscode/mcp.json):

{
  "servers": {
    "memir": { "command": "memir", "args": ["start"] }
  }
}

Tools exposed: remember_fact, record_decision, record_failure, record_directive, recall, check_failure, briefing, capture_error, stats, consolidate.
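As a rough picture of what "stdio JSON-RPC, no SDK" can mean, the sketch below handles newline-delimited JSON-RPC 2.0 messages with only the standard library. It is not Memir's server: the `initialize` handshake is omitted, the tool handler body and its `text` argument are hypothetical, and the tool listing is trimmed to the bare minimum.

```python
import json

def remember_fact(arguments):
    # Hypothetical handler: a real server would write to the memory store.
    return f"stored: {arguments['text']}"

TOOLS = {"remember_fact": remember_fact}

def _reply(msg_id, result=None, error=None):
    """Build one JSON-RPC 2.0 response line."""
    body = {"jsonrpc": "2.0", "id": msg_id}
    if error is not None:
        body["error"] = error
    else:
        body["result"] = result
    return json.dumps(body)

def handle_line(line):
    """Handle one JSON-RPC message; return the reply string, or None for notifications."""
    msg = json.loads(line)
    if "id" not in msg:  # notifications carry no id and get no response
        return None
    method = msg.get("method")
    params = msg.get("params") or {}
    if method == "tools/list":
        tools = [{"name": name, "inputSchema": {"type": "object"}} for name in TOOLS]
        return _reply(msg["id"], result={"tools": tools})
    if method == "tools/call":
        tool = TOOLS.get(params.get("name"))
        if tool is None:
            return _reply(msg["id"], error={"code": -32602, "message": "Unknown tool"})
        text = tool(params.get("arguments") or {})
        return _reply(msg["id"], result={"content": [{"type": "text", "text": text}]})
    return _reply(msg["id"], error={"code": -32601, "message": "Method not found"})

def serve(stdin, stdout):
    """Read one message per line from stdin and write one reply per line to stdout."""
    for line in stdin:
        if line.strip():
            reply = handle_line(line)
            if reply is not None:
                stdout.write(reply + "\n")
                stdout.flush()
```

Calling `serve(sys.stdin, sys.stdout)` would turn this into a process an editor can spawn; passing `io.StringIO` objects instead makes it easy to test.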

record_directive stores a standing user instruction (e.g. "keep working until X is done; do not stop early"). Directives are pinned to the top of every briefing and are never dropped by the token budget, so the agent can never drift from — or stop short of — what the user told it to do.
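One way to get "pinned and never dropped by the token budget" is to emit directives unconditionally and let only the remaining memories compete for space. The sketch below does that with a greedy fill. The function names, the `DIRECTIVE:` prefix, and the 4-characters-per-token estimate are assumptions for illustration, not Memir's implementation.

```python
def estimate_tokens(text):
    """Rough heuristic (about 4 characters per token); not a real tokenizer."""
    return max(1, len(text) // 4)

def build_briefing(directives, memories, budget_tokens):
    """Directives always go first; memories are added only while the budget allows.

    `memories` is assumed to be pre-sorted, most important first.
    """
    lines = [f"DIRECTIVE: {d}" for d in directives]
    used = sum(estimate_tokens(line) for line in lines)
    for text in memories:
        cost = estimate_tokens(text)
        if used + cost > budget_tokens:
            continue  # too big for what is left; a smaller later item may still fit
        lines.append(text)
        used += cost
    return "\n".join(lines)

directives = ["keep working until X is done"]
memories = ["a" * 40, "b" * 400, "c" * 20]  # roughly 10, 100, and 5 tokens
briefing = build_briefing(directives, memories, budget_tokens=30)
print(briefing)
```

Because directives bypass the budget check, a very small budget shrinks the briefing down to the directives alone rather than to nothing.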

Set MEMIR_EMBED=0 to force keyword-only mode (skips the model entirely). Override the model with MEMIR_MODEL=minishlab/potion-base-8M (30 MB, faster).
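The two environment variables above could be resolved along these lines. This is a guess at the precedence, not the library's code: it assumes `MEMIR_EMBED=0` wins over any `MEMIR_MODEL` value, that every other `MEMIR_EMBED` value leaves embeddings on, and that the default model identifier carries the same `minishlab/` prefix as the override example.

```python
import os

# Assumption: the default identifier mirrors the override example's namespace.
DEFAULT_MODEL = "minishlab/potion-retrieval-32M"

def embedding_config(env=None):
    """Resolve recall mode and model name from environment variables."""
    env = os.environ if env is None else env
    if env.get("MEMIR_EMBED") == "0":
        # Keyword-only mode: the model is never loaded, so its name is irrelevant.
        return {"mode": "keyword", "model": None}
    return {"mode": "semantic", "model": env.get("MEMIR_MODEL", DEFAULT_MODEL)}

print(embedding_config({}))
print(embedding_config({"MEMIR_MODEL": "minishlab/potion-base-8M"}))
print(embedding_config({"MEMIR_EMBED": "0", "MEMIR_MODEL": "minishlab/potion-base-8M"}))
```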

Benchmark (reproducible, honest)

python benchmarks/benchmark.py — synthetic multi-session fact-QA, fixed seed, fully local, each fact restated 3× to expose bloat. Memir is measured head-to-head against the realistic local baselines an agent could actually use:

| strategy | recall@1 | recall@3 | tokens / turn | rows kept |
| --- | --- | --- | --- | --- |
| no-memory | 0% | 0% | ~10 | 0 |
| full-context (replay everything) | 100% | 100% | 6,906 | 600 |
| naive-rag (add-only vector store) | 96.5% | 99.5% | 35 | 1,800 |
| Memir | 100% | 100% | 36 | 1,200 |

192× fewer tokens per turn than replaying the transcript (6,906 vs 36), at the same recall.
Ties for the best recall@1 and recall@3 (100%), matching full-context and beating the add-only vector baseline, at a bounded token cost.
Anti-bloat: keeps ~33% fewer rows than the add-only store, without ever merging two distinct facts.
Write: ~0.8 ms / memory, 0 tokens, 0 API calls.
Recall: ~8 ms / query.
Failure-repeat prevention: 100% of known failures blocked on retry — which no other strategy here can do at all.
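The headline ratios follow directly from the table. The short check below recomputes them and shows one conventional way to compute recall@k. The `recall_at_k` helper and its toy input are illustrative and are not taken from `benchmarks/benchmark.py`.

```python
def recall_at_k(rankings, k):
    """Fraction of questions whose relevant id appears in the top k results.

    `rankings` is a list of (ranked_ids, relevant_id) pairs, one per question.
    """
    hits = sum(1 for ranked, relevant in rankings if relevant in ranked[:k])
    return hits / len(rankings)

# Figures copied from the benchmark table.
full_context_tokens, memir_tokens = 6906, 36
naive_rows, memir_rows = 1800, 1200

token_ratio = full_context_tokens / memir_tokens   # about 191.8
row_reduction = 1 - memir_rows / naive_rows        # about 0.333

print(f"{token_ratio:.0f}x fewer tokens per turn")  # prints "192x fewer tokens per turn"
print(f"{row_reduction:.0%} fewer rows kept")       # prints "33% fewer rows kept"

# Toy data (not benchmark output): two questions, three ranked results each.
toy = [(["f1", "f7", "f3"], "f1"), (["f9", "f2", "f4"], "f2")]
print(recall_at_k(toy, 1), recall_at_k(toy, 3))     # prints "0.5 1.0"
```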

Honesty note. The dataset is synthetic so the numbers reproduce on any machine. naive-rag is an idealized, free local stand-in for add-only memory stores (mem0-style read path) run without their per-write LLM extraction call — so its write cost here is a generous lower bound for those products. We deliberately do not print accuracy figures for mem0 / Zep / Letta, because running them needs API keys / cloud and we won't fabricate competitor results.

How it stays lean

Write-time dedup — identical memories are never stored twice.
consolidate() — merges semantic near-duplicates only when they are both meaning-equivalent and lexically overlapping (a true restatement), so it can never silently delete a distinct fact.
contradictions() — surfaces conflicting memories non-destructively (negation-aware, so it doesn't mistake a paraphrase for a conflict).
prune() — decays cold facts/decisions, but never auto-drops a failure.
Tiers — session / project / global; global memories survive pruning and can be export_brain()-ed and re-import_brain()-ed into other projects.
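The two-gate rule behind consolidate() (meaning-equivalent and lexically overlapping) can be sketched as a single predicate. The thresholds and the Jaccard overlap below are illustrative guesses, and `semantic_sim` is passed in as a number because the embedding step is out of scope here; none of this is Memir's actual code.

```python
import re

def word_set(text):
    """Lower-cased alphanumeric words of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def lexical_overlap(a, b):
    """Jaccard overlap between the word sets of two memories."""
    wa, wb = word_set(a), word_set(b)
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def should_merge(a, b, semantic_sim, sim_min=0.9, overlap_min=0.6):
    """Merge only when BOTH gates pass; either one alone is not enough."""
    return semantic_sim >= sim_min and lexical_overlap(a, b) >= overlap_min

base = "The billing job must run in UTC."
restated = "The billing job must always run in UTC."
different = "The export job must run in PST."

print(should_merge(base, restated, semantic_sim=0.95))   # both gates pass
print(should_merge(base, different, semantic_sim=0.95))  # lexical gate fails
print(should_merge(base, restated, semantic_sim=0.50))   # semantic gate fails
```

Requiring both gates is what makes the rule conservative: two sentences that an embedding model scores as close still stay separate unless they also share most of their words.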

Honest limitations

Contradiction detection is heuristic. Telling "X is true" from "X is false" perfectly needs an NLI/LLM step; the built-in detector is a fast, negation-aware approximation that surfaces candidates for review rather than auto-resolving.
Consolidation is conservative on purpose. It errs toward keeping two rows rather than risk merging two different facts — so a loosely-worded paraphrase of the same fact may survive as a separate row. That's the safe trade.
Auto-capture is intentionally conservative. It skips self-evident errors by design. Immediately-visible mistakes (a NameError typo) are not stored.
The benchmark dataset is synthetic. It measures the system's own properties honestly; it is not a head-to-head accuracy claim against other products.
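To show why contradiction detection is only heuristic, here is a toy negation-aware check: two statements become a conflict candidate when they share most content words but differ in negation. The negation word list, the parity trick, and the threshold are assumptions for this sketch; it is far cruder than an NLI model and is not Memir's detector.

```python
import re

NEGATIONS = {"not", "never", "no", "cannot"}

def is_negation(word):
    return word in NEGATIONS or word.endswith("n't")

def split_negation(text):
    """Return (content words, negated?) where negated is the parity of negation words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    negation_count = sum(1 for w in words if is_negation(w))
    content = {w for w in words if not is_negation(w)}
    return content, negation_count % 2 == 1

def is_conflict_candidate(a, b, overlap_min=0.7):
    """Flag pairs that say nearly the same thing with opposite polarity."""
    content_a, negated_a = split_negation(a)
    content_b, negated_b = split_negation(b)
    union = content_a | content_b
    overlap = len(content_a & content_b) / len(union) if union else 0.0
    return overlap >= overlap_min and negated_a != negated_b

print(is_conflict_candidate("Deploys must run on Fridays.",
                            "Deploys must not run on Fridays."))   # opposite polarity
print(is_conflict_candidate("Deploys must run on Fridays.",
                            "Deploys must run on Fridays only."))  # paraphrase, same polarity
```

A check like this can only surface candidates for review: it knows nothing about antonyms, scope, or context, which is why the limitation above calls for an NLI/LLM step to do better.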

Development

pip install -e ".[dev]"
pytest -q
python benchmarks/benchmark.py

License

Release history

0.4.0 (this version): Jun 9, 2026
0.3.3: Jun 8, 2026
0.3.2: Jun 8, 2026
0.3.1: Jun 8, 2026
0.3.0: Jun 8, 2026

Download files

Source Distribution

memir-0.4.0.tar.gz (72.3 kB)

Uploaded Jun 9, 2026 Source

Built Distribution

memir-0.4.0-py3-none-any.whl (59.7 kB)

Uploaded Jun 9, 2026 Python 3

File details

Details for the file memir-0.4.0.tar.gz.

File metadata

Download URL: memir-0.4.0.tar.gz
Upload date: Jun 9, 2026
Size: 72.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for memir-0.4.0.tar.gz
Algorithm	Hash digest
SHA256	`2cf46728b78b71cf524de9213fc1ca0779b8c3f0cacb9d30dd10ebee845d9760`
MD5	`20cf40500e69510affcf77661266e765`
BLAKE2b-256	`4bdeb315825d6bac51ecbfadfd7e6e1aee295a989c7585edf787bfd45eafad02`

File details

Details for the file memir-0.4.0-py3-none-any.whl.

File metadata

Download URL: memir-0.4.0-py3-none-any.whl
Upload date: Jun 9, 2026
Size: 59.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for memir-0.4.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`167d8b68e3e82b1d8059e8cfcddfe05521df8a278db17baa10abe9018d518046`
MD5	`e379949d6ee8cbb685b404b28a486c1c`
BLAKE2b-256	`460fbdd66778c43123fa56538b2bbda7e837b0ee56809f96315d083c4c6d0050`
