The compound knowledge system for AI agents
MemKraft 🧠
Bitemporal memory × empirical tuning: the first self-improvement ledger for AI agents. Your agent's accountable past, in plain Markdown.
🏆 LongMemEval 98.0% — #1 among open-source agent long-term memory systems (surpasses MemPalace 96.6% and MEMENTO by Microsoft 90.8% · LLM-as-judge · oracle 50 · 3-run semantic majority)
v2.7.0 · Zero-dependency compound knowledge system for AI agents. Auto-extract, classify, search, tune, and time-travel — all in plain Markdown. Debugging is memory. Time travel is memory. Multi-agent handoffs are memory. Facts have bitemporal validity. Memories decay reversibly. Wiki links build graphs. Tuning iterations leave an audit trail.
Plain Markdown source-of-truth · zero deps · zero keys · zero LLM calls inside MemKraft. In 30 seconds:
pipx install memkraft && memkraft init && memkraft agents-hint claude-code
📖 Table of Contents
- ⚡ Quickstart (30s)
- 🎯 Why MemKraft?
- 🧩 Features
- 🍳 Real-world Recipes
- 🔬 Specialized Features
- 📚 API Reference
- ⌨️ CLI Reference
- 🏗️ Architecture
- ⚖️ Comparison
- 🏆 Reproducing LongMemEval
- 🆙 Staying Up To Date
- 📝 Changelog
- 🤝 Contributing
- 📄 License
- 🙏 Appendix: Inspirations & Credits
⚡ Quickstart (30s)
pip install memkraft
memkraft init # → creates ./memory/ with RESOLVER, TEMPLATES, entities/, ...
memkraft agents-hint claude-code >> AGENTS.md # your agent is now memory-aware
Or scaffold a full project
memkraft init --template claude-code # CLAUDE.md + memory/ + examples
memkraft init --template cursor # .cursorrules + memory/
memkraft init --template mcp # claude_desktop_config snippet + memory/
memkraft init --template rag # retrieval-focused structure
memkraft init --template minimal # just memory/entities/
memkraft templates list # see all presets
Templates are idempotent — re-running init --template X never overwrites your edits.
Or in Python:
from memkraft import MemKraft
mk = MemKraft("./memory"); mk.init()
mk.track("Simon Kim", entity_type="person", source="news")
mk.update("Simon Kim", info="Launched MemKraft 0.8.1", source="PyPI")
mk.search("MemKraft")
That's it. Your agent now has persistent memory as plain markdown files.
No API keys. No database. No config. Just .md files you own.
Optional extras
pip install 'memkraft[mcp]' # + MCP server (`python -m memkraft.mcp`)
pip install 'memkraft[watch]' # + auto-reindex on save (`memkraft watch`)
pip install 'memkraft[all]' # everything
Connect Any Agent in 30 Seconds
memkraft agents-hint <target> prints copy-paste-ready integration snippets:
memkraft agents-hint claude-code # → CLAUDE.md / AGENTS.md block
memkraft agents-hint openclaw # → AGENTS.md block for OpenClaw
memkraft agents-hint cursor # → .cursorrules block
memkraft agents-hint openai # → Custom GPT / function-calling schema
memkraft agents-hint mcp # → claude_desktop_config.json snippet
memkraft agents-hint langchain # → LangChain StructuredTool wrappers
Paste the output. Done. Or pipe it straight into your config:
memkraft agents-hint claude-code >> AGENTS.md
See examples/ for runnable variants.
🎯 Why MemKraft?
What Makes MemKraft Different
| | MemKraft | Mem0 | Letta |
|---|---|---|---|
| Dependencies | 0 | many | many |
| API key required | No | Yes | Yes |
| Source of truth | Plain .md | Cloud/DB | DB |
| Local-first | ✅ | — | — |
| Git-friendly | ✅ | — | — |
API overview (14 public methods)
| API | Since | Role |
|---|---|---|
| track | 0.5 | Start tracking an entity |
| update | 0.5 | Append information to an entity |
| search | 0.5 | Hybrid search (exact + IDF + fuzzy + BM25) |
| tier_set | 0.8 | Set tier: core / recall / archival |
| fact_add | 0.8 | Record a bitemporal fact (fact_type: episodic / semantic / procedural since 2.6) |
| log_event | 0.8 | Log a timestamped event |
| decision_record | 0.9 | Capture a decision with rationale |
| evidence_first | 0.9 | Retrieve evidence before acting |
| prompt_register | 1.0 | Register a prompt/skill as an entity |
| prompt_eval | 1.0 | Record one tuning iteration |
| prompt_evidence | 1.0 | Cite past tuning results |
| convergence_check | 1.0 | Auto-judge convergence |
| auto_tier | 2.6 | Recommend core / recall / archival from (recency, frequency, importance); dry_run=True by default |
| cache_stats | 2.7 | Inspect search cache hit/miss/eviction counters and current generation |
Also new in 2.6: silent contradiction detection on fact_add, plus 1-hop graph neighbor expansion for counting-style queries (how many, list all).
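The bitemporal idea behind fact_add — a fact has a validity interval in the world, separate from when it was recorded — can be pictured with a few lines of stdlib Python. This is an illustrative sketch only; the field names and query helper are made up, not MemKraft's schema:

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class Fact:
    """Toy bitemporal fact: valid time (when it held in the world)
    vs. transaction time (when it was recorded)."""
    text: str
    valid_from: date
    valid_to: Optional[date] = None        # None = still true (open interval)
    recorded_at: date = field(default_factory=date.today)

def known_true_on(facts, day):
    """Return facts whose validity interval covers `day`."""
    return [f.text for f in facts
            if f.valid_from <= day and (f.valid_to is None or day < f.valid_to)]

facts = [Fact("Simon Kim is CEO of Hashed", date(2017, 1, 1)),
         Fact("MemKraft is at 0.8.1", date(2025, 1, 10), valid_to=date(2025, 3, 1))]
assert known_true_on(facts, date(2025, 2, 1)) == ["Simon Kim is CEO of Hashed",
                                                  "MemKraft is at 0.8.1"]
assert known_true_on(facts, date(2025, 4, 1)) == ["Simon Kim is CEO of Hashed"]
```

Superseded facts are never deleted — they simply fall out of the validity window, which is what makes "what did I know on March 1st?" answerable.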
New in 2.7: in-process search result caching for search_v2() and search_smart() — thread-safe LRU + TTL (default capacity 256, TTL 300s). Mutations (update, track, fact_add, log_event, consolidate, decision_record, dream_cycle) auto-invalidate via a generation counter, so callers never need to think about cache coherence. Opt-out per call with cache=False. Measured 6.14x speedup on a hot-path workload (152 → 931 qps) and 1.65x on a 50/50 mixed workload — raw numbers in benchmarks/v2.7.0-bench-result.json. Zero breaking changes.
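The generation-counter invalidation scheme is simple enough to sketch with the stdlib. This is a toy model of the pattern (LRU + TTL + one counter bumped on mutation), not MemKraft's actual implementation — the class and method names are made up:

```python
import time
from collections import OrderedDict

class GenCache:
    """Toy LRU + TTL cache with a generation counter for invalidation."""

    def __init__(self, capacity=256, ttl=300.0):
        self.capacity, self.ttl = capacity, ttl
        self.generation = 0          # bumped on every mutation
        self._data = OrderedDict()   # key -> (generation, expires_at, value)

    def invalidate(self):
        """Called by mutations; stale entries are evicted lazily on read."""
        self.generation += 1

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        gen, expires_at, value = entry
        if gen != self.generation or time.monotonic() > expires_at:
            del self._data[key]      # stale generation or past TTL
            return None
        self._data.move_to_end(key)  # LRU: mark as recently used
        return value

    def put(self, key, value):
        self._data[key] = (self.generation, time.monotonic() + self.ttl, value)
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used

cache = GenCache(capacity=2, ttl=300.0)
cache.put("q1", ["hit-a"])
assert cache.get("q1") == ["hit-a"]   # warm hit
cache.invalidate()                    # simulate a mutation (update/track/...)
assert cache.get("q1") is None        # stale generation -> miss
```

Bumping one integer is O(1) per mutation, which is why callers never need to reason about cache coherence themselves.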
Self-improvement loop: register → tune → recall → decide, every step auditable and time-travelable. See MIGRATION.md for upgrading from 0.9.x (zero breaking changes).
The 1.0 Self-Improvement Loop
Register a prompt/skill, record iterations, cite past evidence, and let MemKraft auto-judge when to stop tuning — all in plain Markdown, no LLM calls inside MemKraft:
from memkraft import MemKraft
mk = MemKraft("./memory")
# 1. register a prompt/skill as a first-class entity
mk.prompt_register(
"my-skill",
path="skills/my-skill/SKILL.md",
owner="zeon",
tags=["tuning"],
)
# 2. record each empirical iteration (host agent dispatches the run
# — MemKraft only persists the report)
mk.prompt_eval(
"my-skill",
iteration=1,
scenarios=[{
"name": "parallel-dispatch",
"description": "3 subagents at once",
"requirements": [{"item": "all return", "critical": True}],
}],
results=[{
"scenario": "parallel-dispatch",
"success": True, "accuracy": 85,
"tool_uses": 5, "duration_ms": 2000,
"unclear_points": ["schema missing"],
"discretion": [],
}],
)
# 3. cite past iterations before the next run
mk.prompt_evidence("my-skill", "accuracy regression")
# 4. stop when the last N iterations stabilise
verdict = mk.convergence_check("my-skill", window=2)
# -> {"converged": False, "reason": "insufficient-iters",
# "iterations_checked": [1],
# "suggested_next": "patch-and-iterate", ...}
Each call leaves an auditable trail on disk: a decision record per iteration, an incident when unclear points pile up, and wiki-links between iterations. Upgrading from 0.9.x is zero-breaking — see MIGRATION.md.
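The convergence verdict above can be approximated in a few lines: the loop stops when the last N iteration scores stabilise. A toy judge follows — the stability tolerance and the exact return shape are assumptions, not MemKraft's documented heuristic:

```python
def convergence_check(accuracies, window=2, tolerance=2.0):
    """Toy convergence judge: converged when the last `window` scores
    all sit within `tolerance` points of each other."""
    if len(accuracies) < window:
        return {"converged": False, "reason": "insufficient-iters",
                "iterations_checked": list(range(1, len(accuracies) + 1))}
    recent = accuracies[-window:]
    converged = max(recent) - min(recent) <= tolerance
    return {"converged": converged,
            "reason": "stable-window" if converged else "still-moving",
            "iterations_checked": list(range(len(accuracies) - window + 1,
                                             len(accuracies) + 1))}

assert convergence_check([85], window=2)["reason"] == "insufficient-iters"
assert convergence_check([85, 90, 91], window=2)["converged"] is True
assert convergence_check([70, 85], window=2)["converged"] is False
```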
## 🧩 Features
Ingestion & Extraction
| Feature | Description |
|---|---|
| Auto-extract | Pipe any text in, get entities + facts out. Regex-based NER for EN, KR, CN, JP - no LLM calls. |
| CJK detection | 806 stopwords, 100 Chinese surnames, 85 Japanese surnames, Korean particle stripping. |
| Cognify pipeline | Routes inbox/ items to the right directory. Recommend-only by default; pass --apply to move. |
| Fact registry | Extracts currencies, percentages, dates, quantities into a cross-domain index. |
| Originals capture | Save raw text verbatim - no paraphrasing. |
| Confidence levels | Tag facts as verified / experimental / hypothesis. Dream Cycle warns on untagged facts. |
| Applicability conditions | --when "condition" --when-not "condition" - facts get When: / When NOT: metadata. |
Search & Retrieval
| Feature | Description |
|---|---|
| Fuzzy search | difflib.SequenceMatcher-based. Works offline, zero setup. |
| Brain-first lookup | Searches entities → notes → decisions → meetings. Stops after sufficient high-relevance results. |
| Agentic search | Multi-hop: decompose query → search → traverse [[wiki-links]] → re-rank by tier/recency/confidence/applicability. |
| Goal-weighted re-ranking | Conway SMS: same query with different --context produces different rankings. |
| Feedback loop | --file-back: search results auto-filed back to entity timelines (compound interest for memory). |
| Progressive disclosure | 3-level query: L1 index (~50 tokens) → L2 section headers → L3 full file. |
| Backlinks | [[entity-name]] cross-references. See every page that references an entity. |
| Link suggestions | Auto-suggest missing [[wiki-links]] based on known entity names. |
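The fuzzy-search row above names difflib.SequenceMatcher, so the core idea fits in a few stdlib lines. This is a minimal sketch of ratio-based matching with an assumed threshold, not MemKraft's actual scorer:

```python
from difflib import SequenceMatcher

def fuzzy_search(query, lines, threshold=0.6):
    """Offline fuzzy matcher: score each line against the query and
    return hits above the threshold, best first."""
    scored = []
    for i, line in enumerate(lines, start=1):
        score = SequenceMatcher(None, query.lower(), line.lower()).ratio()
        if score >= threshold:
            scored.append({"line": i, "score": round(score, 2), "text": line})
    return sorted(scored, key=lambda r: -r["score"])

notes = ["Simon Kim - CEO of Hashed", "venture capital fund II", "daily retro"]
hits = fuzzy_search("venture captial", notes)  # note the typo in the query
assert hits and hits[0]["text"] == "venture capital fund II"
```

Because it is pure character-level similarity, it survives typos and works with zero setup — the trade-off is that it has no semantic understanding.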
Structure & Organization
| Feature | Description |
|---|---|
| Compiled Truth + Timeline | Dual-layer entity model: mutable current state + append-only audit trail with [Source:] tags. |
| Memory tiers | Core / Recall / Archival - explicit context window priority. promote to reclassify. |
| Memory type classification | 8 types: identity, belief, preference, relationship, skill, episodic, routine, transient. |
| Type-aware decay | Identity memories decay 10x slower than routine memories. Differential decay multipliers. |
| RESOLVER.md | MECE classification tree - every piece of knowledge has exactly one destination. |
| Source attribution | Every fact tagged with [Source: who, when, how]. Enforced by Dream Cycle. |
| Dialectic synthesis | Auto-detect contradictory facts during extract, tag [CONFLICT], generate CONFLICTS.md. |
| Conflict resolution | `resolve-conflicts --strategy S` where S is newest / confidence / keep-both / prompt. |
| Live Notes | Persistent tracking for people and companies. Auto-incrementing updates + timeline. |
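Type-aware decay (identity memories decaying 10x slower than routine ones) maps naturally onto an exponential retention curve with a per-type multiplier. The multipliers and half-life below are illustrative assumptions, not MemKraft's tuned values:

```python
import math

# Assumed multipliers: identity decays 10x slower than routine.
DECAY_MULTIPLIER = {"identity": 0.1, "belief": 0.3, "preference": 0.4,
                    "relationship": 0.5, "skill": 0.5, "episodic": 0.8,
                    "routine": 1.0, "transient": 2.0}

def retention(memory_type, days_since_access, half_life_days=90.0):
    """Fraction of relevance remaining after `days_since_access` days,
    scaled by the memory type's decay multiplier."""
    rate = math.log(2) / half_life_days * DECAY_MULTIPLIER[memory_type]
    return math.exp(-rate * days_since_access)

# After 90 days a routine memory sits at 50% retention,
# while an identity memory is still above 90%.
assert abs(retention("routine", 90) - 0.5) < 1e-9
assert retention("identity", 90) > 0.9
```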
Maintenance & Audit
| Feature | Description |
|---|---|
| Dream Cycle | Nightly auto-maintenance: missing sources, thin pages, duplicates, inbox age, bloated pages, daily notes. |
| Debug Hypothesis Tracking | OBSERVE → HYPOTHESIZE → EXPERIMENT → CONCLUDE flow. Track hypotheses, evidence, rejections. Auto-switch warning after 2 failures. Search past sessions to avoid repeating failed approaches. |
| Health Check | 5 self-diagnostic assertions: source attribution, orphan facts, duplicates, inbox freshness, unresolved conflicts. Pass rate % + health score (A/B/C/D). |
| Memory decay | Older, unaccessed memories naturally decay - type-aware differential curves. |
| Fact dedup | Detects and merges duplicate facts across entities. |
| Auto-summarize | Condenses bloated pages while preserving key information. |
| Diff tracking | See exactly what changed since the last Dream Cycle. |
| Open loop tracking | Finds all pending / TODO / FIXME items across memory. |
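Open-loop tracking is essentially a marker scan over the memory tree. A minimal sketch of the idea — the marker set and output shape are assumptions:

```python
import re
import tempfile
from pathlib import Path

OPEN_LOOP = re.compile(r"\b(TODO|FIXME|pending)\b", re.IGNORECASE)

def open_loops(base_dir):
    """Scan every .md file under base_dir for open-loop markers."""
    found = []
    for path in sorted(Path(base_dir).rglob("*.md")):
        for lineno, line in enumerate(path.read_text().splitlines(), start=1):
            if OPEN_LOOP.search(line):
                found.append({"file": path.name, "line": lineno,
                              "text": line.strip()})
    return found

with tempfile.TemporaryDirectory() as d:
    Path(d, "notes.md").write_text("intro\nTODO: ship v2.7\nall done\n")
    loops = open_loops(d)

assert len(loops) == 1 and loops[0]["line"] == 2
```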
Logging & Reflection
| Feature | Description |
|---|---|
| Session logging | JSONL event trail with tags, importance, entity, task, and decision fields. |
| Daily retrospective | Auto-generated Well / Bad / Next from session events + file changes. |
| Decision distillation | Scans events and notes for decision candidates. EN + KR keyword matching. |
| Meeting briefs | One command compiles entity info, timeline, open threads, and a pre-meeting checklist. |
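The JSONL event trail is append-only: one JSON object per line, one file per day. A sketch of the write/read pair — the record fields mirror the table above, but the exact schema is an assumption:

```python
import json
import tempfile
from pathlib import Path

def log_event(log_path, event, tags="", importance="normal"):
    """Append one structured event as a JSON line."""
    record = {"event": event, "tags": tags, "importance": importance}
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")

def log_read(log_path):
    """Parse every JSON line back into a dict."""
    return [json.loads(line) for line in Path(log_path).read_text().splitlines()]

with tempfile.TemporaryDirectory() as d:
    log = Path(d, "2026-04-15.jsonl")
    log_event(log, "Deployed v0.3", tags="deploy", importance="high")
    log_event(log, "Wrote retro")
    events = log_read(log)

assert len(events) == 2 and events[0]["importance"] == "high"
```

Append-only JSONL is what makes the trail git-friendly: every session produces a clean diff of added lines, never rewrites.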
Debugging
| Feature | Description |
|---|---|
| ✅ Debug Hypothesis Tracking | OBSERVE→HYPOTHESIZE→EXPERIMENT→CONCLUDE loop with persistent failure memory. |
🍳 Real-world Recipes
memkraft init
memkraft extract "Simon Kim is the CEO of Hashed in Seoul." --source "news"
memkraft brief "Simon Kim"
memkraft doctor # 🟢/🟡/🔴 health check with fix hints
memkraft doctor --fix --yes # auto-repair missing structure (create-only, never deletes)
memkraft stats --export json # workspace stats for CI dashboards
memkraft mcp doctor # validate MCP server readiness
memkraft mcp test # remember→search→recall smoke test
MCP (Claude Desktop / Cursor) setup: see docs/mcp-setup.md.
Python Usage
from memkraft import MemKraft
mk = MemKraft("/path/to/memory")
mk.init() # returns {"created": [...], "exists": [...], "base_dir": "..."}
# Extract entities & facts from text
mk.extract_conversations("Simon Kim is the CEO of Hashed.", source="news")
# Track an entity
mk.track("Simon Kim", entity_type="person", source="news")
mk.update("Simon Kim", info="Launched MemKraft", source="X/@simonkim_nft")
# Search with fuzzy matching
results = mk.search("venture capital", fuzzy=True)
# Agentic multi-hop search with context-aware re-ranking
results = mk.agentic_search(
"who is the CEO of Hashed",
context="crypto investment research", # Conway SMS: same query, different context → different ranking
file_back=True, # feedback loop: results auto-filed back to entity timelines
)
# Run health check (5 self-diagnostic assertions)
report = mk.health_check()
# → {"pass_rate": 80.0, "health_score": "A", ...}
# Dream Cycle - nightly maintenance
mk.dream(dry_run=True)
More CLI examples - 7 daily patterns that cover 90% of day-to-day use
# 1. Extract & Track - auto-detect entities from any text
memkraft extract "Simon Kim is the CEO of Hashed in Seoul." --source "news"
memkraft extract "Revenue grew 85% YoY" --confidence verified --when "bull market"
memkraft track "Simon Kim" --type person --source "X/@simonkim_nft"
memkraft update "Simon Kim" --info "Launched MemKraft" --source "X/@simonkim_nft"
# 2. Search & Recall - find anything in your memory
memkraft search "venture capital" --fuzzy
memkraft search "Seoul VC" --file-back # feedback loop: auto-file to timelines
memkraft lookup "Simon" --brain-first
memkraft agentic-search "who is the CEO of Hashed" --context "meeting prep"
# 3. Meeting Prep - compile all context before a meeting
memkraft brief "Simon Kim"
memkraft brief "Simon Kim" --file-back # record brief generation in timeline
memkraft links "Simon Kim"
# 4. Ingest & Classify - inbox → structured pages (safe by default)
memkraft cognify # recommend-only; add --apply to move files
memkraft detect "Jack Ma and 马化腾 discussed AI" --dry-run
# 5. Log & Reflect - structured audit trail
memkraft log --event "Deployed v0.3" --tags deploy --importance high
memkraft retro # daily Well / Bad / Next retrospective
# 6. Maintain & Heal - Dream Cycle keeps memory healthy
memkraft health-check # 5 assertions → pass rate + health score (A/B/C/D)
memkraft dream --dry-run # nightly: sources, duplicates, bloated pages
memkraft resolve-conflicts --strategy confidence # resolve contradictory facts
memkraft diff # what changed since last maintenance?
memkraft open-loops # find all unresolved items
# 7. Debug Hypothesis Tracking - "Debugging is Memory"
memkraft debug start "API returns 500 on POST /users"
memkraft debug hypothesis "Database connection timeout"
memkraft debug evidence "DB pool healthy" --result contradicts
memkraft debug reject --reason "DB is fine"
memkraft debug hypothesis "Request validation missing"
memkraft debug evidence "Empty POST triggers 500" --result supports
memkraft debug confirm
memkraft debug end "Added request body validation"
memkraft debug search-rejected "timeout" # avoid past mistakes
## 🔬 Specialized Features
This section gathers features that deserve a closer look: time-travel snapshots, multi-agent context plumbing, autonomous memory lifecycle, and scientific debugging.
📸 Memory Snapshots & Time Travel (v0.5.1)
| Feature | Description |
|---|---|
| Snapshot | Create a point-in-time manifest of all memory files (hash, size, summary, sections, fact count, link count). Optionally embed full content. |
| Snapshot List | List all saved snapshots, newest first, with labels and metadata. |
| Snapshot Diff | Compare two snapshots (or snapshot vs live state). Shows added, removed, modified, unchanged files with byte deltas. |
| Time Travel | Search memory as it was at a past snapshot. Answer "what did I know about X on March 1st?" |
| Entity Timeline | Track how a specific entity evolved across all snapshots — new, modified, unchanged, deleted states. |
from memkraft import MemKraft
mk = MemKraft("/path/to/memory")
# Take a snapshot before a big operation
snap = mk.snapshot(label="before-migration", include_content=True)
# ... time passes, memory changes ...
# What changed?
diff = mk.snapshot_diff(snap["snapshot_id"]) # vs live state
# → {added: [...], removed: [...], modified: [...], unchanged_count: 42}
# Search memory as it was at that snapshot
results = mk.time_travel("venture capital", snapshot_id=snap["snapshot_id"])
# How did an entity evolve over time?
timeline = mk.snapshot_entity("Simon Kim")
# → [{snapshot_id, timestamp, fact_count, size, change_type: "new"}, ...]
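Under the hood, a manifest of per-file hashes is all a snapshot diff needs. A stdlib sketch of that mechanism — the manifest layout and diff keys echo the output above but are illustrative, not MemKraft's on-disk format:

```python
import hashlib
import tempfile
from pathlib import Path

def snapshot_manifest(base_dir):
    """Point-in-time manifest: content hash + size per memory file."""
    manifest = {}
    for path in sorted(Path(base_dir).rglob("*.md")):
        data = path.read_bytes()
        manifest[path.name] = {"sha256": hashlib.sha256(data).hexdigest(),
                               "size": len(data)}
    return manifest

def manifest_diff(old, new):
    """Compare two manifests: files added, removed, or content-changed."""
    return {"added": sorted(set(new) - set(old)),
            "removed": sorted(set(old) - set(new)),
            "modified": sorted(k for k in old.keys() & new.keys()
                               if old[k]["sha256"] != new[k]["sha256"])}

with tempfile.TemporaryDirectory() as d:
    Path(d, "simon-kim.md").write_text("CEO of Hashed")
    before = snapshot_manifest(d)
    Path(d, "simon-kim.md").write_text("CEO of Hashed; launched MemKraft")
    Path(d, "hashed.md").write_text("Seoul VC")
    diff = manifest_diff(before, snapshot_manifest(d))

assert diff == {"added": ["hashed.md"], "removed": [],
                "modified": ["simon-kim.md"]}
```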
🧠 Channel Context Memory + Task Continuity + Agent Working Memory (v0.5.4)
| Feature | Description |
|---|---|
| Channel Context Memory | Per-channel context persistence. Save/load/update context keyed by channel ID (e.g. telegram-46291309). Stored in .memkraft/channels/{channel_id}.json. |
| Task Continuity Register | Task lifecycle tracking with full history. task_start → task_update → task_complete + task_history + task_list. Each update stores timestamp + status + note. Stored in .memkraft/tasks/{task_id}.json. |
| Agent Working Memory | Per-agent persistent context. agent_save / agent_load any working memory dict. Stored in .memkraft/agents/{agent_id}.json. |
| agent_inject() | The key feature. Merges agent working memory + channel context + task history into a single ready-to-inject prompt block. Use this to give sub-agents full situational awareness. |
from memkraft import MemKraft
mk = MemKraft("/path/to/memory")
# Save channel context
mk.channel_save("telegram-46291309", {
"summary": "DM with Simon",
"recent_tasks": ["vibekai deploy", "memkraft v0.5.4"],
"preferences": {"language": "ko"},
})
# Register a task
mk.task_start("deploy-001", "Deploy vibekai to production",
channel_id="telegram-46291309", agent="zeon")
mk.task_update("deploy-001", "active", "vercel build passed")
# Save agent working memory
mk.agent_save("zeon", {
"key_context": "Simon's AI assistant",
"active_tasks": ["deploy-001"],
"learned": ["always report completion", "no silence"],
})
# Inject merged context block into a sub-agent instruction
block = mk.agent_inject("zeon",
channel_id="telegram-46291309",
task_id="deploy-001")
print(block)
# ## Agent Working Memory
# - **key_context:** Simon's AI assistant
# - **active_tasks:** deploy-001
# ...
# ## Channel Context
# - **summary:** DM with Simon
# ...
# ## Task Context
# - **Task:** Deploy vibekai to production
# - **Status:** active
# - **History:**
# - [2026-04-15T...] active: vercel build passed
🤖 Autonomous Memory Management (v1.1.0)
"Memory should manage itself."
Memory tends to grow without limit — agents add entries but rarely clean up. MemKraft 1.1.0 solves this with a self-managing lifecycle.
The Problem
- Add-only pattern: agents append to MEMORY.md every session, never prune
- Silent maintenance failures: nightly cleanup crons fail without notice
- No lifecycle: every memory entry treated equally, forever
The Solution: flush → compact → digest
from memkraft import MemKraft
mk = MemKraft(base_dir="memory/")
# 1. Import existing MEMORY.md → structured MemKraft data
mk.flush("MEMORY.md")
# 2. Auto-archive old/low-priority items
result = mk.compact(max_chars=15000)
# → {"moved": 47, "freed_chars": 89400, ...}
# 3. Re-render MEMORY.md — always ≤ 15KB
mk.digest("MEMORY.md")
# → {"chars": 11700, "truncated": False}
# 4. Check memory health
health = mk.health()
# → {"status": "healthy", "total_chars": 11700, "recommendations": [...]}
Real-world result
Our MEMORY.md grew to 153KB (1,862 lines) over weeks of agent sessions.
After flush → compact → digest: 11.7KB (170 lines). 92% reduction.
Nightly self-cleanup recipe
# Watch for real-time sync
mk.watch("memory/", on_change="flush", interval=300)
# Or set a nightly schedule (requires: pip install memkraft[schedule])
mk.schedule([
lambda: mk.compact(max_chars=15000),
lambda: mk.digest("MEMORY.md"),
], cron_expr="0 23 * * *")
🐛 Debugging is Memory
Debugging insights are too valuable to lose in scrollback. MemKraft treats the entire debug process as first-class memory.
The debug-hypothesis loop - inspired by Shen Huang's scientific debugging method:
OBSERVE → HYPOTHESIZE → EXPERIMENT → CONCLUDE
   ↑                                     │
   │             rejected?               │
   ├←─────── next hypothesis ←───────────┘
   │
   └─ all rejected? → back to OBSERVE
- `mk.start_debug("bug description")` - begin a tracked session
- `mk.log_hypothesis(bug_id, "theory", "evidence")` - record each theory
- `mk.log_evidence(bug_id, hyp_id, "test result", "supports|contradicts")` - track proof
- `mk.reject_hypothesis(bug_id, hyp_id, "reason")` - mark failed approaches
- `mk.confirm_hypothesis(bug_id, hyp_id)` - lock in the root cause
- `mk.end_debug(bug_id, "resolution")` - close session, feed back to memory
Why it matters: rejected hypotheses are permanent memory. Next time you hit a similar bug, MemKraft surfaces what you already tried - no more repeating the same failed approaches.
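The anti-pattern detector reduces to a substring search over stored rejections. A toy sketch of that lookup — the session/hypothesis dict shapes are assumptions for illustration:

```python
def search_rejected(sessions, query):
    """Surface rejected hypotheses from past debug sessions whose text
    or rejection reason mentions the query."""
    q = query.lower()
    return [{"bug": s["bug"], "hypothesis": h["text"], "reason": h["reason"]}
            for s in sessions
            for h in s["rejected"]
            if q in h["text"].lower() or q in h["reason"].lower()]

past = [{"bug": "API 500 on POST /users",
         "rejected": [{"text": "Database connection timeout",
                       "reason": "DB pool healthy"}]}]
hits = search_rejected(past, "timeout")
assert hits[0]["reason"] == "DB pool healthy"
```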
## 📚 API Reference
MemKraft(base_dir=None)
Initialize the memory system. If base_dir is not provided, uses $MEMKRAFT_DIR or ./memory.
from memkraft import MemKraft
mk = MemKraft("/path/to/memory")
Core Methods
| Method | Description |
|---|---|
| init(path="") | Create memory directory structure with all subdirectories and templates. |
| track(name, entity_type="person", source="") | Start tracking an entity. Creates a live-note in live-notes/. |
| update(name, info, source="manual") | Append new information to a tracked entity's timeline. |
| brief(name, save=False, file_back=False) | Generate a meeting brief for an entity. file_back=True records the brief generation in the entity timeline. |
| promote(name, tier="core") | Change memory tier: core / recall / archival. |
| list_entities() | List all tracked entities with their types. |
Extraction & Classification
| Method | Description |
|---|---|
| extract_conversations(input_text, source="", dry_run=False, confidence="experimental", applicability="") | Extract entities and facts from text. confidence: verified / experimental / hypothesis. applicability: "When: X \| When NOT: Y". |
| detect(text, source="", dry_run=False) | Detect entities in text (EN/KR/CN/JP). |
| cognify(dry_run=False, apply=False) | Route inbox items to structured directories. Recommend-only by default. |
| extract_facts_registry(text="") | Extract numeric/date facts into a cross-domain index. |
| detect_conflicts(entity_name, new_fact, source="") | Check for contradictory facts and tag with [CONFLICT]. |
| resolve_conflicts(strategy="newest", dry_run=False) | Resolve conflicts. Strategies: newest, confidence, keep-both, prompt. |
| classify_memory_type(text) | Classify text into one of 8 memory types. |
Search
| Method | Description |
|---|---|
| search(query, fuzzy=False) | Search memory files. Returns a list of {file, score, context, line}. |
| agentic_search(query, max_hops=2, json_output=False, context="", file_back=False) | Multi-hop search with query decomposition, link traversal, and goal-weighted re-ranking. context enables Conway SMS reconstructive ranking; file_back enables the feedback loop. |
| lookup(query, json_output=False, brain_first=False, full=False) | Brain-first lookup: stop early on high-relevance hits unless full=True. |
| query(query="", level=1, recent=0, tag="", date="") | Progressive disclosure: L1=index, L2=sections, L3=full. |
| links(name) | Show backlinks to an entity ([[wiki-links]]). |
Maintenance
| Method | Description |
|---|---|
| dream(date=None, dry_run=False, resolve_conflicts=False) | Run the Dream Cycle: 6 health checks plus optional conflict resolution. |
| health_check() | Run 5 self-diagnostic assertions. Returns {pass_rate, health_score, assertions}. |
| decay(days=90, dry_run=False) | Flag stale facts. Type-aware: identity decays 10x slower than routine. |
| dedup(dry_run=False) | Find and merge duplicate facts. |
| summarize(name=None, max_length=500) | Auto-summarize bloated entity pages. |
| diff() | Show changes since the last Dream Cycle. |
| open_loops(dry_run=False) | Find unresolved items (TODO/FIXME/pending). |
| build_index() | Build the memory index at .memkraft/index.json. |
| suggest_links() | Suggest missing [[wiki-links]]. |
Logging
| Method | Description |
|---|---|
| log_event(event, tags="", importance="normal", entity="", task="", decision="") | Log a session event to JSONL. |
| log_read(date=None) | Read session events for a date. |
| retro(dry_run=False) | Generate the daily retrospective (Well / Bad / Next). |
| distill_decisions() | Scan events and notes for decision candidates. |
Debug Hypothesis Tracking
| Method | Description |
|---|---|
| start_debug(bug_description) | Start a new debug session. Returns {bug_id, file, status}. |
| log_hypothesis(bug_id, hypothesis, evidence="", status="testing") | Log a hypothesis. Auto-increments the ID (H1, H2, ...). |
| get_hypotheses(bug_id) | Get all hypotheses for a debug session. |
| reject_hypothesis(bug_id, hypothesis_id, reason="") | Reject a hypothesis. Preserved permanently for future reference. |
| confirm_hypothesis(bug_id, hypothesis_id) | Confirm a hypothesis. Feeds back into memory. |
| log_evidence(bug_id, hypothesis_id, evidence_text, result="neutral") | Log evidence. Result: supports / contradicts / neutral. |
| get_evidence(bug_id, hypothesis_id="") | Get evidence entries, optionally filtered by hypothesis. |
| end_debug(bug_id, resolution) | End the session with a resolution. Auto-feeds to memory. |
| get_debug_status(bug_id) | Get current session status and hypothesis counts. |
| debug_history(limit=10) | List past debug sessions. |
| search_debug_sessions(query) | Search past sessions by description/hypothesis/resolution. |
| search_rejected_hypotheses(query) | Search rejected hypotheses — an anti-pattern detector. |
Memory Snapshots & Time Travel
| Method | Description |
|---|---|
| snapshot(label="", include_content=False) | Create a point-in-time snapshot of all memory files. Returns {snapshot_id, timestamp, label, file_count, total_bytes, path}. |
| snapshot_list() | List all saved snapshots, newest first. |
| snapshot_diff(snapshot_a, snapshot_b="") | Compare two snapshots, or a snapshot vs the live state. Returns {added, removed, modified, unchanged_count}. |
| time_travel(query, snapshot_id="", date="") | Search memory as it was at a past snapshot, by snapshot ID or date. |
| snapshot_entity(name) | Track how a specific entity evolved across all snapshots (new/modified/unchanged/deleted). |
⌨️ CLI Reference
memkraft <command> [options]
Commands
| Command | Description |
|---|---|
| init [--path DIR] | Initialize memory structure |
| extract TEXT [--source S] [--dry-run] [--confidence C] [--when W] [--when-not W] | Auto-extract entities and facts |
| detect TEXT [--source S] [--dry-run] | Detect entities in text (EN/KR/CN/JP) |
| track NAME [--type T] [--source S] | Start tracking an entity |
| update NAME --info INFO [--source S] | Update a tracked entity |
| list | List all tracked entities |
| brief NAME [--save] [--file-back] | Generate meeting brief |
| promote NAME [--tier T] | Change memory tier (core/recall/archival) |
| search QUERY [--fuzzy] [--file-back] | Search memory files |
| agentic-search QUERY [--max-hops N] [--json] [--context C] [--file-back] | Multi-hop agentic search |
| lookup QUERY [--json] [--brain-first] [--full] | Brain-first lookup |
| query [QUERY] [--level 1\|2\|3] [--recent N] [--tag T] [--date D] | Progressive disclosure query |
| links NAME | Show backlinks to an entity |
| cognify [--dry-run] [--apply] | Process inbox into structured pages |
| log --event E [--tags T] [--importance I] [--entity E] [--task T] [--decision D] | Log session event |
| log --read [--date D] | Read session events |
| retro [--dry-run] | Daily retrospective |
| distill-decisions | Scan for decision candidates |
| health-check | Run 5 self-diagnostic assertions → health score |
| dream [--date D] [--dry-run] [--resolve-conflicts] | Run Dream Cycle (nightly maintenance) |
| resolve-conflicts [--strategy S] [--dry-run] | Resolve fact conflicts |
| decay [--days N] [--dry-run] | Flag stale facts |
| dedup [--dry-run] | Find and merge duplicates |
| summarize [NAME] [--max-length N] | Auto-summarize bloated pages |
| diff | Show changes since last Dream Cycle |
| open-loops [--dry-run] | Find unresolved items |
| index | Build memory index |
| suggest-links | Suggest missing wiki-links |
| extract-facts [TEXT] | Extract numeric/date facts |
| debug start DESC | Start a new debug session (OBSERVE) |
| debug hypothesis TEXT [--bug-id ID] [--evidence E] | Log a hypothesis (HYPOTHESIZE) |
| debug evidence TEXT [--bug-id ID] [--hypothesis-id H] [--result R] | Log evidence (supports/contradicts/neutral) |
| debug reject [--bug-id ID] [--hypothesis-id H] [--reason R] | Reject current hypothesis |
| debug confirm [--bug-id ID] [--hypothesis-id H] | Confirm current hypothesis |
| debug status [--bug-id ID] | Show debug session status |
| debug history [--limit N] | List past debug sessions |
| debug end RESOLUTION [--bug-id ID] | End debug session (CONCLUDE) |
| debug search QUERY | Search past debug sessions |
| debug search-rejected QUERY | Search rejected hypotheses (anti-patterns) |
| snapshot [--label L] [--include-content] | Create a point-in-time memory snapshot |
| snapshot-list | List all saved snapshots (newest first) |
| snapshot-diff SNAP_A [SNAP_B] | Compare two snapshots or snapshot vs live state |
| time-travel QUERY [--snapshot ID] [--date YYYY-MM-DD] | Search memory as it was at a past snapshot |
| snapshot-entity NAME | Show how an entity evolved across snapshots |
| selfupdate [--dry-run] | Self-upgrade MemKraft via pip when a newer version is on PyPI |
| doctor [--check-updates] | Health check; with --check-updates also reports PyPI version status |
## 🏗️ Architecture
Raw Input ──▶ Extract ──▶ Classify ──▶ Forge ──▶ Compound Knowledge
    ▲             │                                     │
    │         Confidence                                │
    │        Applicability                              │
    │             │                                     │
    └──── Feedback Loop ◄── Brain-first recall ◄────────┘
          maintained by Dream Cycle + Health Check
How It Works
Zero dependencies. Built entirely from Python stdlib: re for NER, difflib for fuzzy search, json for structured data, pathlib for file ops. No vector DB, no LLM calls at runtime, no framework lock-in.
Compiled Truth + Timeline. Every entity has two layers: a mutable Compiled Truth (current state) and an append-only Timeline with [Source:] tags. You get both "what we know now" and "how we got here."
Auto-Extract pipeline. Multi-stage NER: English Title Case → Korean particle stripping → Chinese surname detection (100 surnames) → Japanese surname detection (85 surnames) → fact extraction (X is/was/leads Y) → stopword filtering (806 KR/CN/JP stopwords).
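Stage one of that pipeline — English Title Case candidates filtered by a stopword list — fits in a few lines. This sketch is illustrative only: the tiny stopword set stands in for the real 806-word list, and the real pipeline adds the KR/CN/JP stages:

```python
import re

STOPWORDS = {"This", "The"}  # tiny stand-in for the real 806-word list

def extract_title_case_entities(text):
    """English Title Case entity candidates, stopword-filtered.
    Multi-word runs like 'Simon Kim' are kept as one candidate."""
    candidates = re.findall(
        r"\b(?:[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+|[A-Z][a-z]+)\b", text)
    return [c for c in candidates if c not in STOPWORDS]

ents = extract_title_case_entities(
    "This week Simon Kim met investors at Hashed in Seoul.")
assert "Simon Kim" in ents and "Hashed" in ents
assert "This" not in ents  # sentence-initial capital filtered as a stopword
```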
Goal-weighted re-ranking (Conway SMS). agentic_search("X", context="meeting prep") and agentic_search("X", context="investment analysis") return different rankings from the same data. Memory type, confidence, and applicability conditions all factor into scoring.
Feedback loop. --file-back files search results back into entity timelines. Each query makes future queries richer - compound interest for memory.
Health Check. 5 assertions: (1) source attribution, (2) no orphan facts, (3) no duplicates, (4) inbox freshness, (5) no unresolved conflicts. Returns a pass rate and letter grade (A/B/C/D).
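The pass-rate → letter-grade mapping can be sketched directly. The thresholds below are assumptions chosen to be consistent with the earlier example (pass_rate 80.0 → "A"), not MemKraft's documented cutoffs:

```python
def health_score(assertions):
    """Map named assertion results to a pass rate and letter grade."""
    passed = sum(1 for ok in assertions.values() if ok)
    pass_rate = 100.0 * passed / len(assertions)
    grade = ("A" if pass_rate >= 80 else
             "B" if pass_rate >= 60 else
             "C" if pass_rate >= 40 else "D")
    return {"pass_rate": pass_rate, "health_score": grade}

report = health_score({"source_attribution": True, "no_orphan_facts": True,
                       "no_duplicates": True, "inbox_fresh": True,
                       "no_conflicts": False})
assert report == {"pass_rate": 80.0, "health_score": "A"}
```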
Memory Directory Structure
memory/
├── .memkraft/ # Internal state (index.json, timestamps)
├── sessions/ # Structured event logs (YYYY-MM-DD.jsonl)
├── RESOLVER.md # Classification decision tree (MECE)
├── TEMPLATES.md # Page templates with tier labels
├── CONFLICTS.md # Auto-generated conflict report
├── open-loops.md # Unresolved items hub
├── fact-registry.md # Cross-domain numeric/date facts
├── YYYY-MM-DD.md # Daily notes
├── entities/ # People, companies, concepts (Tier: recall)
├── live-notes/ # Persistent tracking targets (Tier: core)
├── decisions/ # Decision records with rationale
├── originals/ # Captured verbatim - no paraphrasing
├── inbox/ # Quick capture before classification
├── tasks/ # Work-in-progress context
├── meetings/ # Briefs and notes
└── debug/ # Debug sessions (DEBUG-YYYYMMDD-HHMMSS.md)
⚖️ Comparison
| | MemKraft | Mem0 | Letta |
|---|---|---|---|
| Storage | Plain Markdown | Vector + Graph DB | DB-backed |
| Dependencies | Zero | Vector DB + API | DB + runtime |
| Offline / git-friendly | ✅ | ❌ | ❌ |
| Auto-extract (EN/KR/CN/JP) | ✅ | ✅ (LLM) | - |
| Agentic search | ✅ | - | - |
| Goal-weighted re-ranking | ✅ | - | - |
| Feedback loop | ✅ | - | - |
| Confidence levels | ✅ | - | - |
| Health check | ✅ | - | - |
| Conflict detection & resolution | ✅ | - | - |
| Source attribution | Required | - | - |
| Dream Cycle | ✅ | - | - |
| Memory tiers | ✅ | - | ✅ |
| Type-aware decay | ✅ | - | - |
| Debug hypothesis tracking | ✅ | - | - |
| Memory snapshots & time travel | ✅ | ❌ | ❌ |
| Entity evolution timeline | ✅ | ❌ | ❌ |
| Snapshot diff | ✅ | ❌ | ❌ |
| Semantic search | ❌ | ✅ | - |
| Graph memory | ❌ | ✅ | - |
| Self-editing memory | ❌ | - | ✅ |
| Cost | Free | Free tier + paid | Free |
Choose MemKraft when: you want portable, git-friendly, zero-dependency memory that works with any agent framework, offline, forever.
Choose something else when: you need semantic/vector search, graph traversal, or a full agent runtime with virtual context management.
🏆 Reproducing LongMemEval Results
MemKraft achieves 98.0% on LongMemEval (LLM-as-judge, oracle subset, 3-run semantic majority vote). Single-run performance: 96–98% (non-deterministic at inference level — sampling, not memory).
Score measured on v1.0.2; v2.6.0 is regression-free with 1168 tests passing and is API-compatible with the benchmark harness.
Comparison vs prior SOTA:
- MemKraft (v1.0.2 measurement) — 98.0% (LLM-judge, oracle 50, 3-run majority)
- MemPalace — 96.6%
- MEMENTO/MS — 90.8%
Setup
git clone https://github.com/seojoonkim/memkraft
cd memkraft
pip install -e ".[bench]"
Run
cd benchmarks/longmemeval
# Single run (96% typical)
MODEL="claude-sonnet-4-6" \
ANTHROPIC_API_KEY="your-key" \
TAG="myrun" \
python3 run.py 50 oracle
# LLM-as-judge scoring
MODEL="claude-sonnet-4-6" \
ANTHROPIC_API_KEY="your-key" \
python3 llm_judge.py
# 3-run majority vote (98% typical)
MODEL="claude-sonnet-4-6" \
ANTHROPIC_API_KEY="your-key" \
python3 run_majority_vote.py
Notes
- Dataset: LongMemEval oracle subset (50 questions)
- Judge: LLM-as-judge (claude-sonnet-4-6) — semantic matching, not string match
- 98% = 3-run semantic majority vote result
- Single run: 96–100% depending on inference sampling
- Reproducibility note: Variance comes from LLM inference sampling, not from MemKraft itself. Memory storage and retrieval are deterministic.
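The 3-run majority vote amounts to taking, per question, the verdict most runs agree on. A minimal sketch of that aggregation — note the real `run_majority_vote.py` matches answers semantically via the LLM judge, not by the boolean bookkeeping shown here:

```python
from collections import Counter

def majority(verdicts):
    """Return the verdict most runs agree on (one bool per run)."""
    return Counter(verdicts).most_common(1)[0][0]

# Per-question correctness verdicts from three independent runs
# over four hypothetical questions:
runs = [
    [True, True, False, True],   # run 1
    [True, False, False, True],  # run 2
    [True, True, True, True],    # run 3
]

# zip(*runs) regroups verdicts per question; vote within each question.
per_question = [majority(v) for v in zip(*runs)]
score = sum(per_question) / len(per_question)  # 3/4 correct -> 0.75
```

Majority voting suppresses single-run sampling noise, which is why the 3-run figure sits at the top of the single-run range.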
🆙 Staying Up To Date
MemKraft ships an opt-in self-upgrade flow so agents (and humans) never silently drift behind PyPI:
memkraft doctor --check-updates # 🟢 up to date / 🟡 update available / 🔴 PyPI unreachable
memkraft selfupdate # pip install -U memkraft when newer
memkraft selfupdate --dry-run # check only
Classic still works:
pip install -U memkraft
For agents: add memkraft doctor --check-updates to your weekly skill or heartbeat — if it reports 🟡, ask the human before running memkraft selfupdate. Never auto-upgrade without explicit consent.
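Conceptually, `doctor --check-updates` reduces to comparing the installed version against the latest release on PyPI. A network-free sketch of that comparison (the `parse` and `update_status` helpers are illustrative, not MemKraft internals; the real command does the PyPI lookup itself and `None` here stands in for an unreachable index):

```python
def parse(version):
    """Turn 'X.Y.Z' into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))

def update_status(installed, latest):
    if latest is None:
        return "🔴 PyPI unreachable"
    if parse(latest) > parse(installed):
        return "🟡 update available"
    return "🟢 up to date"

status = update_status("2.6.0", "2.7.0")  # -> "🟡 update available"
```

Tuple comparison handles multi-digit components correctly (`2.10.0` > `2.9.0`), which naive string comparison gets wrong.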
For maintainers: pushing a vX.Y.Z git tag triggers .github/workflows/release.yml, which builds, verifies (twine check), publishes to PyPI, and cuts a GitHub Release. Requires a PYPI_API_TOKEN repo secret — add it at Settings → Secrets and variables → Actions.
📝 Changelog
Highlights from recent releases. Full history: CHANGELOG.md.
v2.6.0 (current)
- `auto_tier` — recommend `core` / `recall` / `archival` from (recency, frequency, importance); `dry_run=True` by default.
- `fact_type` on `fact_add` — episodic/semantic/procedural taxonomy with silent contradiction detection.
- 1-hop graph neighbor expansion for counting-style queries (`how many`, `list all`).
- 1168 tests passing; zero breaking changes from 2.5.x.
v2.5.0
- Hybrid search upgraded to exact + IDF + fuzzy + BM25.
- Smarter ranking on multi-token queries; better recall for short answers.
v2.4.0
- Cross-entity link graph hardened; faster `link_scan`, more reliable backlinks index.
- Performance improvements on large memory directories.
v2.3.x
- Watchdog-based `memkraft watch` stability fixes.
- Doctor health hints; richer `--check-updates` output.
v2.0.0
- Major API consolidation around the register → tune → recall → decide loop.
- Bitemporal facts, tier labels, reversible decay, link graph become first-class.
- Zero breaking changes from 0.9.x — see MIGRATION.md.
v1.1.0 — Autonomous Memory Management
The `flush` → `compact` → `digest` self-managing lifecycle. See 🤖 Autonomous Memory Management above for details.
v1.0.0 — Self-Improvement Loop
`prompt_register` / `prompt_eval` / `prompt_evidence` / `convergence_check` make tuning a first-class, auditable workflow.
Earlier releases (v0.x — one-line summaries)
- v0.8.1 (2026-04-17) — `agents-hint` CLI, `examples/`, `python -m memkraft.mcp`, `memkraft watch`, `memkraft doctor`. 515 tests.
- v0.8.0 (2026-04-17) — Bitemporal Fact Layer + Memory Tier Labels + Reversible Decay/Tombstone + Cross-Entity Link Graph. 492 tests.
- v0.7.0 (2026-04-15) — multi-agent: `channel_update` modes, task delegation, `agent_handoff`, channel task listing, task cleanup. 409 tests.
- v0.5.4 (2026-04-15) — Channel Context Memory + Task Continuity Register + Agent Working Memory + `agent_inject()`. 377 tests.
- v0.5.1 (2026-04-14) — Memory Snapshots & Time Travel: `snapshot` / `snapshot_list` / `snapshot_diff` / `time_travel` / `snapshot_entity`. 328 tests.
- v0.4.1 (2026-04-13) — README: Debugging is Memory section + Appendix (Inspirations & Credits).
- v0.4.0 (2026-04-13) — Debug Hypothesis Tracking: full OBSERVE→HYPOTHESIZE→EXPERIMENT→CONCLUDE loop, 2-fail auto-switch warning, `search_rejected_hypotheses()`. 277 tests.
- v0.3.0 (2026-04-13) — Query-to-Memory Feedback Loop (`--file-back`), Confidence Levels, Memory Health Assertions, Applicability Conditions. 198 tests.
- v0.2.0 (2026-04-12) — Goal-Weighted Reconstructive Memory (Conway SMS), Dialectic Synthesis, Memory Type Classification (8 types), Type-Aware Decay. 158 tests.
- v0.1.0 (2026-04-12) — Initial release: extract, detect, decay, dedup, summarize, agentic search, entity tracking, Dream Cycle, hybrid search. Zero dependencies.
Full details for every release: CHANGELOG.md.
🤝 Contributing
PRs welcome. See CONTRIBUTING.md.
📄 License
MIT - use it however you want.
🙏 Appendix: Inspirations & Credits
MemKraft stands on the shoulders of giants. These projects and ideas shaped our approach:
| Project | Inspiration | Link |
|---|---|---|
| Karpathy auto-research | Evidence-based autonomous research methodology | Tweet |
| Shen Huang debug-hypothesis | Scientific debugging: hypothesis-driven, max 5-line experiments | GitHub · Tweet |
| Letta (MemGPT) | Tiered memory architecture (core / archival / recall) | GitHub |
| mem0 | Agent memory extraction and retrieval patterns | GitHub |
| Zep | Temporal memory decay and entity extraction | GitHub |
| MemoryWeaver | Dialectic synthesis and memory reconstruction | GitHub |
| Shubham Saboo's 6-agent system | OpenClaw-based multi-agent + SOUL.md / MEMORY.md pattern | Article |
| Karpathy llm-wiki | Wiki-style structured knowledge for LLMs | Tweet |
"If I have seen further, it is by standing on the shoulders of giants."
Thank you to all these creators for sharing their work openly. MemKraft exists because of you.