merken

Agent-loop layer for persistent memory, built on top of vstash.

Status: v0.1.0 on PyPI, local-first, 171 tests green. Four decision primitives, four deployment surfaces (SDK, CLI, MCP server, Claude Code hooks), five loop-quality scenarios.

In one paragraph

vstash is a glass-box retrieval substrate — SQLite + sqlite-vec + FTS5 + reciprocal rank fusion, with observability and explicit limits. merken is the loop on top: when to write a memory, when to recall, when to distill raw events into semantic facts, when to tombstone the ones that are redundant. vstash stores and searches; merken reasons about what is worth storing and searching. Every decision is logged to an audit collection you can query.
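As background, reciprocal rank fusion merges the rankings from the vector and full-text searches by summing reciprocal ranks. A generic sketch (the conventional k = 60 constant and the function below are illustrative, not vstash's actual implementation):

```python
# Generic reciprocal rank fusion: each ranked list contributes
# 1 / (k + rank) per document; documents appearing high in several
# lists win. k = 60 is the conventional constant from the RRF paper.
def rrf(ranked_lists, k=60):
    scores = {}
    for ranking in ranked_lists:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["postgres-note", "redis-note", "meeting-log"]
keyword_hits = ["meeting-log", "postgres-note"]
print(rrf([vector_hits, keyword_hits])[0])  # → postgres-note
```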

       ┌──────────────────────────────────────────┐
       │  your agent (Claude Code, your code, …)  │
       └────────────────────┬─────────────────────┘
                            │ remember / recall / consolidate / forget
                            ▼
       ┌──────────────────────────────────────────┐
       │                   merken                  │
       │                                           │
       │  ┌────────────┐  ┌─────────────────────┐ │
       │  │ Decision   │  │ Memory              │ │
       │  │ primitives │◄─┤  .remember()        │ │
       │  │            │  │  .recall()          │ │
       │  │ should_    │  │  .consolidate()     │ │
       │  │  remember  │  │  .forget()          │ │
       │  │  recall    │  │  .audit()           │ │
       │  │  consoli-  │  │  .tombstones()      │ │
       │  │  date      │  │                     │ │
       │  │  forget    │  │  Context manager    │ │
       │  └──────┬─────┘  └─────────┬───────────┘ │
       │         │                   │             │
       │         └─────────┬─────────┘             │
       │                   ▼                       │
       │          merken_audit collection          │
       │          merken_tombstones collection     │
       └────────────────────┬─────────────────────┘
                            │ every storage + search call
                            ▼
       ┌──────────────────────────────────────────┐
       │  vstash (substrate — glass box)           │
       │  sqlite-vec + FTS5 + RRF + MMR dedup     │
       │  metrics, limits, integrity, contracts    │
       └──────────────────────────────────────────┘

The deployment surfaces

merken ships as one library with three direct ways to call it (Claude Code hooks form a fourth surface on top). All of them wrap the same Memory class.

1. Python SDK

from merken import Memory, ForgetConsolidated, PeriodicConsolidator

with Memory(
    project="my_agent",
    consolidate_decider=PeriodicConsolidator(min_events=10),
    forget_decider=ForgetConsolidated(),
) as mem:
    mem.remember("the user switched to Postgres on 2026-04-08")
    mem.remember("the analytics warehouse now runs on Postgres 16")

    result = mem.consolidate()
    print(f"{result.facts_written} fact(s) from {result.events_examined} events")

    hits = mem.recall("what database does the analytics warehouse use?")
    for h in hits[:3]:
        print(f"  • {h.text}")

    forget = mem.forget()
    print(f"{len(forget.tombstoned)} events tombstoned")

Deeper SDK docs: docs/primitives.md, docs/extending.md.

2. CLI (after pip install -e .)

merken remember "the user switched to Postgres on 2026-04-08"
merken recall "what database does the analytics warehouse use?"
merken consolidate
merken forget --decider consolidated
merken audit should_remember
merken tombstones
merken status
merken stats

Human-readable output by default; pass --json anywhere for pipeable output. Every command accepts --project NAME (or the ENGRAM_PROJECT env var) and --db PATH (default ~/.merken/<project>.db, deliberately separate from ~/.vstash/memory.db).

Deeper CLI reference: docs/cli.md.

3. MCP server — use merken from Claude Code

# attach merken to Claude Code as an MCP server
claude mcp add merken -- merken-mcp

Then, inside any Claude Code session:

"Claude, remember that we switched to Postgres on April 8th, 2026."

"Claude, what did we decide about the analytics warehouse database?"

"Claude, consolidate what we've discussed."

Eight tools, one per CLI command: merken_remember, merken_recall, merken_consolidate, merken_forget, merken_audit, merken_tombstones, merken_status, merken_stats. Config via environment: ENGRAM_PROJECT, ENGRAM_DB.

Deeper MCP reference: docs/mcp-server.md.

The four decision primitives

Every memory system eventually has to answer four questions. merken makes each one an explicit decision with inputs, outputs, and an audit row.

| Primitive | What it decides | Default implementation |
|---|---|---|
| should_remember | Does this event merit a write? | HeuristicWriteDecider — skip empty / too short / too long / exact duplicate via an in-process set that hydrates lazily from vstash |
| should_consolidate | Is it time to distill episodic events into semantic facts? | PeriodicConsolidator — fires when ≥ min_events unconsolidated events accumulate |
| should_recall | Which layers to query, and with what budget? | LayeredRecaller — semantic first, episodic fallback, round-robin interleave with dedup by path. Optional temporal reranking via temporal_weight |
| should_forget | Is this event safe to tombstone? | NeverForget — safe default, only forgets on force=True or with ForgetConsolidated opt-in |

Full depth: docs/primitives.md.
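To make the should_remember heuristic concrete, here is a standalone sketch of the same rule set (skip empty / too short / too long / exact duplicate). The class and method names are illustrative assumptions; the real protocol lives in merken/policies/types.py:

```python
# Illustrative sketch of a should_remember-style heuristic.
# Thresholds and the (decision, reason) return shape are assumptions.
class SketchWriteDecider:
    """Skip empty, too-short, too-long, and exact-duplicate events."""

    def __init__(self, min_len=8, max_len=4000):
        self.min_len = min_len
        self.max_len = max_len
        # The real decider hydrates this set lazily from vstash.
        self._seen = set()

    def should_remember(self, text):
        stripped = text.strip()
        if not stripped:
            return False, "empty"
        if len(stripped) < self.min_len:
            return False, "too_short"
        if len(stripped) > self.max_len:
            return False, "too_long"
        if stripped in self._seen:
            return False, "dup_exact"
        self._seen.add(stripped)
        return True, "ok"

decider = SketchWriteDecider()
print(decider.should_remember("the user switched to Postgres"))  # → (True, 'ok')
print(decider.should_remember("the user switched to Postgres"))  # → (False, 'dup_exact')
```

The reason string is what would land in the audit row — the dup_exact filter in the audit example above is this same check.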

Every decision writes a row to the merken_audit collection — you can always query why something was kept or dropped:

merken audit should_remember
merken audit dup_exact

Memory layers

  • episodic — raw events, high volume, low information density
  • semantic — consolidated facts derived from episodic, with derived_from provenance pointers to the source events
  • audit — every decision the loop made, queryable via mem.audit() / merken audit
  • tombstones — forgotten events with full text preserved for unforgetting, queryable via mem.tombstones() / merken tombstones

Deeper: docs/architecture.md.

Temporal reranking

When facts evolve over time (e.g., "we switched from Redis to Caffeine"), queries should prefer the latest version. Enable with temporal_weight:

mem = Memory(project="my_agent", temporal_weight=0.2)
hits = mem.recall("what caching solution are we using?")
# → Caffeine (newest) ranks above Redis (oldest)

Formula: reranked_score = score × (1 + weight × recency_fraction). Multiplicative, bounded, default off (0.0). See docs/primitives.md for details.
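The formula can be exercised directly. A sketch, assuming recency_fraction is 1.0 for the newest hit and 0.0 for the oldest (merken's exact normalization is not specified here, and the tuple shape is illustrative):

```python
# Sketch of the stated formula: reranked = score * (1 + weight * recency_fraction).
# Assumption: recency_fraction is min-max normalized over the hit set.
def temporal_rerank(hits, weight=0.2):
    """hits: list of (text, score, timestamp); returns newest-boosted order."""
    if not hits or weight == 0.0:
        return sorted(hits, key=lambda h: h[1], reverse=True)
    times = [t for _, _, t in hits]
    lo, hi = min(times), max(times)
    span = (hi - lo) or 1.0  # avoid divide-by-zero when all hits are coeval
    reranked = [
        (text, score * (1 + weight * (t - lo) / span), t)
        for text, score, t in hits
    ]
    return sorted(reranked, key=lambda h: h[1], reverse=True)

hits = [
    ("we use Redis for caching", 0.80, 1_700_000_000),
    ("we switched to Caffeine", 0.78, 1_760_000_000),
]
print(temporal_rerank(hits, weight=0.2)[0][0])  # → we switched to Caffeine
```

Because the boost is multiplicative and bounded by weight, a newer hit can only overtake an older one whose raw score is close — it never drowns out a decisively better match.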

Quick start

# Clone and install
git clone https://github.com/stffns/merken && cd merken
pip install -e .

# Run the full test suite (~10s)
python3 -m pytest tests/ -q

# Try the CLI
merken remember "the user asked about postgres on 2026-04-08"
merken recall "postgres"
merken status

# Attach to Claude Code
claude mcp add merken -- merken-mcp

Tests and scenarios

  • 171 tests across four decision primitives, four deployment surfaces, and five loop_quality scenarios.
  • Loop-quality scenarios live in experiments/loop_quality/ and enforce that every decider change is validated against at least one real-content fixture before landing:
    1. analytics_project — synthetic control, 100%/100%/100%
    2. session_2026_04_09 — synthetic borderline, 100%/100%/33%
    3. jay_vstash_2026_04_09_snapshot — real organic content from a live vstash, 100%/100%/80%
  • Public retrieval benchmarks live in experiments/retrieval/ for absolute positioning against published competitor claims. The LongMemEval runner is implemented; the full n=500 run (Phase A of the roadmap) reports R@5 = 0.964.

Measurement doctrine: experiments/BENCHMARK_STRATEGY.md.

Repository layout

merken/
├── README.md                    ← you are here
├── CONSTITUTION.md              ← principles, non-negotiables
├── CLAUDE.md                    ← session entry-point for Claude sessions
├── pyproject.toml               ← package config, [project.scripts]
├── merken/                      ← the package itself
│   ├── __init__.py              ← public surface
│   ├── memory.py                ← Memory class, the glue
│   ├── consolidation.py         ← Fact, clustering, consolidate pipeline
│   ├── audit.py                 ← audit + tombstone row formats
│   ├── reranking.py             ← temporal reranking (post-retrieval)
│   ├── cli.py                   ← merken CLI entry point
│   ├── mcp_server.py            ← merken-mcp MCP server entry point
│   └── policies/
│       ├── should_remember.py
│       ├── should_recall.py
│       ├── should_consolidate.py
│       ├── should_forget.py
│       └── types.py             ← shared Event / Decision / Protocol
├── docs/                        ← user-facing documentation
│   ├── architecture.md          ← the memory model and the loop in depth
│   ├── primitives.md            ← each decision primitive's semantics
│   ├── cli.md                   ← every CLI command with examples
│   ├── mcp-server.md            ← MCP tools reference + Claude Code setup
│   └── extending.md             ← write your own decider
├── experiments/                 ← the empirical bar (CONSTITUTION §9)
│   ├── BENCHMARK_STRATEGY.md    ← measurement doctrine
│   ├── loop_quality/            ← merken's design bar (scenario runner)
│   │   ├── runner.py
│   │   ├── scenario.py
│   │   ├── RESULTS.md
│   │   └── scenarios/*.json
│   └── retrieval/               ← absolute positioning (public benches)
│       └── longmemeval/
│           ├── runner.py
│           ├── dataset.py
│           └── RESULTS.md
├── tests/                       ← pytest suite, every primitive + surface
└── notes/                       ← working notes, prior art, research
    ├── prior-art.md             ← mempalace retrospective
    ├── silt.md                  ← memorial + rule derivations
    ├── research-2026-04-09.md   ← 6 papers verified + findings
    └── vstash-issue-*.md        ← upstream issue drafts

Configuration and defaults

| Knob | Default | Where to change |
|---|---|---|
| Project name | "default" (or $ENGRAM_PROJECT) | Memory(project=...) / --project / env |
| DB path | ~/.merken/<project>.db | Memory(db=...) / --db / $ENGRAM_DB |
| Collection | "default" | Memory(collection=...) |
| Embedding model (consolidation) | read from vstash store_meta at runtime; fallback to vstash.config.EmbeddingsConfig().model | set on the vstash side |
| Consolidation method | "embedding_v1" | mem.consolidate(method=...) |
| Embedding threshold | 0.70 (complete linkage) | mem.consolidate(embedding_threshold=...) |
| Clustering linkage | "complete" | mem.consolidate(embedding_linkage=...) |
| should_remember decider | HeuristicWriteDecider() | Memory(write_decider=...) |
| should_recall decider | LayeredRecaller() (sem 5, epi 3) | Memory(recall_decider=...) |
| should_consolidate decider | PeriodicConsolidator(min_events=10) | Memory(consolidate_decider=...) |
| should_forget decider | NeverForget() (safe) | Memory(forget_decider=...) |
| Temporal reranking | 0.0 (off) | Memory(temporal_weight=...) or mem.recall(temporal_weight=...) |
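The embedding_threshold and complete-linkage knobs can be pictured with a standalone sketch. This is illustrative only (the real pipeline lives in merken/consolidation.py): under complete linkage, two clusters merge only if every cross-pair clears the similarity bar, which is why a single off-topic event keeps a cluster apart.

```python
import math

# Threshold-based complete-linkage grouping over cosine similarity.
# Greedy merging sketch, not the production algorithm.
def complete_linkage_clusters(vectors, threshold=0.70):
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)

    clusters = [[i] for i in range(len(vectors))]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # complete linkage: the *least* similar pair must clear the bar
                if all(
                    cos(vectors[a], vectors[b]) >= threshold
                    for a in clusters[i]
                    for b in clusters[j]
                ):
                    clusters[i] += clusters.pop(j)
                    merged = True
                    break
            if merged:
                break
    return clusters

vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(complete_linkage_clusters(vecs))  # → [[0, 1], [2]]
```

Lowering the threshold (as the multilingual section below suggests) simply lets more distant cross-pairs count as same-topic.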

Deliberately isolated by default: merken's default DB is NOT your ~/.vstash/memory.db. It lives under ~/.merken/<project>.db so a buggy decider can't corrupt your main vstash store. To attach merken to a live vstash, point at it explicitly via --db / $ENGRAM_DB.

Multilingual corpora

If your content is bilingual or multilingual (e.g. mixed Spanish/English), configure vstash to use a multilingual embedder before ingesting:

# vstash.toml
[embeddings]
model = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"

This is a validated recommendation, not a speculative one. On a Spanish/English probe scenario (2026-04-14), the multilingual model produced a clean signal/noise gap: same-topic cross-lingual pairs clustered at [0.754, 0.832], while cross-topic noise capped at 0.501. The default BAAI/bge-small-en-v1.5 left signal and noise overlapping on the same data.

We also ran a LongMemEval no-regression check (2026-04-14, n=100, Colab CPU): paraphrase-multilingual hit R@5=0.980 [0.950, 1.000], with the CI overlapping the bge baseline R@5=0.964 [0.948, 0.980] from the n=500 full run. No regression on English-only workloads.

If you swap the embedder, consider lowering embedding_threshold to 0.55–0.60 to exploit the wider gap. See experiments/loop_quality/RESULTS_multilingual.md for the full distribution and reasoning, and experiments/retrieval/longmemeval/RESULTS.md for the LongMemEval row.

Design non-negotiables

From CONSTITUTION.md, enforced in CLAUDE.md:

  1. Local-first. No mandatory network calls. Default install runs offline against a local embedder and a local vstash.
  2. Glass box. Every decision writes to the audit collection.
  3. Single process by default. No daemons, no queues, no Redis.
  4. vstash is a hard dependency. merken never reimplements retrieval or reaches into vstash._private.
  5. Empirical first. Every default-policy change cites a benchmark in experiments/.
  6. Silt's rule. Before proposing an algorithm, look at the distribution of the data. See notes/silt.md.

Development status

merken 0.1.0 is on PyPI. The four decision primitives are implemented and tested, four deployment surfaces are working (SDK, CLI, MCP server, Claude Code hooks), and the loop-quality safety net covers five scenarios. LongMemEval Phase A is complete (R@5 = 0.964 on n=500).

What's deliberately not here

  • Knowledge graph. CONSTITUTION §5 keeps this optional and gated.
  • LLM-based consolidation in the hot path. Gated on a scenario where the non-LLM loop leaves real value on the table.
  • Bespoke compression dialect. See notes/prior-art.md for why.
  • Spatial vocabulary (wings/rooms/etc.). Use vstash's existing project / collection / layer / tags fields.
  • MCP tool sprawl. Eight tools, one per CLI command. No more.
  • Public leaderboard. Premature at v0.1.

What's coming

  • Claude Code hooks hardening: error handling, threshold tuning, integration tests for the hook scripts.
  • Additional loop_quality/ scenarios from real work: perf migration notes, MedLocal hackathon logs, Kafka meeting threads, daily reviews.
  • LoCoMo runner under experiments/retrieval/locomo/ (Phase B of experiments/BENCHMARK_STRATEGY.md).
  • LMEB episodic/semantic/procedural evaluation (20 sub-datasets).
  • Multilingual calibration (Spanish/English mixed content).

License

MIT.


Read CONSTITUTION.md for why merken exists. Read CLAUDE.md for how to work in the repo. Read docs/ for how to use merken.
