Agent-loop layer for persistent memory, built on top of vstash.

merken

Status: pre-v0.1, local-first, 154 tests green. Four decision primitives, three deployment surfaces, and three loop-quality scenarios, all at a 100% pass rate and 100% cluster purity.

In one paragraph

vstash is a glass-box retrieval substrate — SQLite + sqlite-vec + FTS5 + reciprocal rank fusion, with observability and explicit limits. merken is the loop on top: when to write a memory, when to recall, when to distill raw events into semantic facts, when to tombstone the ones that are redundant. vstash stores and searches; merken reasons about what is worth storing and searching. Every decision is logged to an audit collection you can query.

       ┌──────────────────────────────────────────┐
       │  your agent (Claude Code, your code, …)  │
       └────────────────────┬─────────────────────┘
                            │ remember / recall / consolidate / forget
                            ▼
       ┌──────────────────────────────────────────┐
       │                   merken                  │
       │                                           │
       │  ┌────────────┐  ┌─────────────────────┐ │
       │  │ Decision   │  │ Memory              │ │
       │  │ primitives │◄─┤  .remember()        │ │
       │  │            │  │  .recall()          │ │
       │  │ should_    │  │  .consolidate()     │ │
       │  │  remember  │  │  .forget()          │ │
       │  │  recall    │  │  .audit()           │ │
       │  │  consoli-  │  │  .tombstones()      │ │
       │  │  date      │  │                     │ │
       │  │  forget    │  │  Context manager    │ │
       │  └──────┬─────┘  └─────────┬───────────┘ │
       │         │                   │             │
       │         └─────────┬─────────┘             │
       │                   ▼                       │
       │          merken_audit collection          │
       │          merken_tombstones collection     │
       └────────────────────┬─────────────────────┘
                            │ every storage + search call
                            ▼
       ┌──────────────────────────────────────────┐
       │  vstash (substrate — glass box)           │
       │  sqlite-vec + FTS5 + RRF + MMR dedup     │
       │  metrics, limits, integrity, contracts    │
       └──────────────────────────────────────────┘
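
For readers unfamiliar with the fusion step named in the diagram, here is a minimal sketch of reciprocal rank fusion in plain Python. It illustrates the general technique, not vstash's code: the function name, the `k=60` constant (the value commonly used in the literature), and the two sample rankings are all assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed constant, not necessarily
    the value vstash uses.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


vector_hits = ["a", "b", "c"]   # hypothetical vector-search order
keyword_hits = ["b", "d", "a"]  # hypothetical full-text order
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

A document that ranks well in both lists ("b" here) rises above one that tops only a single list, which is the property that makes rank fusion useful for combining vector and keyword search.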

The three deployment surfaces

merken ships as one library with three ways to call it. They all wrap the same Memory class.

1. Python SDK

from merken import Memory, ForgetConsolidated, PeriodicConsolidator

with Memory(
    project="my_agent",
    consolidate_decider=PeriodicConsolidator(min_events=10),
    forget_decider=ForgetConsolidated(),
) as mem:
    mem.remember("the user switched to Postgres on 2026-04-08")
    mem.remember("the analytics warehouse now runs on Postgres 16")

    result = mem.consolidate()
    print(f"{result.facts_written} fact(s) from {result.events_examined} events")

    hits = mem.recall("what database does the analytics warehouse use?")
    for h in hits[:3]:
        print(f"  • {h.text}")

    forget = mem.forget()
    print(f"{len(forget.tombstoned)} events tombstoned")

Deeper SDK docs: docs/primitives.md, docs/extending.md.

2. CLI (after `pip install -e .`)

merken remember "the user switched to Postgres on 2026-04-08"
merken recall "what database does the analytics warehouse use?"
merken consolidate
merken forget --decider consolidated
merken audit should_remember
merken tombstones
merken status
merken stats

Human output by default; pass --json anywhere for pipeable output. Every command accepts --project NAME (or ENGRAM_PROJECT env var) and --db PATH (default ~/.merken/<project>.db, deliberately separate from ~/.vstash/memory.db).
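
The project and database settings above can be pictured as a small resolution function. This is a sketch only: the function name `resolve_target` is hypothetical, and the precedence it encodes (explicit value, then environment variable, then default) is an assumption rather than something the source confirms.

```python
import os
from pathlib import Path


def resolve_target(project=None, db=None, env=None):
    """Return (project_name, db_path) for one invocation.

    Assumed precedence: explicit argument, then ENGRAM_* environment
    variable, then the documented default.
    """
    env = os.environ if env is None else env
    name = project or env.get("ENGRAM_PROJECT") or "default"
    path = db or env.get("ENGRAM_DB")
    if path is None:
        # Documented default location, separate from ~/.vstash/memory.db.
        path = Path.home() / ".merken" / f"{name}.db"
    return name, Path(path).expanduser()


name, db_path = resolve_target(project="my_agent", env={})
print(name, db_path)
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.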

Deeper CLI reference: docs/cli.md.

3. MCP server — use merken from Claude Code

# attach merken to Claude Code as an MCP server
claude mcp add merken -- merken-mcp

Then, inside any Claude Code session:

"Claude, remember that we switched to Postgres on April 8th, 2026."
"Claude, what did we decide about the analytics warehouse database?"
"Claude, consolidate what we've discussed."

Eight tools, one per CLI command: merken_remember, merken_recall, merken_consolidate, merken_forget, merken_audit, merken_tombstones, merken_status, merken_stats. Config via environment: ENGRAM_PROJECT, ENGRAM_DB.

Deeper MCP reference: docs/mcp-server.md.

The four decision primitives

Every memory system eventually has to answer four questions. merken makes each one an explicit decision with inputs, outputs, and an audit row.

Primitive	What it decides	Default implementation
`should_remember`	Does this event merit a write?	`HeuristicWriteDecider` — skip empty / too short / too long / exact duplicate via an in-process set that hydrates lazily from vstash
`should_consolidate`	Is it time to distill episodic events into semantic facts?	`PeriodicConsolidator` — fires when ≥ `min_events` unconsolidated events accumulate
`should_recall`	Which layers to query and with what budget?	`LayeredRecaller` — semantic first, episodic fallback, round-robin interleave with dedup by path
`should_forget`	Is this event safe to tombstone?	`NeverForget` — safe default, only forgets on `force=True` or with `ForgetConsolidated` opt-in

Full depth: docs/primitives.md.
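
To make the `should_remember` row concrete, here is a hedged sketch of a write heuristic in the spirit of the table's description. It is not `HeuristicWriteDecider` itself: the class name, the length thresholds, and every reason tag except `dup_exact` (which appears in the audit example in this README) are assumptions, and the lazy hydration of the duplicate set from vstash is left out.

```python
from dataclasses import dataclass, field


@dataclass
class WriteHeuristic:
    """Hypothetical stand-in for a should_remember decider."""

    min_chars: int = 8      # assumed threshold
    max_chars: int = 4000   # assumed threshold
    seen: set = field(default_factory=set)  # in-process exact-duplicate set

    def decide(self, text):
        """Return (keep, reason); the reason is what an audit row could record."""
        body = text.strip()
        if not body:
            return False, "empty"
        if len(body) < self.min_chars:
            return False, "too_short"
        if len(body) > self.max_chars:
            return False, "too_long"
        if body in self.seen:
            return False, "dup_exact"
        self.seen.add(body)
        return True, "ok"


decider = WriteHeuristic()
first = decider.decide("the user switched to Postgres on 2026-04-08")
second = decider.decide("the user switched to Postgres on 2026-04-08")
print(first, second)
```

Returning the reason alongside the verdict is what makes a decision auditable: the caller can log both without re-deriving why an event was dropped.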

Every decision writes a row to the merken_audit collection — you can always query why something was kept or dropped:

merken audit should_remember
merken audit dup_exact

Memory layers

episodic — raw events, high volume, low information density
semantic — consolidated facts derived from episodic, with derived_from provenance pointers to the source events
audit — every decision the loop made, queryable via mem.audit() / merken audit
tombstones — forgotten events with full text preserved for unforgetting, queryable via mem.tombstones() / merken tombstones

Deeper: docs/architecture.md.
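
The way recall draws on the semantic and episodic layers can be sketched as a round-robin merge with deduplication by path. This is one reading of "semantic first, episodic fallback", not `LayeredRecaller`'s code: the function name and the dict-shaped hits with a `path` key are hypothetical, and the budgets mirror the "sem 5, epi 3" defaults listed under Configuration and defaults.

```python
from itertools import zip_longest


def layered_recall(semantic, episodic, sem_budget=5, epi_budget=3):
    """Interleave two ranked hit lists, semantic first, dropping repeated paths."""
    sem = semantic[:sem_budget]
    epi = episodic[:epi_budget]
    merged, seen = [], set()
    # zip_longest pads the shorter list with None so neither layer is cut off.
    for pair in zip_longest(sem, epi):
        for hit in pair:
            if hit is None or hit["path"] in seen:
                continue
            seen.add(hit["path"])
            merged.append(hit)
    return merged


semantic_hits = [{"path": "s1"}, {"path": "s2"}]                # hypothetical
episodic_hits = [{"path": "e1"}, {"path": "s1"}, {"path": "e2"}]  # hypothetical
merged_paths = [h["path"] for h in layered_recall(semantic_hits, episodic_hits)]
print(merged_paths)
```

Because the semantic hit is visited first in each round, a path present in both layers is kept in its semantic position and the episodic copy is skipped.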

Quick start

# Clone and install
git clone https://github.com/stffns/merken && cd merken
pip install -e .

# Run the full test suite (~10s)
python3 -m pytest tests/ -q

# Try the CLI
merken remember "the user asked about postgres on 2026-04-08"
merken recall "postgres"
merken status

# Attach to Claude Code
claude mcp add merken -- merken-mcp

Tests and scenarios

154 tests across four decision primitives, three deployment surfaces, and three loop-quality scenarios.

Loop-quality scenarios live in experiments/loop_quality/ and enforce that every decider change is validated against at least one real-content fixture before landing:

1. analytics_project: synthetic control, 100%/100%/100%
2. session_2026_04_09: synthetic borderline, 100%/100%/33%
3. jay_vstash_2026_04_09_snapshot: real organic content from a live vstash, 100%/100%/80%

Public retrieval benchmarks live in experiments/retrieval/ for absolute positioning against published competitor claims. The LongMemEval runner is implemented; the full n=500 overnight run is Phase A of the roadmap.

Measurement doctrine: experiments/BENCHMARK_STRATEGY.md.
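
Since cluster purity is one of the numbers quoted in the status line, a short sketch of the conventional definition may help: the fraction of items that carry the majority label of the cluster they were assigned to. The function name and sample data are made up for illustration, and the scenario runner may compute the metric differently.

```python
from collections import Counter


def cluster_purity(assignments, labels):
    """Fraction of items that carry the majority label of their cluster."""
    if len(assignments) != len(labels) or not labels:
        raise ValueError("need two non-empty sequences of equal length")
    by_cluster = {}
    for cluster, label in zip(assignments, labels):
        by_cluster.setdefault(cluster, Counter())[label] += 1
    # Count, per cluster, how many items agree with its most common label.
    majority = sum(c.most_common(1)[0][1] for c in by_cluster.values())
    return majority / len(labels)


assignments = [0, 0, 0, 1, 1]             # hypothetical cluster ids
labels = ["db", "db", "ui", "ui", "ui"]   # hypothetical ground-truth topics
purity = cluster_purity(assignments, labels)
print(purity)
```

A purity of 1.0 means no cluster mixes topics. The metric says nothing about whether one topic was split across several clusters, so it is usually read alongside a pass-rate or recall-style check.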

Repository layout

merken/
├── README.md                    ← you are here
├── CONSTITUTION.md              ← principles, non-negotiables
├── CLAUDE.md                    ← session entry-point for Claude sessions
├── pyproject.toml               ← package config, [project.scripts]
├── merken/                      ← the package itself
│   ├── __init__.py              ← public surface
│   ├── memory.py                ← Memory class, the glue
│   ├── consolidation.py         ← Fact, clustering, consolidate pipeline
│   ├── audit.py                 ← audit + tombstone row formats
│   ├── cli.py                   ← merken CLI entry point
│   ├── mcp_server.py            ← merken-mcp MCP server entry point
│   └── policies/
│       ├── should_remember.py
│       ├── should_recall.py
│       ├── should_consolidate.py
│       ├── should_forget.py
│       └── types.py             ← shared Event / Decision / Protocol
├── docs/                        ← user-facing documentation
│   ├── architecture.md          ← the memory model and the loop in depth
│   ├── primitives.md            ← each decision primitive's semantics
│   ├── cli.md                   ← every CLI command with examples
│   ├── mcp-server.md            ← MCP tools reference + Claude Code setup
│   └── extending.md             ← write your own decider
├── experiments/                 ← the empirical bar (CONSTITUTION §9)
│   ├── BENCHMARK_STRATEGY.md    ← measurement doctrine
│   ├── loop_quality/            ← merken's design bar (scenario runner)
│   │   ├── runner.py
│   │   ├── scenario.py
│   │   ├── RESULTS.md
│   │   └── scenarios/*.json
│   └── retrieval/               ← absolute positioning (public benches)
│       └── longmemeval/
│           ├── runner.py
│           ├── dataset.py
│           └── RESULTS.md
├── tests/                       ← pytest suite, every primitive + surface
└── notes/                       ← working notes, prior art, research
    ├── prior-art.md             ← mempalace retrospective
    ├── silt.md                  ← memorial + rule derivations
    ├── research-2026-04-09.md   ← 6 papers verified + findings
    └── vstash-issue-*.md        ← upstream issue drafts

Configuration and defaults

Knob	Default	Where to change
Project name	`"default"` (or `$ENGRAM_PROJECT`)	`Memory(project=...)` / `--project` / env
DB path	`~/.merken/<project>.db`	`Memory(db=...)` / `--db` / `$ENGRAM_DB`
Collection	`"default"`	`Memory(collection=...)`
Embedding model (consolidation)	read from vstash `store_meta` at runtime; fallback to `vstash.config.EmbeddingsConfig().model`	set on the vstash side
Consolidation method	`"embedding_v1"`	`mem.consolidate(method=...)`
Embedding threshold	`0.70` (complete linkage)	`mem.consolidate(embedding_threshold=...)`
Clustering linkage	`"complete"`	`mem.consolidate(embedding_linkage=...)`
`should_remember` decider	`HeuristicWriteDecider()`	`Memory(write_decider=...)`
`should_recall` decider	`LayeredRecaller()` (sem 5, epi 3)	`Memory(recall_decider=...)`
`should_consolidate` decider	`PeriodicConsolidator(min_events=10)`	`Memory(consolidate_decider=...)`
`should_forget` decider	`NeverForget()` (safe)	`Memory(forget_decider=...)`
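
To illustrate what the threshold and linkage rows in the table above mean together, here is a small pure-Python sketch of agglomerative clustering with complete linkage. It assumes the `0.70` threshold is a floor on cosine similarity, which the source does not state, and it is not the `embedding_v1` implementation; function names and the toy vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def complete_linkage_clusters(vectors, threshold=0.70):
    """Greedily merge clusters while their LEAST similar pair stays >= threshold."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        # Complete linkage: a pair of clusters is only as similar as its worst pair.
        return min(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        score, a, b = max(
            ((linkage(clusters[a], clusters[b]), a, b)
             for a in range(len(clusters))
             for b in range(a + 1, len(clusters))),
            key=lambda t: t[0],
        )
        if score < threshold:
            break  # no remaining pair is tight enough to merge
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


toy_vectors = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]  # hypothetical embeddings
groups = complete_linkage_clusters(toy_vectors)
print(groups)
```

Complete linkage is the conservative choice: every member of a merged cluster must be similar to every other member, so a chain of loosely related events cannot snowball into a single fact.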

Deliberately isolated by default: merken's default DB is NOT your ~/.vstash/memory.db. It lives under ~/.merken/<project>.db so a buggy decider can't corrupt your main vstash store. To attach merken to a live vstash, point at it explicitly via --db / $ENGRAM_DB.

Design non-negotiables

From CONSTITUTION.md, enforced in CLAUDE.md:

Local-first. No mandatory network calls. Default install runs offline against a local embedder and a local vstash.
Glass box. Every decision writes to the audit collection.
Single process by default. No daemons, no queues, no Redis.
vstash is a hard dependency. merken never reimplements retrieval or reaches into vstash._private.
Empirical first. Every default-policy change cites a benchmark in experiments/.
Silt's rule. Before proposing an algorithm, look at the distribution of the data. See notes/silt.md.

Development status

merken is pre-v0.1. The four decision primitives are implemented and tested, the three deployment surfaces are working, and the loop-quality safety net is in place. What's next is use — putting merken in front of real agent workflows and watching what the loop does with organic content.

What's deliberately not here

Knowledge graph. CONSTITUTION §5 keeps this optional and gated.
LLM-based consolidation in the hot path. Gated on a scenario where the non-LLM loop leaves real value on the table.
Bespoke compression dialect. See notes/prior-art.md for why.
Spatial vocabulary (wings/rooms/etc.). Use vstash's existing project / collection / layer / tags fields.
MCP tool sprawl. Eight tools, one per CLI command. No more.
Public leaderboard. Premature at pre-v0.1.

What's coming

Claude Code hooks (auto-save + precompact) — depends on the MCP server being stable, which it is.
A second and third real-content scenario in loop_quality/.
LoCoMo runner under experiments/retrieval/locomo/ (Phase B of experiments/BENCHMARK_STRATEGY.md).
Overnight full n=500 LongMemEval run (Phase A).

License

MIT.

Read CONSTITUTION.md for why merken exists. Read CLAUDE.md for how to work in the repo. Read docs/ for how to use merken.

Release history

0.2.0: Apr 15, 2026
0.1.0: Apr 13, 2026 (this version)

