
FERNme — a user-owned, near-zero-LLM memory layer for AI agents (websites today; desktop and mobile on the roadmap).


🌿 FERNme

Fuzzy-Edged Recall Network

Agent personalization memory that models the user, not the transcript.

A user-owned, near-zero-LLM personalization memory layer for AI agents. It turns consented interactions into an inspectable model of each person's preferences, habits, communication style, and constraints — staying token-flat as it grows, while letting people see, edit, delete, and own what agents use to personalize.

License: Apache 2.0 · Python 3.10+

Cheap to write · flat to read · interpretable by design · owned by the user

fernme.dev


✨ The one-paragraph pitch

Most agent memory is written by an LLM on every turn (expensive, hallucination-prone), evaluated on question-answering (not actions), and assumes a single user. FERNme is built for the opposite world — agents that act for many people, in any domain (a sale, a booking, a resolved ticket, a completed lesson, a kept appointment — "outcome" is whatever the goal is). It starts where agents already act today — websites — and builds a user-owned personalization model the person can inspect and control. Each user is a sparse, fuzzily-weighted node in a per-site graph; edges update by a Hebbian co-occurrence rule with zero LLM calls, retrieval is spreading activation, and the prompt-facing "card" stores only deviations from a population prior. The result: per-turn cost stays flat as a profile grows for years, the user can read and correct what agents use to personalize, and the same engine assembles — only with the user's consent — into a cross-site supernode they fully control.
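To make the write rule concrete, here is a minimal sketch of a saturating Hebbian co-occurrence update on 0-9 edges. It is an illustration under assumptions, not FERNme's actual engine: the function names, `LEARN_RATE`, and the multiplicative decay factor are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

MAX_WEIGHT = 9.0   # fuzzy edge scale described in this README (0-9)
LEARN_RATE = 0.3   # hypothetical step size, not a FERNme default


def hebbian_write(edges, tags):
    """Strengthen every pair of tags that co-occur in one event.

    The increment shrinks as a weight approaches MAX_WEIGHT, so repeated
    co-occurrence saturates instead of growing without bound.
    """
    for a, b in combinations(sorted(set(tags)), 2):
        w = edges[(a, b)]
        edges[(a, b)] = w + LEARN_RATE * (MAX_WEIGHT - w)
    return edges


def decay(edges, factor=0.9):
    """Multiplicative forgetting between events (hypothetical schedule)."""
    for key in edges:
        edges[key] *= factor
    return edges


edges = defaultdict(float)
for _ in range(5):
    hebbian_write(edges, ["user:elena", "pref:oat_milk", "pref:concise"])
decay(edges)
print(round(edges[("pref:oat_milk", "user:elena")], 2))
```

Every step is plain arithmetic on a dictionary, which is the property the pitch relies on: a write costs no model call, and the weight can never leave the 0-9 range.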


What it is / is not

| FERNme is | FERNme is not |
| --- | --- |
| A user-owned personalization memory layer for agents | A generic transcript store |
| A fuzzy graph of preferences, habits, style, constraints, and outcomes | A folder of chat logs with search |
| Deterministic-first memory writes with optional LLM enrichment | An LLM extraction call on every turn |
| A bounded prompt card that stays small as memory grows | Full-history injection |
| An inspectable and editable user model | Hidden behavioral profiling |
| Consent-first, per-site, default-deny sharing | Cross-site surveillance |

🎯 Why FERNme (the strong points)

| | |
| --- | --- |
| 🪶 Zero-LLM writes | Memory updates are arithmetic on a graph: 0 LLM calls per interaction vs. ~2 for extraction-based memory. No write-time cost, no write-time hallucination. |
| 📉 Flat token cost forever | The prompt card holds ~25 tokens whether it's a visitor's first day or fifth year. A full-history baseline is 77× larger by 120 interactions. |
| 🧠 Strong in every regime | Ties a frequency counter on static recall, beats it on drift (0.72 vs. 0.13), and wins on context (0.62 vs. 0.51). Decay plus spreading activation unify stability and adaptivity. |
| 🪟 Glass-box & user-owned | Every preference is visible and editable. People fix what's wrong, delete everything, or export it. Privacy becomes a feature, not a liability. |
| 🏬 Built for outcomes | Evaluated by conversion, not QA. A simulated storefront shows a +16% relative conversion lift over a popularity baseline. |
| 🧩 User-owned supernode | Sign in across sites and your memories assemble like Lego into one profile you control: default-deny, with sensitive data walled off. Not surveillance, but its mirror image. |
| 🎚 Cost/quality dial | One engine with a memory_mode switch: the free, key-less pure mode by default, and opt-in gated/offline LLM enrichment when you need Mem0-grade nuance. You pay only for the compute you use. |
| 🔐 Verifiable & unlearnable | Every action is logged in a tamper-evident HMAC chain the user can replay to detect any alteration; forget_everywhere wipes the profile and unlearns the person from the population prior, giving a provable right to be forgotten. |
| 🛡 Injection-resistant by design | Writes are arithmetic, not LLM extraction, so page/user text can't be "talked into" becoming a belief. Tests check that injected instructions never enter memory. |
| 🧠 Private collective intelligence | New users benefit from crowd patterns on turn one (cold-start from a population prior), with k-anonymity and differential privacy to keep individuals from leaking. A network-effect moat single-user memories can't have. |
| 🗣 Style & mood memory | Learns how each person communicates (terse/verbose, formal/casual, energy) and tracks their mood with trend detection, so the agent can match tone and notice when someone's frustration is rising, in any domain. |
| 🎯 Outcome-learning, any goal | Memory is reinforced by results, not just recall. record_outcome(success) strengthens what worked and weakens what backfired, where "success" is any goal (purchase, booking, resolved ticket, completed lesson…). |
| 🔍 Explainable | Ask why(user, attr) and get the evidence (observations, good/bad outcomes, dates). No black box. |
| 🔌 Deployable plumbing (research preview; harden per SECURITY.md) | SQLite or Postgres (tested on real PG 16), REST + MCP servers, consent gating, injection-safe writes, and proactive triggers, all tested. |
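The tamper-evident log mentioned above can be illustrated with a small HMAC chain built on the Python standard library. This is a sketch of the general technique only: the entry layout, the `GENESIS` placeholder, and the function names are assumptions, and FERNme's actual log format may differ.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder predecessor MAC for the first entry


def _mac(key, prev_mac, action):
    # Bind each entry to its predecessor so edits or reordering break the chain.
    payload = json.dumps({"prev": prev_mac, "action": action}, sort_keys=True)
    return hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def append(log, key, action):
    prev_mac = log[-1]["mac"] if log else GENESIS
    log.append({"action": action, "mac": _mac(key, prev_mac, action)})


def verify(log, key):
    """Replay the chain; return False on the first mismatch."""
    prev_mac = GENESIS
    for entry in log:
        expected = _mac(key, prev_mac, entry["action"])
        if not hmac.compare_digest(expected, entry["mac"]):
            return False
        prev_mac = entry["mac"]
    return True


key = b"demo-secret"  # hypothetical key; real key management is out of scope
log = []
append(log, key, {"op": "observe", "tag": "pref:oat_milk"})
append(log, key, {"op": "edit", "tag": "pref:oat_milk", "weight": 3})
print(verify(log, key))                      # True
log[0]["action"]["tag"] = "pref:whole_milk"  # tamper with history
print(verify(log, key))                      # False
```

Because each MAC covers the previous MAC, changing or dropping an earlier entry invalidates it and everything after it, which is what lets a replay detect alteration.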

📊 Benchmarks

Honest scope: the numbers below are on synthetic or LLM-authored data, not real users. They validate the mechanism and surface failures; a real-human pilot is the pending next step. The Mem0 (LLM) head-to-head needs an API key and is not yet run.

On LLM-authored people (closest to real, agentic ingestion)

A sample of 16 of 92 third-person profiles (ChatGPT-authored), read as prose only and remembered agentically, then scored against hidden answer keys:

| metric | result |
| --- | --- |
| preference coverage vs. hidden key | 75% |
| communication style: formality | 100% |
| mood sign / mood arc | 94% / 100% |
| preference drift detected | 94% |
| injection attempts ignored | 100% |
| note → card compression | 7.3× |

(The "agent" here is an LLM reading prose, so these numbers measure agent and engine together; extraction quality in particular is the agent's, not the engine's.)

Cost, recall, and Pareto (synthetic, multi-seed)

Reproduce: python -m fernme.eval.cost_variance · ... quality · ... drift · ... context · ... ablation · ... pilot

Cost — per-turn memory tokens vs. profile size (5 seeds):

| metric | FERNme | baseline |
| --- | --- | --- |
| card size | 24.9 ± 0.5 tokens (flat) | full history grows linearly |
| at 120 interactions | | 77× ± 1.3 larger |
| LLM calls per write | 0 | ~2 (extraction memory) |

Recall quality — precision@5 vs. ground-truth preferences (5 seeds × 40 users):

| regime | 🌿 FERNme | frequency | recency |
| --- | --- | --- | --- |
| static recall | 0.74 | 0.74 | 0.47 |
| drift (taste shifts) | 0.72 | 0.13 ❌ | 0.59 |
| context (precision@3) | 0.62 | 0.51 | (blind) |

The headline: FERNme is the only method strong everywhere. Frequency can't forget (fails drift); recency is noisy (fails static). FERNme's decay + spreading activation get both.
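For readers unfamiliar with the metric in the table above, precision@k can be sketched as follows. The tag names are made up for illustration, and dividing by k (rather than by the number of items returned) is one common convention; the evaluation harness may define it differently.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that appear in the ground-truth set."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


# Hypothetical retrieval result and ground truth for one user.
ranked = ["pref:oat_milk", "pref:concise", "pref:decaf", "pref:vegan", "pref:large"]
truth = {"pref:oat_milk", "pref:concise", "pref:vegan", "pref:iced"}

print(precision_at_k(ranked, truth, 5))  # 0.6 (3 of the top 5 are correct)
print(precision_at_k(ranked, truth, 3))  # 2 of the top 3 are correct
```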

Cold-start ablation — population prior gives +0.06 precision@5 at turns 1–3, washing out by turn 10 (a real but modest, cold-start-only benefit).

Cost / quality Pareto (python -m fernme.eval.pareto) — measured FERNme recall & tokens, modeled LLM nuance & price (assumptions in-file). Per 1,000 interactions:

| strategy | quality | $/1k | vs Mem0 |
| --- | --- | --- | --- |
| FERNme-pure | 0.52 | $0.008 | 122× cheaper |
| FERNme+gated | 0.66 | $0.023 | 42× cheaper |
| FERNme+offline | 0.73 | $0.104 | 9× cheaper |
| full-history@120 | 0.82 | $0.59 (grows) | |
| Mem0-style | 0.82 | $0.95 | |

FERNme+gated/offline sit on the efficient knee: ~80–90% of the LLM-ceiling quality at 1–2 orders of magnitude less cost. (Modeled assumptions; shape is the point.)

Figure: cost/quality Pareto, with FERNme+gated and FERNme+offline on the efficient knee.

Simulated outcome pilot — fake storefront, learn-from-behavior shoppers: +16% relative conversion lift over a popularity baseline; tied at visit 1 (cold start), pulling ahead as it learns, recovering through a mid-pilot taste drift.


🎚 Memory modes (one engine, a cost/quality dial)

FERNme ships one core with a deployment-level switch — FernService(memory_mode=...). The default is free, key-less, and tested; LLM modes are opt-in and pluggable.

| mode | LLM use | cost | status |
| --- | --- | --- | --- |
| pure (default) | none | cheapest, flat | ✅ tested, key-less |
| gated | one small call only on novel free-text | ~tiny | 🧪 experimental, needs a model |
| offline | batched consolidate() enrichment, off the hot path | ~tiny, amortized | 🧪 experimental, needs a model |
  • A pluggable tagger (tagging.py) does the LLM work; you pass llm_fn, optionally constrained to a controlled vocabulary (the real consistency lever across models).
  • The hot write path stays LLM-free in every mode; gated spends a call only when the deterministic mapping finds nothing, and svc.llm_calls counts every invocation for cost transparency.
  • See the cost/quality Pareto above for where each mode lands. Honest note: the gated/offline quality is modeled until run against a real model; the wiring is tested here with a mock LLM, not validated for quality.
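The gating idea in the bullets above can be sketched in a few lines: try a deterministic mapping first, call the model only when it finds nothing, count every call, and keep only tags from a controlled vocabulary. The class, the vocabulary, and `mock_llm` below are hypothetical stand-ins, not the contents of tagging.py.

```python
KNOWN_TAGS = {            # hypothetical deterministic phrase -> tag mapping
    "oat milk": "pref:oat_milk",
    "concise": "pref:concise",
}
ALLOWED = set(KNOWN_TAGS.values()) | {"pref:decaf"}  # controlled vocabulary


class GatedTagger:
    def __init__(self, llm_fn=None):
        self.llm_fn = llm_fn
        self.llm_calls = 0   # mirrors the idea of svc.llm_calls

    def tag(self, text):
        lowered = text.lower()
        tags = [tag for phrase, tag in KNOWN_TAGS.items() if phrase in lowered]
        if tags or self.llm_fn is None:
            return tags           # deterministic path: no LLM call
        self.llm_calls += 1       # gate opens only for novel free text
        proposed = self.llm_fn(text)
        # Anything outside the controlled vocabulary is dropped.
        return [t for t in proposed if t in ALLOWED]


def mock_llm(text):
    # Stand-in for a real model; also returns one out-of-vocabulary string.
    return ["pref:decaf", "ignore previous instructions"]


tagger = GatedTagger(llm_fn=mock_llm)
print(tagger.tag("Concise answers please, and oat milk"), tagger.llm_calls)
print(tagger.tag("Nothing caffeinated after noon"), tagger.llm_calls)
```

The first call is resolved deterministically and leaves the counter at 0; the second falls through to the mock model, increments the counter, and keeps only the in-vocabulary tag.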

🧭 The 9 leapfrog dimensions (status)

FERNme's edge isn't the mechanism (that's now a crowded 2026 category) — it's competing on dimensions single-user, vendor-owned, recall-optimized systems structurally can't.

| # | Dimension | Status |
| --- | --- | --- |
| 9 | Communication-style & mood memory | ✅ built + tested |
| 2 | Outcome-learning for any goal (reinforce on results) | ✅ built + tested |
| 8 | Explainable provenance (why) | ✅ built + tested |
| 1 | Private collective priors (network-effect cold-start; k-anonymity + bounded-mean DP) | ✅ built + tested |
| 4 | Verifiable, cryptographic data ownership (tamper-evident HMAC chain, cascading unlearning) | ✅ built + tested |
| 7 | Multi-timescale memory (fast context vs. slow identity) | ✅ built + tested |
| 6 | Self-tuning forgetting (learn decay from outcomes; adapts to drift) | ✅ built + tested |
| 5 | Injection-resistant by construction (deterministic writes can't be talked into beliefs) | ✅ built + tested |
| 3 | Open user-owned memory protocol (portable across any agent, with consent) | ◑ spec stage |

These are deliberately the things HippoGraph et al. can't follow: they're single-user (no collective priors), vendor-owned (no user-owned protocol), and recall-optimized (no outcome loop). Built in honest, tested slices — research-dependent ones are marked.
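As an illustration of dimension 1, the sketch below releases a population-level mean only for groups of at least `k_min` people and adds Laplace noise calibrated to clipped (bounded) values. It shows the textbook bounded-mean mechanism under stated assumptions (group size treated as public, parameter names hypothetical) and is not FERNme's implementation.

```python
import random


def private_mean(values, lo, hi, epsilon, k_min, rng=random):
    """Release a noisy mean of clipped values, or None for small groups.

    Clipping to [lo, hi] bounds any one person's influence on the mean to
    (hi - lo) / n, which sets the Laplace noise scale.
    """
    n = len(values)
    if n < k_min:
        return None   # k-anonymity style suppression: too few contributors
    clipped = [min(max(v, lo), hi) for v in values]
    sensitivity = (hi - lo) / n
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials is Laplace(0, scale).
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    # Clamping the output is post-processing and keeps the 0-9 range.
    return min(max(sum(clipped) / n + noise, lo), hi)


weights = [3.0] * 150 + [7.0] * 50          # hypothetical per-user edge weights
print(private_mean(weights, 0.0, 9.0, epsilon=1.0, k_min=20))
print(private_mean(weights[:5], 0.0, 9.0, epsilon=1.0, k_min=20))  # None
```

Larger groups need less noise for the same epsilon, which fits the cold-start use: the prior is most useful exactly where many people contribute to it.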

🏗 Architecture

flowchart TD
    V[Visitor on a website] -->|prompt + action| API[FERNme Service]
    API --> CONSENT{consent?}
    CONSENT -->|no| STOP[blocked]
    CONSENT -->|yes| ENGINE
    subgraph ENGINE[Engine - no LLM in the write path]
      W[Hebbian write + decay] --> G[(Per-site preference graph<br/>fuzzy 0-9 edges)]
      G --> R[Spreading-activation retrieval]
      R --> CARD[Token-minimal card ~25 tok]
      PRIOR[Population prior<br/>differential encoding] --> R
    end
    CARD --> AGENT[Agent: recommend / act]
    G --> CAB[(Cabinet: raw event log)]
    API --> STORE[(SQLite or Postgres<br/>multi-tenant)]
    API --> GLASS[🪟 Glass-box editor]
    API -.user signs in.-> SUPER[User-owned Supernode<br/>cross-site, default-deny]
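The retrieval box in the flowchart combines two classic ideas: an ACT-R style base-level term that decays with time since use, and activation that spreads along weighted edges. The sketch below shows both in isolation. The graph contents, `hops`, and `damping` are assumptions for illustration, and how FERNme actually combines the two terms is not specified here.

```python
import math


def base_level(ages, d=0.5):
    """ACT-R base-level activation: ln of summed power-law-decayed uses.

    `ages` holds the time since each past use (any positive unit).
    """
    return math.log(sum(age ** -d for age in ages))


def spread(graph, sources, hops=2, damping=0.5):
    """Push activation outward from source nodes along weighted edges."""
    activation = dict(sources)
    frontier = dict(sources)
    for _ in range(hops):
        nxt = {}
        for node, energy in frontier.items():
            for neighbor, weight in graph.get(node, {}).items():
                # Edge weights are on the 0-9 scale, normalised to 0-1 here.
                passed = energy * damping * weight / 9
                nxt[neighbor] = nxt.get(neighbor, 0.0) + passed
        for node, energy in nxt.items():
            activation[node] = activation.get(node, 0.0) + energy
        frontier = nxt
    return activation


graph = {  # hypothetical per-site graph: node -> {neighbor: weight}
    "user:elena": {"pref:oat_milk": 9, "topic:coffee": 6},
    "topic:coffee": {"pref:decaf": 3},
}
scores = spread(graph, {"user:elena": 1.0})
ranked = sorted((n for n in scores if n != "user:elena"), key=scores.get, reverse=True)
print(ranked)  # ['pref:oat_milk', 'topic:coffee', 'pref:decaf']
print(base_level([1.0, 4.0]))  # recent uses count more than old ones
```

Only nodes reachable within a few hops are touched, so retrieval cost depends on local graph density rather than on total profile size.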

🧠 How FERNme works (visual walkthrough)

  • Why FERNme: adaptive local memory instead of expensive RAG/vector retrieval in the loop.
  • Core principles: near-zero-LLM, deterministic-first, Hebbian, fuzzy, memory cards, action-aware, user-owned.
  • How memory grows: new event → connect → strengthen → decay → update the card.
  • Fuzzy Hebbian graph: sparse, weighted (0–9) edges for users, preferences, topics, and goals.
  • The LLM gate: an exception, not the default; most events are handled deterministically.
  • Memory card: bounded, interpretable, token-minimal context for the agent.
  • Action-aware learning: good outcomes strengthen connections, bad outcomes weaken them.
  • Architecture: ingestion bridge → vocabulary → fuzzy graph → memory card → agent, with LLM fallback only when uncertain.
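Action-aware learning, as described in the walkthrough, can be sketched as a signed update: a good outcome pulls the weights that informed the action toward the top of the scale, and a bad outcome pulls them toward zero. The helper names and the `rate` value are hypothetical; this standalone function is not the service's record_outcome method.

```python
def reinforce(weight, success, rate=0.2, max_weight=9.0):
    """Move an edge toward max_weight after a good outcome, toward 0 after a bad one."""
    target = max_weight if success else 0.0
    return weight + rate * (target - weight)


def apply_outcome(edges, used_attrs, success):
    """Apply the update to every attribute that informed the action."""
    for attr in used_attrs:
        edges[attr] = reinforce(edges.get(attr, 0.0), success)
    return edges


edges = {"pref:oat_milk": 5.0, "pref:large": 5.0}
apply_outcome(edges, ["pref:oat_milk"], success=True)   # recommendation converted
apply_outcome(edges, ["pref:large"], success=False)     # recommendation backfired
print(edges)
```

Starting from equal weights, the attribute behind the successful action ends up above the one behind the failed action, and both stay inside the 0-9 range.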

🚀 Quickstart

pip install -e ".[dev,api]"

python run_demo.py                      # cold-start → learning → glass-box edit
python supernode_demo.py                # one person, three sites, one owned profile
pytest -q                               # 119 passing, 2 skipped

# experiments
python -m fernme.eval.drift               # FERNme beats a frequency counter when tastes change
python -m fernme.eval.pilot               # +16% simulated conversion lift

# run it live
FERNME_API_KEY=secret uvicorn fernme.api.rest:app --port 8077   # REST API (docs at /docs)
open http://localhost:8077/ui                               # glass-box memory editor
open http://localhost:8077/graph                            # your memory as a graph — focus by site / PC / phone
python -m fernme.api.mcp_server                               # MCP server for agents/Claude

🗄 Storage: defaults to ~/.fernme/fernme.db (SQLite). For production, use PostgresStore, which has the same interface and is tested against a real Postgres 16. Keep SQLite off cloud-synced folders.


Minimal API example

from fernme.service import FernService

svc = FernService(db_path=":memory:")
svc.consent("shop.example", "elena", True)

svc.observe(
    "shop.example",
    "elena",
    "chat",
    {
        "tags": ["pref:concise", "pref:oat_milk"],
        "text": "Elena prefers concise answers and oat milk.",
    },
)

print(svc.card("shop.example", "elena")["wire"])

🧱 What's inside

  • Engine — saturating Hebbian write (no LLM), ACT-R decay, spreading activation, token-minimal card.
  • Population prior — IDF cold-start; differential (deviation-only) storage is enforced by an explicit prune_to_prior pass (redundant edges read through to the prior).
  • Stores: SQLiteStore (zero-setup) and PostgresStore (tested vs real PG 16), one interface.
  • Ingestion bridge — a per-site catalog (item_id->tags) plus a controlled, namespaced vocabulary (vocabulary.py) that canonicalizes every tag (catalog, free text, or LLM) to one form (pref:, topic:, goal:, context:) so the same concept never drifts across months. Deterministic by default; gated-LLM only for novel free text. This is the product-critical layer — and the foundation a future recursive/region organization would group on.
  • The Cabinet — append-only event log with recall() for specific facts.
  • Supernode (supernode.py + auth.py) — user-owned cross-site profile, built by sign-in (verified token → opaque person id), default-deny scoped views, sensitive categories walled off.
  • Proactive triggers — due-to-reorder + fading-favorite nudges.
  • Safety — event tags treated as untrusted data: injection-pattern dropping, size/value caps.
  • Interfaces — REST (/observe /card /recall /edit /export /delete /triggers …) + MCP tools + a glass-box web UI (editor at /ui, cross-surface memory graph at /graph — one memory, focusable by site / PC / phone).
  • Governance — consent-gated everywhere; export & right-to-be-forgotten built in.
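The differential (deviation-only) storage in the population-prior bullet can be pictured as follows. This is a sketch under assumptions: the function bodies, the `tolerance` threshold, and the sample weights are illustrative, and only the name prune_to_prior comes from the list above.

```python
def prune_to_prior(profile, prior, tolerance=0.5):
    """Drop stored weights that are within `tolerance` of the population prior."""
    return {
        attr: w for attr, w in profile.items()
        if abs(w - prior.get(attr, 0.0)) > tolerance
    }


def read(profile, prior, attr):
    """Read-through lookup: personal deviation first, population prior otherwise."""
    return profile.get(attr, prior.get(attr, 0.0))


prior = {"pref:oat_milk": 2.0, "pref:concise": 4.0, "pref:decaf": 1.0}
profile = {"pref:oat_milk": 8.0, "pref:concise": 4.2, "pref:decaf": 1.0}

stored = prune_to_prior(profile, prior)
print(stored)                               # {'pref:oat_milk': 8.0}
print(read(stored, prior, "pref:concise"))  # 4.0, served by the prior
```

The trade-off is deliberate lossiness: a weight close to the prior is replaced by the prior on read, in exchange for a profile and card that hold only what makes this person different.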

🔬 How FERNme compares

FERNme is a different category from conversational memories — it is a user-owned personalization graph evaluated by actions, not a transcript store optimized for QA recall. Don't benchmark it only on LoCoMo; that's the wrong axis.

| | 🌿 FERNme | Mem0 | Zep/Graphiti | Letta | MemOS |
| --- | --- | --- | --- | --- | --- |
| Write | no LLM | LLM | LLM → KG | LLM-paged | LLM |
| Retrieval | spreading activation | vector | graph+time | OS paging | hybrid |
| Eval axis | outcomes | QA | temporal QA | long-horizon | QA |
| User-owned + glass-box | | | | | |
| Multi-tenant per-site passport | | | | | |

Leads on: write cost, interpretability, per-site user-ownership/consent. Honestly behind on: nuanced/causal preferences (LLM extraction wins), benchmark credibility, ecosystem & distribution.


⚖️ Honest status

Done & tested (119 passing, 2 skipped): engine, SQLite + real-Postgres stores, supernode + sign-in, triggers, safety, REST/MCP, glass-box UI + memory-graph view, and the full results suite above.

🆕 New, opt-in (not yet validated): resolution/temperature decay layer — implemented and unit-tested, off by default (resolution=False). It is not in the results suite above; its efficacy (vs. the current flat decay) still needs the control-vs-treatment harness before any quality claim.

🚧 Still open (genuinely needs the outside world):

  • A real-human per-site pilot — only live users close the loop a simulator can't.
  • The Mem0 (LLM) head-to-head — harness wired; run locally with OPENAI_API_KEY.
  • Embeddings for context→attribute matching; offline LLM catalog enrichment for messy inputs.

Every claim above is backed by a test or a reproducible experiment. Where a result is simulated, it says so — a simulator proves the mechanism, not real-world behavior.


📁 Layout

fernme/
  core/      graph types · fuzzy 0–9 edges · event record
  write/     event→attr mapping (no LLM) · Hebbian update · decay
  retrieve/  base-level + spreading activation · token-minimal card
  prior/     population prior · differential encoding · IDF cold-start
  store/     sqlite_store · postgres_store (one interface)
  supernode.py · auth.py · triggers.py · safety.py · service.py
  api/       rest.py (FastAPI) · mcp_server.py · web/glassbox.html · web/graph.html
  eval/      simulator · cost · quality · drift · context · ablation · pilot
tests/       119 passing, 2 skipped   ·   *_demo.py walkthroughs

📜 License & citation

Apache-2.0, © 2026 Acquilab Inc. — see LICENSE and NOTICE. Security notes in SECURITY.md; the name is a working codename (see NAMING.md). If you use FERNme in research, please cite it via CITATION.cff.

Research preview. Benchmarks are synthetic or LLM-authored unless stated otherwise.
