Skip to main content

FERNme — a user-owned, near-zero-LLM memory layer for AI agents (websites today; desktop and mobile on the roadmap).

Project description

🌿 FERNme

Fuzzy-Edged Recall Network

A user-owned, near-zero-LLM memory layer for AI agents. It learns each person from their behavior — including how they talk and feel — stays token-flat forever, and lets people see, edit, and own what's remembered. The engine is substrate-agnostic: it remembers wherever an agent acts — websites today (shopping, support, booking, healthcare, tutoring, gov), desktop and mobile next.

License: Apache 2.0 Site Python 3.10+ Tests Storage Status

Cheap to write · flat to read · interpretable by design · owned by the user

fernme.dev


✨ The one-paragraph pitch

Most agent memory is written by an LLM on every turn (expensive, hallucination-prone), evaluated on question-answering (not actions), and assumes a single user. FERNme is built for the opposite world — agents that act for many people, in any domain (a sale, a booking, a resolved ticket, a completed lesson, a kept appointment — "outcome" is whatever the goal is). It starts where agents already act today — websites — and the same user-owned memory is designed to extend to desktop and mobile. Each user is a sparse, fuzzily-weighted node in a per-site graph; edges update by a Hebbian co-occurrence rule with zero LLM calls, retrieval is spreading activation, and the prompt-facing "card" stores only deviations from a population prior. The result: per-turn cost stays flat as a profile grows for years, the user can read and correct their own memory, and the same engine assembles — only with the user's consent — into a cross-site supernode they fully control.


🎯 Why FERNme (the strong points)

🪶 Zero-LLM writes Memory updates are arithmetic on a graph — 0 LLM calls per interaction vs. ~2 for extraction-based memory. No write-time cost, no write-time hallucination.
📉 Flat token cost forever The prompt card holds ~25 tokens whether it's a visitor's first day or fifth year. A full-history baseline is 77× larger by 120 interactions.
🧠 Strong in every regime Ties a frequency counter on static recall, beats it 0.72 → 0.13 on drift, and wins on context (0.62 → 0.51). Decay + spreading activation unify stability and adaptivity.
🪟 Glass-box & user-owned Every preference is visible and editable. People fix what's wrong, delete everything, or export it. Privacy becomes a feature, not a liability.
🏬 Built for outcomes Evaluated by conversion, not QA. A simulated storefront shows +16% conversion lift vs. non-personalized recommendations.
🧩 User-owned supernode Sign in across sites → your memories assemble like Lego into one profile you control, default-deny, sensitive data walled off. Not surveillance — the mirror image of it.
🎚 Cost/quality dial One engine, a memory_mode switch: free key-less pure by default, opt-in gated/offline LLM enrichment when you need Mem0-grade nuance — pay only for the compute you use.
🔐 Verifiable & unlearnable Every action is logged in a tamper-evident HMAC chain the user can replay to detect any alteration; forget_everywhere wipes the profile and unlearns the person from the population prior — provable right-to-be-forgotten.
🛡 Injection-proof by design Writes are arithmetic, not LLM extraction, so page/user text can't be "talked into" becoming a belief — tested that injected instructions never enter memory.
🧠 Private collective intelligence New users benefit from crowd patterns on turn one (cold-start from a population prior), with k-anonymity + differential privacy so no individual leaks. A network-effect moat single-user memories can't have.
🗣 Style & mood memory Learns how each person communicates (terse/verbose, formal/casual, energy) and tracks their mood with trend detection, so the agent can match tone and notice when someone's frustration is rising — in any domain.
🎯 Outcome-learning, any goal Memory is reinforced by results — not just recall. record_outcome(success) strengthens what worked and weakens what backfired, where "success" is any goal (purchase, booking, resolved ticket, completed lesson…).
🔍 Explainable Ask why(user, attr) — get the evidence (observations + good/bad outcomes + dates). No black box.
🔌 Deployable plumbing (research preview; harden per SECURITY.md) SQLite or Postgres (tested on real PG 16), REST + MCP servers, consent gating, injection-safe writes, proactive triggers — all tested.

📊 Benchmarks

Honest scope: the numbers below are on synthetic or LLM-authored data, not real users. They validate the mechanism and surface failures; a real-human pilot is the pending next step. The Mem0 (LLM) head-to-head needs an API key and is not yet run.

On LLM-authored people (closest to real, agentic ingestion)

A sample of 16 of 92 third-person profiles (ChatGPT-authored), read as prose only and remembered agentically, then scored against hidden answer keys:

metric result
preference coverage vs. hidden key 75%
communication style — formality 100%
mood sign / mood arc 94% / 100%
preference drift detected 94%
injection attempts ignored 100%
note → card compression 7.3×

(The "agent" here is an LLM reading prose, so these reflect agent + engine together — the engine is solid; the extraction quality is the agent's.)

Cost, recall, and Pareto (synthetic, multi-seed)

Reproduce: python -m fernme.eval.cost_variance · ... quality · ... drift · ... context · ... ablation · ... pilot

Cost — per-turn memory tokens vs. profile size (5 seeds):

metric FERNme baseline
card size 24.9 ± 0.5 tokens (flat) full history grows linearly
at 120 interactions 77× ± 1.3 larger
LLM calls per write 0 ~2 (extraction memory)

Recall quality — precision@5 vs. ground-truth preferences (5 seeds × 40 users):

regime 🌿 FERNme frequency recency
static recall 0.74 0.74 0.47
drift (taste shifts) 0.72 0.13 ❌ 0.59
context (precision@3) 0.62 0.51 (blind)

The headline: FERNme is the only method strong everywhere. Frequency can't forget (fails drift); recency is noisy (fails static). FERNme's decay + spreading activation get both.

Cold-start ablation — population prior gives +0.06 precision@5 at turns 1–3, washing out by turn 10 (a real but modest, cold-start-only benefit).

Cost / quality Pareto (python -m fernme.eval.pareto) — measured FERNme recall & tokens, modeled LLM nuance & price (assumptions in-file). Per 1,000 interactions:

strategy quality $/1k vs Mem0
FERNme-pure 0.52 $0.008 122× cheaper
FERNme+gated 0.66 $0.023 42× cheaper
FERNme+offline 0.73 $0.104 9× cheaper
full-history@120 0.82 $0.59 (grows)
Mem0-style 0.82 $0.95

FERNme+gated/offline sit on the efficient knee: ~80–90% of the LLM-ceiling quality at 1–2 orders of magnitude less cost. (Modeled assumptions; shape is the point.)

Cost/quality Pareto — FERNme+gated/offline on the efficient knee

Simulated outcome pilot — fake storefront, learn-from-behavior shoppers: +16% relative conversion lift over a popularity baseline; tied at visit 1 (cold start), pulling ahead as it learns, recovering through a mid-pilot taste drift.


🎚 Memory modes (one engine, a cost/quality dial)

FERNme ships one core with a deployment-level switch — FernService(memory_mode=...). The default is free, key-less, and tested; LLM modes are opt-in and pluggable.

mode LLM use cost status
pure (default) none cheapest, flat ✅ tested, key-less
gated one small call only on novel free-text ~tiny 🧪 experimental — needs a model
offline batched consolidate() enrichment, off the hot path ~tiny, amortized 🧪 experimental — needs a model
  • A pluggable tagger (tagging.py) does the LLM work; you pass llm_fn, optionally constrained to a controlled vocabulary (the real consistency lever across models).
  • The hot write path stays LLM-free in every mode; gated spends a call only when the deterministic mapping finds nothing, and svc.llm_calls counts every invocation for cost transparency.
  • See the cost/quality Pareto above for where each mode lands. Honest note: the gated/ offline quality is modeled until run against a real model — the wiring is tested here with a mock LLM, not validated for quality.

🧭 The 9 leapfrog dimensions (status)

FERNme's edge isn't the mechanism (that's now a crowded 2026 category) — it's competing on dimensions single-user, vendor-owned, recall-optimized systems structurally can't.

# Dimension Status
9 Communication-style & mood memory ✅ built + tested
2 Outcome-learning for any goal (reinforce on results) ✅ built + tested
8 Explainable provenance (why) ✅ built + tested
1 Private collective priors (network-effect cold-start; k-anonymity + bounded-mean DP) ✅ built + tested
4 Verifiable, cryptographic data ownership (tamper-evident HMAC chain, cascading unlearning) ✅ built + tested
7 Multi-timescale memory (fast context vs. slow identity) ✅ built + tested
6 Self-tuning forgetting (learn decay from outcomes; adapts to drift) ✅ built + tested
5 Injection-resistant by construction (deterministic writes can't be talked into beliefs) ✅ built + tested
3 Open user-owned memory protocol (portable across any agent, with consent) ◑ spec stage

These are deliberately the things HippoGraph et al. can't follow: they're single-user (no collective priors), vendor-owned (no user-owned protocol), and recall-optimized (no outcome loop). Built in honest, tested slices — research-dependent ones are marked.

🏗 Architecture

flowchart TD
    V[Visitor on a website] -->|prompt + action| API[FERNme Service]
    API --> CONSENT{consent?}
    CONSENT -->|no| STOP[blocked]
    CONSENT -->|yes| ENGINE
    subgraph ENGINE[Engine - no LLM in the write path]
      W[Hebbian write + decay] --> G[(Per-site preference graph<br/>fuzzy 0-9 edges)]
      G --> R[Spreading-activation retrieval]
      R --> CARD[Token-minimal card ~25 tok]
      PRIOR[Population prior<br/>differential encoding] --> R
    end
    CARD --> AGENT[Agent: recommend / act]
    G --> CAB[(Cabinet: raw event log)]
    API --> STORE[(SQLite or Postgres<br/>multi-tenant)]
    API --> GLASS[🪟 Glass-box editor]
    API -.user signs in.-> SUPER[User-owned Supernode<br/>cross-site, default-deny]

🧠 How FERNme works (visual walkthrough)

Why FERNme Why FERNme — adaptive local memory instead of expensive RAG/vector retrieval in the loop.

Seven core principles What makes it different — near-zero-LLM, deterministic-first, Hebbian, fuzzy, memory cards, action-aware, user-owned.

How memory grows How memory grows — new event → connect → strengthen → decay → update the card (Hebbian learning).

Fuzzy Hebbian graph The fuzzy Hebbian graph — sparse, weighted (0–9) edges; nodes for users, preferences, topics, goals.

The LLM gate The LLM gate — an exception, not the default. Most events are handled deterministically; the LLM is a rare fallback when uncertain.

Memory card The memory card — a bounded, interpretable, token-minimal summary of what matters.

Action-aware learning Action-aware learning — good outcomes strengthen connections, bad outcomes weaken them.

The road ahead The road ahead — today's local memory; tomorrow's recursive organization and user-owned supernode (roadmap, not yet built).

FERNme architecture Full architecture: ingestion bridge → namespaced vocabulary → fuzzy Hebbian graph → memory card → agent, with the LLM gate only when uncertain.

🚀 Quickstart

pip install -e ".[dev,api]"

python run_demo.py                      # cold-start → learning → glass-box edit
python supernode_demo.py                # one person, three sites, one owned profile
pytest -q                               # 91 tests (engine, store, supernode, safety, auth…)

# experiments
python -m fernme.eval.drift               # FERNme beats a frequency counter when tastes change
python -m fernme.eval.pilot               # +16% simulated conversion lift

# run it live
FERNME_API_KEY=secret uvicorn fernme.api.rest:app --port 8077   # REST API (docs at /docs)
open http://localhost:8077/ui                               # glass-box memory editor
open http://localhost:8077/graph                            # your memory as a graph — focus by site / PC / phone
python -m fernme.api.mcp_server                               # MCP server for agents/Claude

🗄 Storage: defaults to ~/.fernme/fernme.db (SQLite). For production use PostgresStore — same interface, tested against a real Postgres 16. Keep SQLite off cloud-synced folders.


🧱 What's inside

  • Engine — saturating Hebbian write (no LLM), ACT-R decay, spreading activation, token-minimal card.
  • Population prior — IDF cold-start; differential (deviation-only) storage is enforced by an explicit prune_to_prior pass (redundant edges read through to the prior).
  • StoresSQLiteStore (zero-setup) and PostgresStore (tested vs real PG 16), one interface.
  • Ingestion bridge — a per-site catalog (item_id->tags) plus a controlled, namespaced vocabulary (vocabulary.py) that canonicalizes every tag (catalog, free text, or LLM) to one form (pref:, topic:, goal:, context:) so the same concept never drifts across months. Deterministic by default; gated-LLM only for novel free text. This is the product-critical layer — and the foundation a future recursive/region organization would group on.
  • The Cabinet — append-only event log with recall() for specific facts.
  • Supernode (supernode.py + auth.py) — user-owned cross-site profile, built by sign-in (verified token → opaque person id), default-deny scoped views, sensitive categories walled off.
  • Proactive triggers — due-to-reorder + fading-favorite nudges.
  • Safety — event tags treated as untrusted data: injection-pattern dropping, size/value caps.
  • Interfaces — REST (/observe /card /recall /edit /export /delete /triggers …) + MCP tools + a glass-box web UI (editor at /ui, cross-surface memory graph at /graph — one memory, focusable by site / PC / phone).
  • Governance — consent-gated everywhere; export & right-to-be-forgotten built in.

🔬 How FERNme compares

FERNme is a different category from conversational memories — it's a per-user preference graph evaluated by actions, not a QA memory. Don't benchmark it on LoCoMo; that's the wrong axis.

🌿 FERNme Mem0 Zep/Graphiti Letta MemOS
Write no LLM LLM LLM → KG LLM-paged LLM
Retrieval spreading activation vector graph+time OS paging hybrid
Eval axis outcomes QA temporal QA long-horizon QA
User-owned + glass-box
Multi-tenant per-site passport

Leads on: write cost, interpretability, per-site user-ownership/consent. Honestly behind on: nuanced/causal preferences (LLM extraction wins), benchmark credibility, ecosystem & distribution.


⚖️ Honest status

Done & tested (91 tests): engine, SQLite + real-Postgres stores, supernode + sign-in, triggers, safety, REST/MCP, glass-box UI + memory-graph view, and the full results suite above.

🚧 Still open (genuinely needs the outside world):

  • A real-human per-site pilot — only live users close the loop a simulator can't.
  • The Mem0 (LLM) head-to-head — harness wired; run locally with OPENAI_API_KEY.
  • Embeddings for context→attribute matching; offline LLM catalog enrichment for messy inputs.
  • Desktop & mobile surfaces — the engine is substrate-agnostic; web ingestion ships today, desktop/mobile adapters are on the roadmap. The user-owned supernode is the bridge that assembles them, with consent, into one cross-surface profile.

Every claim above is backed by a test or a reproducible experiment. Where a result is simulated, it says so — a simulator proves the mechanism, not real-world behavior.


📁 Layout

fernme/
  core/      graph types · fuzzy 0–9 edges · event record
  write/     event→attr mapping (no LLM) · Hebbian update · decay
  retrieve/  base-level + spreading activation · token-minimal card
  prior/     population prior · differential encoding · IDF cold-start
  store/     sqlite_store · postgres_store (one interface)
  supernode.py · auth.py · triggers.py · safety.py · service.py
  api/       rest.py (FastAPI) · mcp_server.py · web/glassbox.html · web/graph.html
  eval/      simulator · cost · quality · drift · context · ablation · pilot
tests/       91 tests   ·   *_demo.py walkthroughs

📜 License & citation

Apache-2.0, © 2026 Acquilab Inc. — see LICENSE and NOTICE. Security notes in SECURITY.md; the name is a working codename (see NAMING.md). If you use FERNme in research, please cite it via CITATION.cff.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

fernme-0.3.0.tar.gz (99.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

fernme-0.3.0-py3-none-any.whl (92.8 kB view details)

Uploaded Python 3

File details

Details for the file fernme-0.3.0.tar.gz.

File metadata

  • Download URL: fernme-0.3.0.tar.gz
  • Upload date:
  • Size: 99.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.3

File hashes

Hashes for fernme-0.3.0.tar.gz
Algorithm Hash digest
SHA256 b789534e1a9069bd5fa6705ddccc81ca13865f5a68dafa97512c0c5bdf7673cd
MD5 3b2134f26c09775bf2a22b4a7434714b
BLAKE2b-256 d97a065e0f9fb0c7b08c04feaaba6aff9f95201ffb71945d3d03e9bde7774079

See more details on using hashes here.

File details

Details for the file fernme-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: fernme-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 92.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.3

File hashes

Hashes for fernme-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 b4e0f935766716d79f95e011353cb71c6e5c2c27e4bae9699f66c703e3f3d235
MD5 86b22f17efce2a6c804d39a7e13186cd
BLAKE2b-256 fa7048f77b2cb29808d88dcbd17722eecd707bda698f1066b098713f85ebff97

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page