Skip to main content

Lifetime persistent memory for LLMs. A lightweight SDK for structured knowledge graphs.

Project description

Reeve

Lifetime persistent memory for LLMs — a knowledge graph that remembers everything, knows what changed, and never forgets what matters.

Reeve is a lightweight SDK for long-term AI memory. It transforms unstructured chat history into a structured Neo4j knowledge graph, ensuring that every fact you store today remains accurately retrievable a lifetime from now — in context, and evolving with you.

import os
from reeve import store_memory, query_memory

# Set your API key
os.environ["REEVE_API_KEY"] = "sk-XXXXXXXXXXXXXXXXXXXXXXXX"

# Store a memory
store_memory("I just started as a software engineer at Google in San Francisco")

# Months later...
store_memory("I moved from San Francisco to New York")

# Reeve connects the dots across time
print(query_memory("Where do I live?"))        # → "New York."
print(query_memory("Did I ever live in SF?"))  # → "Yes, before moving to New York."

The memory of your move was stored months after your initial job announcement. But when you need it, Reeve connects the dots — because it actually understands the relationship between locations, time, and your career history.

No context windows. No token limits. No forgetting. Persistent memory that outlives any single conversation — or any single year.


Why This Exists

LLMs are stateless. Every conversation starts from zero. The common workarounds all break down over time:

Approach Works for a week Breaks at scale
Chat history Grows unbounded, no structure, can't handle contradictions
Vector stores (Pinecone, ChromaDB) Flat chunks — "I moved to NYC" never invalidates "I live in SF"
RAG pipelines Retrieves similar text, not relevant knowledge
Context-window managers (MemGPT) Clever paging, but still raw text — no entity tracking, no state evolution

Ask any of them: "What changed about me since last year?" — silence.

Ask Reeve — it knows, because it actually tracks entities, states, and time.

How It Works

Reeve doesn't store text. It understands it.

Every input is parsed by an LLM into structured semantics — entities, actions, states, roles, emotions, locations — and written into a Neo4j knowledge graph with typed relationships. This isn't an embedding dump. It's a living, evolving model of everything you've told it.

What Happens When You Store a Memory

"I love playing football" becomes a structured graph:

(Episode) ──INVOLVES──▶ (Entity: you)
    │
    ├──HAS_STATE──▶ (State: hobby=football, sentiment=love) ──OF_ENTITY──▶ (Entity: you)
    └──INVOLVES──▶ (Entity: football)

"I just started at Google as a software engineer in San Francisco":

(Episode) ──INVOLVES──▶ (Entity: Google)
    │                        │
    ├──HAS_ACTION──▶ (Action: started working) ──BY_ENTITY──▶ (Entity: you)
    │                                          ──ON_ENTITY──▶ (Entity: Google)
    ├──HAS_STATE──▶ (State: role=software engineer) ──OF_ENTITY──▶ (Entity: you)
    └──AT_LOCATION──▶ (Location: San Francisco)

Months later, when someone asks "My friend invited me to play football, should I go?", Reeve retrieves the football episode by semantic similarity, sees the sentiment=love state, and answers: "Yes! You love playing football."

No explicit link was ever made between the question and the stored memory — the knowledge graph makes the connection automatically.

What Happens When Facts Change

"I moved from San Francisco to New York" — the old state is superseded, not deleted or duplicated. Full history is preserved:

(State: city=New York, active=true) ──SUPERSEDES──▶ (State: city=San Francisco, active=false)

Ask "Where do I live?" → gets the current answer: New York.
Ask "Where was I living when I joined Google?" → still knows it was San Francisco.

This is what persistent memory actually means. Not similarity search — structured, evolving knowledge with a complete audit trail.

Triple-Scoring Retrieval: Why the Right Memory Surfaces

Most systems rank by vector similarity alone. That works for a week. Over years, your "I got married" memory gets buried under thousands of newer, more recent entries.

Reeve scores every candidate across three scoring dimensions simultaneously:

score = 0.65 × vector_similarity + 0.30 × importance + 0.05 × recency
  • Vector similarity — semantic relevance via ANN search
  • Importance — LLM-assigned at ingestion (landmark life events score higher)
  • Recency — exponential decay with a 365-day half-life

The key: memories above importance 0.75 bypass recency decay entirely. Your wedding, your child's birth, a cancer diagnosis — these surface instantly no matter how old they are. Yesterday's lunch order won't outrank them.

Entity Resolution: One Identity, Many Names

"Google", "my company", "the office", "work" — all resolve to the same canonical entity through 3-layer matching:

  1. Case-insensitive exact — instant lookup
  2. Substring containment — "my company Google" → Google
  3. Embedding similarity — cosine ≥ 0.88 catches semantic equivalents

Designed to Last a Lifetime

Most memory systems assume weeks of data. Reeve is engineered for a lifetime of use:

What could go wrong How Reeve handles it
Graph grows to millions of nodes Dynamic candidate pool scales with size (2% of episodes, 50–500)
Old memories become noise Consolidation engine LLM-summarizes old low-importance episodes
Entity names fragment over years Background 3-pass deduplication merges fragmented nodes
Facts become outdated SUPERSEDES chain — current state always queryable, history preserved
Embedding models get replaced Model version tags + batch reindex utility
Disaster strikes Full JSON backup with timestamped exports

Store a memory on day one. Query it a lifetime later. It's still there, still accurate, still in context.


Features

  • Pluggable LLM providers — Ollama (local, free) or OpenAI (cloud); switch via one env var
  • Semantic extraction — LLM parses text into structured entities, actions, states, roles, emotions, importance, and location
  • Neo4j knowledge graph — Episodes, Entities, Actions, States, Roles, Locations with typed relationships
  • 3-layer entity resolution — case-insensitive → substring → embedding similarity (≥ 0.88)
  • State contradiction handling — new facts SUPERSEDE old ones with full audit trail
  • Vector ANN search — Provider-matched embeddings indexed in Neo4j
  • Lifetime recency tuning — 5% recency weight, 365-day half-life, importance floor bypass
  • Temporal queries — supports "last 3 months", "in 2024", "March 2025" natively
  • Memory consolidation — LLM-summarizes old low-importance episodes
  • Backup & export — full graph dump to timestamped JSON
  • Entity deduplication — 3-pass background scan and merge

Installation

pip install reeve

Prerequisites

  • Python 3.10+
  • Neo4j 5.x (Aura or self-hosted with vector index support)
  • LLM provider — one of:

Configuration

Copy the example environment file and fill in your credentials:

cp .env.example .env

Option A — Ollama (local, default)

# Install & start Ollama
brew install ollama
ollama serve          # keep running in a separate terminal

# Pull required models
ollama pull llama3.2:3b
ollama pull nomic-embed-text

Your .env only needs Neo4j credentials — Ollama settings default automatically:

LLM_PROVIDER=ollama
NEO4J_URI=neo4j+s://your-instance.databases.neo4j.io
NEO4J_USER=neo4j
NEO4J_PASSWORD=your-password

Option B — OpenAI (cloud)

LLM_PROVIDER=openai
OPENAI_API_KEY=sk-your-api-key
NEO4J_URI=neo4j+s://your-instance.databases.neo4j.io
NEO4J_USER=neo4j
NEO4J_PASSWORD=your-password

Create the required Neo4j vector index (run once in Neo4j Browser — set dimension to match your provider):

-- Ollama (nomic-embed-text): 768 dimensions
-- OpenAI (text-embedding-ada-002): 1536 dimensions
CREATE VECTOR INDEX episode_embedding IF NOT EXISTS
FOR (ep:Episode) ON (ep.embedding)
OPTIONS {indexConfig: {
  `vector.dimensions`: 768,
  `vector.similarity_function`: 'cosine'
}};

Verify your setup:

reeve config

Quick Start

Python API

import os
from reeve import store_memory, query_memory

# Set your API key
os.environ["REEVE_API_KEY"] = "sk-XXXXXXXXXXXXXXXXXXXXXXXX"

# Store memories
store_memory("I love playing football", speaker="ankesh")
store_memory("I work at Google as a software engineer", speaker="ankesh")

# Query — even months later, across different conversations
answer = query_memory("My friend invited me to play football, should I go?", speaker="ankesh")
print(answer)  # "Yes, you should! You love playing football."

CLI

# Interactive chat
reeve chat --speaker ankesh

# Store a single memory
reeve store "I love playing football" --speaker ankesh

# Query — connects to stored knowledge automatically
reeve query "My friend invited me to play football, should I go?" --speaker ankesh

# Show active provider, models & run health checks
reeve config

# Admin operations
reeve backup --speaker ankesh
reeve dedup
reeve consolidate --speaker ankesh
reeve reindex --old-model nomic-embed-text

Or via python -m:

python -m reeve chat --speaker ankesh

Architecture

User Input
  │
  ├── Statement ──▶ operator.py ──▶ reconciler.py ──▶ Neo4j Graph
  │                  (LLM)          (entity resolution,
  │                                   state contradiction,
  │                                   graph writing)
  │
  └── Question  ──▶ retriever.py ──▶ Neo4j Vector Index
                     (triple-scoring)    + Graph Traversal
                    ──▶ llm_interface.py ──▶ Answer
                         (LLM)

Switching Providers

Reeve auto-detects the correct embedding dimension and warns you about mismatches. Run reeve config at any time to check your setup.

Ollama → OpenAI

# 1. Update .env
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-your-api-key

# 2. Install the optional OpenAI dependency
pip install "reeve[openai]"

# 3. Verify (will warn if Neo4j index dimension doesn't match)
reeve config

4. Recreate the vector index (1536-dim for text-embedding-ada-002)

In Neo4j Browser:

DROP INDEX episode_embedding;

CREATE VECTOR INDEX episode_embedding IF NOT EXISTS

FOR (ep:Episode) ON (ep.embedding)

OPTIONS {indexConfig: {vector.dimensions: 1536, vector.similarity_function: 'cosine'}};

5. Re-embed all episodes with the new model

reeve reindex --old-model nomic-embed-text


### OpenAI → Ollama

```bash
# 1. Install & start Ollama
brew install ollama && ollama serve
ollama pull llama3.2:3b && ollama pull nomic-embed-text

# 2. Update .env
LLM_PROVIDER=ollama

# 3. Verify
reeve config

# 4. Recreate the vector index (768-dim for nomic-embed-text)
#    In Neo4j Browser:
#      DROP INDEX episode_embedding;
#      CREATE VECTOR INDEX episode_embedding IF NOT EXISTS
#      FOR (ep:Episode) ON (ep.embedding)
#      OPTIONS {indexConfig: {`vector.dimensions`: 768, `vector.similarity_function`: 'cosine'}};

# 5. Re-embed all episodes
reeve reindex --old-model text-embedding-ada-002

Using a Custom Model

Set these in .env to use any Ollama-compatible model:

OLLAMA_CHAT_MODEL=mistral       # any chat model
OLLAMA_EMBED_MODEL=mxbai-embed-large  # any embedding model
EMBEDDING_DIM=1024              # override auto-detection for unknown models

Advanced Usage

Use the lower-level API for fine-grained control:

from reeve.operator import operator_extract
from reeve.reconciler import reconcile, consolidate
from reeve.retriever import retrieve
from reeve.backup import save_backup
from reeve.entity_dedup import deduplicate_entities

Extract semantics

semantics = operator_extract("Alice presented her paper at MIT")

Write to graph

episode_id = reconcile(semantics, speaker="ankesh", raw_text="...")

Retrieve context

context = retrieve("What did Alice do?", speaker="ankesh")

Admin operations

consolidate("ankesh") save_backup(speaker="ankesh") deduplicate_entities(dry_run=False)


## Configuration Reference

| Variable | Default | Description |
|---|---|---|
| `LLM_PROVIDER` | `ollama` | LLM backend: `ollama` (local) or `openai` (cloud) |
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama server address |
| `OLLAMA_CHAT_MODEL` | `llama3.2:3b` | Ollama chat model |
| `OLLAMA_EMBED_MODEL` | `nomic-embed-text` | Ollama embedding model (768-dim) |
| `OPENAI_API_KEY` | — | OpenAI API key (required when provider=openai) |
| `OPENAI_CHAT_MODEL` | `gpt-4o` | OpenAI chat model |
| `OPENAI_EMBED_MODEL` | `text-embedding-ada-002` | OpenAI embedding model (1536-dim) |
| `EMBEDDING_DIM` | auto | Override auto-detected embedding dimension |
| `WEIGHT_SIMILARITY` | `0.65` | Vector similarity weight in scoring |
| `WEIGHT_IMPORTANCE` | `0.30` | Importance weight in scoring |
| `WEIGHT_RECENCY` | `0.05` | Recency weight (low for lifetime safety) |
| `RECENCY_HALF_LIFE_DAYS` | `365` | Days for recency to decay by 50% |
| `IMPORTANCE_FLOOR` | `0.75` | Episodes above this ignore recency decay |
| `VECTOR_CANDIDATES_MIN` | `50` | Minimum ANN candidates |
| `VECTOR_CANDIDATES_MAX` | `500` | Maximum ANN candidates |
| `VECTOR_CANDIDATES_RATIO` | `0.02` | Fraction of total episodes to search |
| `CONSOLIDATION_AGE_DAYS` | `90` | Episodes older than this are eligible |
| `CONSOLIDATION_BATCH_SIZE` | `50` | Max episodes per consolidation round |
| `CONSOLIDATION_IMPORTANCE_CAP` | `0.3` | Only consolidate below this importance |

## Lifetime Design Decisions

| Problem | Solution |
|---|---|
| Recency bias buries old memories | Recency weight = 5%, half-life = 365 days |
| Important memories forgotten | Importance floor bypass (≥ 0.75 → recency = 1.0) |
| Fixed candidate pool too small | Dynamic pool: 2% of graph, clamped 50–500 |
| Graph grows unbounded | Consolidation engine merges old trivia |
| State contradictions | SUPERSEDES chain — only latest value active |
| Entity fragmentation | 3-layer entity resolution + background dedup |
| Embedding model changes | Model version tag + batch reindex utility |
| Vector index lag | 5-minute recent-episode safety net |
| No disaster recovery | Full JSON export with incremental backup |

## Development

```bash
# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/

# Lint
ruff check .

# Type check
mypy reeve/

Tech Stack

  • Python 3.10+
  • Neo4j 5.x — Graph database with vector index
  • Ollama (default) — Local LLM & embeddings (llama3.2, nomic-embed-text)
  • OpenAI (optional) — GPT-4o (chat) + text-embedding-ada-002 (embeddings)
  • LangChain — Unified LLM/embedding wrappers
  • NumPy — Cosine similarity computation

License

MIT — see LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

reeve-0.1.9.tar.gz (87.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

reeve-0.1.9-py3-none-any.whl (85.4 kB view details)

Uploaded Python 3

File details

Details for the file reeve-0.1.9.tar.gz.

File metadata

  • Download URL: reeve-0.1.9.tar.gz
  • Upload date:
  • Size: 87.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.12

File hashes

Hashes for reeve-0.1.9.tar.gz
Algorithm Hash digest
SHA256 e903a2ac7bf2fd28781e3227f37f3e83d5a7021a3da3f8d3bf323c8d34bdca4a
MD5 23ed1b834148d85493653a3325e78356
BLAKE2b-256 a97ef43019fbb1f86ab643ba6895eac476cfd20f6331365b1b0431ee2b26d497

See more details on using hashes here.

File details

Details for the file reeve-0.1.9-py3-none-any.whl.

File metadata

  • Download URL: reeve-0.1.9-py3-none-any.whl
  • Upload date:
  • Size: 85.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.12

File hashes

Hashes for reeve-0.1.9-py3-none-any.whl
Algorithm Hash digest
SHA256 ec219810952cda2e42dcf197caccfeb65a21b425b2fa726f337d5cf6c25bcbd9
MD5 da9b3d4dcbbb8e43f2ebcf8ef66e6d4e
BLAKE2b-256 46638a46433de9f16317d4d5baa130d00c6db14aab5dff1c677a6ea73899fac8

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page