
🧠 zvec-memory

Local-first vector memory for AI agents.


In-process vector database with hybrid semantic + keyword search. No cloud. No API keys. Just local embeddings and fast recall.


Quick Start

# Install
pip install zvec-memory

# Ensure Ollama is running with an embedding model
ollama pull nomic-embed-text

# Store a memory
zvec-memory add "Alice prefers local-first architecture" \
    --type semantic --importance 8 --tags preferences,architecture

# Search memories
zvec-memory search "what are the architecture preferences"

What Makes This Different

Most agent memory systems are one of the following:

  • Cloud-hosted (privacy concerns, latency, cost)
  • File-based only (no semantic search, just grep)
  • Dependent on an external vector DB (Qdrant, Chroma, etc.), which means more infrastructure

zvec-memory is different:

| Feature | zvec-memory | Alternatives |
|---------|-------------|--------------|
| Architecture | In-process (Zvec) | Client-server |
| Embeddings | Local (Ollama) | Cloud APIs |
| Hybrid search | Dense + sparse + filters | Dense only |
| Setup | pip install | Docker + config |
| Privacy | 100% local | Cloud exfiltration |
| Latency | <5 ms search | Network round trip |

Architecture

┌─────────────────────────────────────────┐
│           Your AI Agent                 │
└─────────────┬───────────────────────────┘
              │
┌─────────────▼───────────────────────────┐
│      zvec-memory Engine                 │
│  ┌─────────────────────────┐            │
│  │  Hybrid Search Layer    │            │
│  │  • Dense (semantic)     │            │
│  │  • Sparse (keywords)    │            │
│  │  • Metadata filters     │            │
│  └─────────────────────────┘            │
│  ┌─────────────────────────┐            │
│  │  Embedding Layer        │            │
│  │  • nomic-embed-text     │            │
│  │  • BM25 sparse vectors  │            │
│  └─────────────────────────┘            │
│  ┌─────────────────────────┐            │
│  │  Cognitive Layer        │            │
│  │  • Decay scoring        │            │
│  │  • Access reinforcement │            │
│  │  • Deduplication        │            │
│  │  • Contradiction detect │            │
│  └─────────────────────────┘            │
└──────┬──────────────────┬───────────────┘
       │                  │
┌──────▼──────┐   ┌──────▼───────────────┐
│  Zvec       │   │  SQLite (WAL)        │
│  • HNSW     │   │  • Graph edges       │
│  • Dense    │   │  • Embedding cache   │
│  • Sparse   │   │  • FTS5 full-text    │
│  • Filters  │   │  • Version chains    │
└─────────────┘   └──────────────────────┘

Memory Taxonomy

zvec-memory uses five cognitive-inspired memory types:

| Type | Purpose | Example |
|------|---------|---------|
| episodic | Events, conversations | "[2026-02-15] Discussed Zvec integration" |
| semantic | Facts, preferences | "Alice prefers local-first architecture" |
| procedural | How-to patterns | "To check weather: use weather skill" |
| entity | People, projects, things | "Bob: full access to example-project repo" |
| core | Identity, values | "Agent values transparency and privacy" |
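As a sketch, the taxonomy maps naturally onto a small enum. The MemoryType class and parse_memory_type helper below are illustrative, not part of the zvec_memory API:

```python
from enum import Enum

class MemoryType(str, Enum):
    """The five cognitive-inspired memory types."""
    EPISODIC = "episodic"      # events, conversations
    SEMANTIC = "semantic"      # facts, preferences
    PROCEDURAL = "procedural"  # how-to patterns
    ENTITY = "entity"          # people, projects, things
    CORE = "core"              # identity, values

def parse_memory_type(value: str) -> MemoryType:
    """Validate a user-supplied --type string before storing."""
    try:
        return MemoryType(value.strip().lower())
    except ValueError:
        raise ValueError(f"unknown memory type: {value!r}") from None
```

Validating the type up front keeps bad values out of the index, where they would silently break metadata filters.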

Installation

Requirements

  • Python 3.10, 3.11, or 3.12 (Zvec requirement)
  • macOS or Linux
  • Ollama running locally

Install Ollama

# macOS
brew install ollama
ollama serve

# Linux
curl -fsSL https://ollama.com/install.sh | sh

Install zvec-memory

pip install zvec-memory

# Or from source
git clone https://github.com/ereid7/zvec-memory.git
cd zvec-memory
pip install -e ".[dev]"

Pull Embedding Model

ollama pull nomic-embed-text

Usage

CLI

# Store a memory
zvec-memory add "Memory text" --type semantic --importance 7

# Search (hybrid semantic + keyword)
zvec-memory search "query text" --topk 10

# Extract memories from text (requires llama3.2)
zvec-memory extract "conversation text" --source telegram

# Maintenance: run decay and optimize
zvec-memory maintain

# Stats
zvec-memory stats

# Reindex from files
zvec-memory reindex --source all

# Start REST API server
zvec-memory serve --port 8400

Python API

from zvec_memory import MemoryEngine

# Initialize
engine = MemoryEngine()

# Store
engine.store(
    text="Alice prefers local-first architecture",
    memory_type="semantic",
    importance=8.0,
    tags=["preferences", "architecture"]
)

# Search
results = engine.recall(
    query="cloud vs local",
    topk=5,
    memory_types=["semantic", "episodic"]
)

for r in results:
    print(f"[{r['score']:.3f}] {r['fields']['text']}")

# Context for prompts
from zvec_memory.context import get_context

context = get_context("user message here", max_tokens=500)

REST API

Start the server:

zvec-memory serve --port 8400

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /health | Health check |
| GET | /memories | List memories |
| POST | /memories | Store a memory |
| POST | /memories/extract | Extract facts from text |
| POST | /memories/ingest | Ingest a document |
| GET | /search | Search memories |
| GET | /memories/{id} | Get a memory by ID |
| GET | /memories/{id}/graph | Get graph edges for a memory |
| DELETE | /memories/{id} | Forget a memory |
| POST | /memories/{id}/restore | Restore a forgotten memory |
| POST | /maintain | Run maintenance (decay + cleanup) |
| GET | /stats | Memory statistics |
| GET | /graph/stats | Graph edge statistics |
| GET | /graph/export | Export graph as JSON |
| GET | /config | Current embedding config |

📖 Interactive API docs are available at http://localhost:8400/docs (Swagger UI) and http://localhost:8400/redoc (ReDoc) when the server is running.


Embedding Models

zvec-memory works with any embedding model via two providers:

Ollama (default, local)

# Default: nomic-embed-text (768-dim, 8K context)
ollama pull nomic-embed-text

# Recommended upgrade: qwen3-embedding (1024-dim, 32K context, #1 MTEB)
ollama pull qwen3-embedding:0.6b
export EMBED_MODEL=qwen3-embedding:0.6b

OpenAI-compatible APIs

# OpenAI
export EMBED_PROVIDER=openai
export EMBED_MODEL=text-embedding-3-small
export EMBED_API_KEY=sk-...

# Any compatible API (Voyage, Cohere, Together, vLLM, etc.)
export EMBED_PROVIDER=openai
export EMBED_URL=https://api.voyageai.com/v1
export EMBED_MODEL=voyage-3
export EMBED_API_KEY=pa-...

Switching Models

Changing embedding models requires rebuilding the vector index:

zvec-memory reindex --source all

Dimensions are auto-detected; no manual config needed.


Configuration

📖 Full reference: See docs/CONFIG.md for all 30+ environment variables and internal constants.

Key environment variables:

export ZVEC_MEMORY_PATH="~/.zvec-memory"
export OLLAMA_URL="http://127.0.0.1:11434"
export EMBED_MODEL="nomic-embed-text"
export EMBED_PROVIDER="ollama"        # or "openai" or "none"
export EMBED_URL=""                   # for openai provider
export EMBED_API_KEY=""               # for openai provider
export EMBED_DIM="0"                  # 0 = auto-detect

How It Works

1. Hybrid Search

Every query uses three search strategies simultaneously:

  • Dense vectors (768-dim): Semantic meaning via nomic-embed-text
  • Sparse vectors (BM25): Keyword matching for exact terms
  • Metadata filters: Memory type, source, participants, time ranges

Results are merged with weighted reranking (dense weighted 1.2×, sparse 0.8×).
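A minimal sketch of that weighted merge, assuming each retriever returns a dict of per-document scores (merge_ranked is an illustrative name; the 1.2/0.8 weights are the ones stated above):

```python
def merge_ranked(dense: dict[str, float], sparse: dict[str, float],
                 dense_weight: float = 1.2, sparse_weight: float = 0.8,
                 topk: int = 10) -> list[tuple[str, float]]:
    """Combine dense and sparse scores per document id, then rerank."""
    combined: dict[str, float] = {}
    for doc_id, score in dense.items():
        combined[doc_id] = combined.get(doc_id, 0.0) + dense_weight * score
    for doc_id, score in sparse.items():
        combined[doc_id] = combined.get(doc_id, 0.0) + sparse_weight * score
    # Highest combined score first, truncated to topk.
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)[:topk]
```

Weighting dense scores higher lets semantic matches win ties, while sparse scores still rescue exact-keyword hits that embeddings miss.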

2. Cognitive Decay

Memories fade naturally based on:

decay_score = (importance/10) × recency × access_factor

recency = exp(-λ × days_since_last_access)  # λ = 0.03, half-life ~23 days
access_factor = log2(access_count + 1) / 5

Unused memories fade. Frequently accessed memories stay sharp.
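The formula translates directly into a standalone Python function, using the stated λ = 0.03 (half-life ln 2 / 0.03 ≈ 23 days):

```python
import math

LAMBDA = 0.03  # decay rate; half-life = ln(2) / 0.03, roughly 23 days

def decay_score(importance: float, days_since_last_access: float,
                access_count: int) -> float:
    """decay_score = (importance/10) * recency * access_factor"""
    recency = math.exp(-LAMBDA * days_since_last_access)
    access_factor = math.log2(access_count + 1) / 5
    return (importance / 10.0) * recency * access_factor
```

For example, a memory with importance 8, last touched 23 days ago, and accessed 3 times scores roughly 0.8 × 0.5 × 0.4 ≈ 0.16.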

3. Deduplication

Before storing, we check for existing similar memories (>92% similarity). If found:

  • Reinforce the existing memory (bump access_count, update last_accessed)
  • Skip creating a duplicate

Integration Examples

Custom Agent

class AgentWithMemory:
    def __init__(self):
        self.memory = MemoryEngine()

    def chat(self, message: str) -> str:
        # Get relevant context
        context = self.memory.recall(message, topk=5)

        # Build prompt with context
        prompt = f"""Relevant memories:
{format_memories(context)}

User: {message}
Assistant:"""

        # Get LLM response
        response = llm.generate(prompt)

        # Store this exchange
        self.memory.store(
            text=f"User asked: {message}\nAssistant responded: {response}",
            memory_type="episodic",
            importance=5
        )

        return response
