
🧠 zvec-memory

Local-first vector memory for AI agents.


In-process vector database with hybrid semantic + keyword search. No cloud. No API keys. Just local embeddings and fast recall.


Quick Start

# Install
pip install zvec-memory

# Ensure Ollama is running with an embedding model
ollama pull nomic-embed-text

# Store a memory
zvec-memory add "Alice prefers local-first architecture" \
    --type semantic --importance 8 --tags preferences,architecture

# Search memories
zvec-memory search "what are the architecture preferences"

What Makes This Different

Most agent memory systems are one of the following:

  • Cloud-hosted (privacy concerns, latency, cost)
  • File-based only (no semantic search, just grep)
  • Dependent on an external vector DB (Qdrant, Chroma, etc.), which means more infrastructure

zvec-memory is different:

| Feature | zvec-memory | Alternatives |
|---------|-------------|--------------|
| Architecture | In-process (Zvec) | Client-server |
| Embeddings | Local (Ollama) | Cloud APIs |
| Hybrid search | Dense + sparse + filters | Dense only |
| Setup | pip install | Docker + config |
| Privacy | 100% local | Cloud exfiltration |
| Latency | <5 ms search | Network round trip |

Architecture

┌─────────────────────────────────────────┐
│           Your AI Agent                 │
└─────────────┬───────────────────────────┘
              │
┌─────────────▼───────────────────────────┐
│      zvec-memory Engine                 │
│  ┌─────────────────────────┐            │
│  │  Hybrid Search Layer    │            │
│  │  • Dense (semantic)     │            │
│  │  • Sparse (keywords)    │            │
│  │  • Metadata filters     │            │
│  └─────────────────────────┘            │
│  ┌─────────────────────────┐            │
│  │  Embedding Layer        │            │
│  │  • nomic-embed-text     │            │
│  │  • BM25 sparse vectors  │            │
│  └─────────────────────────┘            │
│  ┌─────────────────────────┐            │
│  │  Cognitive Layer        │            │
│  │  • Decay scoring        │            │
│  │  • Access reinforcement │            │
│  │  • Deduplication        │            │
│  │  • Contradiction detect │            │
│  └─────────────────────────┘            │
└──────┬──────────────────┬───────────────┘
       │                  │
┌──────▼──────┐   ┌──────▼───────────────┐
│  Zvec       │   │  SQLite (WAL)        │
│  • HNSW     │   │  • Graph edges       │
│  • Dense    │   │  • Embedding cache   │
│  • Sparse   │   │  • FTS5 full-text    │
│  • Filters  │   │  • Version chains    │
└─────────────┘   └──────────────────────┘

Memory Taxonomy

zvec-memory uses five cognitive-inspired memory types:

| Type | Purpose | Example |
|------|---------|---------|
| episodic | Events, conversations | "[2026-02-15] Discussed Zvec integration" |
| semantic | Facts, preferences | "Alice prefers local-first architecture" |
| procedural | How-to patterns | "To check weather: use weather skill" |
| entity | People, projects, things | "Bob: full access to example-project repo" |
| core | Identity, values | "Agent values transparency and privacy" |
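As a sketch, the taxonomy maps naturally onto a small enum. The MemoryType class and parse_memory_type helper below are illustrative, not part of the zvec_memory API:

```python
from enum import Enum

class MemoryType(str, Enum):
    """The five cognitive-inspired memory types."""
    EPISODIC = "episodic"      # events, conversations
    SEMANTIC = "semantic"      # facts, preferences
    PROCEDURAL = "procedural"  # how-to patterns
    ENTITY = "entity"          # people, projects, things
    CORE = "core"              # identity, values

def parse_memory_type(value: str) -> MemoryType:
    """Validate a user-supplied --type string before storing."""
    try:
        return MemoryType(value.strip().lower())
    except ValueError:
        raise ValueError(f"unknown memory type: {value!r}") from None
```

Validating the type up front keeps bad values out of the index, where they would silently break metadata filters.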

Installation

Requirements

  • Python 3.10, 3.11, or 3.12 (Zvec requirement)
  • macOS or Linux
  • Ollama running locally

Install Ollama

# macOS
brew install ollama
ollama serve

# Linux
curl -fsSL https://ollama.com/install.sh | sh

Install zvec-memory

pip install zvec-memory

# Or from source
git clone https://github.com/ereid7/zvec-memory.git
cd zvec-memory
pip install -e ".[dev]"

Pull Embedding Model

ollama pull nomic-embed-text

Usage

CLI

# Store a memory
zvec-memory add "Memory text" --type semantic --importance 7

# Search (hybrid semantic + keyword)
zvec-memory search "query text" --topk 10

# Extract memories from text (requires llama3.2)
zvec-memory extract "conversation text" --source telegram

# Maintenance: run decay and optimize
zvec-memory maintain

# Stats
zvec-memory stats

# Reindex from files
zvec-memory reindex --source all

# Start REST API server
zvec-memory serve --port 8400

Python API

from zvec_memory import MemoryEngine

# Initialize
engine = MemoryEngine()

# Store
engine.store(
    text="Alice prefers local-first architecture",
    memory_type="semantic",
    importance=8.0,
    tags=["preferences", "architecture"]
)

# Search
results = engine.recall(
    query="cloud vs local",
    topk=5,
    memory_types=["semantic", "episodic"]
)

for r in results:
    print(f"[{r['score']:.3f}] {r['fields']['text']}")

# Context for prompts
from zvec_memory.context import get_context

context = get_context("user message here", max_tokens=500)

REST API

Start the server:

zvec-memory serve --port 8400

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /health | Health check |
| GET | /memories | List memories |
| POST | /memories | Store a memory |
| POST | /memories/extract | Extract facts from text |
| POST | /memories/ingest | Ingest a document |
| GET | /search | Search memories |
| GET | /memories/{id} | Get a memory by ID |
| GET | /memories/{id}/graph | Get graph edges for a memory |
| DELETE | /memories/{id} | Forget a memory |
| POST | /memories/{id}/restore | Restore a forgotten memory |
| POST | /maintain | Run maintenance (decay + cleanup) |
| GET | /stats | Memory statistics |
| GET | /graph/stats | Graph edge statistics |
| GET | /graph/export | Export graph as JSON |
| GET | /config | Current embedding config |

📖 Interactive API docs are available at http://localhost:8400/docs (Swagger UI) and http://localhost:8400/redoc (ReDoc) when the server is running.


Embedding Models

zvec-memory works with any embedding model via two providers:

Ollama (default, local)

# Default: nomic-embed-text (768-dim, 8K context)
ollama pull nomic-embed-text

# Recommended upgrade: qwen3-embedding (1024-dim, 32K context, #1 MTEB)
ollama pull qwen3-embedding:0.6b
export EMBED_MODEL=qwen3-embedding:0.6b

OpenAI-compatible APIs

# OpenAI
export EMBED_PROVIDER=openai
export EMBED_MODEL=text-embedding-3-small
export EMBED_API_KEY=sk-...

# Any compatible API (Voyage, Cohere, Together, vLLM, etc.)
export EMBED_PROVIDER=openai
export EMBED_URL=https://api.voyageai.com/v1
export EMBED_MODEL=voyage-3
export EMBED_API_KEY=pa-...

Switching Models

Changing embedding models requires rebuilding the vector index:

zvec-memory reindex --source all

Dimensions are auto-detected; no manual config needed.


Configuration

📖 Full reference: See docs/CONFIG.md for all 30+ environment variables and internal constants.

Key environment variables:

export ZVEC_MEMORY_PATH="~/.zvec-memory"
export OLLAMA_URL="http://127.0.0.1:11434"
export EMBED_MODEL="nomic-embed-text"
export EMBED_PROVIDER="ollama"        # or "openai" or "none"
export EMBED_URL=""                   # for openai provider
export EMBED_API_KEY=""               # for openai provider
export EMBED_DIM="0"                  # 0 = auto-detect

How It Works

1. Hybrid Search

Every query uses three search strategies simultaneously:

  • Dense vectors (768-dim): Semantic meaning via nomic-embed-text
  • Sparse vectors (BM25): Keyword matching for exact terms
  • Metadata filters: Memory type, source, participants, time ranges

Results are merged with weighted reranking (dense weighted 1.2×, sparse 0.8×).
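A minimal sketch of that weighted merge, assuming each retriever returns a dict of per-document scores (merge_ranked is an illustrative name; the 1.2/0.8 weights are the ones stated above):

```python
def merge_ranked(dense: dict[str, float], sparse: dict[str, float],
                 dense_weight: float = 1.2, sparse_weight: float = 0.8,
                 topk: int = 10) -> list[tuple[str, float]]:
    """Combine dense and sparse scores per document id, then rerank."""
    combined: dict[str, float] = {}
    for doc_id, score in dense.items():
        combined[doc_id] = combined.get(doc_id, 0.0) + dense_weight * score
    for doc_id, score in sparse.items():
        combined[doc_id] = combined.get(doc_id, 0.0) + sparse_weight * score
    # Highest combined score first, truncated to topk.
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)[:topk]
```

Weighting dense scores higher lets semantic matches win ties, while sparse scores still rescue exact-keyword hits that embeddings miss.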

2. Cognitive Decay

Memories fade naturally based on:

decay_score = (importance/10) × recency × access_factor

recency = exp(-λ × days_since_last_access)  # λ = 0.03, half-life ~23 days
access_factor = log2(access_count + 1) / 5

Unused memories fade. Frequently accessed memories stay sharp.
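The formula translates directly into a standalone Python function, using the stated λ = 0.03 (half-life ln 2 / 0.03 ≈ 23 days):

```python
import math

LAMBDA = 0.03  # decay rate; half-life = ln(2) / 0.03, roughly 23 days

def decay_score(importance: float, days_since_last_access: float,
                access_count: int) -> float:
    """decay_score = (importance/10) * recency * access_factor"""
    recency = math.exp(-LAMBDA * days_since_last_access)
    access_factor = math.log2(access_count + 1) / 5
    return (importance / 10.0) * recency * access_factor
```

For example, a memory with importance 8, last touched 23 days ago, and accessed 3 times scores roughly 0.8 × 0.5 × 0.4 ≈ 0.16.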

3. Deduplication

Before storing, we check for existing similar memories (>92% similarity). If found:

  • Reinforce the existing memory (bump access_count, update last_accessed)
  • Skip creating a duplicate

Integration Examples

Custom Agent

class AgentWithMemory:
    def __init__(self):
        self.memory = MemoryEngine()

    def chat(self, message: str) -> str:
        # Get relevant context
        context = self.memory.recall(message, topk=5)

        # Build prompt with context
        prompt = f"""Relevant memories:
{format_memories(context)}

User: {message}
Assistant:"""

        # Get LLM response
        response = llm.generate(prompt)

        # Store this exchange
        self.memory.store(
            text=f"User asked: {message}\nAssistant responded: {response}",
            memory_type="episodic",
            importance=5
        )

        return response
