# 🧠 zvec-memory
Local-first vector memory for AI agents.
In-process vector database with hybrid semantic + keyword search. No cloud. No API keys. Just local embeddings and fast recall.
## Quick Start

```bash
# Install
pip install zvec-memory

# Ensure Ollama is running with an embedding model
ollama pull nomic-embed-text

# Store a memory
zvec-memory add "Alice prefers local-first architecture" \
  --type semantic --importance 8 --tags preferences,architecture

# Search memories
zvec-memory search "what are the architecture preferences"
```
## What Makes This Different

Most agent memory systems are either:

- Cloud-hosted (privacy concerns, latency, cost)
- File-based only (no semantic search, just grep)
- Dependent on external vector DBs (Qdrant, Chroma, etc.), which means more infrastructure

zvec-memory is different:

| Feature | zvec-memory | Alternatives |
|---|---|---|
| Architecture | In-process (Zvec) | Client-server |
| Embeddings | Local (Ollama) | Cloud APIs |
| Hybrid search | Dense + Sparse + Filters | Dense only |
| Setup | pip install | Docker + config |
| Privacy | 100% local | Cloud exfiltration |
| Latency | <5ms search | Network roundtrip |
## Architecture

```
┌─────────────────────────────────────────┐
│              Your AI Agent              │
└────────────────────┬────────────────────┘
                     │
┌────────────────────┴────────────────────┐
│           zvec-memory Engine            │
│  ┌───────────────────────────┐          │
│  │ Hybrid Search Layer       │          │
│  │ • Dense (semantic)        │          │
│  │ • Sparse (keywords)       │          │
│  │ • Metadata filters        │          │
│  └───────────────────────────┘          │
│  ┌───────────────────────────┐          │
│  │ Embedding Layer           │          │
│  │ • nomic-embed-text        │          │
│  │ • BM25 sparse vectors     │          │
│  └───────────────────────────┘          │
│  ┌───────────────────────────┐          │
│  │ Cognitive Layer           │          │
│  │ • Decay scoring           │          │
│  │ • Access reinforcement    │          │
│  │ • Deduplication           │          │
│  │ • Contradiction detect    │          │
│  └───────────────────────────┘          │
└───────┬───────────────────┬─────────────┘
        │                   │
┌───────┴──────┐ ┌──────────┴───────────┐
│ Zvec         │ │ SQLite (WAL)         │
│ • HNSW       │ │ • Graph edges        │
│ • Dense      │ │ • Embedding cache    │
│ • Sparse     │ │ • FTS5 full-text     │
│ • Filters    │ │ • Version chains     │
└──────────────┘ └──────────────────────┘
```
## Memory Taxonomy

zvec-memory uses five cognitive-inspired memory types:

| Type | Purpose | Example |
|---|---|---|
| `episodic` | Events, conversations | "[2026-02-15] Discussed Zvec integration" |
| `semantic` | Facts, preferences | "Alice prefers local-first architecture" |
| `procedural` | How-to patterns | "To check weather: use weather skill" |
| `entity` | People, projects, things | "Bob: full access to example-project repo" |
| `core` | Identity, values | "Agent values transparency and privacy" |
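For illustration, the taxonomy can be modeled as a small validated record. This is a hypothetical sketch, not the library's actual data model; the field names mirror the CLI flags above:

```python
from dataclasses import dataclass, field

MEMORY_TYPES = {"episodic", "semantic", "procedural", "entity", "core"}

@dataclass
class Memory:
    text: str
    memory_type: str
    importance: float = 5.0   # 0-10 scale, as in the CLI examples
    tags: list[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject anything outside the five-type taxonomy
        if self.memory_type not in MEMORY_TYPES:
            raise ValueError(f"unknown memory type: {self.memory_type!r}")
        if not 0 <= self.importance <= 10:
            raise ValueError("importance must be in [0, 10]")
```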
## Installation

### Requirements

- Python 3.10, 3.11, or 3.12 (Zvec requirement)
- macOS or Linux
- Ollama running locally

### Install Ollama

```bash
# macOS
brew install ollama
ollama serve

# Linux
curl -fsSL https://ollama.com/install.sh | sh
```

### Install zvec-memory

```bash
pip install zvec-memory

# Or from source
git clone https://github.com/ereid7/zvec-memory.git
cd zvec-memory
pip install -e ".[dev]"
```

### Pull Embedding Model

```bash
ollama pull nomic-embed-text
```
## Usage

### CLI

```bash
# Store a memory
zvec-memory add "Memory text" --type semantic --importance 7

# Search (hybrid semantic + keyword)
zvec-memory search "query text" --topk 10

# Extract memories from text (requires llama3.2)
zvec-memory extract "conversation text" --source telegram

# Maintenance: run decay and optimize
zvec-memory maintain

# Stats
zvec-memory stats

# Reindex from files
zvec-memory reindex --source all

# Start REST API server
zvec-memory serve --port 8400
```
### Python API

```python
from zvec_memory import MemoryEngine

# Initialize
engine = MemoryEngine()

# Store
engine.store(
    text="Alice prefers local-first architecture",
    memory_type="semantic",
    importance=8.0,
    tags=["preferences", "architecture"],
)

# Search
results = engine.recall(
    query="cloud vs local",
    topk=5,
    memory_types=["semantic", "episodic"],
)
for r in results:
    print(f"[{r['score']:.3f}] {r['fields']['text']}")

# Context for prompts
from zvec_memory.context import get_context

context = get_context("user message here", max_tokens=500)
```
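`get_context` presumably packs the top-ranked memories into a prompt block until a token budget is spent. A rough, self-contained sketch of that pattern — not the library's actual implementation, and using a crude four-characters-per-token estimate:

```python
def build_context(memories: list[dict], max_tokens: int = 500) -> str:
    """Greedily pack highest-scoring memories into a token budget.

    Assumes each memory dict has 'score' and 'text' keys; token cost
    is approximated as len(text) // 4 (a common rough heuristic).
    """
    lines: list[str] = []
    budget = max_tokens
    for mem in sorted(memories, key=lambda m: m["score"], reverse=True):
        cost = max(1, len(mem["text"]) // 4)
        if cost > budget:
            continue  # too big for the remaining budget; try smaller ones
        lines.append(f"- {mem['text']}")
        budget -= cost
    return "\n".join(lines)
```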
### REST API

Start the server:

```bash
zvec-memory serve --port 8400
```

| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Health check |
| GET | /memories | List memories |
| POST | /memories | Store a memory |
| POST | /memories/extract | Extract facts from text |
| POST | /memories/ingest | Ingest a document |
| GET | /search | Search memories |
| GET | /memories/{id} | Get a memory by ID |
| GET | /memories/{id}/graph | Get graph edges for a memory |
| DELETE | /memories/{id} | Forget a memory |
| POST | /memories/{id}/restore | Restore a forgotten memory |
| POST | /maintain | Run maintenance (decay + cleanup) |
| GET | /stats | Memory statistics |
| GET | /graph/stats | Graph edge statistics |
| GET | /graph/export | Export graph as JSON |
| GET | /config | Current embedding config |

Interactive API docs are available at http://localhost:8400/docs (Swagger UI) and http://localhost:8400/redoc (ReDoc) while the server is running.
## Embedding Models

zvec-memory works with any embedding model via two providers:

### Ollama (default, local)

```bash
# Default: nomic-embed-text (768-dim, 8K context)
ollama pull nomic-embed-text

# Recommended upgrade: qwen3-embedding (1024-dim, 32K context, #1 MTEB)
ollama pull qwen3-embedding:0.6b
export EMBED_MODEL=qwen3-embedding:0.6b
```

### OpenAI-compatible APIs

```bash
# OpenAI
export EMBED_PROVIDER=openai
export EMBED_MODEL=text-embedding-3-small
export EMBED_API_KEY=sk-...

# Any compatible API (Voyage, Cohere, Together, vLLM, etc.)
export EMBED_PROVIDER=openai
export EMBED_URL=https://api.voyageai.com/v1
export EMBED_MODEL=voyage-3
export EMBED_API_KEY=pa-...
```

### Switching Models

Changing embedding models requires rebuilding the vector index:

```bash
zvec-memory reindex --source all
```

Dimensions are auto-detected, so no manual config is needed.
## Configuration

> Full reference: see docs/CONFIG.md for all 30+ environment variables and internal constants.

Key environment variables:

```bash
export ZVEC_MEMORY_PATH="~/.zvec-memory"
export OLLAMA_URL="http://127.0.0.1:11434"
export EMBED_MODEL="nomic-embed-text"
export EMBED_PROVIDER="ollama"   # or "openai" or "none"
export EMBED_URL=""              # for openai provider
export EMBED_API_KEY=""          # for openai provider
export EMBED_DIM="0"             # 0 = auto-detect
```
## How It Works

### 1. Hybrid Search

Every query uses three search strategies simultaneously:

- Dense vectors (768-dim): semantic meaning via nomic-embed-text
- Sparse vectors (BM25): keyword matching for exact terms
- Metadata filters: memory type, source, participants, time ranges

Results are merged with weighted reranking (dense weighted 1.2×, sparse 0.8×).
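The merge step can be sketched as a weighted score combination. A minimal illustration, assuming each backend returns an `{id: score}` map with scores on comparable scales (the actual reranker may normalize differently):

```python
def merge_hybrid(
    dense: dict[str, float],
    sparse: dict[str, float],
    w_dense: float = 1.2,
    w_sparse: float = 0.8,
) -> list[tuple[str, float]]:
    """Combine dense and sparse hit lists with weighted reranking."""
    combined: dict[str, float] = {}
    for mem_id, score in dense.items():
        combined[mem_id] = combined.get(mem_id, 0.0) + w_dense * score
    for mem_id, score in sparse.items():
        combined[mem_id] = combined.get(mem_id, 0.0) + w_sparse * score
    # Highest combined score first
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
```

A memory that matches both semantically and by keyword outranks one that matches on a single channel, which is the point of the hybrid merge.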
### 2. Cognitive Decay

Memories fade naturally based on:

```
decay_score   = (importance / 10) × recency × access_factor
recency       = exp(-λ × days_since_last_access)   # λ = 0.03, half-life ≈ 23 days
access_factor = log2(access_count + 1) / 5
```

Unused memories fade. Frequently-accessed memories stay sharp.
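The formula translates directly to code. A self-contained sketch using the constants stated above (the library's internals may differ):

```python
import math

LAMBDA = 0.03  # decay rate; half-life = ln(2) / 0.03 ≈ 23.1 days

def decay_score(
    importance: float,
    days_since_last_access: float,
    access_count: int,
) -> float:
    """Score a memory exactly as the decay formula above describes."""
    recency = math.exp(-LAMBDA * days_since_last_access)
    access_factor = math.log2(access_count + 1) / 5
    return (importance / 10) * recency * access_factor
```

A just-stored, once-accessed memory with importance 10 scores 0.2; after one half-life (~23 days) untouched, its score drops to ~0.1.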
### 3. Deduplication

Before storing, we check for existing similar memories (>92% similarity). If found:

- Reinforce the existing memory (bump access_count, update last_accessed)
- Skip creating a duplicate
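The similarity gate can be illustrated with plain cosine similarity over embedding vectors. A hypothetical sketch — the 0.92 threshold comes from the text above, but the function names are illustrative, not the library's API:

```python
import math

DUP_THRESHOLD = 0.92  # >92% similarity counts as a duplicate

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def is_duplicate(new_vec: list[float], existing: list[list[float]]) -> bool:
    """True if any stored embedding is close enough to reinforce instead of insert."""
    return any(cosine(new_vec, vec) > DUP_THRESHOLD for vec in existing)
```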
## Integration Examples

### Custom Agent

```python
class AgentWithMemory:
    def __init__(self):
        self.memory = MemoryEngine()

    def chat(self, message: str) -> str:
        # Get relevant context
        context = self.memory.recall(message, topk=5)

        # Build prompt with context
        prompt = f"""Relevant memories:
{format_memories(context)}

User: {message}
Assistant:"""

        # Get LLM response
        response = llm.generate(prompt)

        # Store this exchange
        self.memory.store(
            text=f"User asked: {message}\nAssistant responded: {response}",
            memory_type="episodic",
            importance=5,
        )
        return response
```
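The agent example assumes a `format_memories` helper and an `llm` client defined elsewhere. A plausible sketch of the helper, assuming recall results shaped like the Python API example (`{'score': ..., 'fields': {'text': ...}}`):

```python
def format_memories(results: list[dict]) -> str:
    """Render recall results as a bulleted block for prompt injection."""
    return "\n".join(
        f"- [{r['score']:.2f}] {r['fields']['text']}" for r in results
    )
```

Including the score lets the LLM weigh shakier memories more skeptically; drop it if you prefer cleaner prompts.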