PostgreSQL pgvector-based RAG memory system with MCP server
Project description
RAG Memory
A PostgreSQL pgvector-based RAG (Retrieval-Augmented Generation) memory system with MCP (Model Context Protocol) server for AI agents.
Overview
This POC validates that pgvector with proper vector normalization and HNSW indexing provides significantly better similarity search accuracy compared to ChromaDB. The goal is to achieve similarity scores in the 0.7-0.95 range for semantically similar content, compared to the ~0.3 range currently experienced.
Key Features
- PostgreSQL 17 with pgvector extension
- OpenAI text-embedding-3-small (1536 dimensions, cost-effective)
- Vector normalization for accurate cosine similarity
- HNSW indexing for optimal search accuracy (95%+ recall)
- Collection management for organizing documents
- Metadata support for advanced filtering
- CLI interface for easy testing and validation
Architecture
Database Schema
documentstable with pgvector supportcollectionstable for organizationdocument_collectionsjunction table- HNSW index on embeddings for fast similarity search
- GIN index on metadata for efficient filtering
Python Application
src/
├── database.py # PostgreSQL connection management
├── embeddings.py # OpenAI embeddings with normalization
├── collections.py # Collection CRUD operations
├── ingestion.py # Document ingestion pipeline
├── search.py # Similarity search with pgvector
└── cli.py # Command-line interface
Prerequisites
- Docker & Docker Compose - For PostgreSQL container
- uv - Fast Python package manager
- Python 3.12 - Specified in .python-version
- OpenAI API Key - For embedding generation
Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
Quick Start
1. Clone and Setup
cd /Users/timkitchens/projects/ai-projects/rag-memory
# Install dependencies with uv (super fast!)
uv sync
2. Configure Environment
# Copy example environment file
cp .env.example .env
# Edit .env and add your OpenAI API key
# OPENAI_API_KEY=sk-your-api-key-here
3. Start PostgreSQL
# Start PostgreSQL 17 with pgvector on port 5433
docker-compose up -d
# Check container is running
docker-compose ps
# View logs
docker-compose logs -f
4. Initialize Database
# Initialize and test connection
uv run rag init
# Check status
uv run rag status
5. Run Similarity Tests
# This is the key validation step!
# Tests high, medium, and low similarity scenarios
uv run rag test-similarity
Expected output:
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ Test ┃ Expected Range ┃ Actual Score ┃ Status ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━┩
│ High Similarity Test │ 0.70 - 0.95 │ 0.8542 │ ✓ PASS │
│ Medium Similarity Test │ 0.50 - 0.75 │ 0.6234 │ ✓ PASS │
│ Low Similarity Test │ 0.10 - 0.40 │ 0.2145 │ ✓ PASS │
└───────────────────────────┴────────────────┴──────────────┴────────┘
CLI Commands
Collection Management
# Create a collection
uv run rag collection create my-docs --description "My document collection"
# List all collections
uv run rag collection list
# Delete a collection
uv run rag collection delete my-docs
Document Ingestion
# Ingest a single text
uv run rag ingest text "PostgreSQL is a powerful database" --collection tech-docs
# Ingest a file
uv run rag ingest file document.txt --collection tech-docs
# Ingest a directory
uv run rag ingest directory ./docs --collection tech-docs --extensions .txt,.md
# With metadata
uv run rag ingest text "Python tutorial" --collection tutorials --metadata '{"author":"John","topic":"python"}'
Search
# Basic search (searches document chunks)
uv run rag search "What is PostgreSQL?"
# Search within a collection
uv run rag search "database performance" --collection tech-docs
# Limit results
uv run rag search "machine learning" --limit 5
# Filter by similarity threshold
uv run rag search "RAG systems" --threshold 0.7
# Filter by metadata (JSONB containment)
uv run rag search "python tutorial" --metadata '{"language":"python","level":"beginner"}'
# Combine collection and metadata filters
uv run rag search "programming guide" --collection tutorials --metadata '{"language":"python"}'
# Verbose output (show full chunk content)
uv run rag search "vector embeddings" --verbose
# Include full source document content
uv run rag search "embeddings" --show-source
Testing & Benchmarking
# Test similarity scores (validation)
uv run rag test-similarity
# Run performance benchmarks
uv run rag benchmark
# Check database status
uv run rag status
Usage Examples
Example 1: Build a Knowledge Base
# Create collection
uv run rag collection create knowledge-base
# Ingest documents
uv run rag ingest directory ./documentation --collection knowledge-base --extensions .md,.txt
# Search
uv run rag search "How do I configure authentication?" --collection knowledge-base --limit 5
Example 2: Compare Similarity Scores
# Ingest related documents
uv run rag ingest text "PostgreSQL is a relational database" --collection db-test
uv run rag ingest text "MySQL is also a relational database" --collection db-test
uv run rag ingest text "The weather is sunny today" --collection db-test
# Search and compare
uv run rag search "Tell me about databases" --collection db-test --verbose
You should see:
- PostgreSQL document: ~0.85 similarity
- MySQL document: ~0.75 similarity
- Weather document: ~0.15 similarity
Critical Implementation Details
Vector Normalization
This is the #1 most important aspect for accurate similarity scores.
All embeddings are normalized to unit length before storage and during queries:
def normalize_embedding(embedding):
arr = np.array(embedding)
norm = np.linalg.norm(arr)
return (arr / norm).tolist() if norm > 0 else arr.tolist()
Without normalization, you'll see artificially low scores (0.3 range) like ChromaDB.
Distance to Similarity Conversion
pgvector's <=> operator returns cosine distance (0-2), not similarity:
similarity = 1.0 - distance
This converts to a 0-1 scale where 1.0 = identical, 0.0 = orthogonal.
HNSW Index Configuration
CREATE INDEX documents_embedding_idx ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
Parameters:
m = 16: Number of connections per node (good default)ef_construction = 64: Construction-time search depth- Higher values = better recall but slower indexing
Expected Results
Similarity Score Improvements
| Content Type | Expected Range | ChromaDB (Current) | pgvector (POC) |
|---|---|---|---|
| Near-identical | 0.90-0.99 | ~0.3 | 0.90-0.99 |
| Semantically similar | 0.70-0.90 | ~0.3 | 0.70-0.90 |
| Related topics | 0.50-0.70 | ~0.2 | 0.50-0.70 |
| Unrelated | 0.00-0.30 | ~0.1 | 0.00-0.30 |
Performance Targets
- Search latency: < 50ms for 100K documents
- Recall: 95%+ with HNSW index
- Ingestion: ~2-5 docs/second (OpenAI API limited)
Troubleshooting
Database Connection Errors
# Check if container is running
docker-compose ps
# View logs
docker-compose logs postgres
# Restart container
docker-compose restart
# Reset everything
docker-compose down -v
docker-compose up -d
Low Similarity Scores
If you're seeing low scores (< 0.5 for similar content):
- Check normalization: Run
uv run rag test-similarity - Verify embeddings: Check that embeddings have unit length
- Check HNSW index: Ensure index was created properly
# Connect to database
docker exec -it rag-memory psql -U raguser -d rag_poc
# Check index
\d documents
OpenAI API Errors
# Verify API key is set
echo $OPENAI_API_KEY
# Check .env file
cat .env
# Test with a simple command
uv run rag ingest text "test" --collection test-col
Import Errors
# Reinstall dependencies
uv sync
# Check Python version
python --version # Should be 3.12
# Verify uv installation
uv --version
MCP Server Usage
This RAG system can be accessed by AI agents via Model Context Protocol (MCP). The MCP server exposes 11 tools for complete document lifecycle management.
What is MCP?
MCP is Anthropic's open standard for connecting AI agents to external systems (adopted by Claude Desktop, OpenAI, and Google DeepMind). Think "USB-C for AI" - provides standardized way for agents to discover and use capabilities.
Quick Start
Convenience commands:
uv run rag-mcp-stdio # For Claude Desktop/Cursor
uv run rag-mcp-sse # For MCP Inspector (port 3001)
uv run rag-mcp-http # For web integrations (port 3001)
Or use the general command with options:
uv run rag-mcp --transport stdio
uv run rag-mcp --transport sse --port 3001
uv run rag-mcp --transport streamable-http --port 3001
Available Tools (11 Total)
Core RAG (3 tools)
search_documents- Semantic search with vector similaritylist_collections- Discover available knowledge basesingest_text- Add text content with auto-chunking
Document Management (4 tools)
list_documents- Browse documents with paginationget_document_by_id- Retrieve full source documentupdate_document⭐ - Edit existing documentsdelete_document⭐ - Remove outdated content
Advanced Ingestion (4 tools)
get_collection_info- Collection stats + crawl historyanalyze_website⭐ NEW - Sitemap analysis for planning crawlsingest_url- Crawl web pages with duplicate preventioningest_file- Ingest text files from filesystemingest_directory- Batch ingest from directory
Key Features
- Context window optimization: Minimal responses by default, optional extended data
- Duplicate prevention:
ingest_urlprevents accidental re-crawling - Website analysis:
analyze_websitehelps agents discover site structure before crawling - Crawl tracking:
get_collection_infoshows crawl history to avoid duplicates - Memory management:
update_documentanddelete_documentkeep knowledge current
Complete Documentation
📚 See docs/MCP_SERVER_GUIDE.md for:
- Detailed transport mode setup
- MCP Inspector testing instructions
- Claude Desktop configuration
- Tool reference with examples
- Common workflows
- Troubleshooting guide
Quick Claude Desktop Setup
Add to ~/Library/Application Support/Claude/claude_desktop_config.json:
{
"mcpServers": {
"rag-memory": {
"command": "uv",
"args": ["--directory", "/FULL/PATH/TO/rag-memory", "run", "rag-mcp-stdio"],
"env": {
"OPENAI_API_KEY": "sk-your-key-here"
}
}
}
}
Replace /FULL/PATH/TO/rag-memory with your actual path (run pwd in project directory).
Development
Running Tests
# Run all tests (requires database and API key)
uv run pytest
# Run specific test file
uv run pytest tests/test_embeddings.py -v
# Run without API calls
uv run pytest tests/test_embeddings.py::TestEmbeddingNormalization -v
Code Quality
# Format code
uv run black src/ tests/
# Lint code
uv run ruff check src/ tests/
Project Structure
rag-memory/
├── .env # Environment variables (create from .env.example)
├── .env.example # Environment template
├── .gitignore # Git ignore patterns
├── .python-version # Python version for uv
├── docker-compose.yml # PostgreSQL with pgvector
├── init.sql # Database schema initialization
├── pyproject.toml # Project configuration and dependencies
├── README.md # This file
├── src/
│ ├── __init__.py
│ ├── cli.py # Command-line interface
│ ├── collections.py # Collection management
│ ├── database.py # Database connection
│ ├── embeddings.py # Embedding generation with normalization
│ ├── ingestion.py # Document ingestion
│ └── search.py # Similarity search
└── tests/
├── __init__.py
├── sample_documents.py # Test data
├── test_embeddings.py # Embedding tests
└── test_search.py # Search tests
Technology Stack
- Database: PostgreSQL 17 with pgvector extension
- Language: Python 3.12
- Package Manager: uv (Astral)
- Embedding Model: OpenAI text-embedding-3-small (1536 dims)
- CLI Framework: Click + Rich
- Testing: pytest
- Deployment: Docker Compose
Cost Analysis
OpenAI Embedding Costs
text-embedding-3-small: $0.02 per 1M tokens
Example usage:
- 10,000 documents × 750 tokens avg = 7.5M tokens
- Cost: $0.15 for entire corpus
- Per-query: ~$0.00003 (negligible)
Alternative models:
- text-embedding-3-large: $0.13/1M tokens (6.5x more expensive)
- Cohere Embed v3: $0.10/1M tokens
- Self-hosted SBERT: Free (infrastructure costs only)
Migration Path to RAG Retriever
Once POC validates pgvector superiority:
- Create adapter layer matching existing VectorStore interface
- Parallel run both ChromaDB and pgvector for comparison
- Data migration script to transfer embeddings
- A/B testing to validate improvements
- Gradual rollout starting with new collections
- Deprecate ChromaDB after full validation
Success Criteria
- ✅ Similarity scores in 0.7-0.95 range for good matches
- ✅ Significantly better than ChromaDB's ~0.3 scores
- ✅ Query latency < 100ms for reasonable dataset sizes
- ✅ Easy to integrate into existing RAG Retriever
- ✅ Clear migration path documented
References
License
This is a proof-of-concept project for internal evaluation.
Support
For issues or questions:
- Check the Troubleshooting section above
- Review Docker logs:
docker-compose logs -f - Verify environment setup:
uv run rag status - Run validation tests:
uv run rag test-similarity
Key Takeaway: The critical difference between ChromaDB (0.3 scores) and pgvector (0.7-0.95 scores) is vector normalization. This POC demonstrates that proper normalization combined with HNSW indexing provides the accuracy needed for production RAG systems.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file rag_memory-0.1.0.tar.gz.
File metadata
- Download URL: rag_memory-0.1.0.tar.gz
- Upload date:
- Size: 179.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
af24fcc875d25340c117449ba2f050c52f8daf35a3eeb26e50b6a13cd14248c7
|
|
| MD5 |
c4c6eb9739c37cf2a1064ffdf503a496
|
|
| BLAKE2b-256 |
1d9fcfda33dec954d71433fa9637cb04e1acdfd52964a84843e1f8b97b18246f
|
File details
Details for the file rag_memory-0.1.0-py3-none-any.whl.
File metadata
- Download URL: rag_memory-0.1.0-py3-none-any.whl
- Upload date:
- Size: 59.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
9fd838fdb4872a7ff0387fb0501a423e81723bea615384555abd3a10b95d528b
|
|
| MD5 |
102bb1896f0721781ae66dfcf38e6bda
|
|
| BLAKE2b-256 |
77131389ecd6e89378a6e70dfb0c710be91d5844e7ce0af737d95fad3161f899
|