PostgreSQL pgvector-based RAG memory system with MCP server

These details have not been verified by PyPI

Project description

RAG Memory

A PostgreSQL pgvector-based RAG (Retrieval-Augmented Generation) memory system with MCP (Model Context Protocol) server for AI agents.

Overview

This POC validates that pgvector with proper vector normalization and HNSW indexing provides significantly better similarity search accuracy compared to ChromaDB. The goal is to achieve similarity scores in the 0.7-0.95 range for semantically similar content, compared to the ~0.3 range currently experienced.

Key Features

PostgreSQL 17 with pgvector extension
OpenAI text-embedding-3-small (1536 dimensions, cost-effective)
Vector normalization for accurate cosine similarity
HNSW indexing for optimal search accuracy (95%+ recall)
Collection management for organizing documents
Metadata support for advanced filtering
CLI interface for easy testing and validation

Architecture

Database Schema

documents table with pgvector support
collections table for organization
document_collections junction table
HNSW index on embeddings for fast similarity search
GIN index on metadata for efficient filtering

Python Application

src/
├── database.py      # PostgreSQL connection management
├── embeddings.py    # OpenAI embeddings with normalization
├── collections.py   # Collection CRUD operations
├── ingestion.py     # Document ingestion pipeline
├── search.py        # Similarity search with pgvector
└── cli.py          # Command-line interface

Prerequisites

Docker & Docker Compose - For PostgreSQL container
uv - Fast Python package manager
Python 3.12 - Specified in .python-version
OpenAI API Key - For embedding generation

Install uv

curl -LsSf https://astral.sh/uv/install.sh | sh

Quick Start

1. Clone and Setup

cd /Users/timkitchens/projects/ai-projects/rag-memory

# Install dependencies with uv (super fast!)
uv sync

2. Configure Environment

# Copy example environment file
cp .env.example .env

# Edit .env and add your OpenAI API key
# OPENAI_API_KEY=sk-your-api-key-here

3. Start PostgreSQL

# Start PostgreSQL 17 with pgvector on port 5433
docker-compose up -d

# Check container is running
docker-compose ps

# View logs
docker-compose logs -f

4. Initialize Database

# Initialize and test connection
uv run rag init

# Check status
uv run rag status

5. Run Similarity Tests

# This is the key validation step!
# Tests high, medium, and low similarity scenarios
uv run rag test-similarity

Expected output:

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ Test                      ┃ Expected Range ┃ Actual Score ┃ Status ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━┩
│ High Similarity Test      │ 0.70 - 0.95    │ 0.8542       │ ✓ PASS │
│ Medium Similarity Test    │ 0.50 - 0.75    │ 0.6234       │ ✓ PASS │
│ Low Similarity Test       │ 0.10 - 0.40    │ 0.2145       │ ✓ PASS │
└───────────────────────────┴────────────────┴──────────────┴────────┘

CLI Commands

Collection Management

# Create a collection
uv run rag collection create my-docs --description "My document collection"

# List all collections
uv run rag collection list

# Delete a collection
uv run rag collection delete my-docs

Document Ingestion

# Ingest a single text
uv run rag ingest text "PostgreSQL is a powerful database" --collection tech-docs

# Ingest a file
uv run rag ingest file document.txt --collection tech-docs

# Ingest a directory
uv run rag ingest directory ./docs --collection tech-docs --extensions .txt,.md

# With metadata
uv run rag ingest text "Python tutorial" --collection tutorials --metadata '{"author":"John","topic":"python"}'

Search

# Basic search (searches document chunks)
uv run rag search "What is PostgreSQL?"

# Search within a collection
uv run rag search "database performance" --collection tech-docs

# Limit results
uv run rag search "machine learning" --limit 5

# Filter by similarity threshold
uv run rag search "RAG systems" --threshold 0.7

# Filter by metadata (JSONB containment)
uv run rag search "python tutorial" --metadata '{"language":"python","level":"beginner"}'

# Combine collection and metadata filters
uv run rag search "programming guide" --collection tutorials --metadata '{"language":"python"}'

# Verbose output (show full chunk content)
uv run rag search "vector embeddings" --verbose

# Include full source document content
uv run rag search "embeddings" --show-source

Testing & Benchmarking

# Test similarity scores (validation)
uv run rag test-similarity

# Run performance benchmarks
uv run rag benchmark

# Check database status
uv run rag status

Usage Examples

Example 1: Build a Knowledge Base

# Create collection
uv run rag collection create knowledge-base

# Ingest documents
uv run rag ingest directory ./documentation --collection knowledge-base --extensions .md,.txt

# Search
uv run rag search "How do I configure authentication?" --collection knowledge-base --limit 5

Example 2: Compare Similarity Scores

# Ingest related documents
uv run rag ingest text "PostgreSQL is a relational database" --collection db-test
uv run rag ingest text "MySQL is also a relational database" --collection db-test
uv run rag ingest text "The weather is sunny today" --collection db-test

# Search and compare
uv run rag search "Tell me about databases" --collection db-test --verbose

You should see:

PostgreSQL document: ~0.85 similarity
MySQL document: ~0.75 similarity
Weather document: ~0.15 similarity

Critical Implementation Details

Vector Normalization

This is the #1 most important aspect for accurate similarity scores.

All embeddings are normalized to unit length before storage and during queries:

def normalize_embedding(embedding):
    arr = np.array(embedding)
    norm = np.linalg.norm(arr)
    return (arr / norm).tolist() if norm > 0 else arr.tolist()

Without normalization, you'll see artificially low scores (0.3 range) like ChromaDB.

Distance to Similarity Conversion

pgvector's <=> operator returns cosine distance (0-2), not similarity:

similarity = 1.0 - distance

This converts to a 0-1 scale where 1.0 = identical, 0.0 = orthogonal.

HNSW Index Configuration

CREATE INDEX documents_embedding_idx ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

Parameters:

m = 16: Number of connections per node (good default)
ef_construction = 64: Construction-time search depth
Higher values = better recall but slower indexing

Expected Results

Similarity Score Improvements

Content Type	Expected Range	ChromaDB (Current)	pgvector (POC)
Near-identical	0.90-0.99	~0.3	0.90-0.99
Semantically similar	0.70-0.90	~0.3	0.70-0.90
Related topics	0.50-0.70	~0.2	0.50-0.70
Unrelated	0.00-0.30	~0.1	0.00-0.30

Performance Targets

Search latency: < 50ms for 100K documents
Recall: 95%+ with HNSW index
Ingestion: ~2-5 docs/second (OpenAI API limited)

Troubleshooting

Database Connection Errors

# Check if container is running
docker-compose ps

# View logs
docker-compose logs postgres

# Restart container
docker-compose restart

# Reset everything
docker-compose down -v
docker-compose up -d

Low Similarity Scores

If you're seeing low scores (< 0.5 for similar content):

Check normalization: Run uv run rag test-similarity
Verify embeddings: Check that embeddings have unit length
Check HNSW index: Ensure index was created properly

# Connect to database
docker exec -it rag-memory psql -U raguser -d rag_poc

# Check index
\d documents

OpenAI API Errors

# Verify API key is set
echo $OPENAI_API_KEY

# Check .env file
cat .env

# Test with a simple command
uv run rag ingest text "test" --collection test-col

Import Errors

# Reinstall dependencies
uv sync

# Check Python version
python --version  # Should be 3.12

# Verify uv installation
uv --version

MCP Server Usage

This RAG system can be accessed by AI agents via Model Context Protocol (MCP). The MCP server exposes 11 tools for complete document lifecycle management.

What is MCP?

MCP is Anthropic's open standard for connecting AI agents to external systems (adopted by Claude Desktop, OpenAI, and Google DeepMind). Think "USB-C for AI" - provides standardized way for agents to discover and use capabilities.

Quick Start

Convenience commands:

uv run rag-mcp-stdio    # For Claude Desktop/Cursor
uv run rag-mcp-sse      # For MCP Inspector (port 3001)
uv run rag-mcp-http     # For web integrations (port 3001)

Or use the general command with options:

uv run rag-mcp --transport stdio
uv run rag-mcp --transport sse --port 3001
uv run rag-mcp --transport streamable-http --port 3001

Available Tools (11 Total)

Core RAG (3 tools)

search_documents - Semantic search with vector similarity
list_collections - Discover available knowledge bases
ingest_text - Add text content with auto-chunking

Document Management (4 tools)

list_documents - Browse documents with pagination
get_document_by_id - Retrieve full source document
update_document ⭐ - Edit existing documents
delete_document ⭐ - Remove outdated content

Advanced Ingestion (4 tools)

get_collection_info - Collection stats + crawl history
analyze_website ⭐ NEW - Sitemap analysis for planning crawls
ingest_url - Crawl web pages with duplicate prevention
ingest_file - Ingest text files from filesystem
ingest_directory - Batch ingest from directory

Key Features

Context window optimization: Minimal responses by default, optional extended data
Duplicate prevention: ingest_url prevents accidental re-crawling
Website analysis: analyze_website helps agents discover site structure before crawling
Crawl tracking: get_collection_info shows crawl history to avoid duplicates
Memory management: update_document and delete_document keep knowledge current

Complete Documentation

📚 See docs/MCP_SERVER_GUIDE.md for:

Detailed transport mode setup
MCP Inspector testing instructions
Claude Desktop configuration
Tool reference with examples
Common workflows
Troubleshooting guide

Quick Claude Desktop Setup

Add to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "rag-memory": {
      "command": "uv",
      "args": ["--directory", "/FULL/PATH/TO/rag-memory", "run", "rag-mcp-stdio"],
      "env": {
        "OPENAI_API_KEY": "sk-your-key-here"
      }
    }
  }
}

Replace /FULL/PATH/TO/rag-memory with your actual path (run pwd in project directory).

Development

Running Tests

# Run all tests (requires database and API key)
uv run pytest

# Run specific test file
uv run pytest tests/test_embeddings.py -v

# Run without API calls
uv run pytest tests/test_embeddings.py::TestEmbeddingNormalization -v

Code Quality

# Format code
uv run black src/ tests/

# Lint code
uv run ruff check src/ tests/

Project Structure

rag-memory/
├── .env                    # Environment variables (create from .env.example)
├── .env.example           # Environment template
├── .gitignore             # Git ignore patterns
├── .python-version        # Python version for uv
├── docker-compose.yml     # PostgreSQL with pgvector
├── init.sql              # Database schema initialization
├── pyproject.toml        # Project configuration and dependencies
├── README.md             # This file
├── src/
│   ├── __init__.py
│   ├── cli.py           # Command-line interface
│   ├── collections.py   # Collection management
│   ├── database.py      # Database connection
│   ├── embeddings.py    # Embedding generation with normalization
│   ├── ingestion.py     # Document ingestion
│   └── search.py        # Similarity search
└── tests/
    ├── __init__.py
    ├── sample_documents.py  # Test data
    ├── test_embeddings.py   # Embedding tests
    └── test_search.py       # Search tests

Technology Stack

Database: PostgreSQL 17 with pgvector extension
Language: Python 3.12
Package Manager: uv (Astral)
Embedding Model: OpenAI text-embedding-3-small (1536 dims)
CLI Framework: Click + Rich
Testing: pytest
Deployment: Docker Compose

Cost Analysis

OpenAI Embedding Costs

text-embedding-3-small: $0.02 per 1M tokens

Example usage:

10,000 documents × 750 tokens avg = 7.5M tokens
Cost: $0.15 for entire corpus
Per-query: ~$0.00003 (negligible)

Alternative models:

text-embedding-3-large: $0.13/1M tokens (6.5x more expensive)
Cohere Embed v3: $0.10/1M tokens
Self-hosted SBERT: Free (infrastructure costs only)

Migration Path to RAG Retriever

Once POC validates pgvector superiority:

Create adapter layer matching existing VectorStore interface
Parallel run both ChromaDB and pgvector for comparison
Data migration script to transfer embeddings
A/B testing to validate improvements
Gradual rollout starting with new collections
Deprecate ChromaDB after full validation

Success Criteria

✅ Similarity scores in 0.7-0.95 range for good matches
✅ Significantly better than ChromaDB's ~0.3 scores
✅ Query latency < 100ms for reasonable dataset sizes
✅ Easy to integrate into existing RAG Retriever
✅ Clear migration path documented

References

License

This is a proof-of-concept project for internal evaluation.

Support

For issues or questions:

Check the Troubleshooting section above
Review Docker logs: docker-compose logs -f
Verify environment setup: uv run rag status
Run validation tests: uv run rag test-similarity

Key Takeaway: The critical difference between ChromaDB (0.3 scores) and pgvector (0.7-0.95 scores) is vector normalization. This POC demonstrates that proper normalization combined with HNSW indexing provides the accuracy needed for production RAG systems.

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

0.25.1

Jan 19, 2026

0.25.0

Jan 15, 2026

0.24.0

Jan 14, 2026

0.23.0

Jan 14, 2026

0.22.0

Jan 6, 2026

0.21.0

Dec 23, 2025

0.20.0

Dec 4, 2025

0.19.4

Nov 10, 2025

0.19.3

Nov 10, 2025

0.19.2

Nov 10, 2025

0.19.1

Nov 10, 2025

0.19.0

Nov 10, 2025

0.18.0

Nov 10, 2025

0.17.0

Oct 31, 2025

0.16.0

Oct 30, 2025

0.15.0

Oct 30, 2025

0.14.0

Oct 30, 2025

0.13.0

Oct 28, 2025

0.12.1

Oct 28, 2025

0.12.0

Oct 28, 2025

0.11.0

Oct 28, 2025

0.10.0

Oct 28, 2025

0.9.0

Oct 22, 2025

0.8.0

Oct 22, 2025

0.7.0

Oct 13, 2025

0.6.0

Oct 13, 2025

0.5.3

Oct 12, 2025

0.5.2

Oct 12, 2025

0.5.1

Oct 12, 2025

0.5.0

Oct 12, 2025

0.4.0

Oct 12, 2025

0.3.0

Oct 12, 2025

This version

0.2.0

Oct 12, 2025

0.1.0

Oct 12, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

rag_memory-0.2.0.tar.gz (192.2 kB view details)

Uploaded Oct 12, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

rag_memory-0.2.0-py3-none-any.whl (63.3 kB view details)

Uploaded Oct 12, 2025 Python 3

File details

Details for the file rag_memory-0.2.0.tar.gz.

File metadata

Download URL: rag_memory-0.2.0.tar.gz
Upload date: Oct 12, 2025
Size: 192.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.12

File hashes

Hashes for rag_memory-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`8a1fcb6054a5cc1cbc0718f0ca107c282c6f29c7487a76b60a8b3f2395ed8bba`
MD5	`83f011d2c17db666bc2e25cb11b2c5a9`
BLAKE2b-256	`f36ccbb8c18ffd140f2b648ee541c7c7a32f672f071347d7af3993b6a9580535`

See more details on using hashes here.

File details

Details for the file rag_memory-0.2.0-py3-none-any.whl.

File metadata

Download URL: rag_memory-0.2.0-py3-none-any.whl
Upload date: Oct 12, 2025
Size: 63.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.12

File hashes

Hashes for rag_memory-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`aca6d43dca86ef5b3c1e03aa6706a3d7e9bd674e1b76622aea307a78e9e8da6a`
MD5	`f41f71423299358f756e08cece00fb76`
BLAKE2b-256	`ce1b354e0053a3ead0c5ffd2a5b9fdca8d35b8e40c3fe732b961e4b881182b7d`

See more details on using hashes here.

rag-memory 0.2.0

Navigation

Verified details

Maintainers

Unverified details

Meta

Classifiers

Project description

RAG Memory

Overview

Key Features

Architecture

Database Schema

Python Application

Prerequisites

Install uv

Quick Start

1. Clone and Setup

2. Configure Environment

3. Start PostgreSQL

4. Initialize Database

5. Run Similarity Tests

CLI Commands

Collection Management

Document Ingestion

Search

Testing & Benchmarking

Usage Examples

Example 1: Build a Knowledge Base

Example 2: Compare Similarity Scores

Critical Implementation Details

Vector Normalization

Distance to Similarity Conversion

HNSW Index Configuration

Expected Results

Similarity Score Improvements

Performance Targets

Troubleshooting

Database Connection Errors

Low Similarity Scores

OpenAI API Errors

Import Errors

MCP Server Usage

What is MCP?

Quick Start

Available Tools (11 Total)

Core RAG (3 tools)

Document Management (4 tools)

Advanced Ingestion (4 tools)

Key Features

Complete Documentation

Quick Claude Desktop Setup

Development

Running Tests

Code Quality

Project Structure

Technology Stack

Cost Analysis

OpenAI Embedding Costs

Migration Path to RAG Retriever

Success Criteria

References

License

Support

Project details

Verified details

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes