MCP server for RAG (Retrieval-Augmented Generation) operations with local document indexing - Fork with enhancements

MCP RAG Librarian 📚🦸‍♀️

Fork of tungetti/rag-mcp-server with persistence enhancements

Original work by Tommaso Maria Ungetti

Your superhero librarian for intelligent document retrieval! 🦸‍♀️📚

A Model Context Protocol (MCP) server for Retrieval-Augmented Generation (RAG) operations. RAG Librarian provides tools for building and querying vector-based knowledge bases from document collections, enabling semantic search and document retrieval capabilities with the power of a superhero librarian!

Features
Architecture
Installation
Setup
Usage Examples
- Sample LLM Queries
- Command Line Examples
MCP Tools
Technical Details
Configuration Examples
Troubleshooting
Contributing
License

Features

Document Processing: Supports multiple file formats (.txt, .pdf, .md) with automatic text extraction and recursive directory scanning
Intelligent Chunking: Configurable text chunking with overlap to preserve context
Vector Embeddings: Uses SentenceTransformers for high-quality text embeddings
Semantic Search: FAISS-powered similarity search for fast and accurate retrieval
Incremental Updates: Smart document tracking to only process new or changed files
Persistent Caching: FAISS index and embeddings persistence with --persist-cache flag
Auto-Load on Startup: Cached knowledge bases load automatically when server starts
SOLID Architecture: Clean, extensible persistence layer following SOLID principles
SQLite Document Store: Document metadata and change tracking
Flexible Configuration: Customizable embedding models, chunk sizes, and search parameters

Architecture

mcp-rag-librarian/
├── src/rag_mcp_server/
│   ├── server.py              # Main MCP server implementation
│   └── core/
│       ├── document_processor.py   # Document loading and chunking
│       ├── embedding_service.py    # Text embedding generation
│       ├── faiss_index.py          # Vector similarity search
│       ├── document_store.py       # Document metadata storage
│       ├── persistence.py          # SOLID persistence abstractions
│       ├── file_persistence.py     # File-based persistence provider
│       └── persistence_factory.py  # Dependency injection factory

Installation

Using uvx (Recommended)

# Run with uvx (bundled with uv); it fetches the package on first use, so no separate install step is needed
uvx mcp-rag-librarian

Using pip

pip install mcp-rag-librarian

From source

git clone <repository-url>
cd mcp-rag-librarian
pip install -e .

Setup

The easiest way to run the MCP server is with uvx, but manual setup is also available.

Find the MCP settings file for the client

Claude Desktop

Install Claude Desktop as needed
Open the config file by opening the Claude Desktop app, going into its Settings, opening the 'Developer' tab, and clicking the 'Edit Config' button
Follow the 'Set up the MCP server' steps below

Claude Code

Install Claude Code as needed

Run the following command to add the RAG server:

claude mcp add rag -- uvx mcp-rag-librarian

Or manually add with custom configuration:

claude mcp add-json rag '{"command":"uvx","args":["mcp-rag-librarian","--knowledge-base","/path/to/your/docs","--embedding-model","all-MiniLM-L6-v2","--chunk-size","1000","--chunk-overlap","200"]}'

Cursor

Install Cursor as needed
Open the config file by opening Cursor, going into 'Cursor Settings' (not the normal VSCode IDE settings), opening the 'MCP' tab, and clicking the 'Add new global MCP server' button
Follow the 'Set up the MCP server' steps below

Cline

Install Cline in your IDE as needed
Open the config file by opening your IDE, opening the Cline sidebar, clicking the 'MCP Servers' icon button that is second from left at the top, opening the 'Installed' tab, and clicking the 'Configure MCP Servers' button
Follow the 'Set up the MCP server' steps below

Windsurf

Install Windsurf as needed
Open the config file by opening Windsurf, going into 'Windsurf Settings' (not the normal VSCode IDE settings), opening the 'Cascade' tab, and clicking the 'View raw config' button in the 'Model Context Protocol (MCP) Servers' section
Follow the 'Set up the MCP server' steps below

Any other client

Find the MCP settings file, usually something like [client]_mcp_config.json
Follow the 'Set up the MCP server' steps below

Set up the MCP server

Install uv as needed (uvx comes bundled with uv)

Add the following to your MCP setup:

Basic Configuration:

{
  "mcpServers": {
    "rag": {
      "command": "uvx",
      "args": ["mcp-rag-librarian"]
    }
  }
}

Full Configuration with All Parameters (including persistence):

{
  "mcpServers": {
    "rag": {
      "command": "uvx",
      "args": [
        "mcp-rag-librarian",
        "--knowledge-base", "/path/to/your/documents",
        "--embedding-model", "ibm-granite/granite-embedding-278m-multilingual",
        "--chunk-size", "500",
        "--chunk-overlap", "200",
        "--top-k", "7",
        "--persist-cache",
        "--verbose"
      ]
    }
  }
}

Variant: Manual setup with uvx

If you prefer to run the server manually or need a specific Python version:

# Run with default settings
uvx mcp-rag-librarian

# Run with all parameters specified (including persistence)
uvx mcp-rag-librarian \
  --knowledge-base /path/to/documents \
  --embedding-model "ibm-granite/granite-embedding-278m-multilingual" \
  --chunk-size 500 \
  --chunk-overlap 200 \
  --top-k 7 \
  --persist-cache \
  --verbose

# Run from source directory with persistence
uvx --from . mcp-rag-librarian \
  --knowledge-base /home/user/documents \
  --embedding-model "all-MiniLM-L6-v2" \
  --chunk-size 800 \
  --chunk-overlap 100 \
  --top-k 5 \
  --persist-cache

Persistence Feature

🆕 NEW: This fork adds persistent caching capabilities to avoid re-processing documents on server restart.

Key Benefits

Faster Startup: Skip re-initialization when cache exists
Automatic Loading: Cached knowledge bases load on server startup
Smart Caching: Uses an MD5 hash of the configuration as the cache key, so a cache is only reused when the settings match
SOLID Architecture: Extensible persistence layer with clean abstractions
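
As a rough illustration of what an extensible persistence layer can look like, here is a minimal sketch of an abstract provider, a file-based implementation, and a small factory. The class and function names (`PersistenceProvider`, `FilePersistenceProvider`, `create_provider`) and the JSON-per-key storage format are assumptions made for this example; they are not taken from the project's persistence.py, file_persistence.py, or persistence_factory.py.

```python
import json
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Optional


class PersistenceProvider(ABC):
    """Hypothetical interface: anything that can save and load a cache entry."""

    @abstractmethod
    def save(self, key: str, payload: dict) -> None: ...

    @abstractmethod
    def load(self, key: str) -> Optional[dict]: ...


class FilePersistenceProvider(PersistenceProvider):
    """Stores each cache entry as a JSON file under a cache directory."""

    def __init__(self, cache_dir: Path) -> None:
        self.cache_dir = cache_dir
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def save(self, key: str, payload: dict) -> None:
        target = self.cache_dir / f"{key}.json"
        target.write_text(json.dumps(payload), encoding="utf-8")

    def load(self, key: str) -> Optional[dict]:
        target = self.cache_dir / f"{key}.json"
        if not target.exists():
            return None
        try:
            return json.loads(target.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            # Treat a corrupt entry as a cache miss so the caller can rebuild.
            return None


def create_provider(kind: str, cache_dir: Path) -> PersistenceProvider:
    """Tiny factory: callers depend only on the abstract interface."""
    if kind == "file":
        return FilePersistenceProvider(cache_dir)
    raise ValueError(f"unknown persistence kind: {kind}")
```

With this shape, adding another backend means writing one more `PersistenceProvider` subclass and one more branch in the factory, while the calling code stays unchanged.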

How to Enable Persistence

Add the --persist-cache flag to your configuration:

{
  "mcpServers": {
    "rag": {
      "command": "uvx",
      "args": [
        "mcp-rag-librarian",
        "--knowledge-base", "/path/to/your/docs",
        "--persist-cache"
      ]
    }
  }
}

Cache Behavior

Cache Location: .rag_cache/ directory alongside your knowledge base
Cache Key: Based on path + embedding model + chunk settings
Auto-Load: Cached data loads automatically when server starts
Fallback: Falls back to normal initialization if cache is invalid
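
To make the cache key idea concrete, the sketch below derives an MD5 digest from the knowledge base path, the embedding model, and the chunk settings. The function name `cache_key`, the choice of JSON as the canonical serialization, and the exact fields are assumptions for illustration; the server's real key format may differ.

```python
import hashlib
import json
from pathlib import Path


def cache_key(knowledge_base: str, embedding_model: str,
              chunk_size: int, chunk_overlap: int) -> str:
    """Return a stable MD5 hex digest of the settings that shape the index."""
    config = {
        "knowledge_base": str(Path(knowledge_base).expanduser().resolve()),
        "embedding_model": embedding_model,
        "chunk_size": chunk_size,
        "chunk_overlap": chunk_overlap,
    }
    # sort_keys gives a canonical serialization, so equal configs hash equally.
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()
```

Because every setting that affects the index feeds into the digest, changing the model or the chunk size produces a different key, and a cache built under the old settings is simply not found rather than loaded by mistake.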

Example Workflow

First Run: Initialize knowledge base normally (creates cache)
Server Restart: Cached data loads automatically at startup
Ready to Search: No manual initialization needed

Usage Examples

Sample LLM Queries

Here are example queries you can use with your LLM to interact with the RAG server:

Initialize a knowledge base with custom parameters:

Initialize the knowledge base with:
- knowledge_base_path: "/home/user/research_papers"
- embedding_model: "ibm-granite/granite-embedding-278m-multilingual"
- chunk_size: 300
- chunk_overlap: 50

Search with specific parameters:

Search for "machine learning optimization techniques" in the knowledge base at "/home/user/research_papers" and return the top 10 results with similarity scores.

Initialize with high-quality embeddings:

Set up a knowledge base at "/data/technical_docs" using the "all-mpnet-base-v2" model with chunk_size of 1000 and chunk_overlap of 400 for better context preservation.

Refresh and get statistics:

Refresh the knowledge base at "/home/user/documents" to include any new files, then show me the statistics including total documents, chunks, and current configuration.

List and search documents:

List all documents in the knowledge base, then search for information about "API authentication" and show me the top 5 most relevant chunks.

Complex workflow example:

1. Initialize a knowledge base at "/home/user/project_docs" with embedding_model "all-MiniLM-L6-v2", chunk_size 800, and chunk_overlap 150
2. Show me the statistics
3. Search for "database optimization strategies"
4. List all documents that were processed

Multilingual search example:

Initialize the knowledge base at "/docs/international" using the multilingual model "ibm-granite/granite-embedding-278m-multilingual", then search for "machine learning" in multiple languages and show the top 7 results.

Command Line Examples

High-Quality Configuration for Research:

uvx mcp-rag-librarian \
  --knowledge-base /home/tommasomariaungetti/RAG \
  --embedding-model "all-mpnet-base-v2" \
  --chunk-size 1000 \
  --chunk-overlap 400 \
  --top-k 10 \
  --verbose

Fast Processing for Large Document Sets:

uvx mcp-rag-librarian \
  --knowledge-base /data/large_corpus \
  --embedding-model "all-MiniLM-L6-v2" \
  --chunk-size 2000 \
  --chunk-overlap 100 \
  --top-k 5

Multilingual Document Processing:

uvx mcp-rag-librarian \
  --knowledge-base /docs/multilingual \
  --embedding-model "ibm-granite/granite-embedding-278m-multilingual" \
  --chunk-size 500 \
  --chunk-overlap 200 \
  --top-k 7

Running from Source with Custom Settings:

uvx --from . mcp-rag-librarian \
  --embedding-model "all-MiniLM-L6-v2" \
  --chunk-size 800 \
  --chunk-overlap 100 \
  --top-k 5 \
  --knowledge-base /home/tommasomariaungetti/RAG

MCP Tools

The following tools are available:

1. initialize_knowledge_base

Initialize a knowledge base from a directory of documents.

Parameters:

knowledge_base_path (optional): Path to document directory - defaults to server config
embedding_model (optional): Model name for embeddings - defaults to "ibm-granite/granite-embedding-278m-multilingual"
chunk_size (optional): Maximum chunk size in characters - defaults to 500
chunk_overlap (optional): Chunk overlap size in characters - defaults to 200

Example Tool Call:

{
  "tool": "initialize_knowledge_base",
  "arguments": {
    "knowledge_base_path": "/path/to/docs",
    "embedding_model": "all-mpnet-base-v2",
    "chunk_size": 1000,
    "chunk_overlap": 200
  }
}

Example LLM Query:

"Initialize a knowledge base from /home/user/documents using the all-mpnet-base-v2 embedding model with 1000 character chunks and 200 character overlap"

2. semantic_search

Perform semantic search on the knowledge base.

Parameters:

query: Search query text
knowledge_base_path (optional): Path to knowledge base - defaults to current KB
top_k (optional): Number of results to return - defaults to 7
include_scores (optional): Include similarity scores - defaults to false

Example Tool Call:

{
  "tool": "semantic_search",
  "arguments": {
    "query": "How to implement RAG systems?",
    "knowledge_base_path": "/path/to/docs",
    "top_k": 5,
    "include_scores": true
  }
}
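
The tool call above is written in a simplified shape. MCP clients send tool invocations as JSON-RPC 2.0 requests with the method `tools/call`, where the tool name and arguments travel under `params`. A minimal sketch of building such a request follows; the helper name `tools_call_request` and the request id counter are illustrative assumptions, and in practice your MCP client builds this message for you.

```python
import json
from itertools import count

# Monotonic request ids; JSON-RPC only requires that they be unique per session.
_request_ids = count(1)


def tools_call_request(tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


print(tools_call_request(
    "semantic_search",
    {"query": "How to implement RAG systems?", "top_k": 5, "include_scores": True},
))
```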

Example LLM Query:

"Search for 'machine learning optimization techniques' and show me the top 5 results with similarity scores"

3. refresh_knowledge_base

Update the knowledge base with new or changed documents.

Parameters:

knowledge_base_path (optional): Path to knowledge base - defaults to current KB

Example Tool Call:

{
  "tool": "refresh_knowledge_base",
  "arguments": {
    "knowledge_base_path": "/path/to/docs"
  }
}

Example LLM Query:

"Refresh the knowledge base to include any new or modified documents"

4. get_knowledge_base_stats

Get detailed statistics about the knowledge base.

Parameters:

knowledge_base_path (optional): Path to knowledge base - defaults to current KB

Example Tool Call:

{
  "tool": "get_knowledge_base_stats",
  "arguments": {
    "knowledge_base_path": "/path/to/docs"
  }
}

Example LLM Query:

"Show me the statistics for the knowledge base including document count, chunk information, and current configuration"

5. list_documents

List all documents in the knowledge base with metadata.

Parameters:

knowledge_base_path (optional): Path to knowledge base - defaults to current KB

Example Tool Call:

{
  "tool": "list_documents",
  "arguments": {
    "knowledge_base_path": "/path/to/docs"
  }
}

Example LLM Query:

"List all documents in the knowledge base with their chunk counts and metadata"

Technical Details

Document Processing

Documents pass through a four-stage processing pipeline:

File Discovery: Recursively scans directories and subdirectories for supported file types
Supported Formats:
- .txt files: Plain text documents
- .pdf files: PDF documents with text extraction
- .md files: Markdown documents (processed as text)
Content Extraction:
- Plain text/Markdown: Direct UTF-8/Latin-1 reading with encoding fallback
- PDF files: PyMuPDF-based text extraction
Text Chunking:
- Splits documents into manageable chunks
- Preserves word boundaries
- Maintains context with configurable overlap
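
The chunking stage can be sketched as a character-based sliding window that snaps chunk edges to whitespace. This is a minimal illustration under assumptions, not the code in document_processor.py: the function name `chunk_text` and the exact boundary rules are made up for the example, and only the defaults (500 characters, 200 overlap) come from this README.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 200) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share up to chunk_overlap characters, and chunk
    edges are moved to whitespace so that words are not cut in half.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            # Back up to the last space inside the window, if there is one.
            cut = text.rfind(" ", start + 1, end)
            if cut > start:
                end = cut
        chunk = text[start:end].strip()
        if chunk:
            chunks.append(chunk)
        if end >= len(text):
            break
        # Step back by the overlap, but always advance by at least one character.
        next_start = max(end - chunk_overlap, start + 1)
        # Begin the next chunk just after a space so it starts on a whole word.
        space = text.find(" ", next_start, end)
        start = space + 1 if space != -1 else next_start
    return chunks
```

A larger overlap repeats more text between neighbouring chunks, which helps a query match a passage that straddles a boundary, at the cost of producing more chunks to embed.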

Embedding Generation

Default Model: ibm-granite/granite-embedding-278m-multilingual
Batch Processing: Efficient batch encoding for large document sets
Fallback Support: Automatic fallback to all-MiniLM-L6-v2 if primary model fails
Progress Tracking: Visual progress bars for large operations
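
The fallback behaviour can be pictured with the small sketch below. It takes the loader as a parameter so it runs without any model download; the helper name `load_with_fallback` and the broad `except Exception` are assumptions for illustration, and only the two model names come from this README.

```python
import logging
from typing import Callable, TypeVar

Model = TypeVar("Model")
logger = logging.getLogger(__name__)

PRIMARY_MODEL = "ibm-granite/granite-embedding-278m-multilingual"
FALLBACK_MODEL = "all-MiniLM-L6-v2"


def load_with_fallback(loader: Callable[[str], Model],
                       primary: str = PRIMARY_MODEL,
                       fallback: str = FALLBACK_MODEL) -> tuple[str, Model]:
    """Try the primary model first; on any failure, load the fallback instead."""
    try:
        return primary, loader(primary)
    except Exception as exc:  # e.g. download errors, missing files, out of memory
        logger.warning("Could not load %s (%s); falling back to %s",
                       primary, exc, fallback)
        return fallback, loader(fallback)
```

With the sentence-transformers package installed, the `SentenceTransformer` class itself can serve as the loader, as in `load_with_fallback(SentenceTransformer)`. Returning the name alongside the model lets the caller record which model actually produced the embeddings.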

Vector Search

Index Type: FAISS IndexFlatIP (Inner Product)
Similarity Metric: Cosine similarity (via L2 normalization)
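Performance: Scales to millions of documents, with query time growing linearly in the number of indexed chunks because a flat index compares the query against every stored vector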
Accuracy: Exact nearest neighbor search
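
The reason an inner-product index yields cosine similarity is that the inner product of two unit-length vectors equals the cosine of the angle between them. The dependency-free sketch below shows the same computation in plain Python; the function names are illustrative. With FAISS, the corresponding steps would be `faiss.normalize_L2` on the vectors followed by adding them to and searching an `IndexFlatIP`.

```python
import math


def l2_normalize(vector: list[float]) -> list[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def inner_product(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def search(query: list[float], corpus: list[list[float]],
           top_k: int = 7) -> list[tuple[int, float]]:
    """Exact search: score every vector and return (index, score), best first."""
    q = l2_normalize(query)
    scored = [(i, inner_product(q, l2_normalize(v))) for i, v in enumerate(corpus)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


corpus = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
print(search([2.0, 0.0], corpus, top_k=2))
```

Vector length drops out after normalization, so `[2.0, 0.0]` matches `[1.0, 0.0]` with a score of 1.0 even though the magnitudes differ.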

Document Store

Storage: SQLite database
Tracking: File hash, modification time, chunk count
Incremental Updates: Only processes changed files
Location: Stored alongside knowledge base documents
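
A minimal sketch of hash-based change tracking with SQLite is shown below. The table layout, the column names, and the use of SHA-256 are assumptions chosen for the example; the schema in document_store.py may differ.

```python
import hashlib
import sqlite3
from pathlib import Path

# Hypothetical schema: one row per tracked file.
SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    path        TEXT PRIMARY KEY,
    file_hash   TEXT NOT NULL,
    mtime       REAL NOT NULL,
    chunk_count INTEGER NOT NULL
)
"""


def file_hash(path: Path) -> str:
    """Content hash, so a touched but unchanged file is not reprocessed."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def needs_processing(db: sqlite3.Connection, path: Path) -> bool:
    """True for files that are new or whose content changed since last recorded."""
    row = db.execute(
        "SELECT file_hash FROM documents WHERE path = ?", (str(path),)
    ).fetchone()
    return row is None or row[0] != file_hash(path)


def record_document(db: sqlite3.Connection, path: Path, chunk_count: int) -> None:
    """Insert or overwrite the tracking row for a processed file."""
    with db:  # commits on success, rolls back if the statement raises
        db.execute(
            "INSERT OR REPLACE INTO documents VALUES (?, ?, ?, ?)",
            (str(path), file_hash(path), path.stat().st_mtime, chunk_count),
        )
```

A refresh would then walk the knowledge base, call `needs_processing` for each file, and re-chunk and re-embed only the files for which it returns true.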

Configuration Examples

MCP Client Configurations

Basic Configuration (Claude Desktop/Cursor/Cline):

{
  "mcpServers": {
    "rag": {
      "command": "uvx",
      "args": ["mcp-rag-librarian"]
    }
  }
}

Full Configuration with All Parameters:

{
  "mcpServers": {
    "rag": {
      "command": "uvx",
      "args": [
        "mcp-rag-librarian",
        "--knowledge-base", "/path/to/documents",
        "--embedding-model", "ibm-granite/granite-embedding-278m-multilingual",
        "--chunk-size", "500",
        "--chunk-overlap", "200",
        "--top-k", "7",
        "--verbose"
      ]
    }
  }
}

Multiple Knowledge Base Configuration:

{
  "mcpServers": {
    "rag-technical": {
      "command": "uvx",
      "args": [
        "mcp-rag-librarian",
        "--knowledge-base", "/docs/technical",
        "--embedding-model", "all-mpnet-base-v2",
        "--chunk-size", "1000",
        "--chunk-overlap", "400"
      ]
    },
    "rag-research": {
      "command": "uvx",
      "args": [
        "mcp-rag-librarian",
        "--knowledge-base", "/docs/research",
        "--embedding-model", "all-MiniLM-L6-v2",
        "--chunk-size", "500",
        "--chunk-overlap", "100",
        "--port", "8001"
      ]
    }
  }
}

Command Line Examples

High-Quality Configuration for Research:

uvx mcp-rag-librarian \
  --knowledge-base /path/to/research/docs \
  --embedding-model "all-mpnet-base-v2" \
  --chunk-size 1000 \
  --chunk-overlap 400 \
  --top-k 10

Fast Processing Configuration:

uvx mcp-rag-librarian \
  --knowledge-base /path/to/large/corpus \
  --embedding-model "all-MiniLM-L6-v2" \
  --chunk-size 2000 \
  --chunk-overlap 100 \
  --top-k 5

Multilingual Configuration:

uvx mcp-rag-librarian \
  --knowledge-base /path/to/multilingual/docs \
  --embedding-model "ibm-granite/granite-embedding-278m-multilingual" \
  --chunk-size 500 \
  --chunk-overlap 200 \
  --top-k 7

Development Configuration with Verbose Logging:

uvx --from . mcp-rag-librarian \
  --knowledge-base ./test_documents \
  --embedding-model "all-MiniLM-L6-v2" \
  --chunk-size 300 \
  --chunk-overlap 50 \
  --top-k 3 \
  --verbose

Error Handling

The server handles the following classes of errors:

File Access Errors: Graceful handling of permission issues
Encoding Errors: Automatic encoding detection and fallback
Model Loading Errors: Fallback to default models
Database Errors: Transaction rollback and recovery
Search Errors: Informative error messages
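
As an example of the encoding fallback, the sketch below tries UTF-8 first and falls back to Latin-1, the two encodings named under Content Extraction. The function name `read_text_with_fallback` is an assumption for illustration.

```python
from pathlib import Path


def read_text_with_fallback(path: Path) -> str:
    """Decode as UTF-8 first; fall back to Latin-1, which accepts any byte."""
    data = path.read_bytes()
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this branch cannot fail,
        # though text in some other encoding may come out with wrong characters.
        return data.decode("latin-1")
```

Trying UTF-8 first matters because Latin-1 never raises: if the order were reversed, valid UTF-8 files would be silently decoded into garbled text.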

Performance Considerations

Memory Usage

Embeddings are stored in memory for fast search
Approximate memory: num_chunks × embedding_dimension × 4 bytes
Example: 10,000 chunks × 384 dimensions × 4 bytes ≈ 15 MB
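
The estimate is easy to reproduce for other models. The helper below is a small sketch with an illustrative name; it assumes 4-byte float32 values and counts only the raw vectors, not the chunk text or index overhead. The dimensions used are 384 for all-MiniLM-L6-v2 and 768 for all-mpnet-base-v2.

```python
def embedding_memory_bytes(num_chunks: int, embedding_dim: int,
                           bytes_per_value: int = 4) -> int:
    """Raw size of the embedding matrix: one float32 per dimension per chunk."""
    return num_chunks * embedding_dim * bytes_per_value


for model, dim in [("all-MiniLM-L6-v2", 384), ("all-mpnet-base-v2", 768)]:
    size = embedding_memory_bytes(10_000, dim)
    print(f"{model}: {size:,} bytes = {size / 1e6:.1f} MB")
# all-MiniLM-L6-v2: 15,360,000 bytes = 15.4 MB
# all-mpnet-base-v2: 30,720,000 bytes = 30.7 MB
```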

Processing Speed

Document processing: ~100-500 docs/minute (depending on size)
Embedding generation: ~50-200 chunks/second (model dependent)
Search latency: <10ms for 100K documents

Optimization Tips

Use smaller embedding models for faster processing
Increase chunk size for fewer chunks (may reduce accuracy)
Decrease overlap for faster processing (may lose context)
Use SSD storage for document store database
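
To see how chunk size and overlap drive the amount of work, the sketch below estimates the chunk count for a plain sliding window. It is an approximation under assumptions: the function name is illustrative, and the real chunker also respects word boundaries, so actual counts can differ slightly.

```python
import math


def estimated_chunks(doc_length: int, chunk_size: int, chunk_overlap: int) -> int:
    """Approximate chunk count for a sliding window over doc_length characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    if doc_length <= chunk_size:
        return 1 if doc_length > 0 else 0
    # Each chunk after the first advances by (chunk_size - chunk_overlap) characters.
    stride = chunk_size - chunk_overlap
    return 1 + math.ceil((doc_length - chunk_size) / stride)


for size, overlap in [(500, 200), (1000, 400), (2000, 100)]:
    print(size, overlap, estimated_chunks(100_000, size, overlap))
# 500 200 333
# 1000 400 166
# 2000 100 53
```

For a 100,000-character document, moving from the default 500/200 settings to 2000/100 cuts the number of chunks to embed and search by roughly a factor of six.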

Development

Running Tests

pytest tests/

Code Formatting

black src/
isort src/

Type Checking

mypy src/

Troubleshooting

Common Issues

"No knowledge base path provided"
- Solution: Either pass knowledge_base_path in the tool call or start the server with the --knowledge-base flag
"Model mismatch detected"
- Solution: This is a warning; the system will use the closest available model
"Failed to initialize embedding model"
- Solution: Check internet connection or use a locally cached model
"No documents found in knowledge base"
- Solution: Ensure directory contains .txt, .pdf, or .md files (searches recursively in subdirectories)

Debug Mode

Enable verbose logging for troubleshooting:

uvx mcp-rag-librarian --verbose

Testing with MCP Inspector

Test your RAG server interactively using MCP Inspector:

npx @modelcontextprotocol/inspector --config mcp_inspector_config.json

This launches a web interface where you can test all MCP tools interactively. Make sure you have an mcp_inspector_config.json file in your project root with your server configuration.
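
If you do not have that file yet, the sketch below writes one. It assumes the Inspector accepts the same mcpServers layout used in the client configurations earlier in this document; the helper name `write_inspector_config` and the server name "rag" are illustrative choices.

```python
import json
from pathlib import Path


def write_inspector_config(path: Path, knowledge_base: str,
                           persist_cache: bool = False) -> dict:
    """Write an mcpServers-style config file and return the config as a dict."""
    args = ["mcp-rag-librarian", "--knowledge-base", knowledge_base]
    if persist_cache:
        args.append("--persist-cache")
    config = {"mcpServers": {"rag": {"command": "uvx", "args": args}}}
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling `write_inspector_config(Path("mcp_inspector_config.json"), "/path/to/your/docs")` from the project root produces a file that the command above can point to.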

Help and Resources

Contributing

Contributions are welcome! Please:

Fork the repository
Create a feature branch
Add tests for new functionality
Ensure all tests pass
Submit a pull request

License

MIT License - see LICENSE file for details.

Acknowledgments

Original Work: This is a fork of rag-mcp-server by Tommaso Maria Ungetti
Built on MCP (Model Context Protocol)
Powered by Sentence Transformers
Vector search by FAISS
PDF processing by PyMuPDF
