A Model Context Protocol (MCP) server implementation for txtai

Project description

Embedding MCP Server

A Model Context Protocol (MCP) server implementation powered by txtai, providing semantic search, knowledge graph capabilities, and AI-driven text processing through a standardized interface.

The Power of txtai: All-in-one Embeddings Database

This project leverages txtai, an all-in-one embeddings database for RAG leveraging semantic search, knowledge graph construction, and language model workflows. txtai offers several key advantages:

Unified Vector Database: Combines vector indexes, graph networks, and relational databases in a single platform
Semantic Search: Find information based on meaning, not just keywords
Knowledge Graph Integration: Automatically build and query knowledge graphs from your data
Portable Knowledge Bases: Save entire knowledge bases as compressed archives (.tar.gz) that can be easily shared and loaded
Extensible Pipeline System: Process text, documents, audio, images, and video through a unified API
Local-first Architecture: Run everything locally without sending data to external services

How It Works

The project contains a knowledge base builder tool and a MCP server. The knowledge base builder tool is a command-line interface for creating and managing knowledge bases. The MCP server provides a standardized interface to access the knowledge base.

It is not required to use the knowledge base builder tool to build a knowledge base. You can always build a knowledge base using txtai's programming interface by writing a Python script or even using a jupyter notebook. As long as the knowledge base is built using txtai, it can be loaded by the MCP server. Better yet, the knowledge base can be a folder on the file system or an exported .tar.gz file. Just give it to the MCP server and it will load it.

1. Build a Knowledge Base with kb_builder

The kb_builder module provides a command-line interface for creating and managing knowledge bases:

Process documents from various sources (files, directories, JSON)
Extract text and create embeddings
Build knowledge graphs automatically
Export portable knowledge bases

Note it is possibly limited in functionality and currently only provided for convenience.

2. Start the MCP Server

The MCP server provides a standardized interface to access the knowledge base:

Semantic search capabilities
Knowledge graph querying and visualization
Text processing pipelines (summarization, extraction, etc.)
Full compliance with the Model Context Protocol

Installation

Using Conda (Recommended)

# Create a new conda environment
conda create -n embedding-mcp python=3.10
conda activate embedding-mcp

# Clone the repository
git clone https://github.com/Geeksfino/kb-mcp-server.git.git
cd kb-mcp-server

# Install dependencies
pip install -e .

Using uv (Faster Alternative)

# Install uv if not already installed
pip install uv

# Create a new virtual environment
uv venv
source .venv/bin/activate

# Install dependencies
uv pip install -e .

Command Line Usage

Building a Knowledge Base

You can use either the Python module directly or the convenient shell scripts:

Using the Python Module

# Build a knowledge base from documents
python -m kb_builder build --input /path/to/documents --config config.yml

# Update an existing knowledge base with new documents
python -m kb_builder build --input /path/to/new_documents --update

# Export a knowledge base for portability
python -m kb_builder build --input /path/to/documents --export my_knowledge_base.tar.gz

Using the Convenience Scripts

The repository includes convenient wrapper scripts that make it easier to build and search knowledge bases:

# Build a knowledge base using a template configuration
./scripts/kb_build.sh /path/to/documents technical_docs

# Build using a custom configuration file
./scripts/kb_build.sh /path/to/documents /path/to/my_config.yml

# Update an existing knowledge base
./scripts/kb_build.sh /path/to/documents technical_docs --update

# Search a knowledge base
./scripts/kb_search.sh /path/to/knowledge_base "What is machine learning?"

# Search with graph enhancement
./scripts/kb_search.sh /path/to/knowledge_base "What is machine learning?" --graph

Run ./scripts/kb_build.sh --help or ./scripts/kb_search.sh --help for more options.

Starting the MCP Server

# Start with a specific knowledge base folder
python -m txtai_mcp_server --embeddings /path/to/knowledge_base_folder

# Start with a given knowledge base archive
python -m txtai_mcp_server --embeddings /path/to/knowledge_base.tar.gz

MCP Server Configuration

The MCP server is configured using environment variables or command-line arguments, not YAML files. YAML files are only used for configuring txtai components during knowledge base building.

Here's how to configure the MCP server:

# Start the server with command-line arguments
python -m txtai_mcp_server --embeddings /path/to/knowledge_base --host 0.0.0.0 --port 8000

# Or use environment variables
export TXTAI_EMBEDDINGS=/path/to/knowledge_base
export MCP_SSE_HOST=0.0.0.0
export MCP_SSE_PORT=8000
python -m txtai_mcp_server

Common configuration options:

--embeddings: Path to the knowledge base (required)
--host: Host address to bind to (default: localhost)
--port: Port to listen on (default: 8000)
--transport: Transport to use, either 'sse' or 'stdio' (default: stdio)
--enable-causal-boost: Enable causal boost feature for enhanced relevance scoring
--causal-config: Path to custom causal boost configuration YAML file

Advanced Knowledge Base Configuration

Building a knowledge base with txtai requires a YAML configuration file that controls various aspects of the embedding process. This configuration is used by the kb_builder tool, not the MCP server itself.

One may need to tune segmentation/chunking strategies, embedding models, and scoring methods, as well as configure graph construction, causal boosting, weights of hybrid search, and more.

Fortunately, txtai provides a powerful YAML configuration system that requires no coding. Here's an example of a comprehensive configuration for knowledge base building:

# Path to save/load embeddings index
path: ~/.txtai/embeddings
writable: true

# Content storage in SQLite
content:
  path: sqlite:///~/.txtai/content.db

# Embeddings configuration
embeddings:
  # Model settings
  path: sentence-transformers/nli-mpnet-base-v2
  backend: faiss
  gpu: true
  batch: 32
  normalize: true
  
  # Scoring settings
  scoring: hybrid
  hybridalpha: 0.75

# Pipeline configuration
pipeline:
  workers: 2
  queue: 100
  timeout: 300

# Question-answering pipeline
extractor:
  path: distilbert-base-cased-distilled-squad
  maxlength: 512
  minscore: 0.3

# Graph configuration
graph:
  backend: sqlite
  path: ~/.txtai/graph.db
  similarity: 0.75  # Threshold for creating graph connections
  limit: 10  # Maximum connections per node

Configuration Examples

The src/kb_builder/configs directory contains configuration templates for different use cases and storage backends:

Storage and Backend Configurations

memory.yml: In-memory vectors (fastest for development, no persistence)
sqlite-faiss.yml: SQLite for content + FAISS for vectors (local file-based persistence)
postgres-pgvector.yml: PostgreSQL + pgvector (production-ready with full persistence)

Domain-Specific Configurations

base.yml: Base configuration template
code_repositories.yml: Optimized for code repositories
data_science.yml: Configured for data science documents
general_knowledge.yml: General purpose knowledge base
research_papers.yml: Optimized for academic papers
technical_docs.yml: Configured for technical documentation

You can use these as starting points for your own configurations:

python -m kb_builder build --input /path/to/documents --config src/kb_builder/configs/technical_docs.yml

# Or use a storage-specific configuration
python -m kb_builder build --input /path/to/documents --config src/kb_builder/configs/postgres-pgvector.yml

Advanced Features

Knowledge Graph Capabilities

The MCP server leverages txtai's built-in graph functionality to provide powerful knowledge graph capabilities:

Automatic Graph Construction: Build knowledge graphs from your documents automatically
Graph Traversal: Navigate through related concepts and documents
Path Finding: Discover connections between different pieces of information
Community Detection: Identify clusters of related information

Causal Boosting Mechanism

The MCP server includes a sophisticated causal boosting mechanism that enhances search relevance by identifying and prioritizing causal relationships:

Pattern Recognition: Detects causal language patterns in both queries and documents
Multilingual Support: Automatically applies appropriate patterns based on detected query language
Configurable Boost Multipliers: Different types of causal matches receive customizable boost factors
Enhanced Relevance: Results that explain causal relationships are prioritized in search results

This mechanism significantly improves responses to "why" and "how" questions by surfacing content that explains relationships between concepts. The causal boosting configuration is highly customizable through YAML files, allowing adaptation to different domains and languages.

License

MIT License - see LICENSE file for details

Project details

Release history Release notifications | RSS feed

0.3.1

Apr 25, 2025

0.3.0

Apr 1, 2025

0.2.10

Mar 30, 2025

0.2.9

Mar 30, 2025

0.2.8

Mar 30, 2025

0.2.7

Mar 11, 2025

0.2.6

Mar 11, 2025

0.2.5

Mar 11, 2025

0.2.3

Mar 11, 2025

0.2.2

Mar 11, 2025

This version

0.2.1

Mar 11, 2025

0.2.0

Mar 11, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

kb_mcp_server-0.2.1.tar.gz (584.1 kB view details)

Uploaded Mar 11, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

kb_mcp_server-0.2.1-py3-none-any.whl (50.8 kB view details)

Uploaded Mar 11, 2025 Python 3

File details

Details for the file kb_mcp_server-0.2.1.tar.gz.

File metadata

Download URL: kb_mcp_server-0.2.1.tar.gz
Upload date: Mar 11, 2025
Size: 584.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.16

File hashes

Hashes for kb_mcp_server-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`7799694c422319a1b721e4bdce3453eb1f9eb04e681a4ff2a8dd536e328824fb`
MD5	`411fa2176e63dfb0363bded82f7ac472`
BLAKE2b-256	`60be371c0f5f70c52a218be6f483f6ab0cf5079cee17e7835e3a527ce950668b`

See more details on using hashes here.

File details

Details for the file kb_mcp_server-0.2.1-py3-none-any.whl.

File metadata

Download URL: kb_mcp_server-0.2.1-py3-none-any.whl
Upload date: Mar 11, 2025
Size: 50.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.16

File hashes

Hashes for kb_mcp_server-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f5918cf9928061d53f236807825520f7eac82889a854853ccb960d22a97b2508`
MD5	`acf711823ab0d4e3e50f9da349054de6`
BLAKE2b-256	`9469d0dda7dd66a7c9aca45ef5cfcbb3c94e744c32ba1b45654a83fd5265f3de`

See more details on using hashes here.

kb-mcp-server 0.2.1

Navigation

Verified details

Maintainers

Unverified details

Meta

Project description

Embedding MCP Server

The Power of txtai: All-in-one Embeddings Database

How It Works

1. Build a Knowledge Base with kb_builder

2. Start the MCP Server

Installation

Using Conda (Recommended)

Using uv (Faster Alternative)

Command Line Usage

Building a Knowledge Base

Using the Python Module

Using the Convenience Scripts

Starting the MCP Server

MCP Server Configuration

Advanced Knowledge Base Configuration

Configuration Examples

Storage and Backend Configurations

Domain-Specific Configurations

Advanced Features

Knowledge Graph Capabilities

Causal Boosting Mechanism

License

Project details

Verified details

Maintainers

Unverified details

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes