
An MCP server designed to bridge the gap between specialized knowledge domains and AI assistants.


knowledge-mcp: Specialized Knowledge Bases for AI Agents

1. Overview and Concept

knowledge-mcp is an MCP (Model Context Protocol) server designed to bridge the gap between specialized knowledge domains and AI assistants. It lets users create, manage, and query dedicated knowledge bases, and it makes that information accessible to AI agents through an MCP server interface.

The core idea is to empower AI assistants that are MCP clients (like Claude Desktop or IDEs like Windsurf) to proactively consult these specialized knowledge bases during their reasoning process (Chain of Thought), rather than relying solely on general semantic search against user prompts or broad web searches. This enables more accurate, context-aware responses when dealing with specific domains.

Key components:

  • CLI Tool: Provides a user-friendly command-line interface for managing knowledge bases (creating, deleting, adding/removing documents, configuring, searching).
  • Knowledge Base Engine: Leverages LightRAG to handle document processing, embedding, knowledge graph creation, and complex querying.
  • MCP Server: Exposes the search functionality of the knowledge bases over the Model Context Protocol (built with FastMCP), allowing compatible AI agents to query them directly.
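
As a rough illustration of how an MCP client could be pointed at the server, the sketch below builds an mcpServers entry in the JSON layout that Claude Desktop reads. The server name knowledge-mcp, the config path, and the use of the mcp subcommand (listed in the CLI section, where it is still marked Pending) are assumptions for illustration; check your client's documentation for the exact file location and format.

```python
import json


def build_client_entry(config_path: str, server_name: str = "knowledge-mcp") -> dict:
    """Return an `mcpServers` entry that launches the knowledge-mcp CLI.

    `server_name` is a hypothetical label; `config_path` should be an
    absolute path to the main config.yaml.
    """
    return {
        "mcpServers": {
            server_name: {
                "command": "knowledge-mcp",
                # Assumes the `mcp` subcommand starts the server.
                "args": ["--config", config_path, "mcp"],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_client_entry("/absolute/path/to/config.yaml"), indent=2))
```

The printed JSON can be merged by hand into the client's configuration file.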

2. About LightRAG

This project utilizes LightRAG (HKUDS/LightRAG) as its core engine for knowledge base creation and querying. LightRAG is a powerful framework designed to enhance Large Language Models (LLMs) by integrating Retrieval-Augmented Generation (RAG) with knowledge graph techniques.

Key features of LightRAG relevant to this project:

  • Document Processing Pipeline: Ingests documents (PDF, Text, Markdown, DOCX), chunks them, extracts entities and relationships using an LLM, and builds both a knowledge graph and vector embeddings.
  • Multiple Query Modes: Supports various retrieval strategies (e.g., vector similarity, entity-centric, relationship-focused, hybrid) to find the most relevant context for a given query.
  • Flexible Storage: Can use different backends for storing key-value data, vectors, graph information, and document status (this project uses the default file-based storage).
  • LLM/Embedding Integration: Supports various providers like OpenAI (used in this project), Ollama, Hugging Face, etc.

By using LightRAG, knowledge-mcp benefits from advanced RAG capabilities that go beyond simple vector search.
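
To make the chunking step of the pipeline more concrete, here is a minimal sketch of overlapping window chunking. It is not LightRAG's implementation: it splits on whitespace instead of using a tokenizer, and the function name chunk_words and its default sizes are made up for illustration. Entity extraction and graph construction are not shown.

```python
def chunk_words(text: str, chunk_size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into chunks of up to `chunk_size` words.

    Consecutive chunks share `overlap` words so that context spanning a
    chunk boundary is not lost.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    for chunk in chunk_words("a b c d e f g", chunk_size=4, overlap=1):
        print(chunk)
```

Each chunk would then be embedded and passed to the LLM for entity and relationship extraction.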

3. Installation

Ensure you have Python 3.12 and uv installed.

  1. Clone the repository:

    git clone https://github.com/olafgeibig/knowledge-mcp.git
    cd knowledge-mcp
    
  2. Create a virtual environment and install dependencies using uv:

    python -m venv .venv
    source .venv/bin/activate # Or .\.venv\Scripts\activate on Windows
    uv pip install -e ".[dev]"
    

    The -e flag installs the package in editable mode, and the [dev] extra pulls in the development dependencies.

  3. Set up configuration:

    • Copy config.example.yaml to config.yaml.
    • Copy .env.example to .env.
    • Edit config.yaml and .env to add your API keys (e.g., OPENAI_API_KEY) and adjust paths or settings as needed. The knowledge_base.base_dir in config.yaml specifies where your knowledge base directories will be created.
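
The copy step can also be scripted. The sketch below is one possible approach, not part of the project: the helper name bootstrap_config is hypothetical, and it only copies an example file when the target does not exist yet, so existing settings are never overwritten.

```python
import shutil
from pathlib import Path


def bootstrap_config(project_dir: str) -> list[str]:
    """Copy the example files to their working names; return the names created."""
    root = Path(project_dir)
    created = []
    pairs = (("config.example.yaml", "config.yaml"), (".env.example", ".env"))
    for example, target in pairs:
        src, dst = root / example, root / target
        # Skip missing examples and never overwrite an existing target.
        if src.exists() and not dst.exists():
            shutil.copyfile(src, dst)
            created.append(target)
    return created


if __name__ == "__main__":
    print(bootstrap_config("."))
```

You still need to edit the copied files afterwards to add your API keys.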

4. Usage (CLI)

The primary way to interact with knowledge-mcp is through its CLI, invoked with the knowledge-mcp command (if the package is installed globally) or with uvx knowledge-mcp inside the activated venv.

All commands require the --config option pointing to your main configuration file.

knowledge-mcp --config config.yaml <command> [arguments...]

Available Commands:

| Command | Description | Arguments | Status |
| ------- | ----------- | --------- | ------ |
| create | Creates a new knowledge base directory and initializes its structure. | <kb-name>: Name of the knowledge base to create. | Implemented |
| delete | Deletes an existing knowledge base directory and all its contents. | <kb-name>: Name of the knowledge base to delete. | Implemented |
| list | Lists all available knowledge bases found in the base_dir. | N/A | Implemented |
| add | Adds a document: processes, chunks, embeds, stores in the specified KB. | <kb-name>: Target KB. <path>: Path to the document file. | Implemented |
| remove | Removes a document and its associated embeddings from the KB. | <kb-name>: Target KB. <doc_name>: Name/ID of the document to remove. | Implemented |
| config | Manages the KB-specific config.yaml (query parameters). | <kb_name>: Target KB. [show\|edit]: Subcommand (show default). | |
| search | Searches the specified knowledge base using LightRAG. | <kb-name>: Target KB. <query>: Your search query text. | Implemented |
| mcp | Runs the MCP server to expose the search functionality to AI agents. | N/A | Pending |
| shell | Starts an interactive shell session with all commands available. | N/A | Implemented |
| exit | (Within shell) Exits the interactive shell. | N/A | Implemented |
| help | (Within shell) Shows available commands and their usage. | [command] (Optional command name) | Implemented |

Example:

# Create a knowledge base named 'my_docs'
knowledge-mcp --config config.yaml create my_docs

# Add a document to it
knowledge-mcp --config config.yaml add my_docs ./path/to/mydocument.pdf

# Search the knowledge base
knowledge-mcp --config config.yaml search my_docs "What is the main topic?"

# Start the interactive shell
knowledge-mcp --config config.yaml shell

(kbmcp) list
(kbmcp) search my_docs "Another query"
(kbmcp) exit
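
The same commands can be driven from a script. The following is a minimal sketch that assumes the knowledge-mcp executable is on PATH; the helper names kb_command and run_kb are hypothetical and not part of the package.

```python
import subprocess


def kb_command(config: str, command: str, *args: str) -> list[str]:
    """Build the argument list for one knowledge-mcp invocation."""
    return ["knowledge-mcp", "--config", config, command, *args]


def run_kb(config: str, command: str, *args: str) -> str:
    """Run the CLI and return its standard output.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    result = subprocess.run(
        kb_command(config, command, *args),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    run_kb("config.yaml", "create", "my_docs")
    run_kb("config.yaml", "add", "my_docs", "./path/to/mydocument.pdf")
    print(run_kb("config.yaml", "search", "my_docs", "What is the main topic?"))
```

Passing the arguments as a list avoids shell quoting problems with queries that contain spaces.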

5. Configuration

Configuration is managed via YAML files:

  • Main Configuration (config.yaml): Defines global settings like the knowledge base directory (knowledge_base.base_dir), LightRAG parameters (LLM provider/model, embedding provider/model, API keys via ${ENV_VAR} substitution), and logging settings.

    # Example structure (see config.example.yaml for full details)
    knowledge_base:
      base_dir: ./kbs # Default directory for KBs
    
    lightrag:
      llm:
        provider: "openai"
        model_name: "gpt-4.1-nano"
        api_key: "${OPENAI_API_KEY}"
        # ... other LLM settings
      embedding:
        provider: "openai"
        model_name: "text-embedding-3-small"
        api_key: "${OPENAI_API_KEY}"
        # ... other embedding settings
      embedding_cache:
        enabled: true
        similarity_threshold: 0.90
    
    logging:
      level: "INFO"
      # ... logging settings
    
    env_file: .env # Optional path to .env file
    
  • Knowledge Base Specific Configuration (<base_dir>/<kb_name>/config.yaml): Contains parameters specific to querying that knowledge base, such as the LightRAG query mode, top_k results, context token limits, etc. This file is automatically created with defaults when a KB is created and can be viewed/edited using the config CLI command.
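
The ${ENV_VAR} substitution used in the main configuration can be pictured roughly as follows. This is a sketch of the general technique, not the project's actual loader: the function name substitute_env is hypothetical, and it works on an already parsed configuration (nested dicts, lists, and strings) rather than on the YAML text.

```python
import os
import re

# Matches placeholders such as ${OPENAI_API_KEY}.
_ENV_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_env(value, env=None):
    """Recursively replace ${NAME} placeholders with values from `env`.

    `env` defaults to os.environ. A missing variable raises KeyError so
    that a forgotten API key fails loudly instead of passing through.
    """
    env = os.environ if env is None else env
    if isinstance(value, dict):
        return {key: substitute_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [substitute_env(item, env) for item in value]
    if isinstance(value, str):
        def replace(match):
            name = match.group(1)
            if name not in env:
                raise KeyError(f"environment variable {name} is not set")
            return env[name]
        return _ENV_PATTERN.sub(replace, value)
    # Numbers, booleans, and None are returned unchanged.
    return value


if __name__ == "__main__":
    config = {"lightrag": {"llm": {"api_key": "${OPENAI_API_KEY}"}}}
    print(substitute_env(config, {"OPENAI_API_KEY": "sk-example"}))
```

Keeping secrets in .env and referencing them by name means config.yaml can be shared without exposing keys.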

6. Development

  • Tech Stack: Python 3.12, uv (dependency management), hatchling (build system), pytest (testing).
  • Setup: Follow the installation steps, ensuring you install with uv pip install -e ".[dev]".
  • Code Style: Adheres to PEP 8.
  • Testing: Run tests using pytest (for example, uv run pytest from the project root).
  • Dependencies: Managed in pyproject.toml. Use uv pip install <package> to add and uv pip uninstall <package> to remove dependencies, updating pyproject.toml accordingly.
  • Scripts: Common tasks might be defined under [project.scripts] in pyproject.toml.
  • MCP Inspector: Use npx @modelcontextprotocol/inspector uv run cli --config ./kbs/config.yaml serve to start the MCP inspector.
