MCP server for datapizza-ai documentation and examples

DataPizza MCP Server 🍕

A Model Context Protocol (MCP) server that provides intelligent access to datapizza-ai documentation through vector similarity search and retrieval-augmented generation.

Overview

This MCP server lets AI assistants and applications query the datapizza-ai documentation in natural language. It indexes documentation from the datapizza-ai repository and returns contextual, relevant responses through a Retrieval-Augmented Generation (RAG) pipeline.

Features

  • Intelligent Documentation Search: Natural language queries across datapizza-ai documentation
  • Vector-Based Retrieval: Uses OpenAI embeddings and Qdrant vector database for semantic search
  • MCP Protocol Compliance: Standard Model Context Protocol implementation for broad compatibility
  • Automatic Indexing: Downloads and indexes documentation from GitHub automatically
  • Cloud-Ready: Supports Qdrant Cloud for scalable vector storage
  • Configurable: Environment-based configuration for flexible deployment

Architecture

The server consists of four main components:

  • MCP Server: FastMCP-based server exposing the query_datapizza tool
  • Indexer: Downloads and processes datapizza-ai documentation into searchable chunks
  • Retriever: RAG engine for semantic search and response generation
  • Configuration: Environment-based settings management with validation
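
The Retriever's search step can be sketched independently of the real stack: embed the query, rank stored chunks by cosine similarity, and return the top-k. The toy vectors and in-memory list below stand in for OpenAI embeddings and Qdrant, so this is an illustration of the flow, not the server's actual code.

```python
# Sketch of the Retriever's semantic-search step: rank stored chunks by
# cosine similarity to the query embedding and keep the top-k matches.
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec: list[float], chunks: list[tuple[str, list[float]]], k: int = 5) -> list[str]:
    """chunks: (text, embedding) pairs; returns the k most similar texts."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy corpus with 2-dimensional "embeddings" in place of real ones.
chunks = [
    ("Agents overview", [1.0, 0.0]),
    ("Qdrant setup",    [0.0, 1.0]),
    ("OpenAI clients",  [0.9, 0.1]),
]
print(top_k([1.0, 0.0], chunks, k=2))
```

In the real server, the query embedding comes from the OpenAI embeddings API and the nearest-neighbor search runs inside Qdrant rather than in memory.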

Prerequisites

  • Python 3.10 or higher
  • OpenAI API key
  • Qdrant Cloud account and API key
  • Internet connection for documentation indexing

Installation

  1. Clone the repository and enter it:
git clone https://github.com/datapizza-labs/mcp_server_datapizza.git
cd mcp_server_datapizza
  2. Navigate to the package directory:
cd datapizza-mcp-server
  3. Install the package with development dependencies:
pip install -e ".[dev]"

Configuration

Create a .env file in the datapizza-mcp-server directory with the following variables:

# Required Configuration
OPENAI_API_KEY=your_openai_api_key_here
QDRANT_URL=your_qdrant_cloud_url
QDRANT_API_KEY=your_qdrant_api_key

# Optional Configuration
EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_DIMENSIONS=1536
COLLECTION_NAME=datapizza_docs
MAX_RESULTS=5
CHUNK_SIZE=1024
CHUNK_OVERLAP=200
LOG_LEVEL=INFO

Required Environment Variables

Variable         Description
OPENAI_API_KEY   OpenAI API key used to generate embeddings
QDRANT_URL       Qdrant Cloud instance URL
QDRANT_API_KEY   Qdrant Cloud API key

Optional Environment Variables

Variable              Default                  Description
EMBEDDING_MODEL       text-embedding-3-small   OpenAI embedding model
EMBEDDING_DIMENSIONS  1536                     Embedding vector dimensions
COLLECTION_NAME       datapizza_docs           Qdrant collection name
MAX_RESULTS           5                        Maximum number of search results returned
CHUNK_SIZE            1024                     Document chunk size for indexing
CHUNK_OVERLAP         200                      Overlap between consecutive chunks
LOG_LEVEL             INFO                     Logging level (DEBUG, INFO, WARNING, ERROR)
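
Loading and validating these variables might look like the following sketch. The server's actual config.py may structure this differently; the `Settings` dataclass and `load_settings` helper here are illustrative names, not the package's API.

```python
# Hedged sketch of environment-based configuration with validation of
# the three required variables; defaults mirror the table above.
import os
from dataclasses import dataclass

@dataclass
class Settings:
    openai_api_key: str
    qdrant_url: str
    qdrant_api_key: str
    embedding_model: str = "text-embedding-3-small"
    max_results: int = 5
    chunk_size: int = 1024
    chunk_overlap: int = 200

def load_settings() -> Settings:
    # Fail fast if any required variable is unset or empty.
    missing = [v for v in ("OPENAI_API_KEY", "QDRANT_URL", "QDRANT_API_KEY")
               if not os.environ.get(v)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {missing}")
    return Settings(
        openai_api_key=os.environ["OPENAI_API_KEY"],
        qdrant_url=os.environ["QDRANT_URL"],
        qdrant_api_key=os.environ["QDRANT_API_KEY"],
        embedding_model=os.environ.get("EMBEDDING_MODEL", "text-embedding-3-small"),
        max_results=int(os.environ.get("MAX_RESULTS", "5")),
        chunk_size=int(os.environ.get("CHUNK_SIZE", "1024")),
        chunk_overlap=int(os.environ.get("CHUNK_OVERLAP", "200")),
    )
```

In practice the server loads the .env file first (e.g. via python-dotenv) so these variables are present in the process environment.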

Usage

1. Index Documentation

Before using the server, index the datapizza-ai documentation:

python -m datapizza_mcp.indexer

To force re-indexing (clears existing data):

python -m datapizza_mcp.indexer --force
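
The CHUNK_SIZE and CHUNK_OVERLAP settings suggest a sliding-window split of each document. A minimal character-based sketch is shown below; the real indexer may split on document structure (headings, paragraphs) rather than raw characters.

```python
# Sliding-window chunker: consecutive chunks share `overlap` characters,
# so context spanning a chunk boundary is not lost at retrieval time.
def chunk_text(text: str, chunk_size: int = 1024, overlap: int = 200) -> list[str]:
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # advance by chunk_size minus the overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# A 2500-character document with the default settings yields 4 chunks.
doc = "".join(str(i % 10) for i in range(2500))
chunks = chunk_text(doc, chunk_size=1024, overlap=200)
```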

2. Start the MCP Server

python -m datapizza_mcp.server

Or use the provided Windows batch script:

../run_datapizza.bat

3. Query the Documentation

The server exposes a query_datapizza tool that can be called by MCP clients:

# Example query (assumes `client` is an initialized MCP client session)
result = await client.call_tool("query_datapizza", {
    "query": "how to create an agent with OpenAI",
    "max_results": 5,
})

MCP Tools and Resources

Tools

  • query_datapizza: Search datapizza-ai documentation
    • query (string): Natural language search query
    • max_results (int, optional): Maximum number of results (default: 5)

Resources

  • datapizza://status: System status and configuration information
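
Under the hood, an MCP client invokes a tool with a JSON-RPC tools/call request. The snippet below builds the message shape defined by the MCP specification for the query_datapizza tool; client libraries construct this for you, so it is shown only to illustrate the protocol.

```python
# The JSON-RPC 2.0 message an MCP client sends to call query_datapizza.
import json

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "query_datapizza",
        "arguments": {
            "query": "how to create an agent with OpenAI",
            "max_results": 5,
        },
    },
}
print(json.dumps(request, indent=2))
```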

Development

Code Quality Tools

# Format code
black src/

# Lint code
ruff check src/
ruff check src/ --fix  # Auto-fix issues

# Type checking
mypy src/

# Run tests
pytest

Project Structure

datapizza-mcp-server/
├── src/datapizza_mcp/
│   ├── __init__.py          # Package exports
│   ├── config.py            # Configuration management
│   ├── server.py            # MCP server implementation
│   ├── indexer.py           # Documentation indexing
│   └── retriever.py         # RAG retrieval engine
├── pyproject.toml           # Package configuration
├── .env                     # Environment variables
└── README.md               # This file

Dependencies

Core Dependencies

  • mcp: Model Context Protocol framework
  • datapizza-ai-core: Core datapizza-ai functionality
  • datapizza-ai-embedders-openai: OpenAI embedding integration
  • datapizza-ai-vectorstores-qdrant: Qdrant vector store integration
  • openai: OpenAI API client
  • qdrant-client: Qdrant database client
  • requests: HTTP client for GitHub API
  • python-dotenv: Environment variable management

Development Dependencies

  • pytest: Testing framework
  • black: Code formatter
  • ruff: Linter and code style checker
  • mypy: Static type checker

Troubleshooting

Common Issues

  1. Authentication Errors

    • Verify OPENAI_API_KEY is set correctly
    • Check Qdrant Cloud credentials (QDRANT_URL and QDRANT_API_KEY)
  2. Empty Search Results

    • Ensure documentation is indexed: python -m datapizza_mcp.indexer
    • Check system status: query the datapizza://status resource
  3. Connection Issues

    • Verify internet connectivity for GitHub and Qdrant Cloud access
    • Check firewall settings for outbound HTTPS connections

Debugging

Enable debug logging by setting LOG_LEVEL=DEBUG in your .env file.
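
Assuming the server maps LOG_LEVEL onto Python's standard logging module (a sketch; the actual setup in the package may differ), the wiring looks like:

```python
# Translate the LOG_LEVEL environment variable into a logging level,
# falling back to INFO for unknown values.
import logging
import os

level_name = os.environ.get("LOG_LEVEL", "INFO").upper()
logging.basicConfig(level=getattr(logging, level_name, logging.INFO))
logging.getLogger("datapizza_mcp").debug("visible only when LOG_LEVEL=DEBUG")
```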

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes following the code style guidelines
  4. Run the full test suite and code quality checks
  5. Submit a pull request

License

This project is licensed under the MIT License. See the LICENSE file for details.

Support

For issues and questions, please open an issue on the GitHub repository.
