MCP server for datapizza-ai documentation and examples
DataPizza MCP Server 🍕
A Model Context Protocol (MCP) server that provides intelligent access to datapizza-ai documentation through vector similarity search and retrieval-augmented generation.
Overview
This MCP server lets AI assistants and applications query the datapizza-ai documentation in natural language. It indexes documentation from the datapizza-ai repository and returns contextual, relevant responses through a RAG (Retrieval-Augmented Generation) pipeline.
Features
- Intelligent Documentation Search: Natural language queries across datapizza-ai documentation
- Vector-Based Retrieval: Uses OpenAI embeddings and Qdrant vector database for semantic search
- MCP Protocol Compliance: Standard Model Context Protocol implementation for broad compatibility
- Automatic Indexing: Downloads and indexes documentation from GitHub automatically
- Cloud-Ready: Supports Qdrant Cloud for scalable vector storage
- Configurable: Environment-based configuration for flexible deployment
Architecture
The server consists of four main components:
- MCP Server: FastMCP-based server exposing the query_datapizza tool
- Indexer: Downloads and processes datapizza-ai documentation into searchable chunks
- Retriever: RAG engine for semantic search and response generation
- Configuration: Environment-based settings management with validation
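The Retriever's semantic search can be pictured as a nearest-neighbor lookup over embedding vectors. The sketch below is a pure-Python illustration of that idea using toy 3-dimensional vectors. It is not the server's code: in the real pipeline OpenAI produces the embeddings and Qdrant performs the search, and the names `cosine_similarity`, `top_k`, and the chunk IDs are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(
    query: list[float], chunks: dict[str, list[float]], k: int = 5
) -> list[tuple[str, float]]:
    """Rank stored chunk vectors by similarity to the query, best first."""
    scored = [(cid, cosine_similarity(query, vec)) for cid, vec in chunks.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for real embeddings (hypothetical data).
index = {
    "agents.md#0": [0.9, 0.1, 0.0],
    "pipelines.md#0": [0.1, 0.9, 0.0],
    "install.md#0": [0.0, 0.1, 0.9],
}
results = top_k([1.0, 0.0, 0.0], index, k=2)
```

Here `k` plays the role that MAX_RESULTS plays in the server configuration: it caps how many of the best-scoring chunks are handed back to the caller.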
Prerequisites
- Python 3.10 or higher
- OpenAI API key
- Qdrant Cloud account and API key
- Internet connection for documentation indexing
Installation
- Clone the repository:
git clone https://github.com/datapizza-labs/mcp_server_datapizza.git
cd mcp_server_datapizza
- Navigate to the package directory:
cd datapizza-mcp-server
- Install the package with development dependencies:
pip install -e ".[dev]"
Configuration
Create a .env file in the datapizza-mcp-server directory with the following variables:
# Required Configuration
OPENAI_API_KEY=your_openai_api_key_here
QDRANT_URL=your_qdrant_cloud_url
QDRANT_API_KEY=your_qdrant_api_key
# Optional Configuration
EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_DIMENSIONS=1536
COLLECTION_NAME=datapizza_docs
MAX_RESULTS=5
CHUNK_SIZE=1024
CHUNK_OVERLAP=200
LOG_LEVEL=INFO
Required Environment Variables
| Variable | Description |
|---|---|
| OPENAI_API_KEY | OpenAI API key for generating embeddings |
| QDRANT_URL | Qdrant Cloud instance URL |
| QDRANT_API_KEY | Qdrant Cloud API key |
Optional Environment Variables
| Variable | Default | Description |
|---|---|---|
| EMBEDDING_MODEL | text-embedding-3-small | OpenAI embedding model |
| EMBEDDING_DIMENSIONS | 1536 | Embedding vector dimensions |
| COLLECTION_NAME | datapizza_docs | Qdrant collection name |
| MAX_RESULTS | 5 | Maximum search results returned |
| CHUNK_SIZE | 1024 | Document chunk size for indexing |
| CHUNK_OVERLAP | 200 | Overlap between document chunks |
| LOG_LEVEL | INFO | Logging level (DEBUG, INFO, WARNING, ERROR) |
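As an illustration of what environment-based configuration with validation might look like, here is a minimal sketch using only the standard library. The variable names and defaults come from the tables above; the `Settings` class, the `load_settings` function, and the overlap check are assumptions made for this example and are not taken from the package's config.py.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED_VARS = ("OPENAI_API_KEY", "QDRANT_URL", "QDRANT_API_KEY")


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container mirroring the documented variables."""
    openai_api_key: str
    qdrant_url: str
    qdrant_api_key: str
    embedding_model: str
    embedding_dimensions: int
    collection_name: str
    max_results: int
    chunk_size: int
    chunk_overlap: int
    log_level: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (os.environ by default) and validate it."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError(f"Missing required variables: {', '.join(missing)}")

    settings = Settings(
        openai_api_key=env["OPENAI_API_KEY"],
        qdrant_url=env["QDRANT_URL"],
        qdrant_api_key=env["QDRANT_API_KEY"],
        embedding_model=env.get("EMBEDDING_MODEL", "text-embedding-3-small"),
        embedding_dimensions=int(env.get("EMBEDDING_DIMENSIONS", "1536")),
        collection_name=env.get("COLLECTION_NAME", "datapizza_docs"),
        max_results=int(env.get("MAX_RESULTS", "5")),
        chunk_size=int(env.get("CHUNK_SIZE", "1024")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "200")),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )
    # Assumed sanity check: an overlap as large as the chunk would never advance.
    if settings.chunk_overlap >= settings.chunk_size:
        raise ValueError("CHUNK_OVERLAP must be smaller than CHUNK_SIZE")
    return settings


# Placeholder values for demonstration only.
settings = load_settings({
    "OPENAI_API_KEY": "sk-test",
    "QDRANT_URL": "https://example.invalid",
    "QDRANT_API_KEY": "qd-test",
    "MAX_RESULTS": "3",
})
```

The sketch reads from a plain mapping so it can be exercised without touching the real environment. To pick up a .env file, calling `load_dotenv()` from python-dotenv before `load_settings()` would populate `os.environ` first.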
Usage
1. Index Documentation
Before using the server, index the datapizza-ai documentation:
python -m datapizza_mcp.indexer
To force re-indexing (clears existing data):
python -m datapizza_mcp.indexer --force
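The CHUNK_SIZE and CHUNK_OVERLAP settings suggest that the indexer splits each document into overlapping windows before embedding them. The sketch below shows one simple way to do that. It assumes sizes are measured in characters, which the documentation does not state (the indexer may count tokens or split on document structure instead), and `chunk_text` is a hypothetical helper, not a function from the package.

```python
def chunk_text(text: str, chunk_size: int = 1024, chunk_overlap: int = 200) -> list[str]:
    """Split text into chunk_size-character windows that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


document = "x" * 2000
chunks = chunk_text(document)
print([len(c) for c in chunks])
```

With the default settings each window starts 824 characters after the previous one, so consecutive chunks share 200 characters. The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk.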
2. Start the MCP Server
python -m datapizza_mcp.server
Or use the provided Windows batch script:
../run_datapizza.bat
3. Query the Documentation
The server exposes a query_datapizza tool that can be called by MCP clients:
# Example query; `client` is an already-connected MCP client session
result = await client.call_tool("query_datapizza", {
    "query": "how to create an agent with OpenAI",
    "max_results": 5
})
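Under the hood, an MCP client turns a tool call like this into a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds that message with the standard library to show its shape. The helper name `build_tool_call` is hypothetical, and a real client library would also handle the transport, session initialization, and response parsing.

```python
import json


def build_tool_call(request_id: int, query: str, max_results: int = 5) -> str:
    """Serialize an MCP tools/call request for the query_datapizza tool."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "query_datapizza",
            "arguments": {"query": query, "max_results": max_results},
        },
    }
    return json.dumps(message, ensure_ascii=False)


payload = build_tool_call(1, "how to create an agent with OpenAI")
print(payload)
```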
MCP Tools and Resources
Tools
- query_datapizza: Search datapizza-ai documentation
  - query (string): Natural language search query
  - max_results (int, optional): Maximum number of results (default: 5)
Resources
- datapizza://status: System status and configuration information
Development
Code Quality Tools
# Format code
black src/
# Lint code
ruff check src/
ruff check src/ --fix # Auto-fix issues
# Type checking
mypy src/
# Run tests
pytest
Project Structure
datapizza-mcp-server/
├── src/datapizza_mcp/
│ ├── __init__.py # Package exports
│ ├── config.py # Configuration management
│ ├── server.py # MCP server implementation
│ ├── indexer.py # Documentation indexing
│ └── retriever.py # RAG retrieval engine
├── pyproject.toml # Package configuration
├── .env # Environment variables
└── README.md # This file
Dependencies
Core Dependencies
- mcp: Model Context Protocol framework
- datapizza-ai-core: Core datapizza-ai functionality
- datapizza-ai-embedders-openai: OpenAI embedding integration
- datapizza-ai-vectorstores-qdrant: Qdrant vector store integration
- openai: OpenAI API client
- qdrant-client: Qdrant database client
- requests: HTTP client for GitHub API
- python-dotenv: Environment variable management
Development Dependencies
- pytest: Testing framework
- black: Code formatter
- ruff: Linter and code style checker
- mypy: Static type checker
Troubleshooting
Common Issues
- Authentication Errors
  - Verify OPENAI_API_KEY is set correctly
  - Check Qdrant Cloud credentials (QDRANT_URL and QDRANT_API_KEY)
- Empty Search Results
  - Ensure documentation is indexed: python -m datapizza_mcp.indexer
  - Check system status: query the datapizza://status resource
- Connection Issues
  - Verify internet connectivity for GitHub and Qdrant Cloud access
  - Check firewall settings for outbound HTTPS connections
Debugging
Enable debug logging by setting LOG_LEVEL=DEBUG in your .env file.
Contributing
- Fork the repository
- Create a feature branch
- Make your changes following the code style guidelines
- Run the full test suite and code quality checks
- Submit a pull request
License
This project is licensed under the MIT License. See the LICENSE file for details.
Support
For issues and questions:
- GitHub Issues: datapizza-mcp-server/issues
- DataPizza AI Documentation: datapizza-ai