🕷️ Crawl4AI MCP Server
MCP (Model Context Protocol) server for Crawl4AI - Universal web crawling and data extraction for AI agents.
Integrate powerful web scraping capabilities into Claude, ChatGPT, and any MCP-compatible AI assistant.
📑 Table of Contents
- 🐳 Quick Start with Docker (Recommended)
- 📦 Alternative Installation Methods
- 🛠️ Available Tools
- 🚀 Usage
- ⚙️ Configuration
- 🐍 Python Integration Example
- 🤝 Contributing
- 📄 License
- 🔗 Links
🐳 Quick Start with Docker (Recommended)
✨ Docker is the preferred way to run Crawl4AI MCP Server - everything is pre-installed and ready to go!
Option 1: Docker Hub Image (Latest)
# SSE mode (for web interfaces) - DEFAULT
docker run --rm -p 3001:9001 \
-e CRAWL4AI_ENDPOINT="https://your-crawl4ai-server.com" \
-e CRAWL4AI_BEARER_TOKEN="your-optional-token" \
stgmt/crawl4ai-mcp:latest crawl4ai-mcp --sse
# HTTP mode (for REST API)
docker run --rm -p 3000:3000 \
-e CRAWL4AI_ENDPOINT="https://your-crawl4ai-server.com" \
-e CRAWL4AI_BEARER_TOKEN="your-optional-token" \
stgmt/crawl4ai-mcp:latest crawl4ai-mcp --http --port 3000
# STDIO mode (for Claude Desktop)
docker run --rm -it \
-e CRAWL4AI_ENDPOINT="https://your-crawl4ai-server.com" \
-e CRAWL4AI_BEARER_TOKEN="your-optional-token" \
stgmt/crawl4ai-mcp:latest crawl4ai-mcp --stdio
Option 2: Build from Source (Latest fixes)
# Clone and build
git clone https://github.com/stgmt/crawl4ai-mcp.git
cd crawl4ai-mcp
docker build -t crawl4ai-mcp:local .
# Run SSE mode
docker run --rm -p 3001:9001 \
-e CRAWL4AI_ENDPOINT="https://your-crawl4ai-server.com" \
-e CRAWL4AI_BEARER_TOKEN="your-optional-token" \
crawl4ai-mcp:local crawl4ai-mcp --sse
With Claude Desktop (Docker)
Add to your claude_desktop_config.json:
{
"mcpServers": {
"crawl4ai": {
"command": "docker",
"args": [
"run", "--rm", "-i",
"-e", "CRAWL4AI_ENDPOINT=https://your-crawl4ai-server.com",
"-e", "CRAWL4AI_BEARER_TOKEN=your-optional-token",
"stgmt/crawl4ai-mcp:latest",
"crawl4ai-mcp", "--stdio"
]
}
}
}
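If you prefer not to hand-edit the file, you can merge this entry into an existing config programmatically. The following is a minimal sketch that assumes the config is a JSON object with an `mcpServers` map, as shown above. `add_crawl4ai_server` and `update_config_file` are hypothetical helper names. Because the config file's location depends on your OS, the path is passed in rather than hard-coded.

```python
import json
from pathlib import Path
from typing import Optional


def add_crawl4ai_server(config: dict, endpoint: str, token: Optional[str] = None) -> dict:
    """Return a copy of a Claude Desktop config with a Docker-based crawl4ai entry.

    Hypothetical helper: mirrors the JSON shown above and leaves other servers untouched.
    """
    args = ["run", "--rm", "-i", "-e", f"CRAWL4AI_ENDPOINT={endpoint}"]
    if token:
        # Only pass the bearer token when one is configured
        args += ["-e", f"CRAWL4AI_BEARER_TOKEN={token}"]
    args += ["stgmt/crawl4ai-mcp:latest", "crawl4ai-mcp", "--stdio"]

    # Copy both levels so the caller's dict is not mutated
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["crawl4ai"] = {"command": "docker", "args": args}
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path, endpoint: str, token: Optional[str] = None) -> None:
    """Read the config at `path` (or start empty), add the entry, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_crawl4ai_server(config, endpoint, token), indent=2))
```

Restart Claude Desktop after the file changes so that it picks up the new server.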
📦 Alternative Installation Methods
NPM Package
# Install globally
npm install -g crawl4ai-mcp-sse-stdio
# Run in different modes (set CRAWL4AI_ENDPOINT first)
export CRAWL4AI_ENDPOINT="https://your-crawl4ai-server.com"
npx crawl4ai-mcp --stdio
npx crawl4ai-mcp --sse --port 3001
npx crawl4ai-mcp --http --port 3000
Python Package (PyPI)
# Install from PyPI
pip install crawl4ai-mcp
# Set the required endpoint and run
export CRAWL4AI_ENDPOINT="https://your-crawl4ai-server.com"
crawl4ai-mcp --stdio
# Or pass the endpoint as a command-line argument
crawl4ai-mcp --stdio --endpoint https://your-crawl4ai-server.com
crawl4ai-mcp --http --port 3000 --endpoint https://your-crawl4ai-server.com
crawl4ai-mcp --sse --port 3001 --endpoint https://your-crawl4ai-server.com
# With optional bearer token
crawl4ai-mcp --stdio --endpoint https://your-crawl4ai-server.com --bearer-token your-token
The NPM package accepts the same command-line arguments:
npx crawl4ai-mcp --stdio --endpoint https://your-crawl4ai-server.com
npx crawl4ai-mcp --http --port 3000 --endpoint https://your-crawl4ai-server.com
npx crawl4ai-mcp --sse --port 3001 --endpoint https://your-crawl4ai-server.com
# With optional bearer token
npx crawl4ai-mcp --stdio --endpoint https://your-crawl4ai-server.com --bearer-token your-token
From Source
git clone https://github.com/stgmt/crawl4ai-mcp.git
cd crawl4ai-mcp
pip install -e .
With Claude Desktop (Non-Docker)
Add to your claude_desktop_config.json:
{
"mcpServers": {
"crawl4ai": {
"command": "crawl4ai-mcp",
"args": ["--stdio"],
"env": {
"CRAWL4AI_ENDPOINT": "https://your-crawl4ai-server.com",
"CRAWL4AI_BEARER_TOKEN": "your-optional-token"
}
}
}
}
🛠️ Available Tools
1. crawl - Full Web Crawling
Extract complete content from any webpage.
{
"name": "crawl",
"arguments": {
"url": "https://example.com",
"wait_for": "css:.content",
"timeout": 30000
}
}
2. md - Markdown Extraction
Get clean markdown content from webpages.
{
"name": "md",
"arguments": {
"url": "https://docs.example.com",
"clean": true
}
}
3. html - Raw HTML
Retrieve raw HTML content.
{
"name": "html",
"arguments": {
"url": "https://example.com"
}
}
4. screenshot - Visual Capture
Take screenshots of webpages.
{
"name": "screenshot",
"arguments": {
"url": "https://example.com",
"full_page": true
}
}
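This section does not document how the screenshot comes back. In the MCP specification, binary results such as images are returned as content items of type `image`, carrying base64-encoded `data` and a `mimeType`. The sketch below assumes that shape. If this server returns a file path instead, you do not need it. `decode_images` and `save_images` are hypothetical names.

```python
import base64
from pathlib import Path
from typing import List, Tuple


def decode_images(result: dict) -> List[Tuple[str, bytes]]:
    """Return (extension, bytes) for every MCP image content item in a tool result.

    Assumes the MCP shape {"content": [{"type": "image", "data": ..., "mimeType": ...}]}.
    """
    images = []
    for item in result.get("content", []):
        if item.get("type") != "image":
            continue  # skip text and other content types
        # Derive a file extension from the MIME type, e.g. image/png -> png
        ext = item.get("mimeType", "image/png").split("/")[-1]
        images.append((ext, base64.b64decode(item["data"])))
    return images


def save_images(result: dict, out_dir: Path) -> List[Path]:
    """Write each decoded image to out_dir and return the file paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, (ext, data) in enumerate(decode_images(result)):
        path = out_dir / f"screenshot_{i}.{ext}"
        path.write_bytes(data)
        paths.append(path)
    return paths
```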
5. pdf - PDF Generation
Convert webpages to PDF.
{
"name": "pdf",
"arguments": {
"url": "https://example.com",
"format": "A4"
}
}
6. execute_js - JavaScript Execution
Execute JavaScript on webpages.
{
"name": "execute_js",
"arguments": {
"url": "https://example.com",
"script": "document.title"
}
}
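The `name`/`arguments` payloads in this section match the `params` of an MCP `tools/call` request. In the standard MCP wire format (as used over STDIO, for example), each payload is wrapped in a JSON-RPC 2.0 envelope. The sketch below builds that envelope. The id counter and the helper name `tools_call_request` are illustrative, and the argument values are the `crawl` example from above.

```python
import itertools
import json

# Monotonic request ids; JSON-RPC only requires them to be unique per session
_ids = itertools.count(1)


def tools_call_request(name: str, arguments: dict) -> dict:
    """Wrap a tool name and its arguments in an MCP `tools/call` JSON-RPC request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


req = tools_call_request(
    "crawl",
    {"url": "https://example.com", "wait_for": "css:.content", "timeout": 30000},
)
print(json.dumps(req, indent=2))
```

The server's reply carries the same `id`, so a client can match responses to requests even when several calls are in flight.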
🚀 Usage
The crawl4ai-mcp server supports three transport modes (STDIO, SSE, and HTTP) and exposes the Crawl4AI tools listed above through the Model Context Protocol.
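In STDIO mode, the MCP specification frames each JSON-RPC message as a single line of JSON on stdin/stdout, so a message must not contain embedded newlines. Here is a minimal sketch of that framing. The helper names are illustrative and are not part of crawl4ai-mcp.

```python
import json
from typing import Iterable, Iterator


def encode_message(message: dict) -> bytes:
    """Serialize one JSON-RPC message as a newline-terminated line."""
    # json.dumps escapes newlines inside strings, so the output is a single line
    line = json.dumps(message, separators=(",", ":"))
    return (line + "\n").encode("utf-8")


def decode_messages(lines: Iterable[bytes]) -> Iterator[dict]:
    """Parse newline-delimited JSON-RPC messages, skipping blank lines."""
    for raw in lines:
        raw = raw.strip()
        if raw:
            yield json.loads(raw)
```

For example, you can write `encode_message(request)` to a `subprocess.Popen` handle's `stdin` (then flush), and iterate `decode_messages(proc.stdout)` to read replies from a server started with `crawl4ai-mcp --stdio`.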
Basic Commands
# HTTP mode (recommended for testing)
crawl4ai-mcp --http --port 3000 --endpoint https://your-crawl4ai-server.com
# SSE mode (Server-Sent Events)
crawl4ai-mcp --sse --port 3001 --endpoint https://your-crawl4ai-server.com
# STDIO mode (for MCP clients)
crawl4ai-mcp --stdio --endpoint https://your-crawl4ai-server.com
# With optional bearer token
crawl4ai-mcp --http --port 3000 --endpoint https://your-crawl4ai-server.com --bearer-token your-token
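In SSE mode, the server pushes messages as a `text/event-stream`. This document does not state which URL paths or event names crawl4ai-mcp uses. The MCP HTTP+SSE transport, for instance, sends an `endpoint` event followed by `message` events. The parser below therefore implements only the generic stream format and makes no assumptions about paths. `parse_sse` is a hypothetical helper.

```python
from typing import Iterable, Iterator, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_type, data) pairs from a text/event-stream.

    Covers the core of the SSE format: 'field: value' lines, ':' comments,
    multi-line data joined with newlines, and a blank line dispatching the event.
    """
    event, data = "message", []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            # Blank line: dispatch the buffered event (if it has data) and reset
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is dropped
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
        # 'id' and 'retry' fields are ignored in this sketch
```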
With Custom Endpoint
# Using custom Crawl4AI endpoint with bearer token
crawl4ai-mcp --http --port 3000 \
--endpoint "https://your-server.com" \
--bearer-token "your-token"
See the Python Integration section for detailed code examples.
⚙️ Configuration
Environment Variables
# REQUIRED: Crawl4AI endpoint URL
export CRAWL4AI_ENDPOINT="https://your-crawl4ai-server.com"
# OPTIONAL: Bearer authentication token
export CRAWL4AI_BEARER_TOKEN="your-api-token"
Parameter Requirements:
- CRAWL4AI_ENDPOINT - Required - The URL of your Crawl4AI server instance
- CRAWL4AI_BEARER_TOKEN - Optional - Bearer token for authenticated API access
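This document does not spell out which takes precedence when a value is set both as an environment variable and via `--endpoint`/`--bearer-token`. A common convention, assumed in this sketch, is that the command-line value wins. `resolve_settings` is a hypothetical helper that illustrates the required/optional split above.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_settings(
    cli_endpoint: Optional[str] = None,
    cli_token: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Tuple[str, Optional[str]]:
    """Return (endpoint, bearer_token), preferring CLI values over environment variables."""
    endpoint = cli_endpoint or env.get("CRAWL4AI_ENDPOINT")
    if not endpoint:
        # The endpoint is required; fail early with a clear message
        raise ValueError("CRAWL4AI_ENDPOINT (or --endpoint) is required")
    # The token is optional; an empty string is treated as "not set"
    token = cli_token or env.get("CRAWL4AI_BEARER_TOKEN") or None
    return endpoint, token
```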
Command Line Options
crawl4ai-mcp --help
Options:
--stdio Run in STDIO mode for MCP clients
--sse Run in SSE mode for web interfaces (default)
--http Run in HTTP mode
--endpoint ENDPOINT Crawl4AI API endpoint URL (REQUIRED)
--bearer-token TOKEN Bearer authentication token (OPTIONAL)
--version, -v Show version
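These options map naturally onto `argparse`: the three transport flags are mutually exclusive, and SSE is the default. The code below is a hypothetical re-implementation for illustration, not the package's actual CLI code. It includes `--port` because the examples in this document use it, although that option's default is not documented. It omits `--version`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented crawl4ai-mcp options."""
    parser = argparse.ArgumentParser(prog="crawl4ai-mcp")
    # Only one transport flag may be given; all three write to the same `mode` field
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--stdio", action="store_const", dest="mode", const="stdio",
                      help="Run in STDIO mode for MCP clients")
    mode.add_argument("--sse", action="store_const", dest="mode", const="sse",
                      help="Run in SSE mode for web interfaces (default)")
    mode.add_argument("--http", action="store_const", dest="mode", const="http",
                      help="Run in HTTP mode")
    parser.set_defaults(mode="sse")
    parser.add_argument("--port", type=int, help="Port for HTTP/SSE modes")
    parser.add_argument("--endpoint", help="Crawl4AI API endpoint URL")
    parser.add_argument("--bearer-token", help="Bearer authentication token")
    return parser
```

With this layout, `crawl4ai-mcp --stdio --http` is rejected by `argparse` itself, and running with no transport flag falls back to SSE.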
🐍 Python Integration Example
Here's how to integrate the MCP server with your Python application using HTTP mode with bearer token authentication:
import asyncio

import aiohttp


async def test_crawl4ai_mcp():
    """
    Example: Using Crawl4AI MCP server via HTTP with bearer token
    """
    # Server configuration
    server_url = "http://localhost:3000"
    bearer_token = "your-api-token"  # Optional: set to None to skip authentication
    headers = {
        "Content-Type": "application/json"
    }
    # Add bearer token if available
    if bearer_token:
        headers["Authorization"] = f"Bearer {bearer_token}"

    async with aiohttp.ClientSession() as session:
        # 1. List available tools
        async with session.post(
            f"{server_url}/tools/list",
            headers=headers
        ) as response:
            response.raise_for_status()  # fail loudly on HTTP errors
            tools = await response.json()
            print("Available tools:", [tool["name"] for tool in tools["tools"]])

        # 2. Extract markdown from a webpage
        tool_request = {
            "name": "md",
            "arguments": {
                "url": "https://example.com",
                "clean": True
            }
        }
        async with session.post(
            f"{server_url}/tools/call",
            headers=headers,
            json=tool_request
        ) as response:
            response.raise_for_status()
            result = await response.json()
            # str() keeps the preview working whether content is a string or a list of items
            print("Markdown content:", str(result["content"])[:200] + "...")

        # 3. Take a screenshot
        screenshot_request = {
            "name": "screenshot",
            "arguments": {
                "url": "https://example.com",
                "full_page": True
            }
        }
        async with session.post(
            f"{server_url}/tools/call",
            headers=headers,
            json=screenshot_request
        ) as response:
            response.raise_for_status()
            result = await response.json()
            print("Screenshot saved:", result.get("path", "Screenshot data returned"))

        # 4. Execute JavaScript on a page
        js_request = {
            "name": "execute_js",
            "arguments": {
                "url": "https://example.com",
                "script": "document.title"
            }
        }
        async with session.post(
            f"{server_url}/tools/call",
            headers=headers,
            json=js_request
        ) as response:
            response.raise_for_status()
            result = await response.json()
            print("Page title:", result["content"])


# Run the example
if __name__ == "__main__":
    # First, start the MCP server in HTTP mode:
    # docker run --rm -p 3000:3000 \
    #   -e CRAWL4AI_ENDPOINT="https://your-crawl4ai-server.com" \
    #   -e CRAWL4AI_BEARER_TOKEN="your-api-token" \
    #   stgmt/crawl4ai-mcp:latest crawl4ai-mcp --http --port 3000
    asyncio.run(test_crawl4ai_mcp())
Installation for Python integration
pip install aiohttp # For HTTP client
🤝 Contributing
We welcome contributions! See CONTRIBUTING.md for guidelines.
📄 License
MIT License - see LICENSE file for details.
🔗 Links
- PyPI Package: https://pypi.org/project/crawl4ai-mcp/
- NPM Package: https://www.npmjs.com/package/crawl4ai-mcp-sse-stdio
- Docker Hub: https://hub.docker.com/r/stgmt/crawl4ai-mcp
- GitHub Repository: https://github.com/stgmt/crawl4ai-mcp
Made with ❤️ for the AI community
Download files
File details
Details for the file crawl4ai_mcp_sse_stdio-1.1.0.tar.gz.
File metadata
- Download URL: crawl4ai_mcp_sse_stdio-1.1.0.tar.gz
- Upload date:
- Size: 21.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6af56d90cc331b7e76ba0e792730f9ce5798cd6a88a8491b75b7b810e5a5173a |
| MD5 | ec077024316c21c661fa8c86150b94ec |
| BLAKE2b-256 | 937f7c700bc03bf60027ccff0b7287f974c69bdf63a7821bd1964e86ec657dc6 |
File details
Details for the file crawl4ai_mcp_sse_stdio-1.1.0-py3-none-any.whl.
File metadata
- Download URL: crawl4ai_mcp_sse_stdio-1.1.0-py3-none-any.whl
- Upload date:
- Size: 22.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bdaeecae147e37e24672c7fec5b298a3e3d4829526427f067b4586437ee7451a |
| MD5 | ca608eb0a28558afea60ff28fbce66a8 |
| BLAKE2b-256 | 6a291ceee04a3cd4cf927a97e5a5b9ad3876b175b66f920b185823c821030a7c |