Skip to main content

MCP (Model Context Protocol) server for Crawl4AI - Universal web crawling and data extraction

Project description

🕷️ Crawl4AI MCP Server

PyPI version Python License: MIT Downloads Author

MCP (Model Context Protocol) server for Crawl4AI - Universal web crawling and data extraction for AI agents.

Integrate powerful web scraping capabilities into Claude, ChatGPT, and any MCP-compatible AI assistant.

📑 Table of Contents

🐳 Quick Start with Docker (Recommended)

✨ Docker is the preferred way to run Crawl4AI MCP Server - everything is pre-installed and ready to go!

Option 1: Docker Hub Image (Latest)

# SSE mode (for web interfaces) - DEFAULT
docker run --rm -p 3001:9001 \
  -e CRAWL4AI_ENDPOINT="https://your-crawl4ai-server.com" \
  -e CRAWL4AI_BEARER_TOKEN="your-optional-token" \
  stgmt/crawl4ai-mcp:latest crawl4ai-mcp --sse

# HTTP mode (for REST API)
docker run --rm -p 3000:3000 \
  -e CRAWL4AI_ENDPOINT="https://your-crawl4ai-server.com" \
  -e CRAWL4AI_BEARER_TOKEN="your-optional-token" \
  stgmt/crawl4ai-mcp:latest crawl4ai-mcp --http --port 3000

# STDIO mode (for Claude Desktop)
docker run --rm -it \
  -e CRAWL4AI_ENDPOINT="https://your-crawl4ai-server.com" \
  -e CRAWL4AI_BEARER_TOKEN="your-optional-token" \
  stgmt/crawl4ai-mcp:latest crawl4ai-mcp --stdio

Option 2: Build from Source (Latest fixes)

# Clone and build
git clone https://github.com/stgmt/crawl4ai-mcp.git
cd crawl4ai-mcp
docker build -t crawl4ai-mcp:local .

# Run SSE mode
docker run --rm -p 3001:9001 \
  -e CRAWL4AI_ENDPOINT="https://your-crawl4ai-server.com" \
  -e CRAWL4AI_BEARER_TOKEN="your-optional-token" \
  crawl4ai-mcp:local crawl4ai-mcp --sse

With Claude Desktop (Docker)

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "crawl4ai": {
      "command": "docker",
      "args": [
        "run", "--rm", "-i",
        "-e", "CRAWL4AI_ENDPOINT=https://your-crawl4ai-server.com",
        "-e", "CRAWL4AI_BEARER_TOKEN=your-optional-token",
        "stgmt/crawl4ai-mcp:latest",
        "crawl4ai-mcp", "--stdio"
      ]
    }
  }
}

📦 Alternative Installation Methods

NPM Package

# Install globally
npm install -g crawl4ai-mcp-sse-stdio

# Run in different modes (set CRAWL4AI_ENDPOINT first)
export CRAWL4AI_ENDPOINT="https://your-crawl4ai-server.com"
npx crawl4ai-mcp --stdio
npx crawl4ai-mcp --sse --port 3001
npx crawl4ai-mcp --http --port 3000

Python Package (PyPI)

# Install from PyPI
pip install crawl4ai-mcp

<<<<<<< HEAD
# Set required endpoint and run
export CRAWL4AI_ENDPOINT="https://your-crawl4ai-server.com"
crawl4ai-mcp --stdio
=======
# Run with command line arguments (recommended)
crawl4ai-mcp --stdio --endpoint https://your-crawl4ai-server.com
crawl4ai-mcp --http --port 3000 --endpoint https://your-crawl4ai-server.com
crawl4ai-mcp --sse --port 3001 --endpoint https://your-crawl4ai-server.com

# With optional bearer token
crawl4ai-mcp --stdio --endpoint https://your-crawl4ai-server.com --bearer-token your-token

From NPM (Alternative)

npm install -g crawl4ai-mcp-sse-stdio

# Run with command line arguments (recommended)
npx crawl4ai-mcp --stdio --endpoint https://your-crawl4ai-server.com
npx crawl4ai-mcp --http --port 3000 --endpoint https://your-crawl4ai-server.com
npx crawl4ai-mcp --sse --port 3001 --endpoint https://your-crawl4ai-server.com

# With optional bearer token
npx crawl4ai-mcp --stdio --endpoint https://your-crawl4ai-server.com --bearer-token your-token

From Source

git clone https://github.com/stgmt/crawl4ai-mcp.git
cd crawl4ai-mcp
pip install -e .

With Claude Desktop (Non-Docker)

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "crawl4ai": {
      "command": "crawl4ai-mcp",
      "args": ["--stdio"],
      "env": {
        "CRAWL4AI_ENDPOINT": "https://your-crawl4ai-server.com",
        "CRAWL4AI_BEARER_TOKEN": "your-optional-token"
      }
    }
  }
}

🛠️ Available Tools

1. crawl - Full Web Crawling

Extract complete content from any webpage.

{
  "name": "crawl",
  "arguments": {
    "url": "https://example.com",
    "wait_for": "css:.content",
    "timeout": 30000
  }
}

2. md - Markdown Extraction

Get clean markdown content from webpages.

{
  "name": "md", 
  "arguments": {
    "url": "https://docs.example.com",
    "clean": true
  }
}

3. html - Raw HTML

Retrieve raw HTML content.

{
  "name": "html",
  "arguments": {
    "url": "https://example.com"
  }
}

4. screenshot - Visual Capture

Take screenshots of webpages.

{
  "name": "screenshot",
  "arguments": {
    "url": "https://example.com",
    "full_page": true
  }
}

5. pdf - PDF Generation

Convert webpages to PDF.

{
  "name": "pdf",
  "arguments": {
    "url": "https://example.com",
    "format": "A4"
  }
}

6. execute_js - JavaScript Execution

Execute JavaScript on webpages.

{
  "name": "execute_js",
  "arguments": {
    "url": "https://example.com",
    "script": "document.title"
  }
}

🚀 Usage

The crawl4ai-mcp server supports multiple transport modes and provides comprehensive web crawling capabilities through the Model Context Protocol.

Basic Commands

# HTTP mode (recommended for testing)
crawl4ai-mcp --http --port 3000 --endpoint https://your-crawl4ai-server.com

# SSE mode (Server-Sent Events)
crawl4ai-mcp --sse --port 3001 --endpoint https://your-crawl4ai-server.com

# STDIO mode (for MCP clients)
crawl4ai-mcp --stdio --endpoint https://your-crawl4ai-server.com

# With optional bearer token
crawl4ai-mcp --http --port 3000 --endpoint https://your-crawl4ai-server.com --bearer-token your-token

With Custom Endpoint

# Using custom Crawl4AI endpoint with bearer token
crawl4ai-mcp --http --port 3000 \
  --crawl4ai-endpoint "https://your-server.com" \
  --bearer-token "your-token"

See the Python Integration section for detailed code examples.

⚙️ Configuration

Environment Variables

# REQUIRED: Crawl4AI endpoint URL
export CRAWL4AI_ENDPOINT="https://your-crawl4ai-server.com"

# OPTIONAL: Bearer authentication token
export CRAWL4AI_BEARER_TOKEN="your-api-token"

Parameter Requirements:

  • CRAWL4AI_ENDPOINT - Required - The URL of your Crawl4AI server instance
  • CRAWL4AI_BEARER_TOKEN - Optional - Bearer token for authenticated API access

Command Line Options

crawl4ai-mcp --help

Options:
  --stdio              Run in STDIO mode for MCP clients
  --sse                Run in SSE mode for web interfaces (default)
  --http               Run in HTTP mode
  --endpoint ENDPOINT  Crawl4AI API endpoint URL (REQUIRED)
  --bearer-token TOKEN Bearer authentication token (OPTIONAL)
  --version, -v        Show version

🐍 Python Integration Example

Here's how to integrate the MCP server with your Python application using HTTP mode with bearer token authentication:

import asyncio
import aiohttp
import json

async def test_crawl4ai_mcp():
    """
    Example: Using Crawl4AI MCP server via HTTP with bearer token
    """
    # Server configuration
    server_url = "http://localhost:3000"
    bearer_token = "your-api-token"  # Optional

    headers = {
        "Content-Type": "application/json"
    }

    # Add bearer token if available
    if bearer_token:
        headers["Authorization"] = f"Bearer {bearer_token}"

    async with aiohttp.ClientSession() as session:
        # 1. List available tools
        async with session.post(
            f"{server_url}/tools/list",
            headers=headers
        ) as response:
            tools = await response.json()
            print("Available tools:", [tool['name'] for tool in tools['tools']])

        # 2. Extract markdown from a webpage
        tool_request = {
            "name": "md",
            "arguments": {
                "url": "https://example.com",
                "clean": True
            }
        }

        async with session.post(
            f"{server_url}/tools/call",
            headers=headers,
            json=tool_request
        ) as response:
            result = await response.json()
            print("Markdown content:", result['content'][:200] + "...")

        # 3. Take a screenshot
        screenshot_request = {
            "name": "screenshot",
            "arguments": {
                "url": "https://example.com",
                "full_page": True
            }
        }

        async with session.post(
            f"{server_url}/tools/call",
            headers=headers,
            json=screenshot_request
        ) as response:
            result = await response.json()
            print("Screenshot saved:", result.get('path', 'Screenshot data returned'))

        # 4. Execute JavaScript on a page
        js_request = {
            "name": "execute_js",
            "arguments": {
                "url": "https://example.com",
                "script": "document.title"
            }
        }

        async with session.post(
            f"{server_url}/tools/call",
            headers=headers,
            json=js_request
        ) as response:
            result = await response.json()
            print("Page title:", result['content'])

# Run the example
if __name__ == "__main__":
    # First, start the MCP server in HTTP mode:
    # docker run -p 3000:3000 \
    #   -e CRAWL4AI_ENDPOINT="https://your-crawl4ai-server.com" \
    #   -e CRAWL4AI_BEARER_TOKEN="your-api-token" \
    #   stgmt/crawl4ai-mcp:latest crawl4ai-mcp --http --port 3000

    asyncio.run(test_crawl4ai_mcp())

Installation for Python integration

pip install aiohttp  # For HTTP client

🤝 Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

📄 License

MIT License - see LICENSE file for details.

🔗 Links


Made with ❤️ for the AI community

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

crawl4ai_mcp_sse_stdio-1.1.0.tar.gz (21.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

crawl4ai_mcp_sse_stdio-1.1.0-py3-none-any.whl (22.6 kB view details)

Uploaded Python 3

File details

Details for the file crawl4ai_mcp_sse_stdio-1.1.0.tar.gz.

File metadata

  • Download URL: crawl4ai_mcp_sse_stdio-1.1.0.tar.gz
  • Upload date:
  • Size: 21.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for crawl4ai_mcp_sse_stdio-1.1.0.tar.gz
Algorithm Hash digest
SHA256 6af56d90cc331b7e76ba0e792730f9ce5798cd6a88a8491b75b7b810e5a5173a
MD5 ec077024316c21c661fa8c86150b94ec
BLAKE2b-256 937f7c700bc03bf60027ccff0b7287f974c69bdf63a7821bd1964e86ec657dc6

See more details on using hashes here.

File details

Details for the file crawl4ai_mcp_sse_stdio-1.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for crawl4ai_mcp_sse_stdio-1.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 bdaeecae147e37e24672c7fec5b298a3e3d4829526427f067b4586437ee7451a
MD5 ca608eb0a28558afea60ff28fbce66a8
BLAKE2b-256 6a291ceee04a3cd4cf927a97e5a5b9ad3876b175b66f920b185823c821030a7c

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page