MCP (Model Context Protocol) server for Crawl4AI - Universal web crawling and data extraction

These details have not been verified by PyPI

Project links

Project description

🕷️ Crawl4AI MCP Server

MCP (Model Context Protocol) server for Crawl4AI - Universal web crawling and data extraction for AI agents.

Integrate powerful web scraping capabilities into Claude, ChatGPT, and any MCP-compatible AI assistant.

🐳 Quick Start with Docker (Recommended)

✨ Docker is the preferred way to run Crawl4AI MCP Server - everything is pre-installed and ready to go!

Option 1: Docker Hub Image (Latest)

# SSE mode (for web interfaces) - DEFAULT
docker run --rm -p 3001:9001 \
  -e CRAWL4AI_ENDPOINT="https://your-crawl4ai-server.com" \
  -e CRAWL4AI_BEARER_TOKEN="your-optional-token" \
  stgmt/crawl4ai-mcp:latest crawl4ai-mcp --sse

# HTTP mode (for REST API)
docker run --rm -p 3000:3000 \
  -e CRAWL4AI_ENDPOINT="https://your-crawl4ai-server.com" \
  -e CRAWL4AI_BEARER_TOKEN="your-optional-token" \
  stgmt/crawl4ai-mcp:latest crawl4ai-mcp --http --port 3000

# STDIO mode (for Claude Desktop)
docker run --rm -it \
  -e CRAWL4AI_ENDPOINT="https://your-crawl4ai-server.com" \
  -e CRAWL4AI_BEARER_TOKEN="your-optional-token" \
  stgmt/crawl4ai-mcp:latest crawl4ai-mcp --stdio

Option 2: Build from Source (Latest fixes)

# Clone and build
git clone https://github.com/stgmt/crawl4ai-mcp.git
cd crawl4ai-mcp
docker build -t crawl4ai-mcp:local .

# Run SSE mode
docker run --rm -p 3001:9001 \
  -e CRAWL4AI_ENDPOINT="https://your-crawl4ai-server.com" \
  -e CRAWL4AI_BEARER_TOKEN="your-optional-token" \
  crawl4ai-mcp:local crawl4ai-mcp --sse

With Claude Desktop (Docker)

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "crawl4ai": {
      "command": "docker",
      "args": [
        "run", "--rm", "-i",
        "-e", "CRAWL4AI_ENDPOINT=https://your-crawl4ai-server.com",
        "-e", "CRAWL4AI_BEARER_TOKEN=your-optional-token",
        "stgmt/crawl4ai-mcp:latest",
        "crawl4ai-mcp", "--stdio"
      ]
    }
  }
}

📦 Alternative Installation Methods

NPM Package

# Install globally
npm install -g crawl4ai-mcp-sse-stdio

# Run in different modes (set CRAWL4AI_ENDPOINT first)
export CRAWL4AI_ENDPOINT="https://your-crawl4ai-server.com"
npx crawl4ai-mcp --stdio
npx crawl4ai-mcp --sse --port 3001
npx crawl4ai-mcp --http --port 3000

Python Package (PyPI)

# Install from PyPI
pip install crawl4ai-mcp

<<<<<<< HEAD
# Set required endpoint and run
export CRAWL4AI_ENDPOINT="https://your-crawl4ai-server.com"
crawl4ai-mcp --stdio
=======
# Run with command line arguments (recommended)
crawl4ai-mcp --stdio --endpoint https://your-crawl4ai-server.com
crawl4ai-mcp --http --port 3000 --endpoint https://your-crawl4ai-server.com
crawl4ai-mcp --sse --port 3001 --endpoint https://your-crawl4ai-server.com

# With optional bearer token
crawl4ai-mcp --stdio --endpoint https://your-crawl4ai-server.com --bearer-token your-token

From NPM (Alternative)

npm install -g crawl4ai-mcp-sse-stdio

# Run with command line arguments (recommended)
npx crawl4ai-mcp --stdio --endpoint https://your-crawl4ai-server.com
npx crawl4ai-mcp --http --port 3000 --endpoint https://your-crawl4ai-server.com
npx crawl4ai-mcp --sse --port 3001 --endpoint https://your-crawl4ai-server.com

# With optional bearer token
npx crawl4ai-mcp --stdio --endpoint https://your-crawl4ai-server.com --bearer-token your-token

From Source

git clone https://github.com/stgmt/crawl4ai-mcp.git
cd crawl4ai-mcp
pip install -e .

With Claude Desktop (Non-Docker)

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "crawl4ai": {
      "command": "crawl4ai-mcp",
      "args": ["--stdio"],
      "env": {
        "CRAWL4AI_ENDPOINT": "https://your-crawl4ai-server.com",
        "CRAWL4AI_BEARER_TOKEN": "your-optional-token"
      }
    }
  }
}

🛠️ Available Tools

1. `crawl` - Full Web Crawling

Extract complete content from any webpage.

{
  "name": "crawl",
  "arguments": {
    "url": "https://example.com",
    "wait_for": "css:.content",
    "timeout": 30000
  }
}

2. `md` - Markdown Extraction

Get clean markdown content from webpages.

{
  "name": "md", 
  "arguments": {
    "url": "https://docs.example.com",
    "clean": true
  }
}

3. `html` - Raw HTML

Retrieve raw HTML content.

{
  "name": "html",
  "arguments": {
    "url": "https://example.com"
  }
}

4. `screenshot` - Visual Capture

Take screenshots of webpages.

{
  "name": "screenshot",
  "arguments": {
    "url": "https://example.com",
    "full_page": true
  }
}

5. `pdf` - PDF Generation

Convert webpages to PDF.

{
  "name": "pdf",
  "arguments": {
    "url": "https://example.com",
    "format": "A4"
  }
}

6. `execute_js` - JavaScript Execution

Execute JavaScript on webpages.

{
  "name": "execute_js",
  "arguments": {
    "url": "https://example.com",
    "script": "document.title"
  }
}

🚀 Usage

The crawl4ai-mcp server supports multiple transport modes and provides comprehensive web crawling capabilities through the Model Context Protocol.

Basic Commands

# HTTP mode (recommended for testing)
crawl4ai-mcp --http --port 3000 --endpoint https://your-crawl4ai-server.com

# SSE mode (Server-Sent Events)
crawl4ai-mcp --sse --port 3001 --endpoint https://your-crawl4ai-server.com

# STDIO mode (for MCP clients)
crawl4ai-mcp --stdio --endpoint https://your-crawl4ai-server.com

# With optional bearer token
crawl4ai-mcp --http --port 3000 --endpoint https://your-crawl4ai-server.com --bearer-token your-token

With Custom Endpoint

# Using custom Crawl4AI endpoint with bearer token
crawl4ai-mcp --http --port 3000 \
  --crawl4ai-endpoint "https://your-server.com" \
  --bearer-token "your-token"

See the Python Integration section for detailed code examples.

⚙️ Configuration

Environment Variables

# REQUIRED: Crawl4AI endpoint URL
export CRAWL4AI_ENDPOINT="https://your-crawl4ai-server.com"

# OPTIONAL: Bearer authentication token
export CRAWL4AI_BEARER_TOKEN="your-api-token"

Parameter Requirements:

CRAWL4AI_ENDPOINT - Required - The URL of your Crawl4AI server instance
CRAWL4AI_BEARER_TOKEN - Optional - Bearer token for authenticated API access

Command Line Options

crawl4ai-mcp --help

Options:
  --stdio              Run in STDIO mode for MCP clients
  --sse                Run in SSE mode for web interfaces (default)
  --http               Run in HTTP mode
  --endpoint ENDPOINT  Crawl4AI API endpoint URL (REQUIRED)
  --bearer-token TOKEN Bearer authentication token (OPTIONAL)
  --version, -v        Show version

🐍 Python Integration Example

Here's how to integrate the MCP server with your Python application using HTTP mode with bearer token authentication:

import asyncio
import aiohttp
import json

async def test_crawl4ai_mcp():
    """
    Example: Using Crawl4AI MCP server via HTTP with bearer token
    """
    # Server configuration
    server_url = "http://localhost:3000"
    bearer_token = "your-api-token"  # Optional

    headers = {
        "Content-Type": "application/json"
    }

    # Add bearer token if available
    if bearer_token:
        headers["Authorization"] = f"Bearer {bearer_token}"

    async with aiohttp.ClientSession() as session:
        # 1. List available tools
        async with session.post(
            f"{server_url}/tools/list",
            headers=headers
        ) as response:
            tools = await response.json()
            print("Available tools:", [tool['name'] for tool in tools['tools']])

        # 2. Extract markdown from a webpage
        tool_request = {
            "name": "md",
            "arguments": {
                "url": "https://example.com",
                "clean": True
            }
        }

        async with session.post(
            f"{server_url}/tools/call",
            headers=headers,
            json=tool_request
        ) as response:
            result = await response.json()
            print("Markdown content:", result['content'][:200] + "...")

        # 3. Take a screenshot
        screenshot_request = {
            "name": "screenshot",
            "arguments": {
                "url": "https://example.com",
                "full_page": True
            }
        }

        async with session.post(
            f"{server_url}/tools/call",
            headers=headers,
            json=screenshot_request
        ) as response:
            result = await response.json()
            print("Screenshot saved:", result.get('path', 'Screenshot data returned'))

        # 4. Execute JavaScript on a page
        js_request = {
            "name": "execute_js",
            "arguments": {
                "url": "https://example.com",
                "script": "document.title"
            }
        }

        async with session.post(
            f"{server_url}/tools/call",
            headers=headers,
            json=js_request
        ) as response:
            result = await response.json()
            print("Page title:", result['content'])

# Run the example
if __name__ == "__main__":
    # First, start the MCP server in HTTP mode:
    # docker run -p 3000:3000 \
    #   -e CRAWL4AI_ENDPOINT="https://your-crawl4ai-server.com" \
    #   -e CRAWL4AI_BEARER_TOKEN="your-api-token" \
    #   stgmt/crawl4ai-mcp:latest crawl4ai-mcp --http --port 3000

    asyncio.run(test_crawl4ai_mcp())

Installation for Python integration

pip install aiohttp  # For HTTP client

🤝 Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

📄 License

MIT License - see LICENSE file for details.

🔗 Links

PyPI Package: https://pypi.org/project/crawl4ai-mcp/
NPM Package: https://www.npmjs.com/package/crawl4ai-mcp-sse-stdio
Docker Hub: https://hub.docker.com/r/stgmt/crawl4ai-mcp
GitHub Repository: https://github.com/stgmt/crawl4ai-mcp

Made with ❤️ for the AI community

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

1.1.0

Sep 15, 2025

1.0.9

Sep 12, 2025

1.0.8

Sep 12, 2025

1.0.7

Sep 12, 2025

1.0.6

Sep 11, 2025

1.0.5

Sep 11, 2025

1.0.4

Sep 11, 2025

1.0.3

Sep 9, 2025

1.0.1

Sep 9, 2025

1.0.0

Sep 9, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

crawl4ai_mcp_sse_stdio-1.1.0.tar.gz (21.0 kB view details)

Uploaded Sep 15, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

crawl4ai_mcp_sse_stdio-1.1.0-py3-none-any.whl (22.6 kB view details)

Uploaded Sep 15, 2025 Python 3

File details

Details for the file crawl4ai_mcp_sse_stdio-1.1.0.tar.gz.

File metadata

Download URL: crawl4ai_mcp_sse_stdio-1.1.0.tar.gz
Upload date: Sep 15, 2025
Size: 21.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for crawl4ai_mcp_sse_stdio-1.1.0.tar.gz
Algorithm	Hash digest
SHA256	`6af56d90cc331b7e76ba0e792730f9ce5798cd6a88a8491b75b7b810e5a5173a`
MD5	`ec077024316c21c661fa8c86150b94ec`
BLAKE2b-256	`937f7c700bc03bf60027ccff0b7287f974c69bdf63a7821bd1964e86ec657dc6`

See more details on using hashes here.

File details

Details for the file crawl4ai_mcp_sse_stdio-1.1.0-py3-none-any.whl.

File metadata

Download URL: crawl4ai_mcp_sse_stdio-1.1.0-py3-none-any.whl
Upload date: Sep 15, 2025
Size: 22.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for crawl4ai_mcp_sse_stdio-1.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`bdaeecae147e37e24672c7fec5b298a3e3d4829526427f067b4586437ee7451a`
MD5	`ca608eb0a28558afea60ff28fbce66a8`
BLAKE2b-256	`6a291ceee04a3cd4cf927a97e5a5b9ad3876b175b66f920b185823c821030a7c`

See more details on using hashes here.

crawl4ai-mcp-sse-stdio 1.1.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

🕷️ Crawl4AI MCP Server

📑 Table of Contents

🐳 Quick Start with Docker (Recommended)

Option 1: Docker Hub Image (Latest)

Option 2: Build from Source (Latest fixes)

With Claude Desktop (Docker)

📦 Alternative Installation Methods

NPM Package

Python Package (PyPI)

From NPM (Alternative)

From Source

With Claude Desktop (Non-Docker)

🛠️ Available Tools

1. crawl - Full Web Crawling

2. md - Markdown Extraction

3. html - Raw HTML

4. screenshot - Visual Capture

5. pdf - PDF Generation

6. execute_js - JavaScript Execution

🚀 Usage

Basic Commands

With Custom Endpoint

⚙️ Configuration

Environment Variables

Command Line Options

🐍 Python Integration Example

Installation for Python integration

🤝 Contributing

📄 License

🔗 Links

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

1. `crawl` - Full Web Crawling

2. `md` - Markdown Extraction

3. `html` - Raw HTML

4. `screenshot` - Visual Capture

5. `pdf` - PDF Generation

6. `execute_js` - JavaScript Execution