MCP (Model Context Protocol) server for Crawl4AI - Universal web crawling and data extraction

These details have not been verified by PyPI

Project links

Project description

🕷️ Crawl4AI MCP Server

MCP (Model Context Protocol) server for Crawl4AI - Universal web crawling and data extraction for AI agents.

Integrate powerful web scraping capabilities into Claude, ChatGPT, and any MCP-compatible AI assistant.

🎯 Why This Tool?

The Problem

🔴 No MCP servers for web scraping - AI agents can't access web content
🔴 Complex scraping setup - Crawl4AI requires custom integration
🔴 Limited protocol support - Most tools only support one transport
🔴 Poor AI integration - Existing scrapers aren't optimized for LLMs

Our Solution

✅ First Crawl4AI MCP server - Native MCP integration
✅ All MCP transports - STDIO, SSE, and HTTP support
✅ AI-optimized extraction - Clean markdown, structured data
✅ One-line execution - crawl4ai-mcp --stdio
✅ Production ready - Type hints, tests, Docker support

⚡ Quick Start

One-Line Execution

# Install and run
pip install crawl4ai-mcp
crawl4ai-mcp --stdio

With Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "crawl4ai": {
      "command": "crawl4ai-mcp",
      "args": ["--stdio"]
    }
  }
}

With Any MCP Client

# STDIO mode (for CLI tools)
crawl4ai-mcp --stdio

# SSE mode (for web clients)  
crawl4ai-mcp --sse

# HTTP mode (for REST API)
crawl4ai-mcp --http

🚀 Features

Core Capabilities

🌐 Universal Web Scraping - Extract content from any website
📝 Markdown Conversion - Clean, formatted markdown output
📸 Screenshots - Capture visual content
📄 PDF Generation - Save pages as PDF
🎭 JavaScript Execution - Interact with dynamic content
🔄 Multiple Transports - STDIO, SSE, HTTP protocols

Why Choose crawl4ai-mcp?

Feature	crawl4ai-mcp	Other Tools
MCP Protocol Support	✅ Full	❌ None
Crawl4AI Integration	✅ Native	❌ Manual
Transport Protocols	✅ All 3	⚠️ Usually 1
AI Optimization	✅ Built-in	❌ Generic
Production Ready	✅ Yes	⚠️ Varies
Docker Support	✅ Yes	⚠️ Limited

📦 Installation

From PyPI

pip install crawl4ai-mcp

From Source

git clone https://github.com/stgmt/crawl4ai-mcp.git
cd crawl4ai-mcp
pip install -e .

With Docker

docker pull stgmt/crawl4ai-mcp
docker run -p 3000:3000 stgmt/crawl4ai-mcp

🔧 Usage

Basic Command Line

# Default HTTP mode
crawl4ai-mcp

# STDIO mode for CLI integration
crawl4ai-mcp --stdio

# SSE mode for real-time streaming
crawl4ai-mcp --sse

# HTTP mode explicitly
crawl4ai-mcp --http

Python Integration

import asyncio
from crawl4ai_mcp import Crawl4AIMCPServer

async def main():
    server = Crawl4AIMCPServer()
    
    # Run in STDIO mode
    await server.run_stdio()
    
    # Or run in HTTP mode
    # server.run_http(host="0.0.0.0", port=3000)

asyncio.run(main())

With MCP Clients

Using mcp-client SDK

from mcp import ClientSession, StdioServerParameters
import asyncio

async def main():
    server_params = StdioServerParameters(
        command="crawl4ai-mcp",
        args=["--stdio"]
    )
    
    async with ClientSession(server_params) as session:
        # List available tools
        tools = await session.list_tools()
        
        # Crawl a webpage
        result = await session.call_tool(
            "crawl",
            {"url": "https://example.com"}
        )
        print(result)

asyncio.run(main())

🛠️ Available Tools

1. `crawl` - Full Web Crawling

Extract complete content from any webpage.

{
  "name": "crawl",
  "arguments": {
    "url": "https://example.com",
    "wait_for": "css:.content",
    "timeout": 30000
  }
}

2. `md` - Markdown Extraction

Get clean markdown content from webpages.

{
  "name": "md", 
  "arguments": {
    "url": "https://docs.example.com",
    "clean": true
  }
}

3. `html` - Raw HTML

Retrieve raw HTML content.

{
  "name": "html",
  "arguments": {
    "url": "https://example.com"
  }
}

4. `screenshot` - Visual Capture

Take screenshots of webpages.

{
  "name": "screenshot",
  "arguments": {
    "url": "https://example.com",
    "full_page": true
  }
}

5. `pdf` - PDF Generation

Convert webpages to PDF.

{
  "name": "pdf",
  "arguments": {
    "url": "https://example.com",
    "format": "A4"
  }
}

6. `execute_js` - JavaScript Execution

Execute JavaScript on webpages.

{
  "name": "execute_js",
  "arguments": {
    "url": "https://example.com",
    "script": "document.title"
  }
}

🌐 Transport Protocols

STDIO Transport

Best for command-line tools and local development.

crawl4ai-mcp --stdio

Use cases:

Claude Desktop app
Terminal-based MCP clients
Local development and testing
CI/CD pipelines

SSE Transport (Server-Sent Events)

Ideal for real-time web applications.

crawl4ai-mcp --sse

Use cases:

Web-based MCP clients
Real-time streaming applications
Browser extensions
Progressive web apps

HTTP Transport

Standard REST API for maximum compatibility.

crawl4ai-mcp --http

Use cases:

REST API clients
Microservice architectures
Cloud deployments
Load-balanced environments

⚙️ Configuration

Environment Variables

# Crawl4AI endpoint (if using remote instance)
CRAWL4AI_ENDPOINT=http://localhost:8000

# Server ports
HTTP_PORT=3000
SSE_PORT=3001

# Logging
LOG_LEVEL=INFO

# Performance
MAX_CONCURRENT_REQUESTS=10
REQUEST_TIMEOUT=30

Configuration File

Create .env file:

CRAWL4AI_ENDPOINT=http://your-crawl4ai-instance.com
HTTP_PORT=3000
SSE_PORT=3001
LOG_LEVEL=DEBUG
DEBUG=true

Advanced Settings

# config/settings.py
from pydantic import BaseSettings

class Settings(BaseSettings):
    crawl4ai_endpoint: str = "http://localhost:8000"
    http_port: int = 3000
    sse_port: int = 3001
    log_level: str = "INFO"
    debug: bool = False
    max_concurrent_requests: int = 10
    request_timeout: int = 30
    
    class Config:
        env_file = ".env"

settings = Settings()

🐳 Docker Support

Quick Start with Docker

# Run with default settings
docker run -p 3000:3000 stgmt/crawl4ai-mcp

# With environment variables
docker run -p 3000:3000 \
  -e CRAWL4AI_ENDPOINT=http://crawl4ai:8000 \
  -e LOG_LEVEL=DEBUG \
  stgmt/crawl4ai-mcp

# With Docker Compose
docker-compose up

Docker Compose

version: '3.8'

services:
  crawl4ai-mcp:
    image: stgmt/crawl4ai-mcp
    ports:
      - "3000:3000"
      - "3001:3001"
    environment:
      - CRAWL4AI_ENDPOINT=http://crawl4ai:8000
      - LOG_LEVEL=INFO
    depends_on:
      - crawl4ai
      
  crawl4ai:
    image: crawl4ai/crawl4ai
    ports:
      - "8000:8000"

Building Custom Image

FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

RUN pip install -e .

EXPOSE 3000 3001

CMD ["crawl4ai-mcp", "--http"]

🧪 Testing

Running Tests

# Install test dependencies
pip install -e .[test]

# Run all tests
pytest

# Run with coverage
pytest --cov=crawl4ai_mcp

# Run specific test
pytest tests/test_server.py::test_crawl_tool

Testing with MCP Server Tester

This MCP server can be comprehensively tested using the MCP Server Tester that supports all three transport protocols (STDIO, SSE, HTTP).

Install MCP Server Tester

# Option 1: Using Docker (recommended)
docker run -it stgmt/mcp-server-tester test --help

# Option 2: Using NPM
npm install -g mcp-server-tester-sse-http-stdio
mcp-server-tester test --help

# Option 3: Using Python
pip install mcp-server-tester-sse-http-stdio
mcp-server-tester test --help

Test Examples

Test STDIO mode:

# Docker
docker run -it stgmt/mcp-server-tester test \
  --transport stdio \
  --command "crawl4ai-mcp --stdio"

# NPM/Python
mcp-server-tester test \
  --transport stdio \
  --command "crawl4ai-mcp --stdio"

Test HTTP mode:

# Start the server first
crawl4ai-mcp --http &

# Run tests
mcp-server-tester test \
  --transport http \
  --url http://localhost:3000

Test SSE mode:

# Start the server first
crawl4ai-mcp --sse &

# Run tests
mcp-server-tester test \
  --transport sse \
  --url http://localhost:3001

Test with configuration file:

Create test-config.yaml:

name: Crawl4AI Comprehensive Tests
transport: stdio
command: crawl4ai-mcp --stdio
tests:
  - name: Test markdown extraction
    tool: md
    arguments:
      url: https://example.com
      f: fit
    assert:
      - type: contains
        value: "Example Domain"
  
  - name: Test screenshot
    tool: screenshot
    arguments:
      url: https://example.com
      screenshot_wait_for: 2
    assert:
      - type: exists
        path: result
  
  - name: Test HTML extraction
    tool: html
    arguments:
      url: https://example.com
    assert:
      - type: contains
        value: "<html"
  
  - name: Test JavaScript execution
    tool: execute_js
    arguments:
      url: https://example.com
      scripts: ["document.title"]
    assert:
      - type: contains
        value: "Example"

Run the test:

mcp-server-tester test -f test-config.yaml

Interactive Testing

The tester also provides an interactive mode for manual testing:

# Interactive STDIO mode
mcp-server-tester interactive \
  --transport stdio \
  --command "crawl4ai-mcp --stdio"

# Interactive HTTP mode
mcp-server-tester interactive \
  --transport http \
  --url http://localhost:3000

📚 Examples

Example 1: Extract Documentation

# Extract markdown from documentation
result = await session.call_tool("md", {
    "url": "https://docs.python.org/3/",
    "clean": True
})

Example 2: Monitor Price Changes

# Screenshot for visual comparison
screenshot = await session.call_tool("screenshot", {
    "url": "https://store.example.com/product",
    "full_page": False
})

# Extract price via JavaScript
price = await session.call_tool("execute_js", {
    "url": "https://store.example.com/product",
    "script": "document.querySelector('.price').innerText"
})

Example 3: Generate Reports

# Generate PDF report
pdf = await session.call_tool("pdf", {
    "url": "https://analytics.example.com/report",
    "format": "A4",
    "landscape": True
})

🤝 Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

Development Setup

# Clone the repository
git clone https://github.com/stgmt/crawl4ai-mcp.git
cd crawl4ai-mcp

# Install in development mode
pip install -e .[dev]

# Run tests
pytest

# Format code
black crawl4ai_mcp tests
ruff check --fix crawl4ai_mcp tests

📄 License

MIT License - see LICENSE file for details.

🔗 Links

PyPI Package: https://pypi.org/project/crawl4ai-mcp/
GitHub Repository: https://github.com/stgmt/crawl4ai-mcp
Documentation: https://github.com/stgmt/crawl4ai-mcp#readme
Issues: https://github.com/stgmt/crawl4ai-mcp/issues

🙏 Acknowledgments

Crawl4AI - The powerful crawling engine
MCP - Model Context Protocol specification
Anthropic - For creating the MCP standard

Made with ❤️ for the AI community

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

1.1.0

Sep 15, 2025

1.0.9

Sep 12, 2025

1.0.8

Sep 12, 2025

1.0.7

Sep 12, 2025

1.0.6

Sep 11, 2025

1.0.5

Sep 11, 2025

This version

1.0.4

Sep 11, 2025

1.0.3

Sep 9, 2025

1.0.1

Sep 9, 2025

1.0.0

Sep 9, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

crawl4ai_mcp_sse_stdio-1.0.4.tar.gz (17.1 kB view details)

Uploaded Sep 11, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

crawl4ai_mcp_sse_stdio-1.0.4-py3-none-any.whl (12.2 kB view details)

Uploaded Sep 11, 2025 Python 3

File details

Details for the file crawl4ai_mcp_sse_stdio-1.0.4.tar.gz.

File metadata

Download URL: crawl4ai_mcp_sse_stdio-1.0.4.tar.gz
Upload date: Sep 11, 2025
Size: 17.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for crawl4ai_mcp_sse_stdio-1.0.4.tar.gz
Algorithm	Hash digest
SHA256	`0a333b7079f0f5d9aed6a015a7277a1d531ad1bd71b1b681c3afc51c32f661a3`
MD5	`2c022a449607708e1333c9d26c609bb0`
BLAKE2b-256	`1e650e9f296f629c1ad66903441896280924f3829bd01d800daad216ca08e848`

See more details on using hashes here.

File details

Details for the file crawl4ai_mcp_sse_stdio-1.0.4-py3-none-any.whl.

File metadata

Download URL: crawl4ai_mcp_sse_stdio-1.0.4-py3-none-any.whl
Upload date: Sep 11, 2025
Size: 12.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for crawl4ai_mcp_sse_stdio-1.0.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f7aff9c86eb91d63619fe7de7fcc73672383d6025889e3686a7535adab1778e3`
MD5	`ab0a93ec92caec2a8207bc4d75ef2192`
BLAKE2b-256	`2233d6171705a09c469a96cee9bc5220b6f354f142d52aa7c0aa0a72076a401a`

See more details on using hashes here.

crawl4ai-mcp-sse-stdio 1.0.4

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

🕷️ Crawl4AI MCP Server

📑 Table of Contents

🎯 Why This Tool?

The Problem

Our Solution

⚡ Quick Start

One-Line Execution

With Claude Desktop

With Any MCP Client

🚀 Features

Core Capabilities

Why Choose crawl4ai-mcp?

📦 Installation

From PyPI

From Source

With Docker

🔧 Usage

Basic Command Line

Python Integration

With MCP Clients

Using mcp-client SDK

🛠️ Available Tools

1. crawl - Full Web Crawling

2. md - Markdown Extraction

3. html - Raw HTML

4. screenshot - Visual Capture

5. pdf - PDF Generation

6. execute_js - JavaScript Execution

🌐 Transport Protocols

STDIO Transport

SSE Transport (Server-Sent Events)

HTTP Transport

⚙️ Configuration

Environment Variables

Configuration File

Advanced Settings

🐳 Docker Support

Quick Start with Docker

Docker Compose

Building Custom Image

🧪 Testing

Running Tests

Testing with MCP Server Tester

Install MCP Server Tester

Test Examples

Interactive Testing

📚 Examples

Example 1: Extract Documentation

Example 2: Monitor Price Changes

Example 3: Generate Reports

🤝 Contributing

Development Setup

📄 License

🔗 Links

🙏 Acknowledgments

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

1. `crawl` - Full Web Crawling

2. `md` - Markdown Extraction

3. `html` - Raw HTML

4. `screenshot` - Visual Capture

5. `pdf` - PDF Generation

6. `execute_js` - JavaScript Execution