
Crawl4AI-MCP


A Model Context Protocol server for web crawling powered by Crawl4AI



Overview

Crawl4AI-MCP is a Model Context Protocol server that gives AI systems access to the live web. Built on FastMCP v3 and Crawl4AI, it exposes 4 tools, 2 resources, and 3 prompts through the standardized MCP interface, backed by a lifespan-managed headless Chromium browser.

Only 2 runtime dependencies: fastmcp and crawl4ai.

[!TIP] Full documentation site →

Key Features

  • Full MCP compliance via FastMCP v3 with tool annotations (readOnlyHint, destructiveHint, etc.)
  • 4 focused tools centered on canonical scrape/crawl plus session lifecycle/artifacts
  • 3 prompts for common LLM workflows (summarize, extract schema, compare pages)
  • 2 resources exposing server configuration and version info
  • Headless Chromium managed as a lifespan singleton (start once, reuse everywhere)
  • Multiple transports — stdio (default) and Streamable HTTP
  • LLM-optimized output — markdown, cleaned HTML, raw HTML, or plain text
  • Canonical option groups for extraction, runtime, diagnostics, sessions, rendering, and traversal
  • List and deep traversal in one crawl contract
  • Session-aware workflows with explicit session close and artifact retrieval tools
  • Auto browser setup — detects missing Playwright browsers and installs automatically

Installation

pip
pip install crawl4ai-mcp
crawl4ai-mcp --setup       # one-time: installs Playwright browsers
uv (recommended)
uv add crawl4ai-mcp
crawl4ai-mcp --setup       # one-time: installs Playwright browsers
Docker
docker build -t crawl4ai-mcp .
docker run -p 8000:8000 crawl4ai-mcp

The Docker image includes Playwright browsers — no separate setup needed.

Development
git clone https://github.com/wyattowalsh/crawl4ai-mcp.git
cd crawl4ai-mcp
uv sync --group dev
crawl4ai-mcp --setup

[!NOTE] The server auto-detects missing Playwright browsers on first startup and attempts to install them automatically. You can also run crawl4ai-mcp --setup or crawl4ai-setup manually at any time.


Quick Start

stdio (default — for Claude Desktop, Cursor, etc.)
crawl4ai-mcp
HTTP transport
crawl4ai-mcp --transport http --port 8000

[!NOTE] HTTP binds to 127.0.0.1 by default (private/local only); for external exposure, set --host explicitly and use a reverse proxy for TLS/auth.

Claude Desktop configuration

Add to your Claude Desktop MCP settings (claude_desktop_config.json):

{
  "mcpServers": {
    "crawl4ai": {
      "command": "crawl4ai-mcp",
      "args": ["--transport", "stdio"]
    }
  }
}
Claude Code configuration
claude mcp add crawl4ai -- crawl4ai-mcp --transport stdio
MCP Inspector
npx @modelcontextprotocol/inspector crawl4ai-mcp

Tools

The canonical surface exposes 4 tools:

scrape

Scrape one URL or a bounded list of URLs (up to 20) with a single canonical envelope response.

  • Input: targets (str | list[str]) and optional grouped options
  • Supports extraction (schema, extraction_mode), runtime controls, diagnostics, session settings, render settings, and artifact capture
  • Returns canonical JSON envelope with schema_version, tool, ok, data/items, meta, warnings, error
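The envelope fields listed above can be sketched as a plain dictionary. Only the field names (schema_version, tool, ok, data/items, meta, warnings, error) come from the contract described here; the payload values and the `is_success` helper are illustrative assumptions, not the server's actual code.

```python
# Hypothetical scrape response illustrating the canonical envelope fields.
# Field names follow the contract above; the payload values are invented.
envelope = {
    "schema_version": "1.0",  # assumed version string
    "tool": "scrape",
    "ok": True,
    "data": {"url": "https://example.com", "markdown": "# Example"},
    "meta": {"elapsed_ms": 1234},
    "warnings": [],
    "error": None,
}

def is_success(env: dict) -> bool:
    """A caller can branch on `ok` and surface `error` otherwise."""
    return bool(env.get("ok")) and env.get("error") is None

print(is_success(envelope))  # -> True
```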

crawl

Crawl with canonical traversal controls.

  • options.traversal.mode="list" for bounded list traversal
  • options.traversal.mode="deep" for recursive BFS/DFS traversal from a single seed
  • Shares scrape option groups plus traversal options in the same envelope shape
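The two traversal modes above can be sketched as request payloads. The `options.traversal.mode` key and its `"list"`/`"deep"` values come from this contract; the other nested keys (`strategy`, `max_depth`) are illustrative assumptions.

```python
# Sketch of the two crawl traversal modes. Keys beyond
# options.traversal.mode are assumed for illustration.
list_crawl = {
    "targets": ["https://example.com/a", "https://example.com/b"],
    "options": {"traversal": {"mode": "list"}},
}

deep_crawl = {
    "targets": "https://example.com",  # single seed for deep mode
    "options": {"traversal": {"mode": "deep", "strategy": "bfs", "max_depth": 2}},
}

def traversal_mode(request: dict) -> str:
    """Read back which traversal contract a request uses."""
    return request["options"]["traversal"]["mode"]
```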

close_session

Close a stateful session created via options.session.session_id.

get_artifact

Retrieve artifact metadata/content captured during scrape or crawl when options.conversion.capture_artifacts is enabled.

Choosing core vs advanced usage

  • Core path (recommended): use scrape/crawl with minimal options (runtime, conversion.output_format, traversal.mode="list"). This keeps behavior predictable and low-risk for most agent workflows.
  • Advanced path (explicit opt-in): use deep traversal, custom dispatcher controls, JS transforms, extraction schemas, and artifact capture only when required by task outcomes.
  • Safety budgets and gates: inspect config://server for settings.defaults, settings.limits, settings.policies, and settings.capabilities to understand active constraints and feature gates before enabling advanced options.
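A client following the advice above can read config://server before enabling advanced options. The `settings.defaults` / `limits` / `policies` / `capabilities` section names come from this document; the concrete keys inside them and the gate helper are assumptions for illustration.

```python
import json

# Hypothetical config://server payload; the settings.* section names are
# from the docs, the keys inside each section are assumed.
config_json = json.dumps({
    "name": "crawl4ai-mcp",
    "settings": {
        "defaults": {"output_format": "markdown"},
        "limits": {"max_targets": 20},
        "policies": {"allow_deep_traversal": False},
        "capabilities": {"artifact_capture": True},
    },
})

settings = json.loads(config_json)["settings"]

def deep_traversal_allowed(settings: dict) -> bool:
    """Gate advanced options on the server's declared policy."""
    return settings.get("policies", {}).get("allow_deep_traversal", False)
```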

Resources

| URI | MIME Type | Description |
| --- | --- | --- |
| config://server | application/json | Current server configuration: name, version, tool list, browser config |
| crawl4ai://version | application/json | Server and dependency version information (server, crawl4ai, fastmcp) |

Prompts

| Prompt | Parameters | Description |
| --- | --- | --- |
| summarize_page | url, focus (default: "key points") | Crawl a page and summarize its content with the specified focus |
| build_extraction_schema | url, data_type | Inspect a page and build a CSS extraction schema for scrape |
| compare_pages | url1, url2 | Crawl two pages and produce a structured comparison |

Architecture

graph TD
    A[MCP Client] -->|stdio / HTTP| B[FastMCP v3]
    B --> C[Tool Router]
    C --> D[scrape]
    C --> E[crawl]
    C --> F[close_session]
    C --> G[get_artifact]
    D & E & F & G --> N[AsyncWebCrawler Singleton]
    N --> O[Headless Chromium]
    B --> P[Resources]
    B --> Q[Prompts]

    style B fill:#4B8BBE,color:#fff
    style N fill:#FF6B35,color:#fff
    style O fill:#2496ED,color:#fff

The server uses a single-module architecture:

  • FastMCP v3 handles MCP protocol negotiation, transport, tool/resource/prompt registration, and message routing
  • Lifespan-managed AsyncWebCrawler starts a headless Chromium browser once at server startup and shares it across all tool invocations, then shuts it down cleanly on exit
  • 4 tool functions decorated with @mcp.tool() define the canonical surface
  • 2 resource functions decorated with @mcp.resource() return JSON
  • 3 prompt functions decorated with @mcp.prompt() return structured Message lists

[!IMPORTANT] There are no intermediate manager classes or custom HTTP clients. The server delegates all crawling to crawl4ai's AsyncWebCrawler and all protocol handling to FastMCP. Only 2 runtime dependencies.


Configuration

CLI flags
| Flag | Default | Description |
| --- | --- | --- |
| --transport | stdio | Transport protocol: stdio or http |
| --host | 127.0.0.1 | Host to bind (HTTP transport only) |
| --port | 8000 | Port to bind (HTTP transport only) |
| --setup | | Install Playwright browsers and exit |
Environment variables

No environment variables are required. The server uses sensible defaults for all configuration. Crawl4AI's own environment variables (e.g., CRAWL4AI_VERBOSE) are respected if set.


Testing

# Run all tests
uv run pytest

# Coverage gate (>=90%)
uv run pytest --cov=crawl4ai_mcp --cov-report=term-missing

# Smoke tests only
uv run pytest -m smoke

# Unit tests only
uv run pytest -m unit

# Integration workflow tests
uv run pytest -m integration

# End-to-end workflow tests
uv run pytest -m e2e

# Manual live test (requires browser)
uv run python tests/manual/test_live.py

[!NOTE] All automated tests run in-memory using fastmcp.Client(mcp) — no browser or network required. The test suite mocks AsyncWebCrawler for fast, deterministic execution.


Contributing

See the Contributing Guide for details on setting up your development environment, coding standards, and the pull request process.

License

This project is licensed under the MIT License. See the LICENSE file for details.


Crawl4AI-MCP: Connecting AI to the Live Web
