# Crawl4AI-MCP: Model Context Protocol server for web crawling powered by Crawl4AI
## Overview
Crawl4AI-MCP is a Model Context Protocol server that gives AI systems access to the live web. Built on FastMCP v3 and Crawl4AI, it exposes 4 tools, 2 resources, and 3 prompts through the standardized MCP interface, backed by a lifespan-managed headless Chromium browser.
Only 2 runtime dependencies — `fastmcp` and `crawl4ai`.

> [!TIP]
> Full documentation site →
## Key Features

- Full MCP compliance via FastMCP v3 with tool annotations (`readOnlyHint`, `destructiveHint`, etc.)
- 4 focused tools centered on canonical scrape/crawl plus session lifecycle/artifacts
- 3 prompts for common LLM workflows (summarize, extract schema, compare pages)
- 2 resources exposing server configuration and version info
- Headless Chromium managed as a lifespan singleton (start once, reuse everywhere)
- Multiple transports — stdio (default) and Streamable HTTP
- LLM-optimized output — markdown, cleaned HTML, raw HTML, or plain text
- Canonical option groups for extraction, runtime, diagnostics, sessions, rendering, and traversal
- List and deep traversal in one `crawl` contract
- Session-aware workflows with explicit session close and artifact retrieval tools
- Auto browser setup — detects missing Playwright browsers and installs them automatically
## Installation

### pip

```bash
pip install crawl4ai-mcp
crawl4ai-mcp --setup  # one-time: installs Playwright browsers
```

### uv (recommended)

```bash
uv add crawl4ai-mcp
crawl4ai-mcp --setup  # one-time: installs Playwright browsers
```

### Docker

```bash
docker build -t crawl4ai-mcp .
docker run -p 8000:8000 crawl4ai-mcp
```

The Docker image includes Playwright browsers — no separate setup needed.
## Development

```bash
git clone https://github.com/wyattowalsh/crawl4ai-mcp.git
cd crawl4ai-mcp
uv sync --group dev
crawl4ai-mcp --setup
```

> [!NOTE]
> The server auto-detects missing Playwright browsers on first startup and attempts to install them automatically. You can also run `crawl4ai-mcp --setup` or `crawl4ai-setup` manually at any time.
## Quick Start

### stdio (default — for Claude Desktop, Cursor, etc.)

```bash
crawl4ai-mcp
```

### HTTP transport

```bash
crawl4ai-mcp --transport http --port 8000
```

> [!NOTE]
> HTTP binds to `127.0.0.1` by default (private/local only); for external exposure, set `--host` explicitly and use a reverse proxy for TLS/auth.
### Claude Desktop configuration

Add to your Claude Desktop MCP settings (`claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "crawl4ai": {
      "command": "crawl4ai-mcp",
      "args": ["--transport", "stdio"]
    }
  }
}
```
### Claude Code configuration

```bash
claude mcp add crawl4ai -- crawl4ai-mcp --transport stdio
```

### MCP Inspector

```bash
npx @modelcontextprotocol/inspector crawl4ai-mcp
```
## Tools

The canonical surface exposes 4 tools:

### scrape

Scrape one URL or a bounded list of URLs (up to 20) with a single canonical envelope response.

- Input: `targets` (str | list[str]) and optional grouped `options`
- Supports extraction (`schema`, `extraction_mode`), runtime controls, diagnostics, session settings, render settings, and artifact capture
- Returns a canonical JSON envelope with `schema_version`, `tool`, `ok`, `data`/`items`, `meta`, `warnings`, `error`
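To make the envelope concrete, here is a hypothetical request/response pair for a single-URL scrape. The top-level envelope keys come from the list above; the exact option nesting, the `schema_version` value, and the item fields are illustrative assumptions, not the server's confirmed schema:

```python
import json

# Hypothetical request: "targets" accepts a str or a list of up to 20 URLs,
# with grouped options (here, conversion.output_format as named in this README).
request = {
    "targets": ["https://example.com"],
    "options": {"conversion": {"output_format": "markdown"}},
}

# Sketch of the canonical envelope a successful call would return.
response = {
    "schema_version": 1,          # assumed value
    "tool": "scrape",
    "ok": True,
    "items": [{"url": "https://example.com", "markdown": "# Example Domain"}],
    "meta": {"count": 1},
    "warnings": [],
    "error": None,
}

# Every envelope carries the same top-level keys regardless of tool.
assert {"schema_version", "tool", "ok", "meta", "warnings", "error"} <= set(response)
print(json.dumps(request))
```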
### crawl

Crawl with canonical traversal controls.

- `options.traversal.mode="list"` for bounded list traversal
- `options.traversal.mode="deep"` for recursive BFS/DFS traversal from a single seed
- Shares scrape option groups plus traversal options in the same envelope shape
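The deep mode's bounded breadth-first traversal can be illustrated with a toy sketch (the link graph and `bfs_crawl` function are illustrative, not the server's implementation):

```python
from collections import deque

# Toy link graph standing in for links discovered while crawling pages.
links = {
    "seed": ["a", "b"],
    "a": ["c"],
    "b": ["c", "d"],
    "c": [],
    "d": [],
}

def bfs_crawl(seed, max_depth):
    """Breadth-first traversal from a single seed with a depth bound,
    as in options.traversal.mode='deep'."""
    seen = {seed}
    order = []
    queue = deque([(seed, 0)])
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth == max_depth:
            continue  # depth budget exhausted: do not expand further
        for nxt in links[url]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return order

print(bfs_crawl("seed", 1))  # → ['seed', 'a', 'b']
```

By contrast, `mode="list"` simply visits a caller-supplied list of URLs without expanding discovered links.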
### close_session

Close a stateful session created via `options.session.session_id`.

### get_artifact

Retrieve artifact metadata/content captured during scrape or crawl when `options.conversion.capture_artifacts` is enabled.
### Choosing core vs advanced usage

- Core path (recommended): use `scrape`/`crawl` with minimal options (`runtime`, `conversion.output_format`, `traversal.mode="list"`). This keeps behavior predictable and low-risk for most agent workflows.
- Advanced path (explicit opt-in): use deep traversal, custom dispatcher controls, JS transforms, extraction schemas, and artifact capture only when required by task outcomes.
- Safety budgets and gates: inspect `config://server` for `settings.defaults`, `settings.limits`, `settings.policies`, and `settings.capabilities` to understand active constraints and feature gates before enabling advanced options.
## Resources

| URI | MIME Type | Description |
|---|---|---|
| `config://server` | `application/json` | Current server configuration: name, version, tool list, browser config |
| `crawl4ai://version` | `application/json` | Server and dependency version information (server, crawl4ai, fastmcp) |
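A client reads these resources as JSON strings. The sketch below parses an illustrative `config://server` payload built from the fields the table names (name, version, tool list); the exact JSON layout is an assumption:

```python
import json

# Illustrative payload a client might receive from config://server.
# Field names beyond name/version/tools are not confirmed by this README.
raw = json.dumps({
    "name": "crawl4ai-mcp",
    "version": "0.3.1",
    "tools": ["scrape", "crawl", "close_session", "get_artifact"],
})

config = json.loads(raw)
print(config["tools"])  # → ['scrape', 'crawl', 'close_session', 'get_artifact']
```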
## Prompts

| Prompt | Parameters | Description |
|---|---|---|
| `summarize_page` | `url`, `focus` (default: `"key points"`) | Crawl a page and summarize its content with the specified focus |
| `build_extraction_schema` | `url`, `data_type` | Inspect a page and build a CSS extraction schema for `scrape` |
| `compare_pages` | `url1`, `url2` | Crawl two pages and produce a structured comparison |
## Architecture

```mermaid
graph TD
    A[MCP Client] -->|stdio / HTTP| B[FastMCP v3]
    B --> C[Tool Router]
    C --> D[scrape]
    C --> E[crawl]
    C --> F[close_session]
    C --> G[get_artifact]
    D & E & F & G --> N[AsyncWebCrawler Singleton]
    N --> O[Headless Chromium]
    B --> P[Resources]
    B --> Q[Prompts]
    style B fill:#4B8BBE,color:#fff
    style N fill:#FF6B35,color:#fff
    style O fill:#2496ED,color:#fff
```
The server uses a single-module architecture:

- FastMCP v3 handles MCP protocol negotiation, transport, tool/resource/prompt registration, and message routing
- A lifespan-managed `AsyncWebCrawler` starts a headless Chromium browser once at server startup, shares it across all tool invocations, then shuts it down cleanly on exit
- 4 tool functions decorated with `@mcp.tool()` define the canonical surface
- 2 resource functions decorated with `@mcp.resource()` return JSON
- 3 prompt functions decorated with `@mcp.prompt` return structured `Message` lists

> [!IMPORTANT]
> There are no intermediate manager classes or custom HTTP clients. The server delegates all crawling to crawl4ai's `AsyncWebCrawler` and all protocol handling to FastMCP. Only 2 runtime dependencies.
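The lifespan-singleton pattern can be sketched generically, with a dependency-free `FakeBrowser` standing in for `AsyncWebCrawler` so the start-once / reuse / clean-shutdown flow is visible (this is a sketch of the pattern, not the server's actual code):

```python
import asyncio
from contextlib import asynccontextmanager

class FakeBrowser:
    """Stand-in for a headless browser managed by the lifespan."""
    def __init__(self):
        self.started = False
        self.requests = 0

    async def start(self):
        self.started = True

    async def close(self):
        self.started = False

@asynccontextmanager
async def lifespan():
    browser = FakeBrowser()
    await browser.start()        # start once at server startup
    try:
        yield browser            # every tool invocation reuses this instance
    finally:
        await browser.close()    # clean shutdown on exit

async def main():
    async with lifespan() as browser:
        for _ in range(3):       # three "tool calls" share one browser
            browser.requests += 1
        handled = browser.requests
    return handled, browser.started

handled, still_running = asyncio.run(main())
print(handled, still_running)  # → 3 False
```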
## Configuration

### CLI flags

| Flag | Default | Description |
|---|---|---|
| `--transport` | `stdio` | Transport protocol: `stdio` or `http` |
| `--host` | `127.0.0.1` | Host to bind (HTTP transport only) |
| `--port` | `8000` | Port to bind (HTTP transport only) |
| `--setup` | — | Install Playwright browsers and exit |
### Environment variables

No environment variables are required; the server uses sensible defaults for all configuration. Crawl4AI's own environment variables (e.g., `CRAWL4AI_VERBOSE`) are respected if set.
## Testing

```bash
# Run all tests
uv run pytest

# Coverage gate (>=90%)
uv run pytest --cov=crawl4ai_mcp --cov-report=term-missing

# Smoke tests only
uv run pytest -m smoke

# Unit tests only
uv run pytest -m unit

# Integration workflow tests
uv run pytest -m integration

# End-to-end workflow tests
uv run pytest -m e2e

# Manual live test (requires browser)
uv run python tests/manual/test_live.py
```

> [!NOTE]
> All automated tests run in-memory using `fastmcp.Client(mcp)` — no browser or network required. The test suite mocks `AsyncWebCrawler` for fast, deterministic execution.
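The mocking approach the note describes can be sketched with `unittest.mock.AsyncMock` standing in for `AsyncWebCrawler`, so an async helper is exercised without a browser (`scrape_markdown` and the `arun`/`markdown` names are illustrative assumptions, not the server's actual code):

```python
import asyncio
from unittest.mock import AsyncMock

async def scrape_markdown(crawler, url):
    """Hypothetical helper that awaits a crawler and returns markdown."""
    result = await crawler.arun(url)
    return result.markdown

# AsyncMock makes crawler.arun awaitable; configure its return value's
# .markdown attribute instead of launching Chromium.
crawler = AsyncMock()
crawler.arun.return_value.markdown = "# Example"

out = asyncio.run(scrape_markdown(crawler, "https://example.com"))
print(out)  # → # Example
```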
## Contributing

See the Contributing Guide for details on setting up your development environment, coding standards, and the pull request process.

## License

This project is licensed under the MIT License. See the LICENSE file for details.

*Crawl4AI-MCP — Connecting AI to the Live Web*
## File details

Details for the file `mcp_crawl4ai-0.3.1.tar.gz`.

### File metadata

- Download URL: mcp_crawl4ai-0.3.1.tar.gz
- Upload date:
- Size: 47.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7

### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `a289c840d4c2ac7008a1f34534f99b696e25948fe586aa3a5a006b7b6da22ba0` |
| MD5 | `6238ac0c4273e79dcdde84b170a6ec22` |
| BLAKE2b-256 | `98e83275bb91aa14cd8ae935c4403a59f986c88ed232caadee2363e0424fb861` |
### Provenance

The following attestation bundles were made for `mcp_crawl4ai-0.3.1.tar.gz`:

Publisher: release.yml on wyattowalsh/crawl4ai-mcp

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mcp_crawl4ai-0.3.1.tar.gz
- Subject digest: a289c840d4c2ac7008a1f34534f99b696e25948fe586aa3a5a006b7b6da22ba0
- Sigstore transparency entry: 1037162144
- Sigstore integration time:
- Permalink: wyattowalsh/crawl4ai-mcp@730610a9317905fe0d680001c8651da4f6788648
- Branch / Tag: refs/tags/v0.3.1
- Owner: https://github.com/wyattowalsh
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@730610a9317905fe0d680001c8651da4f6788648
- Trigger Event: push
## File details

Details for the file `mcp_crawl4ai-0.3.1-py3-none-any.whl`.

### File metadata

- Download URL: mcp_crawl4ai-0.3.1-py3-none-any.whl
- Upload date:
- Size: 34.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7

### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `e67336c21bb9c6071df5575de62e1755a8c0031187481507a02fd67034150c73` |
| MD5 | `6fc37959e1c45e57abcefb5705927de5` |
| BLAKE2b-256 | `c90c86386132f03ca09530dc8e8f8b2caf811f8335c4c95dc3ba59fa7c3ca3b4` |
### Provenance

The following attestation bundles were made for `mcp_crawl4ai-0.3.1-py3-none-any.whl`:

Publisher: release.yml on wyattowalsh/crawl4ai-mcp

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mcp_crawl4ai-0.3.1-py3-none-any.whl
- Subject digest: e67336c21bb9c6071df5575de62e1755a8c0031187481507a02fd67034150c73
- Sigstore transparency entry: 1037162517
- Sigstore integration time:
- Permalink: wyattowalsh/crawl4ai-mcp@730610a9317905fe0d680001c8651da4f6788648
- Branch / Tag: refs/tags/v0.3.1
- Owner: https://github.com/wyattowalsh
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@730610a9317905fe0d680001c8651da4f6788648
- Trigger Event: push