A web crawler using Model Context Protocol (MCP)

Web Crawler MCP

A powerful web crawling tool that integrates with AI assistants via MCP (Model Context Protocol). This project allows you to crawl websites and save their content [...]

📋 Features

Website crawling with configurable depth
Support for internal and external links
Generation of structured Markdown files
Native integration with AI assistants via MCP
Detailed crawl result statistics
Error and not found page handling
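
To illustrate what "configurable depth" and "internal and external links" mean in practice, here is a minimal sketch of a depth-limited, breadth-first crawl. It is not this project's implementation: the `LINKS` table stands in for real HTTP fetching, the `crawl` function is hypothetical, and counting the start page as depth 0 is an assumption.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical in-memory "web": each URL maps to the links found on that page.
LINKS = {
    "https://example.com/": ["https://example.com/a", "https://other.org/"],
    "https://example.com/a": ["https://example.com/b"],
    "https://example.com/b": ["https://example.com/c"],
    "https://other.org/": [],
}


def crawl(start, max_depth=2, include_external=False):
    """Visit pages breadth-first, never following links past max_depth."""
    start_host = urlparse(start).netloc
    seen = {start}
    queue = deque([(start, 0)])  # assumption: the start page is depth 0
    visited = []  # (url, depth) pairs in visit order
    while queue:
        url, depth = queue.popleft()
        visited.append((url, depth))
        if depth >= max_depth:
            continue  # depth limit reached: do not expand this page's links
        for link in LINKS.get(url, []):
            is_external = urlparse(link).netloc != start_host
            if link in seen or (is_external and not include_external):
                continue
            seen.add(link)
            queue.append((link, depth + 1))
    return visited


for url, depth in crawl("https://example.com/", max_depth=2):
    print(depth, url)
```

With `include_external=True`, the same call would also visit `https://other.org/` at depth 1.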

🚀 Installation

Prerequisites

Python 3.9 or higher

Installation Steps

Clone this repository:

git clone laurentvv/crawl4ai-mcp
cd crawl4ai-mcp

Create and activate a virtual environment:

# Windows
python -m venv .venv
.venv\Scripts\activate

# Linux/macOS
python -m venv .venv
source .venv/bin/activate

Install the required dependencies:

# Using pip
pip install -r requirements.txt

# Using UV (recommended)
pip install uv  # If UV is not yet installed
uv venv
uv pip install -e .

Using UV (Modern Python Package Manager)

This project now supports UV, a fast Python package installer and resolver.

# Install UV if you don't have it yet
pip install uv

# Create a virtual environment
uv venv

# Install the project and its dependencies
uv pip install -e .

# Install development dependencies
uv pip install -e ".[dev]"

🔧 Configuration

MCP Configuration for AI Assistants

To use this crawler with an AI assistant such as Cline in VS Code, add the following entry to your cline_mcp_settings.json file:

{
  "mcpServers": {
    "crawl": {
      "command": "PATH\\TO\\YOUR\\ENVIRONMENT\\.venv\\Scripts\\python.exe",
      "args": [
        "PATH\\TO\\YOUR\\PROJECT\\crawl_mcp.py"
      ],
      "disabled": false,
      "autoApprove": [],
      "timeout": 600
    }
  }
}

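Replace PATH\\TO\\YOUR\\ENVIRONMENT and PATH\\TO\\YOUR\\PROJECT with the appropriate paths on your system. On Linux/macOS, the virtual environment's interpreter is at .venv/bin/python rather than .venv\\Scripts\\python.exe.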
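
If you prefer to generate this entry rather than edit it by hand, a small script can build it from the project directory. This is a sketch under assumptions: `build_mcp_entry` is a hypothetical helper, the example paths are made up, and it assumes the virtual environment lives in `.venv` inside the project directory, as in the installation steps.

```python
import json
from pathlib import PurePosixPath, PureWindowsPath


def build_mcp_entry(project_dir, windows, timeout=600):
    """Return the "mcpServers" block for a project whose venv is in .venv."""
    if windows:
        root = PureWindowsPath(project_dir)
        python_exe = root / ".venv" / "Scripts" / "python.exe"
    else:
        root = PurePosixPath(project_dir)
        python_exe = root / ".venv" / "bin" / "python"
    return {
        "mcpServers": {
            "crawl": {
                "command": str(python_exe),
                "args": [str(root / "crawl_mcp.py")],
                "disabled": False,
                "autoApprove": [],
                "timeout": timeout,
            }
        }
    }


# Hypothetical project location; json.dumps escapes the backslashes for you.
entry = build_mcp_entry(r"C:\Projects\crawl4ai-mcp", windows=True)
print(json.dumps(entry, indent=2))
```

Letting `json.dumps` write the file avoids the most common mistake in hand-edited Windows paths: forgetting to double the backslashes.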

Concrete Example (Windows)

{
  "mcpServers": {
    "crawl": {
      "command": "C:\\Python\\crawl4ai-mcp\\.venv\\Scripts\\python.exe",
      "args": [
        "D:\\Python\\crawl4ai-mcp\\crawl_mcp.py"
      ],
      "disabled": false,
      "autoApprove": [],
      "timeout": 600
    }
  }
}

🖥️ Usage

Usage with an AI Assistant (via MCP)

Once the crawler is configured in your AI assistant, you can run a crawl by asking the assistant in plain language, for example:

Can you crawl the website https://example.com with a depth of 2?

The assistant will use MCP to run the crawling tool with the specified parameters.

Usage Examples with Claude

Here are examples of requests you can make to Claude after configuring the MCP tool:

Simple Crawl: "Can you crawl the site example.com and give me a summary?"
Crawl with Options: "Can you crawl https://example.com with a depth of 3 and include external links?"
Crawl with Custom Output: "Can you crawl the blog example.com and save the results in a file named 'blog_analysis.md'?"

📁 Result Structure

Crawl results are saved in the crawl_results folder at the root of the project. Each result file is in Markdown format with the following structure:

# https://example.com/page

## Metadata
- Depth: 1
- Timestamp: 2023-07-01T12:34:56

## Content
Extracted content from the page...

---
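
For post-processing, a result file in this layout can be split back into per-page records. The following is a minimal sketch, not part of the project: `parse_results` and the regular expression are hypothetical, and they assume the exact spacing shown above (one blank line after the title, after the metadata, and before the `---` separator).

```python
import re

SAMPLE = """# https://example.com/page

## Metadata
- Depth: 1
- Timestamp: 2023-07-01T12:34:56

## Content
Extracted content from the page...

---
"""

# One entry: title line, metadata block, content, then a "---" separator line.
ENTRY = re.compile(
    r"^# (?P<url>\S+)\n\n"
    r"## Metadata\n"
    r"- Depth: (?P<depth>\d+)\n"
    r"- Timestamp: (?P<timestamp>\S+)\n\n"
    r"## Content\n"
    r"(?P<content>.*?)\n\n---$",
    re.DOTALL | re.MULTILINE,
)


def parse_results(text):
    """Return one dict per page entry found in a result file's text."""
    return [
        {
            "url": m["url"],
            "depth": int(m["depth"]),
            "timestamp": m["timestamp"],
            "content": m["content"],
        }
        for m in ENTRY.finditer(text)
    ]


for page in parse_results(SAMPLE):
    print(page["depth"], page["url"])
```

One limitation of this approach: page content that itself contains a blank line followed by `---` would be cut short at that point.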

🛠️ Available Parameters

The crawl tool accepts the following parameters:

Parameter	Type	Description	Default Value
url	string	URL to crawl (required)	-
max_depth	integer	Maximum crawling depth	2
include_external	boolean	Include external links	false
verbose	boolean	Enable detailed output	true
output_file	string	Output file path	automatically generated
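
The defaults in this table can be captured in a small data class, which is handy when scripting or testing around the tool. This is an illustrative sketch only: the `CrawlParams` class and its validation rules are assumptions, not the tool's actual code.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass
class CrawlParams:
    """Parameters from the table above, with their documented defaults."""

    url: str                            # required
    max_depth: int = 2
    include_external: bool = False
    verbose: bool = True
    output_file: Optional[str] = None   # None: let the tool generate a path

    def __post_init__(self):
        # Assumed validation: the real tool may accept or reject other inputs.
        parsed = urlparse(self.url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"url must be an absolute http(s) URL: {self.url!r}")
        if self.max_depth < 0:
            raise ValueError("max_depth must be non-negative")


params = CrawlParams(url="https://example.com", max_depth=3, include_external=True)
print(params)
```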

📊 Result Format

The tool returns a summary with:

URL crawled
Path to the generated file
Duration of the crawl
Statistics about processed pages (successful, failed, not found, access forbidden)

Results are saved in the crawl_results directory of your project.
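
To make the page statistics concrete, here is a sketch of how per-page outcomes could be tallied into the four categories listed above. The function names, the dictionary keys, and the mapping from HTTP status codes to categories (404 as "not found", 403 as "access forbidden", any other non-2xx result or network error as "failed") are assumptions, not the tool's documented behavior.

```python
from collections import Counter

CATEGORIES = ("successful", "failed", "not_found", "forbidden")


def classify(status):
    """Map a final HTTP status (None for a network error) to a category."""
    if status is not None and 200 <= status < 300:
        return "successful"
    if status == 404:
        return "not_found"
    if status == 403:
        return "forbidden"
    return "failed"


def summarize(url, output_file, duration_s, statuses):
    """Build a summary with the four fields described above."""
    counts = Counter(classify(s) for s in statuses)
    return {
        "url": url,
        "output_file": output_file,
        "duration_s": duration_s,
        "pages": {name: counts.get(name, 0) for name in CATEGORIES},
    }


# Hypothetical crawl: six pages with mixed outcomes.
summary = summarize(
    "https://example.com",
    "crawl_results/example.md",
    1.5,
    [200, 200, 404, 403, 500, None],
)
print(summary["pages"])
```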

🤝 Contribution

Contributions are welcome! Feel free to open an issue or submit a pull request.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
