# Web Crawler MCP

A powerful web crawling tool that integrates with AI assistants via the Model Context Protocol (MCP). This project allows you to crawl websites and save their content as structured Markdown files.
## 📋 Features

- Website crawling with configurable depth
- Support for internal and external links
- Generation of structured Markdown files
- Native integration with AI assistants via MCP
- Detailed crawl result statistics
- Handling of errors and not-found pages
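The internal-versus-external distinction can be sketched in a few lines of Python. This is a hypothetical `classify_links` helper (not part of this project's code) showing the usual approach: resolve each link against the page URL, then compare hosts.

```python
from urllib.parse import urljoin, urlparse

def classify_links(base_url: str, hrefs: list[str]) -> tuple[list[str], list[str]]:
    """Split links found on a page into internal and external absolute URLs."""
    base_host = urlparse(base_url).netloc
    internal, external = [], []
    for href in hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links like "/about"
        host = urlparse(absolute).netloc
        (internal if host == base_host else external).append(absolute)
    return internal, external
```

For example, `classify_links("https://example.com/blog", ["/about", "https://other.org/x"])` treats `/about` as internal and the `other.org` link as external.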
## 🚀 Installation

### Prerequisites

- Python 3.9 or higher

### Installation Steps

1. Clone this repository:

   ```bash
   git clone https://github.com/laurentvv/crawl4ai-mcp.git
   cd crawl4ai-mcp
   ```

2. Create and activate a virtual environment:

   ```bash
   # Windows
   python -m venv .venv
   .venv\Scripts\activate

   # Linux/macOS
   python -m venv .venv
   source .venv/bin/activate
   ```

3. Install the required dependencies:

   ```bash
   # Using pip
   pip install -r requirements.txt

   # Using UV (recommended)
   pip install uv  # If UV is not yet installed
   uv venv
   uv pip install -e .
   ```
### Using UV (Modern Python Package Manager)

This project now supports UV, a fast Python package installer and resolver.

```bash
# Install UV if you don't have it yet
pip install uv

# Create a virtual environment
uv venv

# Install the project and its dependencies
uv pip install -e .

# Install development dependencies
uv pip install -e ".[dev]"
```
## 🔧 Configuration

### MCP Configuration for AI Assistants

To use this crawler with AI assistants such as the Cline extension for VS Code, configure your `cline_mcp_settings.json` file:

```json
{
  "mcpServers": {
    "crawl": {
      "command": "PATH\\TO\\YOUR\\ENVIRONMENT\\.venv\\Scripts\\python.exe",
      "args": [
        "PATH\\TO\\YOUR\\PROJECT\\crawl_mcp.py"
      ],
      "disabled": false,
      "autoApprove": [],
      "timeout": 600
    }
  }
}
```

Replace `PATH\\TO\\YOUR\\ENVIRONMENT` and `PATH\\TO\\YOUR\\PROJECT` with the appropriate paths on your system.
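If hand-escaping Windows backslashes in JSON is error-prone, the file can also be generated from Python, where `json.dumps` handles the escaping. The paths below are placeholders, not real locations:

```python
import json
from pathlib import Path

# Placeholder paths -- substitute your own environment and project locations.
venv_python = Path(r"PATH\TO\YOUR\ENVIRONMENT") / ".venv" / "Scripts" / "python.exe"
server_script = Path(r"PATH\TO\YOUR\PROJECT") / "crawl_mcp.py"

settings = {
    "mcpServers": {
        "crawl": {
            "command": str(venv_python),
            "args": [str(server_script)],
            "disabled": False,
            "autoApprove": [],
            "timeout": 600,
        }
    }
}

# json.dumps escapes backslashes, producing valid cline_mcp_settings.json content.
print(json.dumps(settings, indent=2))
```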
### Concrete Example (Windows)

```json
{
  "mcpServers": {
    "crawl": {
      "command": "C:\\Python\\crawl4ai-mcp\\.venv\\Scripts\\python.exe",
      "args": [
        "C:\\Python\\crawl4ai-mcp\\crawl_mcp.py"
      ],
      "disabled": false,
      "autoApprove": [],
      "timeout": 600
    }
  }
}
```
## 🖥️ Usage

### Usage with an AI Assistant (via MCP)

Once configured in your AI assistant, you can use the crawler by asking the assistant to perform a crawl with a request like:

> Can you crawl the website https://example.com with a depth of 2?

The assistant will use the MCP protocol to run the crawling tool with the specified parameters.
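Under the hood, MCP tool invocations are JSON-RPC 2.0 `tools/call` requests. A sketch of the message the assistant's MCP client sends, assuming the tool is registered under the name `crawl` (as in the configuration above):

```python
import json

def make_tool_call(request_id: int, url: str, max_depth: int = 2) -> str:
    """Build the JSON-RPC 2.0 message an MCP client sends to invoke the crawl tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "crawl",  # assumed tool name, matching the server config key
            "arguments": {"url": url, "max_depth": max_depth},
        },
    })
```

The assistant fills in `arguments` from your natural-language request; you never write this message yourself.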
### Usage Examples with Claude

Here are examples of requests you can make to Claude after configuring the MCP tool:

- Simple crawl: "Can you crawl the site example.com and give me a summary?"
- Crawl with options: "Can you crawl https://example.com with a depth of 3 and include external links?"
- Crawl with custom output: "Can you crawl the blog example.com and save the results in a file named 'blog_analysis.md'?"
## 📁 Result Structure

Crawl results are saved in the `crawl_results` folder at the root of the project. Each result file is in Markdown format with the following structure:

```markdown
# https://example.com/page

## Metadata
- Depth: 1
- Timestamp: 2023-07-01T12:34:56

## Content
Extracted content from the page...

---
```
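Because the result files follow this fixed layout, they are easy to post-process. A hypothetical parser (the format is inferred from the example above, not taken from the project's code):

```python
def parse_result(markdown: str) -> dict:
    """Parse a crawl result file into {url, metadata, content}."""
    lines = markdown.strip().splitlines()
    result = {"url": lines[0].lstrip("# ").strip(), "metadata": {}, "content": ""}
    section = None
    body: list[str] = []
    for line in lines[1:]:
        if line.startswith("## "):
            section = line[3:].strip()          # "Metadata" or "Content"
        elif section == "Metadata" and line.startswith("- "):
            key, _, value = line[2:].partition(":")
            result["metadata"][key.strip()] = value.strip()
        elif section == "Content" and line.strip() != "---":
            body.append(line)                   # keep page text, drop the separator
    result["content"] = "\n".join(body).strip()
    return result
```

Applied to the example above, this yields the page URL, a `{"Depth": "1", "Timestamp": ...}` metadata dict, and the extracted text.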
## 🛠️ Available Parameters

The `crawl` tool accepts the following parameters:

| Parameter | Type | Description | Default Value |
|---|---|---|---|
| `url` | string | URL to crawl (required) | - |
| `max_depth` | integer | Maximum crawling depth | 2 |
| `include_external` | boolean | Include external links | false |
| `verbose` | boolean | Enable detailed output | true |
| `output_file` | string | Output file path | automatically generated |
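A sketch of how these parameters might be defaulted and validated before a crawl. The names mirror the table above; the helper and the auto-generated filename scheme are assumptions for illustration, not the project's actual implementation:

```python
from urllib.parse import urlparse

DEFAULTS = {"max_depth": 2, "include_external": False, "verbose": True, "output_file": None}

def prepare_crawl_args(url: str, **overrides) -> dict:
    """Merge caller overrides onto the defaults from the parameter table."""
    if not url:
        raise ValueError("url is required")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    args = {**DEFAULTS, **overrides, "url": url}
    if args["output_file"] is None:
        # "automatically generated": derive a filename from the host (assumed scheme)
        args["output_file"] = f"crawl_results/{urlparse(url).netloc}.md"
    return args
```

For example, `prepare_crawl_args("https://example.com", max_depth=3)` keeps `include_external` false while raising the depth to 3.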
## 📊 Result Format

The tool returns a summary with:

- URL crawled
- Path to the generated file
- Duration of the crawl
- Statistics about processed pages (successful, failed, not found, access forbidden)

Results are saved in the `crawl_results` directory of your project.
## 🤝 Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request.

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
## File details

### crawl4ai_mcp-0.1.0.tar.gz

- Size: 145.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.10

| Algorithm | Hash digest |
|---|---|
| SHA256 | `9505608d747c72696f9b8855bcf22313b7e4346321a87cc1fb0e573077f469d0` |
| MD5 | `c7b3544c1302b4ee49eaa566cc9ed6b7` |
| BLAKE2b-256 | `ef4c569894dc8cae8bbb5f378faa065cf62bb723baa0f03deb09690dc5d583f2` |
### crawl4ai_mcp-0.1.0-py3-none-any.whl

- Size: 3.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.10

| Algorithm | Hash digest |
|---|---|
| SHA256 | `0fc705b1cd6200f4b3163fc38b589a7143930b85ddfd0eeced5b08c2a7fb43a6` |
| MD5 | `6562c600625f33ef3d8a1540e0d84310` |
| BLAKE2b-256 | `b1ce24af2ae9d1db898ee6340c4192ac8a3badcbeb0fe2a2aa5dc8ff9a7ed137` |