
Web Crawler MCP


A powerful web crawling tool that integrates with AI assistants via MCP (the Model Context Protocol). This project lets you crawl websites and save their content as structured Markdown files.

📋 Features

  • Website crawling with configurable depth
  • Support for internal and external links
  • Generation of structured Markdown files
  • Native integration with AI assistants via MCP
  • Detailed crawl result statistics
  • Handling of errors and not-found pages

🚀 Installation

Prerequisites

  • Python 3.9 or higher

Installation Steps

  1. Clone this repository:
git clone https://github.com/laurentvv/crawl4ai-mcp.git
cd crawl4ai-mcp
  2. Create and activate a virtual environment:
# Windows
python -m venv .venv
.venv\Scripts\activate

# Linux/macOS
python -m venv .venv
source .venv/bin/activate
  3. Install the required dependencies:
pip install -r requirements.txt

🔧 Configuration

MCP Configuration for AI Assistants

To use this crawler with AI assistants such as Cline in VS Code, configure your cline_mcp_settings.json file:

{
  "mcpServers": {
    "crawl": {
      "command": "PATH\\TO\\YOUR\\ENVIRONMENT\\.venv\\Scripts\\python.exe",
      "args": [
        "PATH\\TO\\YOUR\\PROJECT\\crawl_mcp.py"
      ],
      "disabled": false,
      "autoApprove": [],
      "timeout": 600
    }
  }
}

Replace PATH\\TO\\YOUR\\ENVIRONMENT and PATH\\TO\\YOUR\\PROJECT with the appropriate paths on your system.

Concrete Example (Windows)

{
  "mcpServers": {
    "crawl": {
      "command": "C:\\Python\\crawl4ai-mcp\\.venv\\Scripts\\python.exe",
      "args": [
        "D:\\Python\\crawl4ai-mcp\\crawl_mcp.py"
      ],
      "disabled": false,
      "autoApprove": [],
      "timeout": 600
    }
  }
}
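
Concrete Example (Linux/macOS)

On Linux/macOS, the same configuration points at the virtual environment's bin directory, with forward slashes instead of escaped backslashes (the paths below are placeholders, not taken from the project):

{
  "mcpServers": {
    "crawl": {
      "command": "/home/you/crawl4ai-mcp/.venv/bin/python",
      "args": [
        "/home/you/crawl4ai-mcp/crawl_mcp.py"
      ],
      "disabled": false,
      "autoApprove": [],
      "timeout": 600
    }
  }
}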

🖥️ Usage

Usage with an AI Assistant (via MCP)

Once configured in your AI assistant, you can use the crawler by asking the assistant to perform a crawl with a request like:

Can you crawl the website https://example.com with a depth of 2?

The assistant will use the MCP protocol to run the crawling tool with the specified parameters.
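
Under the hood, crawl_mcp.py exposes the crawl as an MCP tool. Here is a minimal sketch of what such a server can look like, assuming the official MCP Python SDK's FastMCP helper; the function body and return value are illustrative, not the project's actual code:

from typing import Optional

from mcp.server.fastmcp import FastMCP

# Illustrative sketch; the real crawl_mcp.py may be structured differently.
mcp = FastMCP("crawl")

@mcp.tool()
def crawl(url: str, max_depth: int = 2, include_external: bool = False,
          verbose: bool = True, output_file: Optional[str] = None) -> str:
    """Crawl a website and save its pages as Markdown under crawl_results/."""
    # ... crawling logic would go here ...
    return f"Crawled {url} to depth {max_depth}"

if __name__ == "__main__":
    mcp.run()  # serve over stdio so the assistant can invoke the tool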

Usage Examples with Claude

Here are examples of requests you can make to Claude after configuring the MCP tool:

  • Simple Crawl: "Can you crawl the site example.com and give me a summary?"
  • Crawl with Options: "Can you crawl https://example.com with a depth of 3 and include external links?"
  • Crawl with Custom Output: "Can you crawl the blog example.com and save the results in a file named 'blog_analysis.md'?"

📁 Result Structure

Crawl results are saved in the crawl_results folder at the root of the project. Each result file is in Markdown format with the following structure:

# https://example.com/page

## Metadata
- Depth: 1
- Timestamp: 2023-07-01T12:34:56

## Content
Extracted content from the page...

---
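
As a rough illustration (not the project's actual code), a page in this layout can be produced by a small Python helper like the following:

from datetime import datetime
from pathlib import Path

# Illustrative sketch only; the real writer in crawl_mcp.py may differ.
def save_result(url: str, depth: int, content: str, out_dir: str = "crawl_results") -> Path:
    """Write one crawled page using the Markdown layout shown above."""
    Path(out_dir).mkdir(exist_ok=True)
    filename = url.replace("https://", "").replace("http://", "").replace("/", "_") + ".md"
    path = Path(out_dir) / filename
    path.write_text(
        f"# {url}\n\n"
        f"## Metadata\n"
        f"- Depth: {depth}\n"
        f"- Timestamp: {datetime.now().isoformat()}\n\n"
        f"## Content\n"
        f"{content}\n\n"
        "---\n",
        encoding="utf-8",
    )
    return path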

🛠️ Available Parameters

The crawl tool accepts the following parameters:

| Parameter | Type | Description | Default Value |
|-----------|------|-------------|---------------|
| url | string | URL to crawl (required) | - |
| max_depth | integer | Maximum crawling depth | 2 |
| include_external | boolean | Include external links | false |
| verbose | boolean | Enable detailed output | true |
| output_file | string | Output file path | automatically generated |
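
For illustration, the "Crawl with Options" request above would translate into a tool call with arguments along these lines (values hypothetical):

{
  "url": "https://example.com",
  "max_depth": 3,
  "include_external": true
}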

📊 Result Format

The tool returns a summary with:

  • URL crawled
  • Path to the generated file
  • Duration of the crawl
  • Statistics about processed pages (successful, failed, not found, access forbidden)

Results are saved in the crawl_results directory of your project.
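
For example, a summary might read roughly as follows (format illustrative, not verbatim tool output):

Crawl of https://example.com completed in 12.4s
Results saved to: crawl_results/example.com.md
Pages: 15 successful, 1 failed, 2 not found, 0 access forbidden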

🤝 Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
