Crawl4AI MCP Server
A Model Context Protocol server for web crawling using the Crawl4ai library.
📋 Overview
Crawl4AI MCP Server provides a set of tools and prompts for web crawling through the Model Context Protocol (MCP). It allows AI assistants to autonomously crawl websites, extract content, and save information as Markdown files.
✨ Features
- 🕸️ Single Page Crawling: Extract content from a single webpage in Markdown format
- 🌐 Deep Website Crawling: Crawl multiple pages of a website with configurable depth and limits
- 🔍 Structured Data Extraction: Use CSS selectors to extract specific structured data from webpages
- 💾 Markdown Export: Save crawled content directly as Markdown files
🚀 Installation
pip install crawl4ai-mcp-server
🛠️ Usage
Command Line
Run the server directly from the command line:
crawl4ai-mcp
Python API
import asyncio
from crawl4ai_mcp import serve
# Run the server
asyncio.run(serve())
📝 Available Tools
crawl_webpage
Crawls a single webpage and returns its content as Markdown.
Parameters:
- `url` (string, required): URL to crawl
- `include_images` (boolean, optional): Whether to include images in the result (default: `true`)
- `bypass_cache` (boolean, optional): Whether to bypass the cache (default: `false`)
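As a sketch, an MCP client invokes this tool with a JSON-RPC `tools/call` request. The payload below is illustrative: the URL and request id are placeholders, not values defined by this package.

```python
import json

# Illustrative MCP "tools/call" request for the crawl_webpage tool.
# The target URL and request id are placeholders.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "crawl_webpage",
        "arguments": {
            "url": "https://example.com",
            "include_images": True,   # default: true
            "bypass_cache": False,    # default: false
        },
    },
}

print(json.dumps(request, indent=2))
```

The optional arguments can be omitted entirely, in which case the server falls back to the defaults listed above.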
crawl_website
Crawls a website starting from the given URL, with specified depth and page limit.
Parameters:
- `url` (string, required): Starting URL
- `max_depth` (integer, optional): Maximum crawl depth (default: `1`)
- `max_pages` (integer, optional): Maximum number of pages to crawl (default: `5`)
- `include_images` (boolean, optional): Whether to include images (default: `true`)
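A hedged example of the corresponding `tools/call` arguments: `max_depth` bounds how many links deep the crawler follows from the start URL, while `max_pages` caps the total number of pages fetched regardless of depth. The URL is a placeholder.

```python
# Illustrative arguments for the crawl_website tool; values are examples only.
# Even with max_depth=2, crawling stops once max_pages pages have been fetched.
arguments = {
    "url": "https://example.com",
    "max_depth": 2,
    "max_pages": 10,
    "include_images": False,
}
```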
extract_structured_data
Extracts structured data from a webpage using CSS selectors.
Parameters:
- `url` (string, required): URL to extract data from
- `schema` (object, optional): Schema defining what to extract
- `css_selector` (string, optional): CSS selector to locate specific parts of the page (default: `"body"`)
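The README does not spell out the schema shape, so the example below follows the convention used by Crawl4ai's CSS extraction strategy (`name`, `baseSelector`, `fields`); treat the exact shape this server accepts as an assumption. The URL and selectors are placeholders.

```python
# Hypothetical extraction schema in the style of Crawl4ai's CSS extraction
# strategy; the exact schema shape this server accepts is an assumption.
schema = {
    "name": "articles",
    "baseSelector": "article",
    "fields": [
        {"name": "title", "selector": "h2", "type": "text"},
        {"name": "link", "selector": "a", "type": "attribute", "attribute": "href"},
    ],
}

# Arguments as they would appear in a tools/call request for this tool.
arguments = {
    "url": "https://example.com/blog",
    "schema": schema,
    "css_selector": "main",
}
```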
save_as_markdown
Crawls a webpage and saves the content as a Markdown file.
Parameters:
- `url` (string, required): URL to crawl
- `filename` (string, required): Filename to save the Markdown to
- `include_images` (boolean, optional): Whether to include images (default: `true`)
🔌 Available Prompts
crawl
Crawls a webpage and retrieves its content.
Arguments:
- `url` (required): URL to crawl
save_page
Crawls a webpage and saves it as a Markdown file.
Arguments:
- `url` (required): URL to crawl
- `filename` (required): Filename to save the Markdown to
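Unlike tools, prompts are fetched with a JSON-RPC `prompts/get` request; prompt argument values are strings. The payload below is illustrative, with placeholder URL, filename, and request id.

```python
import json

# Illustrative MCP "prompts/get" request for the save_page prompt.
# URL, filename, and id are placeholder values.
request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "prompts/get",
    "params": {
        "name": "save_page",
        "arguments": {"url": "https://example.com", "filename": "example.md"},
    },
}

print(json.dumps(request, indent=2))
```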
🧩 Requirements
- Python 3.8+
- mcp>=1.0.0
- crawl4ai
- pydantic
📄 License
MIT License - see the LICENSE file for details.
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
File details
Details for the file crawl4ai_mcp_server-0.1.1.tar.gz.
File metadata
- Download URL: crawl4ai_mcp_server-0.1.1.tar.gz
- Upload date:
- Size: 14.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `5b7438913e142d83c9954ed93276b2f51fcbd9ef733168998e06932156838345` |
| MD5 | `3c8477c14fcd89bb4ab9059039296fe1` |
| BLAKE2b-256 | `60346a230640699e5976d56b3f805bf5440e57606108b1f7de9cc726707c855e` |
File details
Details for the file crawl4ai_mcp_server-0.1.1-py3-none-any.whl.
File metadata
- Download URL: crawl4ai_mcp_server-0.1.1-py3-none-any.whl
- Upload date:
- Size: 12.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `0898fc2bf6641b8c3a83c521cd50018e1a5b59dc2b85a75d38addf135c196d80` |
| MD5 | `24c5f29406b3ef8041c8f8f5d8dcb4f1` |
| BLAKE2b-256 | `1055e76caa7d77285c95b8ba0d894c1471c24a37a56e9bf0c2ddbd8497883943` |