
A Python MCP (Model Context Protocol) client for connecting to remote web crawler servers.


JQuad MCP Web Crawler Client

A JQuad Python client library for connecting to MCP (Model Context Protocol) web crawler servers. This library provides an easy-to-use interface for communicating with remote MCP servers that implement web crawling capabilities.

Features

  • Full MCP Protocol Support: Implements the MCP 2025-03-26 specification
  • Async HTTP Client: Built on aiohttp for high performance
  • Multiple Usage Patterns: Simple functions, full client class, and CLI interface
  • Comprehensive Error Handling: Detailed error types and messages
  • Context Manager Support: Easy resource management with async with
  • Tool Validation: Validates available tools before execution
  • Standalone Client: Self-contained client that can run without project dependencies
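
MCP messages are JSON-RPC 2.0 envelopes, so the protocol support listed above can be pictured with a few request bodies. The following is only a sketch: make_request is a hypothetical helper that is not part of this library, the clientInfo values are placeholders, and the tool name crawl_url is an assumption (the real name is whatever the server reports via tools/list).

```python
import itertools
import json
from typing import Any, Dict, Optional

_ids = itertools.count(1)  # JSON-RPC request ids should be unique within a session


def make_request(method: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request envelope (hypothetical helper, not the library's API)."""
    message: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


# Handshake: the client announces the protocol revision it speaks.
initialize = make_request(
    "initialize",
    {
        "protocolVersion": "2025-03-26",
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.0.0"},  # placeholder values
    },
)

# Discovery: ask the server which tools it offers.
list_tools = make_request("tools/list")

# Invocation: call one tool by name with its arguments.
call_tool = make_request(
    "tools/call",
    {"name": "crawl_url", "arguments": {"url": "https://example.com"}},  # tool name assumed
)

print(json.dumps(call_tool, indent=2))
```

How the library actually transports these messages over HTTP is not described here, so the sketch stops at the message bodies.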

Installation

From PyPI (End Users)

pip install mcp-web-crawler-client

For Development

Prerequisites: Install uv

git clone https://github.com/jquad-group/mcp-web-crawler-client.git
cd mcp-web-crawler-client
uv sync

Quick Start

Simple URL Crawling

import asyncio
# When installed via pip:
from mcp_web_crawler_client import crawl_url_remote
# When using in development:
# from src.tools.web_crawler import crawl_url_remote

async def main():
    # One-liner to crawl any URL
    content = await crawl_url_remote("https://example.com")
    print(f"Crawled {len(content)} characters")

asyncio.run(main())

Full Client Usage

import asyncio
# When installed via pip:
from mcp_web_crawler_client import MCPClient
# When using in development:
# from src.tools.web_crawler import MCPClient

async def main():
    async with MCPClient("https://mcp-api.jquad.rocks/web-crawler") as client:
        # Test connectivity
        if await client.ping():
            print("Connected!")
        
        # List available tools
        tools = await client.list_tools()
        print(f"Available tools: {[tool.name for tool in tools]}")
        
        # Crawl URLs
        content = await client.crawl_url("https://example.com")
        print(f"Content: {content[:200]}...")

asyncio.run(main())

Command Line Usage

# Install with CLI support
pip install mcp-web-crawler-client

# Test connection
mcp-client test

# Or using uv in development:
uv run python -m src.tools.web_crawler.mcp_client_cli test

# List available tools
mcp-client list-tools

# Crawl a URL
mcp-client crawl https://example.com

# Save to file
mcp-client crawl https://example.com -o content.txt

API Reference

MCPClient Class

The main client class for connecting to MCP servers.

class MCPClient:
    def __init__(self, server_url: str, timeout: int = 30)
    
    async def connect(self) -> None
    async def disconnect(self) -> None
    async def ping(self) -> bool
    async def list_tools(self) -> List[MCPTool]
    async def call_tool(self, tool_name: str, arguments: Dict[str, Any]) -> Dict[str, Any]
    async def crawl_url(self, url: str) -> str
    
    def get_server_info(self) -> Optional[Dict[str, Any]]
    def get_capabilities(self) -> Optional[Dict[str, Any]]
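
The relationship between list_tools, call_tool and crawl_url can be illustrated with a small, library-free sketch. It assumes the tool result follows the MCP shape of a content list with text items plus an isError flag; whether call_tool returns exactly this dictionary is not confirmed by this README. ToolInfo, ensure_tool_available and extract_text are hypothetical names, and crawl_url is an assumed tool name.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ToolInfo:
    """Stand-in for the library's MCPTool; only the `name` attribute is shown in the README."""
    name: str


def ensure_tool_available(tools: List[ToolInfo], tool_name: str) -> None:
    """Raise ValueError if `tool_name` is not among the tools the server listed."""
    available = {tool.name for tool in tools}
    if tool_name not in available:
        raise ValueError(f"Tool {tool_name!r} not offered; available: {sorted(available)}")


def extract_text(result: Dict[str, Any]) -> str:
    """Join the text items of a tool result, assuming the MCP result shape."""
    if result.get("isError"):
        raise RuntimeError("Tool reported an error")
    parts = [item["text"] for item in result.get("content", []) if item.get("type") == "text"]
    return "\n".join(parts)


tools = [ToolInfo("crawl_url")]  # hypothetical tool name
ensure_tool_available(tools, "crawl_url")  # passes silently when the tool exists

sample_result = {"content": [{"type": "text", "text": "Example Domain"}], "isError": False}
print(extract_text(sample_result))
```

Checking the tool list first turns a server-side failure into an early, local error with a clearer message.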

Convenience Functions

# Create a client instance
async def create_mcp_client(server_url: str = "https://mcp-api.jquad.rocks/web-crawler") -> MCPClient

# Quick URL crawling
async def crawl_url_remote(url: str, server_url: str = "https://mcp-api.jquad.rocks/web-crawler") -> str

Error Handling

The library provides specific exception types:

  • MCPClientError: Base exception for all MCP client errors
  • MCPConnectionError: Connection-related errors
  • MCPProtocolError: MCP protocol errors
  • MCPToolError: Tool execution errors

import asyncio
# When installed via pip:
from mcp_web_crawler_client import MCPClient, MCPClientError
# When using in development:
# from src.tools.web_crawler import MCPClient, MCPClientError

async def main():
    try:
        # "async with" is only valid inside a coroutine
        async with MCPClient("https://invalid-server.com") as client:
            content = await client.crawl_url("https://example.com")
    except MCPClientError as e:
        print(f"MCP Error: {e}")

asyncio.run(main())
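
Because MCPClientError is described as the base exception, the more specific types can be caught first and the base type last. The sketch below defines stand-in classes with the same names so that it runs without the package installed; the inheritance shown is an assumption based on that description, and the retry/skip/abort policy is purely illustrative.

```python
class MCPClientError(Exception):
    """Stand-in for the library's base exception."""


class MCPConnectionError(MCPClientError):
    """Stand-in: connection-related errors."""


class MCPProtocolError(MCPClientError):
    """Stand-in: MCP protocol errors."""


class MCPToolError(MCPClientError):
    """Stand-in: tool execution errors."""


def classify(error: Exception) -> str:
    """Map an exception to a coarse handling decision (hypothetical policy)."""
    try:
        raise error
    except MCPConnectionError:
        return "retry"  # server unreachable: trying again later may help
    except MCPToolError:
        return "skip"   # this URL failed: move on to the next one
    except MCPClientError:
        return "abort"  # any other client error, including protocol errors


print(classify(MCPConnectionError("connection refused")))
```

Ordering matters: if the MCPClientError clause came first, it would also match the subclasses and the specific branches would never run.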

Configuration

Default Server

The convenience functions create_mcp_client and crawl_url_remote default to https://mcp-api.jquad.rocks/web-crawler as the MCP server. To use a different server, pass its URL when constructing the client:

client = MCPClient("https://your-mcp-server.com/web-crawler")

Timeout Configuration

# Set custom timeout (default: 30 seconds)
client = MCPClient("https://mcp-server.com", timeout=60)
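
This README does not say whether the timeout applies per request or to a whole operation. If an overall deadline is needed regardless, one option is to wrap the call in asyncio.wait_for. The sketch below uses slow_crawl as a stand-in that sleeps instead of touching the network; crawl_with_deadline is a hypothetical helper, not part of the library.

```python
import asyncio
from typing import Optional


async def slow_crawl(url: str, delay: float) -> str:
    """Stand-in for a crawl call; sleeps instead of using the network."""
    await asyncio.sleep(delay)
    return f"content of {url}"


async def crawl_with_deadline(url: str, delay: float, deadline: float) -> Optional[str]:
    """Return the content, or None if the call exceeds `deadline` seconds."""
    try:
        return await asyncio.wait_for(slow_crawl(url, delay), timeout=deadline)
    except asyncio.TimeoutError:
        return None  # wait_for cancels the pending call before raising


async def main():
    fast = await crawl_with_deadline("https://example.com", delay=0.01, deadline=1.0)
    slow = await crawl_with_deadline("https://example.com", delay=1.0, deadline=0.05)
    return fast, slow


fast, slow = asyncio.run(main())
print(fast, slow)
```

In real use, slow_crawl would be replaced by a call such as client.crawl_url(url).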

Examples

See the examples/ directory for comprehensive usage examples:

  • Simple crawling: Basic URL crawling with error handling
  • Client management: Manual connection management
  • Tool introspection: Discovering server capabilities
  • Concurrent requests: Performance testing with multiple URLs
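
The concurrent-requests pattern can be sketched without the library by bounding parallelism with a semaphore and collecting results with asyncio.gather. Here fake_crawl is a stand-in for a real crawl call, and crawl_many with its max_concurrency parameter is a hypothetical helper; the contents of examples/ may differ.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def fake_crawl(url: str) -> str:
    """Stand-in for a remote crawl; sleeps briefly instead of using the network."""
    await asyncio.sleep(0.01)
    return f"<html>{url}</html>"


async def crawl_many(
    urls: List[str],
    crawl: Callable[[str], Awaitable[str]],
    max_concurrency: int = 3,
) -> Dict[str, str]:
    """Crawl all URLs, running at most `max_concurrency` calls at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url: str) -> str:
        async with semaphore:  # wait for a free slot before starting
            return await crawl(url)

    # gather returns results in the same order as the input URLs
    pages = await asyncio.gather(*(bounded(url) for url in urls))
    return dict(zip(urls, pages))


urls = [f"https://example.com/page/{i}" for i in range(5)]
results = asyncio.run(crawl_many(urls, fake_crawl))
print({url: len(body) for url, body in results.items()})
```

Passing the crawl function as a parameter means crawl_url_remote, or a bound client.crawl_url, could be substituted for fake_crawl.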

Running Examples

# Run the comprehensive examples
uv run python examples/mcp_client_examples.py

# Test the CLI directly
uv run python -m src.tools.web_crawler.mcp_client_cli test

Development

This project uses uv for dependency management and virtual environment handling.

Prerequisites

  • Python 3.11+
  • uv for dependency management

Development Setup

# Clone the repository
git clone https://github.com/jquad-group/mcp-web-crawler-client.git
cd mcp-web-crawler-client

# Install dependencies and create virtual environment
uv sync

# Activate the virtual environment (optional, uv run handles this automatically)
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

Running Tests

# Install development dependencies
uv sync

# Run tests
uv run pytest

# Run tests with coverage
uv run pytest --cov=src --cov-report=html

# Run specific test
uv run pytest tests/tools/web_crawler/test_mcp_client.py -v

Project Structure

mcp-web-crawler-client/
├── src/
│   └── tools/
│       └── web_crawler/
│           ├── __init__.py
│           ├── mcp_client.py           # Main client implementation
│           ├── mcp_client_cli.py       # CLI interface
│           └── standalone_mcp_client.py # Standalone client
├── tests/
│   └── tools/
│       └── web_crawler/
│           └── test_mcp_client.py      # Client tests
├── examples/
│   └── mcp_client_examples.py         # Usage examples
├── pyproject.toml                      # Project configuration
└── README.md                          # This file

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Support

For issues and questions:

