Spider MCP Client

Official Python client for Spider MCP - a professional web scraping API with advanced anti-detection capabilities.

🚀 Quick Start

Installation

pip install spider-mcp-client

Basic Usage

from spider_mcp_client import SpiderMCPClient

# Initialize client
client = SpiderMCPClient(
    api_key="your-api-key-here",
    base_url="http://localhost:8003"  # Your Spider MCP server
)

# Parse a URL
result = client.parse_url("https://example.com/article")

print(f"Status: {result['status']}")
print(f"Title: {result['html_data'].get('title', 'N/A')}")
print(f"Parser: {result['status_detail']['parser_used']}")
print(f"API Calls: {len(result['api_calls'])}")
print(f"Images: {len(result['downloaded_images'])}")

📋 Features

  • Simple API - One method to parse any supported URL
  • Built-in retry logic - Automatic retries with exponential backoff
  • Rate limiting - Respectful delays between requests
  • Error handling - Clear exceptions for different error types
  • Image support - Optional image download and localization
  • Session isolation - Multiple isolated browser sessions
  • Type hints - Full typing support for better IDE experience
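
The retry bullet above can be pictured with a small sketch of exponential backoff. This is illustrative only: `backoff_delays`, `base`, `retries`, and `cap` are hypothetical names, and the client's internal retry schedule may differ.

```python
# Sketch of exponential backoff: wait base * 2**attempt seconds before
# each retry, capped so repeated failures never produce huge sleeps.
def backoff_delays(base=1.0, retries=3, cap=30.0):
    """Return the list of delays (in seconds) before each retry."""
    return [min(base * (2 ** attempt), cap) for attempt in range(retries)]

print(backoff_delays())  # → [1.0, 2.0, 4.0]
```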

🔧 API Reference

SpiderMCPClient

client = SpiderMCPClient(
    api_key="your-api-key",           # Required: Your API key
    base_url="http://localhost:8003", # Spider MCP server URL
    timeout=30,                       # Request timeout (seconds)
    max_retries=3,                    # Max retry attempts
    rate_limit_delay=1.0             # Delay between requests (seconds)
)
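
In real deployments you may prefer not to hard-code the key. A minimal sketch of loading these settings from environment variables; the `SPIDER_MCP_*` variable names and the `load_config` helper are suggestions for your own code, not names the library reads itself.

```python
import os

def load_config(env=None):
    """Build SpiderMCPClient keyword arguments from environment variables."""
    env = os.environ if env is None else env
    return {
        "api_key": env.get("SPIDER_MCP_API_KEY", ""),
        "base_url": env.get("SPIDER_MCP_BASE_URL", "http://localhost:8003"),
        "timeout": int(env.get("SPIDER_MCP_TIMEOUT", "30")),
    }

print(load_config(env={}))
```

You can then construct the client with `SpiderMCPClient(**load_config())`.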

parse_url()

result = client.parse_url(
    url="https://example.com/article",  # Required: URL to parse
    download_images=False,              # Optional: Download images
    session_name="my-session",          # Optional: Session name
    retry=1                             # Optional: Retry attempts (default: 1)
)

Returns:

{
    "status": "success",
    "url": "https://example.com/article",
    "html_data": {
        "type": "article",
        "title": "Article Title",
        "content": "Full article content...",
        "author": "Author Name",
        "publish_date": "2025-01-17"
    },
    "api_calls": [...],  # Captured API calls
    "downloaded_images": [...],  # Downloaded images
    "status_detail": {
        "parser_used": "example.com - article_parser",
        "parser_id": 123,
        "success": True
    }
}
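
Since several of these fields are optional, it helps to read the result defensively. A small helper sketched against the documented shape; the helper name and the sample dict below are ours, not part of the library.

```python
def summarize_result(result):
    """Collapse a parse_url result into a flat summary,
    tolerating missing keys via .get() with defaults."""
    html_data = result.get("html_data", {})
    detail = result.get("status_detail", {})
    return {
        "ok": result.get("status") == "success",
        "title": html_data.get("title", "N/A"),
        "parser": detail.get("parser_used", "unknown"),
        "api_calls": len(result.get("api_calls", [])),
        "images": len(result.get("downloaded_images", [])),
    }

# Sample mirroring the documented return shape:
sample = {
    "status": "success",
    "html_data": {"type": "article", "title": "Article Title"},
    "api_calls": [],
    "downloaded_images": [],
    "status_detail": {"parser_used": "example.com - article_parser", "success": True},
}
print(summarize_result(sample)["title"])  # → Article Title
```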

📖 Examples

Basic Article Parsing

from spider_mcp_client import SpiderMCPClient

client = SpiderMCPClient(api_key="sk-1234567890abcdef")

# Parse a news article
result = client.parse_url("https://techcrunch.com/2025/01/17/ai-news")

if result['status'] == 'success':
    html_data = result['html_data']
    print(f"📰 {html_data.get('title', 'N/A')}")
    print(f"✍️  {html_data.get('author', 'Unknown')}")
    print(f"📅 {html_data.get('publish_date', 'Unknown')}")
    print(f"🔧 Parser: {result['status_detail']['parser_used']}")

With Image Download

# Parse with image download
result = client.parse_url(
    url="https://news-site.com/photo-story",
    download_images=True
)

if result['status'] == 'success':
    images = result['downloaded_images']
    print(f"Downloaded {len(images)} images:")
    for img_url in images:
        print(f"  🖼️  {img_url}")

Error Handling

from spider_mcp_client import (
    SpiderMCPClient,
    ParserNotFoundError,
    AuthenticationError
)

client = SpiderMCPClient(api_key="your-api-key")

try:
    result = client.parse_url("https://unsupported-site.com/article")
    if result['status'] == 'success':
        print(f"Success: {result['html_data'].get('title', 'N/A')}")
    else:
        print(f"Parse failed: {result['status_detail'].get('error', 'Unknown error')}")

except ParserNotFoundError:
    print("❌ No parser available for this website")

except AuthenticationError:
    print("❌ Invalid API key")

except Exception as e:
    print(f"❌ Error: {e}")

With Retry Logic

# Parse with automatic retries
result = client.parse_url(
    url="https://sometimes-slow-site.com/article",
    retry=3  # Will attempt up to 4 times (initial + 3 retries)
)

if result['status'] == 'success':
    print(f"✅ Success: {result['html_data'].get('title')}")
    print(f"🔧 Parser: {result['status_detail']['parser_used']}")
else:
    print(f"❌ Failed: {result['status_detail'].get('error')}")

API Calls and Images

# Parse a page that makes API calls and has images
result = client.parse_url(
    url="https://dynamic-site.com/article",
    download_images=True
)

if result['status'] == 'success':
    print(f"📰 Title: {result['html_data'].get('title')}")
    print(f"🌐 API calls captured: {len(result['api_calls'])}")
    print(f"🖼️  Images downloaded: {len(result['downloaded_images'])}")

    # Show captured API calls
    for api_call in result['api_calls']:
        print(f"  📡 {api_call['method']} {api_call['url']}")

Check Parser Availability

# Check if parser exists before parsing
parser_info = client.check_parser("https://target-site.com/article")

if parser_info.get('found'):
    print(f"✅ Parser available: {parser_info['parser']['site_name']}")
    result = client.parse_url("https://target-site.com/article")
    if result['status'] == 'success':
        print(f"📰 {result['html_data'].get('title')}")
else:
    print("❌ No parser found for this URL")

🚨 Exception Types

from spider_mcp_client import (
    SpiderMCPError,        # Base exception
    AuthenticationError,   # Invalid API key
    ParserNotFoundError,   # No parser for URL
    RateLimitError,        # Rate limit exceeded
    ServerError,           # Server error (5xx)
    TimeoutError,          # Request timeout
    ConnectionError        # Connection failed
)

🔑 Getting Your API Key

  1. Start Spider MCP server:

    # On your Spider MCP server
    ./restart.sh
    
  2. Visit admin interface:

    http://localhost:8003/admin/users
    
  3. Create/view user and copy API key

🌐 Server Requirements

This client requires a running Spider MCP server. The server provides:

  • Custom parsers for each website
  • Undetected ChromeDriver for Cloudflare bypass
  • Professional anti-detection capabilities
  • Image processing and localization
  • Session management and isolation

📚 Advanced Usage

Session Isolation

# Use session names for browser isolation
client = SpiderMCPClient(api_key="your-api-key")

# Each session gets its own browser context
result1 = client.parse_url(
    "https://site.com/page1",
    session_name="session-1"
)

result2 = client.parse_url(
    "https://site.com/page2",
    session_name="session-2"
)
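
Because sessions are isolated, independent pages can be fetched in parallel. A sketch using a thread pool; `fetch` stands in for `client.parse_url` here so the pattern can be shown without a live server, and `parse_many` is our own helper, not a library method.

```python
from concurrent.futures import ThreadPoolExecutor

def parse_many(urls, fetch, max_workers=4):
    """Fetch each URL in its own named session so browser state never mixes."""
    def task(pair):
        index, url = pair
        return fetch(url, session_name=f"session-{index}")
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(task, enumerate(urls)))

# Demo with a stand-in fetch; swap in client.parse_url for real use.
results = parse_many(
    ["https://site.com/page1", "https://site.com/page2"],
    fetch=lambda url, session_name: {"url": url, "session": session_name},
)
print(results[1]["session"])  # → session-1
```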

Configuration

# Production configuration
client = SpiderMCPClient(
    api_key="your-api-key",
    base_url="https://your-spider-mcp-server.com",
    timeout=60,           # Longer timeout for complex pages
    max_retries=5,        # More retries for reliability
    rate_limit_delay=2.0  # Slower rate for respectful scraping
)

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

Made with ❤️ by the Spider MCP Team

Download files

Download the file for your platform.

Source Distribution

spider_mcp_client-0.1.8.tar.gz (19.3 kB)

Uploaded Source

Built Distribution

spider_mcp_client-0.1.8-py3-none-any.whl (10.1 kB)

Uploaded Python 3

File details

Details for the file spider_mcp_client-0.1.8.tar.gz.

File metadata

  • Download URL: spider_mcp_client-0.1.8.tar.gz
  • Upload date:
  • Size: 19.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for spider_mcp_client-0.1.8.tar.gz:

  • SHA256: 3aa0662b2c399301bac8eafe25ae9479604f1fac9f8f87119f78fb01ef2cdfda
  • MD5: 18ce4e88624833a7efc652897b77c2d3
  • BLAKE2b-256: 431670e03d1925959e8a6883b34f1e8a7c6a6dbd07f843eb161c3989c84321b0

File details

Details for the file spider_mcp_client-0.1.8-py3-none-any.whl.

File metadata

File hashes

Hashes for spider_mcp_client-0.1.8-py3-none-any.whl:

  • SHA256: 2ef9d314dd2cf1c0bbcd1abf94d73fd70733e9fa3f077fc6c76dd54ef3dc54f8
  • MD5: 82e656c92b84e7fcd56e81445c5a58b4
  • BLAKE2b-256: ddb59143f69683384b7a662cffbc3f66682acdb372ef22b93b869ec5b6e93c25
