
Spider MCP Client


Official Python client for Spider MCP - a professional web scraping API with advanced anti-detection capabilities.

🚀 Quick Start

Installation

pip install spider-mcp-client

Basic Usage

from spider_mcp_client import SpiderMCPClient

# Initialize client
client = SpiderMCPClient(
    api_key="your-api-key-here",
    base_url="http://localhost:8003"  # Your Spider MCP server
)

# Parse a URL
result = client.parse_url("https://example.com/article")

print(f"Status: {result['status']}")
print(f"Title: {result['html_data'].get('title', 'N/A')}")
print(f"Parser: {result['status_detail']['parser_used']}")
print(f"API Calls: {len(result['api_calls'])}")
print(f"Images: {len(result['downloaded_images'])}")

📋 Features

  • Simple API - One method to parse any supported URL
  • Built-in retry logic - Automatic retries with exponential backoff
  • Rate limiting - Respectful delays between requests
  • Error handling - Clear exceptions for different error types
  • Image support - Optional image download and localization
  • Session isolation - Multiple isolated browser sessions
  • Type hints - Full typing support for better IDE experience

🔧 API Reference

SpiderMCPClient

client = SpiderMCPClient(
    api_key="your-api-key",           # Required: Your API key
    base_url="http://localhost:8003", # Spider MCP server URL
    timeout=30,                       # Request timeout (seconds)
    max_retries=3,                    # Max retry attempts
    rate_limit_delay=1.0             # Delay between requests (seconds)
)

parse_url()

result = client.parse_url(
    url="https://example.com/article",  # Required: URL to parse
    download_images=False,              # Optional: Download images
    session_name="my-session",          # Optional: Session name
    retry=1                             # Optional: Retry attempts (default: 1)
)

Returns:

{
    "status": "success",
    "url": "https://example.com/article",
    "html_data": {
        "type": "article",
        "title": "Article Title",
        "content": "Full article content...",
        "author": "Author Name",
        "publish_date": "2025-01-17"
    },
    "api_calls": [...],  # Captured API calls
    "downloaded_images": [...],  # Downloaded images
    "status_detail": {
        "parser_used": "example.com - article_parser",
        "parser_id": 123,
        "success": True
    }
}
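Several of these fields are optional, so downstream code should read them defensively. The helper below is an illustrative sketch (not part of the client library) that summarizes a result dict using only the keys shown above:

```python
def summarize_result(result: dict) -> str:
    """Build a one-line summary from a parse_url() result,
    tolerating missing keys via .get() defaults."""
    html_data = result.get("html_data", {})
    detail = result.get("status_detail", {})
    return (
        f"{result.get('status', 'unknown')}: "
        f"{html_data.get('title', 'N/A')} "
        f"(parser: {detail.get('parser_used', 'n/a')}, "
        f"api_calls: {len(result.get('api_calls', []))}, "
        f"images: {len(result.get('downloaded_images', []))})"
    )
```

Because every access goes through `.get()`, the same helper works for both successful and failed results.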

📖 Examples

Basic Article Parsing

from spider_mcp_client import SpiderMCPClient

client = SpiderMCPClient(api_key="sk-1234567890abcdef")

# Parse a news article
result = client.parse_url("https://techcrunch.com/2025/01/17/ai-news")

if result['status'] == 'success':
    html_data = result['html_data']
    print(f"📰 {html_data.get('title', 'N/A')}")
    print(f"✍️  {html_data.get('author', 'Unknown')}")
    print(f"📅 {html_data.get('publish_date', 'Unknown')}")
    print(f"🔧 Parser: {result['status_detail']['parser_used']}")

With Image Download

# Parse with image download
result = client.parse_url(
    url="https://news-site.com/photo-story",
    download_images=True
)

if result['status'] == 'success':
    images = result['downloaded_images']
    print(f"Downloaded {len(images)} images:")
    for img_url in images:
        print(f"  🖼️  {img_url}")

Error Handling

from spider_mcp_client import (
    SpiderMCPClient,
    ParserNotFoundError,
    AuthenticationError
)

client = SpiderMCPClient(api_key="your-api-key")

try:
    result = client.parse_url("https://unsupported-site.com/article")
    if result['status'] == 'success':
        print(f"Success: {result['html_data'].get('title', 'N/A')}")
    else:
        print(f"Parse failed: {result['status_detail'].get('error', 'Unknown error')}")

except ParserNotFoundError:
    print("❌ No parser available for this website")

except AuthenticationError:
    print("❌ Invalid API key")

except Exception as e:
    print(f"❌ Error: {e}")

With Retry Logic

# Parse with automatic retries
result = client.parse_url(
    url="https://sometimes-slow-site.com/article",
    retry=3  # Will attempt up to 4 times (initial + 3 retries)
)

if result['status'] == 'success':
    print(f"✅ Success: {result['html_data'].get('title')}")
    print(f"🔧 Parser: {result['status_detail']['parser_used']}")
else:
    print(f"❌ Failed: {result['status_detail'].get('error')}")

API Calls and Images

# Parse a page that makes API calls and has images
result = client.parse_url(
    url="https://dynamic-site.com/article",
    download_images=True
)

if result['status'] == 'success':
    print(f"📰 Title: {result['html_data'].get('title')}")
    print(f"🌐 API calls captured: {len(result['api_calls'])}")
    print(f"🖼️  Images downloaded: {len(result['downloaded_images'])}")

    # Show captured API calls
    for api_call in result['api_calls']:
        print(f"  📡 {api_call['method']} {api_call['url']}")

Check Parser Availability

# Check if parser exists before parsing
parser_info = client.check_parser("https://target-site.com/article")

if parser_info.get('found'):
    print(f"✅ Parser available: {parser_info['parser']['site_name']}")
    result = client.parse_url("https://target-site.com/article")
    if result['status'] == 'success':
        print(f"📰 {result['html_data'].get('title')}")
else:
    print("❌ No parser found for this URL")

🚨 Exception Types

from spider_mcp_client import (
    SpiderMCPError,        # Base exception
    AuthenticationError,   # Invalid API key
    ParserNotFoundError,   # No parser for URL
    RateLimitError,        # Rate limit exceeded
    ServerError,           # Server error (5xx)
    TimeoutError,          # Request timeout
    ConnectionError        # Connection failed
)
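For transient failures such as `RateLimitError` or `ServerError`, a caller can layer its own backoff on top of the client's built-in retries. The wrapper below is a generic sketch; the name `call_with_backoff` is ours, not part of the package:

```python
import time

def call_with_backoff(fn, *, retries=3, base_delay=2.0, retry_on=(Exception,)):
    """Call fn(); on one of the retry_on exceptions, sleep with
    exponential backoff (base_delay, 2x, 4x, ...) and try again.
    Re-raises the last exception once retries are exhausted."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == retries:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

For example: `call_with_backoff(lambda: client.parse_url(url), retry_on=(RateLimitError, ServerError))`.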

🔑 Getting Your API Key

  1. Start Spider MCP server:

    # On your Spider MCP server
    ./restart.sh
    
  2. Visit admin interface:

    http://localhost:8003/admin/users
    
  3. Create/view user and copy API key

🌐 Server Requirements

This client requires a running Spider MCP server. The server provides:

  • Custom parsers for each website
  • Undetected ChromeDriver for Cloudflare bypass
  • Professional anti-detection capabilities
  • Image processing and localization
  • Session management and isolation

📚 Advanced Usage

Session Isolation

# Use session names for browser isolation
client = SpiderMCPClient(api_key="your-api-key")

# Each session gets its own browser context
result1 = client.parse_url(
    "https://site.com/page1",
    session_name="session-1"
)

result2 = client.parse_url(
    "https://site.com/page2",
    session_name="session-2"
)
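When many tasks share one client, a deterministic naming convention keeps each logical task pinned to its own browser context across runs. The `session_name_for` helper below is a hypothetical convention, not part of the library:

```python
import uuid

def session_name_for(task_id: str) -> str:
    """Derive a stable, collision-resistant session name from a task id,
    so re-running the same task reuses the same isolated browser context."""
    return f"task-{uuid.uuid5(uuid.NAMESPACE_URL, task_id)}"
```

Usage: `client.parse_url(url, session_name=session_name_for("crawl-site-a"))`.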

Configuration

# Production configuration
client = SpiderMCPClient(
    api_key="your-api-key",
    base_url="https://your-spider-mcp-server.com",
    timeout=60,           # Longer timeout for complex pages
    max_retries=5,        # More retries for reliability
    rate_limit_delay=2.0  # Slower rate for respectful scraping
)

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

Made with ❤️ by the Spider MCP Team


Download files

Download the file for your platform.

Source Distribution

spider_mcp_client-0.1.7.tar.gz (18.8 kB)

  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.5

Hashes for spider_mcp_client-0.1.7.tar.gz

  • SHA256: d642445a0555beb5bf62ab4ecc61f5fe2d2343667fa03310e5903d774751664d
  • MD5: 9f16c0bea97523ebe6e5fc666ace7d1e
  • BLAKE2b-256: 785f67d384bce66ee02604c0d5bdd638b8bc7639bdf0437465bde599bc6e3df3

Built Distribution

spider_mcp_client-0.1.7-py3-none-any.whl (9.9 kB)

  • Tags: Python 3

Hashes for spider_mcp_client-0.1.7-py3-none-any.whl

  • SHA256: 8373b07cc2280e9372d7e0a3231f36023f5f9e7c75b24f3c00e317fc8a2e2d94
  • MD5: cce0119ac22adb6609aa3af58d057edb
  • BLAKE2b-256: be77bef5e943bd318953e1beffdb526adf62bb3889672c599cae6295409b1614
