Spider MCP Client

Official Python client for Spider MCP - a professional web scraping API with advanced anti-detection capabilities.

🚀 Quick Start

Installation

pip install spider-mcp-client

Basic Usage

from spider_mcp_client import SpiderMCPClient

# Initialize client
client = SpiderMCPClient(
    api_key="your-api-key-here",
    base_url="http://localhost:8003"  # Your Spider MCP server
)

# Parse a URL
result = client.parse_url("https://example.com/article")

print(f"Status: {result['status']}")
print(f"Title: {result['html_data'].get('title', 'N/A')}")
print(f"Parser: {result['status_detail']['parser_used']}")
print(f"API Calls: {len(result['api_calls'])}")
print(f"Images: {len(result['downloaded_images'])}")

📋 Features

  • Simple API - One method to parse any supported URL
  • Built-in retry logic - Automatic retries with exponential backoff
  • Rate limiting - Respectful delays between requests
  • Error handling - Clear exceptions for different error types
  • Image support - Optional image download and localization
  • Session isolation - Multiple isolated browser sessions
  • Type hints - Full typing support for better IDE experience
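
The retry bullet above can be pictured with a small sketch of exponential backoff. This is illustrative only: `backoff_delays`, `base`, `retries`, and `cap` are hypothetical names, and the client's internal retry schedule may differ.

```python
# Sketch of exponential backoff: wait base * 2**attempt seconds before
# each retry, capped so repeated failures never produce huge sleeps.
def backoff_delays(base=1.0, retries=3, cap=30.0):
    """Return the list of delays (in seconds) before each retry."""
    return [min(base * (2 ** attempt), cap) for attempt in range(retries)]

print(backoff_delays())  # → [1.0, 2.0, 4.0]
```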

🔧 API Reference

SpiderMCPClient

client = SpiderMCPClient(
    api_key="your-api-key",           # Required: Your API key
    base_url="http://localhost:8003", # Spider MCP server URL
    timeout=30,                       # Request timeout (seconds)
    max_retries=3,                    # Max retry attempts
    rate_limit_delay=1.0             # Delay between requests (seconds)
)
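
In real deployments you may prefer not to hard-code the key. A minimal sketch of loading these settings from environment variables; the `SPIDER_MCP_*` variable names and the `load_config` helper are suggestions for your own code, not names the library reads itself.

```python
import os

def load_config(env=None):
    """Build SpiderMCPClient keyword arguments from environment variables."""
    env = os.environ if env is None else env
    return {
        "api_key": env.get("SPIDER_MCP_API_KEY", ""),
        "base_url": env.get("SPIDER_MCP_BASE_URL", "http://localhost:8003"),
        "timeout": int(env.get("SPIDER_MCP_TIMEOUT", "30")),
    }

print(load_config(env={}))
```

You can then construct the client with `SpiderMCPClient(**load_config())`.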

parse_url()

result = client.parse_url(
    url="https://example.com/article",  # Required: URL to parse
    download_images=False,              # Optional: Download images
    session_name="my-session",          # Optional: Session name
    retry=1                             # Optional: Retry attempts (default: 1)
)

Returns:

{
    "status": "success",
    "url": "https://example.com/article",
    "html_data": {
        "type": "article",
        "title": "Article Title",
        "content": "Full article content...",
        "author": "Author Name",
        "publish_date": "2025-01-17"
    },
    "api_calls": [...],  # Captured API calls
    "downloaded_images": [...],  # Downloaded images
    "status_detail": {
        "parser_used": "example.com - article_parser",
        "parser_id": 123,
        "success": True
    }
}
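
Since several of these fields are optional, it helps to read the result defensively. A small helper sketched against the documented shape; the helper name and the sample dict below are ours, not part of the library.

```python
def summarize_result(result):
    """Collapse a parse_url result into a flat summary,
    tolerating missing keys via .get() with defaults."""
    html_data = result.get("html_data", {})
    detail = result.get("status_detail", {})
    return {
        "ok": result.get("status") == "success",
        "title": html_data.get("title", "N/A"),
        "parser": detail.get("parser_used", "unknown"),
        "api_calls": len(result.get("api_calls", [])),
        "images": len(result.get("downloaded_images", [])),
    }

# Sample mirroring the documented return shape:
sample = {
    "status": "success",
    "html_data": {"type": "article", "title": "Article Title"},
    "api_calls": [],
    "downloaded_images": [],
    "status_detail": {"parser_used": "example.com - article_parser", "success": True},
}
print(summarize_result(sample)["title"])  # → Article Title
```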

📖 Examples

Basic Article Parsing

from spider_mcp_client import SpiderMCPClient

client = SpiderMCPClient(api_key="sk-1234567890abcdef")

# Parse a news article
result = client.parse_url("https://techcrunch.com/2025/01/17/ai-news")

if result['status'] == 'success':
    html_data = result['html_data']
    print(f"📰 {html_data.get('title', 'N/A')}")
    print(f"✍️  {html_data.get('author', 'Unknown')}")
    print(f"📅 {html_data.get('publish_date', 'Unknown')}")
    print(f"🔧 Parser: {result['status_detail']['parser_used']}")

With Image Download

# Parse with image download
result = client.parse_url(
    url="https://news-site.com/photo-story",
    download_images=True
)

if result['status'] == 'success':
    images = result['downloaded_images']
    print(f"Downloaded {len(images)} images:")
    for img_url in images:
        print(f"  🖼️  {img_url}")

Error Handling

from spider_mcp_client import (
    SpiderMCPClient,
    ParserNotFoundError,
    AuthenticationError
)

client = SpiderMCPClient(api_key="your-api-key")

try:
    result = client.parse_url("https://unsupported-site.com/article")
    if result['status'] == 'success':
        print(f"Success: {result['html_data'].get('title', 'N/A')}")
    else:
        print(f"Parse failed: {result['status_detail'].get('error', 'Unknown error')}")

except ParserNotFoundError:
    print("❌ No parser available for this website")

except AuthenticationError:
    print("❌ Invalid API key")

except Exception as e:
    print(f"❌ Error: {e}")

With Retry Logic

# Parse with automatic retries
result = client.parse_url(
    url="https://sometimes-slow-site.com/article",
    retry=3  # Will attempt up to 4 times (initial + 3 retries)
)

if result['status'] == 'success':
    print(f"✅ Success: {result['html_data'].get('title')}")
    print(f"🔧 Parser: {result['status_detail']['parser_used']}")
else:
    print(f"❌ Failed: {result['status_detail'].get('error')}")

API Calls and Images

# Parse a page that makes API calls and has images
result = client.parse_url(
    url="https://dynamic-site.com/article",
    download_images=True
)

if result['status'] == 'success':
    print(f"📰 Title: {result['html_data'].get('title')}")
    print(f"🌐 API calls captured: {len(result['api_calls'])}")
    print(f"🖼️  Images downloaded: {len(result['downloaded_images'])}")

    # Show captured API calls
    for api_call in result['api_calls']:
        print(f"  📡 {api_call['method']} {api_call['url']}")

Check Parser Availability

# Check if parser exists before parsing
parser_info = client.check_parser("https://target-site.com/article")

if parser_info.get('found'):
    print(f"✅ Parser available: {parser_info['parser']['site_name']}")
    result = client.parse_url("https://target-site.com/article")
    if result['status'] == 'success':
        print(f"📰 {result['html_data'].get('title')}")
else:
    print("❌ No parser found for this URL")

🚨 Exception Types

from spider_mcp_client import (
    SpiderMCPError,        # Base exception
    AuthenticationError,   # Invalid API key
    ParserNotFoundError,   # No parser for URL
    RateLimitError,        # Rate limit exceeded
    ServerError,           # Server error (5xx)
    TimeoutError,          # Request timeout
    ConnectionError        # Connection failed
)

🔑 Getting Your API Key

  1. Start Spider MCP server:

    # On your Spider MCP server
    ./restart.sh
    
  2. Visit admin interface:

    http://localhost:8003/admin/users
    
  3. Create/view user and copy API key

🌐 Server Requirements

This client requires a running Spider MCP server. The server provides:

  • Custom parsers for each website
  • Undetected ChromeDriver for Cloudflare bypass
  • Professional anti-detection capabilities
  • Image processing and localization
  • Session management and isolation

📚 Advanced Usage

Session Isolation

# Use session names for browser isolation
client = SpiderMCPClient(api_key="your-api-key")

# Each session gets its own browser context
result1 = client.parse_url(
    "https://site.com/page1",
    session_name="session-1"
)

result2 = client.parse_url(
    "https://site.com/page2",
    session_name="session-2"
)
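
Because sessions are isolated, independent pages can be fetched in parallel. A sketch using a thread pool; `fetch` stands in for `client.parse_url` here so the pattern can be shown without a live server, and `parse_many` is our own helper, not a library method.

```python
from concurrent.futures import ThreadPoolExecutor

def parse_many(urls, fetch, max_workers=4):
    """Fetch each URL in its own named session so browser state never mixes."""
    def task(pair):
        index, url = pair
        return fetch(url, session_name=f"session-{index}")
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(task, enumerate(urls)))

# Demo with a stand-in fetch; swap in client.parse_url for real use.
results = parse_many(
    ["https://site.com/page1", "https://site.com/page2"],
    fetch=lambda url, session_name: {"url": url, "session": session_name},
)
print(results[1]["session"])  # → session-1
```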

Configuration

# Production configuration
client = SpiderMCPClient(
    api_key="your-api-key",
    base_url="https://your-spider-mcp-server.com",
    timeout=60,           # Longer timeout for complex pages
    max_retries=5,        # More retries for reliability
    rate_limit_delay=2.0  # Slower rate for respectful scraping
)

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

Made with ❤️ by the Spider MCP Team

Download files

Download the file for your platform.

Source Distribution

spider_mcp_client-0.1.8.tar.gz (19.3 kB)

Uploaded Source

Built Distribution

spider_mcp_client-0.1.8-py3-none-any.whl (10.1 kB)

Uploaded Python 3

File details

Details for the file spider_mcp_client-0.1.8.tar.gz.

File metadata

  • Download URL: spider_mcp_client-0.1.8.tar.gz
  • Upload date:
  • Size: 19.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for spider_mcp_client-0.1.8.tar.gz:

  • SHA256: 3aa0662b2c399301bac8eafe25ae9479604f1fac9f8f87119f78fb01ef2cdfda
  • MD5: 18ce4e88624833a7efc652897b77c2d3
  • BLAKE2b-256: 431670e03d1925959e8a6883b34f1e8a7c6a6dbd07f843eb161c3989c84321b0

File details

Details for the file spider_mcp_client-0.1.8-py3-none-any.whl.

File metadata

File hashes

Hashes for spider_mcp_client-0.1.8-py3-none-any.whl:

  • SHA256: 2ef9d314dd2cf1c0bbcd1abf94d73fd70733e9fa3f077fc6c76dd54ef3dc54f8
  • MD5: 82e656c92b84e7fcd56e81445c5a58b4
  • BLAKE2b-256: ddb59143f69683384b7a662cffbc3f66682acdb372ef22b93b869ec5b6e93c25
