# Spider MCP Client
Official Python client for Spider MCP - a professional web scraping API with advanced anti-detection capabilities.
## 🚀 Quick Start

### Installation

```bash
pip install spider-mcp-client
```

### Basic Usage

```python
from spider_mcp_client import SpiderMCPClient

# Initialize client
client = SpiderMCPClient(
    api_key="your-api-key-here",
    base_url="http://localhost:8003"  # Your Spider MCP server
)

# Parse a URL
result = client.parse_url("https://example.com/article")

print(f"Status: {result['status']}")
print(f"Title: {result['html_data'].get('title', 'N/A')}")
print(f"Parser: {result['status_detail']['parser_used']}")
print(f"API Calls: {len(result['api_calls'])}")
print(f"Images: {len(result['downloaded_images'])}")
```
## 📋 Features
- ✅ Simple API - One method to parse any supported URL
- ✅ Built-in retry logic - Automatic retries with exponential backoff
- ✅ Rate limiting - Respectful delays between requests
- ✅ Error handling - Clear exceptions for different error types
- ✅ Image support - Optional image download and localization
- ✅ Session isolation - Multiple isolated browser sessions
- ✅ Type hints - Full typing support for better IDE experience
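The features above mention retries with exponential backoff. The client's exact schedule isn't documented in this README, but the conventional pattern it refers to looks like this (function name, base, and cap are illustrative, not the library's actual values):

```python
def backoff_delays(max_retries: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Illustrative exponential-backoff schedule: base * 2**attempt, capped.

    This sketches the usual pattern; the client's real schedule may differ.
    """
    return [min(cap, base * (2 ** attempt)) for attempt in range(max_retries)]

print(backoff_delays(4))  # [1.0, 2.0, 4.0, 8.0]
```

Each failed attempt doubles the wait, so transient server hiccups get progressively more breathing room without hammering the target.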
## 🔧 API Reference

### SpiderMCPClient

```python
client = SpiderMCPClient(
    api_key="your-api-key",            # Required: Your API key
    base_url="http://localhost:8003",  # Spider MCP server URL
    timeout=30,                        # Request timeout (seconds)
    max_retries=3,                     # Max retry attempts
    rate_limit_delay=1.0               # Delay between requests (seconds)
)
```
### parse_url()

```python
result = client.parse_url(
    url="https://example.com/article",  # Required: URL to parse
    download_images=False,              # Optional: Download images
    session_name="my-session",          # Optional: Session name
    retry=1                             # Optional: Retry attempts (default: 1)
)
```

Returns:

```python
{
    "status": "success",
    "url": "https://example.com/article",
    "html_data": {
        "type": "article",
        "title": "Article Title",
        "content": "Full article content...",
        "author": "Author Name",
        "publish_date": "2025-01-17"
    },
    "api_calls": [...],          # Captured API calls
    "downloaded_images": [...],  # Downloaded images
    "status_detail": {
        "parser_used": "example.com - article_parser",
        "parser_id": 123,
        "success": True
    }
}
```
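Which `html_data` fields are present depends on the parser that handled the URL, so it pays to read the result defensively. A small illustrative helper (the `summarize` function is my own, not part of the client; the sample dict mirrors the documented return shape):

```python
# Illustrative helper: summarize a parse_url() result without assuming
# every field exists, since html_data varies by parser.
def summarize(result: dict) -> str:
    html = result.get("html_data", {})
    detail = result.get("status_detail", {})
    return (f"{result.get('status', 'unknown')}: "
            f"{html.get('title', 'N/A')} "
            f"(parser={detail.get('parser_used', 'n/a')})")

# Sample data shaped like the documented return value above
sample = {
    "status": "success",
    "url": "https://example.com/article",
    "html_data": {"type": "article", "title": "Article Title"},
    "api_calls": [],
    "downloaded_images": [],
    "status_detail": {"parser_used": "example.com - article_parser",
                      "parser_id": 123, "success": True},
}

print(summarize(sample))  # success: Article Title (parser=example.com - article_parser)
```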
## 📖 Examples

### Basic Article Parsing

```python
from spider_mcp_client import SpiderMCPClient

client = SpiderMCPClient(api_key="sk-1234567890abcdef")

# Parse a news article
result = client.parse_url("https://techcrunch.com/2025/01/17/ai-news")

if result['status'] == 'success':
    html_data = result['html_data']
    print(f"📰 {html_data.get('title', 'N/A')}")
    print(f"✍️ {html_data.get('author', 'Unknown')}")
    print(f"📅 {html_data.get('publish_date', 'Unknown')}")
    print(f"🔧 Parser: {result['status_detail']['parser_used']}")
```
### With Image Download

```python
# Parse with image download
result = client.parse_url(
    url="https://news-site.com/photo-story",
    download_images=True
)

if result['status'] == 'success':
    images = result['downloaded_images']
    print(f"Downloaded {len(images)} images:")
    for img_url in images:
        print(f"  🖼️ {img_url}")
```
### Error Handling

```python
from spider_mcp_client import (
    SpiderMCPClient,
    ParserNotFoundError,
    AuthenticationError
)

client = SpiderMCPClient(api_key="your-api-key")

try:
    result = client.parse_url("https://unsupported-site.com/article")
    if result['status'] == 'success':
        print(f"Success: {result['html_data'].get('title', 'N/A')}")
    else:
        print(f"Parse failed: {result['status_detail'].get('error', 'Unknown error')}")
except ParserNotFoundError:
    print("❌ No parser available for this website")
except AuthenticationError:
    print("❌ Invalid API key")
except Exception as e:
    print(f"❌ Error: {e}")
```
### With Retry Logic

```python
# Parse with automatic retries
result = client.parse_url(
    url="https://sometimes-slow-site.com/article",
    retry=3  # Will attempt up to 4 times (initial + 3 retries)
)

if result['status'] == 'success':
    print(f"✅ Success: {result['html_data'].get('title')}")
    print(f"🔧 Parser: {result['status_detail']['parser_used']}")
else:
    print(f"❌ Failed: {result['status_detail'].get('error')}")
```
### API Calls and Images

```python
# Parse a page that makes API calls and has images
result = client.parse_url(
    url="https://dynamic-site.com/article",
    download_images=True
)

if result['status'] == 'success':
    print(f"📰 Title: {result['html_data'].get('title')}")
    print(f"🌐 API calls captured: {len(result['api_calls'])}")
    print(f"🖼️ Images downloaded: {len(result['downloaded_images'])}")

    # Show captured API calls
    for api_call in result['api_calls']:
        print(f"  📡 {api_call['method']} {api_call['url']}")
```
### Check Parser Availability

```python
# Check whether a parser exists before parsing
parser_info = client.check_parser("https://target-site.com/article")

if parser_info.get('found'):
    print(f"✅ Parser available: {parser_info['parser']['site_name']}")
    result = client.parse_url("https://target-site.com/article")
    if result['status'] == 'success':
        print(f"📰 {result['html_data'].get('title')}")
else:
    print("❌ No parser found for this URL")
```
## 🚨 Exception Types

```python
from spider_mcp_client import (
    SpiderMCPError,       # Base exception
    AuthenticationError,  # Invalid API key
    ParserNotFoundError,  # No parser for URL
    RateLimitError,       # Rate limit exceeded
    ServerError,          # Server error (5xx)
    TimeoutError,         # Request timeout
    ConnectionError       # Connection failed
)
```
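Since `SpiderMCPError` is documented as the base exception, a single `except SpiderMCPError` handler should catch every client-specific failure. A minimal sketch of that pattern, using stand-in classes rather than the library's actual definitions:

```python
# Stand-in hierarchy mirroring the names above; the real classes live in
# spider_mcp_client. This only illustrates the catch-the-base pattern.
class SpiderMCPError(Exception): ...
class AuthenticationError(SpiderMCPError): ...
class ParserNotFoundError(SpiderMCPError): ...
class RateLimitError(SpiderMCPError): ...

def classify(exc: Exception) -> str:
    try:
        raise exc
    except SpiderMCPError as e:  # catches every subclass above
        return f"client error: {type(e).__name__}"
    except Exception:
        return "unrelated error"

print(classify(RateLimitError()))  # client error: RateLimitError
print(classify(ValueError()))      # unrelated error
```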
## 🔑 Getting Your API Key

1. Start the Spider MCP server:

   ```bash
   # On your Spider MCP server
   ./restart.sh
   ```

2. Visit the admin interface: `http://localhost:8003/admin/users`
3. Create or view a user and copy its API key
## 🌐 Server Requirements
This client requires a running Spider MCP server. The server provides:
- ✅ Custom parsers for each website
- ✅ Undetected ChromeDriver for Cloudflare bypass
- ✅ Professional anti-detection capabilities
- ✅ Image processing and localization
- ✅ Session management and isolation
## 📚 Advanced Usage

### Session Isolation

```python
# Use session names for browser isolation
client = SpiderMCPClient(api_key="your-api-key")

# Each session gets its own browser context
result1 = client.parse_url(
    "https://site.com/page1",
    session_name="session-1"
)
result2 = client.parse_url(
    "https://site.com/page2",
    session_name="session-2"
)
```
### Configuration

```python
# Production configuration
client = SpiderMCPClient(
    api_key="your-api-key",
    base_url="https://your-spider-mcp-server.com",
    timeout=60,            # Longer timeout for complex pages
    max_retries=5,         # More retries for reliability
    rate_limit_delay=2.0   # Slower rate for respectful scraping
)
```
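In production you typically keep the API key out of source code. A hedged sketch of building the same configuration from environment variables (the variable names are my own; the client does not read them automatically, you pass the values in explicitly):

```python
import os

# Hypothetical variable names for this example only
os.environ.setdefault("SPIDER_MCP_API_KEY", "your-api-key")
os.environ.setdefault("SPIDER_MCP_BASE_URL", "https://your-spider-mcp-server.com")

config = {
    "api_key": os.environ["SPIDER_MCP_API_KEY"],
    "base_url": os.environ["SPIDER_MCP_BASE_URL"],
    "timeout": int(os.environ.get("SPIDER_MCP_TIMEOUT", "60")),
    "max_retries": int(os.environ.get("SPIDER_MCP_MAX_RETRIES", "5")),
    "rate_limit_delay": float(os.environ.get("SPIDER_MCP_RATE_DELAY", "2.0")),
}

print(config["timeout"])  # 60
# client = SpiderMCPClient(**config)
```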
## 🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## 📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
## 🔗 Links
- PyPI Package: https://pypi.org/project/spider-mcp-client/
- GitHub Repository: https://github.com/spider-mcp/spider-mcp-client
- Documentation: https://spider-mcp.readthedocs.io/
- Spider MCP Server: https://github.com/spider-mcp/spider-mcp
Made with ❤️ by the Spider MCP Team