A modern wrapper for CloudScraper with caching and rate limiting
CloudScraper Wrapper
A powerful Python wrapper for CloudScraper that adds proxy management, rate limiting, caching, and extensive error handling capabilities.
Features
- Proxy support with automatic rotation and health monitoring
- Intelligent rate limiting and request throttling
- Response caching with customizable expiration
- Async support for parallel requests
- Comprehensive error handling and retry logic
- Detailed request statistics and logging
- File download support with progress tracking
- Cookie management and persistence
- Custom header support
- SSL verification options
Installation
pip install cloudscraper-wrapper
Basic Usage
from cloudscraper_wrapper import CloudScraperWrapper
# Basic initialization
scraper = CloudScraperWrapper(
    delay_range=(2, 5),
    max_retries=3,
    timeout=30
)

# Simple GET request
response = scraper.get('https://example.com')
if response:
    print("Successfully retrieved page")
Advanced Usage
Proxy Configuration
# Single proxy
scraper = CloudScraperWrapper(
    proxy='http://user:pass@host:port'
)

# Proxy rotation
proxies = [
    'http://user1:pass1@host1:port1',
    'http://user2:pass2@host2:port2',
    'http://user3:pass3@host3:port3'
]
scraper = CloudScraperWrapper(
    proxy_rotation=True,
    proxy_list=proxies,
    max_retries=3
)
# Manage proxies
scraper.add_proxy_to_rotation('http://user4:pass4@host4:port4')
scraper.remove_proxy_from_rotation('http://user1:pass1@host1:port1')
scraper.set_proxy('http://newuser:newpass@newhost:port')
# Get proxy statistics
stats = scraper.get_proxy_stats()
print(f"Proxy performance: {stats}")
Async Batch Requests
# Multiple URLs in parallel
import asyncio

urls = [
    'https://example1.com',
    'https://example2.com',
    'https://example3.com'
]

async def main():
    responses = await scraper.batch_get(
        urls,
        max_concurrent=5
    )
    for url, response in zip(urls, responses):
        if response:
            print(f"Successfully retrieved {url}")

# Run with asyncio
asyncio.run(main())
File Downloads
# Download with progress bar
success = scraper.download_file(
    url='https://example.com/file.pdf',
    output_path='downloads/file.pdf',
    show_progress=True
)
Cache Management
# Custom cache settings
response = scraper.get(
    'https://example.com',
    use_cache=True,
    max_cache_age=7200  # Cache for 2 hours
)
# Clear cache
scraper.clear_cache() # Clear all cache
scraper.clear_cache(max_age=86400) # Clear cache older than 24 hours
Context Manager
with CloudScraperWrapper() as scraper:
    response = scraper.get('https://example.com')
# Resources automatically cleaned up after use
Configuration Options
Parameter | Type | Default | Description |
---|---|---|---|
delay_range | tuple | (2, 5) | Range for random delay between requests (seconds) |
proxy | str/dict | None | Single proxy configuration |
proxy_rotation | bool | False | Enable proxy rotation |
proxy_list | list | None | List of proxies for rotation |
max_retries | int | 3 | Maximum number of retry attempts |
timeout | int | 30 | Request timeout in seconds |
custom_headers | dict | None | Custom headers for requests |
cookie_file | str | None | Path to cookie file for persistence |
max_concurrent_requests | int | 10 | Maximum concurrent requests for batch operations |
log_level | str | 'INFO' | Logging level |
log_file | str | 'scraper.log' | Path to log file |
cache_enabled | bool | True | Enable response caching |
verify_ssl | bool | True | Verify SSL certificates |
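Several of these options are not shown in the examples above. The sketch below combines them, using only the parameter names listed in the table; the header values, cookie file name, and log file name are illustrative placeholders rather than confirmed defaults.
# Sketch combining options from the table above; header values, cookie file
# name, and log file name are placeholders, not confirmed API details.
scraper = CloudScraperWrapper(
    custom_headers={
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64)',
        'Accept-Language': 'en-US,en;q=0.9',
    },
    cookie_file='cookies.json',    # persist cookies between runs
    verify_ssl=False,              # skip certificate verification if needed
    log_level='DEBUG',
    log_file='debug_scraper.log',
    cache_enabled=False,           # disable response caching entirely
)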
Statistics and Monitoring
# Get request statistics
stats = scraper.get_stats()
print(f"""
Success Rate: {stats['successful_requests']/stats['requests_made']*100}%
Cache Hit Rate: {stats['cache_hits']/(stats['cache_hits'] + stats['cache_misses'])*100}%
Average Request Time: {stats['total_request_time']/stats['successful_requests']}s
""")
Error Handling
from cloudscraper_wrapper import RequestError, ProxyError, CacheError
try:
    response = scraper.get('https://example.com')
except RequestError as e:
    print(f"Request failed: {e}")
except ProxyError as e:
    print(f"Proxy error: {e}")
except CacheError as e:
    print(f"Cache error: {e}")
License
MIT License
Support
- GitHub Issues: Report a bug
- Documentation: Full documentation
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Disclaimer
This tool is for educational purposes only. Follow websites' terms of service and robots.txt rules when scraping.
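A simple way to honour robots.txt is the standard library's urllib.robotparser; the sketch below is independent of this package and only illustrates the check before fetching a page.
# Minimal robots.txt check using only the Python standard library;
# independent of cloudscraper-wrapper itself.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser('https://example.com/robots.txt')
rp.read()

url = 'https://example.com/some/page'
if rp.can_fetch('*', url):
    response = scraper.get(url)
else:
    print(f"robots.txt disallows fetching {url}")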