
A modern wrapper for CloudScraper with caching and rate limiting

Project description

CloudScraper Wrapper

A powerful Python wrapper for CloudScraper that adds proxy management, rate limiting, caching, and extensive error handling capabilities.

Features

  • Proxy support with automatic rotation and health monitoring
  • Intelligent rate limiting and request throttling
  • Response caching with customizable expiration
  • Async support for parallel requests
  • Comprehensive error handling and retry logic
  • Detailed request statistics and logging
  • File download support with progress tracking
  • Cookie management and persistence
  • Custom header support
  • SSL verification options

Installation

pip install cloudscraper-wrapper

Basic Usage

from cloudscraper_wrapper import CloudScraperWrapper

# Basic initialization
scraper = CloudScraperWrapper(
    delay_range=(2, 5),
    max_retries=3,
    timeout=30
)

# Simple GET request
response = scraper.get('https://example.com')
if response:
    print("Successfully retrieved page")

Advanced Usage

Proxy Configuration

# Single proxy
scraper = CloudScraperWrapper(
    proxy='http://user:pass@host:port'
)

# Proxy rotation
proxies = [
    'http://user1:pass1@host1:port1',
    'http://user2:pass2@host2:port2',
    'http://user3:pass3@host3:port3'
]

scraper = CloudScraperWrapper(
    proxy_rotation=True,
    proxy_list=proxies,
    max_retries=3
)

# Manage proxies
scraper.add_proxy_to_rotation('http://user4:pass4@host4:port4')
scraper.remove_proxy_from_rotation('http://user1:pass1@host1:port1')
scraper.set_proxy('http://newuser:newpass@newhost:port')

# Get proxy statistics
stats = scraper.get_proxy_stats()
print(f"Proxy performance: {stats}")

Async Batch Requests

import asyncio

# Fetch multiple URLs in parallel
urls = [
    'https://example1.com',
    'https://example2.com',
    'https://example3.com'
]

async def main():
    responses = await scraper.batch_get(
        urls,
        max_concurrent=5
    )
    for url, response in zip(urls, responses):
        if response:
            print(f"Successfully retrieved {url}")

# Run with asyncio
asyncio.run(main())

File Downloads

# Download with progress bar
success = scraper.download_file(
    url='https://example.com/file.pdf',
    output_path='downloads/file.pdf',
    show_progress=True
)

Cache Management

# Custom cache settings
response = scraper.get(
    'https://example.com',
    use_cache=True,
    max_cache_age=7200  # Cache for 2 hours
)

# Clear cache
scraper.clear_cache()  # Clear all cache
scraper.clear_cache(max_age=86400)  # Clear cache older than 24 hours

Context Manager

with CloudScraperWrapper() as scraper:
    response = scraper.get('https://example.com')
    # Resources automatically cleaned up after use

Configuration Options

Parameter               | Type     | Default       | Description
delay_range             | tuple    | (2, 5)        | Range for random delay between requests (seconds)
proxy                   | str/dict | None          | Single proxy configuration
proxy_rotation          | bool     | False         | Enable proxy rotation
proxy_list              | list     | None          | List of proxies for rotation
max_retries             | int      | 3             | Maximum number of retry attempts
timeout                 | int      | 30            | Request timeout in seconds
custom_headers          | dict     | None          | Custom headers for requests
cookie_file             | str      | None          | Path to cookie file for persistence
max_concurrent_requests | int      | 10            | Maximum concurrent requests for batch operations
log_level               | str      | 'INFO'        | Logging level
log_file                | str      | 'scraper.log' | Path to log file
cache_enabled           | bool     | True          | Enable response caching
verify_ssl              | bool     | True          | Verify SSL certificates
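
These options can be combined in a single constructor call. A minimal sketch using parameters from the table above; the header value, cookie path, and log settings are illustrative placeholders, not recommendations:

# Illustrative combination of the configuration options listed above.
# Values are placeholders, not recommended defaults.
scraper = CloudScraperWrapper(
    delay_range=(1, 3),
    custom_headers={'Accept-Language': 'en-US,en;q=0.9'},  # example header only
    cookie_file='cookies.json',          # persist cookies between sessions
    max_concurrent_requests=5,           # cap for batch operations
    log_level='DEBUG',
    log_file='scraper.log',
    cache_enabled=True,
    verify_ssl=True
)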

Statistics and Monitoring

# Get request statistics
stats = scraper.get_stats()
print(f"""
Success Rate: {stats['successful_requests']/stats['requests_made']*100}%
Cache Hit Rate: {stats['cache_hits']/(stats['cache_hits'] + stats['cache_misses'])*100}%
Average Request Time: {stats['total_request_time']/stats['successful_requests']}s
""")

Error Handling

from cloudscraper_wrapper import RequestError, ProxyError, CacheError

try:
    response = scraper.get('https://example.com')
except RequestError as e:
    print(f"Request failed: {e}")
except ProxyError as e:
    print(f"Proxy error: {e}")
except CacheError as e:
    print(f"Cache error: {e}")

License

MIT License

Support

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Disclaimer

This tool is for educational purposes only. Always respect websites' terms of service and robots.txt rules when scraping.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cloudscraper_wrapper-0.1.3.tar.gz (3.3 kB)

Uploaded Source

Built Distribution

cloudscraper_wrapper-0.1.3-py3-none-any.whl (3.4 kB)

Uploaded Python 3

File details

Details for the file cloudscraper_wrapper-0.1.3.tar.gz.

File metadata

  • Download URL: cloudscraper_wrapper-0.1.3.tar.gz
  • Upload date:
  • Size: 3.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.6

File hashes

Hashes for cloudscraper_wrapper-0.1.3.tar.gz
Algorithm   | Hash digest
SHA256      | 707dd6865125d12539befb1aaf89e7972dba144fb78727ab5546098a8bb340fc
MD5         | 3001fb335042a767e53459c2d4df1962
BLAKE2b-256 | 5d3b6f3399987d922f32cc7dab9bdeff121ff0fc1b5bd6e5b4d65869d4538ee1


File details

Details for the file cloudscraper_wrapper-0.1.3-py3-none-any.whl.

File metadata

File hashes

Hashes for cloudscraper_wrapper-0.1.3-py3-none-any.whl
Algorithm   | Hash digest
SHA256      | 34ffab57997354a19d1669df5d19cab40bc0b6ccbe5b2f270ccc969c3c0f74b8
MD5         | 212442eeadcba833042d123e6ad8d09b
BLAKE2b-256 | acda70414ca3aba9e1b7d07e5b7130ba21c6e971761cedd3e09e8736bf47e261

