
Project description

AskPablos Scrapy API

A professional Scrapy integration for seamlessly routing requests through AskPablos Proxy API with support for headless browser rendering and rotating IP addresses.

Features

  • 🔄 Selective Proxying: Only routes requests with askpablos_api_map in their meta
  • 🌐 Headless Browser Support: Render JavaScript-heavy pages
  • 🔄 Rotating Proxies: Access to a pool of rotating IP addresses
  • 🔒 Secure Authentication: HMAC-SHA256 request signing
  • 🔁 Automatic Retries: With exponential backoff
  • ⚠️ Comprehensive Error Handling: Detailed logging and error reporting
  • 🛡️ Rate Limiting: Built-in request rate limiting to avoid overloading the API
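The signing scheme is handled internally by the middleware, but as a rough illustration of how HMAC-SHA256 request signing generally works (the payload format and function name here are assumptions for the sketch, not the library's actual protocol):

```python
import hashlib
import hmac
import json

def sign_payload(secret_key: str, payload: dict) -> str:
    """Illustrative HMAC-SHA256 signature over a JSON payload."""
    # Serialize deterministically so client and server hash identical bytes.
    body = json.dumps(payload, separators=(",", ":"), sort_keys=True)
    return hmac.new(secret_key.encode(), body.encode(), hashlib.sha256).hexdigest()
```

The server recomputes the same digest with the shared secret and rejects any request whose signature does not match, so the secret key never travels over the wire.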

Installation

pip install askpablos-scrapy-api

Quick Start

  1. Configure your Scrapy project settings:
# In your settings.py
API_KEY = "your_api_key"  # Your AskPablos API key
SECRET_KEY = "your_secret_key"  # Your AskPablos secret key

# Optional settings
ASKPABLOS_TIMEOUT = 30  # Request timeout in seconds
ASKPABLOS_MAX_RETRIES = 2  # Maximum number of retries for failed requests
ASKPABLOS_RETRY_DELAY = 1.0  # Initial delay between retries in seconds

# Add the middleware
DOWNLOADER_MIDDLEWARES = {
    'askpablos_scrapy_api.middleware.AskPablosAPIDownloaderMiddleware': 950,  # Adjust priority as needed
}
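ASKPABLOS_MAX_RETRIES and ASKPABLOS_RETRY_DELAY drive the automatic retries: with the defaults above, a failed request is retried after roughly 1.0 s and then 2.0 s. A generic sketch of that exponential backoff pattern (not the middleware's actual internals):

```python
import time

def retry_with_backoff(func, max_retries=2, retry_delay=1.0):
    """Call func, retrying on failure with a delay that doubles each attempt."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries; surface the last error
            time.sleep(retry_delay * (2 ** attempt))
```

Raising ASKPABLOS_RETRY_DELAY spreads retries out further, which is gentler on the API during transient outages.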
  2. Or use custom settings in your spider:
class MySpider(scrapy.Spider):
    name = 'myspider'
    
    custom_settings = {
        "DOWNLOADER_MIDDLEWARES": {
            "askpablos_scrapy_api.middleware.AskPablosAPIDownloaderMiddleware": 543,
        },
        "API_KEY": "your-api-key-here",
        "SECRET_KEY": "your-secret-key-here",
        "ASKPABLOS_TIMEOUT": 30,
        "ASKPABLOS_MAX_RETRIES": 2,
        "ASKPABLOS_RETRY_DELAY": 1.0
    }
    
    # ...spider implementation...
  3. Use the middleware in your spider:
import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'
    
    def start_requests(self):
        urls = [
            'https://example.com',
            'https://api-intensive-site.com'
        ]
        
        # Regular Scrapy request - NOT using the API
        yield scrapy.Request(url=urls[0], callback=self.parse_regular)
        
        # Request using AskPablos API with a headless browser
        yield scrapy.Request(
            url=urls[1],
            callback=self.parse_api,
            meta={
                'askpablos_api_map': {
                    'browser': True,  # Use headless browser
                    'rotate_proxy': True,  # Use rotating proxies
                }
            }
        )
    
    def parse_regular(self, response):
        # Handle response from a direct request
        pass
        
    def parse_api(self, response):
        # Handle response from AskPablos API
        # (Response will be processed exactly like a normal Scrapy response)
        pass

Environment Variables

Instead of putting sensitive API keys in your settings file, you can use environment variables:

# Set these environment variables before running your spider
export ASKPABLOS_API_KEY="your_api_key"
export ASKPABLOS_SECRET_KEY="your_secret_key"
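Whether the middleware reads these variables automatically isn't stated here, so a safe approach is to pull them into your settings yourself (a minimal sketch for settings.py):

```python
# settings.py -- read credentials from the environment instead of hard-coding them
import os

API_KEY = os.environ.get("ASKPABLOS_API_KEY", "")
SECRET_KEY = os.environ.get("ASKPABLOS_SECRET_KEY", "")
```

This keeps secrets out of version control while still populating the API_KEY and SECRET_KEY settings the middleware expects.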

License

MIT License - See LICENSE file for details.

Project details


Download files

Download the file for your platform.

Source Distribution

askpablos_scrapy_api-0.2.0.tar.gz (11.3 kB)

Uploaded Source

Built Distribution


askpablos_scrapy_api-0.2.0-py3-none-any.whl (11.4 kB)

Uploaded Python 3

File details

Details for the file askpablos_scrapy_api-0.2.0.tar.gz.

File metadata

  • Download URL: askpablos_scrapy_api-0.2.0.tar.gz
  • Upload date:
  • Size: 11.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for askpablos_scrapy_api-0.2.0.tar.gz
Algorithm Hash digest
SHA256 818a0c4fb815bbeefaa204e8de4ec8d59ceb3470cba9c709056280506b073378
MD5 f1a1989a9cbe533c2e485ce26e157d83
BLAKE2b-256 39e9f148312e86a071374c20603782cc28b0f5acc1bcb0d62f9c4bd91032d5d1


File details

Details for the file askpablos_scrapy_api-0.2.0-py3-none-any.whl.

File metadata

File hashes

Hashes for askpablos_scrapy_api-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a3967a86e17e3da059c9eb555dff9a86db263647a1614e1e7d6f616ff3298937
MD5 5f7df45a992004f63a32f0111b54bdcd
BLAKE2b-256 e28fb3a61a609d142142ea4a31f68da3e4dafa949db2c7cff6bdf8a7e6e9b718

