
Project description

AskPablos Scrapy API

A professional Scrapy integration for seamlessly routing requests through the AskPablos Proxy API, with support for headless browser rendering and rotating IP addresses.

Features

  • 🔄 Selective Proxying: Only routes requests with askpablos_api_map in their meta
  • 🌐 Headless Browser Support: Render JavaScript-heavy pages
  • 🔄 Rotating Proxies: Access to a pool of rotating IP addresses
  • 🔒 Secure Authentication: HMAC-SHA256 request signing
  • 🔁 Automatic Retries: With exponential backoff
  • ⚠️ Comprehensive Error Handling: Detailed logging and error reporting
  • 🛡️ Rate Limiting: Built-in request rate limiting to avoid overloading the API
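The HMAC-SHA256 request signing mentioned above can be illustrated with Python's standard library. The actual payload layout and header names the AskPablos API expects are not documented here, so treat this as a generic sketch of the technique, not the API's real signing scheme:

```python
import hashlib
import hmac

def sign_request(secret_key: str, payload: bytes) -> str:
    """Return a hex HMAC-SHA256 digest of the payload (illustrative only)."""
    return hmac.new(secret_key.encode(), payload, hashlib.sha256).hexdigest()

signature = sign_request("your_secret_key", b'{"url": "https://example.com"}')
print(len(signature))  # SHA-256 digests are 32 bytes, i.e. 64 hex characters
```

The server recomputes the same digest with the shared secret and rejects requests whose signature does not match, which is what makes the authentication tamper-evident.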

Installation

pip install askpablos-scrapy-api

Quick Start

  1. Configure your Scrapy project settings:
# In your settings.py
API_KEY = "your_api_key"  # Your AskPablos API key
SECRET_KEY = "your_secret_key"  # Your AskPablos secret key

# Optional settings
ASKPABLOS_TIMEOUT = 30  # Request timeout in seconds
ASKPABLOS_MAX_RETRIES = 2  # Maximum number of retries for failed requests
ASKPABLOS_RETRY_DELAY = 1.0  # Initial delay between retries in seconds

# Add the middleware
DOWNLOADER_MIDDLEWARES = {
    'askpablos_scrapy_api.middleware.AskPablosAPIDownloaderMiddleware': 950,  # Adjust priority as needed
}
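The `ASKPABLOS_RETRY_DELAY` / `ASKPABLOS_MAX_RETRIES` pair suggests a standard exponential-backoff schedule: the initial delay doubles on each successive retry. A minimal sketch of how such a schedule could be computed (the package's actual retry logic may differ):

```python
def backoff_schedule(retry_delay: float, max_retries: int) -> list[float]:
    """Delays before each retry: retry_delay doubled on every attempt."""
    return [retry_delay * (2 ** attempt) for attempt in range(max_retries)]

print(backoff_schedule(1.0, 2))  # [1.0, 2.0]
```

With the defaults above, a failed request would be retried after roughly 1 s, then 2 s, before being given up.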
  2. Or use custom settings in your spider:
import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'
    
    custom_settings = {
        "DOWNLOADER_MIDDLEWARES": {
            "askpablos_scrapy_api.middleware.AskPablosAPIDownloaderMiddleware": 543,
        },
        "API_KEY": "your-api-key-here",
        "SECRET_KEY": "your-secret-key-here",
        "ASKPABLOS_TIMEOUT": 30,
        "ASKPABLOS_MAX_RETRIES": 2,
        "ASKPABLOS_RETRY_DELAY": 1.0
    }
    
    # ...spider implementation...
  3. Use the middleware in your spider:
import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'
    
    def start_requests(self):
        urls = [
            'https://example.com',
            'https://api-intensive-site.com'
        ]
        
        # Regular Scrapy request - NOT using the API
        yield scrapy.Request(url=urls[0], callback=self.parse_regular)
        
        # Request using AskPablos API with a headless browser
        yield scrapy.Request(
            url=urls[1],
            callback=self.parse_api,
            meta={
                'askpablos_api_map': {
                    'browser': True,  # Use headless browser
                    'rotate_proxy': True,  # Use rotating proxies
                }
            }
        )
    
    def parse_regular(self, response):
        # Handle response from a direct request
        pass
        
    def parse_api(self, response):
        # Handle response from AskPablos API
        # (Response will be processed exactly like a normal Scrapy response)
        pass
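The selective-proxying behavior from the Features list — only requests carrying `askpablos_api_map` in their meta are routed through the API — can be sketched as a downloader middleware that ignores everything else. This is a simplified, hypothetical illustration of the pattern, not the package's actual implementation:

```python
class SelectiveProxySketch:
    """Hypothetical middleware sketch: act only on opted-in requests."""

    def process_request(self, request, spider):
        options = request.meta.get("askpablos_api_map")
        if options is None:
            # No opt-in key: return None so Scrapy downloads the
            # request directly, bypassing the API entirely.
            return None
        # A real middleware would build a signed API call here,
        # honoring options such as 'browser' and 'rotate_proxy'.
        spider.logger.debug(
            "Routing %s via AskPablos (options=%r)", request.url, options
        )
        return None
```

Returning `None` from `process_request` is the standard Scrapy convention for "let the next middleware (and ultimately the downloader) handle this request".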

Environment Variables

Instead of putting sensitive API keys in your settings file, you can use environment variables:

# Set these environment variables before running your spider
export ASKPABLOS_API_KEY="your_api_key"
export ASKPABLOS_SECRET_KEY="your_secret_key"
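A common Scrapy pattern is to read those variables in settings.py with `os.environ`, so the keys never appear in version control (whether the middleware also reads the environment directly is not shown here):

```python
# In settings.py
import os

# Fall back to an empty string if the variable is unset; the
# variable names below match the export lines above.
API_KEY = os.environ.get("ASKPABLOS_API_KEY", "")
SECRET_KEY = os.environ.get("ASKPABLOS_SECRET_KEY", "")
```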

License

MIT License - See LICENSE file for details.

Download files

Download the file for your platform.

Source Distribution

askpablos_scrapy_api-0.1.3.tar.gz (11.4 kB)

Uploaded Source

Built Distribution


askpablos_scrapy_api-0.1.3-py3-none-any.whl (11.3 kB)

Uploaded Python 3

File details

Details for the file askpablos_scrapy_api-0.1.3.tar.gz.

File metadata

  • Download URL: askpablos_scrapy_api-0.1.3.tar.gz
  • Upload date:
  • Size: 11.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for askpablos_scrapy_api-0.1.3.tar.gz

  • SHA256: f29124ada29d623f16bad6fae3281fdbbf70d34e7187430103c0a71ef6b10880
  • MD5: cc813078b64c74b2e57171fbfa1a801b
  • BLAKE2b-256: 819032a68fd378ccb72aca8356732aa6aa393f6b495babd2b401bd0df70ef0dc


File details

Details for the file askpablos_scrapy_api-0.1.3-py3-none-any.whl.

File metadata

File hashes

Hashes for askpablos_scrapy_api-0.1.3-py3-none-any.whl

  • SHA256: 3a42724a43d32f46c742c8693a821456151fae45147c7ce6df2f9627f82e5b0c
  • MD5: 6951f099d3b1f4e7049d7b8280bb7c7f
  • BLAKE2b-256: acbe6aa11eb3851979a2a88ae031630604938612dc30847027b7af15bd49304e

