AskPablos Scrapy API
A professional Scrapy integration for seamlessly routing requests through AskPablos Proxy API with support for headless browser rendering and rotating IP addresses.
Features
- 🔄 Selective Proxying: Only routes requests with `askpablos_api_map` in their meta
- 🌐 Headless Browser Support: Render JavaScript-heavy pages
- 🔄 Rotating Proxies: Access to a pool of rotating IP addresses
- 🔒 Secure Authentication: HMAC-SHA256 request signing
- 🔁 Automatic Retries: With exponential backoff
- ⚠️ Comprehensive Error Handling: Detailed logging and error reporting
- 🛡️ Rate Limiting: Built-in request rate limiting to avoid overloading the API
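The HMAC-SHA256 signing mentioned above can be sketched as follows. This is an illustration only: the canonical payload serialization (sorted-key JSON) and the `sign_request` helper are assumptions, not the package's actual wire format.

```python
import hashlib
import hmac
import json

def sign_request(secret_key: str, payload: dict) -> str:
    """Compute an HMAC-SHA256 hex digest over a serialized payload.

    Sorted-key JSON is an assumed canonical form for illustration;
    the middleware's real serialization may differ.
    """
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hmac.new(secret_key.encode(), body.encode(), hashlib.sha256).hexdigest()

signature = sign_request("your_secret_key", {"url": "https://example.com", "browser": True})
print(len(signature))  # SHA-256 hex digest is always 64 characters
```

Because the signature is keyed, the server can verify both the integrity of the request body and that the sender holds the secret key.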
Installation
```shell
pip install askpablos-scrapy-api
```
Quick Start
- Configure your Scrapy project settings:

```python
# In your settings.py
API_KEY = "your_api_key"        # Your AskPablos API key
SECRET_KEY = "your_secret_key"  # Your AskPablos secret key

# Optional settings
ASKPABLOS_TIMEOUT = 30       # Request timeout in seconds
ASKPABLOS_MAX_RETRIES = 2    # Maximum number of retries for failed requests
ASKPABLOS_RETRY_DELAY = 1.0  # Initial delay between retries in seconds

# Add the middleware
DOWNLOADER_MIDDLEWARES = {
    'askpablos_scrapy_api.middleware.AskPablosAPIDownloaderMiddleware': 950,  # Adjust priority as needed
}
```
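The retry settings combine in the usual exponential-backoff pattern: the initial delay grows with each failed attempt. A minimal sketch of the resulting delay schedule (the doubling factor is an assumption; the package may use a different multiplier):

```python
def retry_delays(max_retries: int, initial_delay: float) -> list[float]:
    """Delay before each retry attempt, doubling each time (assumed factor of 2)."""
    return [initial_delay * (2 ** attempt) for attempt in range(max_retries)]

print(retry_delays(2, 1.0))  # [1.0, 2.0]
```

With the defaults shown above (`ASKPABLOS_MAX_RETRIES = 2`, `ASKPABLOS_RETRY_DELAY = 1.0`), a failing request would be retried after roughly 1 s and then 2 s before giving up.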
- Or use custom settings in your spider:

```python
class MySpider(scrapy.Spider):
    name = 'myspider'

    custom_settings = {
        "DOWNLOADER_MIDDLEWARES": {
            "askpablos_scrapy_api.middleware.AskPablosAPIDownloaderMiddleware": 543,
        },
        "API_KEY": "your-api-key-here",
        "SECRET_KEY": "your-secret-key-here",
        "ASKPABLOS_TIMEOUT": 30,
        "ASKPABLOS_MAX_RETRIES": 2,
        "ASKPABLOS_RETRY_DELAY": 1.0,
    }

    # ...spider implementation...
```
- Use the middleware in your spider:

```python
import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'

    def start_requests(self):
        urls = [
            'https://example.com',
            'https://api-intensive-site.com',
        ]

        # Regular Scrapy request - NOT using the API
        yield scrapy.Request(url=urls[0], callback=self.parse_regular)

        # Request using AskPablos API with a headless browser
        yield scrapy.Request(
            url=urls[1],
            callback=self.parse_api,
            meta={
                'askpablos_api_map': {
                    'browser': True,       # Use headless browser
                    'rotate_proxy': True,  # Use rotating proxies
                }
            },
        )

    def parse_regular(self, response):
        # Handle response from a direct request
        pass

    def parse_api(self, response):
        # Handle response from the AskPablos API
        # (processed exactly like a normal Scrapy response)
        pass
```
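Selective proxying means the middleware only intercepts requests whose meta carries `askpablos_api_map`; everything else passes through Scrapy's normal download path. Conceptually the gate looks like this (a sketch, not the package's actual code):

```python
def should_proxy(meta: dict) -> bool:
    """Route through the API only when askpablos_api_map is present and non-empty."""
    return bool(meta.get("askpablos_api_map"))

print(should_proxy({"askpablos_api_map": {"browser": True}}))  # True
print(should_proxy({}))                                        # False
```

This opt-in design lets cheap pages hit the target site directly while reserving the proxy (and headless-browser rendering) for the requests that need it.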
Environment Variables
Instead of putting sensitive API keys in your settings file, you can use environment variables:
```shell
# Set these environment variables before running your spider
export ASKPABLOS_API_KEY="your_api_key"
export ASKPABLOS_SECRET_KEY="your_secret_key"
```
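In `settings.py` you can then read the keys from the environment with a standard `os.environ` lookup. This is a common pattern rather than anything the package mandates; the middleware itself consumes the `API_KEY` and `SECRET_KEY` settings shown earlier:

```python
# settings.py
import os

# Fall back to empty strings so misconfiguration fails loudly at the API,
# not with a KeyError at import time.
API_KEY = os.environ.get("ASKPABLOS_API_KEY", "")
SECRET_KEY = os.environ.get("ASKPABLOS_SECRET_KEY", "")
```

This keeps credentials out of version control while leaving the rest of the configuration in the settings file.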
License
MIT License - See LICENSE file for details.
File details
Details for the file askpablos_scrapy_api-0.1.3.tar.gz.
File metadata
- Download URL: askpablos_scrapy_api-0.1.3.tar.gz
- Upload date:
- Size: 11.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `f29124ada29d623f16bad6fae3281fdbbf70d34e7187430103c0a71ef6b10880` |
| MD5 | `cc813078b64c74b2e57171fbfa1a801b` |
| BLAKE2b-256 | `819032a68fd378ccb72aca8356732aa6aa393f6b495babd2b401bd0df70ef0dc` |
File details
Details for the file askpablos_scrapy_api-0.1.3-py3-none-any.whl.
File metadata
- Download URL: askpablos_scrapy_api-0.1.3-py3-none-any.whl
- Upload date:
- Size: 11.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `3a42724a43d32f46c742c8693a821456151fae45147c7ce6df2f9627f82e5b0c` |
| MD5 | `6951f099d3b1f4e7049d7b8280bb7c7f` |
| BLAKE2b-256 | `acbe6aa11eb3851979a2a88ae031630604938612dc30847027b7af15bd49304e` |