Define a different download_delay for each domain.
Scrapy-Domain-Delay
Install
$ pip install scrapy-domain-delay
Usage
Step 1: Extract only the domain name from a URL using the Python package tldextract.
>>> import tldextract
>>> tldextract.extract('https://www.google.com/').domain
'google'
Step 2: Use the following config values in your Scrapy settings:
- Enable the AutoThrottle extension.
    AUTOTHROTTLE_ENABLED = True
- Enable the Custom Delay Throttle by adding it to EXTENSIONS.
    EXTENSIONS = {
        'scrapy.extensions.throttle.AutoThrottle': None,
        'scrapy_domain_delay.extensions.CustomDelayThrottle': 300,
    }
- Add {'domain': 'download delay (in seconds)'} entries to DOMAIN_DELAYS, something like:
    # set up custom delays per domain
    DOMAIN_DELAYS = {
        'google': 1.0,
    }
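The settings above imply a lookup from a request URL to a delay, keyed by the bare domain name from Step 1. The extension's internals are not shown here, so the following is only a minimal sketch of that idea, not the package's actual code. The names `naive_domain`, `delay_for`, and `DEFAULT_DELAY` are hypothetical, and `naive_domain` is a stdlib-only stand-in for `tldextract.extract(url).domain` that is wrong for multi-part suffixes such as `co.uk`.

```python
from urllib.parse import urlparse

# Only 'google': 1.0 comes from the README; DEFAULT_DELAY is an assumed fallback.
DOMAIN_DELAYS = {'google': 1.0}
DEFAULT_DELAY = 0.25


def naive_domain(url):
    """Rough stand-in for tldextract.extract(url).domain (hypothetical helper).

    Returns the hostname label just before the last dot, so it gives the
    wrong answer for multi-part suffixes such as 'co.uk'.
    """
    host = urlparse(url).hostname or ''
    labels = host.split('.')
    return labels[-2] if len(labels) >= 2 else host


def delay_for(url, domain_delays, default_delay, extract_domain=naive_domain):
    """Return the configured delay for the URL's domain, else the default."""
    return domain_delays.get(extract_domain(url), default_delay)


print(delay_for('https://www.google.com/search?q=scrapy', DOMAIN_DELAYS, DEFAULT_DELAY))
print(delay_for('https://example.org/', DOMAIN_DELAYS, DEFAULT_DELAY))
```

With tldextract installed, passing `extract_domain=lambda u: tldextract.extract(u).domain` would make the keys match Step 1 exactly, including for hosts such as `forums.bbc.co.uk`.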
Source Distribution
Hashes for scrapy-domain-delay-0.0.2.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 7e88e317f089a4fcaf9826ff5ef86e0894abec56ceac356e50cf247f6f904681
MD5 | 5e856df57a6765e31fb0f8bab40882d8
BLAKE2b-256 | affb8180a9f128925279d0102e77afb5ae8d9eec564051eaf41ac2476b3c37e4