This package lets you set a different delay for each website when using the Scrapy framework.
Scrapy-Domain-Delay
Scrapy-Domain-Delay is a package that lets you set a different download delay for each website when crawling with the Scrapy framework.
Install
$ pip install scrapy-domain-delay
Usage
Step 1: Extract the domain name from a full URL using the Python package tldextract.
>>> import tldextract
>>> tldextract.extract('https://www.google.com/').domain
'google'
In this example, we extract "google" as the domain name from the full URL "https://www.google.com/".
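To show why Step 1 relies on tldextract rather than plain string splitting, here is a minimal standard-library sketch. The helper `naive_domain` is hypothetical and is not part of this package or of tldextract; it simply takes the second-to-last label of the hostname, which works for "www.google.com" but breaks on multi-part suffixes such as "co.uk".

```python
from urllib.parse import urlparse


def naive_domain(url: str) -> str:
    """Hypothetical helper: return the second-to-last label of the hostname."""
    host = urlparse(url).hostname or ""
    labels = host.split(".")
    # With fewer than two labels (e.g. "localhost") fall back to the whole host.
    return labels[-2] if len(labels) >= 2 else host


print(naive_domain("https://www.google.com/"))  # google
print(naive_domain("https://www.bbc.co.uk/"))   # co (the registered name is "bbc")
```

tldextract consults the Public Suffix List, so it treats "co.uk" as a suffix and reports "bbc" as the domain for the second URL. That value is what you would use as a key in your per-domain settings.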
Step 2: Add the following configuration values to your Scrapy settings:
- Enable the AutoThrottle extension.
AUTOTHROTTLE_ENABLED = True
- Enable the Custom Delay Throttle by adding it to EXTENSIONS.
EXTENSIONS = {
    'scrapy.extensions.throttle.AutoThrottle': None,
    'scrapy_domain_delay.extensions.CustomDelayThrottle': 300,
}
- Add {'domain': 'download delay (in seconds)'} entries to DOMAIN_DELAYS, for example:
# set up custom delays per domain
DOMAIN_DELAYS = {
    'google': 1.0,
    'github': 0.5,
}
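The keys of DOMAIN_DELAYS are the domain names produced in Step 1, and the values are delays in seconds. The sketch below illustrates that mapping as a plain dictionary lookup. It is not the extension's actual implementation: the helper `delay_for` and the constant `FALLBACK_DELAY` are hypothetical, and how the package treats domains that are not listed is not stated here, so the fallback is an assumption made only for this example.

```python
from typing import Mapping

# Same shape as the DOMAIN_DELAYS setting: domain key -> delay in seconds.
DOMAIN_DELAYS = {"google": 1.0, "github": 0.5}

# Assumption for this sketch only: delay applied to domains that are not listed.
FALLBACK_DELAY = 2.0


def delay_for(domain_key: str, delays: Mapping[str, float], fallback: float) -> float:
    """Hypothetical helper: look up the delay configured for a domain key."""
    return delays.get(domain_key, fallback)


print(delay_for("github", DOMAIN_DELAYS, FALLBACK_DELAY))   # 0.5
print(delay_for("example", DOMAIN_DELAYS, FALLBACK_DELAY))  # 2.0
```

Because the key is only the domain part, a URL on a subdomain (for instance an API host under the same registered name) would resolve to the same key and therefore the same delay.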
Hashes for scrapy-domain-delay-0.0.4.tar.gz
Algorithm | Hash digest
---|---
SHA256 | b99d196b967fea5216053d289af74c6cbdd38e9e717cd0958da607f924c10357
MD5 | 87a10a71ba456a7e086f280555e4e67c
BLAKE2b-256 | d7bdb975b43c6bcd48ae2db3078a9248328760bd994204144ead41797a778aea