This package lets you set a different download delay for each website when using the Scrapy framework.
Scrapy-Domain-Delay
Scrapy-Domain-Delay is a package for the Scrapy framework that lets you set a different download delay for each website.
Install
$ pip install scrapy-domain-delay
Usage
Step 1: Extract the domain name from a full URL using the Python package tldextract.
>>> import tldextract
>>> tldextract.extract('https://www.google.com/').domain
'google'
In this example, "google" is extracted as the domain name from the full URL "https://www.google.com/".
Step 2: Use the following config values in your Scrapy settings:
- Enable the AutoThrottle extension:
  AUTOTHROTTLE_ENABLED = True
- Enable the Custom Delay Throttle by adding it to EXTENSIONS:
  EXTENSIONS = {
      'scrapy.extensions.throttle.AutoThrottle': None,
      'scrapy_domain_delay.extensions.CustomDelayThrottle': 300,
  }
- Add {'domain': download delay (in seconds)} entries to DOMAIN_DELAYS, for example:
  # set up custom delays per domain
  DOMAIN_DELAYS = {
      'google': 1.0,
      'github': 0.5,
  }
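To illustrate how a DOMAIN_DELAYS mapping like the one above could be resolved for a given request URL, here is a minimal sketch. It is not the package's actual implementation: domain_key and delay_for are hypothetical helper names, and the fallback of 0.25 seconds is an assumed value. To stay within the standard library, domain_key is a simplified stand-in for tldextract that takes the second-to-last label of the hostname, so it would mishandle multi-part suffixes such as .co.uk.

```python
from urllib.parse import urlsplit

# Same shape as the DOMAIN_DELAYS setting: domain name -> delay in seconds.
DOMAIN_DELAYS = {
    'google': 1.0,
    'github': 0.5,
}

DEFAULT_DELAY = 0.25  # assumed fallback, standing in for a global download delay


def domain_key(url):
    """Hypothetical stand-in for tldextract.extract(url).domain.

    Takes the second-to-last label of the hostname, which is only
    correct for single-label suffixes such as .com or .org.
    """
    host = urlsplit(url).hostname or ''
    labels = host.split('.')
    return labels[-2] if len(labels) >= 2 else host


def delay_for(url, delays=DOMAIN_DELAYS, default=DEFAULT_DELAY):
    """Return the configured delay for the URL's domain, else the default."""
    return delays.get(domain_key(url), default)


print(delay_for('https://www.google.com/search?q=scrapy'))  # 1.0
print(delay_for('https://github.com/scrapy/scrapy'))         # 0.5
print(delay_for('https://example.org/'))                     # 0.25
```

Because the lookup key is the bare domain name, every subdomain of a site (www.google.com, mail.google.com) shares one delay entry in this sketch.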
Source Distribution
File details
Details for the file scrapy-domain-delay-0.0.4.tar.gz.
File metadata
- Download URL: scrapy-domain-delay-0.0.4.tar.gz
- Size: 2.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.7.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b99d196b967fea5216053d289af74c6cbdd38e9e717cd0958da607f924c10357 |
| MD5 | 87a10a71ba456a7e086f280555e4e67c |
| BLAKE2b-256 | d7bdb975b43c6bcd48ae2db3078a9248328760bd994204144ead41797a778aea |