AJAX utilities for Scrapy.
Project description
scrapy_ajax_utils
Utilities for handling AJAX-driven pages in Scrapy projects. Includes support for Selenium and Splash.
Usage
For selenium
Define the Selenium webdriver name, the executable path, and whether to run headless in settings.py:
# Default: chrome
# Only chrome and firefox are supported.
SELENIUM_DRIVER_NAME = 'chrome'
# Default: None
SELENIUM_DRIVER_PATH = None
# Default: True
SELENIUM_HEADLESS = True
# Default: 30
SELENIUM_DRIVER_PAGE_LOAD_TIMEOUT = 30
# Set the min/max number of concurrent drivers used to download pages.
# Default: 3
# SELENIUM_MIN_DRIVERS = 5
# Default: 5
# SELENIUM_MAX_DRIVERS = 10
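The settings above can be sanity-checked before a crawl starts. The following is a minimal sketch, not part of scrapy_ajax_utils: the helper name `check_selenium_settings` is hypothetical, and it only encodes the defaults and constraints stated in the comments above (chrome or firefox, timeout 30, min 3, max 5). Whether the package performs similar checks itself, or treats driver names case-insensitively, is not stated here.

```python
# Hypothetical helper, not provided by scrapy_ajax_utils.
SUPPORTED_DRIVERS = {"chrome", "firefox"}


def check_selenium_settings(settings):
    """Return a list of problems found in a dict of SELENIUM_* settings."""
    problems = []

    # Defaults mirror the values documented in the settings comments.
    name = settings.get("SELENIUM_DRIVER_NAME", "chrome")
    if name not in SUPPORTED_DRIVERS:
        problems.append(f"unsupported driver: {name!r}")

    timeout = settings.get("SELENIUM_DRIVER_PAGE_LOAD_TIMEOUT", 30)
    if timeout <= 0:
        problems.append("page load timeout must be positive")

    min_drivers = settings.get("SELENIUM_MIN_DRIVERS", 3)
    max_drivers = settings.get("SELENIUM_MAX_DRIVERS", 5)
    if min_drivers < 1 or min_drivers > max_drivers:
        problems.append(
            f"need 1 <= min drivers <= max drivers, got {min_drivers} and {max_drivers}"
        )

    return problems


if __name__ == "__main__":
    # An empty dict falls back to the documented defaults.
    print(check_selenium_settings({}))
    print(check_selenium_settings({"SELENIUM_DRIVER_NAME": "safari"}))
```

A check like this could run once at spider start-up so that a misconfigured driver name or an inverted min/max pair is reported before any browser is launched.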
Use in your spider:
import scrapy
from scrapy_ajax_utils import selenium_support, SeleniumRequest


@selenium_support
class MySpider(scrapy.Spider):
    start_urls = ['https://www.baidu.com']

    def start_requests(self):
        for url in self.start_urls:
            yield SeleniumRequest(url)
For splash
The default Splash URL is http://127.0.0.1:8050. If your Splash instance is not running at that address, do NOT use the splash_support decorator.
import scrapy
from scrapy_ajax_utils import splash_support, SplashRequest


@splash_support
class MySpider(scrapy.Spider):
    start_urls = ['https://www.baidu.com']

    def start_requests(self):
        for url in self.start_urls:
            yield SplashRequest(url)
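Since splash_support relies on Splash being available at the default URL, it can help to confirm that before starting a crawl. The sketch below is an assumption-laden illustration, not part of scrapy_ajax_utils: the function name `splash_is_reachable` is hypothetical, and it relies on the `/_ping` health-check endpoint of Splash's HTTP API, using only the Python standard library.

```python
import http.client
import urllib.error
import urllib.request

# Default address stated above; the helper below is hypothetical.
DEFAULT_SPLASH_URL = "http://127.0.0.1:8050"


def splash_is_reachable(base_url=DEFAULT_SPLASH_URL, timeout=3.0):
    """Return True if a GET to <base_url>/_ping answers with HTTP 200."""
    ping_url = base_url.rstrip("/") + "/_ping"
    try:
        with urllib.request.urlopen(ping_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx response.
        return False


if __name__ == "__main__":
    if splash_is_reachable():
        print("Splash answered at", DEFAULT_SPLASH_URL)
    else:
        print("No Splash instance at", DEFAULT_SPLASH_URL)
```

If the check returns False, the spider would need a different way of reaching Splash, since the decorator is only meant for the default address.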
File details
Details for the file scrapy_ajax_utils-0.1272.tar.gz.
File metadata
- Download URL: scrapy_ajax_utils-0.1272.tar.gz
- Upload date:
- Size: 5.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/39.0.1 requests-toolbelt/0.9.1 tqdm/4.28.1 CPython/3.7.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | aee826eedd2df4d71df13888cdc18d96f974f82d6a38639292b4153e11fb7d6e
MD5 | c3d1367e39949e936e283cb47735180c
BLAKE2b-256 | c458316df46b2f3aebd7349c39f02a8669f4f8be5dc976c0034396a6142988d7