Scrapy downloader middleware that uses Selenium to download a page's HTML source and lets you interact with the web driver in the request context, eventually returning an HtmlResponse to the spider.
scrapy-selenium-middleware
requirements
- This downloader middleware should be used inside an existing Scrapy project
- Install Firefox and geckodriver on the machine running this middleware
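Before running a crawl, it can help to confirm that both requirements are reachable. The following is a minimal sketch, not part of the middleware: it assumes the executables are named `firefox` and `geckodriver` and are expected on `PATH`, which may not hold on every platform (on macOS, for example, Firefox is usually installed as an application bundle). The helper name `find_missing_tools` is hypothetical.

```python
import shutil

# Executable names are assumptions; they differ between platforms and installs.
REQUIRED_TOOLS = ("firefox", "geckodriver")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the names from `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("Firefox and geckodriver were found on PATH.")
```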
pip
pip install scrapy-selenium-middleware
usage example
The middleware reads its configuration from the Scrapy project settings.
In your Scrapy project's settings.py file, add the following settings:
DOWNLOADER_MIDDLEWARES = {"scrapy_selenium_middleware.SeleniumDownloader":451}
CONCURRENT_REQUESTS = 1 # multiple concurrent browsers are not supported yet
SELENIUM_IS_HEADLESS = False
SELENIUM_PROXY = "http://user:password@my-proxy-server:port" # set to None to not use a proxy
SELENIUM_USER_AGENT = "User-Agent: Mozilla/5.0 (<system-information>) <platform> (<platform-details>) <extensions>"
SELENIUM_REQUEST_RECORD_SCOPE = ["api*"] # a list of regular expressions; incoming requests whose URL matches one of them are recorded
SELENIUM_FIREFOX_PROFILE_SETTINGS = {}
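Because the entries in SELENIUM_REQUEST_RECORD_SCOPE are regular expressions rather than glob patterns, a value such as "api*" means "ap" followed by zero or more "i" characters. The sketch below illustrates that with Python's `re` module. It assumes the patterns are applied with `re.search` against the full URL; how the middleware actually applies them is not stated here, and the helper `url_in_scope` and the example URLs are hypothetical.

```python
import re

# Same value as in the settings example above.
RECORD_SCOPE = ["api*"]


def url_in_scope(url, patterns=RECORD_SCOPE):
    """Return True if any pattern is found anywhere in the URL.

    Assumption: patterns are applied with re.search, not re.match.
    """
    return any(re.search(pattern, url) for pattern in patterns)


if __name__ == "__main__":
    for url in (
        "https://example.com/api/items",
        "https://example.com/apps",
        "https://example.com/index.html",
    ):
        print(url, url_in_scope(url))
```

Under this assumption, "api*" also matches URLs that merely contain "ap", such as "/apps". A narrower pattern like "/api/" may be closer to what is intended when only API calls should be recorded.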
Hashes for scrapy_selenium_middleware-0.0.3.tar.gz
Algorithm | Hash digest
---|---
SHA256 | aac5ea62651f56f9726e0f3fdcb4c27857707f8d25f084de7ed155872872594e
MD5 | fa24f0e798da31915bc83528cbce8b5b
BLAKE2b-256 | 2651ef81b11c8b819d4fe5bdfdfac021961a0e37a268ab999d15193b93934768
Hashes for scrapy_selenium_middleware-0.0.3-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 737124f317fbe286712ac7e5e33db1bcf01c7e8829392ebbe5363449a6c3207b
MD5 | ac32044b977bb493b76cc381f696ecca
BLAKE2b-256 | 6aec1bed5e01223e80fb2b4c8d4e5a88ee0e0d497850006664282b929fbe95c4
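A downloaded file can be checked against the SHA256 digest listed for it. The following is a minimal sketch using Python's standard `hashlib`; it assumes the 0.0.3 wheel was saved under its original file name in the current directory, and the helper names `sha256_of_file` and `matches_expected` are hypothetical.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed for the 0.0.3 wheel.
EXPECTED_SHA256 = "737124f317fbe286712ac7e5e33db1bcf01c7e8829392ebbe5363449a6c3207b"
# Assumed download location: the current working directory.
WHEEL_PATH = Path("scrapy_selenium_middleware-0.0.3-py3-none-any.whl")


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's digest with an expected hex string, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    if WHEEL_PATH.exists():
        print("SHA256 matches:", matches_expected(WHEEL_PATH, EXPECTED_SHA256))
    else:
        print("File not found:", WHEEL_PATH)
```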