Scrapy downloader middleware that fetches a page's HTML source with Selenium, lets you interact with the web driver in the request context, and eventually returns an HtmlResponse to the spider.
Project description
scrapy-selenium-middleware
requirements
- This downloader middleware should be used inside an existing Scrapy project
- Install Firefox and geckodriver on the machine running this middleware
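One quick way to confirm both prerequisites are reachable before starting a crawl is to look them up on the PATH. The following is a minimal sketch using only the standard library. The helper name `missing_tools` is hypothetical, and the executable names `firefox` and `geckodriver` are assumptions that may differ by platform (a Firefox installation that is not exposed on the PATH will be reported as missing even though it exists).

```python
import shutil


def missing_tools(names=("firefox", "geckodriver")):
    """Return the executables from `names` that are not found on the PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("Firefox and geckodriver were both found.")
```

An empty list means both executables were located; anything else names what still needs to be installed or added to the PATH.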
pip
pip install scrapy-selenium-middleware
usage example
For a full Scrapy project demo, please go here.
The middleware reads its configuration from the Scrapy project settings.
Add the following to your project's settings.py file:
DOWNLOADER_MIDDLEWARES = {"scrapy_selenium_middleware.SeleniumDownloader": 451}
CONCURRENT_REQUESTS = 1 # multiple concurrent browsers are not supported yet
SELENIUM_IS_HEADLESS = False
SELENIUM_PROXY = "http://user:password@my-proxy-server:port" # set to None to not use a proxy
SELENIUM_USER_AGENT = "Mozilla/5.0 (<system-information>) <platform> (<platform-details>) <extensions>" # the user agent string itself, without a "User-Agent:" header prefix
SELENIUM_REQUEST_RECORD_SCOPE = ["api*"] # a list of regular expressions; requests whose URL matches one of them are recorded
SELENIUM_FIREFOX_PROFILE_SETTINGS = {}
SELENIUM_PAGE_LOAD_TIMEOUT = 120
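Because SELENIUM_REQUEST_RECORD_SCOPE holds regular expressions rather than shell-style wildcards, it can help to try a pattern against a few URLs before relying on it. The sketch below is illustrative only: the helper `in_scope` is hypothetical, the example URLs are made up, and whether the middleware applies patterns with search semantics (match anywhere in the URL) or match semantics (anchored at the start of the URL) is an assumption not confirmed by this page, so both are shown.

```python
import re


def in_scope(url, patterns, anchored=False):
    """Return True if `url` matches any regular expression in `patterns`.

    anchored=False uses re.search (match anywhere in the URL);
    anchored=True uses re.match (match must begin at the start of the URL).
    Which of the two the middleware uses is an assumption here.
    """
    matcher = re.match if anchored else re.search
    return any(matcher(pattern, url) is not None for pattern in patterns)


scope = ["api*"]  # as a regex: "a", "p", then zero or more "i"

print(in_scope("https://example.com/api/items", scope))   # True
print(in_scope("https://example.com/map.png", scope))     # True, "map" contains "ap"
print(in_scope("https://example.com/index.html", scope))  # False

# With anchored matching, "api*" never matches a full URL; a pattern that
# covers the whole prefix is needed instead.
print(in_scope("https://example.com/api/items", scope, anchored=True))          # False
print(in_scope("https://example.com/api/items", [r".*/api/"], anchored=True))   # True
```

If only API calls should be recorded, a more specific pattern such as `.*/api/.*` avoids accidental hits on unrelated URLs that merely contain "ap".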
File details
Details for the file scrapy_selenium_middleware-0.0.5.tar.gz.
File metadata
- Download URL: scrapy_selenium_middleware-0.0.5.tar.gz
- Size: 4.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.24.0 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.8.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8c34f20dd918a908a6633645804ce634b27d2ac30498e93dfd9e045d7b74c514
MD5 | 992130834dcc3047443540ca1d6df132
BLAKE2b-256 | 9709e28b6b1fa43897ad6154bc3ee9e951112a20ef06b0a270517fbbbc0f781a
File details
Details for the file scrapy_selenium_middleware-0.0.5-py3-none-any.whl.
File metadata
- Download URL: scrapy_selenium_middleware-0.0.5-py3-none-any.whl
- Size: 5.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.24.0 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.8.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | d02f2b675713d24ae52a2b1d8b13ff00bb9f012240d698579066b861e122d610
MD5 | 66a7b0213239b2ece05beb8ae58c0439
BLAKE2b-256 | d9016e59b04386302011e87bcc33c68e7b2eb0510f04c699bd0377df8e974987