Scrapy middleware that downloads a page's HTML source with Selenium and lets you interact with the web driver in the request context, eventually returning an HtmlResponse to the spider.
scrapy-selenium-middleware
requirements
- This downloader middleware should be used inside an existing Scrapy project
- Install Firefox and geckodriver on the machine running this middleware
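Before starting a crawl, it can help to confirm that both executables are discoverable. The following is a minimal sketch using only the Python standard library. The helper name `find_browser_binaries` is hypothetical and not part of scrapy-selenium-middleware, and the sketch assumes the executables are named `firefox` and `geckodriver` and live on PATH (on some systems Firefox is installed elsewhere or under a different name).

```python
import shutil


def find_browser_binaries(names=("firefox", "geckodriver")):
    """Map each executable name to its absolute path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_browser_binaries().items():
        # shutil.which returns None when the executable cannot be found
        status = path if path else "NOT FOUND on PATH"
        print(f"{name}: {status}")
```

A `None` value only means the executable was not found on PATH; it may still be installed in a location the driver is configured to use.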
pip
pip install scrapy-selenium-middleware
usage example
For a full Scrapy project demo, please go here.
The middleware receives its settings from the Scrapy project settings.
In your Scrapy project's settings.py file, add the following settings:
DOWNLOADER_MIDDLEWARES = {"scrapy_selenium_middleware.SeleniumDownloader": 451}
CONCURRENT_REQUESTS = 1  # multiple concurrent browsers are not supported yet
SELENIUM_IS_HEADLESS = False
SELENIUM_PROXY = "http://user:password@my-proxy-server:port"  # set to None to not use a proxy
SELENIUM_USER_AGENT = "User-Agent: Mozilla/5.0 (<system-information>) <platform> (<platform-details>) <extensions>"
SELENIUM_REQUEST_RECORD_SCOPE = ["api*"]  # list of regular expressions; incoming requests whose URL matches one are recorded
SELENIUM_FIREFOX_PROFILE_SETTINGS = {}
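To see which URLs a pattern in SELENIUM_REQUEST_RECORD_SCOPE would select, the patterns can be tried against sample URLs with Python's `re` module. This is only a sketch: the helper `urls_in_scope` and the example URLs are hypothetical, and it assumes patterns are applied with search semantics (`re.search`). The middleware's actual matching logic may differ.

```python
import re


def urls_in_scope(patterns, urls):
    """Return the URLs matched by at least one pattern (re.search semantics assumed)."""
    compiled = [re.compile(pattern) for pattern in patterns]
    return [url for url in urls if any(rx.search(url) for rx in compiled)]


if __name__ == "__main__":
    # Hypothetical URLs, used only to illustrate the matching behaviour
    sample_urls = [
        "https://example.com/api/items",
        "https://example.com/static/logo.png",
        "https://example.com/apps",
    ]
    for url in urls_in_scope(["api*"], sample_urls):
        print(url)
```

Read as a regular expression rather than a glob, `api*` means "ap" followed by zero or more "i" characters, so under this assumption it also selects the `/apps` URL. A narrower pattern such as `/api/` would select only the first URL.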
Hashes for scrapy_selenium_middleware-0.0.4.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 05ec2e12a4ac14d2aae208450854612a3b26b8fb3ff6d6861c4c90660d7660d2
MD5 | a259d7891b76c225622907ee0e4ee9c8
BLAKE2b-256 | 751e74c52c281dc68da9c65270dea263a5249f75039b475407eb5e21a804fb87
Hashes for scrapy_selenium_middleware-0.0.4-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | f5fd17914a9155bad79428bfa478aeaa000096e0c87ca64691e17352beb3f4d9
MD5 | 82948aad03e0610e7df7f7b9a2d61580
BLAKE2b-256 | 7a4eb047ffd61c0cd5ee6af2efda0597cf555f6c84bbfa31eb2af682347533c5