scrapy-selenium-middleware

A Scrapy downloader middleware that fetches a page's HTML source with Selenium, lets you interact with the web driver in the request context, and eventually returns an HtmlResponse to the spider.

requirements

  • This downloader middleware should be used inside an existing Scrapy project
  • Install Firefox and geckodriver on the machine running this middleware (see the quick check below)
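
A quick way to confirm the geckodriver requirement is met is to check that the binary is discoverable on PATH. This is only a convenience sketch, not part of the middleware:

import shutil

# Sanity check: geckodriver must be on PATH for Selenium's Firefox driver to start.
driver_path = shutil.which("geckodriver")
if driver_path is None:
    raise SystemExit("geckodriver not found on PATH; install it before running the spider")
print(f"geckodriver found at {driver_path}")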

pip

  • pip install scrapy-selenium-middleware

usage example

For a full Scrapy project demo, go here.

The middleware receives its settings from the Scrapy project settings.
In your Scrapy project's settings.py file, add the following settings:

DOWNLOADER_MIDDLEWARES = {"scrapy_selenium_middleware.SeleniumDownloader": 451}
CONCURRENT_REQUESTS = 1  # multiple concurrent browsers are not supported yet
SELENIUM_IS_HEADLESS = False
SELENIUM_PROXY = "http://user:password@my-proxy-server:port"  # set to None to disable the proxy
SELENIUM_USER_AGENT = "User-Agent: Mozilla/5.0 (<system-information>) <platform> (<platform-details>) <extensions>"
SELENIUM_REQUEST_RECORD_SCOPE = ["api*"]  # a list of regular expressions; incoming requests whose URLs match are recorded
SELENIUM_FIREFOX_PROFILE_SETTINGS = {}
SELENIUM_PAGE_LOAD_TIMEOUT = 120
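
With the middleware enabled, spiders issue requests as usual and receive the Selenium-rendered page as an HtmlResponse. A minimal sketch (the spider name, start URL, and selector are placeholders, not part of this package):

import scrapy

class DemoSpider(scrapy.Spider):
    # Hypothetical spider: the name, start URL, and selector are illustrative only.
    name = "demo"
    start_urls = ["https://example.com"]

    def parse(self, response):
        # The middleware hands back the rendered page as an HtmlResponse,
        # so normal Scrapy selectors work on the rendered HTML.
        yield {"title": response.css("title::text").get()}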
