
Scrapy middleware that downloads a page's HTML source using Selenium and lets you interact with the web driver in the request context, eventually returning an HtmlResponse to the spider.

Project description

scrapy-selenium-middleware

requirements

  • This downloader middleware should be used inside an existing Scrapy project
  • Install Firefox and geckodriver on the machine running this middleware
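
The second requirement can be checked before starting a crawl. The following is a minimal sketch, assuming the executables are named `firefox` and `geckodriver` and are expected to be discoverable on PATH (names and locations may differ on your system). `missing_executables` is a hypothetical helper written for this example; it is not part of scrapy-selenium-middleware.

```python
import shutil


def missing_executables(names=("firefox", "geckodriver")):
    """Return the names from `names` that cannot be found on PATH."""
    # shutil.which returns None when no matching executable is found.
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_executables()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("Firefox and geckodriver were both found on PATH.")
```

An empty list means both executables were located; anything else names what still needs to be installed or added to PATH.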

pip

  • pip install scrapy-selenium-middleware

usage example

For a full Scrapy project demo, please go here.

The middleware reads its settings from the Scrapy project settings.
In your Scrapy project's settings.py file, add the following settings:

DOWNLOADER_MIDDLEWARES = {"scrapy_selenium_middleware.SeleniumDownloader":451}
CONCURRENT_REQUESTS = 1 # multiple concurrent browsers are not supported yet
SELENIUM_IS_HEADLESS = False
SELENIUM_PROXY = "http://user:password@my-proxy-server:port" # set to None to disable the proxy
SELENIUM_USER_AGENT = "Mozilla/5.0 (<system-information>) <platform> (<platform-details>) <extensions>" # the user-agent string itself, without a "User-Agent:" header prefix
SELENIUM_REQUEST_RECORD_SCOPE = ["api*"] # a list of regular expressions; requests whose URL matches one of them are recorded
SELENIUM_FIREFOX_PROFILE_SETTINGS = {}
SELENIUM_PAGE_LOAD_TIMEOUT = 120
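
Because SELENIUM_REQUEST_RECORD_SCOPE holds regular expressions rather than glob patterns, it can help to try a pattern against sample URLs before a crawl. The sketch below assumes the patterns are applied with search semantics (a match anywhere in the URL); the middleware's actual matching rules are not documented here and may differ. `matching_urls` and the example.com URLs are hypothetical and exist only for illustration.

```python
import re


def matching_urls(patterns, urls):
    """Return the URLs that match at least one regular expression.

    Assumption: a pattern may match anywhere in the URL (re.search).
    """
    compiled = [re.compile(pattern) for pattern in patterns]
    return [url for url in urls if any(c.search(url) for c in compiled)]


if __name__ == "__main__":
    sample_urls = [
        "https://example.com/api/items",
        "https://example.com/apple.png",
        "https://example.com/static/logo.svg",
    ]
    # As a regex, "api*" means "ap" followed by zero or more "i" characters,
    # so under search semantics it also matches the "apple.png" URL.
    print(matching_urls(["api*"], sample_urls))
    # A stricter pattern keeps only URLs with an "/api/" path segment.
    print(matching_urls([r"/api/"], sample_urls))
```

If only API calls should be recorded, a more specific expression such as the second one is likely closer to the intent than "api*".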


Download files


Source Distribution

scrapy_selenium_middleware-0.0.5.tar.gz (4.0 kB view details)

Uploaded Source

Built Distribution

File details

Details for the file scrapy_selenium_middleware-0.0.5.tar.gz.

File metadata

  • Download URL: scrapy_selenium_middleware-0.0.5.tar.gz
  • Upload date:
  • Size: 4.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.24.0 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.8.6

File hashes

Hashes for scrapy_selenium_middleware-0.0.5.tar.gz
Algorithm Hash digest
SHA256 8c34f20dd918a908a6633645804ce634b27d2ac30498e93dfd9e045d7b74c514
MD5 992130834dcc3047443540ca1d6df132
BLAKE2b-256 9709e28b6b1fa43897ad6154bc3ee9e951112a20ef06b0a270517fbbbc0f781a


File details

Details for the file scrapy_selenium_middleware-0.0.5-py3-none-any.whl.

File metadata

  • Download URL: scrapy_selenium_middleware-0.0.5-py3-none-any.whl
  • Upload date:
  • Size: 5.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.24.0 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.8.6

File hashes

Hashes for scrapy_selenium_middleware-0.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 d02f2b675713d24ae52a2b1d8b13ff00bb9f012240d698579066b861e122d610
MD5 66a7b0213239b2ece05beb8ae58c0439
BLAKE2b-256 d9016e59b04386302011e87bcc33c68e7b2eb0510f04c699bd0377df8e974987

