A middleware to change the User-Agent of requests in Scrapy
Overview
Scrapy is a great framework for web crawling. This downloader middleware provides User-Agent rotation based on the settings in settings.py, the spider, or the request.
Requirements
Tested on Python 2.7 and Python 3.5; it should also work on other versions newer than Python 3.3.
Tested on Linux, but since it is a pure Python module, it should work on any other platform supported by official Python builds, e.g. Windows, Mac OS X, BSD.
Installation
The quick way:
pip install scrapy-useragents
Alternatively, place this middleware directly alongside your Scrapy project.
Documentation
In settings.py, for example:
```python
# -----------------------------------------------------------------------------
# USER AGENT
# -----------------------------------------------------------------------------
# update() assumes DOWNLOADER_MIDDLEWARES is already defined earlier in
# settings.py; if it is not, assign this dict to DOWNLOADER_MIDDLEWARES instead.
DOWNLOADER_MIDDLEWARES.update({
    # Disable Scrapy's built-in middleware, which sets one fixed User-Agent
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'scrapy_useragents.downloadermiddlewares.useragents.UserAgentsMiddleware': 500,
})

USER_AGENTS = [
    ('Mozilla/5.0 (X11; Linux x86_64) '
     'AppleWebKit/537.36 (KHTML, like Gecko) '
     'Chrome/57.0.2987.110 '
     'Safari/537.36'),  # chrome
    ('Mozilla/5.0 (X11; Linux x86_64) '
     'AppleWebKit/537.36 (KHTML, like Gecko) '
     'Chrome/61.0.3163.79 '
     'Safari/537.36'),  # chrome
    ('Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:55.0) '
     'Gecko/20100101 '
     'Firefox/55.0'),  # firefox
]
```
Settings Reference
USER_AGENTS
A list of User-Agent strings to use when crawling, unless overridden.
The middleware rotates through this list using the cycle function from the itertools module.
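For illustration, here is a minimal sketch of how a downloader middleware can rotate User-Agents with itertools.cycle, assuming the list is read from the USER_AGENTS setting. This is not the package's actual source: the class name RotatingUserAgentMiddleware is hypothetical. It uses only Scrapy's standard downloader-middleware hooks (from_crawler and process_request) and does not import Scrapy itself, so it can be tried with simple stand-in objects.

```python
from itertools import cycle


class RotatingUserAgentMiddleware(object):
    """Hypothetical sketch: rotate the User-Agent header round-robin."""

    def __init__(self, user_agents):
        if not user_agents:
            raise ValueError('USER_AGENTS must contain at least one entry')
        # cycle() repeats the list endlessly: ua1, ua2, ua3, ua1, ...
        self._user_agents = cycle(user_agents)

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook to build the middleware from the settings
        return cls(crawler.settings.getlist('USER_AGENTS'))

    def process_request(self, request, spider):
        # Overwrite the header with the next User-Agent in the rotation
        request.headers['User-Agent'] = next(self._user_agents)
        # Returning None lets Scrapy continue processing the request
        return None
```

Because cycle() is a deterministic round-robin, consecutive requests see the User-Agents in list order; random.choice would be an alternative if an unpredictable order is preferred.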
Be careful: this middleware cannot handle the case where COOKIES_ENABLED is True and the website binds its cookies to the User-Agent; this may make the spider behave unpredictably. This problem will be solved in a future release.
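One possible workaround, sketched here as an assumption rather than a feature of this package, is to pin one User-Agent to each cookie session. Scrapy's built-in CookiesMiddleware keeps separate sessions keyed by the cookiejar request meta key (requests without that key share a single default jar), so a hypothetical CookieBoundUserAgentMiddleware can remember which User-Agent it handed to each jar and reuse it on later requests.

```python
from itertools import cycle


class CookieBoundUserAgentMiddleware(object):
    """Hypothetical sketch: keep one User-Agent per Scrapy cookie jar."""

    def __init__(self, user_agents):
        if not user_agents:
            raise ValueError('USER_AGENTS must contain at least one entry')
        self._user_agents = cycle(user_agents)
        # Maps a cookie-jar key to the User-Agent pinned to that jar
        self._pinned = {}

    @classmethod
    def from_crawler(cls, crawler):
        return cls(crawler.settings.getlist('USER_AGENTS'))

    def process_request(self, request, spider):
        # CookiesMiddleware selects the jar from meta['cookiejar'];
        # requests without that key share the default jar (key None).
        jar = request.meta.get('cookiejar')
        if jar not in self._pinned:
            # First request for this jar: take the next User-Agent and pin it
            self._pinned[jar] = next(self._user_agents)
        request.headers['User-Agent'] = self._pinned[jar]
        return None
```

The trade-off is that rotation only happens across cookie jars: a spider that never sets meta['cookiejar'] sends the same User-Agent on every request, which keeps it consistent with its cookies.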
TODO
Read User-Agents from a backend, e.g. MongoDB, MySQL, or even a file saved on the local disk.
Rotate User-Agents in step with cookies, keeping the two consistent.
Add a meta key for selecting the User-Agent of each individual request.
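Until backend support exists, the local-file case of the first TODO item can already be approximated in settings.py, which is ordinary Python. A minimal sketch; the helper load_user_agents and the file name user_agents.txt are hypothetical, and a relative path is resolved against the current working directory when Scrapy starts.

```python
import io


def load_user_agents(path):
    """Hypothetical helper: read one User-Agent per line from a text file.

    Blank lines and lines starting with '#' are skipped.
    """
    user_agents = []
    with io.open(path, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith('#'):
                user_agents.append(line)
    return user_agents


# In settings.py (the file name is an assumption):
# USER_AGENTS = load_user_agents('user_agents.txt')
```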
Download files
File details
Details for the file Scrapy-UserAgents-0.0.1.tar.gz.
File metadata
- Download URL: Scrapy-UserAgents-0.0.1.tar.gz
- Size: 4.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | caa6d5b3bdbddcd79678caad3bae5d5cd0f3a96144807acf491925795e75c44e
MD5 | a9d0d5de20b134d5e29db718a0274c54
BLAKE2b-256 | d753f83dd78f44ad6310aec870f50b216d56f938478b4bdb9886c86aff81bfc4
File details
Details for the file Scrapy_UserAgents-0.0.1-py2.py3-none-any.whl.
File metadata
- Download URL: Scrapy_UserAgents-0.0.1-py2.py3-none-any.whl
- Size: 5.8 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 316ef88068aa5107591e97c9d04a75effc2600914ac7198bbed52a1c65a4434c
MD5 | 95d2296b49f8ee2345196a7a7918dbcf
BLAKE2b-256 | ee37efaea9801d3080facde05b79ece2fe65c0c2265a88ba5d1767432efe6ca9