Automatically pick a User-Agent for every request
Project description
The Random User-Agent middleware picks User-Agent strings based on Python User Agents and MDN.
Installation
The simplest way is to install it via pip:
pip install scrapy-user-agents
Configuration
Turn off the built-in UserAgentMiddleware and add RandomUserAgentMiddleware.
In Scrapy >=1.0:
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'scrapy_user_agents.middlewares.RandomUserAgentMiddleware': 400,
}
In Scrapy <1.0:
DOWNLOADER_MIDDLEWARES = {
    'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None,
    'scrapy_user_agents.middlewares.RandomUserAgentMiddleware': 400,
}
User-Agent File
A default User-Agent file is included in this repository. It contains about 2200 user agent strings collected from <https://developers.whatismybrowser.com/> using <https://github.com/hyan15/crawler-demo/tree/master/crawling-basic/common_user_agents>. You can supply your own User-Agent file by setting RANDOM_UA_FILE.
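As a rough sketch of supplying a custom file: the example below assumes the file holds one User-Agent string per line (the source does not state the layout, so check the bundled default file first). The helper write_user_agent_file and the file name custom_user_agents.txt are hypothetical and are not part of scrapy-user-agents.

```python
from pathlib import Path


def write_user_agent_file(path, user_agents):
    """Write non-empty User-Agent strings to `path`, one per line.

    The one-per-line layout is an assumption, not documented behaviour.
    Returns the number of strings written.
    """
    lines = [ua.strip() for ua in user_agents if ua.strip()]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


# In settings.py: point the middleware at the custom file.
RANDOM_UA_FILE = "custom_user_agents.txt"  # hypothetical file name
```

The path is resolved by whatever process runs the spider, so an absolute path may be safer than a relative one.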
Configuring User-Agent type
There’s a configuration parameter RANDOM_UA_TYPE in the format <device_type>.<browser_type>; the default is desktop.chrome. For the device_type part, only desktop, mobile, and tablet are supported. For the browser_type part, only chrome, firefox, safari, and ie are supported. If you don’t want to pin a single browser type, you can use random to choose from all browser types.
You can set RANDOM_UA_SAME_OS_FAMILY to True to use only user agents that belong to the same OS family, such as Windows, Mac OS, Linux, Android, or iOS. The default value is True.
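To illustrate the <device_type>.<browser_type> format, here is a small validation sketch. The helper parse_ua_type is hypothetical and is not the middleware's own parser. It also assumes that random is written in the browser_type position (for example desktop.random), which the description above suggests but does not spell out.

```python
# Values listed in the description above; "random" is assumed to be
# accepted in the browser_type position.
DEVICE_TYPES = {"desktop", "mobile", "tablet"}
BROWSER_TYPES = {"chrome", "firefox", "safari", "ie", "random"}


def parse_ua_type(value):
    """Split '<device_type>.<browser_type>' and validate both parts."""
    device, sep, browser = value.partition(".")
    if not sep or device not in DEVICE_TYPES or browser not in BROWSER_TYPES:
        raise ValueError("unsupported RANDOM_UA_TYPE: %r" % (value,))
    return device, browser


# In settings.py:
RANDOM_UA_TYPE = "desktop.chrome"   # documented default
RANDOM_UA_SAME_OS_FAMILY = True     # documented default
```

Running such a check on the setting before a crawl can catch a typo early, whatever the middleware itself does with an unsupported value.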
Usage with scrapy-proxies
To use it with random-proxy middlewares such as scrapy-proxies, you need to:
Set RANDOM_UA_PER_PROXY to True to allow switching the User-Agent per proxy.
Set the priority of RandomUserAgentMiddleware higher than that of scrapy-proxies, so that the proxy is set before the User-Agent is handled.
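The two steps above might look like this in settings.py. The scrapy-proxies middleware path scrapy_proxies.RandomProxy and the priorities 100 and 110 are assumptions based on that project's usual setup, so confirm them against its own documentation.

```python
# settings.py (sketch for Scrapy >= 1.0)
DOWNLOADER_MIDDLEWARES = {
    # Disable the built-in User-Agent middleware.
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": None,
    # Assumed scrapy-proxies entries; verify the path and priorities.
    "scrapy_proxies.RandomProxy": 100,
    "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware": 110,
    # Larger number than the proxy middleware, so the proxy is already
    # set on the request when the User-Agent is chosen.
    "scrapy_user_agents.middlewares.RandomUserAgentMiddleware": 400,
}

# Allow the User-Agent to switch per proxy.
RANDOM_UA_PER_PROXY = True
```

Scrapy calls process_request in increasing priority order, which is why the larger number runs after the proxy has been assigned.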
Configuring Fake-UserAgent fallback
There’s a configuration parameter FAKEUSERAGENT_FALLBACK, which defaults to None. You can set it to a string value, for example Mozilla or Your favorite browser. Setting it can completely disable any exceptions that would otherwise be raised.
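A minimal settings.py fragment for the fallback could read as follows. The value is only a placeholder taken from the example above; substitute the User-Agent string you actually want sent.

```python
# settings.py (sketch)
# Fallback string; "Mozilla" is a placeholder, not a recommended value.
FAKEUSERAGENT_FALLBACK = "Mozilla"
```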
Hashes for scrapy_user_agents-0.1.1.win-amd64.zip
Algorithm | Hash digest
---|---
SHA256 | aa1f78c8cbae42f1a7159c5ea16c2638ac17e78d7d44111d164ed099ec48705f
MD5 | 90ceaf139d9d9bad8a082413f5696e6f
BLAKE2b-256 | 8918dcf232312662f4242439691142ef58b90c59eb8bb196b9cc86fcbd8c6c08
Hashes for scrapy_user_agents-0.1.1-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 284c9af555f3128697a2953ab3cdb987b160b091a12896562d969cf9e81d1350
MD5 | 5c34d14eb5955e76ea21c42d781c8a30
BLAKE2b-256 | 501f58a58f465f6d3c75b6cca0e470613349504b8c69f3f3963c898ebabdfa21