Automatically pick a User-Agent for every request

Project description

The Random User-Agent middleware picks a User-Agent string for each request, based on Python User Agents and MDN.

Installation

The simplest way is to install it via pip:

pip install scrapy-user-agents

Configuration

Turn off the built-in UserAgentMiddleware and add RandomUserAgentMiddleware.

In Scrapy >=1.0:

DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'scrapy_user_agents.middlewares.RandomUserAgentMiddleware': 400,
}

In Scrapy <1.0:

DOWNLOADER_MIDDLEWARES = {
    'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None,
    'scrapy_user_agents.middlewares.RandomUserAgentMiddleware': 400,
}

User-Agent File

A default User-Agent file is included in this repository. It contains about 2,200 user agent strings collected from <https://developers.whatismybrowser.com/> using <https://github.com/hyan15/crawler-demo/tree/master/crawling-basic/common_user_agents>. You can supply your own User-Agent file by setting RANDOM_UA_FILE.
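As a sketch, pointing the middleware at a custom file might look like this in settings.py. The path my_user_agents.txt is hypothetical, and the file is assumed to contain one User-Agent string per line:

```python
# settings.py -- sketch; 'my_user_agents.txt' is a hypothetical path
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'scrapy_user_agents.middlewares.RandomUserAgentMiddleware': 400,
}

# Assumed file format: one User-Agent string per line
RANDOM_UA_FILE = 'my_user_agents.txt'
```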

Configuring User-Agent type

There’s a configuration parameter RANDOM_UA_TYPE in the format <device_type>.<browser_type>; the default is desktop.chrome. For the device_type part, only desktop, mobile, and tablet are supported. For the browser_type part, only chrome, firefox, safari, and ie are supported. If you don’t want to restrict requests to a single browser type, you can use random to choose from all browser types.

You can set RANDOM_UA_SAME_OS_FAMILY to True to use only user agents that belong to the same OS family, such as windows, mac os, linux, or android, ios, etc. The default value is True.
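For example, a settings.py fragment restricting picks to mobile Chrome user agents (a sketch, using values from the options listed above) might look like:

```python
# settings.py -- sketch of the type-related settings described above
RANDOM_UA_TYPE = 'mobile.chrome'   # <device_type>.<browser_type>
RANDOM_UA_SAME_OS_FAMILY = True    # stick to one OS family (the default)

# Split the value into its two parts, mirroring the documented format
device_type, browser_type = RANDOM_UA_TYPE.split('.')
```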

Usage with scrapy-proxies

To use this middleware together with a random-proxy middleware such as scrapy-proxies, you need to:

  1. set RANDOM_UA_PER_PROXY to True to allow switching the User-Agent per proxy;

  2. set the priority of RandomUserAgentMiddleware to be greater than that of scrapy-proxies, so that the proxy is set before the User-Agent is picked.
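The two steps above can be sketched as a combined settings.py fragment. This assumes scrapy-proxies is installed; the priority values 100 and 110 for its middlewares are taken from the scrapy-proxies README, not from this project:

```python
# settings.py -- sketch combining scrapy-proxies with this middleware
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    # scrapy-proxies runs first (lower priority number)...
    'scrapy_proxies.RandomProxy': 100,
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
    # ...so the proxy is already set when the User-Agent is picked
    'scrapy_user_agents.middlewares.RandomUserAgentMiddleware': 400,
}

RANDOM_UA_PER_PROXY = True  # switch the User-Agent per proxy
```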

Configuring Fake-UserAgent fallback

There’s a configuration parameter FAKEUSERAGENT_FALLBACK, which defaults to None. You can set it to a string value, for example Mozilla or Your favorite browser; with a fallback configured, the middleware never raises an exception when it fails to pick a User-Agent.
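For instance (a sketch; any string value works as the fallback):

```python
# settings.py -- sketch of the fallback setting
FAKEUSERAGENT_FALLBACK = 'Mozilla'  # used when no User-Agent can be picked
```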


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

scrapy_user_agents-0.1.1.win-amd64.zip (30.1 kB)


Built Distribution

scrapy_user_agents-0.1.1-py2.py3-none-any.whl (27.9 kB)


File details

Details for the file scrapy_user_agents-0.1.1.win-amd64.zip.

File metadata

  • Download URL: scrapy_user_agents-0.1.1.win-amd64.zip
  • Upload date:
  • Size: 30.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.20.0 setuptools/38.5.0 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.4.3

File hashes

Hashes for scrapy_user_agents-0.1.1.win-amd64.zip:

  • SHA256: aa1f78c8cbae42f1a7159c5ea16c2638ac17e78d7d44111d164ed099ec48705f
  • MD5: 90ceaf139d9d9bad8a082413f5696e6f
  • BLAKE2b-256: 8918dcf232312662f4242439691142ef58b90c59eb8bb196b9cc86fcbd8c6c08


File details

Details for the file scrapy_user_agents-0.1.1-py2.py3-none-any.whl.

File metadata

  • Download URL: scrapy_user_agents-0.1.1-py2.py3-none-any.whl
  • Upload date:
  • Size: 27.9 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.20.0 setuptools/38.5.0 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.4.3

File hashes

Hashes for scrapy_user_agents-0.1.1-py2.py3-none-any.whl:

  • SHA256: 284c9af555f3128697a2953ab3cdb987b160b091a12896562d969cf9e81d1350
  • MD5: 5c34d14eb5955e76ea21c42d781c8a30
  • BLAKE2b-256: 501f58a58f465f6d3c75b6cca0e470613349504b8c69f3f3963c898ebabdfa21

