scrapy-fake-useragent

Random User-Agent middleware for the Scrapy scraping framework, based on fake-useragent, which picks User-Agent strings according to usage statistics from a real-world database. As a backup, it can also be configured to generate fake UA strings with Faker.

You can also extend the middleware's capabilities by adding your own providers.

Changes

Please see CHANGELOG.

Installation

The simplest way is to install it via pip:

pip install scrapy-fake-useragent

Configuration

Turn off the built-in UserAgentMiddleware and RetryMiddleware, and add RandomUserAgentMiddleware and RetryUserAgentMiddleware in their place.

In Scrapy >=1.0:

DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': None,
    'scrapy_fake_useragent.middleware.RandomUserAgentMiddleware': 400,
    'scrapy_fake_useragent.middleware.RetryUserAgentMiddleware': 401,
}

In Scrapy <1.0:

DOWNLOADER_MIDDLEWARES = {
    'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None,
    'scrapy.contrib.downloadermiddleware.retry.RetryMiddleware': None,
    'scrapy_fake_useragent.middleware.RandomUserAgentMiddleware': 400,
    'scrapy_fake_useragent.middleware.RetryUserAgentMiddleware': 401,
}

Recommended setting (1.3.0+):

FAKEUSERAGENT_PROVIDERS = [
    'scrapy_fake_useragent.providers.FakeUserAgentProvider',  # this is the first provider we'll try
    'scrapy_fake_useragent.providers.FakerProvider',  # if FakeUserAgentProvider fails, we'll use faker to generate a user-agent string for us
    'scrapy_fake_useragent.providers.FixedUserAgentProvider',  # fall back to USER_AGENT value
]
USER_AGENT = '<your user agent string which you will fall back to if all other providers fail>'

Additional configuration information

Enabling providers

The package comes with a thin abstraction layer of User-Agent providers, which, for backwards compatibility, defaults to:

FAKEUSERAGENT_PROVIDERS = [
    'scrapy_fake_useragent.providers.FakeUserAgentProvider'
]

The package also ships with FakerProvider (powered by the Faker library) and FixedUserAgentProvider, available for use if needed.

Each provider is enabled individually and used in the order it is defined. If a provider fails to execute (which can happen to fake-useragent, for instance, because of its dependency on an online service), the next one in the list is used.

An example of what the FAKEUSERAGENT_PROVIDERS setting may look like in your case:

FAKEUSERAGENT_PROVIDERS = [
    'scrapy_fake_useragent.providers.FakeUserAgentProvider',
    'scrapy_fake_useragent.providers.FakerProvider',
    'scrapy_fake_useragent.providers.FixedUserAgentProvider',
    'mypackage.providers.CustomProvider'
]
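
The last entry above, mypackage.providers.CustomProvider, stands for your own provider. A provider is a class exposing the same interface as the bundled ones; below is a minimal sketch, assuming the BaseProvider base class and get_random_ua() contract used by the bundled providers, with CUSTOM_UA_POOL a made-up setting name for illustration:

import random

from scrapy_fake_useragent.providers import BaseProvider

class CustomProvider(BaseProvider):
    """Pick a User-Agent at random from a fixed pool defined in settings."""

    def __init__(self, settings):
        super().__init__(settings)
        # CUSTOM_UA_POOL is a hypothetical setting holding a list of UA strings.
        self.ua_pool = settings.getlist('CUSTOM_UA_POOL') or ['Mozilla/5.0']

    def get_random_ua(self):
        # Called by the middleware whenever a new User-Agent is needed.
        return random.choice(self.ua_pool)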

Configuring fake-useragent

Parameter: FAKE_USERAGENT_RANDOM_UA_TYPE, defaulting to random. Other example values:

  • firefox to mimic only Firefox browsers

  • desktop or mobile to send desktop or mobile User-Agent strings respectively.

You can also set the FAKEUSERAGENT_FALLBACK option, which is a fake-useragent specific fallback. For example:

FAKEUSERAGENT_FALLBACK = 'Mozilla/5.0 (Android; Mobile; rv:40.0)'

If the selected FAKE_USERAGENT_RANDOM_UA_TYPE fails to retrieve a UA, the string set in FAKEUSERAGENT_FALLBACK is used instead.
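
For instance, to mimic only Firefox browsers while keeping the fallback from the example above:

FAKE_USERAGENT_RANDOM_UA_TYPE = 'firefox'  # only Firefox User-Agents
FAKEUSERAGENT_FALLBACK = 'Mozilla/5.0 (Android; Mobile; rv:40.0)'  # used if the lookup fails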

Configuring faker

Parameter: FAKER_RANDOM_UA_TYPE, defaulting to user_agent, which selects completely random User-Agent values. Other example values (see the example after this list):

  • chrome

  • firefox
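
For example, to have FakerProvider generate Chrome-style strings:

FAKER_RANDOM_UA_TYPE = 'chrome'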

Configuring FixedUserAgent

The package also comes with a fixed provider, FixedUserAgentProvider, which provides a single user agent, reusing the value of Scrapy's standard USER_AGENT setting.
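
For example, a minimal configuration that pins every request to a single UA (the UA string below is a placeholder):

FAKEUSERAGENT_PROVIDERS = [
    'scrapy_fake_useragent.providers.FixedUserAgentProvider',
]
USER_AGENT = 'Mozilla/5.0 (compatible; MyCrawler/1.0)'  # placeholder UA string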

Usage with scrapy-proxies

To use this middleware together with a random-proxy middleware such as scrapy-proxies, you need to:

  1. Set RANDOM_UA_PER_PROXY to True to allow switching the User-Agent per proxy.

  2. Set the priority of RandomUserAgentMiddleware to be greater than that of scrapy-proxies, so that the proxy is set before the User-Agent is handled (see the sketch after this list).
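
A sketch of the combined settings, assuming scrapy-proxies' RandomProxy middleware at the priority of 100 suggested in its README (in DOWNLOADER_MIDDLEWARES, lower numbers process outgoing requests earlier):

DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': None,
    'scrapy_proxies.RandomProxy': 100,  # sets request.meta['proxy'] first
    'scrapy_fake_useragent.middleware.RandomUserAgentMiddleware': 400,  # then picks the UA
    'scrapy_fake_useragent.middleware.RetryUserAgentMiddleware': 401,
}
RANDOM_UA_PER_PROXY = True
PROXY_LIST = '/path/to/proxy/list.txt'  # scrapy-proxies setting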

License

The package is under the MIT license. Please see LICENSE.
