Use a random User-Agent provided by fake-useragent for every request

Project description

scrapy-fake-useragent

Random User-Agent middleware based on fake-useragent. It picks up User-Agent strings based on usage statistics from a real world database.

Installation

The simplest way is to install it via pip:

pip install scrapy-fake-useragent

Configuration

Disable Scrapy's built-in UserAgentMiddleware and RetryMiddleware, and enable RandomUserAgentMiddleware and RetryUserAgentMiddleware in their place.

In Scrapy >=1.0:

DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': None,
    'scrapy_fake_useragent.middleware.RandomUserAgentMiddleware': 400,
    'scrapy_fake_useragent.middleware.RetryUserAgentMiddleware': 401,
}

In Scrapy <1.0:

DOWNLOADER_MIDDLEWARES = {
    'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None,
    'scrapy.contrib.downloadermiddleware.retry.RetryMiddleware': None,
    'scrapy_fake_useragent.middleware.RandomUserAgentMiddleware': 400,
    'scrapy_fake_useragent.middleware.RetryUserAgentMiddleware': 401,
}

Configuring User-Agent type

There is a configuration parameter RANDOM_UA_TYPE, defaulting to random, which is passed verbatim to fake-useragent. You can therefore set it to, say, firefox to mimic only Firefox browsers. The most useful values are probably desktop and mobile, which send desktop or mobile User-Agent strings respectively.
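For instance, the setting can be placed in your project's settings.py like this (the mobile value is just one illustrative choice; any type accepted by fake-useragent works):

```python
# settings.py
# Restrict the random User-Agent pool to mobile browsers.
# Other accepted values include 'random' (the default), 'desktop',
# 'firefox', 'chrome', etc., exactly as understood by fake-useragent.
RANDOM_UA_TYPE = 'mobile'
```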

Usage with scrapy-proxies

To use this middleware together with a random-proxy middleware such as scrapy-proxies, you need to:

  1. set RANDOM_UA_PER_PROXY to True to allow switching the User-Agent per proxy

  2. set the priority of RandomUserAgentMiddleware to be greater than that of scrapy-proxies, so that the proxy is assigned before the User-Agent is chosen
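A sketch of the combined settings might look like the following. The scrapy_proxies.RandomProxy path and the priority 350 are assumptions for illustration; the essential point is only that its number is lower than RandomUserAgentMiddleware's, since Scrapy runs process_request in ascending priority order:

```python
# settings.py
# Pick a fresh User-Agent whenever the proxy changes.
RANDOM_UA_PER_PROXY = True

DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': None,
    # Proxy middleware runs first (lower number = earlier in process_request),
    # so the proxy is already set when the User-Agent is chosen.
    'scrapy_proxies.RandomProxy': 350,
    'scrapy_fake_useragent.middleware.RandomUserAgentMiddleware': 400,
    'scrapy_fake_useragent.middleware.RetryUserAgentMiddleware': 401,
}
```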

Configuring Fake-UserAgent fallback

There is a configuration parameter FAKEUSERAGENT_FALLBACK, defaulting to None. You can set it to a string value, for example Mozilla or Your favorite browser. When set, it is used whenever fake-useragent fails to retrieve a random UA string, which suppresses the exception that would otherwise be raised.
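A minimal sketch, using one of the example values mentioned above:

```python
# settings.py
# Fallback User-Agent string, used only if fake-useragent
# cannot retrieve a random UA (e.g. its data source is unreachable).
FAKEUSERAGENT_FALLBACK = 'Mozilla'
```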

Project details


Download files

Download the file for your platform.

Source Distribution

scrapy-fake-useragent-1.2.0.tar.gz (3.4 kB)

Uploaded: Source

Built Distribution

scrapy_fake_useragent-1.2.0-py2.py3-none-any.whl (4.8 kB)

Uploaded: Python 2, Python 3

File details

Details for the file scrapy-fake-useragent-1.2.0.tar.gz.

File metadata

  • Download URL: scrapy-fake-useragent-1.2.0.tar.gz
  • Upload date:
  • Size: 3.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2 requests-toolbelt/0.9.1 tqdm/4.41.0 CPython/3.7.2

File hashes

Hashes for scrapy-fake-useragent-1.2.0.tar.gz
Algorithm Hash digest
SHA256 7480c9487304775601d8d1b81f89caf97dc2a664f2f42b7909b46d02d0f4aa0a
MD5 ea080e27a699d4788e03d75eb7d79767
BLAKE2b-256 5c7cd0169ce7302191cc5ba2d1fd255a63303c82789849d23dc692f0a8f92c09

See more details on using hashes here.

File details

Details for the file scrapy_fake_useragent-1.2.0-py2.py3-none-any.whl.

File metadata

  • Download URL: scrapy_fake_useragent-1.2.0-py2.py3-none-any.whl
  • Upload date:
  • Size: 4.8 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2 requests-toolbelt/0.9.1 tqdm/4.41.0 CPython/3.7.2

File hashes

Hashes for scrapy_fake_useragent-1.2.0-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 8dcf4c8cffd745d669c88ce59fc5daab550945c5dbae4c2b50985e5d12d735c5
MD5 2b8105d387816d2508fdb20ac6d912c9
BLAKE2b-256 cc8dfaa730b8d1cb5114cb8d314b078167694d17c3d394992490551c2308928d
