A middleware to change user-agent in request for Scrapy

Overview

Scrapy is a great framework for web crawling. This downloader middleware rotates the User-Agent header based on settings defined in settings.py, the spider, or the request.

Requirements

  • Tested on Python 2.7 and Python 3.5, but it should work on other versions higher than Python 3.3

  • Tested on Linux, but since it is a pure Python module, it should work on any other platform officially supported by Python, e.g. Windows, Mac OSX, BSD

Installation

The quick way:

pip install scrapy-useragents

Or place the middleware module alongside your Scrapy project.

Documentation

In settings.py, for example:

# -----------------------------------------------------------------------------
# USER AGENT
# -----------------------------------------------------------------------------

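# Disable Scrapy's built-in UserAgentMiddleware and enable this one.
# Note: .update() assumes DOWNLOADER_MIDDLEWARES is already defined earlier in settings.py.
DOWNLOADER_MIDDLEWARES.update({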
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'scrapy_useragents.downloadermiddlewares.useragents.UserAgentsMiddleware': 500,
})

USER_AGENTS = [
    ('Mozilla/5.0 (X11; Linux x86_64) '
     'AppleWebKit/537.36 (KHTML, like Gecko) '
     'Chrome/57.0.2987.110 '
     'Safari/537.36'),  # chrome
    ('Mozilla/5.0 (X11; Linux x86_64) '
     'AppleWebKit/537.36 (KHTML, like Gecko) '
     'Chrome/61.0.3163.79 '
     'Safari/537.36'),  # chrome
    ('Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:55.0) '
     'Gecko/20100101 '
     'Firefox/55.0')  # firefox
]

Settings Reference

USER_AGENTS

A list of User-Agent strings to use when crawling, unless overridden.

The middleware rotates through this list using the cycle function from the itertools module.
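The rotation described above can be pictured with a small stand-alone sketch. This is not the package's actual source: the class name RotatingUserAgentSketch and its process_request signature (which takes a plain dict of headers rather than a Scrapy Request) are hypothetical, chosen only to show how itertools.cycle hands out the configured values in order and wraps around.

```python
from itertools import cycle


class RotatingUserAgentSketch:
    """Hypothetical stand-in for the middleware; not the package's real class."""

    def __init__(self, user_agents):
        # cycle() yields the items in order and restarts after the last one.
        self._agents = cycle(user_agents)

    def process_request(self, headers):
        # Overwrite the User-Agent header with the next value in the rotation.
        headers["User-Agent"] = next(self._agents)
        return headers


rotator = RotatingUserAgentSketch(["agent-a", "agent-b", "agent-c"])
picked = [rotator.process_request({})["User-Agent"] for _ in range(4)]
print(picked)  # ['agent-a', 'agent-b', 'agent-c', 'agent-a']
```

With three configured values, the fourth request wraps back to the first, so the order is deterministic rather than random.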

Be careful: this middleware cannot handle the case where COOKIES_ENABLED is True and the website binds cookies to the User-Agent. This may cause unpredictable spider behavior. This problem will be solved in the future.
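Until that is addressed, one possible workaround is to turn cookies off in settings.py, assuming the target site does not need them (for example, for login sessions). This is a suggestion based on Scrapy's standard COOKIES_ENABLED setting, not something this package documents:

```python
# settings.py
# Assumption: the crawl works without cookies, so no cookie can end up
# associated with one User-Agent and later be sent with another.
COOKIES_ENABLED = False
```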

TODO

  • Read User-Agent from a backend, e.g. MongoDB, MySQL, or even a file saved on the local disk.

  • Rotate the User-Agent together with its bound cookies, keeping them consistent

  • Add a meta key for per-request User-Agent selection
