A module for collecting and providing popular user agent strings, with a requests session which rotates user agents.

Project description

ua_spoofer

A Python module which collects, lists, and returns up-to-date, commonly used User Agent strings. This can help avoid fingerprinting and bypass anti-bot/scraping measures. It also provides a Requests session wrapper which automatically uses a random user agent on every connection.

User Agents

A user agent string is sent as a header in HTTP requests to identify the browser and operating system the client is using. Websites can use it to tailor content to a visitor's device and software. It can also be used to block or restrict access by certain programs, such as bots, web crawlers and scrapers. These strings can also help build a profile of a user from the distinctive combination of browser and operating system versions, a technique called fingerprinting.
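To make the header concrete, here is a minimal sketch using only the standard library (not ua_spoofer itself). It reads the default User-Agent that Python's urllib would send, which openly identifies the client as a script, and then builds a request carrying a browser-style string instead. The URL and the user agent value are illustrative assumptions, and no network request is made.

```python
import urllib.request

# Default header urllib attaches when none is given, e.g. "Python-urllib/3.x".
# A server can trivially recognise this as a non-browser client.
default_user_agent = dict(urllib.request.build_opener().addheaders)["User-agent"]

# Illustrative browser-style string (a sample value, not taken from ua_spoofer).
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0"
)

# Build (but do not send) a request that overrides the header.
request = urllib.request.Request(
    "https://example.com/",
    headers={"User-Agent": BROWSER_USER_AGENT},
)

# urllib normalises header names to "User-agent" internally.
sent_user_agent = request.get_header("User-agent")

print("default:", default_user_agent)
print("override:", sent_user_agent)
```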

User agent spoofing replaces the user agent string with a random one from a list of common strings, disguising the type of client from the server and making it harder to track the user between requests. This is one way to bypass restrictions and mitigate fingerprinting.
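The idea of picking a random string from a list can be sketched in a few lines. This is a generic illustration, not ua_spoofer's implementation: the pool below is a hand-written sample, and the helper names `pick_user_agent` and `spoofed_headers` are hypothetical.

```python
import random

# Hand-written sample pool; a real pool would be refreshed regularly.
COMMON_USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.1 Safari/605.1.15",
]


def pick_user_agent(pool, rng=random):
    """Return one user agent string chosen uniformly at random from pool."""
    if not pool:
        raise ValueError("user agent pool is empty")
    return rng.choice(pool)


def spoofed_headers(pool, rng=random):
    """Build a headers dict that replaces the client's real User-Agent."""
    return {"User-Agent": pick_user_agent(pool, rng)}


headers = spoofed_headers(COMMON_USER_AGENTS)
print(headers["User-Agent"])
```

The resulting dict can be passed as the `headers` argument of most HTTP clients, including Requests.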

Details

A problem with similar modules and programs is that they either use a static dataset or scrape user agents from sources which are badly outdated or completely broken. ua_spoofer attempts to solve this by fetching up-to-date data based on the latest browser versions and amalgamating it from several sources. This provides redundancy and a good mix of current user agents, without depending on an API or downloading a static dataset which quickly goes out of date. More sources can be added over time without breaking compatibility.
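One way to picture the amalgamation step is shown below. This is a sketch under assumptions, not ua_spoofer's actual code: each source is modelled as a hypothetical zero-argument callable returning strings, a failing source is skipped rather than aborting the whole update, and duplicates are dropped while preserving order. The three demo sources stand in for real scrapers.

```python
def collect_user_agents(sources):
    """Merge user agent strings from several sources.

    sources: iterable of zero-argument callables, each returning an
    iterable of strings (hypothetical interface for this sketch).
    """
    seen = set()
    merged = []
    for fetch in sources:
        try:
            strings = list(fetch())
        except Exception:
            # Redundancy: one broken source should not break the update.
            continue
        for user_agent in strings:
            user_agent = user_agent.strip()
            if user_agent and user_agent not in seen:
                seen.add(user_agent)
                merged.append(user_agent)
    if not merged:
        raise RuntimeError("no source returned any user agent strings")
    return merged


# Stand-in sources for demonstration only.
def source_a():
    return ["UA-one", "UA-two"]


def broken_source():
    raise ConnectionError("source unavailable")


def source_b():
    return ["UA-two", " UA-three "]


merged = collect_user_agents([source_a, broken_source, source_b])
print(merged)
```

Adding another source under this design only means appending another callable to the list, which is how new sources can be introduced without changing the interface seen by callers.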

Installing

ua_spoofer requires Python 3, plus Requests and BeautifulSoup, commonly used modules for scraping purposes.

pip install ua_spoofer

Using

Getting User Agents

from ua_spoofer import UserAgent

ua = UserAgent()

# A random user agent for a specific browser
ua.chrome
ua.firefox
ua.ie

# Any random user agent
ua.random

# Get a list of supported browsers
ua.BROWSERS

# Get the list of all user agent strings
ua.all

# Update the list
ua.update()

Using the Requests Session wrapper

from ua_spoofer import SpoofSession

s = SpoofSession()

# Each request will use a different user agent string
# A few other headers are randomised too
# To demonstrate:
s.get("https://icanhazheaders.com/").json()
s.get("https://icanhazheaders.com/").json()
s.get("https://icanhazheaders.com/").json()

# To get the UserAgent instance of the session
s.ua

# Updating the user agent list is done as you would expect
s.ua.update()
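For readers curious about what per-request rotation involves, here is a rough standard-library sketch of the idea. It is not how SpoofSession is implemented: the class `RotatingClient`, its methods, and the sample header values are all hypothetical, and it uses urllib rather than Requests so that it runs without extra dependencies.

```python
import random
import urllib.request

# Sample values for illustration only.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]
ACCEPT_LANGUAGES = ["en-US,en;q=0.9", "en-GB,en;q=0.8"]


class RotatingClient:
    """Builds every request with freshly randomised headers."""

    def __init__(self, user_agents, rng=None):
        if not user_agents:
            raise ValueError("user agent list is empty")
        self.user_agents = list(user_agents)
        self.rng = rng or random.Random()

    def build_request(self, url):
        # New random headers are chosen on every call, so consecutive
        # requests are harder to link by their headers alone.
        headers = {
            "User-Agent": self.rng.choice(self.user_agents),
            "Accept-Language": self.rng.choice(ACCEPT_LANGUAGES),
        }
        return urllib.request.Request(url, headers=headers)

    def get(self, url, timeout=10):
        # Performs the actual network call; not exercised in this sketch.
        with urllib.request.urlopen(self.build_request(url), timeout=timeout) as resp:
            return resp.read()


client = RotatingClient(USER_AGENTS)
request = client.build_request("https://example.com/")
print(request.get_header("User-agent"))
```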

Other projects

As mentioned earlier, there are other Python modules which attempt to do similar things:

User agent spoofing isn't the only technique for bypassing restrictions. With more sites being JavaScript-based and using more aggressive techniques to protect against crawlers, bots and DDoS attacks, other methods are sometimes necessary, including headless browser automation.

  • cloudflare-scrape is a module to bypass Cloudflare's anti-bot system
  • PhantomJS is a scriptable headless browser
  • Selenium is a full browser automation framework
  • Scrapy is a Python framework for building crawlers
  • Spynner is another scriptable Python browser module

In some cases, Tor or a VPN can be used to hide the client's IP address for proper anonymity.

License

ua_spoofer is released under the terms of the Apache 2.0 license.

Download files

Files for ua-spoofer, version 1.0

Filename                         Size    File type  Python version
ua_spoofer-1.0-py3-none-any.whl  9.3 kB  Wheel      py3
ua_spoofer-1.0.tar.gz            5.2 kB  Source     None
