Get authentic, ready-to-use browser headers with ease
Features
- Authentic: All header values, combinations and their ordering are verified to work with most web servers.
- Complete: Generates all Sec-Ch-Ua, Sec-Ch-Ua-Mobile, Sec-Fetch-Dest, ... headers for Chrome-based browsers.
- Powerful: Pass your own user agent in or use the convenience functions to get common, real-world user agents.
- Wide Support: Almost all user agents supported: Windows, macOS, Linux, Android and iOS; Google Chrome, Firefox, Safari, Edge, Opera, Whale and QQ.
- Lightweight: Designed to consume minimal system resources and optimized for performance.
- Simple: Easy to use and understand with a clean and simple API.
- Compatible: Supports Python 3.8 and above. Runs on Windows, macOS and Linux.
- Tested: Has 99% test coverage and is continuously tested.
- Open Source: Provides transparency and allows community contributions for continuous development.
Installation
Just install the package from PyPI using pip:
pip install simple-header
Usage
Quickstart
Just import the package and use the convenience function.
import simple_header as sh
sh.get_dict(url="https://www.example.com/cat/pics.html")
# {'User-Agent': 'Mozilla/5.0 ...', 'Host': 'www.example.com', 'Sec-Ch-Ua': '"Not A(Brand";v="99", ...', ...}
Advanced Usage
Import the package and use the full-fledged get() function. For a detailed explanation of the function parameters, see Settings and Parameters.
import simple_header as sh
# Get a Header instance with a random mobile user agent to scrape the desired url.
header = sh.get(url="https://www.example.com/cat/pics.html", mobile=True)
header.dict
# {'User-Agent': 'Mozilla/5.0 ...', 'Host': 'www.example.com', 'Connection': 'keep-alive', ...}
# Access more attributes of the Header instance (just a few examples).
header.connection # 'keep-alive'
header.referer # 'https://www.example.com' <- url without path
header.user_agent.string # 'Mozilla/5.0 ...' <- randomly chosen user agent
header.user_agent.os # 'Windows'
header.sec_ch_ua # '"Not A(Brand";v="99", "Microsoft Edge";v="108", "Chromium";v="108"'
header.sec_fetch_mode # ['navigate', 'same-origin', 'cors'] <- multiple values possible (list of strings)
# Overwrite auto language detection (.com = 'en-US' -> 'de-DE') and set custom seed.
header = sh.get(url="https://www.example.com/cat/pics.html", language="de-DE", seed=3)
header.referer # 'https://www.web.de/' <- referer from pool of common german websites
header.accept_language # 'de-DE,de;q=0.5' <- language set to German
sh.get(url="https...com", user_agent="Mozilla/5.0 ...") # Header instance with given user agent string.
# Header('Mozilla/5.0 ...', 'https...com', 'keep-alive', ...)
ua = sh.sua.get(num=2, mobile=True) # List of the two most common mobile user agents as UserAgent instances.
sh.get(url="https...com", user_agent=ua[0]) # Header instance with the previously fetched UserAgent instance passed.
# Header('Mozilla/5.0 ...', 'https...com', 'keep-alive', ...)
You can also get more than one Header instance at once with the get_list() function. The get_dict() function returns a dictionary with the headers, directly usable in a request.
# Get a list of 10 Header instances, each with the passed user agent string.
sh.get_list(url="https...com", user_agent="Mozilla/5.0 ...", num=10)
# [Header(...), Header(...), ...]
sh.get_dict(url="https://www.example.com/cat/pics.html") # Dictionary with just the headers.
# {'User-Agent': 'Mozilla/5.0 ...', 'Host': 'www.example.com', 'Connection': 'keep-alive', ...}
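Because the dictionary uses standard header names, it can be passed straight to an HTTP client. A minimal sketch using the standard library's urllib (the header values below are illustrative placeholders, not real output of the package):

```python
from urllib.request import Request

# Illustrative placeholder values of the shape sh.get_dict() returns.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...",
    "Host": "www.example.com",
    "Connection": "keep-alive",
}

req = Request("https://www.example.com/cat/pics.html", headers=headers)
# Note: urllib stores header keys in capitalized form, e.g. 'User-agent'.
print(req.get_header("User-agent"))
```

With the requests library it is even simpler: `requests.get(url, headers=sh.get_dict(url=url))`.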
Fetching user agents: for a full explanation, see the simple-useragent package.
# Fetch a specified number of random mobile user agent instances.
sh.sua.get(num=2, shuffle=True, mobile=True)
# [UserAgent('Mozilla/5.0 (iPhone ...'), UserAgent('Mozilla/5.0 (iPhone; ...')]
sh.sua.get_list(force_cached=True) # List of all available desktop user agents as strings.
# ['Mozilla/5.0 ...', 'Mozilla/5.0 (iPhone ...', 'Mozilla/5.0 (iPhone ...', ...]
sh.sua.get_dict() # Dictionary with all desktop and mobile user agents.
# {'desktop': ['Mozilla/5.0 ...', ...], 'mobile': ['Mozilla/5.0 (iPhone ...', ...]}
The UserAgent instance offers attributes for the user agent properties. You can also access the properties with dictionary syntax.
# Parse a custom string directly to the UserAgent class and access its attributes.
obj = sh.sua.parse('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36')
obj.string # 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit ...'
obj.browser # 'Chrome', 'Firefox', 'Safari', 'Edge', 'IE', 'Opera', 'Whale', 'QQ Browser', 'Samsung Browser', 'Other'
obj.browser_version # '110', '109', '537', ...
obj.browser_version_minor # '0', '1', '36', ...
obj['os'] # 'Windows', 'macOS', 'Linux', 'Android', 'iOS', 'Other'
obj['os_version'] # '10', '7', '11', '14', ...
obj['os_version_minor'] # '0', '1', '2', ...
obj['mobile'] # True / False
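The attribute/dictionary duality can be pictured as a small class that forwards item access to its attributes. A hypothetical sketch of the pattern, not the package's actual implementation:

```python
# Hypothetical sketch: a class exposing both attribute and
# dictionary-style access, like the UserAgent instances above.
class DualAccess:
    def __init__(self, **props):
        self.__dict__.update(props)

    def __getitem__(self, key):
        # Forward dict-style lookups to the instance attributes.
        return getattr(self, key)

ua = DualAccess(os="Windows", os_version="10", mobile=False)
print(ua.os)        # Windows
print(ua["mobile"]) # False
```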
Settings and Parameters
The functions can take the following parameters:
- url: The URL of the website you want to scrape.
- language: The language of the website you want to scrape or where the request is made from (default: None = auto-detect).
- user_agent: A custom user agent string or a UserAgent instance to use for header generation (default: None = random user agent).
- mobile: If no user_agent is passed, generate a mobile or desktop user agent (default: False = desktop).
- seed: The random seed for referer selection and header value combinations (default: None = most plausible values chosen, max: 720).
- num: The number of Header instances to fetch, only for the get_list function (default: 10, max: 720).
Notes:
- The src/simple_header/inspect_headers.py file contains a commented-out Flask app to validate which headers your browser or scraper sends.
- The language auto-detection is based on the top-level domain of the URL. You can overwrite it with the language parameter by passing a language (e.g. 'de-DE') or a country code (e.g. 'de'). The fallback for unknown or non-country domains (.org, .dev, ...) is 'en-US'.
- For each language there is a pool of common websites used to pick a plausible referer. The URL to scrape, stripped of its path, is also used as a referer (e.g. 'https://www.example.com/cat/pics.html' -> 'https://www.example.com'). The referer makes the request look more realistic, as if the user were browsing between pages of the website.
- The seed parameter sets the random seed for referer selection and header values (if multiple are available). This is useful if your request got blocked by the server: try again with another seed. There are around 720 different combinations/seeds possible.
- The order of the headers is important, as most servers and bot detectors check it, even though the web standards say it should not be considered. I manually tested, for every browser and OS, which headers are sent and in which order.
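To illustrate the documented TLD-based fallback behavior, here is a hypothetical sketch (not the package's internal code, and with a far smaller mapping than the package actually uses):

```python
from urllib.parse import urlparse

# Hypothetical, abbreviated TLD-to-language mapping for illustration.
TLD_LANGUAGES = {"de": "de-DE", "fr": "fr-FR", "com": "en-US"}

def detect_language(url: str) -> str:
    tld = urlparse(url).hostname.rsplit(".", 1)[-1]
    # Unknown or non-country TLDs (.org, .dev, ...) fall back to 'en-US'.
    return TLD_LANGUAGES.get(tld, "en-US")

print(detect_language("https://www.web.de/"))       # de-DE
print(detect_language("https://example.org/pics"))  # en-US
```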
Development
As an open-source project, I strive for transparency and collaboration in my development process. I greatly appreciate any contributions members of our community can provide. Whether you are fixing bugs, proposing features, improving documentation, or spreading awareness - your involvement strengthens the project. Please review the code of conduct to understand how we work together respectfully.
- Bug Report: If you are experiencing an issue while using the package, please create an issue.
- Feature Request: Make this project better by submitting a feature request.
- Documentation: Improve our documentation by adding a wiki page.
- Community Support: Help others on GitHub Discussions.
- Security Report: Report critical security issues via our template.
Contributors
Thank you so much for giving feedback, implementing features and improving the code and project!
Credits
Full credits are in the ACKNOWLEDGMENTS file.
License
Provided under the terms of the GNU GPL3 License © Lennart Haack 2024.
See LICENSE file for details. For the licenses of used third party libraries and software, please refer to the ACKNOWLEDGMENTS file.