osn-requests simplifies web scraping and HTTP requests in Python. It provides easy-to-use functions for fetching HTML, finding web elements with XPath, managing proxies, and generating random user agents.
osn-requests: Simplified Web Scraping and Requests
A Python library that simplifies making HTTP requests, especially for web scraping, with enhanced features for header management, HTML parsing, and proxy handling.
Key Features
osn-requests is designed to be a user-friendly wrapper around the popular requests library, providing a set of functions to streamline common web scraping tasks. It includes features for:
- Simplified GET Requests: A straightforward function (get_req) for making GET requests with automatic header reformatting.
- Easy HTML Parsing: Fetching and parsing HTML content into an lxml ElementTree with a single function (get_html), making it easy to navigate and extract data using XPath.
- XPath Element Finding: Convenient functions (find_web_elements, find_web_element) to locate elements within parsed HTML using XPath expressions.
- Header Management: Automatic reformatting of headers to replace underscores with hyphens, and tools to generate realistic, randomized HTTP headers such as User-Agent, Accept, Accept-Language, Accept-Encoding, and Accept-Charset.
- Proxy Handling: Fetching lists of free proxies with filtering options for protocol and country (get_free_proxies).
Installation
- With pip:
  pip install osn-requests
- With git:
  pip install git+https://github.com/oddshellnick/osn-requests.git
Usage
Here are some examples of how to use osn-requests:
Making a simple GET request
from osn_requests import get_req
response = get_req("https://httpbin.org/get")
print(response.status_code)
print(response.json())
Making a GET request with custom headers
from osn_requests import get_req
from osn_requests.types import RequestHeaders
headers = RequestHeaders(
    User_Agent="My Custom Agent",
    Accept_Language="en-US,en;q=0.9"
)
response = get_req("https://httpbin.org/get", headers=headers)
print(response.request.headers)
Fetching and parsing HTML content
from osn_requests import get_html, find_web_elements
html_tree = get_html("https://example.com")
title_elements = find_web_elements(html_tree, "//title/text()")
if title_elements:
    print("Title:", title_elements[0])
Finding a single web element
from osn_requests import get_html, find_web_element
html_tree = get_html("https://example.com")
link_element = find_web_element(html_tree, "//a/@href")
if link_element:
    print("First Link URL:", link_element)
Getting a list of free proxies
from osn_requests.proxies import get_free_proxies
proxies = get_free_proxies(protocol_filter="https", country_filter="US")
if proxies:
    print("Found free HTTPS proxies in US:")
    for proxy in proxies[:3]:  # Print the first 3 proxies
        print(f"  {proxy['protocol']}://{proxy['ip']}:{proxy['port']} ({proxy['country']})")
else:
    print("No free proxies found matching the criteria.")
Generating random realistic headers
from osn_requests.headers.user_agent import generate_random_user_agent_header
from osn_requests.headers.accept import generate_random_realistic_accept_header
from osn_requests.headers.accept_language import generate_random_realistic_accept_language_header
from osn_requests.headers.accept_encoding import generate_random_realistic_accept_encoding_header
from osn_requests.headers.accept_charset import generate_random_realistic_accept_charset_header
print("Random User-Agent:", generate_random_user_agent_header())
print("Random Accept:", generate_random_realistic_accept_header())
print("Random Accept-Language:", generate_random_realistic_accept_language_header())
print("Random Accept-Encoding:", generate_random_realistic_accept_encoding_header())
print("Random Accept-Charset:", generate_random_realistic_accept_charset_header())
Functions
get_req(...)
Sends a GET request to the specified URL. This function is a wrapper around requests.get with automatic header reformatting (underscores in header keys are replaced with hyphens).
get_html(...)
Fetches HTML content from a URL and parses it into an lxml ElementTree for easy XPath querying. It fetches the content with get_req and parses it with BeautifulSoup and lxml.
find_web_elements(...)
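Finds all web elements within an lxml ElementTree that match the given XPath expression. Returns a list of the matching results: lxml elements, or strings when the expression selects text or attribute values (as in the usage examples above).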
find_web_element(...)
Finds the first web element within an lxml ElementTree that matches the given XPath expression. Returns the first match (an lxml element, or a string for text or attribute expressions), or None if no match is found.
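To illustrate the "all matches" versus "first match or None" contract without a network call or the lxml dependency, here is a sketch using the standard library's xml.etree.ElementTree. This is a swapped-in technique, not how osn-requests works: ElementTree supports only a limited XPath subset (no text() or @attr selections), and the helper names find_all and find_first are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional


# Hypothetical helpers mirroring the documented contracts, built on the
# standard library's limited XPath support rather than lxml.
def find_all(tree: ET.Element, path: str) -> List[ET.Element]:
    """Return every element matching `path` (an empty list if none match)."""
    return tree.findall(path)


def find_first(tree: ET.Element, path: str) -> Optional[ET.Element]:
    """Return the first element matching `path`, or None when nothing matches."""
    return tree.find(path)


# Well-formed XHTML, so the stdlib XML parser can read it (real-world HTML often is not).
page = ET.fromstring(
    "<html><head><title>Example</title></head>"
    "<body><a href='/one'>One</a><a href='/two'>Two</a></body></html>"
)

print([a.get("href") for a in find_all(page, ".//a")])  # ['/one', '/two']
print(find_first(page, ".//title").text)  # Example
print(find_first(page, ".//table"))  # None
```

When checking a first-match result, compare it against None rather than testing its truthiness: an ElementTree element with no children evaluates as false.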
get_free_proxies(...)
Fetches a list of free proxies from a public API, optionally filtered by protocol (http, https, etc.) and country (ISO country code). Returns a list of Proxy dictionaries.
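The public API behind get_free_proxies is not named here, so this sketch covers only the filtering step, applied to an in-memory list shaped like the Proxy type (protocol, ip, port, country). The helper filter_proxies, the sample data, the int type for port, and case-insensitive matching are all assumptions for illustration.

```python
from typing import List, Optional, TypedDict


class Proxy(TypedDict):
    # Keys follow the Proxy type described under Types; value types are assumed.
    protocol: str
    ip: str
    port: int
    country: str


def filter_proxies(
    proxies: List[Proxy],
    protocol_filter: Optional[str] = None,
    country_filter: Optional[str] = None,
) -> List[Proxy]:
    """Keep proxies matching the protocol and ISO country code; None disables a filter."""
    result: List[Proxy] = []
    for proxy in proxies:
        if protocol_filter is not None and proxy["protocol"].lower() != protocol_filter.lower():
            continue
        if country_filter is not None and proxy["country"].upper() != country_filter.upper():
            continue
        result.append(proxy)
    return result


# Hypothetical sample data using documentation-only IP ranges, not real proxies.
sample: List[Proxy] = [
    {"protocol": "https", "ip": "203.0.113.10", "port": 8080, "country": "US"},
    {"protocol": "http", "ip": "203.0.113.11", "port": 3128, "country": "US"},
    {"protocol": "https", "ip": "198.51.100.7", "port": 443, "country": "DE"},
]

print(filter_proxies(sample, protocol_filter="https", country_filter="us"))
```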
Header Generation Functions (osn_requests.headers)
- generate_random_user_agent_header(): Generates a complete random User-Agent header string.
- generate_random_realistic_accept_header(...): Generates a realistic random Accept header string.
- generate_random_accept_header(...): Generates a random Accept header string from all available MIME types.
- generate_random_realistic_accept_language_header(...): Generates a realistic random Accept-Language header string.
- generate_random_accept_language_header(...): Generates a random Accept-Language header string from all available languages.
- generate_random_realistic_accept_encoding_header(...): Generates a realistic random Accept-Encoding header string.
- generate_random_accept_encoding_header(...): Generates a random Accept-Encoding header string from all available encodings.
- generate_random_realistic_accept_charset_header(...): Generates a realistic random Accept-Charset header string.
- generate_random_accept_charset_header(...): Generates a random Accept-Charset header string from all available charsets.
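The sampling logic behind these generators is not documented here. As a rough sketch of the difference between the "realistic" and "all available" variants, assuming the realistic variant simply samples from a smaller, curated pool, the hypothetical helper random_encoding_header below draws a random non-empty subset from either pool. Both pools are illustrative assumptions, not the library's lists.

```python
import random
from typing import List, Optional

# Hypothetical pools: encodings browsers commonly send, and a wider list
# standing in for "all available encodings".
REALISTIC_ENCODINGS: List[str] = ["gzip", "deflate", "br", "zstd"]
ALL_ENCODINGS: List[str] = REALISTIC_ENCODINGS + ["compress", "identity", "exi", "pack200-gzip"]


def random_encoding_header(pool: List[str], rng: Optional[random.Random] = None) -> str:
    """Pick a random non-empty subset of `pool`, keep the pool's order, and join it."""
    rng = rng or random.Random()
    count = rng.randint(1, len(pool))
    chosen = rng.sample(pool, count)
    chosen.sort(key=pool.index)  # stable, browser-like ordering
    return ", ".join(chosen)


rng = random.Random(42)  # seeded only so repeated runs give the same output
print("Realistic:", random_encoding_header(REALISTIC_ENCODINGS, rng))
print("Any:", random_encoding_header(ALL_ENCODINGS, rng))
```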
reformat_headers(...)
Reformats header keys in a dictionary by replacing underscores with hyphens.
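The transformation is a one-line key rename. A minimal sketch of the described behavior, using a hypothetical function name (reformat_header_keys) so it is not mistaken for the library's own source:

```python
from typing import Dict


def reformat_header_keys(headers: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of `headers` with every underscore in a key replaced by a hyphen."""
    return {key.replace("_", "-"): value for key, value in headers.items()}


print(reformat_header_keys({"User_Agent": "My Custom Agent", "Accept_Language": "en-US,en;q=0.9"}))
# {'User-Agent': 'My Custom Agent', 'Accept-Language': 'en-US,en;q=0.9'}
```

This is what lets the usage example pass User_Agent as a Python keyword argument while the request still carries a User-Agent header. Values are left untouched, so hyphens and underscores inside header values are preserved.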
get_proxy_link(...)
Constructs a proxy link string from a Proxy dictionary, in the format protocol://ip:port.
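A sketch of the protocol://ip:port format, plus one way to turn the link into the scheme-to-URL mapping that requests.get(url, proxies=...) accepts. The helper names proxy_link and requests_proxies are hypothetical, and whether get_req forwards a proxies argument is not documented here.

```python
from typing import Dict, Mapping


def proxy_link(proxy: Mapping[str, object]) -> str:
    """Build 'protocol://ip:port' from a dict with protocol, ip and port keys."""
    return f"{proxy['protocol']}://{proxy['ip']}:{proxy['port']}"


def requests_proxies(link: str) -> Dict[str, str]:
    """Route both http and https traffic through the same proxy URL."""
    return {"http": link, "https": link}


# Hypothetical proxy using a documentation-only IP address.
proxy = {"protocol": "http", "ip": "203.0.113.10", "port": 8080, "country": "US"}
link = proxy_link(proxy)
print(link)  # http://203.0.113.10:8080
print(requests_proxies(link))
```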
Types
The library defines several types using TypedDict for better type hinting and clarity:
- RequestHeaders: A dictionary type for HTTP request headers.
- RequestProxy: A dictionary type for proxy configurations for different protocols.
- Proxy: A dictionary type representing a proxy server with protocol, ip, port, and country.
- QualityValue: A dictionary type representing items with associated quality values, used in headers like Accept and Accept-Language.
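Quality values are the q=... weights in headers such as Accept-Language. A sketch of how a QualityValue-style TypedDict might be declared and serialized; the field names name and quality, and the helper format_quality_values, are hypothetical, since the library's actual keys are not listed here. The sketch omits q for a weight of 1.0 because HTTP treats a missing weight as 1.

```python
from typing import List, TypedDict


class QualityValue(TypedDict):
    # Hypothetical field names; the library's actual keys may differ.
    name: str
    quality: float


def format_quality_values(items: List[QualityValue]) -> str:
    """Join items as 'name;q=x', highest quality first, omitting q when it is 1.0."""
    ordered = sorted(items, key=lambda item: item["quality"], reverse=True)
    parts = []
    for item in ordered:
        if item["quality"] >= 1.0:
            parts.append(item["name"])
        else:
            parts.append(f"{item['name']};q={item['quality']:g}")
    return ",".join(parts)


langs: List[QualityValue] = [
    {"name": "en", "quality": 0.9},
    {"name": "en-US", "quality": 1.0},
    {"name": "de", "quality": 0.5},
]
print(format_quality_values(langs))  # en-US,en;q=0.9,de;q=0.5
```

At runtime a TypedDict instance is an ordinary dict; the declaration only helps type checkers catch misspelled or missing keys.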
Future Notes
osn-requests is continually being developed and improved. Future plans include adding support for more advanced scraping techniques and incorporating additional utilities for handling various web data formats. Contributions and feature requests are welcome!
Download files
File details
Details for the file osn_requests-1.1.0.tar.gz.
File metadata
- Download URL: osn_requests-1.1.0.tar.gz
- Upload date:
- Size: 28.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2e62dc6e81b594b0138cef433b26227e3a44269b2bb20e3e5d0095c2b977af3c |
| MD5 | 21ce4f2a57d6aad1ab50d88164d81798 |
| BLAKE2b-256 | 1901001bc0566f0cfa709633413c7c4bc388a208bb60dd34818f27ced64142bb |
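These digests let you check a downloaded archive before installing it. A minimal sketch with Python's hashlib, assuming the source distribution was saved to the current directory under its published name; the BLAKE2b-256 column corresponds to BLAKE2b with a 32-byte digest. The helper name file_digests is hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict


def file_digests(path: Path) -> Dict[str, str]:
    """Compute SHA256, MD5 and BLAKE2b-256 hex digests of a file, reading it in chunks."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),  # 256-bit BLAKE2b
    }
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


# Expected value copied from the SHA256 row of the table; the local path is an assumption.
EXPECTED_SHA256 = "2e62dc6e81b594b0138cef433b26227e3a44269b2bb20e3e5d0095c2b977af3c"
archive = Path("osn_requests-1.1.0.tar.gz")
if archive.exists():
    ok = file_digests(archive)["SHA256"] == EXPECTED_SHA256
    print("SHA256 matches" if ok else "SHA256 MISMATCH")
```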
Provenance
The following attestation bundles were made for osn_requests-1.1.0.tar.gz:
Publisher: python-publish.yml on oddshellnick/osn-requests
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: osn_requests-1.1.0.tar.gz
  - Subject digest: 2e62dc6e81b594b0138cef433b26227e3a44269b2bb20e3e5d0095c2b977af3c
- Sigstore transparency entry: 180379275
- Sigstore integration time:
- Permalink: oddshellnick/osn-requests@9a8be04b40e16e79a1c34035885e39c2d2b93d34
- Branch / Tag: refs/tags/v1.1.0
- Owner: https://github.com/oddshellnick
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@9a8be04b40e16e79a1c34035885e39c2d2b93d34
- Trigger Event: release
File details
Details for the file osn_requests-1.1.0-py3-none-any.whl.
File metadata
- Download URL: osn_requests-1.1.0-py3-none-any.whl
- Upload date:
- Size: 33.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 50367535ab995e2a65dec54b84d8637619185046ae8ac35f136915cedd92c1a9 |
| MD5 | 045065f74d3ce4edfb4062db1b1be9f1 |
| BLAKE2b-256 | 997d655ffcf672ac5345e889e97c00c2fc4d7c008c34ed40ac01a804286e62b2 |
Provenance
The following attestation bundles were made for osn_requests-1.1.0-py3-none-any.whl:
Publisher: python-publish.yml on oddshellnick/osn-requests
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: osn_requests-1.1.0-py3-none-any.whl
  - Subject digest: 50367535ab995e2a65dec54b84d8637619185046ae8ac35f136915cedd92c1a9
- Sigstore transparency entry: 180379277
- Sigstore integration time:
- Permalink: oddshellnick/osn-requests@9a8be04b40e16e79a1c34035885e39c2d2b93d34
- Branch / Tag: refs/tags/v1.1.0
- Owner: https://github.com/oddshellnick
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@9a8be04b40e16e79a1c34035885e39c2d2b93d34
- Trigger Event: release