osn-requests simplifies web scraping and requests in Python. It provides easy-to-use functions for fetching HTML, finding web elements using XPath, managing proxies, and generating random user agents.

osn-requests: Simplified Web Scraping and Requests

A Python library that simplifies making HTTP requests, especially for web scraping, with enhanced features for header management, HTML parsing, and proxy handling.

Key Features

osn-requests is designed to be a user-friendly wrapper around the popular requests library, providing a set of functions to streamline common web scraping tasks. It includes features for:

  • Simplified GET Requests: A straightforward function (get_req) for making GET requests with automatic header reformatting.
  • Easy HTML Parsing: Fetching and parsing HTML content into an lxml ElementTree with a single function (get_html), making it easy to navigate and extract data using XPath.
  • XPath Element Finding: Convenient functions (find_web_elements, find_web_element) to locate elements within parsed HTML using XPath expressions.
  • Header Management: Automatic reformatting of headers to replace underscores with hyphens, and tools to generate realistic, randomized HTTP headers such as User-Agent, Accept, Accept-Language, Accept-Encoding, and Accept-Charset.
  • Proxy Handling: Fetching lists of free proxies with filtering options for protocol and country (get_free_proxies).

Installation

  • With pip:

    pip install osn-requests
    
  • With git:

    pip install git+https://github.com/oddshellnick/osn-requests.git
    

Usage

Here are some examples of how to use osn-requests:

Making a simple GET request

from osn_requests import get_req

response = get_req("https://httpbin.org/get")
print(response.status_code)
print(response.json())

Making a GET request with custom headers

from osn_requests import get_req
from osn_requests.types import RequestHeaders

headers = RequestHeaders(
    User_Agent="My Custom Agent",
    Accept_Language="en-US,en;q=0.9"
)

response = get_req("https://httpbin.org/get", headers=headers)
print(response.request.headers)

Fetching and parsing HTML content

from osn_requests import get_html, find_web_elements

html_tree = get_html("https://example.com")
title_elements = find_web_elements(html_tree, "//title/text()")

if title_elements:
    print("Title:", title_elements[0])

Finding a single web element

from osn_requests import get_html, find_web_element

html_tree = get_html("https://example.com")
link_element = find_web_element(html_tree, "//a/@href")

# Compare against None explicitly: find_web_element returns None when nothing matches
if link_element is not None:
    print("First Link URL:", link_element)

Getting a list of free proxies

from osn_requests.proxies import get_free_proxies

proxies = get_free_proxies(protocol_filter="https", country_filter="US")

if proxies:
    print("Found free HTTPS proxies in US:")
    for proxy in proxies[:3]:  # Print the first 3 proxies
        print(f"  {proxy['protocol']}://{proxy['ip']}:{proxy['port']} ({proxy['country']})")
else:
    print("No free proxies found matching the criteria.")

Generating random realistic headers

from osn_requests.headers.user_agent import generate_random_user_agent_header
from osn_requests.headers.accept import generate_random_realistic_accept_header
from osn_requests.headers.accept_language import generate_random_realistic_accept_language_header
from osn_requests.headers.accept_encoding import generate_random_realistic_accept_encoding_header
from osn_requests.headers.accept_charset import generate_random_realistic_accept_charset_header

print("Random User-Agent:", generate_random_user_agent_header())
print("Random Accept:", generate_random_realistic_accept_header())
print("Random Accept-Language:", generate_random_realistic_accept_language_header())
print("Random Accept-Encoding:", generate_random_realistic_accept_encoding_header())
print("Random Accept-Charset:", generate_random_realistic_accept_charset_header())

Functions

get_req(...)

Sends a GET request to the specified URL. This function is a wrapper around requests.get with automatic header reformatting (underscores in header keys are replaced with hyphens).

get_html(...)

Fetches HTML content from a URL and parses it into an lxml ElementTree for easy XPath querying. It fetches the content with get_req and parses it with BeautifulSoup and lxml.

find_web_elements(...)

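Finds all web elements within an lxml ElementTree that match the given XPath expression. Returns a list of matches: lxml elements for element expressions, or strings for expressions that end in text() or an attribute, as in the usage examples above.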

find_web_element(...)

Finds the first web element within an lxml ElementTree that matches the given XPath expression. Returns the first match (an lxml element, or a string for text() and attribute expressions), or None if no match is found.
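
The difference between "all matches" and "first match or None" can be sketched without installing anything. This is only an illustration using the standard library's xml.etree.ElementTree as a stand-in for lxml; the helper names find_all and find_first are hypothetical and are not the library's functions. Note that ElementTree supports only a limited XPath subset, so the sketch selects elements and reads attributes afterwards.

```python
import xml.etree.ElementTree as ET

# Small, well-formed document used only for this illustration.
DOCUMENT = (
    "<html><head><title>Example</title></head>"
    '<body><a href="/first">One</a><a href="/second">Two</a></body></html>'
)


def find_all(root, path):
    """Return every element matching the path (possibly an empty list)."""
    return root.findall(path)


def find_first(root, path):
    """Return the first element matching the path, or None if there is none."""
    matches = root.findall(path)
    return matches[0] if matches else None


root = ET.fromstring(DOCUMENT)

links = find_all(root, ".//a")
print([link.get("href") for link in links])

first_link = find_first(root, ".//a")
if first_link is not None:
    print("First link:", first_link.get("href"))

# No <table> in the document, so the "first match" helper yields None.
print(find_first(root, ".//table"))
```

Checking the single-match result with "is not None" keeps the test correct for elements that have no children.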

get_free_proxies(...)

Fetches a list of free proxies from a public API, optionally filtered by protocol (http, https, etc.) and country (ISO country code). Returns a list of Proxy dictionaries.
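
To make the filtering behaviour concrete, here is a minimal sketch that filters an in-memory list of proxy dictionaries by protocol and country. It is an assumption about how such filtering could work, not the library's implementation: filter_proxies and SAMPLE_PROXIES are hypothetical names, the addresses come from documentation-only ranges, and only the dictionary keys (protocol, ip, port, country) are taken from this README.

```python
# Hypothetical sample data; the IPs are from reserved documentation ranges.
SAMPLE_PROXIES = [
    {"protocol": "https", "ip": "203.0.113.10", "port": "8080", "country": "US"},
    {"protocol": "http", "ip": "198.51.100.7", "port": "3128", "country": "US"},
    {"protocol": "https", "ip": "192.0.2.44", "port": "443", "country": "DE"},
]


def filter_proxies(proxies, protocol=None, country=None):
    """Keep proxies that match every filter that was actually supplied."""
    return [
        proxy
        for proxy in proxies
        if (protocol is None or proxy["protocol"] == protocol)
        and (country is None or proxy["country"] == country)
    ]


us_https = filter_proxies(SAMPLE_PROXIES, protocol="https", country="US")
for proxy in us_https:
    print(f"{proxy['protocol']}://{proxy['ip']}:{proxy['port']} ({proxy['country']})")
```

Treating None as "no filter" lets the same helper serve unfiltered, protocol-only, and country-only queries.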

Header Generation Functions (osn_requests.headers)

  • generate_random_user_agent_header(): Generates a complete random User-Agent header string.
  • generate_random_realistic_accept_header(...): Generates a realistic random Accept header string.
  • generate_random_accept_header(...): Generates a random Accept header string from all available MIME types.
  • generate_random_realistic_accept_language_header(...): Generates a realistic random Accept-Language header string.
  • generate_random_accept_language_header(...): Generates a random Accept-Language header string from all available languages.
  • generate_random_realistic_accept_encoding_header(...): Generates a realistic random Accept-Encoding header string.
  • generate_random_accept_encoding_header(...): Generates a random Accept-Encoding header string from all available encodings.
  • generate_random_realistic_accept_charset_header(...): Generates a realistic random Accept-Charset header string.
  • generate_random_accept_charset_header(...): Generates a random Accept-Charset header string from all available charsets.

reformat_headers(...)

Reformats header keys in a dictionary by replacing underscores with hyphens.
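
The transformation described above fits in a few lines. The following is a sketch of the idea only; reformat_header_keys is a hypothetical name and the library's own function may differ in signature and details.

```python
def reformat_header_keys(headers: dict) -> dict:
    """Return a copy of the headers with underscores in keys replaced by hyphens."""
    return {key.replace("_", "-"): value for key, value in headers.items()}


example_headers = {
    "User_Agent": "My Custom Agent",
    "Accept_Language": "en-US,en;q=0.9",
}

print(reformat_header_keys(example_headers))
# {'User-Agent': 'My Custom Agent', 'Accept-Language': 'en-US,en;q=0.9'}
```

Underscored keys are convenient because they are valid Python keyword-argument names, which is why a conversion step is needed before the headers go on the wire.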

get_proxy_link(...)

Constructs a proxy link string from a Proxy dictionary, in the format protocol://ip:port.
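
As an illustration of the protocol://ip:port format, the sketch below builds such a link from a dictionary and places it in the scheme-to-URL mapping that the requests library accepts for its proxies argument. ProxyEntry and build_proxy_link are hypothetical stand-ins for the library's Proxy type and get_proxy_link; the field types (for example, port as a string) are assumptions.

```python
from typing import TypedDict


class ProxyEntry(TypedDict):
    """Hypothetical stand-in for the library's Proxy type; field types are assumed."""

    protocol: str
    ip: str
    port: str
    country: str


def build_proxy_link(proxy: ProxyEntry) -> str:
    """Format a proxy dictionary as protocol://ip:port."""
    return f"{proxy['protocol']}://{proxy['ip']}:{proxy['port']}"


# The IP is from a reserved documentation range, not a real proxy.
proxy: ProxyEntry = {
    "protocol": "https",
    "ip": "203.0.113.10",
    "port": "8080",
    "country": "US",
}

link = build_proxy_link(proxy)
print(link)  # https://203.0.113.10:8080

# requests maps the scheme of the *target* URL to the proxy URL to use for it.
requests_proxies = {"http": link, "https": link}
print(requests_proxies)
```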

Types

The library defines several types using TypedDict for better type hinting and clarity:

  • RequestHeaders: A dictionary type for HTTP request headers.
  • RequestProxy: A dictionary type for proxy configurations for different protocols.
  • Proxy: A dictionary type representing a proxy server with protocol, ip, port, and country.
  • QualityValue: A dictionary type for representing items with associated quality values, used in headers like Accept and Accept-Language.
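
To show how items with quality values end up in a header such as Accept-Language, here is a small sketch. The QualityItem type and its field names (name, quality) are hypothetical and may not match the library's QualityValue; only the general "value;q=weight" header syntax is standard HTTP.

```python
from typing import List, TypedDict


class QualityItem(TypedDict):
    """Hypothetical shape for an item with a quality value; field names are assumed."""

    name: str
    quality: float


def format_quality_header(items: List[QualityItem]) -> str:
    """Join items into a header value, highest quality first.

    A quality of 1.0 is the default in HTTP, so it is omitted from the output.
    """
    parts = []
    for item in sorted(items, key=lambda entry: entry["quality"], reverse=True):
        if item["quality"] >= 1.0:
            parts.append(item["name"])
        else:
            parts.append(f"{item['name']};q={item['quality']:g}")
    return ",".join(parts)


languages: List[QualityItem] = [
    {"name": "en", "quality": 0.9},
    {"name": "en-US", "quality": 1.0},
    {"name": "fr", "quality": 0.5},
]

print(format_quality_header(languages))  # en-US,en;q=0.9,fr;q=0.5
```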

Future Notes

osn-requests is continually being developed and improved. Future plans include adding support for more advanced scraping techniques and incorporating additional utilities for handling various web data formats. Contributions and feature requests are welcome!

Download files

Source Distribution

osn_requests-1.1.0.tar.gz (28.4 kB)

Built Distribution

osn_requests-1.1.0-py3-none-any.whl (33.8 kB)
