osn-requests simplifies web scraping and requests in Python. It provides easy-to-use functions for fetching HTML, finding web elements using XPath, managing proxies, and generating random user agents.
Project description
osn-requests: Simplified Web Scraping and Requests
osn-requests is a lightweight Python library designed to simplify common web scraping and request tasks. It builds upon popular libraries like requests, lxml, and BeautifulSoup, providing a cleaner and more convenient interface for fetching and extracting data from websites.
Key Features:
- Easy HTML Parsing: Quickly parse HTML content using `get_html`, which returns an `lxml` `etree` object ready for XPath queries.
- Simplified Element Finding: Locate specific web elements using `find_web_element` and `find_web_elements`, abstracting away the complexities of XPath handling.
- Integrated Proxy Support: Seamlessly integrate proxies into your requests using the `proxies` parameter in `get_html` and `get_json`.
- Dynamic User-Agent Generation: Easily obtain random user agents using `get_random_user_agent` to avoid being blocked by websites. This function generates ~5 * 10^777 unique user-agents.
- Free Proxy List Retrieval: Fetch a list of free proxies with `get_free_proxies`, filtering by protocol if desired.
Installation:
- With pip:
  pip install osn-requests
- With git:
  pip install git+https://github.com/oddshellnick/osn-requests.git
Example Usage:
from osn_requests import find_web_element, get_req, get_html
from osn_requests.user_agents import generate_random_user_agent
from osn_requests.proxies import get_free_proxies

# Generate a random User-Agent header value.
user_agent = generate_random_user_agent()
print(f"Using User-Agent: {user_agent}")

# Fetch free proxies, filtered to the HTTP protocol.
http_proxies = get_free_proxies("http")
print(f"Found {len(http_proxies)} HTTP proxies")

# Fetch and parse a page, then look up its <title> element with XPath.
html = get_html("https://www.example.com", headers={"User-Agent": user_agent}, proxies=http_proxies)
title_element = find_web_element(html, "//title")
if title_element is not None:
    print(f"Page Title: {title_element.text}")

# Fetch JSON from an API endpoint and decode the response body.
json_data = get_req("https://api.example.com/data", headers={"User-Agent": user_agent}).json()
print(f"JSON Data: {json_data}")
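The example above passes the result of `get_free_proxies` straight to the `proxies` parameter. This page does not document the shape of that result or what `proxies` accepts, so the following is only a sketch for the case where you call the underlying `requests` library yourself. `requests` expects a mapping from URL scheme to proxy URL. The helper name `build_proxies_mapping` and the assumption that proxies arrive as `host:port` strings are hypothetical and not part of osn-requests.

```python
import random


def build_proxies_mapping(proxy_addresses, scheme="http", rng=None):
    """Pick one proxy at random and return a requests-style proxies mapping.

    Hypothetical helper: assumes each entry is a "host:port" string or a
    full proxy URL. Returns None when the list is empty.
    """
    if not proxy_addresses:
        return None  # nothing to choose from; the caller can connect directly
    rng = rng or random
    address = rng.choice(list(proxy_addresses))
    # Prefix the scheme only when the address does not already carry one.
    proxy_url = address if "://" in address else f"{scheme}://{address}"
    # requests looks up the proxy by the scheme of the target URL.
    return {"http": proxy_url, "https": proxy_url}


# Documentation-only addresses (RFC 5737 ranges), used purely for illustration.
candidates = ["203.0.113.10:8080", "198.51.100.7:3128"]
print(build_proxies_mapping(candidates))
```

The returned dictionary can be passed as `proxies=` to `requests.get`. Free proxies are often unreliable, so choosing a new one per request and retrying on failure is a common pattern.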
Future Notes
osn-requests is continually being developed and improved. Future plans include adding support for more advanced scraping techniques, expanding proxy management features, and incorporating additional utilities for handling various web data formats. Contributions and feature requests are welcome!
File details
Details for the file osn_requests-1.0.0.tar.gz.
File metadata
- Download URL: osn_requests-1.0.0.tar.gz
- Upload date:
- Size: 13.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9d4ac9118650cf2a270cbfc09d819627f59570ad0b41ab17061cf1ed67c10a00 |
| MD5 | a1a6b3aa2b812860a035bbea4aad915f |
| BLAKE2b-256 | a6125b0bcc0e9492172e9f9f3dbcebaf919cf26cd2fc2afc543fbd9a14853123 |
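A downloaded file can be checked against its published SHA256 digest before installation. The sketch below uses only Python's standard `hashlib` module. The function names are illustrative, and the local file path in the usage comment is an assumption about where the file was saved.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    # Normalise whitespace and case, since digests are often copy-pasted.
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage (assumes the source distribution was saved to the current directory):
# matches_published_digest(
#     "osn_requests-1.0.0.tar.gz",
#     "9d4ac9118650cf2a270cbfc09d819627f59570ad0b41ab17061cf1ed67c10a00",
# )
```

When installing from a requirements file, pip's `--require-hashes` mode performs the same kind of check automatically.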
Provenance
The following attestation bundles were made for osn_requests-1.0.0.tar.gz:
Publisher: python-publish.yml on oddshellnick/osn-requests
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: osn_requests-1.0.0.tar.gz
- Subject digest: 9d4ac9118650cf2a270cbfc09d819627f59570ad0b41ab17061cf1ed67c10a00
- Sigstore transparency entry: 178711141
- Sigstore integration time:
- Permalink: oddshellnick/osn-requests@64032b1df364df72bf3944cd2ba2c0a4abe716e8
- Branch / Tag: refs/tags/v1.0.0
- Owner: https://github.com/oddshellnick
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@64032b1df364df72bf3944cd2ba2c0a4abe716e8
- Trigger Event: release
File details
Details for the file osn_requests-1.0.0-py3-none-any.whl.
File metadata
- Download URL: osn_requests-1.0.0-py3-none-any.whl
- Upload date:
- Size: 12.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 841fb63f2ae8589d894f3234abde99bc471a5fef3129503ad938f6ec74c67322 |
| MD5 | 44e111e4fc273d35d0511fbf2ed7f982 |
| BLAKE2b-256 | 2ef1157c7afe16ad63dc63608da6bddeb45822925f5f1d892eec9ebc7d95235e |
Provenance
The following attestation bundles were made for osn_requests-1.0.0-py3-none-any.whl:
Publisher: python-publish.yml on oddshellnick/osn-requests
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: osn_requests-1.0.0-py3-none-any.whl
- Subject digest: 841fb63f2ae8589d894f3234abde99bc471a5fef3129503ad938f6ec74c67322
- Sigstore transparency entry: 178711142
- Sigstore integration time:
- Permalink: oddshellnick/osn-requests@64032b1df364df72bf3944cd2ba2c0a4abe716e8
- Branch / Tag: refs/tags/v1.0.0
- Owner: https://github.com/oddshellnick
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@64032b1df364df72bf3944cd2ba2c0a4abe716e8
- Trigger Event: release