PyWebRequests simplifies web scraping and requests in Python. It provides easy-to-use functions for fetching HTML, finding web elements using XPath, managing proxies, and generating random user agents.
Project description
PyWebRequests is a lightweight Python library designed to simplify common web scraping and request tasks. It builds upon popular libraries like requests, lxml, and BeautifulSoup, providing a cleaner and more convenient interface for fetching and extracting data from websites.
Key Features:
- Easy HTML Parsing: Quickly parse HTML content using get_html, which returns an lxml etree object ready for XPath queries.
- Simplified Element Finding: Locate specific web elements using find_web_element and find_web_elements, abstracting away the complexities of XPath handling.
- Integrated Proxy Support: Seamlessly integrate proxies into your requests using the proxies parameter in get_html and get_json.
- Dynamic User-Agent Generation: Easily obtain random user agents using get_random_user_agent to avoid being blocked by websites.
- Free Proxy List Retrieval: Fetch a list of free proxies with get_free_proxies, filtering by protocol if desired.
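Most requests-style libraries expect their proxies argument as a protocol-to-URL mapping rather than a bare address. As a minimal sketch (assuming get_free_proxies yields "host:port" strings, which the description above does not confirm), a hypothetical pick_proxy helper could bridge the two:

```python
import random

def pick_proxy(proxy_list):
    # Choose one "host:port" entry at random and wrap it in the
    # {"http": ..., "https": ...} mapping that requests-style
    # libraries accept for their proxies parameter.
    address = random.choice(proxy_list)
    return {"http": f"http://{address}", "https": f"http://{address}"}
```

Something like pick_proxy(get_free_proxies("http")) would then plug into the proxies parameter of get_html or get_json, if the returned format matches this assumption.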
Installation:
pip install PyWebRequests
Example Usage:
from PyWebRequests.functions import find_web_element, get_html, get_json
from PyWebRequests.getters import get_free_proxies, get_random_user_agent
# Get a random user agent
user_agent = get_random_user_agent()
print(f"Using User-Agent: {user_agent}")
# Fetch free HTTP proxies
http_proxies = get_free_proxies("http")
print(f"Found {len(http_proxies)} HTTP proxies")
# Fetch HTML content using a random user agent and the fetched proxies
html = get_html("https://www.example.com", headers={"User-Agent": user_agent}, proxies=http_proxies)
# Find a specific element
title_element = find_web_element(html, "//title")
if title_element is not None:
    print(f"Page Title: {title_element.text}")
# Fetch JSON data
json_data = get_json("https://api.example.com/data", headers={"User-Agent": user_agent})
print(f"JSON Data: {json_data}")
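Free proxies fail often, so a common pattern is to fall back through the list until one request succeeds. The sketch below is library-agnostic: fetch stands in for any callable such as get_html or get_json wrapped to use a single proxy (the exact failure behavior of those functions is an assumption here):

```python
def fetch_with_fallback(fetch, url, proxy_list):
    """Try each proxy in turn and return the first successful result.

    `fetch` is any callable taking (url, proxy); it is a hypothetical
    stand-in for a PyWebRequests call bound to one proxy.
    """
    last_error = None
    for proxy in proxy_list:
        try:
            return fetch(url, proxy)
        except Exception as exc:  # dead proxy: remember the error, try the next
            last_error = exc
    raise last_error if last_error else RuntimeError("empty proxy list")
```

This keeps the retry policy out of the scraping code itself; swapping in a different proxy source or ordering strategy only changes the proxy_list argument.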
Future Notes
PyWebRequests is continually being developed and improved. Future plans include adding support for more advanced scraping techniques, expanding proxy management features, and incorporating additional utilities for handling various web data formats. Contributions and feature requests are welcome!
Project details
Download files
File details
Details for the file pywebrequests-1.3.7.tar.gz
File metadata
- Download URL: pywebrequests-1.3.7.tar.gz
- Upload date:
- Size: 3.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.20
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2c23b305c4911ca391cd6e2e75d4ac535c6eab2a46c52f018273d9f0bff06414
MD5 | 3cdbb5b2d81924f4667537780c79efcb
BLAKE2b-256 | 96fb99035c36aba3ce269b9b7bc79a23ba0fbfa30e738dd43336b9c62cbcc535
File details
Details for the file PyWebRequests-1.3.7-py3-none-any.whl
File metadata
- Download URL: PyWebRequests-1.3.7-py3-none-any.whl
- Upload date:
- Size: 4.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.20
File hashes
Algorithm | Hash digest
---|---
SHA256 | edab922e8b831063973b498aab41d9caaffdec7c419cc06eb61558a11ae87357
MD5 | 39ac6a7498819f9bc2e43dd825089d2d
BLAKE2b-256 | 03a0f422a893fbf8e7a200b73855815b56ac296211b3a9bfb016095eade5a38a