PyWebRequests simplifies web scraping and requests in Python. It provides easy-to-use functions for fetching HTML, finding web elements using XPath, managing proxies, and generating random user agents.
Project description
PyWebRequests is a lightweight Python library designed to simplify common web scraping and request tasks. It builds upon popular libraries like requests, lxml, and BeautifulSoup, providing a cleaner and more convenient interface for fetching and extracting data from websites.
Key Features:
- Easy HTML Parsing: Quickly parse HTML content using get_html, which returns an lxml etree object ready for XPath queries.
- Simplified Element Finding: Locate specific web elements using find_web_element and find_web_elements, abstracting away the complexities of XPath handling.
- Integrated Proxy Support: Seamlessly integrate proxies into your requests using the proxies parameter in get_html and get_json.
- Dynamic User-Agent Generation: Easily obtain random user agents using get_random_user_agent to avoid being blocked by websites.
- Free Proxy List Retrieval: Fetch a list of free proxies with get_free_proxies, filtering by protocol if desired.
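Most requests-style libraries expect their proxies argument as a protocol-to-URL mapping rather than a bare address. As a minimal sketch (assuming get_free_proxies yields "host:port" strings, which the description above does not confirm), a hypothetical pick_proxy helper could bridge the two:

```python
import random

def pick_proxy(proxy_list):
    # Choose one "host:port" entry at random and wrap it in the
    # {"http": ..., "https": ...} mapping that requests-style
    # libraries accept for their proxies parameter.
    address = random.choice(proxy_list)
    return {"http": f"http://{address}", "https": f"http://{address}"}
```

Something like pick_proxy(get_free_proxies("http")) would then plug into the proxies parameter of get_html or get_json, if the returned format matches this assumption.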
Installation:
pip install PyWebRequests
Example Usage:
from PyWebRequests.functions import find_web_element, get_html, get_json
from PyWebRequests.getters import get_free_proxies, get_random_user_agent
# Get a random user agent
user_agent = get_random_user_agent()
print(f"Using User-Agent: {user_agent}")
# Fetch free HTTP proxies
http_proxies = get_free_proxies("http")
print(f"Found {len(http_proxies)} HTTP proxies")
# Fetch HTML content using a random user agent and the fetched proxies
html = get_html("https://www.example.com", headers={"User-Agent": user_agent}, proxies=http_proxies)
# Find a specific element
title_element = find_web_element(html, "//title")
if title_element is not None:
    print(f"Page Title: {title_element.text}")
# Fetch JSON data
json_data = get_json("https://api.example.com/data", headers={"User-Agent": user_agent})
print(f"JSON Data: {json_data}")
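Free proxies fail often, so a common pattern is to fall back through the list until one request succeeds. The sketch below is library-agnostic: fetch stands in for any callable such as get_html or get_json wrapped to use a single proxy (the exact failure behavior of those functions is an assumption here):

```python
def fetch_with_fallback(fetch, url, proxy_list):
    """Try each proxy in turn and return the first successful result.

    `fetch` is any callable taking (url, proxy); it is a hypothetical
    stand-in for a PyWebRequests call bound to one proxy.
    """
    last_error = None
    for proxy in proxy_list:
        try:
            return fetch(url, proxy)
        except Exception as exc:  # dead proxy: remember the error, try the next
            last_error = exc
    raise last_error if last_error else RuntimeError("empty proxy list")
```

This keeps the retry policy out of the scraping code itself; swapping in a different proxy source or ordering strategy only changes the proxy_list argument.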
Future Notes
PyWebRequests is continually being developed and improved. Future plans include adding support for more advanced scraping techniques, expanding proxy management features, and incorporating additional utilities for handling various web data formats. Contributions and feature requests are welcome!
Project details
Download files
File details
Details for the file pywebrequests-1.3.7.tar.gz
File metadata
- Download URL: pywebrequests-1.3.7.tar.gz
- Upload date:
- Size: 3.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.20
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2c23b305c4911ca391cd6e2e75d4ac535c6eab2a46c52f018273d9f0bff06414
MD5 | 3cdbb5b2d81924f4667537780c79efcb
BLAKE2b-256 | 96fb99035c36aba3ce269b9b7bc79a23ba0fbfa30e738dd43336b9c62cbcc535
File details
Details for the file PyWebRequests-1.3.7-py3-none-any.whl
File metadata
- Download URL: PyWebRequests-1.3.7-py3-none-any.whl
- Upload date:
- Size: 4.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.20
File hashes
Algorithm | Hash digest
---|---
SHA256 | edab922e8b831063973b498aab41d9caaffdec7c419cc06eb61558a11ae87357
MD5 | 39ac6a7498819f9bc2e43dd825089d2d
BLAKE2b-256 | 03a0f422a893fbf8e7a200b73855815b56ac296211b3a9bfb016095eade5a38a