PyWebRequests simplifies web scraping and requests in Python. It provides easy-to-use functions for fetching HTML, finding web elements using XPath, managing proxies, and generating random user agents.

Project description

PyWebRequests is a lightweight Python library designed to simplify common web scraping and request tasks. It builds upon popular libraries like requests, lxml, and BeautifulSoup, providing a cleaner and more convenient interface for fetching and extracting data from websites.

Key Features:

  • Easy HTML Parsing: Quickly parse HTML content using get_html, which returns an lxml etree object ready for XPath queries.

  • Simplified Element Finding: Locate specific web elements using find_web_element and find_web_elements, abstracting away the complexities of XPath handling.

  • Integrated Proxy Support: Seamlessly integrate proxies into your requests using the proxies parameter in get_html and get_json.

  • Dynamic User-Agent Generation: Easily obtain random user agents using get_random_user_agent to avoid being blocked by websites.

  • Free Proxy List Retrieval: Fetch a list of free proxies with get_free_proxies, filtering by protocol if desired.
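At its core, find_web_element is an XPath lookup over a parsed document tree. The same idea can be sketched with the standard library's xml.etree.ElementTree, which supports a limited XPath subset (lxml, which PyWebRequests builds on, accepts full XPath and tolerates malformed HTML; this stdlib sketch requires well-formed markup):

```python
import xml.etree.ElementTree as ET

# Parse a small, well-formed HTML snippet into an element tree
doc = ET.fromstring("<html><head><title>Demo Page</title></head><body/></html>")

# './/title' finds the first matching descendant, much like an XPath query
title = doc.find(".//title")
print(title.text)
```

With lxml, the equivalent query would be `html.xpath("//title")`, which is what the library's helpers wrap.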

Installation:

pip install PyWebRequests

Example Usage:

from PyWebRequests.functions import find_web_element, get_html
from PyWebRequests.getters import get_free_proxies, get_random_user_agent

# Get a random user agent
user_agent = get_random_user_agent()
print(f"Using User-Agent: {user_agent}")

# Fetch free HTTP proxies
http_proxies = get_free_proxies("http")
print(f"Found {len(http_proxies)} HTTP proxies")

# Fetch HTML content using a random user agent and a proxy
html = get_html("https://www.example.com", headers={"User-Agent": user_agent}, proxies=http_proxies)

# Find a specific element
title_element = find_web_element(html, "//title")
if title_element is not None:
    print(f"Page Title: {title_element.text}")

# Fetch JSON data
json_data = get_json("https://api.example.com/data", headers={"User-Agent": user_agent})
print(f"JSON Data: {json_data}")
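Free-proxy helpers typically hand back addresses as host:port strings. If you want to route a plain requests call through one of them instead, the underlying library expects a scheme-keyed mapping. A minimal hedged sketch (the helper name and the proxy address are illustrative, not part of PyWebRequests):

```python
def as_requests_proxies(proxy: str, scheme: str = "http") -> dict:
    """Build a requests-style proxies mapping from a single host:port string."""
    url = f"{scheme}://{proxy}"
    # Route both http and https traffic through the same proxy endpoint
    return {"http": url, "https": url}

# Placeholder address from the TEST-NET-3 documentation range
proxies = as_requests_proxies("203.0.113.5:8080")
print(proxies)
```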

Future Notes

PyWebRequests is continually being developed and improved. Future plans include adding support for more advanced scraping techniques, expanding proxy management features, and incorporating additional utilities for handling various web data formats. Contributions and feature requests are welcome!

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pywebrequests-1.3.7.tar.gz (3.9 kB)

Uploaded Source

Built Distribution

PyWebRequests-1.3.7-py3-none-any.whl (4.1 kB)

Uploaded Python 3

File details

Details for the file pywebrequests-1.3.7.tar.gz.

File metadata

  • Download URL: pywebrequests-1.3.7.tar.gz
  • Upload date:
  • Size: 3.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.20

File hashes

Hashes for pywebrequests-1.3.7.tar.gz
  • SHA256: 2c23b305c4911ca391cd6e2e75d4ac535c6eab2a46c52f018273d9f0bff06414
  • MD5: 3cdbb5b2d81924f4667537780c79efcb
  • BLAKE2b-256: 96fb99035c36aba3ce269b9b7bc79a23ba0fbfa30e738dd43336b9c62cbcc535

See more details on using hashes here.
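To check a downloaded file against a published digest, hash it locally and compare the hex strings. A sketch using the standard library's hashlib (shown on an in-memory byte string; for a real archive, read the file in binary mode, chunked if it is large):

```python
import hashlib

def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()

# For a real check, compare against the digest listed above, e.g.:
#   sha256_of(open("pywebrequests-1.3.7.tar.gz", "rb").read()) == "2c23b305..."
digest = sha256_of(b"example")
print(digest)
```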

File details

Details for the file PyWebRequests-1.3.7-py3-none-any.whl.

File hashes

Hashes for PyWebRequests-1.3.7-py3-none-any.whl
  • SHA256: edab922e8b831063973b498aab41d9caaffdec7c419cc06eb61558a11ae87357
  • MD5: 39ac6a7498819f9bc2e43dd825089d2d
  • BLAKE2b-256: 03a0f422a893fbf8e7a200b73855815b56ac296211b3a9bfb016095eade5a38a
