
Den K's Web Module


Den K's Web Module - dkwebmod

About The Project

Den K's Web Module contains a set of scripts for web operations: downloading files, fetching page content (static and via Playwright), URL parsing, and SSL-aware HTTP requests.

Getting Started

To get a local copy up and running, follow these steps.

Installation

  1. Install Python.
  2. Install the library using pip:
    pip install dkwebmod
    

Modules

web

Core module for web operations:

  • download — download a file from a URL to a local directory, with SSL fallback (the certifi CA bundle first, then the system CA store), progress output, and overwrite control.
  • download_and_extract_file — download an archive and extract it in one step.
  • get_page_bytes — fetch raw page content as bytes, with optional user-agent spoofing.
  • get_page_content — fetch page content using urllib (static pages) or Playwright (dynamic/JS pages), with output as HTML, text, PDF, PNG, or JPEG.
  • is_status_ok — check whether an HTTP status code is 200.
  • get_filename_from_url — extract the filename from a URL.
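The SSL fallback, status check, and filename extraction described above can be sketched with the standard library. This is a minimal illustration under stated assumptions, not dkwebmod's implementation: the helper names (filename_from_url, status_ok, ssl_contexts, download_file) are hypothetical, certifi is treated as optional, and progress output is omitted.

```python
import os
import shutil
import ssl
import urllib.error
import urllib.request
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def filename_from_url(url: str) -> str:
    """Return the last path segment of a URL, percent-decoded ("" if there is none)."""
    return unquote(PurePosixPath(urlparse(url).path).name)


def status_ok(status: int) -> bool:
    """Mirror the documented rule: only HTTP 200 counts as OK."""
    return status == 200


def ssl_contexts() -> list:
    """Build SSL contexts in fallback order: certifi bundle first (if installed), then system CAs."""
    contexts = []
    try:
        import certifi  # optional third-party package (assumption: may be absent)
        contexts.append(ssl.create_default_context(cafile=certifi.where()))
    except ImportError:
        pass
    contexts.append(ssl.create_default_context())  # system CA store
    return contexts


def download_file(url: str, dest_dir: str, overwrite: bool = False) -> str:
    """Download url into dest_dir, retrying with the next SSL context on certificate errors."""
    # "download.bin" is a hypothetical fallback name for URLs without a file segment.
    target = os.path.join(dest_dir, filename_from_url(url) or "download.bin")
    if os.path.exists(target) and not overwrite:
        return target  # keep the existing file untouched
    os.makedirs(dest_dir, exist_ok=True)
    last_error = None
    for ctx in ssl_contexts():
        try:
            with urllib.request.urlopen(url, context=ctx) as resp, open(target, "wb") as out:
                shutil.copyfileobj(resp, out)
            return target
        except urllib.error.URLError as exc:
            if not isinstance(exc.reason, ssl.SSLError):
                raise  # not a certificate problem: do not retry
            last_error = exc  # try the next trust store
    raise last_error
```

Retrying only on SSL errors keeps genuine failures (a 404, a DNS error) from being hidden by the fallback loop.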

githubw — GitHub Wrapper

Wrapper around the GitHub API for downloading repositories, releases, and querying commits:

  • GitHubWrapper class — initialize with user/repo or a repo URL, then:
    • download_and_extract_branch — download and extract a branch (or a specific path within it).
    • download_file / download_directory — download individual files or entire directories from a repo.
    • download_latest_release / download_and_extract_latest_release — download the latest release asset matching a glob pattern.
    • get_latest_release_json / get_latest_release_version / get_latest_release_url — query release metadata.
    • get_releases_json — list releases with optional pattern filtering and pagination.
    • get_latest_commit / get_latest_commit_message — retrieve the latest commit data or message for a branch/path.
    • list_files — list files in the repo (with glob pattern and recursive options).
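Selecting a release asset by glob pattern can be illustrated against the public GitHub REST API. The sketch below assumes the GET /repos/{owner}/{repo}/releases/latest endpoint and its assets[].name and browser_download_url fields; the function names are hypothetical, and GitHubWrapper is not necessarily implemented this way.

```python
import fnmatch
import json
import urllib.request
from urllib.parse import urlparse


def owner_repo_from_url(repo_url: str) -> tuple:
    """Split https://github.com/<owner>/<repo>[.git] into (owner, repo)."""
    parts = urlparse(repo_url).path.strip("/").split("/")
    if len(parts) < 2:
        raise ValueError(f"not a repository URL: {repo_url!r}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[:-4]
    return owner, repo


def latest_release_api_url(owner: str, repo: str) -> str:
    """REST endpoint returning the newest published, non-draft, non-prerelease release."""
    return f"https://api.github.com/repos/{owner}/{repo}/releases/latest"


def fetch_json(url: str, pat: str = None) -> dict:
    """GET a GitHub API URL; a personal access token raises rate limits and allows private repos."""
    headers = {"Accept": "application/vnd.github+json"}
    if pat:
        headers["Authorization"] = f"Bearer {pat}"
    with urllib.request.urlopen(urllib.request.Request(url, headers=headers)) as resp:
        return json.load(resp)


def matching_asset_urls(release: dict, pattern: str) -> list:
    """Return download URLs of release assets whose file name matches a glob pattern."""
    return [
        asset["browser_download_url"]
        for asset in release.get("assets", [])
        if fnmatch.fnmatchcase(asset["name"], pattern)  # case-sensitive on every OS
    ]
```

With network access, matching_asset_urls(fetch_json(latest_release_api_url(*owner_repo_from_url(url))), "*.zip") would list the ZIP assets of a repository's latest release.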

Running from the command line

The githubw module can be executed directly:

python -m dkwebmod.githubw -u https://github.com/user/repo -b main [options]
Flag                      Description
-u, --repo_url            Repository URL (required)
-b, --branch              Branch name (required)
-p, --path                Path to a file/folder inside the repo
-t, --target_directory    Local directory to download to
--pat                     Personal access token
-glcm                     Print the latest commit message
-glcj                     Print the latest commit JSON
-db                       Download the branch (or path if -p is set)
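To make the flag semantics concrete, here is a hypothetical argparse parser that mirrors the table above. It is not taken from dkwebmod's source, and the real module may parse its arguments differently; it only shows how the documented flags combine.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented githubw flags (not the module's source)."""
    p = argparse.ArgumentParser(prog="python -m dkwebmod.githubw")
    p.add_argument("-u", "--repo_url", required=True, help="Repository URL")
    p.add_argument("-b", "--branch", required=True, help="Branch name")
    p.add_argument("-p", "--path", help="Path to a file/folder inside the repo")
    p.add_argument("-t", "--target_directory", help="Local directory to download to")
    p.add_argument("--pat", help="Personal access token")
    # Single-dash multi-letter flags are matched exactly by argparse.
    p.add_argument("-glcm", dest="glcm", action="store_true", help="Print the latest commit message")
    p.add_argument("-glcj", dest="glcj", action="store_true", help="Print the latest commit JSON")
    p.add_argument("-db", dest="db", action="store_true", help="Download the branch (or path if -p is set)")
    return p
```

Parsing the first example below with this sketch yields repo_url, branch, and path set, glcm true, and every other action flag false.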

Examples:

:: Get the latest commit message for a specific path
python -m dkwebmod.githubw -u https://github.com/user/repo -b main -p src/config.json -glcm

:: Download a branch to a local directory
python -m dkwebmod.githubw -u https://github.com/user/repo -b main -t C:\Downloads\repo -db

:: Download only a specific folder from the branch
python -m dkwebmod.githubw -u https://github.com/user/repo -b main -p docs -t C:\Downloads\docs -db

urls

URL parsing and validation utilities:

  • url_parser — parse a URL into its components (scheme, netloc, path, directories, queries, file).
  • is_valid_url — check whether a string is a valid URL.
  • find_urls_in_text — extract all URLs from a block of text.
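The three URL utilities can be approximated with urllib.parse and a regular expression. This is a sketch, not the urls module's code: the names parse_url, looks_like_valid_url, and find_urls are hypothetical, a last path segment counts as a file only if it contains a dot (an assumed heuristic), and the URL pattern is deliberately simple rather than RFC 3986-complete.

```python
import re
from urllib.parse import parse_qs, urlparse

# Simple pattern for http(s) URLs in free text; an approximation, not RFC 3986-complete.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def parse_url(url: str) -> dict:
    """Split a URL into scheme, netloc, path, directories, queries and file name."""
    parts = urlparse(url)
    segments = [s for s in parts.path.split("/") if s]
    # Heuristic: the last segment is a file if it has a dot and no trailing slash follows.
    has_file = bool(segments) and "." in segments[-1] and not parts.path.endswith("/")
    return {
        "scheme": parts.scheme,
        "netloc": parts.netloc,
        "path": parts.path,
        "directories": segments[:-1] if has_file else segments,
        "queries": parse_qs(parts.query),
        "file": segments[-1] if has_file else "",
    }


def looks_like_valid_url(text: str) -> bool:
    """Accept strings with a known scheme and a network location."""
    parts = urlparse(text)
    return parts.scheme in ("http", "https", "ftp") and bool(parts.netloc)


def find_urls(text: str) -> list:
    """Extract http(s) URLs from text, trimming trailing sentence punctuation."""
    return [m.rstrip(".,;:!?)") for m in URL_PATTERN.findall(text)]
```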

user_agents

A dictionary of common browser user-agent strings for use with web requests.
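A user-agent dictionary is typically used by attaching one of its strings as the User-Agent header of a request. In the sketch below, the USER_AGENTS keys and strings are illustrative examples, not the module's actual entries, and build_request is a hypothetical helper.

```python
import urllib.request

# Illustrative entries only; the real dictionary's keys and strings may differ.
USER_AGENTS = {
    "firefox": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0",
    "chrome": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36"
    ),
}


def build_request(url: str, browser: str = "firefox") -> urllib.request.Request:
    """Create a request that identifies itself with the chosen browser's user-agent string."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENTS[browser]})
```

Passing the result to urllib.request.urlopen and calling read() returns the page as bytes, which is the kind of user-agent spoofing that get_page_bytes offers.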

Playwright Wrapper Module - playwrightw

This module was built in the early stages of the project, mainly for reference, and is not actively used or maintained. However, if you are interested in Playwright, it contains some useful usage examples for browser automation, element interaction, waiting strategies, and more.

If you still want to use it, you will need to install the following dependencies:

  • beautifulsoup4 — HTML parsing library.
  • playwright — Python bindings for Playwright.
  • pillow — image processing library.
  • Playwright browsers — the actual browser binaries used by Playwright.
pip install beautifulsoup4==4.14.3
pip install playwright==1.56.0
pip install pillow==12.2.0
playwright install

Note: Newer versions of these packages may work, but they have not been tested with this project.

License

Distributed under the MIT License. See LICENSE.txt for more information.

History

See History.md for the release history.



Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution


dkwebmod-1.0.2-py3-none-any.whl (36.3 kB)

Uploaded Python 3

File details

Details for the file dkwebmod-1.0.2-py3-none-any.whl.

File metadata

  • Download URL: dkwebmod-1.0.2-py3-none-any.whl
  • Upload date:
  • Size: 36.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for dkwebmod-1.0.2-py3-none-any.whl
Algorithm     Hash digest
SHA256        8d394fdf5462139936599e3ee4cd714fea81f971f46439114321c53b76a0ae0d
MD5           5b90e73cf0479b2252a741f8ff6d5f4c
BLAKE2b-256   da58c85e1c4ff61acbd2c13055c03c0cf0b7f1617bf020e65ff4f6aed3b7805f

