Help convert dynamic websites to static ones.
Project description
Geler
Help convert dynamic websites to static ones.
Install
pip install geler-CERTIC
Usage
As a library in your own program:
import logging

from geler import freeze

logger = logging.getLogger(__name__)

result = freeze("https://acme.tld/", "/path/to/local/dir/", thread_pool_size=1, http_get_timeout=30)
# Log every HTTP error encountered during the crawl.
for err in result.http_errors:
    logger.error(
        f'status {err.get("status_code")} on URL {err.get("url")}. Contents below:\n{err.get("content")}'
    )
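When a crawl produces many errors, grouping them by status code can make the log easier to scan. The sketch below assumes, as the loop above suggests, that each entry in result.http_errors behaves like a dict with "status_code" and "url" keys. The helper group_errors_by_status is hypothetical and not part of geler, and the sample data is made up for illustration.

```python
from collections import defaultdict


def group_errors_by_status(http_errors):
    """Map each HTTP status code to the list of URLs that returned it.

    Assumes each error is dict-like with "status_code" and "url" keys,
    as in the logging loop above. This helper is not part of geler.
    """
    grouped = defaultdict(list)
    for err in http_errors:
        grouped[err.get("status_code")].append(err.get("url"))
    return dict(grouped)


# Stand-in data shaped like result.http_errors (illustrative only).
sample_errors = [
    {"status_code": 404, "url": "https://acme.tld/missing", "content": ""},
    {"status_code": 500, "url": "https://acme.tld/broken", "content": ""},
    {"status_code": 404, "url": "https://acme.tld/gone", "content": ""},
]

for status, urls in sorted(group_errors_by_status(sample_errors).items()):
    print(f"{status}: {len(urls)} URL(s)")
```

In a real program, pass result.http_errors instead of sample_errors.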
As a CLI tool:
$> geler --help
usage: geler [-h] [-t THREAD_POOL_SIZE] [--http-get-timeout HTTP_GET_TIMEOUT] [-s SKIP_EXTENSIONS] [-v] start-from-url save-to-path

positional arguments:
  start-from-url        -
  save-to-path          -

optional arguments:
  -h, --help            show this help message and exit
  -t THREAD_POOL_SIZE, --thread-pool-size THREAD_POOL_SIZE
                        1
  --http-get-timeout HTTP_GET_TIMEOUT
                        30
  -s SKIP_EXTENSIONS, --skip-extensions SKIP_EXTENSIONS
                        -
  -v, --verbose         False

Thread pool size (--thread-pool-size) defaults to 1. Increase it to run multiple downloads in parallel.
HTTP GET timeout (--http-get-timeout) defaults to 30 seconds. This includes the time needed to download the file. Increase the number for a longer timeout, or set it to 0 for no timeout.
Skipped extensions (--skip-extensions) is a comma-separated list of file extensions that won't be downloaded.
Verbose mode (--verbose) shows downloaded URLs and HTTP errors.
Complete example:
geler --http-get-timeout 30 --thread-pool-size 10 --skip-extensions ".mp4,.zip" https://acme.tld/ /path/to/local/dir
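If you would rather drive the CLI from a Python script, for instance in a build step, the command line can be assembled programmatically. The sketch below uses only the flags listed in the help output above. The functions build_geler_command and run_geler are hypothetical helpers, not part of geler, and run_geler assumes the geler executable is installed and on your PATH.

```python
import subprocess


def build_geler_command(start_url, save_path, thread_pool_size=1,
                        http_get_timeout=30, skip_extensions=(), verbose=False):
    """Build the argv list for the geler CLI from the documented options."""
    cmd = [
        "geler",
        "--thread-pool-size", str(thread_pool_size),
        "--http-get-timeout", str(http_get_timeout),
    ]
    if skip_extensions:
        # The CLI expects a single comma-separated string, e.g. ".mp4,.zip".
        cmd += ["--skip-extensions", ",".join(skip_extensions)]
    if verbose:
        cmd.append("--verbose")
    cmd += [start_url, save_path]
    return cmd


def run_geler(*args, **kwargs):
    """Run the geler CLI; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(build_geler_command(*args, **kwargs), check=True)


# Same settings as the complete example above; nothing is executed here.
command = build_geler_command(
    "https://acme.tld/", "/path/to/local/dir",
    thread_pool_size=10, skip_extensions=[".mp4", ".zip"],
)
print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting issues with the extension list.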
Why?
For MaX and associated tools, we needed a lightweight, portable, pure Python solution to convert small dynamic websites to static ones.
Alternatives
This tool has a narrow scope, on purpose. Please turn to these solutions if you need more:
Known Limitations
- only works with HTTP GET
- does not submit forms (even with GET method)
- only considers URLs in src or href attributes
- only considers URLs with http or https schemes
- only downloads what is in the same netloc (same domain, same port) as the start URL
- only patches URLs in *.html files and *.css files, not *.js files (watch out for module imports)
- does not support URLs in <style></style> tags
- does not support URLs in style HTML attributes
- does not throttle requests
- does not respect robots.txt
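To make the URL-related limitations concrete, here is a minimal sketch of a filter that follows the same rules: it reads only src and href attributes, keeps only http and https URLs, and keeps only URLs whose netloc matches the start URL. It is an illustration built on the standard library, not geler's actual implementation. LinkCollector and crawlable_urls are hypothetical names, and the HTML snippet is made up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect raw values of src and href attributes from any tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value:
                self.links.append(value)


def crawlable_urls(page_url, html):
    """Return absolute URLs found in html that pass the scheme and netloc rules."""
    collector = LinkCollector()
    collector.feed(html)
    start = urlsplit(page_url)
    kept = []
    for raw in collector.links:
        absolute = urljoin(page_url, raw)  # resolve relative links
        parts = urlsplit(absolute)
        # Same netloc means same domain and same port as the start URL.
        if parts.scheme in ("http", "https") and parts.netloc == start.netloc:
            kept.append(absolute)
    return kept


html = """
<a href="/about.html">About</a>
<img src="img/logo.png">
<a href="https://other.tld/page">Elsewhere</a>
<a href="mailto:hello@acme.tld">Mail</a>
<a href="https://acme.tld:8080/admin">Other port</a>
<div style="background: url(/bg.png)"></div>
"""
found = crawlable_urls("https://acme.tld/", html)
print(found)
```

Only the first two links survive: the others point to another domain, use a non-HTTP scheme, use a different port, or sit in a style attribute that this filter never reads.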
Source Distribution
Built Distribution
File details
Details for the file geler_certic-0.2.8.tar.gz.
File metadata
- Download URL: geler_certic-0.2.8.tar.gz
- Upload date:
- Size: 5.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.0 CPython/3.9.4 Darwin/24.1.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 932c838b63fb65da4d3ee9768ff6523f0f5f13b0b15bcccd8be01c5b2acb759c
MD5 | d7d7fb25e62c1d5aac7ff43cbd512f8d
BLAKE2b-256 | 5e6fded1e6894a88713ea4e8975459688a42fac5bcd9325b8f5bcfaa0d3a54de
File details
Details for the file geler_certic-0.2.8-py3-none-any.whl.
File metadata
- Download URL: geler_certic-0.2.8-py3-none-any.whl
- Upload date:
- Size: 6.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.0 CPython/3.9.4 Darwin/24.1.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5f9d69cafd3d063858ab0fb2f3bb576ab8136d55e77c29aa8f40eaf8bd8e7e6c
MD5 | 81c5cc569f066abee6968cc8e03951f3
BLAKE2b-256 | 2d15fa4af054c7dd2a3b836ac95c91513235e2f5d03682d5b1de197d966f309a