Help convert dynamic websites to static ones.

Project description

Geler

Help convert dynamic websites to static ones.

Install

pip install geler-CERTIC

Usage

As a library in your own program:

import logging

from geler import freeze

logger = logging.getLogger(__name__)

result = freeze("https://acme.tld/", "/path/to/local/dir/", thread_pool_size=1, http_get_timeout=30)
# Report each failed request recorded during the crawl.
for err in result.http_errors:
    logger.error(
        f'status {err.get("status_code")} on URL {err.get("url")}. Contents below:\n{err.get("content")}'
    )

As a CLI tool:

$> geler --help
usage: geler [-h] [-t THREAD_POOL_SIZE] [--http-get-timeout HTTP_GET_TIMEOUT] [-s SKIP_EXTENSIONS] [-v] start-from-url save-to-path

positional arguments:
  start-from-url        -
  save-to-path          -

optional arguments:
  -h, --help            show this help message and exit
  -t THREAD_POOL_SIZE, --thread-pool-size THREAD_POOL_SIZE
                        1
  --http-get-timeout HTTP_GET_TIMEOUT
                        30
  -s SKIP_EXTENSIONS, --skip-extensions SKIP_EXTENSIONS
                        -
  -v, --verbose         False

Thread pool size (--thread-pool-size) defaults to 1. Increase the number to have multiple downloads in parallel.
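
As a rough illustration of what a pool size greater than 1 buys you, the sketch below uses Python's standard concurrent.futures.ThreadPoolExecutor. It is not Geler's actual implementation: fake_download and download_all are hypothetical names, and the download is simulated with a short sleep so the example runs offline.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(url: str) -> str:
    """Hypothetical stand-in for an HTTP GET: sleeps instead of using the network."""
    time.sleep(0.05)
    return f"contents of {url}"


def download_all(urls: list[str], thread_pool_size: int = 1) -> dict[str, str]:
    """Fetch every URL using at most `thread_pool_size` concurrent workers."""
    with ThreadPoolExecutor(max_workers=thread_pool_size) as pool:
        # pool.map preserves input order, so results line up with urls.
        return dict(zip(urls, pool.map(fake_download, urls)))
```

With a pool size of 1 the simulated downloads run one after another; with a larger pool, several of them wait on I/O at the same time, which is where the speed-up comes from.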

The HTTP GET timeout (--http-get-timeout) defaults to 30 seconds. This includes the time needed to download the file. Increase the number for a longer timeout, or set it to 0 for no timeout.
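
Many Python HTTP clients express "no timeout" as None rather than 0, so a tool that accepts 0 on the command line has to translate it somewhere. The sketch below shows one way to do that. normalize_timeout is a hypothetical helper, not part of Geler's API, and how Geler handles this internally is an assumption.

```python
from typing import Optional


def normalize_timeout(seconds: float) -> Optional[float]:
    """Map the CLI convention (0 = no timeout) to the common None = no timeout."""
    if seconds < 0:
        raise ValueError("timeout must be zero or positive")
    return None if seconds == 0 else seconds
```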

The list of skipped extensions (--skip-extensions) is a comma-separated list of file extensions that won't be downloaded.
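
To make the comma-separated format concrete, here is a minimal sketch of how such a list could be parsed and applied to URLs. The helper names parse_skip_extensions and should_skip are hypothetical, and the case-insensitive matching is an assumption rather than documented Geler behavior. The leading dot follows the ".mp4,.zip" form used in the complete example.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def parse_skip_extensions(raw: str) -> set[str]:
    """Turn '.mp4,.zip' into {'.mp4', '.zip'}, ignoring blanks and case."""
    return {ext.strip().lower() for ext in raw.split(",") if ext.strip()}


def should_skip(url: str, skip_extensions: set[str]) -> bool:
    """True when the URL path ends with one of the skipped extensions."""
    # Only the path is inspected, so query strings do not affect the match.
    suffix = PurePosixPath(urlsplit(url).path).suffix.lower()
    return suffix in skip_extensions
```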

Verbose mode (--verbose) will show downloaded URLs and HTTP errors.

Complete example:

geler --http-get-timeout 30 --thread-pool-size 10 --skip-extensions ".mp4,.zip" https://acme.tld/ /path/to/local/dir

Why?

For MaX and associated tools, we needed a lightweight, portable, pure Python solution to convert small dynamic websites to static ones.

Alternatives

This tool has a narrow scope, on purpose. Please turn to these solutions if you need more:

Known Limitations

  • only works with HTTP GET
  • does not submit forms (even with GET method)
  • only considers URLs in src or href attributes
  • only considers URLs with http or https schemes
  • only downloads what is in the same netloc (same domain, same port) as the start URL
  • only patches URLs in *.html files and *.css files, not *.js files (watch out for module imports)
  • does not support URLs in <style></style> tags
  • does not support URLs in style HTML attributes
  • does not throttle requests
  • does not respect robots.txt
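
Several of these limitations (src and href attributes only, http or https schemes only, same netloc only) can be pictured as a link filter. The sketch below uses only the standard library and is an illustration of the rules as listed, not Geler's actual code. LinkCollector and crawlable_links are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect raw values of src and href attributes only."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # style attributes and <style> contents are deliberately ignored.
        for name, value in attrs:
            if name in ("src", "href") and value:
                self.links.append(value)


def crawlable_links(html: str, page_url: str, start_url: str) -> list[str]:
    """Return absolute links that satisfy the scheme and netloc rules."""
    collector = LinkCollector()
    collector.feed(html)
    start_netloc = urlsplit(start_url).netloc
    kept = []
    for link in collector.links:
        absolute = urljoin(page_url, link)  # resolve relative links
        parts = urlsplit(absolute)
        # netloc includes the port, so acme.tld and acme.tld:8080 differ.
        if parts.scheme in ("http", "https") and parts.netloc == start_netloc:
            kept.append(absolute)
    return kept
```

Under these rules a mailto: link, a link to another domain, a link to the same host on a different port, and a url(...) inside a style attribute would all be left out of the crawl.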

Download files

Source Distribution

geler_certic-0.2.8.tar.gz (5.7 kB)

Uploaded Source

Built Distribution

geler_certic-0.2.8-py3-none-any.whl (6.6 kB)

Uploaded Python 3

File details

Details for the file geler_certic-0.2.8.tar.gz.

File metadata

  • Download URL: geler_certic-0.2.8.tar.gz
  • Upload date:
  • Size: 5.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.0 CPython/3.9.4 Darwin/24.1.0

File hashes

Hashes for geler_certic-0.2.8.tar.gz
Algorithm Hash digest
SHA256 932c838b63fb65da4d3ee9768ff6523f0f5f13b0b15bcccd8be01c5b2acb759c
MD5 d7d7fb25e62c1d5aac7ff43cbd512f8d
BLAKE2b-256 5e6fded1e6894a88713ea4e8975459688a42fac5bcd9325b8f5bcfaa0d3a54de

File details

Details for the file geler_certic-0.2.8-py3-none-any.whl.

File metadata

  • Download URL: geler_certic-0.2.8-py3-none-any.whl
  • Upload date:
  • Size: 6.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.0 CPython/3.9.4 Darwin/24.1.0

File hashes

Hashes for geler_certic-0.2.8-py3-none-any.whl
Algorithm Hash digest
SHA256 5f9d69cafd3d063858ab0fb2f3bb576ab8136d55e77c29aa8f40eaf8bd8e7e6c
MD5 81c5cc569f066abee6968cc8e03951f3
BLAKE2b-256 2d15fa4af054c7dd2a3b836ac95c91513235e2f5d03682d5b1de197d966f309a
