Help convert dynamic websites to static ones.

Project description

Geler

Help convert dynamic websites to static ones.

Install

pip install geler-CERTIC

Usage

As a library in your own program:

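import logging
from geler import freeze

logger = logging.getLogger(__name__)

# Download the site, starting at the given URL, into the local directory.
result = freeze("https://acme.tld/", "/path/to/local/dir/", thread_pool_size=1, http_get_timeout=30)
# Log every HTTP error encountered during the crawl.
for err in result.http_errors:
    logger.error(
        f'status {err.get("status_code")} on URL {err.get("url")}. Contents below:\n{err.get("content")}'
    )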
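
When a crawl produces many errors, a per-status summary can be easier to read than one log line per URL. The sketch below assumes only what the snippet above shows, namely that each entry of result.http_errors behaves like a dict with status_code and url keys. The helper name group_errors_by_status is hypothetical and not part of geler, and the sample data stands in for a real result.

```python
from collections import defaultdict


def group_errors_by_status(http_errors):
    """Group error entries by HTTP status code (hypothetical helper, not part of geler)."""
    grouped = defaultdict(list)
    for err in http_errors:
        grouped[err.get("status_code")].append(err.get("url"))
    return dict(grouped)


# Stand-in data shaped like the entries used in the snippet above.
sample_errors = [
    {"status_code": 404, "url": "https://acme.tld/missing", "content": ""},
    {"status_code": 500, "url": "https://acme.tld/broken", "content": ""},
    {"status_code": 404, "url": "https://acme.tld/gone", "content": ""},
]

summary = group_errors_by_status(sample_errors)
for status, urls in sorted(summary.items()):
    print(f"{status}: {len(urls)} URL(s)")
```

With a real crawl, result.http_errors would be passed in place of sample_errors.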

As a CLI tool:

$> geler --help
usage: geler [-h] [-t THREAD_POOL_SIZE] [--http-get-timeout HTTP_GET_TIMEOUT] [-s SKIP_EXTENSIONS] [-v] start-from-url save-to-path

positional arguments:
  start-from-url        -
  save-to-path          -

optional arguments:
  -h, --help            show this help message and exit
  -t THREAD_POOL_SIZE, --thread-pool-size THREAD_POOL_SIZE
                        1
  --http-get-timeout HTTP_GET_TIMEOUT
                        30
  -s SKIP_EXTENSIONS, --skip-extensions SKIP_EXTENSIONS
                        -
  -v, --verbose         False

Thread pool size (--thread-pool-size) defaults to 1. Increase the number to have multiple downloads in parallel.
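
To illustrate what a larger pool size means, here is a minimal sketch using Python's standard concurrent.futures module. It is not geler's actual implementation: fetch is a hypothetical stand-in that only simulates a download, and fetch_all is an illustrative name.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(url):
    """Hypothetical stand-in for an HTTP GET; sleeps instead of downloading."""
    time.sleep(0.05)
    return url, 200


def fetch_all(urls, thread_pool_size=1):
    """Fetch URLs with at most thread_pool_size downloads in flight at once."""
    with ThreadPoolExecutor(max_workers=thread_pool_size) as pool:
        # map() returns results in the same order as the input URLs.
        return list(pool.map(fetch, urls))


urls = [f"https://acme.tld/page{i}" for i in range(8)]
results = fetch_all(urls, thread_pool_size=4)
```

With a pool size of 1 the simulated downloads run one after another; with 4, up to four run concurrently. Since geler does not throttle requests, a large pool can put noticeable load on the target server.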

HTTP GET timeout (--http-get-timeout) defaults to 30 seconds. This includes the time needed to download the file. Increase the number to allow more time, or set it to 0 for no timeout.
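
The "0 means no timeout" convention can be sketched as follows, assuming the value is eventually handed to an HTTP client that expects None for "wait indefinitely" (the requests library behaves this way). Whether geler performs the mapping like this internally is an assumption, and effective_timeout is a hypothetical name.

```python
def effective_timeout(http_get_timeout):
    """Map the documented convention (0 = no timeout) to None; pass other values through."""
    return None if http_get_timeout == 0 else http_get_timeout


print(effective_timeout(30))
print(effective_timeout(0))
```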

The list of skipped extensions (--skip-extensions) is a comma-separated list of file extensions that won't be downloaded.
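
As a rough illustration of how such a list could be applied, the sketch below parses a comma-separated value and tests a URL against it. The function names are hypothetical, and the details (case-insensitive matching, ignoring the query string) are assumptions of this sketch, not documented geler behavior.

```python
from urllib.parse import urlparse


def parse_skip_extensions(raw):
    """Split a string such as ".mp4,.zip" into a set of lowercase extensions."""
    return {ext.strip().lower() for ext in raw.split(",") if ext.strip()}


def should_skip(url, skip_extensions):
    """Return True when the URL path ends with one of the skipped extensions."""
    path = urlparse(url).path.lower()
    return any(path.endswith(ext) for ext in skip_extensions)


skipped = parse_skip_extensions(".mp4,.zip")
print(should_skip("https://acme.tld/media/intro.mp4?x=1", skipped))  # True
print(should_skip("https://acme.tld/index.html", skipped))  # False
```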

Verbose mode (--verbose) will show downloaded URLs and HTTP errors.

Complete example:

geler --http-get-timeout 30 --thread-pool-size 10 --skip-extensions ".mp4,.zip" https://acme.tld/ /path/to/local/dir

Why?

For MaX and associated tools, we needed a lightweight, portable, pure Python solution to convert small dynamic websites to static ones.

Alternatives

This tool has a narrow scope, on purpose. Please turn to these solutions if you need more:

Known Limitations

  • only works with HTTP GET
  • does not submit forms (even with GET method)
  • only considers URLs in src or href attributes
  • only considers URLs with http or https schemes
  • only downloads what is in the same netloc (same domain, same port) as the start URL
  • only patches URLs in *.html files and *.css files, not *.js files (watch out for module imports)
  • does not support URLs in <style></style> tags
  • does not support URLs in style HTML attributes
  • does not throttle requests
  • does not respect robots.txt
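
Several of these limitations follow from how links are discovered and filtered. The sketch below shows one way to express the same rules with the standard library: collect only src and href attributes, then keep http/https URLs whose netloc matches the start URL. It is an illustration under those assumptions, not geler's code; LinkCollector and crawlable_links are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the raw values of src and href attributes only."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value:
                self.links.append(value)


def crawlable_links(html, page_url, start_url):
    """Return absolute http(s) URLs found in html that share start_url's netloc."""
    collector = LinkCollector()
    collector.feed(html)
    start_netloc = urlparse(start_url).netloc
    kept = []
    for raw in collector.links:
        absolute = urljoin(page_url, raw)  # resolve relative links against the page
        parts = urlparse(absolute)
        if parts.scheme in ("http", "https") and parts.netloc == start_netloc:
            kept.append(absolute)
    return kept


html = """
<a href="/about">About</a>
<img src="logo.png">
<a href="https://other.tld/">Elsewhere</a>
<a href="mailto:team@acme.tld">Mail</a>
<a href="https://acme.tld:8080/admin">Other port</a>
<div style="background: url(bg.png)"></div>
"""
links = crawlable_links(html, "https://acme.tld/docs/", "https://acme.tld/")
print(links)
```

In this sample only the first two links are kept: the other domain, the mailto: scheme and the different port are filtered out, and the URL inside the style attribute is never seen.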

Download files

Source Distribution

geler_certic-0.2.7.tar.gz (5.7 kB)

Uploaded Source

Built Distribution

geler_certic-0.2.7-py3-none-any.whl (6.6 kB)

Uploaded Python 3

File details

Details for the file geler_certic-0.2.7.tar.gz.

File metadata

  • Download URL: geler_certic-0.2.7.tar.gz
  • Size: 5.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.0 CPython/3.9.4 Darwin/24.1.0

File hashes

Hashes for geler_certic-0.2.7.tar.gz

Algorithm    Hash digest
SHA256       1e8ddd47299f2e4bf4e95caae0af53e0a2cc45fe99e93eb65b7e8ad21b8b77ca
MD5          3bd83e2299f88b7b55915faf5fdbfc39
BLAKE2b-256  cf7fbd7d9e53c71e62665e7ed3c232b570f18d41c4338e7104b5b562459354e2

File details

Details for the file geler_certic-0.2.7-py3-none-any.whl.

File metadata

  • Download URL: geler_certic-0.2.7-py3-none-any.whl
  • Size: 6.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.0 CPython/3.9.4 Darwin/24.1.0

File hashes

Hashes for geler_certic-0.2.7-py3-none-any.whl

Algorithm    Hash digest
SHA256       0ca349904220e5f4016f679c2a4d2b52c4699cbdf890345d1ff109b51e8231b3
MD5          bd2b5997e7d5617560eed472ebd0c6a1
BLAKE2b-256  7314bb52ebf9a4eb9a51e26dc1136bf665a12a1c265ea115519889a2ad8a18ce
