Offline website cloner, updater, and packager

Environment
- Console
License
- OSI Approved :: Apache Software License
Operating System
- OS Independent
Programming Language
- Python :: 3
- Python :: 3.8
Topic
- Internet :: WWW/HTTP
- Utilities

Project description

WebCloner

Clone, update, package & serve websites for offline use – all from one tiny Python script.

Made by Synthfax

Features

Command	What it does
clone	Recursively downloads a live site to a local folder and rewrites internal links.
run	Fires up a lightweight Flask web‑server that serves a cloned repo.
update	Refreshes an existing repo safely by cloning into a temp dir and syncing changes.
savewcof	Bundles an entire repo into a single `.wcof` archive (ZIP under the hood).
runwcof	Serves a `.wcof` file directly – no manual extraction required.

Additional niceties:

Progress bars via tqdm so you’re never in the dark.
Domain‑locked crawling – stays on the origin host.
Depth limiter so you don’t mirror the whole internet by accident.
Pure‑Python – works on Windows, macOS & Linux (incl. WSL & Termux).

Requirements

Python ≥ 3.8
The following PyPI packages (automatically pulled in by pip install):
- requests
- beautifulsoup4
- tqdm
- flask

Installation

🔌 One‑liner (recommended)

python -m pip install webcloner

(Replace python with python3 on some systems.)

🛠️ From source (for bleeding‑edge or hacking)

git clone https://github.com/yourname/webcloner.git
cd webcloner
python -m pip install -r requirements.txt
# Make the script globally available
python -m pip install .  # or `python -m pip install -e .` for editable mode

The installer drops a console entry‑point named webcloner into your PATH.

Quick Start

# 1. Mirror the site into ./offline_copy (max 2 levels deep)
webcloner clone https://example.com ./offline_copy --depth 2

# 2. Take a look in your browser
webcloner run ./offline_copy 8000  # -> http://localhost:8000

# 3. Package the repo into a single file you can email or stick on a USB drive
webcloner savewcof mysite.wcof . ./offline_copy

# 4. Hand the .wcof to a friend – they can serve it instantly:
webcloner runwcof mysite.wcof 8080

Detailed Command Guide

`clone`

webcloner clone <url> <output_dir> [--depth N]

url – starting page (must include protocol).
output_dir – destination folder (will be created if missing).
--depth – recursion limit (default 2). Set to 0 for only the start page.

Behind the scenes the crawler:

Downloads the page.
Parses the HTML with BeautifulSoup.
Rewrites internal links (href, src) to point at local paths.
Enqueues discovered same‑domain assets & pages until the depth limit.
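
The steps above can be sketched as a breadth-first loop. This is a minimal illustration, not WebCloner's actual code: it uses the standard library's `html.parser` as a stand-in for BeautifulSoup so it runs without extra packages, and `crawl`, `LinkCollector` and the `fetch` callable are hypothetical names. Saving files and rewriting links are left out.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collects href/src attribute values from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def crawl(start_url, fetch, max_depth=2):
    """Breadth-first crawl limited to the start URL's host.

    `fetch` is a hypothetical callable: url -> HTML text, or None when the
    URL cannot be fetched or is not HTML.
    Returns a dict mapping each discovered URL to its depth.
    """
    host = urlsplit(start_url).netloc
    seen = {start_url: 0}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None or depth >= max_depth:
            continue  # nothing to parse, or depth limit reached
        parser = LinkCollector()
        parser.feed(html)
        for link in parser.links:
            # Resolve relative links and drop any #fragment.
            target = urljoin(url, link).split("#", 1)[0]
            if urlsplit(target).netloc == host and target not in seen:
                seen[target] = depth + 1
                queue.append((target, depth + 1))
    return seen
```

With `max_depth=0` only the start page is visited, which matches the `--depth 0` behaviour described above. Passing a dictionary's `get` method as `fetch` is an easy way to try the loop without touching the network.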

`run`

webcloner run <repo_dir> <port> [--host 0.0.0.0]

Serves static files out of repo_dir using Flask. Perfect for quick checks or sharing over LAN.

`update`

webcloner update <url> <repo_dir> [--depth N]

Safely refreshes an existing repo:

Clones the live site into a temporary directory.
Compares modification times and copies newer/added files back.
Leaves untouched anything that the live site no longer has (in case you keep local notes).
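
The sync step could look roughly like the sketch below. It is an assumption about how such a merge might be written, not the project's implementation; `sync_newer` is a hypothetical name, `src_dir` stands for the temporary clone and `dest_dir` for the existing repo.

```python
import os
import shutil


def sync_newer(src_dir, dest_dir):
    """Copy files from src_dir that are missing from, or newer than, dest_dir.

    Files that exist only in dest_dir are left untouched.
    Returns the relative paths that were copied, sorted.
    """
    copied = []
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            src = os.path.join(root, name)
            rel = os.path.relpath(src, src_dir)
            dest = os.path.join(dest_dir, rel)
            if (not os.path.exists(dest)
                    or os.path.getmtime(src) > os.path.getmtime(dest)):
                os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
                shutil.copy2(src, dest)  # copy2 also preserves the mtime
                copied.append(rel)
    return sorted(copied)
```

Since nothing is ever deleted from `dest_dir`, local notes and pages that have vanished from the live site survive the update.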

`savewcof`

webcloner savewcof <filename.wcof> <dest_dir> <repo_dir>

Creates a zip‑compressed Web Cloner Offline File. Think of it as a self‑contained website in a single file.
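
Since a `.wcof` is described as a ZIP archive, the packaging step can be approximated with the standard `zipfile` module. The function below is a sketch under that assumption; `save_wcof` is a hypothetical name and its argument order simply mirrors the command line shown above.

```python
import os
import zipfile


def save_wcof(filename, dest_dir, repo_dir):
    """Zip every file under repo_dir into dest_dir/filename; return the archive path."""
    os.makedirs(dest_dir, exist_ok=True)
    archive_path = os.path.join(dest_dir, filename)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(repo_dir):
            for name in files:
                full = os.path.join(root, name)
                # Store paths relative to the repo root so the archive
                # extracts to the same folder structure.
                archive.write(full, os.path.relpath(full, repo_dir))
    return archive_path
```

This sketch assumes `dest_dir` lies outside `repo_dir`; otherwise the walk could pick up the archive while it is still being written.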

`runwcof`

webcloner runwcof <file.wcof> <port> [--host 0.0.0.0]

Extracts the archive to a temporary folder and launches the server – super handy for throw‑and‑go demos.
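
The extract-then-serve idea can be sketched with the standard library alone. Note that this swaps in `http.server` for Flask purely to keep the example dependency-free; `extract_wcof` and `make_server` are hypothetical helpers, not functions exposed by WebCloner.

```python
import functools
import http.server
import tempfile
import zipfile


def extract_wcof(archive_path):
    """Unpack a .wcof (ZIP) archive into a fresh temporary directory."""
    tmpdir = tempfile.TemporaryDirectory(prefix="wcof_")
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(tmpdir.name)
    return tmpdir  # caller runs tmpdir.cleanup() when finished


def make_server(directory, host="127.0.0.1", port=8080):
    """Build a static-file server rooted at `directory` (not started yet)."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.ThreadingHTTPServer((host, port), handler)
```

A caller would extract first, then run `make_server(tmpdir.name, port=8080).serve_forever()` and call `tmpdir.cleanup()` once the server stops. `SimpleHTTPRequestHandler` serves `index.html` automatically when a directory is requested.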

Typical Workflows

Archiving a Documentation Site

webcloner clone https://docs.oldsoftware.com ./docs --depth 3
webcloner savewcof docs_2025-06-25.wcof ./dist ./docs

Transfer the .wcof to any air‑gapped machine and serve:

webcloner runwcof docs_2025-06-25.wcof 7000

Keeping a Local Mirror Fresh

# Nightly cron job (Linux/macOS)
0 3 * * * webcloner update https://myblog.com /srv/mirrors/myblog --depth 2 >> /var/log/webcloner.log 2>&1

How It Works

URL Normalisation – Strips query/fragment, treats a bare path as /index.html.
Same‑Domain Filter – No cross‑site requests (stops runaway downloads).
Breadth‑first Crawl – Queue of (url, depth); avoids recursion stack blow‑ups.
HTML Re‑write – Converts each internal link to a relative filesystem path so that the site works off‑disk.
Asset Handling – Non‑HTML responses are stored verbatim (images, CSS, JS, etc.).
Packaging – A .wcof is just a ZIP with your folder structure – the magic is knowing to look for index.html when serving.
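
The URL handling described above can be illustrated with a few small helpers. These are a sketch only: the function names are hypothetical, and the rule that an empty path or a path ending in `/` maps to `index.html` is one reading of "bare path", not something confirmed by the source code.

```python
import posixpath
from urllib.parse import urlsplit


def local_path(url):
    """Map a URL to a repo-relative file path, ignoring query and fragment."""
    path = urlsplit(url).path
    if path == "" or path.endswith("/"):
        path += "index.html"  # assumption: directory-style URLs get index.html
    return path.lstrip("/")


def same_domain(url, origin):
    """True when both URLs share the same host (and port, if any)."""
    return urlsplit(url).netloc == urlsplit(origin).netloc


def relative_link(from_url, to_url):
    """Relative path from the saved page for from_url to the file for to_url."""
    from_dir = posixpath.dirname(local_path(from_url))
    return posixpath.relpath(local_path(to_url), start=from_dir or ".")
```

For example, a link from `https://example.com/docs/guide/intro.html` to `https://example.com/css/style.css?v=3` would be rewritten to `../../css/style.css`, which resolves correctly when the page is opened straight from disk.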

FAQ & Troubleshooting

Question	Answer
It’s downloading external CDNs!	Only same‑host links are followed, but CSS/JS may reference offsite assets. Consider using a CSS post‑processor or mirror those domains separately.
Pages show garbled characters	An `--encoding utf-8` option for forcing UTF‑8 decoding is planned but not yet available; in the meantime, please file an issue.
Can I clone sites that need login?	Currently no – but you can proxy the session by editing `cloner.py` to inject cookies into `requests.Session()`.
Is JavaScript executed?	No. This is a static grabber. SPA sites that build HTML client‑side will download, but you’ll only get the bare JS/JSON, not the rendered pages.

Contributing

Pull requests are welcome! If you spot a bug or have a feature idea:

Open an issue with steps to reproduce.
Fork & create a topic branch.
Run `black cloner.py && flake8` before pushing.
Submit a PR – CI will run unit tests automatically.

License

This project is licensed under the Apache License 2.0 – see LICENSE for full terms.

Release history

1.0.2 – Jun 25, 2025
1.0.1 – Jun 25, 2025 (this version)
1.0.0 – Jun 25, 2025

Download files

Source Distribution

webcloner-1.0.1.tar.gz (14.4 kB), uploaded Jun 25, 2025, Source

Built Distribution

webcloner-1.0.1-py3-none-any.whl (12.0 kB), uploaded Jun 25, 2025, Python 3

File details

Details for the file webcloner-1.0.1.tar.gz.

File metadata

Download URL: webcloner-1.0.1.tar.gz
Upload date: Jun 25, 2025
Size: 14.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.12.4

File hashes

Hashes for webcloner-1.0.1.tar.gz
Algorithm	Hash digest
SHA256	`515da721535acdd80a3f40a9ed66dcbf73c69f4a5ba6a30a52bdb74aa62add7c`
MD5	`68198659adbb91be232dbc8597e14173`
BLAKE2b-256	`52e11585956ea7aff36cf45125d3546dd6316ea47123e0fb496c478112af3ca6`

File details

Details for the file webcloner-1.0.1-py3-none-any.whl.

File metadata

Download URL: webcloner-1.0.1-py3-none-any.whl
Upload date: Jun 25, 2025
Size: 12.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.12.4

File hashes

Hashes for webcloner-1.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`92e7e458bad2804ee8b0ddf3fbf646c0b65a78296eeec4bad52f3d562a547d6b`
MD5	`8299d6fb76aad89bb96be046272f2770`
BLAKE2b-256	`1d5be5e42c7ab0018688ba2d0412d3e43e9841b028769c004eb84ac78284387b`