pywebarchive

Module for reading Apple's webarchive format

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

pywebarchive is software for reading Apple's webarchive format. It consists of two pieces:

- Webarchive Extractor converts webarchive files to standard pages you can open in any browser.
- The `webarchive` Python module is the code "under the hood" that makes the Extractor work. It's available for other applications to use, too.

pywebarchive is open-source software released under the permissive MIT License. Development stopped with version 0.5.2 because Apple appears to have deprecated the webarchive format.

Features

- Runs on Windows, macOS, and Linux
- Converts webarchive files to plain HTML
- Handles images, scripts, and style sheets
- Converted pages display just like they would in Safari (apart from normal cross-browser rendering differences)
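
To make "handles images, scripts, and style sheets" more concrete, here is a minimal sketch of one common way to fold an external resource into a standalone HTML page: rewriting its URL to a base64 `data:` URI. This is an illustration only. The helper names `to_data_uri` and `inline_resource` are hypothetical, and nothing here is taken from pywebarchive's own code, which may handle resources differently.

```python
import base64


def to_data_uri(data, mime_type):
    """Encode raw resource bytes as a base64 data: URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return "data:{};base64,{}".format(mime_type, encoded)


def inline_resource(html, url, data, mime_type):
    """Replace every occurrence of url in html with an inline data: URI.

    Hypothetical helper: a plain string replacement is enough for a
    sketch, but a real converter would need to parse the HTML.
    """
    return html.replace(url, to_data_uri(data, mime_type))


page = '<img src="https://example.com/dot.gif">'
print(inline_resource(page, "https://example.com/dot.gif", b"GIF89a", "image/gif"))
```

The trade-off with this approach is file size: base64 encoding makes each embedded resource roughly a third larger than the original bytes.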

Downloads

The final version is pywebarchive 0.5.2 (released September 24, 2023). See the changelog for what's new.

Note: If you're not reading this on GitHub, this section may be out of date. In that case, the latest version of pywebarchive is available at https://github.com/bmjcode/pywebarchive.

File	Size	Description
Webarchive.Extractor.exe	7.3 MB	Webarchive Extractor for 32-bit Windows
Webarchive.Extractor.x64.exe	8.0 MB	Webarchive Extractor for 64-bit Windows
pywebarchive-0.5.2.zip		source code (zip)
pywebarchive-0.5.2.tar.gz		source code (tar.gz)

The Windows version of Webarchive Extractor runs on Windows 7 and higher. It is a portable application -- it doesn't require installation, and won't write to Application Data or the Windows Registry.

On macOS and Linux (and Windows with Python installed), you can run Webarchive Extractor directly from the source code. Both command-line (`extractor.py`) and graphical (`extractor-gui.py`) versions are included.

If you're a Python developer, you can also install the `webarchive` module from PyPI using `pip install pywebarchive`. Note that the module you import is just `webarchive`, but the package you install is `pywebarchive`, because an unrelated project had already claimed the shorter package name.
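
Because the install name and the import name differ, it can help to check them separately. The following is a small sketch using only the standard library (Python 3.8 or later for `importlib.metadata`). The helper names `installed_version` and `module_available` are made up for this example and are not part of pywebarchive.

```python
import importlib.util
from importlib import metadata

DIST_NAME = "pywebarchive"   # the name you pass to pip
MODULE_NAME = "webarchive"   # the name you use with import


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def module_available(module_name):
    """Return True if a top-level module with this name can be imported."""
    return importlib.util.find_spec(module_name) is not None


print("distribution:", DIST_NAME, installed_version(DIST_NAME))
print("module importable:", MODULE_NAME, module_available(MODULE_NAME))
```

If the module is importable but the distribution version comes back as `None`, the `webarchive` module on your path may belong to the unrelated project rather than to pywebarchive.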

Requirements

- Python 3
- Tkinter (only required by `extractor-gui.py`)
- userpaths (optional; used by `extractor-gui.py` if available)
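
An optional dependency like userpaths is typically handled with a guarded import and a fallback. The sketch below shows that pattern. The function `default_start_directory` is hypothetical, and how `extractor-gui.py` actually uses userpaths is an assumption that is not described here.

```python
import os

try:
    import userpaths  # optional; only used if it happens to be installed
except ImportError:
    userpaths = None


def default_start_directory():
    """Pick a starting folder for a file dialog.

    Assumption for illustration: prefer the user's documents folder when
    userpaths is available, otherwise fall back to the home directory.
    """
    if userpaths is not None:
        return userpaths.get_my_documents()
    return os.path.expanduser("~")


print(default_start_directory())
```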

More information

Webarchive is the default format for the "Save As" command in Apple's Safari browser. (Other Apple software also uses it internally for various purposes.) Its main advantage is that it can save all the content on a webpage -- including external media like images, scripts, and style sheets -- in a single file. However, the webarchive format is proprietary and not publicly documented, and most other browsers cannot open webarchive files. pywebarchive solves this by converting webarchive files to standard HTML pages, which can be opened in any browser or editor.
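
As a rough illustration of how one file can hold a page together with its external media, the sketch below reads a webarchive-like structure using Python's standard `plistlib`. It assumes the layout commonly observed in Safari's files: a binary property list with a `WebMainResource` entry and a `WebSubresources` list, where each resource carries `WebResourceURL`, `WebResourceMIMEType`, and `WebResourceData`. Those key names are assumptions not stated in this document, the sample data is synthetic, and `list_resources` is a hypothetical helper rather than pywebarchive's API.

```python
import plistlib


def list_resources(archive_bytes):
    """Return (url, mime_type, size) for the main resource and subresources."""
    plist = plistlib.loads(archive_bytes)  # format is detected automatically
    resources = [plist["WebMainResource"]]
    resources.extend(plist.get("WebSubresources", []))
    return [
        (res["WebResourceURL"], res["WebResourceMIMEType"], len(res["WebResourceData"]))
        for res in resources
    ]


# Build a tiny synthetic archive in memory so the sketch runs without a real file.
sample = plistlib.dumps(
    {
        "WebMainResource": {
            "WebResourceURL": "https://example.com/",
            "WebResourceMIMEType": "text/html",
            "WebResourceTextEncodingName": "UTF-8",
            "WebResourceData": b"<html><body>Hello</body></html>",
        },
        "WebSubresources": [
            {
                "WebResourceURL": "https://example.com/style.css",
                "WebResourceMIMEType": "text/css",
                "WebResourceData": b"body { margin: 0 }",
            }
        ],
    },
    fmt=plistlib.FMT_BINARY,
)

for url, mime_type, size in list_resources(sample):
    print(url, mime_type, size)
```

For a real file, you would pass the file's raw bytes to `list_resources` instead of the synthetic sample.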

The name "pywebarchive" simply reflects that this is webarchive-handling software written in the Python programming language.

pywebarchive follows the Unix philosophy of "do one thing and do it well". With that in mind, pywebarchive deliberately omits all features unrelated to its purpose of converting webarchive files so other browsers can open them. In particular, pywebarchive does not support writing webarchive files, and there are no plans to add this in a future release.

pywebarchive's internals are fairly well-documented. The code includes extensive comments explaining how it works and why it does various things the way it does. In addition, pywebarchive features dozens of unit tests to ensure the code actually does what we think it does, which is further confirmed by manual testing before each release.

References

Apple Developer Documentation:
- WebArchive class
- WebResource class


Release history

Version	Release date
0.5.2	Sep 24, 2023
0.5.1	Oct 8, 2022
0.5.0	Apr 16, 2022
0.4.1	Mar 26, 2022
0.4.0	Mar 26, 2022
0.3.3	Nov 6, 2021
0.3.2	Sep 26, 2021
0.3.1	Sep 25, 2021
0.3.0	Jul 18, 2021
0.2.4	Feb 22, 2020
0.2.3	Sep 2, 2019
0.2.2	Oct 21, 2018
0.2.1	Oct 20, 2018
0.1.1	Oct 20, 2018
0.1.0	Oct 16, 2018

Download files

Source Distribution

pywebarchive-0.5.2.tar.gz (19.5 kB)

Uploaded Sep 24, 2023 Source

Built Distribution

pywebarchive-0.5.2-py3-none-any.whl (22.7 kB)

Uploaded Sep 24, 2023 Python 3

File details

Details for the file pywebarchive-0.5.2.tar.gz.

File metadata

Download URL: pywebarchive-0.5.2.tar.gz
Upload date: Sep 24, 2023
Size: 19.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.9 tqdm/4.63.1 importlib-metadata/4.8.3 keyring/23.4.1 rfc3986/1.5.0 colorama/0.4.4 CPython/3.6.8

File hashes

Hashes for pywebarchive-0.5.2.tar.gz
Algorithm	Hash digest
SHA256	`07fa4a0ce82fdd7b03c9378fe76edb6dc08b5e1898ca9105b7ff063246d60e07`
MD5	`1a0cf8ef270a8c8df524265a73383689`
BLAKE2b-256	`da27653b07c966fa9de1cba2325106b8b0b70885f107b5725b5c09ded06d8ffb`
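
A downloaded file can be checked against the digests listed above with Python's standard `hashlib`. This is a minimal sketch: the helper names `file_digest` and `matches` are made up for this example, and the local file path in the usage note is an assumption about where you saved the download.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Hash a file in chunks so large downloads need not fit in memory."""
    if algorithm == "blake2b-256":
        # PyPI's BLAKE2b-256 is BLAKE2b with a 32-byte digest
        hasher = hashlib.blake2b(digest_size=32)
    else:
        hasher = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches(path, expected_hex, algorithm="sha256"):
    """Compare a file's digest to a published value, ignoring letter case."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

For example, assuming the source distribution was saved in the current directory, `matches("pywebarchive-0.5.2.tar.gz", "07fa4a0ce82fdd7b03c9378fe76edb6dc08b5e1898ca9105b7ff063246d60e07")` should return `True` for an intact download.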

File details

Details for the file pywebarchive-0.5.2-py3-none-any.whl.

File metadata

Download URL: pywebarchive-0.5.2-py3-none-any.whl
Upload date: Sep 24, 2023
Size: 22.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.9 tqdm/4.63.1 importlib-metadata/4.8.3 keyring/23.4.1 rfc3986/1.5.0 colorama/0.4.4 CPython/3.6.8

File hashes

Hashes for pywebarchive-0.5.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`fec8cd9f5d74fd2b67e5daa62778744484ae62ba397d6a2b4e7a2a1e4dc16eba`
MD5	`8e3961c867d4eefa62bf1ab748e485e6`
BLAKE2b-256	`ef61e3f79dd3274c34e15107dfece693d4a21a5cdae20f68b37085fb044fc05a`
