A powerful CLI tool to download and archive historical versions of websites from the Wayback Machine.

Wayback Downloader

Wayback Downloader is a command-line tool for retrieving and archiving historical versions of websites from the Internet Archive's Wayback Machine. Written in Python, it captures and preserves web content across time, which makes it useful for researchers, developers, and digital archivists.

Key Features:

  1. Efficient Retrieval: Quickly download multiple snapshots of a website within a specified date range.
  2. Selective Archiving: Save only unique content, avoiding duplicate snapshots to conserve storage space.
  3. Recursive Crawling: Automatically discover and download linked pages within the same domain.
  4. Flexible Date Range: Specify custom start and end dates for targeted historical content retrieval.
  5. Robust Error Handling: Implements retry mechanisms and comprehensive error reporting for reliable operation.
  6. User-Friendly CLI: Simple command-line interface for easy integration into workflows and scripts.
  7. Customizable Output: Option to specify the output directory for downloaded archives.
  8. Verbose Logging: Detailed progress and diagnostic information available with verbose mode.
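
The retrieval, deduplication, and crawling features above can be illustrated with a short sketch. It is not taken from Wayback Downloader's source: it assumes the tool works against the Wayback Machine's public CDX API, and the helper names (`build_cdx_query`, `unique_snapshots`, `same_domain_links`) are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin, urlparse

# Public CDX endpoint of the Wayback Machine (assumed here, not confirmed
# to be what Wayback Downloader itself uses).
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(url, start_date, end_date):
    """Build a CDX query URL for captures of `url` between two YYYYMMDD dates."""
    params = {
        "url": url,
        "from": start_date,
        "to": end_date,
        "output": "json",  # JSON rows; the first row holds the field names
        "fl": "timestamp,original,digest",
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def unique_snapshots(rows):
    """Keep the first capture per content digest, dropping later duplicates."""
    seen = set()
    unique = []
    for timestamp, original, digest in rows:
        if digest not in seen:
            seen.add(digest)
            unique.append((timestamp, original, digest))
    return unique


class _LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_domain_links(page_url, html):
    """Return absolute links in `html` that stay on the host of `page_url`."""
    collector = _LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    links = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)
        if urlparse(absolute).netloc == host and absolute not in links:
            links.append(absolute)
    return links
```

A crawler built this way would fetch the query URL, skip the header row, pass the remaining rows through `unique_snapshots`, and queue the results of `same_domain_links` for each downloaded page.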

Wayback Downloader simplifies access to web history, whether you want to study how a website evolved, recover lost content, or build a web archive. It suits academic research, due diligence, and plain curiosity about the past state of the web.

Get started with Wayback Downloader today and unlock the power of web history at your fingertips!

Installation

You can install Wayback Downloader using pip:

pip install wayback_downloader

Usage

After installation, you can use Wayback Downloader from the command line:

wayback-downloader [URL] [START_DATE] [END_DATE] [-o OUTPUT_DIR] [-v]

For example:

wayback-downloader http://example.com 20200101 20230101 -o /path/to/output -v

This downloads archived snapshots of example.com captured between January 1, 2020 and January 1, 2023, saves them to the specified output directory, and prints verbose output. Dates are written as YYYYMMDD, as in the example.
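
To call the tool from a Python script, the command line above can be assembled and run with `subprocess`. This is a minimal sketch: `build_command` and `run_download` are hypothetical helper names, the date check only mirrors the YYYYMMDD format seen in the example, and it assumes (without confirmation from the project) that the CLI exits with a non-zero status on failure.

```python
import subprocess
from datetime import datetime


def build_command(url, start_date, end_date, output_dir=None, verbose=False):
    """Assemble the argument list for the usage line shown above."""
    for value in (start_date, end_date):
        if len(value) != 8 or not value.isdigit():
            raise ValueError(f"expected YYYYMMDD, got {value!r}")
        # Raises ValueError if the digits are not a real calendar date.
        datetime.strptime(value, "%Y%m%d")
    if start_date > end_date:
        raise ValueError("start date must not be after end date")
    command = ["wayback-downloader", url, start_date, end_date]
    if output_dir is not None:
        command += ["-o", output_dir]
    if verbose:
        command.append("-v")
    return command


def run_download(url, start_date, end_date, output_dir=None, verbose=False):
    """Run the CLI; check=True raises CalledProcessError on a non-zero exit."""
    command = build_command(url, start_date, end_date, output_dir, verbose)
    return subprocess.run(command, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with URLs that contain characters such as `&` or `?`.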

For more information on available options:

wayback-downloader --help

License

This project is licensed under the MIT License - see the LICENSE file for details.

Download files

Source Distribution

wayback_downloader-0.1.2.tar.gz (4.7 kB, Source)

Built Distribution

wayback_downloader-0.1.2-py3-none-any.whl (4.6 kB, Python 3)

File details

Details for the file wayback_downloader-0.1.2.tar.gz.

File metadata

  • Download URL: wayback_downloader-0.1.2.tar.gz
  • Size: 4.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.4

File hashes

Hashes for wayback_downloader-0.1.2.tar.gz
Algorithm    Hash digest
SHA256       7510c5936905b7ad60ce33c06d8d7f9bae318ba42fbe763d182fee12b26774e6
MD5          62a5cee31244539eb097054a9292d4a2
BLAKE2b-256  9d659c0ec5575a923597784c37bf26ce46c428864c0cf88f5fc0301d5360c79d
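
A downloaded file can be checked against the SHA256 digest listed above with Python's standard `hashlib` module. The sketch below is illustrative; the function names and the local file path in the usage note are assumptions.

```python
import hashlib

# SHA256 digest published above for wayback_downloader-0.1.2.tar.gz.
EXPECTED_SHA256 = "7510c5936905b7ad60ce33c06d8d7f9bae318ba42fbe763d182fee12b26774e6"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_published_hash("wayback_downloader-0.1.2.tar.gz")` would return `True` only if the local copy is byte-for-byte identical to the file the digest was computed from.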

File details

Details for the file wayback_downloader-0.1.2-py3-none-any.whl.

File hashes

Hashes for wayback_downloader-0.1.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       ea9c8bdefc97953db349b88425740c0ddba1711f8b3c0eebed3560a2edcc87a9
MD5          dc942f89829eb0befe5bf6c5b2d48087
BLAKE2b-256  dfaa39988ea82d6f1074f4b68e2355d68822d2655f14f9ca16a3048febcf7824
