Download an entire website from the Internet Archive Wayback Machine.

Maintainers

siarhei-m

Wayback Machine Downloader

Installation

You need Python >= 3.11 installed on your system.

From PyPI

pip install wayback-dl

From source (global CLI)

To install from a local clone so wayback-dl is available system-wide:

uv tool install /path/to/wayback-dl

Or with pip:

pip install /path/to/wayback-dl

For development

uv venv && source .venv/bin/activate
uv pip install -e ".[dev]"

Basic Usage

Run wayback-dl with the base URL of the website you want to retrieve as a parameter (e.g., http://example.com):

wayback-dl http://example.com

How it works

It will download the latest version of every file present on the Wayback Machine to ./websites/example.com/. It will also re-create the directory structure and auto-create index.html pages so the result works seamlessly with Apache and Nginx. All downloaded files are the originals, not the versions rewritten by the Wayback Machine, so the URL and link structure is the same as before.
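The two ideas in the paragraph above (keep the newest snapshot of each file, and store directory-style URLs as index.html) can be sketched in a few lines of Python. This is a minimal illustration, not wayback-dl's actual code: the function names latest_snapshots and local_path are hypothetical, and the rule that only an empty path or a path ending in "/" becomes index.html is an assumption.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def latest_snapshots(rows):
    """Keep only the newest timestamp seen for each URL.

    Assumes full 14-digit Wayback timestamps, which sort
    chronologically when compared as plain strings.
    """
    latest = {}
    for timestamp, url in rows:
        if url not in latest or timestamp > latest[url]:
            latest[url] = timestamp
    return latest


def local_path(url, root="websites"):
    """Map an archived URL to a path under root/<domain>/.

    Assumption: an empty path or a path ending in "/" is saved
    as index.html inside the matching directory.
    """
    parts = urlsplit(url)
    path = parts.path
    if path == "" or path.endswith("/"):
        path += "index.html"
    return str(PurePosixPath(root, parts.netloc, path.lstrip("/")))
```

With these definitions, local_path("http://example.com/blog/") gives websites/example.com/blog/index.html, which is the kind of layout a static web server can serve directly.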

The tool does not parse HTML or crawl pages. Instead, it queries the Wayback Machine CDX API to get a complete index of all archived files (HTML, JS, CSS, images, fonts, etc.) under a domain.
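To make the CDX-based approach concrete, here is a rough sketch of how such a query and the matching original-file URL might be built. It only constructs URLs and performs no network access. The parameter choices (fl, collapse, filter, page) are documented CDX server options, but whether wayback-dl uses exactly these is an assumption, and the function names are hypothetical. The id_ suffix after the timestamp is the Wayback Machine convention for requesting the unmodified capture.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(domain, page=0, only_ok=True):
    """Build a CDX index query for every capture under a domain."""
    params = [
        ("url", f"{domain}/*"),        # everything below the domain
        ("output", "json"),
        ("fl", "timestamp,original"),  # fields to return per capture
        ("collapse", "digest"),        # skip adjacent identical captures
        ("page", str(page)),
    ]
    if only_ok:
        # Assumed default: keep only captures that returned 200 OK.
        params.append(("filter", "statuscode:200"))
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def raw_snapshot_url(timestamp, original_url):
    """URL of the capture as originally served, without rewriting."""
    return f"https://web.archive.org/web/{timestamp}id_/{original_url}"
```

Because the index already lists every archived file, no HTML parsing is needed to discover assets such as images or fonts.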

Advanced Usage

Usage: wayback-dl [OPTIONS] [URL]

Download an entire website from the Wayback Machine.

Options:
    -d, --directory PATH             Directory to save the downloaded files into
                                     Default is ./websites/ plus the domain name
    -A, --all-timestamps             Download all snapshots/timestamps for a given website
    -f, --from DATE                  Only files on or after date (e.g., 2006-07-16 or 20060716231334)
    -t, --to DATE                    Only files on or before date (e.g., 2010-09-16 or 20100916231334)
    -e, --exact-url                  Download only the url provided and not the full site
    -o, --only ONLY_FILTER           Restrict downloading to urls that match this filter
                                     (use // notation for the filter to be treated as a regex)
    -x, --exclude EXCLUDE_FILTER     Skip downloading of urls that match this filter
                                     (use // notation for the filter to be treated as a regex)
    -a, --all                        Expand downloading to error files (40x and 50x) and redirections (30x)
    -c, --concurrency NUMBER         Number of concurrent downloads (default: 1)
    -p, --max-pages NUMBER           Maximum snapshot pages to consider (default: 100)
    -l, --list                       Only list file urls as JSON, don't download
    -s, --session ID                 Resume a previous download session by ID
    -S, --list-sessions              List all active/interrupted download sessions
    -r, --redo                       Force re-download all files, ignoring previous progress
    -v, --verbose                    Enable verbose/debug logging
    -V, --version                    Show version and exit
    -h, --help                       Show help and exit

Specify directory to save files to

-d, --directory PATH

Optional. By default, wayback-dl downloads files to ./websites/ followed by the domain name of the website. Use this option to save files to a different directory.

Example:

wayback-dl http://example.com --directory downloaded-backup/

All Timestamps

-A, --all-timestamps

Optional. Download all timestamps/snapshots for a given website. The timestamp of each snapshot is used as the directory name.

Example:

wayback-dl http://example.com --all-timestamps

Will download:
	websites/example.com/20060715085250/index.html
	websites/example.com/20051120005053/index.html
	websites/example.com/20060111095815/img/logo.png
	...

From Date

-f, --from DATE

Optional. Only download files archived on or after the specified date. Accepts ISO 8601 format (2006-07-16, 2006-07-16T23:13:34) or raw Wayback Machine timestamps (20060716231334, 2006, 200607). Can be combined with --to.

Examples:

wayback-dl http://example.com --from 2006-07-16
wayback-dl http://example.com --from 2006-07-16T23:13:34
wayback-dl http://example.com --from 20060716231334
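The accepted formats all reduce to the digit-only timestamps the Wayback Machine uses. A minimal sketch of that normalization follows; the function name to_wayback_timestamp is hypothetical, and the real parser may accept more variants or validate calendar dates, which this one does not.

```python
import re


def to_wayback_timestamp(value):
    """Convert an ISO 8601 date or datetime to Wayback digits.

    Raw digit strings (4 to 14 digits, e.g. 2006 or 200607)
    are passed through unchanged.
    """
    if re.fullmatch(r"\d{4,14}", value):
        return value
    match = re.fullmatch(
        r"(\d{4})-(\d{2})-(\d{2})(?:T(\d{2}):(\d{2}):(\d{2}))?", value
    )
    if match is None:
        raise ValueError(f"unrecognised date: {value!r}")
    # Drop the time groups when only a date was given.
    return "".join(group for group in match.groups() if group is not None)
```

A date without a time yields an 8-digit prefix (2006-07-16 becomes 20060716), which works as a range bound in the same way as the partial timestamps 2006 and 200607.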

To Date

-t, --to DATE

Optional. Only download files archived on or before the specified date. Same format as --from. Can be combined with --from.

Examples:

wayback-dl http://example.com --to 2010-09-16
wayback-dl http://example.com --from 2006-01-01 --to 2010-12-31

Exact URL

-e, --exact-url

Optional. Use this flag to retrieve only the file that exactly matches the URL provided. Nothing else will be downloaded.

For example, to download only the HTML homepage of example.com:

wayback-dl http://example.com --exact-url

Only URL Filter

-o, --only ONLY_FILTER

Optional. You may want to retrieve only files of a certain type (e.g., .pdf, .jpg, .wrd...) or files in a specific directory. To do so, supply the --only flag with a string or a regex (using the '/regex/' notation) to limit which files wayback-dl will download.

For example, if you only want to download files inside a specific directory named my_directory:

wayback-dl http://example.com --only my_directory

Or if you want to download only images and nothing else:

wayback-dl http://example.com --only "/\.(gif|jpg|jpeg)$/i"

Exclude URL Filter

-x, --exclude EXCLUDE_FILTER

Optional. You may want to skip files of a certain type (e.g., .pdf, .jpg, .wrd...) or files in a specific directory. To do so, supply the --exclude flag with a string or a regex (using the '/regex/' notation) to specify which files wayback-dl should not download.

For example, if you want to avoid downloading files inside my_directory:

wayback-dl http://example.com --exclude my_directory

Or if you want to download everything except images:

wayback-dl http://example.com --exclude "/\.(gif|jpg|jpeg)$/i"
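The --only and --exclude filters share the same notation: a plain string, or a regex wrapped in slashes with an optional trailing i flag. One plausible way to interpret that notation is sketched here. The names compile_filter and keep are hypothetical, and treating a plain string as a case-insensitive substring match is an assumption about the tool's behavior.

```python
import re


def compile_filter(spec):
    """Turn a filter string into a predicate over URLs."""
    match = re.fullmatch(r"/(.*)/([a-z]*)", spec, flags=re.DOTALL)
    if match:
        # "/pattern/flags" notation: treat the body as a regex.
        pattern, flag_chars = match.groups()
        flags = re.IGNORECASE if "i" in flag_chars else 0
        regex = re.compile(pattern, flags)
        return lambda url: regex.search(url) is not None
    # Assumption: plain strings match as case-insensitive substrings.
    needle = spec.lower()
    return lambda url: needle in url.lower()


def keep(url, only=None, exclude=None):
    """Apply an optional --only filter, then an optional --exclude filter."""
    if only is not None and not compile_filter(only)(url):
        return False
    if exclude is not None and compile_filter(exclude)(url):
        return False
    return True
```

Note that in this sketch a plain filter that itself starts with a slash and contains a second one (such as /a/b) would be read as a regex, so a real implementation may need a stricter rule.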

Expand downloading to all file types

-a, --all

Optional. By default, wayback-dl limits itself to files that responded with a 200 OK code. If you also need error files (40x and 50x codes) or redirection files (30x codes), use the --all or -a flag and wayback-dl will download them in addition to the 200 OK files. It will also keep empty files, which are removed by default.

Example:

wayback-dl http://example.com --all

Only list files without downloading

-l, --list

Optional. Only display the files that would be downloaded, with their snapshot timestamps, URLs, MIME types, and sizes. The output format is JSON. Nothing is downloaded. This is useful for debugging or for feeding another application.

Example:

wayback-dl http://example.com --list

Maximum number of snapshot pages to consider

-p, --max-pages NUMBER

Optional. Specify the maximum number of snapshot pages to consider. Each page holds about 150,000 snapshots on average. The default maximum is 100 pages, which should be sufficient for most websites. Use a larger number if you want to download a very large website.

Example:

wayback-dl http://example.com --max-pages 300
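As a rough guide to picking a value, the arithmetic behind the page limit can be written out. The 150,000 figure is the average quoted above, not a hard limit, and both helper names are hypothetical.

```python
SNAPSHOTS_PER_PAGE = 150_000  # average quoted above, not a hard limit


def snapshot_budget(max_pages):
    """Approximate number of snapshots covered by max_pages pages."""
    return max_pages * SNAPSHOTS_PER_PAGE


def pages_needed(snapshot_count):
    """Smallest page count that covers snapshot_count (ceiling division)."""
    return -(-snapshot_count // SNAPSHOTS_PER_PAGE)
```

Under that average, the default of 100 pages covers roughly 15 million snapshots, and --max-pages 300 roughly 45 million.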

Download multiple files at a time

-c, --concurrency NUMBER

Optional. Specify the number of files to download at the same time. This can speed up the download of a website significantly. The default is to download one file at a time. Async I/O is used for efficient concurrent downloads, and the progress display shows all active downloads.

Example:

wayback-dl http://example.com --concurrency 20
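A common way to cap concurrent async downloads is an asyncio.Semaphore. The sketch below shows that general pattern only; it is not taken from wayback-dl's source. download_all is a hypothetical name, and fake_fetch is a stand-in that sleeps instead of making an HTTP request.

```python
import asyncio


async def download_all(urls, fetch, concurrency=1):
    """Run fetch(url) for every URL, at most `concurrency` at a time."""
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded(url):
        # Wait here until one of the `concurrency` slots is free.
        async with semaphore:
            return await fetch(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(url) for url in urls))


async def fake_fetch(url):
    """Stand-in for a real HTTP download (hypothetical)."""
    await asyncio.sleep(0.01)
    return f"fetched {url}"
```

For example, asyncio.run(download_all(urls, fake_fetch, concurrency=20)) would keep up to 20 simulated downloads in flight. Very high values may run into rate limiting on the server side.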

Resume interrupted downloads

Downloads are automatically resumable. Each download creates a session that tracks per-file completion status. If interrupted (Ctrl+C or crash), the tool prints a resume command:

Aborted. 45/168 files downloaded.

To resume:
  wayback-dl -s 1709571234

To list all active/interrupted sessions:

wayback-dl --list-sessions

Sessions are stored in ~/.wayback_dl/sessions/ and are automatically cleaned up after successful completion. Multiple downloads of different domains can run in parallel without conflicts.
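To illustrate what tracking per-file completion status can look like, here is a deliberately simplified sketch. The on-disk format used by wayback-dl is not documented here, so the JSON layout, the Session class, and its method names are all hypothetical.

```python
import json
from pathlib import Path


class Session:
    """Hypothetical record of which URLs have already been downloaded."""

    def __init__(self, path):
        self.path = Path(path)
        self.done = set()
        if self.path.exists():
            # Resuming: reload the URLs completed by an earlier run.
            self.done = set(json.loads(self.path.read_text())["done"])

    def mark_done(self, url):
        """Record a finished file and persist the state immediately."""
        self.done.add(url)
        self.path.write_text(json.dumps({"done": sorted(self.done)}))

    def pending(self, urls):
        """Return the URLs that still need to be downloaded, in order."""
        return [url for url in urls if url not in self.done]
```

Persisting after every file means an interruption loses at most the download that was in progress, and a resumed run only has to fetch what pending() returns.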

Force re-download

-r, --redo

Force re-download all files, ignoring any previous session progress. Useful when you want a fresh copy.

wayback-dl http://example.com --redo

Verbose output

-v, --verbose

Show detailed output including CDX API requests, response timing, per-file download status, and file type breakdown.

wayback-dl http://example.com --verbose

Using the Docker image

As an alternative to installing locally, build the Docker image:

docker build -t wayback-dl .

Then, you should be able to use the Docker image to download websites. For example:

docker run --rm -it -v "$PWD/websites:/websites" wayback-dl http://example.com

Contributing

Contributions are welcome! Just submit a pull request via GitHub.

To run the tests:

uv venv && source .venv/bin/activate
uv pip install -e ".[dev]"
pytest tests/ -v

Project details

Release history

This version

1.0.0

Mar 5, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

wayback_dl-1.0.0.tar.gz (29.9 kB)

Uploaded Mar 5, 2026 Source

Built Distribution

wayback_dl-1.0.0-py3-none-any.whl (28.6 kB)

Uploaded Mar 5, 2026 Python 3

File details

Details for the file wayback_dl-1.0.0.tar.gz.

File metadata

Download URL: wayback_dl-1.0.0.tar.gz
Upload date: Mar 5, 2026
Size: 29.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for wayback_dl-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`5e3d05d49f611390e61056e98c3df4162b6e66f5596d75e0b5c5f45a8fb68dbc`
MD5	`852d0c4640163b0f06fe63e99f455db8`
BLAKE2b-256	`2a63e5ab0b1eb8e0f9dffa1c791930481bcc8b015b4fc2f44f48d9fe408bcd69`

Provenance

The following attestation bundles were made for wayback_dl-1.0.0.tar.gz:

Publisher: publish.yml on siarhei-m/wayback-dl

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: wayback_dl-1.0.0.tar.gz
- Subject digest: 5e3d05d49f611390e61056e98c3df4162b6e66f5596d75e0b5c5f45a8fb68dbc
- Sigstore transparency entry: 1040913350
- Sigstore integration time: Mar 5, 2026
Source repository:
- Permalink: siarhei-m/wayback-dl@943b3e690ee66e65f44079f8357659b937a7409e
- Branch / Tag: refs/tags/v1.0.0
- Owner: https://github.com/siarhei-m
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@943b3e690ee66e65f44079f8357659b937a7409e
- Trigger Event: release

File details

Details for the file wayback_dl-1.0.0-py3-none-any.whl.

File metadata

Download URL: wayback_dl-1.0.0-py3-none-any.whl
Upload date: Mar 5, 2026
Size: 28.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for wayback_dl-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`532817356303849606cb3a340b73e001a824cbeb132dd2e994acd9f6e5df8148`
MD5	`5f05827bdf4537aec7b8fc3b6c9ad95d`
BLAKE2b-256	`285aff77bc5eb4ce1bc3c11814a6a825ce2d91f2646b320f59e98384e8e9064c`

Provenance

The following attestation bundles were made for wayback_dl-1.0.0-py3-none-any.whl:

Publisher: publish.yml on siarhei-m/wayback-dl

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: wayback_dl-1.0.0-py3-none-any.whl
- Subject digest: 532817356303849606cb3a340b73e001a824cbeb132dd2e994acd9f6e5df8148
- Sigstore transparency entry: 1040913387
- Sigstore integration time: Mar 5, 2026
Source repository:
- Permalink: siarhei-m/wayback-dl@943b3e690ee66e65f44079f8357659b937a7409e
- Branch / Tag: refs/tags/v1.0.0
- Owner: https://github.com/siarhei-m
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@943b3e690ee66e65f44079f8357659b937a7409e
- Trigger Event: release
