A Python package that interfaces with the Internet Archive's Wayback Machine API. Archive pages and retrieve archived pages easily.

Python package & CLI tool that interfaces with the Wayback Machine API



Installation

Using pip:

pip install waybackpy

Install directly from GitHub:

pip install git+https://github.com/akamhy/waybackpy.git

Supported Features

  • Archive webpage
  • Retrieve all archives of a webpage/domain
  • Retrieve archive close to a date or timestamp
  • Retrieve all archives which have a particular prefix
  • Get source code of the archive easily
  • CDX API support
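The CDX and prefix features above rest on the Wayback Machine's public CDX server. As a rough illustration of what such a query looks like, the sketch below only builds a query URL with the standard library; it does not use waybackpy and makes no network request. The helper name `build_cdx_query` is hypothetical, and the query parameters (`url`, `output`, `matchType`, `from`, `to`, `limit`) are those of the CDX server API as I understand it, not something this README documents.

```python
from urllib.parse import urlencode

# Public CDX server endpoint of the Wayback Machine (assumed, not taken from this README).
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(url, match_type="exact", start=None, end=None, limit=None):
    """Return a CDX query URL (hypothetical helper; no request is sent)."""
    params = {"url": url, "output": "json", "matchType": match_type}
    # Optional bounds are timestamps or timestamp prefixes such as 2020 or 202002.
    if start is not None:
        params["from"] = str(start)
    if end is not None:
        params["to"] = str(end)
    if limit is not None:
        params["limit"] = str(limit)
    return CDX_ENDPOINT + "?" + urlencode(params)


# Archives whose URL starts with the given prefix, captured from 2020 onward.
print(build_cdx_query("akamhy.github.io", match_type="prefix", start=2020, limit=5))
```

Building the URL separately from sending it keeps the sketch testable offline; the package itself may construct its requests differently.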

Usage

As a Python package

>>> import waybackpy

>>> url = "https://en.wikipedia.org/wiki/Multivariable_calculus"
>>> user_agent = "Mozilla/5.0 (Windows NT 5.1; rv:40.0) Gecko/20100101 Firefox/40.0"

>>> wayback = waybackpy.Url(url, user_agent)

>>> archive = wayback.save()
>>> archive.archive_url
'https://web.archive.org/web/20210104173410/https://en.wikipedia.org/wiki/Multivariable_calculus'

>>> archive.timestamp
datetime.datetime(2021, 1, 4, 17, 35, 12, 691741)

>>> oldest_archive = wayback.oldest()
>>> oldest_archive.archive_url
'https://web.archive.org/web/20050422130129/http://en.wikipedia.org:80/wiki/Multivariable_calculus'

>>> archive_close_to_2010_feb = wayback.near(year=2010, month=2)
>>> archive_close_to_2010_feb.archive_url
'https://web.archive.org/web/20100215001541/http://en.wikipedia.org:80/wiki/Multivariable_calculus'

>>> wayback.newest().archive_url
'https://web.archive.org/web/20210104173410/https://en.wikipedia.org/wiki/Multivariable_calculus'

Full Python package documentation can be found at https://github.com/akamhy/waybackpy/wiki/Python-package-docs.
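The archive URLs shown above all follow the pattern `https://web.archive.org/web/<14-digit timestamp>/<original URL>`. The following sketch, which is independent of waybackpy, splits such a URL into its capture time and the original URL. The function name `parse_archive_url` is hypothetical, and the sketch assumes a plain 14-digit timestamp with no modifier suffix.

```python
import re
from datetime import datetime

# Matches https://web.archive.org/web/<YYYYMMDDhhmmss>/<original URL>
ARCHIVE_URL_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def parse_archive_url(archive_url):
    """Return (capture time, original URL) for a Wayback Machine archive URL."""
    match = ARCHIVE_URL_RE.match(archive_url)
    if match is None:
        raise ValueError(f"not a Wayback Machine archive URL: {archive_url!r}")
    captured_at = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    return captured_at, match.group(2)


captured_at, original = parse_archive_url(
    "https://web.archive.org/web/20050422130129/"
    "http://en.wikipedia.org:80/wiki/Multivariable_calculus"
)
print(captured_at.isoformat(), original)
```

For the oldest archive in the example above, this yields a capture time of 2005-04-22 13:01:29 and the original `http://en.wikipedia.org:80/...` URL.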

As a CLI tool

$ waybackpy --save --url "https://en.wikipedia.org/wiki/Social_media" --user_agent "my-unique-user-agent"
https://web.archive.org/web/20200719062108/https://en.wikipedia.org/wiki/Social_media

$ waybackpy --oldest --url "https://en.wikipedia.org/wiki/Humanoid" --user_agent "my-unique-user-agent"
https://web.archive.org/web/20040415020811/http://en.wikipedia.org:80/wiki/Humanoid

$ waybackpy --newest --url "https://en.wikipedia.org/wiki/Remote_sensing" --user_agent "my-unique-user-agent"
https://web.archive.org/web/20201221130522/https://en.wikipedia.org/wiki/Remote_sensing

$ waybackpy --total --url "https://en.wikipedia.org/wiki/Linux_kernel" --user_agent "my-unique-user-agent"
1904

$ waybackpy --known_urls --url akamhy.github.io --user_agent "my-unique-user-agent" --file
https://akamhy.github.io
https://akamhy.github.io/assets/js/scale.fix.js
https://akamhy.github.io/favicon.ico
https://akamhy.github.io/robots.txt
https://akamhy.github.io/waybackpy/

'akamhy.github.io-urls-iftor2.txt' saved in current working directory

Full CLI documentation can be found at https://github.com/akamhy/waybackpy/wiki/CLI-docs.

License

Released under the MIT License. See license for details.


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

waybackpy-2.4.4.tar.gz (20.4 kB)

Uploaded Source

Built Distribution

waybackpy-2.4.4-py3-none-any.whl (21.5 kB)

Uploaded Python 3

File details

Details for the file waybackpy-2.4.4.tar.gz.

File metadata

  • Download URL: waybackpy-2.4.4.tar.gz
  • Size: 20.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.9.6

File hashes

Hashes for waybackpy-2.4.4.tar.gz

Algorithm    Hash digest
SHA256       17c224d71ab874a91b40535e861ef6a5b0934a5d5197771f1765c3c4303df415
MD5          3fac80bb5187dfa4c7a26a4ccad2f396
BLAKE2b-256  37d673f82208fabf12514dde927b1b6203d4768f9a012605904057009c8a5cb3
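One way to use the digests listed here is to compare them against a locally downloaded file. The sketch below uses only the standard library `hashlib`; the helper names `sha256_of_file` and `matches_published_digest` are hypothetical, and the file path in the usage comment is an assumption about where the download was saved.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, published_hex):
    """Compare a file's SHA256 digest with a published hex digest, ignoring case."""
    return sha256_of_file(path) == published_hex.strip().lower()


# Usage, assuming the archive was saved to the current directory:
# matches_published_digest(
#     "waybackpy-2.4.4.tar.gz",
#     "17c224d71ab874a91b40535e861ef6a5b0934a5d5197771f1765c3c4303df415",
# )
```

Reading in chunks keeps memory use constant regardless of file size; for packages this small, reading the whole file at once would work equally well.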

File details

Details for the file waybackpy-2.4.4-py3-none-any.whl.

File metadata

  • Download URL: waybackpy-2.4.4-py3-none-any.whl
  • Size: 21.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.9.6

File hashes

Hashes for waybackpy-2.4.4-py3-none-any.whl

Algorithm    Hash digest
SHA256       5739074b80aed4f025e1abd8469ebb3d9b6da63f1582f07104677bf805eb62cf
MD5          54088ac795e153b1fa4e7d05bb93390c
BLAKE2b-256  2d933cecec560f4f067c89983daac2f602dd852d3f5b8d39ea36e8b9682785bf
