Skip to main content

A Python package that interfaces with the Internet Archive's Wayback Machine API. Archive pages and retrieve archived pages easily.

Project description


Python package & CLI tool that interfaces with the Wayback Machine API

pypi Build Status Codacy Badge codecov Contributions Welcome Downloads GitHub lastest commit PyPI - Python Version


Installation

Using pip:

pip install waybackpy

Install directly from GitHub:

pip install git+https://github.com/akamhy/waybackpy.git

Supported Features

  • Archive webpage
  • Retrieve all archives of a webpage/domain
  • Retrieve archive close to a date or timestamp
  • Retrieve all archives which have a particular prefix
  • Get source code of the archive easily
  • CDX API support

Usage

As a Python package

>>> import waybackpy

>>> url = "https://en.wikipedia.org/wiki/Multivariable_calculus"
>>> user_agent = "Mozilla/5.0 (Windows NT 5.1; rv:40.0) Gecko/20100101 Firefox/40.0"

>>> wayback = waybackpy.Url(url, user_agent)

>>> archive = wayback.save()
>>> archive.archive_url
'https://web.archive.org/web/20210104173410/https://en.wikipedia.org/wiki/Multivariable_calculus'

>>> archive.timestamp
datetime.datetime(2021, 1, 4, 17, 35, 12, 691741)

>>> oldest_archive = wayback.oldest()
>>> oldest_archive.archive_url
'https://web.archive.org/web/20050422130129/http://en.wikipedia.org:80/wiki/Multivariable_calculus'

>>> archive_close_to_2010_feb = wayback.near(year=2010, month=2)
>>> archive_close_to_2010_feb.archive_url
'https://web.archive.org/web/20100215001541/http://en.wikipedia.org:80/wiki/Multivariable_calculus'

>>> wayback.newest().archive_url
'https://web.archive.org/web/20210104173410/https://en.wikipedia.org/wiki/Multivariable_calculus'

Full Python package documentation can be found at https://github.com/akamhy/waybackpy/wiki/Python-package-docs.

As a CLI tool

$ waybackpy --save --url "https://en.wikipedia.org/wiki/Social_media" --user_agent "my-unique-user-agent"
https://web.archive.org/web/20200719062108/https://en.wikipedia.org/wiki/Social_media

$ waybackpy --oldest --url "https://en.wikipedia.org/wiki/Humanoid" --user_agent "my-unique-user-agent"
https://web.archive.org/web/20040415020811/http://en.wikipedia.org:80/wiki/Humanoid

$ waybackpy --newest --url "https://en.wikipedia.org/wiki/Remote_sensing" --user_agent "my-unique-user-agent"
https://web.archive.org/web/20201221130522/https://en.wikipedia.org/wiki/Remote_sensing

$ waybackpy --total --url "https://en.wikipedia.org/wiki/Linux_kernel" --user_agent "my-unique-user-agent"
1904

$ waybackpy --known_urls --url akamhy.github.io --user_agent "my-unique-user-agent"
https://akamhy.github.io
https://akamhy.github.io/assets/js/scale.fix.js
https://akamhy.github.io/favicon.ico
https://akamhy.github.io/robots.txt
https://akamhy.github.io/waybackpy/

'akamhy.github.io-10-urls-m2a24y.txt' saved in current working directory

Full CLI documentation can be found at https://github.com/akamhy/waybackpy/wiki/CLI-docs.

License

License: MIT

Released under the MIT License. See license for details.


Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

waybackpy-2.4.1.tar.gz (17.4 kB view hashes)

Uploaded Source

Built Distribution

waybackpy-2.4.1-py3-none-any.whl (18.2 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page