A Python package that interfaces with the Internet Archive's Wayback Machine API. Archive pages and retrieve archived pages easily.

Python package & CLI tool that interfaces with the Wayback Machine API


Installation

Using pip:

pip install waybackpy

Install directly from GitHub:

pip install git+https://github.com/akamhy/waybackpy.git
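
To confirm that either command worked, the installed version can be read from package metadata. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of waybackpy.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent.

    Hypothetical helper for illustration; it only reads local package metadata.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("waybackpy")
    print(version if version else "waybackpy is not installed")
```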

Supported Features

  • Archive a webpage
  • Retrieve all archives of a webpage or domain
  • Retrieve the archive closest to a given date or timestamp
  • Retrieve all archives whose URLs share a given prefix
  • Get the source code of an archive
  • CDX API support
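
For orientation, the features above map onto public Wayback Machine endpoints. The sketch below only builds query URLs for the availability endpoint and the CDX server; it makes no network requests and is not a description of how waybackpy itself issues them. The function names are hypothetical, and the `limit` default is an arbitrary choice for illustration.

```python
from urllib.parse import urlencode

# Public Wayback Machine endpoints (the helper names below are hypothetical).
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def wayback_timestamp(year, month=1, day=1, hour=0, minute=0):
    """Format date parts as the 14-digit YYYYMMDDhhmmss string used by the Wayback Machine."""
    return f"{year:04d}{month:02d}{day:02d}{hour:02d}{minute:02d}00"


def availability_query(url, timestamp):
    """Build a query URL asking for the snapshot closest to `timestamp`."""
    return f"{AVAILABILITY_ENDPOINT}?{urlencode({'url': url, 'timestamp': timestamp})}"


def cdx_prefix_query(url_prefix, limit=5):
    """Build a CDX query URL listing captures whose URL starts with `url_prefix`."""
    params = {"url": url_prefix, "matchType": "prefix", "output": "json", "limit": limit}
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(availability_query("example.com", wayback_timestamp(2010, 2)))
    print(cdx_prefix_query("example.com/blog"))
```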

Usage

As a Python package

>>> import waybackpy

>>> url = "https://en.wikipedia.org/wiki/Multivariable_calculus"
>>> user_agent = "Mozilla/5.0 (Windows NT 5.1; rv:40.0) Gecko/20100101 Firefox/40.0"

>>> wayback = waybackpy.Url(url, user_agent)

>>> archive = wayback.save()
>>> archive.archive_url
'https://web.archive.org/web/20210104173410/https://en.wikipedia.org/wiki/Multivariable_calculus'

>>> archive.timestamp
datetime.datetime(2021, 1, 4, 17, 35, 12, 691741)

>>> oldest_archive = wayback.oldest()
>>> oldest_archive.archive_url
'https://web.archive.org/web/20050422130129/http://en.wikipedia.org:80/wiki/Multivariable_calculus'

>>> archive_close_to_2010_feb = wayback.near(year=2010, month=2)
>>> archive_close_to_2010_feb.archive_url
'https://web.archive.org/web/20100215001541/http://en.wikipedia.org:80/wiki/Multivariable_calculus'

>>> wayback.newest().archive_url
'https://web.archive.org/web/20210104173410/https://en.wikipedia.org/wiki/Multivariable_calculus'

Full Python package documentation can be found at https://github.com/akamhy/waybackpy/wiki/Python-package-docs.
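
The archive URLs in the session above embed a 14-digit capture time followed by the original URL. As a small sketch, assuming every archive URL follows the `https://web.archive.org/web/<YYYYMMDDhhmmss>/<original URL>` shape seen in those outputs, the two parts can be separated with the standard library. `split_archive_url` is a hypothetical helper, not a waybackpy function.

```python
import re
from datetime import datetime

# Assumed shape: https://web.archive.org/web/<14 digits>/<original URL>
ARCHIVE_URL_PATTERN = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def split_archive_url(archive_url):
    """Return (capture time, original URL) parsed from a Wayback archive URL."""
    match = ARCHIVE_URL_PATTERN.match(archive_url)
    if match is None:
        raise ValueError(f"not a recognised archive URL: {archive_url!r}")
    captured_at = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    return captured_at, match.group(2)


if __name__ == "__main__":
    captured_at, original = split_archive_url(
        "https://web.archive.org/web/20050422130129/"
        "http://en.wikipedia.org:80/wiki/Multivariable_calculus"
    )
    print(captured_at.isoformat(), original)
```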

As a CLI tool

$ waybackpy --save --url "https://en.wikipedia.org/wiki/Social_media" --user_agent "my-unique-user-agent"
https://web.archive.org/web/20200719062108/https://en.wikipedia.org/wiki/Social_media

$ waybackpy --oldest --url "https://en.wikipedia.org/wiki/Humanoid" --user_agent "my-unique-user-agent"
https://web.archive.org/web/20040415020811/http://en.wikipedia.org:80/wiki/Humanoid

$ waybackpy --newest --url "https://en.wikipedia.org/wiki/Remote_sensing" --user_agent "my-unique-user-agent"
https://web.archive.org/web/20201221130522/https://en.wikipedia.org/wiki/Remote_sensing

$ waybackpy --total --url "https://en.wikipedia.org/wiki/Linux_kernel" --user_agent "my-unique-user-agent"
1904

$ waybackpy --known_urls --url akamhy.github.io --user_agent "my-unique-user-agent"
https://akamhy.github.io
https://akamhy.github.io/assets/js/scale.fix.js
https://akamhy.github.io/favicon.ico
https://akamhy.github.io/robots.txt
https://akamhy.github.io/waybackpy/

'akamhy.github.io-10-urls-m2a24y.txt' saved in current working directory

Full CLI documentation can be found at https://github.com/akamhy/waybackpy/wiki/CLI-docs.
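
When the CLI is driven from a script, it can help to assemble the argument list in one place. The sketch below uses only the flags shown in the examples above (`--save`, `--oldest`, `--newest`, `--total`, `--known_urls`, `--url`, `--user_agent`); the helper `build_command` is hypothetical, and nothing is executed.

```python
import shlex

# Actions taken from the CLI examples above; other flags may exist but are not assumed here.
ACTIONS = {"save", "oldest", "newest", "total", "known_urls"}


def build_command(action, url, user_agent):
    """Return the argument list for one waybackpy CLI invocation."""
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return ["waybackpy", f"--{action}", "--url", url, "--user_agent", user_agent]


if __name__ == "__main__":
    command = build_command(
        "oldest", "https://en.wikipedia.org/wiki/Humanoid", "my-unique-user-agent"
    )
    # Print a shell-safe rendering instead of running it.
    print(shlex.join(command))
```

The returned list can be passed to `subprocess.run(command, capture_output=True, text=True)` once waybackpy is installed and network access is available.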

License

Released under the MIT License. See license for details.


Download files

Source Distribution

waybackpy-2.4.1.tar.gz (17.4 kB view details)

Uploaded Source

Built Distribution

waybackpy-2.4.1-py3-none-any.whl (18.2 kB view details)

Uploaded Python 3

File details

Details for the file waybackpy-2.4.1.tar.gz.

File metadata

  • Download URL: waybackpy-2.4.1.tar.gz
  • Upload date:
  • Size: 17.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.9.1

File hashes

Hashes for waybackpy-2.4.1.tar.gz:

Algorithm    Hash digest
SHA256       6d9f70c60f887851af57d1eb3d80bff74e490c36c3756d14d581926bf87c1b0b
MD5          6ae6eb06aa4c0e4eb5753aaa3b9a1246
BLAKE2b-256  086f2a661918e47d8d1fd415a5d83a2c572479537a13adf8f7a80b4e269253e4

File details

Details for the file waybackpy-2.4.1-py3-none-any.whl.

File metadata

  • Download URL: waybackpy-2.4.1-py3-none-any.whl
  • Upload date:
  • Size: 18.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.9.1

File hashes

Hashes for waybackpy-2.4.1-py3-none-any.whl:

Algorithm    Hash digest
SHA256       d263af5b52c84672737d89569d812c40e50fddeb0316deed16c464a70390ae7a
MD5          1130d1896162470d9c1262c982ff39dc
BLAKE2b-256  1956186311a9cebed8c95000091bca102dda7dab81d82cc7271011a2d185afdf
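
A downloaded file can be checked against the SHA256 digests listed above before installing it. This is a minimal sketch with `hashlib`; the helper names and the assumption that the file sits at a local path are illustrative only.

```python
import hashlib

# SHA256 digests copied from the file listings above.
PUBLISHED_SHA256 = {
    "waybackpy-2.4.1.tar.gz":
        "6d9f70c60f887851af57d1eb3d80bff74e490c36c3756d14d581926bf87c1b0b",
    "waybackpy-2.4.1-py3-none-any.whl":
        "d263af5b52c84672737d89569d812c40e50fddeb0316deed16c464a70390ae7a",
}


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, filename):
    """Compare a local file's digest with the published digest for `filename`."""
    return sha256_of(path) == PUBLISHED_SHA256[filename]
```

For example, `matches_published("downloads/waybackpy-2.4.1.tar.gz", "waybackpy-2.4.1.tar.gz")` returns `True` only when the local copy is byte-for-byte identical to the published file; the `downloads/` path is a placeholder.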
