A Python package that interfaces with the Internet Archive's Wayback Machine API. Archive pages and retrieve archived pages easily.
Project description
Python package & CLI tool that interfaces with the Wayback Machine API
Installation
Using pip:
pip install waybackpy
Install directly from GitHub:
pip install git+https://github.com/akamhy/waybackpy.git
Supported Features
- Archive webpage
- Retrieve all archives of a webpage/domain
- Retrieve archive close to a date or timestamp
- Retrieve all archives which have a particular prefix
- Get source code of the archive easily
- CDX API support
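The prefix lookup and CDX support listed above rest on the Wayback Machine's public CDX server. As a rough sketch, independent of waybackpy's own code, a prefix query URL can be assembled with the standard library alone. The helper name `build_cdx_query` and its default `limit` are illustrative assumptions; the endpoint and the `url`, `matchType`, `output`, and `limit` parameters are taken from the CDX server's documentation.

```python
from urllib.parse import urlencode

# Public CDX endpoint of the Wayback Machine.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(url_prefix, limit=5):
    """Return a CDX query URL matching every capture whose URL starts with url_prefix.

    Hypothetical helper for illustration; it only builds the URL and
    performs no network request.
    """
    params = {
        "url": url_prefix,
        "matchType": "prefix",  # match all URLs beginning with url_prefix
        "output": "json",       # ask for JSON instead of plain text
        "limit": limit,         # cap the number of returned rows
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


print(build_cdx_query("akamhy.github.io"))
```

Fetching the resulting URL with any HTTP client should return one row per capture; the sketch stops short of that so it can run offline.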
Usage
As a Python package
>>> import waybackpy
>>> url = "https://en.wikipedia.org/wiki/Multivariable_calculus"
>>> user_agent = "Mozilla/5.0 (Windows NT 5.1; rv:40.0) Gecko/20100101 Firefox/40.0"
>>> wayback = waybackpy.Url(url, user_agent)
>>> archive = wayback.save()
>>> archive.archive_url
'https://web.archive.org/web/20210104173410/https://en.wikipedia.org/wiki/Multivariable_calculus'
>>> archive.timestamp
datetime.datetime(2021, 1, 4, 17, 35, 12, 691741)
>>> oldest_archive = wayback.oldest()
>>> oldest_archive.archive_url
'https://web.archive.org/web/20050422130129/http://en.wikipedia.org:80/wiki/Multivariable_calculus'
>>> archive_close_to_2010_feb = wayback.near(year=2010, month=2)
>>> archive_close_to_2010_feb.archive_url
'https://web.archive.org/web/20100215001541/http://en.wikipedia.org:80/wiki/Multivariable_calculus'
>>> wayback.newest().archive_url
'https://web.archive.org/web/20210104173410/https://en.wikipedia.org/wiki/Multivariable_calculus'
Full Python package documentation can be found at https://github.com/akamhy/waybackpy/wiki/Python-package-docs.
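The archive URLs in the session above embed a 14-digit capture timestamp (`YYYYMMDDhhmmss`) followed by the original URL. The following is a minimal sketch of splitting such a URL with the standard library; `split_archive_url` is a hypothetical helper, not part of waybackpy, and it assumes the plain `/web/<timestamp>/<url>` form without modifiers after the timestamp.

```python
import re
from datetime import datetime

# Matches https://web.archive.org/web/<14 digits>/<original URL>.
_ARCHIVE_URL = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def split_archive_url(archive_url):
    """Return (capture datetime, original URL) for a Wayback Machine archive URL."""
    match = _ARCHIVE_URL.match(archive_url)
    if match is None:
        raise ValueError(f"not a recognised archive URL: {archive_url!r}")
    captured_at = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    return captured_at, match.group(2)


captured_at, original = split_archive_url(
    "https://web.archive.org/web/20100215001541/"
    "http://en.wikipedia.org:80/wiki/Multivariable_calculus"
)
print(captured_at.isoformat(), original)
# 2010-02-15T00:15:41 http://en.wikipedia.org:80/wiki/Multivariable_calculus
```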
As a CLI tool
$ waybackpy --save --url "https://en.wikipedia.org/wiki/Social_media" --user_agent "my-unique-user-agent"
https://web.archive.org/web/20200719062108/https://en.wikipedia.org/wiki/Social_media
$ waybackpy --oldest --url "https://en.wikipedia.org/wiki/Humanoid" --user_agent "my-unique-user-agent"
https://web.archive.org/web/20040415020811/http://en.wikipedia.org:80/wiki/Humanoid
$ waybackpy --newest --url "https://en.wikipedia.org/wiki/Remote_sensing" --user_agent "my-unique-user-agent"
https://web.archive.org/web/20201221130522/https://en.wikipedia.org/wiki/Remote_sensing
$ waybackpy --total --url "https://en.wikipedia.org/wiki/Linux_kernel" --user_agent "my-unique-user-agent"
1904
$ waybackpy --known_urls --url akamhy.github.io --user_agent "my-unique-user-agent" --file
https://akamhy.github.io
https://akamhy.github.io/assets/js/scale.fix.js
https://akamhy.github.io/favicon.ico
https://akamhy.github.io/robots.txt
https://akamhy.github.io/waybackpy/
'akamhy.github.io-urls-iftor2.txt' saved in current working directory
Full CLI documentation can be found at https://github.com/akamhy/waybackpy/wiki/CLI-docs.
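The CLI calls above can also be driven from a script. Here is a small sketch that builds the same argument list and, optionally, runs it with `subprocess`. The helper names `build_command` and `run_waybackpy` are assumptions made for illustration; only the `--save`, `--oldest`, `--newest`, `--total`, `--url`, and `--user_agent` flags shown above are used.

```python
import subprocess

# Flags demonstrated in the CLI examples above.
ACTIONS = {"save", "oldest", "newest", "total"}


def build_command(action, url, user_agent):
    """Return the argv list for one waybackpy CLI call."""
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return ["waybackpy", f"--{action}", "--url", url, "--user_agent", user_agent]


def run_waybackpy(action, url, user_agent):
    """Run the CLI and return its stdout; needs waybackpy installed and network access."""
    result = subprocess.run(
        build_command(action, url, user_agent),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()


print(build_command("oldest", "https://en.wikipedia.org/wiki/Humanoid", "my-unique-user-agent"))
```

Passing the arguments as a list avoids shell quoting issues with URLs that contain `&` or `?`.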
License
Released under the MIT License. See license for details.
File details
Details for the file waybackpy-2.4.4.tar.gz.
File metadata
- Download URL: waybackpy-2.4.4.tar.gz
- Upload date:
- Size: 20.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 17c224d71ab874a91b40535e861ef6a5b0934a5d5197771f1765c3c4303df415 |
| MD5 | 3fac80bb5187dfa4c7a26a4ccad2f396 |
| BLAKE2b-256 | 37d673f82208fabf12514dde927b1b6203d4768f9a012605904057009c8a5cb3 |
File details
Details for the file waybackpy-2.4.4-py3-none-any.whl.
File metadata
- Download URL: waybackpy-2.4.4-py3-none-any.whl
- Upload date:
- Size: 21.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5739074b80aed4f025e1abd8469ebb3d9b6da63f1582f07104677bf805eb62cf |
| MD5 | 54088ac795e153b1fa4e7d05bb93390c |
| BLAKE2b-256 | 2d933cecec560f4f067c89983daac2f602dd852d3f5b8d39ea36e8b9682785bf |
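A downloaded file can be checked against the SHA256 digests listed above. The sketch below uses only `hashlib`; the function names and the local file path in the usage comment are assumptions, not part of waybackpy or PyPI tooling.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage, assuming the wheel was downloaded to the current directory:
# matches_published_hash(
#     "waybackpy-2.4.4-py3-none-any.whl",
#     "5739074b80aed4f025e1abd8469ebb3d9b6da63f1582f07104677bf805eb62cf",
# )
```

Alternatively, pip's hash-checking mode can enforce the same digest at install time through a `--hash=sha256:...` entry in a requirements file.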