A command line utility for scraping wayback snapshots from archive.org.

wayback_scraper

Scraping urls from archive.org (Wayback Machine Snapshots)

Command-Line Interface

The usage information can be printed by running wayback-scraper -h.

usage: wayback-scraper [-h] -u URL [-l LIMIT] [-f Start TIMESTAMP]
                   [-t End TIMESTAMP] [-p DIRECTORY] [-n FILENAME] -o
                   Output type

optional arguments:
  -h, --help            show this help message and exit
  -u URL, --url URL     url for the wayback scraper to scrape. [eg:
                        http://www.example.com
  -l LIMIT, --limit LIMIT
                        limit the number of results to scrape
  -f Start TIMESTAMP, --from Start TIMESTAMP
                        start date, format - YYYYmmddHHMMSS. Example : -f 2011
  -t End TIMESTAMP, --to End TIMESTAMP
                        end date, format - YYYYmmddHHMMSS. Example : -t 201205
  -p DIRECTORY, --path DIRECTORY
                        Output file path
  -n FILENAME, --name FILENAME
                        filename for the output data
  -o Output type, --output Output type
                        output type
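
The flags listed above can also be assembled from a script. The following is a minimal sketch, not part of wayback_scraper itself: `build_command` is a hypothetical helper, and the rule that `-f`/`-t` take a digit-only prefix of YYYYmmddHHMMSS (4 to 14 digits) is an assumption inferred from the help examples `2011` and `201205`.

```python
import re
import shlex

# Assumption: -f/-t accept a digit-only prefix of YYYYmmddHHMMSS
# (4 to 14 digits), as the help examples "2011" and "201205" suggest.
_TIMESTAMP_PREFIX = re.compile(r"\d{4,14}")


def build_command(url, output, limit=None, start=None, end=None,
                  directory=None, filename=None):
    """Return an argv list for wayback-scraper (hypothetical helper)."""
    for value in (start, end):
        if value is not None and not _TIMESTAMP_PREFIX.fullmatch(value):
            raise ValueError(f"not a YYYYmmddHHMMSS prefix: {value!r}")

    # -u and -o are shown as required in the usage line; the rest are optional.
    argv = ["wayback-scraper", "-u", url]
    optional = [("-l", limit), ("-f", start), ("-t", end),
                ("-p", directory), ("-n", filename)]
    for flag, value in optional:
        if value is not None:
            argv += [flag, str(value)]
    argv += ["-o", output]
    return argv


if __name__ == "__main__":
    argv = build_command("https://www.archive.org", "json",
                         limit=2, start="2011", end="2012", filename="output")
    # Print a shell-safe command line; pass argv to subprocess.run to execute it.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Building the command as a list rather than a single string avoids shell-quoting problems when the URL contains characters such as `&` or `?`.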

Examples

The usage can perhaps be made clearer with a concrete example.

    wayback-scraper -u https://www.archive.org -l 2 -f 2011 -t 2012 -n output -o json

which produces:

    {
        'archived_urls':
            [
                'url':'http://www.archive.org:80/',
                'date':'1997-01-26',
                'timestamp':'19970126045828'
            ],
            [
                'url':'http://www.archive.org:80/',
                'date':'1997-10-01',
                'timestamp':'19971011050034'
            ]
    }
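
Each entry in the output carries a 14-digit `timestamp` alongside a `date`. As a rough sketch of how the two relate, the helpers below derive the date from the timestamp and build the corresponding Wayback Machine snapshot address. Both function names are hypothetical and not part of wayback_scraper; the `https://web.archive.org/web/<timestamp>/<url>` pattern is the Wayback Machine's usual snapshot URL form.

```python
from datetime import datetime


def snapshot_date(timestamp):
    """Convert a 14-digit YYYYmmddHHMMSS timestamp to a YYYY-MM-DD string."""
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S").date().isoformat()


def snapshot_url(timestamp, url):
    """Build the Wayback Machine address of a snapshot (hypothetical helper)."""
    return f"https://web.archive.org/web/{timestamp}/{url}"


if __name__ == "__main__":
    # Values taken from the first entry of the sample output above.
    entry = {"url": "http://www.archive.org:80/", "timestamp": "19970126045828"}
    print(snapshot_date(entry["timestamp"]))  # 1997-01-26
    print(snapshot_url(entry["timestamp"], entry["url"]))
```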
