A command line utility for scraping wayback snapshots from archive.org.

wayback_scraper

Scrapes URLs from archive.org (Wayback Machine snapshots).

Command-Line Interface

The usage information can be printed by running wayback-scraper -h.

usage: wayback-scraper [-h] -u URL [-l LIMIT] [-f Start TIMESTAMP]
                   [-t End TIMESTAMP] [-p DIRECTORY] [-n FILENAME] -o
                   Output type

optional arguments:
  -h, --help            show this help message and exit
  -u URL, --url URL     url for the wayback scraper to scrape. [eg:
                        http://www.example.com
  -l LIMIT, --limit LIMIT
                        limit the number of results to scrape
  -f Start TIMESTAMP, --from Start TIMESTAMP
                        start date, format - YYYYmmddHHMMSS. Example : -f 2011
  -t End TIMESTAMP, --to End TIMESTAMP
                        end date, format - YYYYmmddHHMMSS. Example : -t 201205
  -p DIRECTORY, --path DIRECTORY
                        Output file path
  -n FILENAME, --name FILENAME
                        filename for the output data
  -o Output type, --output Output type
                        output type
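
When the tool is driven from a script, the option list above can be turned into an argument list programmatically. The following is a minimal sketch, not part of wayback_scraper itself: the helper names `to_wayback_timestamp` and `build_command` are hypothetical, and it uses only the flags documented in the usage text. It builds the command but does not run it.

```python
from datetime import datetime
from typing import List, Optional


def to_wayback_timestamp(moment: datetime) -> str:
    """Format a datetime as the 14-digit YYYYmmddHHMMSS string used by -f and -t."""
    return moment.strftime("%Y%m%d%H%M%S")


def build_command(
    url: str,
    output_type: str,
    limit: Optional[int] = None,
    start: Optional[str] = None,
    end: Optional[str] = None,
    directory: Optional[str] = None,
    filename: Optional[str] = None,
) -> List[str]:
    """Assemble a wayback-scraper argument list (hypothetical helper).

    -u and -o are required according to the usage text; every other
    flag is added only when a value is supplied.
    """
    command = ["wayback-scraper", "-u", url]
    if limit is not None:
        command += ["-l", str(limit)]
    if start is not None:
        command += ["-f", start]
    if end is not None:
        command += ["-t", end]
    if directory is not None:
        command += ["-p", directory]
    if filename is not None:
        command += ["-n", filename]
    command += ["-o", output_type]
    return command


if __name__ == "__main__":
    # Partial timestamps such as "2011" are passed through unchanged,
    # as in the help text examples.
    print(" ".join(build_command("https://www.archive.org", "json",
                                 limit=2, start="2011", end="2012",
                                 filename="output")))
```

The resulting list can be handed to `subprocess.run` once the package is installed. Passing a list, not a single string, avoids shell quoting problems with URLs.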

Examples

The usage can perhaps be made clearer with a concrete example.

    wayback-scraper -u https://www.archive.org -l 2 -f 2011 -t 2012 -n output -o json

which produces:

    {
        'archived_urls':
            [
                'url':'http://www.archive.org:80/',
                'date':'1997-01-26',
                'timestamp':'19970126045828'
            ],
            [
                'url':'http://www.archive.org:80/',
                'date':'1997-10-01',
                'timestamp':'19971011050034'
            ]
    }
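
The `timestamp` field in each record uses the same YYYYmmddHHMMSS layout as the `-f` and `-t` options, so it can be parsed with the standard library. The sketch below assumes a full 14-digit timestamp like the ones in the sample output. The function name `parse_wayback_timestamp` is illustrative and is not provided by the package.

```python
from datetime import datetime


def parse_wayback_timestamp(timestamp: str) -> datetime:
    """Parse a full 14-digit Wayback timestamp (YYYYmmddHHMMSS)."""
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S")


if __name__ == "__main__":
    snapshot = parse_wayback_timestamp("19970126045828")
    # Prints the calendar date of the snapshot: 1997-01-26
    print(snapshot.date().isoformat())
```

Shorter timestamps such as `2011` or `201205` would need to be padded or parsed with a shorter format string before this function accepts them.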
