Download all the documents linked from a web page.
Project description
downlink
A python library and command line tool for scraping (and downloading) links on a web page.
library
- linkscraper.py
LinkScraper - class for scraping links from a page
- document_linkscraper.py
- DocumentLinkScraper - subclass of LinkScraper
class for scraping “document links,” which all end in a given file extension, such as “.pdf”
- __init__.py
imports library classes for cleaner importing
- __main__.py
main() - entrypoint for command line tool
command line tool
Basic usage:
$ downlink “https://www.ct.gov/doh/cwp/view.asp?a=4513&q=530462” output
The above will download all PDF documents to a folder called “output” which must exist and be writable.
To download files of a different extension, use the –ext option.
For more usage details, run downlink –help
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.