Download all the documents linked from a web page.
A Python library and command-line tool for scraping (and downloading) the links on a web page.
- LinkScraper - class for scraping links from a page
- DocumentLinkScraper - subclass of LinkScraper for scraping “document links,” which all end in a given file extension, such as “.pdf”
- imports library classes for cleaner importing
- main() - entry point for the command-line tool
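
To illustrate the idea behind the two classes, here is a minimal sketch using only the Python standard library. It is not this library's actual implementation or API: the names `SketchLinkScraper`, `SketchDocumentLinkScraper`, and `scrape_document_links` are hypothetical and chosen only for this example. It assumes the page HTML has already been fetched, collects every `<a href>` link, and then keeps only those whose path ends in the given extension.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class SketchLinkScraper(HTMLParser):
    """Hypothetical stand-in: collects absolute URLs from <a href> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the URL of the page.
            self.links.append(urljoin(self.base_url, href))


class SketchDocumentLinkScraper(SketchLinkScraper):
    """Hypothetical stand-in: keeps only links whose path ends in `ext`."""

    def __init__(self, base_url, ext=".pdf"):
        super().__init__(base_url)
        self.ext = ext.lower()

    def document_links(self):
        # Match on the URL path so a query string does not hide the extension.
        return [
            url for url in self.links
            if urlparse(url).path.lower().endswith(self.ext)
        ]


def scrape_document_links(html, base_url, ext=".pdf"):
    """Return the document links found in an HTML string (assumed pre-fetched)."""
    scraper = SketchDocumentLinkScraper(base_url, ext)
    scraper.feed(html)
    return scraper.document_links()
```

Splitting the work this way mirrors the subclass relationship described above: the base class knows only how to find links, and the subclass adds the extension filter.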
command line tool
$ downlink "https://www.ct.gov/doh/cwp/view.asp?a=4513&q=530462" output
The command above downloads all PDF documents linked from the page into a folder called “output”, which must already exist and be writable.
To download files of a different extension, use the --ext option.
For more usage details, run downlink --help