Download all the documents linked from a web page.
A Python library and command-line tool for scraping (and downloading) links on a web page.
- LinkScraper - class for scraping links from a page
- DocumentLinkScraper - subclass of LinkScraper for scraping “document links,” which all end in a given file extension, such as “.pdf”
- imports library classes for cleaner importing
- main() - entrypoint for command line tool
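
The “document link” idea above can be sketched with the standard library alone. This is a minimal illustration, not downlink's actual implementation: `AnchorCollector` and `document_links` are hypothetical names, and the real LinkScraper and DocumentLinkScraper classes may work differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class AnchorCollector(HTMLParser):
    """Collect the href of every <a> tag (hypothetical helper, not a downlink class)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def document_links(html, base_url, ext=".pdf"):
    """Return absolute URLs of links whose path ends in `ext` (case-insensitive)."""
    collector = AnchorCollector()
    collector.feed(html)
    # Resolve relative hrefs against the page URL before checking the extension.
    absolute = (urljoin(base_url, href) for href in collector.hrefs)
    return [url for url in absolute if urlparse(url).path.lower().endswith(ext.lower())]


page = '<a href="/docs/report.pdf">Report</a> <a href="about.html">About</a>'
print(document_links(page, "https://example.com/index.html"))
# prints ['https://example.com/docs/report.pdf']
```

Checking the extension against the URL path, rather than the whole URL, keeps query strings from hiding or faking a match.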
Command line tool
$ downlink "https://www.ct.gov/doh/cwp/view.asp?a=4513&q=530462" output
The above downloads all PDF documents linked from the page into a folder called “output”, which must already exist and be writable.
To download files of a different extension, use the --ext option.
For more usage details, run downlink --help
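
Because the output folder must exist and be writable, it can help to verify that before starting a run. The following is a rough sketch under stated assumptions: `check_output_dir` and `target_path` are hypothetical helpers written for illustration, not part of downlink, and the tool itself may name downloaded files differently.

```python
import os
import tempfile
from urllib.parse import unquote, urlparse


def check_output_dir(path):
    """Raise an error unless `path` is an existing, writable directory."""
    if not os.path.isdir(path):
        raise NotADirectoryError(f"output folder does not exist: {path}")
    if not os.access(path, os.W_OK):
        raise PermissionError(f"output folder is not writable: {path}")


def target_path(url, output_dir):
    """Build a local file path from the last segment of the URL's path."""
    name = os.path.basename(unquote(urlparse(url).path))
    return os.path.join(output_dir, name)


# Usage: a temporary directory stands in for the "output" folder.
with tempfile.TemporaryDirectory() as tmp:
    check_output_dir(tmp)
    print(target_path("https://example.com/docs/report.pdf", tmp))
```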