Download all the documents linked from a web page.
Project description
downlink
A Python library and command-line tool for scraping (and downloading) links on a web page.
library
- linkscraper.py
  - LinkScraper - class for scraping links from a page
- document_linkscraper.py
  - DocumentLinkScraper - subclass of LinkScraper; class for scraping “document links,” which all end in a given file extension, such as “.pdf”
- __init__.py
  - imports library classes for cleaner importing
- __main__.py
  - main() - entrypoint for command line tool
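The idea behind a "document link" scraper can be sketched with the standard library alone. The following is a minimal illustration, not downlink's actual implementation: the names `AnchorCollector` and `find_document_links` are hypothetical, and the sketch assumes that a document link is any `<a href>` whose URL path ends in the given extension.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class AnchorCollector(HTMLParser):
    """Collects the href of every <a> tag (hypothetical helper, not part of downlink)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_document_links(html, base_url, ext=".pdf"):
    """Return absolute URLs of links whose path ends in `ext` (case-insensitive)."""
    collector = AnchorCollector()
    collector.feed(html)
    links = []
    for href in collector.hrefs:
        # Resolve relative links against the page URL.
        absolute = urljoin(base_url, href)
        # Check the URL path only, so a query string does not hide the extension.
        if urlparse(absolute).path.lower().endswith(ext.lower()):
            links.append(absolute)
    return links


sample = (
    '<a href="/docs/report.pdf">Report</a> '
    '<a href="about.html">About</a> '
    '<a href="files/Form.PDF?v=2">Form</a>'
)
print(find_document_links(sample, "https://example.com/page/index.html"))
```

Matching on the parsed path rather than the raw href is a design choice of this sketch; whether downlink treats query strings or letter case the same way is not stated in this description.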
command line tool
Basic usage:
$ downlink "https://www.ct.gov/doh/cwp/view.asp?a=4513&q=530462" output
The above will download all PDF documents to a folder called “output”, which must exist and be writable.
To download files with a different extension, use the --ext option.
For more usage details, run downlink --help.
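The requirement that the output folder exist and be writable can be checked before any download starts. The sketch below is an assumption about how such a check might look, not code taken from downlink; `check_output_dir`, `destination_for`, and `download_all` are hypothetical names, and the file name is assumed to come from the last segment of the URL path.

```python
import os
from urllib.parse import unquote, urlparse
from urllib.request import urlretrieve


def check_output_dir(path):
    """Raise an error unless `path` is an existing, writable directory."""
    if not os.path.isdir(path):
        raise FileNotFoundError(f"output folder does not exist: {path}")
    if not os.access(path, os.W_OK):
        raise PermissionError(f"output folder is not writable: {path}")


def destination_for(url, output_dir):
    """Build a local file path from the last segment of the URL path."""
    filename = os.path.basename(unquote(urlparse(url).path))
    return os.path.join(output_dir, filename)


def download_all(urls, output_dir):
    """Download each URL into output_dir (performs network requests)."""
    check_output_dir(output_dir)
    for url in urls:
        urlretrieve(url, destination_for(url, output_dir))
```

Validating the folder up front means a missing or read-only directory is reported once, before any network request is made.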
Download files
Source Distribution
File details
Details for the file downlink-0.0.7.tar.gz.
File metadata
- Download URL: downlink-0.0.7.tar.gz
- Upload date:
- Size: 3.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.18.4 setuptools/39.2.0 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/2.7.15
File hashes
Algorithm | Hash digest
---|---
SHA256 | 32fca787df16c525f266e93923093de4f801c67e4b6d1fee4a899e426e0c1000
MD5 | f16ddcb60c12df44bda26fc5aaf8857b
BLAKE2b-256 | 315368ddc2b2964e30c183d0ce872a3dc4c14bf14055903a57d63f02e33a697e