Download PDF links from a webpage
pdf_hunter
Find PDF links on a webpage and download the files they point to.
Installation
This has been tested using Python 3 and Python 2.7.
pip install pdf_hunter
Usage
import pdf_hunter
url = "https://github.com/EbookFoundation/free-programming-books/blob/master/free-programming-books.md"
pdf_urls = pdf_hunter.get_pdf_urls(url)
pdf_urls[:10]
['https://people.gnome.org/~swilmet/glib-gtk-dev-platform.pdf', 'https://www.math.upenn.edu/~wilf/AlgoComp.pdf', 'http://cslibrary.stanford.edu/110/BinaryTrees.pdf', 'http://www-inst.eecs.berkeley.edu/~cs61b/fa14/book2/data-structures.pdf', 'http://lib.mdp.ac.id/ebook/Karya%20Umum/Dsa.pdf', 'http://cslibrary.stanford.edu/103/LinkedListBasics.pdf', 'http://cslibrary.stanford.edu/105/LinkedListProblems.pdf', 'http://www.jjj.de/fxt/fxtbook.pdf', 'http://www.cs.cmu.edu/~rwh/theses/okasaki.pdf', 'http://igm.univ-mlv.fr/~mac/REC/text-algorithms.pdf']
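For readers curious how this kind of link discovery can work, here is a minimal sketch using only the Python 3 standard library. It is not pdf_hunter's actual implementation, which this page does not show; the names `PdfLinkParser` and `find_pdf_urls` and the sample HTML are hypothetical. The sketch assumes a PDF link is an `<a href>` whose URL path ends in `.pdf`, and it resolves relative links against the page URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkParser(HTMLParser):
    """Collect absolute URLs of <a href> targets whose path ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page they were found on.
        absolute = urljoin(self.base_url, href)
        # Test only the path, so query strings and fragments are ignored.
        if urlparse(absolute).path.lower().endswith(".pdf"):
            self.pdf_urls.append(absolute)


def find_pdf_urls(html, base_url):
    """Hypothetical helper: return PDF links found in an HTML string."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.pdf_urls


sample_html = """
<a href="/docs/guide.pdf">Guide</a>
<a href="https://example.org/paper.PDF?download=1">Paper</a>
<a href="notes.html">Notes</a>
"""
print(find_pdf_urls(sample_html, "https://example.com/books/"))
# ['https://example.com/docs/guide.pdf', 'https://example.org/paper.PDF?download=1']
```

Matching on the file extension is a heuristic: it misses PDFs served from URLs without a `.pdf` suffix, and it accepts any link that merely ends in `.pdf`.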
We can download a single PDF file from a given URL:
pdf_url = pdf_urls[0]
pdf_url
'https://people.gnome.org/~swilmet/glib-gtk-dev-platform.pdf'
file_name = pdf_hunter.get_pdf_name(pdf_url)
file_name
'glib-gtk-dev-platform.pdf'
import os
os.path.isfile(file_name)
False
pdf_hunter.download_file(pdf_url, folder_path=os.getcwd())
os.path.isfile(file_name)
True
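The file name shown above is the last segment of the URL path. A rough standard-library equivalent might look like the sketch below (Python 3). The function `pdf_name_from_url` is hypothetical, and decoding percent-escapes such as `%20` is an assumption; this page does not say whether `pdf_hunter.get_pdf_name` does the same.

```python
import posixpath
from urllib.parse import unquote, urlparse


def pdf_name_from_url(pdf_url):
    """Hypothetical helper: last path segment of a URL, percent-decoded."""
    # urlparse drops the query string and fragment from the path.
    path = urlparse(pdf_url).path
    # Decode first, then take the basename, so the result never contains "/".
    return posixpath.basename(unquote(path))


print(pdf_name_from_url("https://people.gnome.org/~swilmet/glib-gtk-dev-platform.pdf"))
# glib-gtk-dev-platform.pdf
print(pdf_name_from_url("http://example.com/My%20Book.pdf?version=2"))
# My Book.pdf
```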
Or download all PDF files from the page:
pdf_hunter.download_pdf_files(url, folder_path=os.getcwd())
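When a page lists many PDFs, some links are likely to be dead. The sketch below shows one way a batch download could skip files that already exist and collect failed URLs instead of stopping at the first error. It uses only the Python 3 standard library and is not pdf_hunter's implementation; `fetch_bytes`, `download_all`, and the `fetch` parameter are hypothetical names introduced for illustration.

```python
import os
import posixpath
from urllib.parse import unquote, urlparse
from urllib.request import urlopen


def fetch_bytes(url, timeout=30):
    """Default fetcher: read the whole response body into memory."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(pdf_urls, folder_path, fetch=fetch_bytes):
    """Save each URL into folder_path; return (saved_paths, failed_urls)."""
    os.makedirs(folder_path, exist_ok=True)
    saved, failed = [], []
    for pdf_url in pdf_urls:
        name = posixpath.basename(unquote(urlparse(pdf_url).path))
        if not name:
            failed.append(pdf_url)  # URL has no file name to save under
            continue
        target = os.path.join(folder_path, name)
        if os.path.isfile(target):
            saved.append(target)  # already on disk, skip the request
            continue
        try:
            data = fetch(pdf_url)
        except OSError:  # urllib's URLError and HTTPError subclass OSError
            failed.append(pdf_url)
            continue
        with open(target, "wb") as handle:
            handle.write(data)
        saved.append(target)
    return saved, failed
```

A call such as `saved, failed = download_all(pdf_urls[:10], os.getcwd())` would then attempt the first ten links and report which ones could not be fetched. Passing the fetcher as a parameter keeps the loop testable without network access.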
Built Distributions
Hashes for pdf_hunter-0.1.5-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 390c846c9b379642628b759bf4b41ce2f20d8c8dacb85e7f7b86f331a20c5b7f
MD5 | d2f2ecd1071e8eb4d59ed0021eced773
BLAKE2b-256 | 3bf24b18d70667750e613b9868d98596ff667839190cf9d7b4ce1f9737aeae00
Hashes for pdf_hunter-0.1.5-py2-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | bc264cb868e709c2b89ae666ff71797507efc998d7eac4f1f224f42e70b11ee8
MD5 | d26d9101dfde3fe5abbe674ef2159354
BLAKE2b-256 | 17adce835db2f2eff2ab3ec280751967de6a9db787a2f771bf09af649dfe4d6f
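If you download a wheel by hand, you can compare its SHA256 digest with the value listed above. The following is a small sketch using Python's `hashlib`; the helper names `sha256_of_file` and `matches_sha256` are hypothetical and not part of pdf_hunter.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path, expected_hex):
    """Compare a file's digest with an expected hex string, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_sha256("pdf_hunter-0.1.5-py3-none-any.whl", "390c846c9b379642628b759bf4b41ce2f20d8c8dacb85e7f7b86f331a20c5b7f")` should return `True` for an intact copy of the Python 3 wheel, assuming the file sits in the current directory.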