Download PDF links from a webpage
pdf_hunter
Search for and download PDF file links from a webpage.
Installation
This has been tested using Python 3 and Python 2.7.
pip install pdf_hunter
Usage
import pdf_hunter
url = "https://github.com/EbookFoundation/free-programming-books/blob/master/free-programming-books.md"
pdf_urls = pdf_hunter.get_pdf_urls(url)
pdf_urls[:10]
['https://people.gnome.org/~swilmet/glib-gtk-dev-platform.pdf', 'https://www.math.upenn.edu/~wilf/AlgoComp.pdf', 'http://cslibrary.stanford.edu/110/BinaryTrees.pdf', 'http://www-inst.eecs.berkeley.edu/~cs61b/fa14/book2/data-structures.pdf', 'http://lib.mdp.ac.id/ebook/Karya%20Umum/Dsa.pdf', 'http://cslibrary.stanford.edu/103/LinkedListBasics.pdf', 'http://cslibrary.stanford.edu/105/LinkedListProblems.pdf', 'http://www.jjj.de/fxt/fxtbook.pdf', 'http://www.cs.cmu.edu/~rwh/theses/okasaki.pdf', 'http://igm.univ-mlv.fr/~mac/REC/text-algorithms.pdf']
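The README does not say how `get_pdf_urls` finds links, so the following is only a minimal sketch of one way such a search could work, not the package's implementation. It assumes Python 3 and that a "PDF link" is an `<a href>` whose URL path ends in `.pdf`. The names `PdfLinkParser` and `find_pdf_urls` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkParser(HTMLParser):
    """Hypothetical helper: collect <a href> targets whose path ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the URL of the page they came from.
        absolute = urljoin(self.base_url, href)
        # Check only the path, so query strings and fragments are ignored.
        if urlparse(absolute).path.lower().endswith(".pdf"):
            if absolute not in self.pdf_urls:
                self.pdf_urls.append(absolute)


def find_pdf_urls(html, base_url):
    """Return the unique PDF links in html, in document order."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.pdf_urls
```

Matching on the file extension is a heuristic: a PDF served from a URL without a `.pdf` suffix would be missed by this sketch.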
We can download a single PDF file from a given URL:
pdf_url = pdf_urls[0]
pdf_url
'https://people.gnome.org/~swilmet/glib-gtk-dev-platform.pdf'
file_name = pdf_hunter.get_pdf_name(pdf_url)
file_name
'glib-gtk-dev-platform.pdf'
import os
os.path.isfile(file_name)
False
pdf_hunter.download_file(pdf_url, folder_path=os.getcwd())
os.path.isfile(file_name)
True
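For readers curious what the two steps above involve, here is a rough standard-library sketch of deriving a file name from a URL and saving the response to a folder. It is an illustration under assumptions (Python 3, file name taken from the last path segment), not the code behind `get_pdf_name` or `download_file`. The names `pdf_name_from_url` and `save_pdf` are hypothetical.

```python
import os
import posixpath
import shutil
from urllib.parse import unquote, urlparse
from urllib.request import urlopen


def pdf_name_from_url(pdf_url):
    """Hypothetical helper: last path segment of the URL, percent-decoded."""
    path = unquote(urlparse(pdf_url).path)
    # Decode first, then take the basename, so an encoded slash cannot
    # smuggle a directory component into the file name.
    return posixpath.basename(path)


def save_pdf(pdf_url, folder_path):
    """Hypothetical helper: stream pdf_url into folder_path, return the local path."""
    target = os.path.join(folder_path, pdf_name_from_url(pdf_url))
    with urlopen(pdf_url, timeout=30) as response, open(target, "wb") as out:
        # Copy in chunks rather than reading the whole file into memory.
        shutil.copyfileobj(response, out)
    return target
```

Applied to the URL above, `pdf_name_from_url` yields `'glib-gtk-dev-platform.pdf'`, the same name the session shows.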
Or download every PDF file linked from the page:
pdf_hunter.download_pdf_files(url, folder_path=os.getcwd())
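A page with many links will often contain a few dead ones, and the README does not state how `download_pdf_files` behaves when a download fails. If you need a record of which links failed, one option is to loop over the URLs yourself. The sketch below assumes the downloader raises an exception on failure. The name `download_all` and its `fetch` parameter are hypothetical.

```python
def download_all(pdf_urls, folder_path, fetch):
    """Hypothetical helper: call fetch for every URL and keep going on errors.

    fetch is any callable accepting (url, folder_path=...).
    Returns (saved_urls, failures), where failures holds (url, exception) pairs.
    """
    saved, failed = [], []
    for pdf_url in pdf_urls:
        try:
            fetch(pdf_url, folder_path=folder_path)
            saved.append(pdf_url)
        except Exception as error:  # broad on purpose: one bad link should not stop the batch
            failed.append((pdf_url, error))
    return saved, failed
```

With the calls shown earlier, this could be used as `download_all(pdf_urls, os.getcwd(), pdf_hunter.download_file)`.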
Built Distributions
Hashes for pdf_hunter-0.1.3-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 2ef6f04238f6ad333f18cf01982a26a5409b961d32539bd7577b54d3037c7cfd
MD5 | ad7d602fd6ec30ddede4d390a1efaf16
BLAKE2b-256 | c906b34c593c6f7319ef7b7b5e4bed8e00c77c7c7d1158023e7a05516d208e88
Hashes for pdf_hunter-0.1.3-py2-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 6799053a320a622c5c075cf96f8bafe8429ddb30dc2112ec2239676533f88e24
MD5 | a51f26601d0045469a2f6c202a08e576
BLAKE2b-256 | a70b98e85aacdd15886e1f821e2e2986130df4a32e13a0e6bc321a94185dcc07
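If you download a wheel by hand, the SHA256 digests listed above can be checked locally. The following is a small sketch using the standard `hashlib` module. The function names and the local file path in the usage note are assumptions for illustration.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path, expected_hex):
    """True if the file's digest equals expected_hex (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, assuming the Python 3 wheel was saved to the current directory, `matches_sha256("pdf_hunter-0.1.3-py3-none-any.whl", "2ef6f04238f6ad333f18cf01982a26a5409b961d32539bd7577b54d3037c7cfd")` should return `True` for an intact download.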