Download PDF links from a webpage
pdf_hunter
Search a webpage for links to PDF files and download them.
Installation
This has been tested using Python 3 and Python 2.7.
pip install pdf_hunter
Usage
import pdf_hunter
url = "https://github.com/EbookFoundation/free-programming-books/blob/master/free-programming-books.md"
pdf_urls = pdf_hunter.get_pdf_urls(url)
pdf_urls[:10]
['https://people.gnome.org/~swilmet/glib-gtk-dev-platform.pdf', 'https://www.math.upenn.edu/~wilf/AlgoComp.pdf', 'http://cslibrary.stanford.edu/110/BinaryTrees.pdf', 'http://www-inst.eecs.berkeley.edu/~cs61b/fa14/book2/data-structures.pdf', 'http://lib.mdp.ac.id/ebook/Karya%20Umum/Dsa.pdf', 'http://cslibrary.stanford.edu/103/LinkedListBasics.pdf', 'http://cslibrary.stanford.edu/105/LinkedListProblems.pdf', 'http://www.jjj.de/fxt/fxtbook.pdf', 'http://www.cs.cmu.edu/~rwh/theses/okasaki.pdf', 'http://igm.univ-mlv.fr/~mac/REC/text-algorithms.pdf']
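To make the idea behind `get_pdf_urls` more concrete, here is a minimal sketch of how PDF links can be collected from a page's HTML using only the Python 3 standard library. It is not pdf_hunter's actual implementation: the names `PdfLinkParser` and `find_pdf_links` are hypothetical, and the rule "the URL path ends in `.pdf`" is an assumption about what counts as a PDF link.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkParser(HTMLParser):
    """Hypothetical helper: collect <a href> targets whose path ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL.
        absolute = urljoin(self.base_url, href)
        # Assumption: a link is a PDF link if its path ends in ".pdf",
        # ignoring case and any query string.
        if urlparse(absolute).path.lower().endswith(".pdf"):
            self.pdf_links.append(absolute)


def find_pdf_links(html, base_url):
    """Return absolute PDF URLs found in an HTML string, in document order."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.pdf_links
```

A check on the file extension will miss PDFs served from URLs without a `.pdf` suffix; detecting those would require inspecting the response's content type.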
Download a single PDF file from a given URL:
pdf_url = pdf_urls[0]
pdf_url
'https://people.gnome.org/~swilmet/glib-gtk-dev-platform.pdf'
file_name = pdf_hunter.get_pdf_name(pdf_url)
file_name
'glib-gtk-dev-platform.pdf'
import os
os.path.isfile(file_name)
False
pdf_hunter.download_file(pdf_url, folder_path=os.getcwd())
os.path.isfile(file_name)
True
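The two steps above (derive a file name from the URL, then save the file into a folder) can be sketched with the Python 3 standard library as follows. This is an illustration under stated assumptions, not pdf_hunter's code: `pdf_name_from_url` and `save_pdf` are hypothetical names, and decoding percent-escapes in the file name is a choice made here that the library may or may not share.

```python
import os
import posixpath
import shutil
from urllib.parse import unquote, urlparse
from urllib.request import urlopen


def pdf_name_from_url(pdf_url):
    """Return the last path segment of a URL, ignoring query and fragment."""
    path = urlparse(pdf_url).path
    # Assumption: percent-escapes such as %20 are decoded in the file name.
    return unquote(posixpath.basename(path))


def save_pdf(pdf_url, folder_path):
    """Stream the URL's content into folder_path and return the saved path."""
    target = os.path.join(folder_path, pdf_name_from_url(pdf_url))
    with urlopen(pdf_url, timeout=30) as response, open(target, "wb") as out:
        # Copy in chunks so large PDFs are not held in memory at once.
        shutil.copyfileobj(response, out)
    return target
```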
Or download every PDF file linked from the page:
pdf_hunter.download_pdf_files(url, folder_path=os.getcwd())
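A page can list many PDF links, and some of them may no longer resolve. The description above does not say how `download_pdf_files` treats a failed download, so the following is only a sketch of one way to loop over the URLs yourself and keep going after an error. The function `download_all` and its `fetch` parameter are hypothetical; `fetch` stands for any callable that takes a URL and a folder path.

```python
def download_all(pdf_urls, folder_path, fetch):
    """Call fetch(url, folder_path) for each URL and report the outcome.

    Returns a pair (saved, failed): saved is a list of URLs that were
    fetched without error, failed is a list of (url, error message) pairs.
    """
    saved, failed = [], []
    for pdf_url in pdf_urls:
        try:
            fetch(pdf_url, folder_path)
        except Exception as error:
            # Record the failure and continue with the remaining URLs.
            failed.append((pdf_url, str(error)))
        else:
            saved.append(pdf_url)
    return saved, failed
```

With pdf_hunter, `fetch` could be a thin wrapper such as `lambda u, f: pdf_hunter.download_file(u, folder_path=f)`, using the `download_file` call shown earlier.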
Built Distributions
Hashes for pdf_hunter-0.1.6-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 7966142d321ba3ed6671fd1a65813d33c006835771547b8b41628f25ed1caf8a
MD5 | 4f581be54f2f3688944b03c27fa18816
BLAKE2b-256 | ad16ef1f7639c8724aa26b3f4f7f2a1d0ba04ab2b70ce2268c3448f0879dcdd9
Hashes for pdf_hunter-0.1.6-py2-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | db53796b998b7231cc835c76aa2f3add334374e0b67ce1aead0d001b43fbdf6f
MD5 | 31b1a57bdef3957f90e092360f338b4e
BLAKE2b-256 | e81388b1338b781f1a72e399f8da0a9cbd6aed34129366a7cd2d341c75cc1cb1
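The digests listed above can be used to check a wheel after downloading it manually. Below is a minimal sketch using Python's `hashlib`; the helper names `sha256_of_file` and `matches_sha256` are hypothetical, and the local file path is whatever location you saved the wheel to.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_sha256("pdf_hunter-0.1.6-py3-none-any.whl", "7966142d321ba3ed6671fd1a65813d33c006835771547b8b41628f25ed1caf8a")` should return `True` for an unmodified copy of the Python 3 wheel.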