Download PDF links from a webpage

Project description

pdf_hunter

Search for and download PDF file links from a webpage.

Installation

This has been tested using Python 3 and Python 2.7.

pip install pdf_hunter

Usage

import pdf_hunter

url = "https://github.com/EbookFoundation/free-programming-books/blob/master/free-programming-books.md"
pdf_urls = pdf_hunter.get_pdf_urls(url)
pdf_urls[:10]

['https://people.gnome.org/~swilmet/glib-gtk-dev-platform.pdf', 'https://www.math.upenn.edu/~wilf/AlgoComp.pdf', 'http://cslibrary.stanford.edu/110/BinaryTrees.pdf', 'http://www-inst.eecs.berkeley.edu/~cs61b/fa14/book2/data-structures.pdf', 'http://lib.mdp.ac.id/ebook/Karya%20Umum/Dsa.pdf', 'http://cslibrary.stanford.edu/103/LinkedListBasics.pdf', 'http://cslibrary.stanford.edu/105/LinkedListProblems.pdf', 'http://www.jjj.de/fxt/fxtbook.pdf', 'http://www.cs.cmu.edu/~rwh/theses/okasaki.pdf', 'http://igm.univ-mlv.fr/~mac/REC/text-algorithms.pdf']
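To give a feel for what collecting PDF links from a page involves, here is a minimal sketch using only the Python 3 standard library. It is not pdf_hunter's implementation, which this description does not show; the names `PdfLinkParser` and `find_pdf_links` are hypothetical, and the rule "the link path ends in .pdf" is an assumption about what counts as a PDF link.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkParser(HTMLParser):
    """Collect absolute URLs of <a href> targets whose path ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL.
        absolute = urljoin(self.base_url, href)
        # Look at the path only, so query strings and fragments are ignored.
        if urlparse(absolute).path.lower().endswith(".pdf"):
            self.pdf_urls.append(absolute)


def find_pdf_links(html, base_url):
    """Return the PDF links found in an HTML string (hypothetical helper)."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.pdf_urls
```

The sketch takes the HTML as a string, so fetching the page is left to whatever HTTP client you prefer.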

We can download a single PDF file from a given URL:

pdf_url = pdf_urls[0]
pdf_url

'https://people.gnome.org/~swilmet/glib-gtk-dev-platform.pdf'

file_name = pdf_hunter.get_pdf_name(pdf_url)
file_name

'glib-gtk-dev-platform.pdf'
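A file name like the one above can be derived from the last segment of the URL path. The following is a sketch of that idea with the Python 3 standard library, not the actual body of `get_pdf_name`; the function name `pdf_name_from_url` is hypothetical, and decoding percent-escapes is an assumption about the desired behaviour.

```python
import posixpath
from urllib.parse import unquote, urlparse


def pdf_name_from_url(pdf_url):
    """Return the last path segment of a URL, with percent-escapes decoded."""
    # urlparse separates the path from any query string or fragment.
    path = urlparse(pdf_url).path
    # Take the basename before decoding, so an encoded "/" cannot change the split.
    return unquote(posixpath.basename(path))
```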

import os

os.path.isfile(file_name)

False

pdf_hunter.download_file(pdf_url, folder_path=os.getcwd())

os.path.isfile(file_name)

True

Or download all PDF files from the page:

pdf_hunter.download_pdf_files(url, folder_path=os.getcwd())
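When a page links to many PDFs, it can help to skip files that are already on disk and to carry on after a failed download. The sketch below shows one way to do that. The helper `download_missing` is hypothetical and not part of pdf_hunter; it takes the naming and download functions as parameters, and it assumes the download function accepts a `folder_path` keyword and raises an exception on failure.

```python
import os


def download_missing(pdf_urls, folder_path, name_for, download):
    """Download each URL whose file is not already present in folder_path.

    name_for(url) returns the expected file name; download(url, folder_path=...)
    fetches one file. Returns (downloaded, skipped, failed) lists of URLs.
    """
    downloaded, skipped, failed = [], [], []
    for pdf_url in pdf_urls:
        target = os.path.join(folder_path, name_for(pdf_url))
        if os.path.isfile(target):
            # Already on disk, so do not fetch it again.
            skipped.append(pdf_url)
            continue
        try:
            download(pdf_url, folder_path=folder_path)
        except Exception:
            # Record the failure and continue with the remaining URLs.
            failed.append(pdf_url)
        else:
            downloaded.append(pdf_url)
    return downloaded, skipped, failed
```

With the functions shown above, a call might look like `download_missing(pdf_urls, os.getcwd(), pdf_hunter.get_pdf_name, pdf_hunter.download_file)`, assuming `download_file` saves each file under the name that `get_pdf_name` returns, as the single-file example suggests.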


Download files

Source Distribution

pdf_hunter-0.1.6.tar.gz (2.8 kB)

Uploaded Source

Built Distributions

pdf_hunter-0.1.6-py3-none-any.whl (4.2 kB)

Uploaded Python 3

pdf_hunter-0.1.6-py2-none-any.whl (4.2 kB)

Uploaded Python 2

File details

Details for the file pdf_hunter-0.1.6.tar.gz.

File metadata

  • Download URL: pdf_hunter-0.1.6.tar.gz
  • Upload date:
  • Size: 2.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.4.2 requests/2.22.0 setuptools/42.0.2 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.6.6

File hashes

Hashes for pdf_hunter-0.1.6.tar.gz
Algorithm Hash digest
SHA256 01e0ab46ece27b364842fc58a64342672e8301d95306f78c14b8bbb0c973af63
MD5 2b60ff1f523cf3230ad99053d2b5c378
BLAKE2b-256 53d72ffd17b5e581aa475bfa250841f25e5076c5898365b692ea07d6986631e7
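A downloaded distribution file can be checked against a listed SHA256 digest. The following is a minimal sketch using the standard `hashlib` module; the function name `sha256_of_file` and the chunk size are illustrative choices, not something this page prescribes.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The file is intact if the returned string equals the SHA256 value listed for it, for example `sha256_of_file("pdf_hunter-0.1.6.tar.gz") == "01e0ab46ece27b364842fc58a64342672e8301d95306f78c14b8bbb0c973af63"`, assuming the archive sits in the current directory.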

File details

Details for the file pdf_hunter-0.1.6-py3-none-any.whl.

File metadata

  • Download URL: pdf_hunter-0.1.6-py3-none-any.whl
  • Upload date:
  • Size: 4.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.4.2 requests/2.22.0 setuptools/42.0.2 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.6.6

File hashes

Hashes for pdf_hunter-0.1.6-py3-none-any.whl
Algorithm Hash digest
SHA256 7966142d321ba3ed6671fd1a65813d33c006835771547b8b41628f25ed1caf8a
MD5 4f581be54f2f3688944b03c27fa18816
BLAKE2b-256 ad16ef1f7639c8724aa26b3f4f7f2a1d0ba04ab2b70ce2268c3448f0879dcdd9

File details

Details for the file pdf_hunter-0.1.6-py2-none-any.whl.

File metadata

  • Download URL: pdf_hunter-0.1.6-py2-none-any.whl
  • Upload date:
  • Size: 4.2 kB
  • Tags: Python 2
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.15.0 pkginfo/1.4.2 requests/2.22.0 setuptools/42.0.2 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/2.7.15

File hashes

Hashes for pdf_hunter-0.1.6-py2-none-any.whl
Algorithm Hash digest
SHA256 db53796b998b7231cc835c76aa2f3add334374e0b67ce1aead0d001b43fbdf6f
MD5 31b1a57bdef3957f90e092360f338b4e
BLAKE2b-256 e81388b1338b781f1a72e399f8da0a9cbd6aed34129366a7cd2d341c75cc1cb1
