Python 3 library for extracting URLs from PDF files.

lemonpdf

Install

sudo apt install tesseract-ocr poppler-utils
pip install lemonpdf
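
Before running lemonpdf, it can help to confirm that the system packages are actually on the PATH. The following is a minimal sketch, not part of lemonpdf: it assumes the relevant executables are `tesseract` (shipped by tesseract-ocr) and `pdftoppm` (shipped by poppler-utils), and the helper name `missing_tools` is hypothetical.

```python
import shutil

# Assumption: these are the executables needed at runtime.
# tesseract-ocr ships `tesseract`; poppler-utils ships `pdftoppm`.
REQUIRED_TOOLS = ("tesseract", "pdftoppm")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of tools that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing system tools:", ", ".join(missing))
    else:
        print("All system tools found.")
```

If anything is reported missing, rerunning the apt command above (or the equivalent for your distribution) should resolve it.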

Quickstart

Command-line interface (CLI)

Get URLs

lemonpdf -u file.pdf

Save the URL list to a text file

lemonpdf -u file.pdf -o urls.txt -s

Get domains

lemonpdf -d file.pdf

Save the domains to a text file

lemonpdf -d file.pdf -o domains.txt -s
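
The saved file can then be consumed by other tools. The sketch below is an illustration only: it assumes the output file contains one entry per line, which the project description does not state, and `parse_entries` is a hypothetical helper rather than part of lemonpdf.

```python
def parse_entries(text):
    """Split saved output into unique, non-empty entries, keeping order.

    Assumption: the file holds one URL or domain per line.
    """
    seen = set()
    entries = []
    for line in text.splitlines():
        entry = line.strip()
        if entry and entry not in seen:
            seen.add(entry)
            entries.append(entry)
    return entries


# Stand-in for the contents of urls.txt; in practice you might use
# pathlib.Path("urls.txt").read_text(encoding="utf-8") instead.
sample_text = "https://example.com/a\n\nhttps://example.com/a\nhttps://example.org\n"
print(parse_entries(sample_text))
```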

Scripts

Get URLs and save them to a text file

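from lemonpdf import Extractor

# Input PDF and the text file that will receive the results
pdf_path = 'file.pdf'
output_txt_path = 'out_file.txt'

extractor = Extractor(pdf_path=pdf_path, output_txt_path=output_txt_path)

# save=True also writes the URLs to output_txt_path
urls = extractor.extract_urls(save=True)

print(urls)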
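
Extracted links often include non-web schemes. As a hedged sketch of one possible post-processing step, the helper below keeps only `http` and `https` URLs. It assumes `extract_urls` returns an iterable of URL strings, which the description above does not confirm; `keep_web_urls` is a hypothetical name, and the sample list stands in for real output.

```python
from urllib.parse import urlsplit


def keep_web_urls(urls):
    """Keep only http/https URLs that include a host, preserving order."""
    kept = []
    for url in urls:
        candidate = url.strip()
        try:
            parts = urlsplit(candidate)
        except ValueError:
            # Malformed input (for example a broken IPv6 literal) is skipped.
            continue
        if parts.scheme in ("http", "https") and parts.netloc:
            kept.append(candidate)
    return kept


# Stand-in for the value returned by extractor.extract_urls()
sample = [
    "https://example.com/a",
    "mailto:someone@example.com",
    "ftp://example.org/file",
    " http://example.net ",
]
print(keep_web_urls(sample))
```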

Get domains and save them to a text file

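from lemonpdf import Extractor

# Input PDF and the text file that will receive the results
pdf_path = 'file.pdf'
output_txt_path = 'domains.txt'

extractor = Extractor(pdf_path=pdf_path, output_txt_path=output_txt_path)

# save=True also writes the domains to output_txt_path
domains = extractor.extract_domains(save=True)

print(domains)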
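
To see which domains a document references most, the result can be tallied. This is a minimal sketch under the assumption that `extract_domains` returns an iterable of domain strings that may contain repeats; `count_domains` is a hypothetical helper, and the sample list stands in for real output.

```python
from collections import Counter


def count_domains(domains):
    """Count domains case-insensitively, most frequent first."""
    normalized = (d.strip().lower() for d in domains if d and d.strip())
    return Counter(normalized).most_common()


# Stand-in for the value returned by extractor.extract_domains()
sample = ["example.com", "Example.com", "example.org", ""]
for domain, count in count_domains(sample):
    print(domain, count)
```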
