lemonpdf
Python 3 library for extracting URLs from PDF files.
Install
sudo apt install tesseract-ocr poppler-utils
pip install lemonpdf
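Before installing the Python package, it can help to confirm that the system tools are actually on the PATH. The following is a minimal sketch and not part of lemonpdf: `missing_tools` is a hypothetical helper, and the executable names `tesseract` (installed by tesseract-ocr) and `pdftoppm` (installed by poppler-utils) are assumptions, since this page does not say which binaries lemonpdf calls.

```python
import shutil


def missing_tools(names):
    """Return the names from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Assumed executable names: tesseract-ocr installs `tesseract`,
    # poppler-utils installs `pdftoppm` (among other tools).
    missing = missing_tools(["tesseract", "pdftoppm"])
    if missing:
        print("Missing system tools:", ", ".join(missing))
    else:
        print("All system tools found.")
```

An empty result means both tools were found; anything listed should be installed with the apt command above before running lemonpdf.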
Quickstart
Command-line interface (CLI)
lemonpdf file.pdf
Save the URLs to a file
lemonpdf file.pdf --output urls.txt --save
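To process a whole folder of PDFs, the CLI call above can be wrapped in a short script. This is a sketch under stated assumptions: `build_command` and `extract_all` are hypothetical helpers, the `_urls.txt` naming scheme is invented for illustration, and only the `--output` and `--save` flags shown above are used.

```python
import subprocess
from pathlib import Path


def build_command(pdf_path, output_dir):
    """Build the lemonpdf command line for one PDF, using only the flags shown above."""
    pdf_path = Path(pdf_path)
    # Hypothetical naming scheme: report.pdf -> <output_dir>/report_urls.txt
    out_path = Path(output_dir) / (pdf_path.stem + "_urls.txt")
    return ["lemonpdf", str(pdf_path), "--output", str(out_path), "--save"]


def extract_all(pdf_dir, output_dir):
    """Run lemonpdf once per PDF in pdf_dir (requires lemonpdf to be installed)."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for pdf_path in sorted(Path(pdf_dir).glob("*.pdf")):
        # check=True raises CalledProcessError if the command exits non-zero
        subprocess.run(build_command(pdf_path, output_dir), check=True)
```

Calling `extract_all("pdfs", "out")` would then run the command once per file. Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.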
Use in scripts
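from lemonpdf import Extractor

# Input PDF and the text file associated with the saved URLs
pdf_path = 'file.pdf'
output_txt_path = 'out_file.txt'

extractor = Extractor(pdf_path=pdf_path, output_txt_path=output_txt_path)
# save=True asks the extractor to save the results (presumably to output_txt_path)
urls = extractor.extract_urls_from_pdf(save=True)
print(urls)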
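Once the URLs are extracted, a common next step is to remove duplicates and group them by site. The sketch below assumes that `extract_urls_from_pdf` returns an iterable of URL strings, which this page does not confirm. `group_by_host` is a hypothetical helper, and the sample list stands in for real output so the example runs without lemonpdf installed.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_host(urls):
    """Deduplicate URLs (keeping first-seen order) and group them by hostname."""
    groups = defaultdict(list)
    for url in dict.fromkeys(urls):  # dict.fromkeys drops duplicates, keeps order
        host = urlsplit(url).hostname or "(no host)"
        groups[host].append(url)
    return dict(groups)


# Sample data standing in for the result of extract_urls_from_pdf()
sample = [
    "https://example.com/a",
    "https://example.com/a",
    "http://example.org/b",
]
print(group_by_host(sample))
```

Only the standard library is used, so the helper works on any list of URL strings regardless of where they came from.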