lemonpdf
Python 3 library to extract URLs from PDF files.
Install
sudo apt install tesseract-ocr poppler-utils
pip install lemonpdf
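Before running lemonpdf, it can be useful to confirm that the system packages are actually on the PATH. The following is a minimal sketch, not part of lemonpdf. The helper name `find_missing_tools` is hypothetical, and the executable names are assumptions about what the apt packages install (`tesseract` from tesseract-ocr, `pdftotext` and `pdftoppm` from poppler-utils). Which of these lemonpdf calls internally is not stated here.

```python
import shutil

# Executable names assumed to be provided by the apt packages above:
# tesseract-ocr -> "tesseract", poppler-utils -> "pdftotext" and "pdftoppm".
REQUIRED_TOOLS = ("tesseract", "pdftotext", "pdftoppm")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of tools that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing system tools:", ", ".join(missing))
    else:
        print("All system tools found.")
```

An empty list means every tool in `REQUIRED_TOOLS` was found.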
Quickstart
Command-line interface (CLI) usage
Get URLs
lemonpdf -u file.pdf
Save the URL list to a text file
lemonpdf -u file.pdf -o urls.txt -s
Get domains
lemonpdf -d file.pdf
Save the domains to a text file
lemonpdf -d file.pdf -o domains.txt -s
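The CLI calls above can also be driven from a Python script. This is a rough sketch that uses only the flags shown in the examples (`-u`, `-d`, `-o`, `-s`). The helper names `build_command` and `run_lemonpdf` are hypothetical, and the format of the CLI's printed output is not specified here, so the output is returned as raw text.

```python
import subprocess


def build_command(pdf_path, mode="urls", output_path=None):
    """Build a lemonpdf argument list using only the flags shown above."""
    flags = {"urls": "-u", "domains": "-d"}
    if mode not in flags:
        raise ValueError("mode must be 'urls' or 'domains'")
    command = ["lemonpdf", flags[mode], pdf_path]
    if output_path is not None:
        # The examples above always pass -o together with -s when saving.
        command += ["-o", output_path, "-s"]
    return command


def run_lemonpdf(pdf_path, mode="urls", output_path=None):
    """Run the CLI and return whatever it printed to standard output."""
    result = subprocess.run(
        build_command(pdf_path, mode, output_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.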
Scripts
Get URLs and save them to a text file
from lemonpdf import Extractor
pdf_path = 'file.pdf'
output_txt_path = 'out_file.txt'
extractor = Extractor(pdf_path=pdf_path, output_txt_path=output_txt_path)
urls = extractor.extract_urls(save=True)
print(urls)
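To illustrate what URL extraction from text involves, here is a small standard-library sketch that finds URLs in a string that has already been extracted from a document. It is not lemonpdf's implementation, which is not described here. The pattern and the function name `find_urls` are assumptions chosen for illustration, and the pattern is deliberately simple.

```python
import re

# Deliberately simple pattern: an http(s) scheme followed by characters
# that are not whitespace, angle brackets, quotes, or closing brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def find_urls(text):
    """Return unique URLs found in text, in order of first appearance."""
    seen = []
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(".,;:")  # drop trailing sentence punctuation
        if url not in seen:
            seen.append(url)
    return seen


if __name__ == "__main__":
    sample = "See https://example.com/a, and http://example.org."
    print(find_urls(sample))
```

A pattern like this misses URLs without a scheme and URLs broken across lines, both of which are common in PDF text.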
Get domains and save them to a text file
from lemonpdf import Extractor
pdf_path = 'file.pdf'
output_txt_path = 'domains.txt'
extractor = Extractor(pdf_path=pdf_path, output_txt_path=output_txt_path)
domains = extractor.extract_domains(save=True)
print(domains)
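The relationship between URLs and domains can be sketched with the standard library alone. The example below assumes that a "domain" means the host name of a URL. Whether lemonpdf uses the same definition is not stated here, and `domains_from_urls` is a hypothetical helper, not part of the lemonpdf API.

```python
from urllib.parse import urlparse


def domains_from_urls(urls):
    """Return the unique host names of the given URLs, preserving order."""
    domains = []
    for url in urls:
        # hostname is lowercased, has no port, and is None without a netloc
        host = urlparse(url).hostname
        if host and host not in domains:
            domains.append(host)
    return domains


if __name__ == "__main__":
    print(domains_from_urls([
        "https://example.com/a",
        "https://docs.python.org/3/",
    ]))
```

Strings that do not parse as URLs with a network location are skipped rather than raising an error.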
Source Distribution
lemonpdf-2.0.0.tar.gz (3.9 kB)
Built Distribution
lemonpdf-2.0.0-py3-none-any.whl (4.5 kB)
File details
Details for the file lemonpdf-2.0.0.tar.gz.
File metadata
- Download URL: lemonpdf-2.0.0.tar.gz
- Upload date:
- Size: 3.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 15852f44f492e9b5a2772349c7a7afa37e0c5bb25a10bad3bf343ab2b0b54a6b
MD5 | 8dafc160abc42c0527880ff19ac9db36
BLAKE2b-256 | c6ab511613ea8e3f8b638c2bfa0d7b8110cf78949d2ce10677b55670c60845de
File details
Details for the file lemonpdf-2.0.0-py3-none-any.whl.
File metadata
- Download URL: lemonpdf-2.0.0-py3-none-any.whl
- Upload date:
- Size: 4.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | c96d225be0257320209efb0beab58232e8021b570e0363498b6ca18099e853ea
MD5 | eeb4d3aafdf1a726607295871cdb6d56
BLAKE2b-256 | 5422f99684bf557dfe3bf5cc6e5d855acacd3cbd696635d72bbafbf2492b161b