A family of LLM-enhanced PDF utilities
pdf-llm-tools
pdf-llm-tools is a family of AI PDF utilities:
- pdfllm-titler renames a PDF with metadata parsed from its filename and contents. In particular, it renames the file as YEAR-AUTHOR-TITLE.pdf.
- (todo) pdfllm-toccer adds a bookmark structure parsed from the PDF's detected table of contents.
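To make the YEAR-AUTHOR-TITLE.pdf naming scheme concrete, here is a minimal Python sketch of how such a filename could be assembled once the metadata is known. The helper names `build_filename` and `rename_pdf`, the lowercasing, and the hyphen-for-whitespace rule are all assumptions made for illustration; the actual formatting used by pdfllm-titler may differ.

```python
import re
from pathlib import Path


def build_filename(year: int, author: str, title: str) -> str:
    """Return a YEAR-AUTHOR-TITLE.pdf name (hypothetical formatting rules)."""

    def slug(text: str) -> str:
        # Assumption: collapse each run of non-alphanumeric characters
        # into a single hyphen and lowercase the result.
        return re.sub(r"[^A-Za-z0-9]+", "-", text).strip("-").lower()

    return f"{year}-{slug(author)}-{slug(title)}.pdf"


def rename_pdf(path: Path, year: int, author: str, title: str) -> Path:
    """Rename `path` within its directory and return the new path."""
    target = path.with_name(build_filename(year, author, title))
    if target == path:
        return path  # already has the desired name
    if target.exists():
        # Refuse to overwrite an unrelated file that has the same name.
        raise FileExistsError(target)
    return path.rename(target)
```

For example, `build_filename(1948, "Shannon", "A Mathematical Theory of Communication")` yields `1948-shannon-a-mathematical-theory-of-communication.pdf` under these assumed rules.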
We currently use poppler/pdftotext for layout-preserving text extraction and PyMuPDF to update outlines. OpenAI's gpt-3.5-turbo-1106 is hardcoded as the LLM backend. The program requires an OpenAI API key, supplied via a command-line option, an environment variable, or manual input.
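The three ways of supplying the API key suggest a simple fallback chain. The sketch below shows one way this could work, not the project's actual code: the function name `resolve_api_key`, the precedence order (option, then environment, then prompt), and the variable name `OPENAI_API_KEY` are assumptions. `OPENAI_API_KEY` is the name conventionally read by OpenAI's Python library, but check `--help` for what these tools actually expect.

```python
import getpass
import os
from typing import Callable, Mapping, Optional


def resolve_api_key(
    option_value: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
    prompt: Callable[[str], str] = getpass.getpass,
) -> str:
    """Return an API key from an option, the environment, or a prompt."""
    # 1. An explicit command-line option wins if it was given.
    if option_value:
        return option_value
    # 2. Otherwise fall back to the environment (assumed variable name).
    env_value = env.get("OPENAI_API_KEY")
    if env_value:
        return env_value
    # 3. As a last resort, ask the user without echoing the input.
    key = prompt("OpenAI API key: ").strip()
    if not key:
        raise ValueError("no OpenAI API key provided")
    return key
```

Passing `env` and `prompt` as parameters keeps the function easy to test without touching the real environment or terminal.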
Installation
pip install pdf-llm-tools
Usage
These utilities require all PDFs to have a correct OCR layer. Run something like OCRmyPDF if needed.
pdfllm-titler
pdfllm-titler a.pdf b.pdf c.pdf
pdfllm-titler --last-page 8 d.pdf
See --help for full details.
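The `--last-page` example above suggests that only the leading pages of a PDF are read, though its exact semantics are not spelled out here. As a rough illustration, a page limit like this could map onto pdftotext's `-f` and `-l` flags for layout-preserving extraction. The function names and the default of 5 pages are hypothetical; only the pdftotext flags (`-layout`, `-f`, `-l`, and `-` for stdout) are taken from poppler itself.

```python
import subprocess
from typing import List


def pdftotext_command(pdf_path: str, first_page: int = 1, last_page: int = 5) -> List[str]:
    """Build a pdftotext invocation that writes layout-preserved text to stdout."""
    if first_page < 1 or last_page < first_page:
        raise ValueError("need 1 <= first_page <= last_page")
    return [
        "pdftotext",
        "-layout",               # keep the physical layout of the page
        "-f", str(first_page),   # first page to convert
        "-l", str(last_page),    # last page to convert
        pdf_path,
        "-",                     # "-" sends the text to stdout
    ]


def extract_text(pdf_path: str, first_page: int = 1, last_page: int = 5) -> str:
    """Run pdftotext (must be on PATH) and return the extracted text."""
    result = subprocess.run(
        pdftotext_command(pdf_path, first_page, last_page),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout
```

With this sketch, `extract_text("d.pdf", last_page=8)` would return the text of pages 1 through 8, which could then be handed to the LLM.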
Development
This project is made with Hatch.
- Build: hatch build
- Test: hatch run test:test