A family of LLM-enhanced PDF utilities
pdf-llm-tools
pdf-llm-tools is a family of AI PDF utilities:
- pdfllm-titler renames a PDF using metadata parsed from its filename and contents. Specifically, it renames the file as YEAR-AUTHOR-TITLE.pdf.
- (todo) pdfllm-toccer adds a bookmark structure parsed from the PDF's detected table of contents.
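As a rough illustration of the naming scheme pdfllm-titler targets, here is a minimal sketch that turns already-extracted metadata into a YEAR-AUTHOR-TITLE.pdf filename. The helpers slug, titled_filename and rename_pdf are hypothetical and not part of pdf-llm-tools; the normalization they apply (ASCII folding, joining words with hyphens) is an assumption, and the real tool may format the fields differently.

```python
import re
import unicodedata
from pathlib import Path


def slug(text: str) -> str:
    """ASCII-fold a metadata field and join its words with hyphens (assumed style)."""
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode()
    return "-".join(re.findall(r"[A-Za-z0-9]+", ascii_text))


def titled_filename(year: int, author: str, title: str) -> str:
    """Build a YEAR-AUTHOR-TITLE.pdf name (hypothetical helper)."""
    parts = [str(year), slug(author), slug(title)]
    if not all(parts):
        raise ValueError(f"empty metadata field in {parts!r}")
    return "-".join(parts) + ".pdf"


def rename_pdf(path: Path, year: int, author: str, title: str) -> Path:
    """Rename a PDF in place, refusing to overwrite an existing file."""
    target = path.with_name(titled_filename(year, author, title))
    if target.exists():
        raise FileExistsError(target)
    return path.rename(target)  # Path.rename returns the new path (Python 3.8+)
```

For example, titled_filename(2017, "Vaswani", "Attention Is All You Need") yields "2017-Vaswani-Attention-Is-All-You-Need.pdf". In the actual tool the year, author and title would come from the LLM's reading of the filename and first pages, not from hand-typed arguments.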
We currently use poppler's pdftotext for layout-preserving text extraction and PyMuPDF to update outlines (bookmarks). OpenAI's gpt-4o-mini is hardcoded as the LLM backend, so the program requires an OpenAI API key, supplied via a command-line option, an environment variable, or manual input.
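Since pdfllm-toccer is still marked todo, the following is only a sketch of how a detected table of contents could be handed to PyMuPDF. It assumes the LLM returns entries as (level, title, printed page) and that the offset between printed page numbers and PDF page positions is known; TocEntry, to_pymupdf_toc and page_offset are hypothetical names. The output is the [level, title, page] list (1-based pages) that PyMuPDF's Document.set_toc() accepts.

```python
from typing import NamedTuple


class TocEntry(NamedTuple):
    """One line of a detected table of contents (hypothetical structure)."""
    level: int          # 1 = chapter, 2 = section, ...
    title: str
    printed_page: int   # page number as printed in the document


def to_pymupdf_toc(entries: list[TocEntry], page_offset: int,
                   page_count: int) -> list[list]:
    """Convert entries into [level, title, page] rows for Document.set_toc().

    page_offset is the PDF page position minus the printed page number,
    e.g. 12 if printed page 1 is the 13th page of the file.
    """
    toc: list[list] = []
    prev_level = 0
    for entry in entries:
        # set_toc requires the first level to be 1 and rejects jumps of more
        # than one level, so clamp noisy LLM output into a valid hierarchy.
        level = max(1, min(entry.level, prev_level + 1))
        # Keep the bookmark target inside the document (pages are 1-based).
        page = max(1, min(entry.printed_page + page_offset, page_count))
        toc.append([level, entry.title, page])
        prev_level = level
    return toc


# Applying it with PyMuPDF (not executed here; requires `pip install pymupdf`):
#   import fitz
#   doc = fitz.open("book.pdf")
#   doc.set_toc(to_pymupdf_toc(entries, page_offset=12, page_count=doc.page_count))
#   doc.save("book-with-bookmarks.pdf")
```

Clamping rather than rejecting malformed entries is a design choice made for this sketch; a stricter tool might instead ask the model to retry.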
Installation
pip install pdf-llm-tools
Usage
These utilities require all PDFs to have a correct OCR layer. Run something like OCRmyPDF if needed.
titler
pdfllm titler a.pdf b.pdf c.pdf
pdfllm titler --last-page 8 d.pdf
See --help for full details.
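The --last-page 8 example suggests that only the first pages of a PDF are sent to the model. Assuming (this is not confirmed by the project) that such a limit is passed through to pdftotext, a layout-preserving extraction of a page range could look like the sketch below. The -layout, -f and -l flags and the "-" stdout target are standard pdftotext options; the function names are hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def pdftotext_command(pdf_path: str, first_page: int = 1,
                      last_page: Optional[int] = None) -> list[str]:
    """Build a pdftotext call that keeps the physical page layout."""
    cmd = ["pdftotext", "-layout", "-f", str(first_page)]
    if last_page is not None:
        cmd += ["-l", str(last_page)]
    cmd += [pdf_path, "-"]  # "-" sends the extracted text to stdout
    return cmd


def split_pages(text: str) -> list[str]:
    """pdftotext ends each page with a form feed; split on it."""
    pages = text.split("\f")
    if pages and pages[-1] == "":
        pages.pop()  # drop the empty string after the final form feed
    return pages


def extract_pages(pdf_path: str, first_page: int = 1,
                  last_page: Optional[int] = None) -> list[str]:
    """Run pdftotext (poppler must be on PATH) and return one string per page."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext (poppler-utils) not found on PATH")
    result = subprocess.run(pdftotext_command(pdf_path, first_page, last_page),
                            capture_output=True, text=True, check=True)
    return split_pages(result.stdout)
```

With these assumptions, extract_pages("d.pdf", last_page=8) would return the text of pages 1 to 8 of d.pdf.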
Development
This project is made with Hatch.
- Build: hatch build
- Test: hatch run test:test_all [--openai-api-key KEY]
  - The test system has the same API key handling as the main program. The key must be given either as an option in the hatch run invocation (which takes precedence) or as the envvar OPENAI_API_KEY.
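The key lookup described above (explicit option first, then the OPENAI_API_KEY envvar, then manual input for the main program) can be sketched as follows. This is an illustration of that precedence order only; resolve_openai_api_key and its cli_value parameter are hypothetical, and the project's actual implementation may differ.

```python
import getpass
import os
from typing import Optional


def resolve_openai_api_key(cli_value: Optional[str] = None) -> str:
    """Return the OpenAI key: CLI option, then OPENAI_API_KEY, then a prompt."""
    if cli_value:
        return cli_value  # an explicit option takes precedence
    env_value = os.environ.get("OPENAI_API_KEY")
    if env_value:
        return env_value
    # Last resort: ask interactively without echoing the key to the terminal.
    return getpass.getpass("OpenAI API key: ")
```

Using getpass for the interactive fallback keeps the key out of the terminal scrollback and shell history.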
Download files
Source Distribution
pdf_llm_tools-0.0.4.tar.gz (294.6 kB)
Built Distribution
Hashes for pdf_llm_tools-0.0.4-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 2cc6e7d2c3c1fb9efbd2b881dc32cdf91096053c888eb87c2248f818247ee386
MD5 | ee59127cd7ede3ea1d32168070c783eb
BLAKE2b-256 | b13d3e83adda0c93ff9234cd14a7b5e5952acab06414c2cf02907d0d1481b730
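To check a downloaded copy of the wheel against the SHA256 digest listed for pdf_llm_tools-0.0.4-py3-none-any.whl, a short standard-library sketch is enough. The function names are illustrative; only the expected digest comes from the table.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed for pdf_llm_tools-0.0.4-py3-none-any.whl
EXPECTED_SHA256 = "2cc6e7d2c3c1fb9efbd2b881dc32cdf91096053c888eb87c2248f818247ee386"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large downloads are not read at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 matches the published digest."""
    return sha256_of(path) == expected.lower()
```

pip can enforce the same check at install time through its hash-checking mode, by pinning the release in a requirements file with a --hash=sha256:... entry.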