A family of LLM-enhanced PDF utilities

pdf-llm-tools

pdf-llm-tools is a family of LLM-assisted PDF utilities:

  • pdfllm-titler renames a PDF using metadata parsed from its filename and contents. Specifically, it renames the file to YEAR-AUTHOR-TITLE.pdf.
  • (todo) pdfllm-toccer adds a bookmark structure (outline) parsed from the table of contents detected in the PDF.
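
As an illustration of the YEAR-AUTHOR-TITLE.pdf scheme, here is a minimal sketch of how such a name could be assembled once the year, author, and title are known. The function name build_filename, the lowercasing, the hyphen separator inside the title, and the length cap are all assumptions made for this example; the tool's actual formatting rules may differ.

```python
import re


def build_filename(year: int, author: str, title: str, max_title_len: int = 80) -> str:
    """Return a YEAR-AUTHOR-TITLE.pdf name from already-extracted metadata.

    Hypothetical helper: the casing and separator rules are assumptions,
    not the behavior of pdfllm-titler itself.
    """

    def slug(text: str) -> str:
        # Collapse every run of characters outside A-Z, a-z, 0-9 into a
        # single hyphen, trim hyphens at the ends, and lowercase the result.
        # Note that this also drops non-ASCII letters.
        return re.sub(r"[^A-Za-z0-9]+", "-", text).strip("-").lower()

    # Cap the title length so the final name stays within filesystem limits.
    title_slug = slug(title)[:max_title_len].rstrip("-")
    return f"{year}-{slug(author)}-{title_slug}.pdf"


print(build_filename(2017, "Vaswani", "Attention Is All You Need"))
```

Restricting the name to ASCII letters, digits, and hyphens keeps it portable across filesystems, at the cost of mangling titles written in other scripts.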

We currently use poppler's pdftotext for layout-preserving text extraction and PyMuPDF to update outlines. OpenAI's gpt-3.5-turbo-1106 is hardcoded as the LLM backend. The program requires an OpenAI API key, supplied via a command-line option, an environment variable, or manual input.
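
The three ways of supplying the API key suggest a simple fallback chain. The sketch below shows one way it could work; the function name resolve_api_key, the variable name OPENAI_API_KEY, and the precedence order (option, then environment, then prompt) are assumptions for illustration, so check --help for the names the tool really uses.

```python
import os
from getpass import getpass
from typing import Callable, Mapping, Optional


def resolve_api_key(
    option_value: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
    prompt: Callable[[str], str] = getpass,
    env_var: str = "OPENAI_API_KEY",  # assumed variable name
) -> str:
    """Return the first non-empty key from: option, environment, prompt."""
    # 1. An explicit command-line value wins.
    if option_value:
        return option_value
    # 2. Otherwise fall back to the environment variable.
    from_env = env.get(env_var, "")
    if from_env:
        return from_env
    # 3. Finally ask the user; getpass hides the key while it is typed.
    key = prompt("OpenAI API key: ").strip()
    if not key:
        raise ValueError("no OpenAI API key provided")
    return key
```

Passing env and prompt as parameters keeps the function easy to exercise without touching the real environment or a terminal.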

Installation

pip install pdf-llm-tools

Usage

These utilities require every PDF to have a correct OCR (text) layer. If one is missing, run a tool such as OCRmyPDF first.

pdfllm-titler

pdfllm-titler a.pdf b.pdf c.pdf
pdfllm-titler --last-page 8 d.pdf

See --help for full details.
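
A page limit such as --last-page 8 plausibly bounds how much text is extracted before it is sent to the LLM. The following sketch shows how a bounded, layout-preserving extraction could be done with pdftotext from Python. The helper names pdftotext_command and extract_text are hypothetical, and mapping --last-page onto pdftotext's -l flag is an assumption about the tool's internals, not something this README confirms.

```python
import subprocess
from typing import List


def pdftotext_command(pdf_path: str, last_page: int) -> List[str]:
    """Build a pdftotext call that prints pages 1..last_page to stdout."""
    if last_page < 1:
        raise ValueError("last_page must be at least 1")
    return [
        "pdftotext",
        "-layout",              # keep the physical layout of each page
        "-f", "1",              # first page to convert
        "-l", str(last_page),   # last page to convert
        pdf_path,
        "-",                    # write the text to stdout instead of a file
    ]


def extract_text(pdf_path: str, last_page: int) -> str:
    """Run pdftotext (it must be on PATH) and return the extracted text."""
    result = subprocess.run(
        pdftotext_command(pdf_path, last_page),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout
```

Calling extract_text("d.pdf", 8) would return the text of the first eight pages, provided poppler is installed and d.pdf has a text layer.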

Development

This project is made with Hatch.

  • Build: hatch build
  • Test: hatch run test:test

