A family of LLM-enhanced PDF utilities

pdf-llm-tools

pdf-llm-tools is a family of AI PDF utilities:

  • pdfllm-titler renames a PDF to YEAR-AUTHOR-TITLE.pdf, using metadata parsed from its filename and contents.
  • (todo) pdfllm-toccer adds a bookmark structure parsed from the PDF's detected table of contents.
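
As an illustration of the YEAR-AUTHOR-TITLE.pdf naming scheme, here is a minimal sketch of how such a filename could be assembled once the metadata is known. The function name `build_filename` and the lowercase, hyphen-separated normalization are assumptions made for this example; the actual output format of pdfllm-titler may differ.

```python
import re


def build_filename(year: int, author: str, title: str) -> str:
    """Build a YEAR-AUTHOR-TITLE.pdf name from already-extracted metadata.

    Hypothetical helper, not part of pdf-llm-tools.
    """

    def slug(text: str) -> str:
        # Assumption: lowercase ASCII, with every run of other characters
        # collapsed into a single hyphen and no leading/trailing hyphens.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return f"{year}-{slug(author)}-{slug(title)}.pdf"


print(build_filename(2017, "Vaswani", "Attention Is All You Need"))
# prints: 2017-vaswani-attention-is-all-you-need.pdf
```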

We currently use poppler/pdftotext for layout-preserving text extraction and PyMuPDF to update outlines. OpenAI's gpt-4o-mini is hardcoded as the LLM backend. The program requires an OpenAI API key via option, envvar, or manual input.
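
The three ways of supplying the API key can be pictured as a simple fallback chain. The sketch below is not the project's code: `resolve_api_key` is a hypothetical name, and the order shown (option first, then the `OPENAI_API_KEY` envvar, then an interactive prompt) is an assumption based on the option taking precedence over the envvar.

```python
import getpass
import os
from typing import Callable, Mapping, Optional


def resolve_api_key(
    option_value: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    prompt: Callable[[str], str] = getpass.getpass,
) -> str:
    """Return the API key from the first source that provides one.

    Hypothetical helper; the real program's internals may differ.
    """
    env = os.environ if env is None else env

    # 1. An explicit command-line option wins.
    if option_value:
        return option_value

    # 2. Otherwise fall back to the environment variable.
    env_value = env.get("OPENAI_API_KEY")
    if env_value:
        return env_value

    # 3. As a last resort, ask the user without echoing the input.
    return prompt("OpenAI API key: ")
```

Passing `env` and `prompt` as parameters keeps the function easy to exercise without touching the real environment or terminal.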

Installation

pip install pdf-llm-tools

Usage

These utilities require all PDFs to have a correct OCR layer. Run something like OCRmyPDF if needed.

titler

pdfllm titler a.pdf b.pdf c.pdf
pdfllm titler --last-page 8 d.pdf

See --help for full details.
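
For intuition about what a page limit such as `--last-page 8` could mean for text extraction, the sketch below calls poppler's `pdftotext` on the first pages of a file. It assumes that `--last-page` bounds the pages read by the titler, which the `--help` output should confirm; the function names are hypothetical and this is not the project's implementation.

```python
import subprocess
from typing import List


def pdftotext_command(pdf_path: str, last_page: int) -> List[str]:
    """Build a pdftotext invocation covering pages 1..last_page."""
    if last_page < 1:
        raise ValueError("last_page must be at least 1")
    # -layout preserves the physical layout, -f/-l select the first and
    # last page, and "-" as the output file writes the text to stdout.
    return ["pdftotext", "-layout", "-f", "1", "-l", str(last_page), pdf_path, "-"]


def extract_text(pdf_path: str, last_page: int) -> str:
    """Run pdftotext (must be installed and on PATH) and return its output."""
    result = subprocess.run(
        pdftotext_command(pdf_path, last_page),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout
```

If the extracted text comes back empty for a scanned document, that is a hint the PDF lacks an OCR layer.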

Development

This project is made with Hatch.

  • Build: hatch build
  • Test: hatch run test:test_all [--openai-api-key KEY]
    • The test system handles the API key the same way as the main program. The key must be given either as an option in the hatch run invocation (which takes precedence) or as the envvar OPENAI_API_KEY.

Download files

Source Distribution

pdf_llm_tools-0.0.4.tar.gz (294.6 kB)

Built Distribution

pdf_llm_tools-0.0.4-py3-none-any.whl (7.9 kB, Python 3)
