
A family of LLM-enhanced PDF utilities


pdf-llm-tools

pdf-llm-tools is a family of AI PDF utilities:

  • pdfllm-titler renames a PDF using metadata parsed from its filename and contents. The new name has the form YEAR-AUTHOR-TITLE.pdf.
  • (todo) pdfllm-toccer adds a bookmark structure parsed from the PDF's detected table of contents.
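To make the YEAR-AUTHOR-TITLE.pdf pattern concrete, here is a minimal sketch of how such a name could be assembled once the metadata is known. The helper `build_filename` and its normalization rules (lowercasing, hyphens between words, dropping non-ASCII characters) are assumptions for illustration; the README does not specify how pdfllm-titler formats each field.

```python
import re


def build_filename(year, author, title):
    """Hypothetical helper: assemble YEAR-AUTHOR-TITLE.pdf from parsed metadata."""

    def slug(text):
        # Assumed normalization: lowercase, then collapse every run of
        # characters outside a-z and 0-9 into a single hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return f"{year}-{slug(author)}-{slug(title)}.pdf"


name = build_filename(1948, "Shannon", "A Mathematical Theory of Communication")
print(name)
```

The actual tool may preserve case or use a different separator inside each field; only the overall YEAR-AUTHOR-TITLE.pdf ordering comes from the description above.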

We currently use poppler/pdftotext for layout-preserving text extraction and PyMuPDF to update outlines. OpenAI's gpt-4o-mini is hardcoded as the LLM backend. The program requires an OpenAI API key, supplied via a command-line option, an environment variable, or manual input.
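The three ways of supplying the API key suggest a simple fallback chain. The sketch below is not the project's code: `resolve_api_key` is a hypothetical function, and the order (option first, then the `OPENAI_API_KEY` envvar, then an interactive prompt) is an assumption based on the note in the Development section that the option takes precedence over the envvar.

```python
import os
from getpass import getpass


def resolve_api_key(option_value=None, env=None, prompt=getpass):
    """Hypothetical resolution order: option, then envvar, then manual input."""
    env = os.environ if env is None else env
    if option_value:
        # An explicitly passed option wins over everything else.
        return option_value
    if env.get("OPENAI_API_KEY"):
        return env["OPENAI_API_KEY"]
    # Fall back to asking the user; getpass avoids echoing the key.
    return prompt("OpenAI API key: ")
```

Passing `env` and `prompt` as parameters keeps the function easy to exercise without touching the real environment or blocking on input.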

Installation

pip install pdf-llm-tools

Usage

These utilities require all PDFs to have a correct OCR layer. Run something like OCRmyPDF if needed.

titler

pdfllm titler a.pdf b.pdf c.pdf
pdfllm titler --last-page 8 d.pdf

See --help for full details.
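For readers curious how a page limit could interact with text extraction, the following sketch calls poppler's pdftotext with its `-layout`, `-f`, and `-l` flags. The function names and the default of five pages are hypothetical, and it is only an assumption that `--last-page` bounds the extracted pages in this way; check `--help` for the actual behavior.

```python
import subprocess


def pdftotext_command(pdf_path, first_page=1, last_page=5):
    """Build a pdftotext invocation (hypothetical helper, assumed defaults)."""
    return [
        "pdftotext",
        "-layout",              # preserve the physical layout of the text
        "-f", str(first_page),  # first page to convert
        "-l", str(last_page),   # last page to convert
        pdf_path,
        "-",                    # write the extracted text to stdout
    ]


def extract_text(pdf_path, first_page=1, last_page=5):
    """Run pdftotext and return its output; requires poppler to be installed."""
    result = subprocess.run(
        pdftotext_command(pdf_path, first_page, last_page),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout
```

Under these assumptions, `extract_text("d.pdf", last_page=8)` would return the layout-preserved text of the first eight pages of d.pdf.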

Development

This project is made with Hatch.

  • Build: hatch build
  • Test: hatch run test:test_all [--openai-api-key KEY]
    • The test system handles the API key the same way as the main program. The key must be given either as an option in the hatch run invocation (which takes precedence) or as the envvar OPENAI_API_KEY.
