
A Linux-specific tool for naming PDF files

PDF Namer

PDF Namer is a Python CLI application that renames PDF documents based on AI-generated descriptions. It helps organize collections of PDFs by extracting the text of each document and producing a standardized filename of the form YYYY-MM-DD -- DOCUMENT_KIND - DOCUMENT_DESCRIPTION.pdf.

Features

  • Process single PDF files or entire directories recursively
  • Generate meaningful filenames using OpenAI's GPT models
  • Multiprocessing support for faster batch processing
  • Customizable number of worker processes (--workers)
  • Language selection for filename generation (--language)
  • Skip files that are already correctly named
  • Force reprocessing of files that are already correctly named (--force)

Installation

  1. Clone this repository:

    git clone https://github.com/llabusch93/pdf-namer.git
    cd pdf-namer
    
  2. Create a virtual environment and activate it:

    python -m venv venv
    source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
    
  3. Install the required dependencies:

    pip install -r requirements.txt
    
  4. Set up your OpenAI API key as an environment variable:

    export OPENAI_API_KEY=your_api_key_here
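
Since the tool depends on the OPENAI_API_KEY environment variable, a quick pre-flight check can catch a missing key before any files are touched. The following is a minimal sketch and not part of PDF Namer; the helper name `require_api_key` is hypothetical, and the tool itself may validate the key differently.

```python
import os


def require_api_key(env=None):
    """Return the OpenAI API key from the environment or raise a clear error.

    Hypothetical helper for illustration only.
    """
    # Allow a custom mapping to be passed in so the check is easy to test.
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running pdf-namer."
        )
    return key
```

Calling `require_api_key()` with no arguments reads from the current process environment.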
    

Usage

To process a single PDF file:

    pdf-namer /path/to/your/file.pdf

To process all PDF files in a directory recursively:

    pdf-namer /path/to/your/directory

To specify the number of worker processes:

    pdf-namer /path/to/your/directory --workers 5

To specify the language for filename generation:

    pdf-namer /path/to/your/file.pdf --language english

To force processing of files even if they are already correctly named:

    pdf-namer /path/to/your/file.pdf --force
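
The options above can be combined in a single call. As a rough illustration of how such a command line might be parsed, here is a sketch using Python's standard argparse module. The flag names come from the examples above; the function name `build_parser`, the defaults, and the help texts are assumptions, and the real tool may use a different parsing library.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented pdf-namer options (sketch)."""
    parser = argparse.ArgumentParser(
        prog="pdf-namer",
        description="Rename PDF files based on AI-generated descriptions.",
    )
    parser.add_argument("path", help="PDF file or directory to process")
    # Defaults below are assumptions, not the tool's documented behavior.
    parser.add_argument("--workers", type=int, default=None,
                        help="number of worker processes")
    parser.add_argument("--language", default=None,
                        help="language used for the generated filename")
    parser.add_argument("--force", action="store_true",
                        help="process files even if already correctly named")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is repeatable.
args = build_parser().parse_args(
    ["/path/to/your/directory", "--workers", "5", "--force"]
)
print(args.path, args.workers, args.language, args.force)
```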

How it works

  1. The program checks if the input is a single file or a directory.
  2. For each PDF file:
    a. The program checks if the filename is already in the correct format (YYYY-MM-DD -- DOCUMENT_KIND - DOCUMENT_DESCRIPTION.pdf).
    b. If the filename is correct and the --force flag is not used, the file is skipped.
    c. If the filename is incorrect or the --force flag is used:
      • The text is extracted from the PDF.
      • The extracted text is sent to OpenAI's GPT model to generate a meaningful filename.
      • The file is renamed using the generated filename.
  3. If processing a directory, multiple files are processed concurrently using Python's multiprocessing module.
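
The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's actual code: the regular expression is one plausible reading of the filename format, all function names are hypothetical, and `rename_pdf` is a placeholder that stands in for text extraction, the GPT request, and the rename.

```python
import re
from multiprocessing import Pool
from pathlib import Path

# Assumed pattern for "YYYY-MM-DD -- DOCUMENT_KIND - DOCUMENT_DESCRIPTION.pdf".
NAME_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2} -- .+ - .+\.pdf")


def is_correctly_named(path):
    """Step 2a: check whether the filename already matches the format."""
    return NAME_PATTERN.fullmatch(Path(path).name) is not None


def needs_processing(path, force=False):
    """Steps 2b and 2c: skip correctly named files unless force is set."""
    return force or not is_correctly_named(path)


def collect_pdfs(target):
    """Step 1: a single file yields itself; a directory is searched recursively."""
    target = Path(target)
    if target.is_file():
        return [target]
    return sorted(p for p in target.rglob("*.pdf") if p.is_file())


def rename_pdf(path):
    """Placeholder worker: a real one would extract text, query the model, rename."""
    return path


def process_all(target, workers=None, force=False):
    """Step 3: distribute the remaining files across worker processes."""
    todo = [p for p in collect_pdfs(target) if needs_processing(p, force)]
    with Pool(processes=workers) as pool:
        return pool.map(rename_pdf, todo)
```

When running this as a script, call `process_all` from inside an `if __name__ == "__main__":` block so that worker processes can be started safely on every platform.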

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Author

Laurence Labusch (laurence.labusch@gmail.com)
