A Linux-specific tool for naming PDF files

PDF Namer

PDF Namer is a Python CLI application that processes PDF documents and renames them based on AI-generated descriptions. This tool is designed to help organize and manage collections of PDF documents by extracting meaningful information and creating standardized filenames.

Features

  • Process single PDF files or entire directories recursively
  • Generate meaningful filenames using OpenAI's GPT models
  • Multiprocessing support for faster batch processing
  • Customizable number of worker processes
  • Language selection for filename generation
  • Skip files that are already correctly named
  • Force processing of files even if they are already correctly named

Installation

  1. Clone this repository:

    git clone https://github.com/llabusch93/pdf-namer.git
    cd pdf-namer
    
  2. Create a virtual environment and activate it:

    python -m venv venv
    source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
    
  3. Install the required dependencies:

    pip install -r requirements.txt
    
  4. Set up your OpenAI API key. You have two options:

    a. Set it as an environment variable:

    export OPENAI_API_KEY=your_api_key_here
    

    b. Store it in a file:

    • Create a file named .openai in your home directory (~/.openai)
    • Add your API key to this file (just the key, without any quotes or additional text)

    The application will first check for the environment variable, and if not found, it will look for the .openai file in your home directory.
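The lookup order described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `load_api_key` and its optional `home` parameter are hypothetical and exist only to make the example testable.

```python
import os
from pathlib import Path
from typing import Optional


def load_api_key(home: Optional[Path] = None) -> Optional[str]:
    """Return the key from OPENAI_API_KEY, else from ~/.openai, else None.

    `home` is a hypothetical parameter for this sketch; it defaults to the
    current user's home directory.
    """
    # 1. The environment variable takes precedence.
    key = os.environ.get("OPENAI_API_KEY", "").strip()
    if key:
        return key

    # 2. Fall back to the .openai file in the home directory.
    key_file = (home or Path.home()) / ".openai"
    if key_file.is_file():
        # The file should hold only the key; strip the trailing newline.
        return key_file.read_text(encoding="utf-8").strip() or None

    return None
```

Stripping whitespace matters for the file variant, because most editors append a newline that would otherwise end up inside the key.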

Usage

To process a single PDF file:

pdf-namer /path/to/your/file.pdf

To process all PDF files in a directory recursively:

pdf-namer /path/to/your/directory

To specify the number of worker processes:

pdf-namer /path/to/your/directory --workers 5

To specify the language for filename generation:

pdf-namer /path/to/your/file.pdf --language english

To force processing of files even if they are already correctly named:

pdf-namer /path/to/your/file.pdf --force
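For readers who want to see how these options fit together, here is a hedged sketch of an equivalent command-line parser built with Python's standard `argparse` module. Only the option names come from the examples above; the function name `build_parser`, the help texts, and the default values (4 workers, english) are assumptions, and the real tool may be implemented differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the options shown in the usage examples."""
    parser = argparse.ArgumentParser(
        prog="pdf-namer",
        description="Rename PDF files based on AI-generated descriptions.",
    )
    parser.add_argument("path", help="PDF file or directory to process")
    # Default of 4 is an assumption made for this sketch.
    parser.add_argument("--workers", type=int, default=4,
                        help="number of worker processes")
    # Default of 'english' is an assumption made for this sketch.
    parser.add_argument("--language", default="english",
                        help="language for the generated filename")
    parser.add_argument("--force", action="store_true",
                        help="process files even if already correctly named")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["/path/to/your/directory", "--workers", "5", "--force"]
)
print(args.path, args.workers, args.language, args.force)
```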

How it works

  1. The program checks if the input is a single file or a directory.
  2. For each PDF file:
    a. The program checks if the filename is already in the correct format (YYYY-MM-DD -- DOCUMENT_KIND - DOCUMENT_DESCRIPTION.pdf).
    b. If the filename is correct and the --force flag is not used, the file is skipped.
    c. If the filename is incorrect or the --force flag is used:
    • The text is extracted from the PDF.
    • The extracted text is sent to OpenAI's GPT model to generate a meaningful filename.
    • The file is renamed using the generated filename.
  3. If processing a directory, multiple files are processed concurrently using Python's multiprocessing module.
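The skip check and the concurrent directory pass can be sketched roughly as below. This is an illustration under stated assumptions, not the project's implementation: the regular expression is one plausible reading of the documented filename format, all function names are hypothetical, and `process_file` is a placeholder standing in for the extract, generate, and rename steps.

```python
import re
from multiprocessing import Pool
from pathlib import Path
from typing import List

# Assumed reading of: YYYY-MM-DD -- DOCUMENT_KIND - DOCUMENT_DESCRIPTION.pdf
NAME_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2} -- .+ - .+\.pdf$")


def is_correctly_named(path: Path) -> bool:
    """Return True if the filename already follows the target format."""
    return NAME_PATTERN.match(path.name) is not None


def needs_processing(path: Path, force: bool = False) -> bool:
    """A file is processed when forced or when its name does not match."""
    return force or not is_correctly_named(path)


def process_file(path: Path) -> str:
    """Placeholder: the real tool extracts text, asks the model, and renames."""
    return path.name


def process_directory(directory: Path, workers: int = 4,
                      force: bool = False) -> List[str]:
    """Collect PDFs recursively and process them in a pool of workers."""
    pdfs = [p for p in sorted(directory.rglob("*.pdf"))
            if needs_processing(p, force)]
    with Pool(processes=workers) as pool:
        return pool.map(process_file, pdfs)
```

With this pattern, a hypothetical file named `2024-01-15 -- Invoice - Acme Hosting.pdf` would be skipped unless `force` is set, while `scan001.pdf` would always be processed. On platforms that start worker processes by spawning rather than forking, `process_directory` should be called from under an `if __name__ == "__main__":` guard.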

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Author

Laurence Labusch (laurence.labusch@gmail.com)

Download files

Source Distribution

pdf_namer-0.3.4.tar.gz (7.1 kB)

Uploaded Source

Built Distribution

pdf_namer-0.3.4-py3-none-any.whl (7.8 kB)

Uploaded Python 3

File details

Details for the file pdf_namer-0.3.4.tar.gz.

File metadata

  • Download URL: pdf_namer-0.3.4.tar.gz
  • Size: 7.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.6

File hashes

Hashes for pdf_namer-0.3.4.tar.gz
Algorithm     Hash digest
SHA256        794b892dc2b2eb28d548b41cace6c101ec2f2a396ee78a3e0fcc9d7d58978516
MD5           9a013969279e247c9f1c881658c2b015
BLAKE2b-256   6a979357c1bbf5487cddc3ced2ccb875b995b9ea26c510e0d5530e5e1c4fb2cb

File details

Details for the file pdf_namer-0.3.4-py3-none-any.whl.

File metadata

  • Download URL: pdf_namer-0.3.4-py3-none-any.whl
  • Size: 7.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.6

File hashes

Hashes for pdf_namer-0.3.4-py3-none-any.whl
Algorithm     Hash digest
SHA256        f86aab9987667e39274b8bcadde0eb213d7fdcf2e4650f2fa396f671e7640d9e
MD5           56af9a6e3872221d2fdefc8606e4ea89
BLAKE2b-256   856ef980c5890beb481e45ffff7547b51bca3b9855408fe7b2b17e2035ff4ec0
