A Linux-specific tool for naming PDF files

PDF Namer

PDF Namer is a Python CLI application that processes PDF documents and renames them based on AI-generated descriptions. It helps organize collections of PDF documents by extracting each document's text and deriving a standardized, descriptive filename from it.

Features

  • Process single PDF files or entire directories recursively
  • Generate meaningful filenames using OpenAI's GPT models
  • Multiprocessing support for faster batch processing
  • Customizable number of worker processes
  • Language selection for filename generation

Installation

  1. Clone this repository:

    git clone https://github.com/llabusch93/pdf-namer.git
    cd pdf-namer
    
  2. Create a virtual environment and activate it:

    python -m venv venv
    source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
    
  3. Install the required dependencies:

    pip install -r requirements.txt
    
  4. Set up your OpenAI API key as an environment variable:

    export OPENAI_API_KEY=your_api_key_here
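
As a minimal sketch of how a Python program can check that the key from step 4 is available before doing any work: the helper name `get_api_key` and the error message are illustrative assumptions, not part of PDF Namer's documented interface.

```python
import os


def get_api_key(env=None):
    """Return the OpenAI API key from the environment, or raise a clear error.

    `get_api_key` is a hypothetical helper used for illustration only.
    """
    # Allow a custom mapping to be passed in so the check is easy to test.
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running pdf-namer"
        )
    return key
```

Failing early with an explicit message is usually friendlier than letting the first API request fail partway through a batch.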
    

Usage

To process a single PDF file:

    pdf-namer /path/to/your/file.pdf

To process all PDF files in a directory recursively:

    pdf-namer /path/to/your/directory

To specify the number of worker processes:

    pdf-namer /path/to/your/directory --workers 5

To specify the language for filename generation:

    pdf-namer /path/to/your/file.pdf --language english
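
The commands above suggest a command-line interface with one positional path and two options. The following is a rough sketch of how such an interface could be declared with `argparse`; the function name `build_parser`, the help texts, and both default values are assumptions, since the README does not state them.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the flags shown in the usage examples."""
    parser = argparse.ArgumentParser(
        prog="pdf-namer",
        description="Rename PDF files based on AI-generated descriptions.",
    )
    parser.add_argument("path", help="a PDF file or a directory to scan recursively")
    # Default of 4 workers is an assumption, not taken from the project.
    parser.add_argument("--workers", type=int, default=4,
                        help="number of worker processes")
    # Default language is an assumption, not taken from the project.
    parser.add_argument("--language", default="english",
                        help="language used for the generated filename")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is repeatable.
args = build_parser().parse_args(["/path/to/your/directory", "--workers", "5"])
print(args.path, args.workers, args.language)
```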

How it works

  1. The program checks if the input is a single file or a directory.
  2. For each PDF file:
       a. The text is extracted from the PDF.
       b. The extracted text is sent to OpenAI's GPT model to generate a meaningful filename.
       c. The file is renamed using the generated filename.
  3. If processing a directory, multiple files are processed concurrently using Python's multiprocessing module.
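
The steps above can be sketched roughly as follows. This is not the project's actual code: every function name here is hypothetical, `extract_text` and `suggest_name` are local placeholders standing in for real PDF text extraction and the GPT request, and the default of 4 workers is an assumption.

```python
import re
from multiprocessing import Pool
from pathlib import Path


def find_pdfs(path):
    """Step 1: return [path] for a single file, or every PDF below a directory."""
    path = Path(path)
    if path.is_file():
        return [path]
    return sorted(p for p in path.rglob("*.pdf") if p.is_file())


def extract_text(pdf_path):
    # Placeholder for step 2a: a real tool would read the PDF with a PDF library.
    return pdf_path.stem


def suggest_name(text):
    # Placeholder for step 2b: stands in for the GPT request by slugifying the text.
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return (slug or "untitled") + ".pdf"


def rename_pdf(pdf_path):
    """Step 2c: rename the file unless the target name is unchanged or taken."""
    target = pdf_path.with_name(suggest_name(extract_text(pdf_path)))
    if target != pdf_path and not target.exists():
        pdf_path.rename(target)
        return target
    return pdf_path


def process(path, workers=4):
    """Step 3: handle one file directly, several files in a process pool."""
    pdfs = find_pdfs(path)
    if len(pdfs) <= 1:
        return [rename_pdf(p) for p in pdfs]
    with Pool(processes=workers) as pool:
        return pool.map(rename_pdf, pdfs)
```

On platforms that start worker processes with "spawn" rather than "fork", call `process` from under an `if __name__ == "__main__":` guard.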

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Author

Laurence Labusch (laurence.labusch@gmail.com)

Download files

Source Distribution

pdf_namer-0.1.0.tar.gz (6.2 kB)

Built Distribution

pdf_namer-0.1.0-py3-none-any.whl (6.8 kB)