A Linux-specific tool for naming PDF files
PDF Namer
PDF Namer is a Python CLI application that processes PDF documents and renames them based on AI-generated descriptions. This tool is designed to help organize and manage collections of PDF documents by extracting meaningful information and creating standardized filenames.
Features
- Process single PDF files or entire directories recursively
- Generate meaningful filenames using OpenAI's GPT models
- Multiprocessing support for faster batch processing
- Customizable number of worker processes
- Language selection for filename generation
Installation
- Clone this repository:
  git clone https://github.com/llabusch93/pdf-namer.git
  cd pdf-namer
- Create a virtual environment and activate it:
  python -m venv venv
  source venv/bin/activate # On Windows, use `venv\Scripts\activate`
- Install the required dependencies:
  pip install -r requirements.txt
- Set up your OpenAI API key as an environment variable:
  export OPENAI_API_KEY=your_api_key_here
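A program that depends on this variable would typically check for it at startup so that a missing key fails early with a clear message. The following is a minimal sketch of such a check, not the project's actual code; the helper name `load_api_key` is hypothetical.

```python
import os


def load_api_key(env=os.environ):
    """Return the OpenAI API key, or raise a clear error if it is unset.

    Hypothetical helper for illustration; pdf-namer may read the key differently.
    """
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running pdf-namer"
        )
    return key
```

Passing the environment as a parameter keeps the check easy to exercise without touching the real process environment.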
Usage
To process a single PDF file:
pdf-namer /path/to/your/file.pdf
To process all PDF files in a directory recursively:
pdf-namer /path/to/your/directory
To specify the number of worker processes:
pdf-namer /path/to/your/directory --workers 5
To specify the language for filename generation:
pdf-namer /path/to/your/file.pdf --language english
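The options above could be declared with Python's standard `argparse` module. This is only a sketch under assumptions: the README does not say which argument parser the project uses or what the defaults are, so `build_parser` and the `None` defaults are illustrative.

```python
import argparse


def build_parser():
    """Build a parser mirroring the options shown in the usage examples.

    Assumption: defaults are unknown, so both options default to None here.
    """
    parser = argparse.ArgumentParser(
        prog="pdf-namer",
        description="Rename PDF files based on AI-generated descriptions.",
    )
    parser.add_argument("path", help="PDF file or directory to process")
    parser.add_argument(
        "--workers", type=int, default=None,
        help="number of worker processes",
    )
    parser.add_argument(
        "--language", default=None,
        help="language for the generated filename",
    )
    return parser
```

For example, `build_parser().parse_args(["/path/to/your/directory", "--workers", "5"])` yields a namespace whose `workers` attribute is the integer 5.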
How it works
- The program checks if the input is a single file or a directory.
- For each PDF file:
  a. The text is extracted from the PDF.
  b. The extracted text is sent to OpenAI's GPT model to generate a meaningful filename.
  c. The file is renamed using the generated filename.
- If processing a directory, multiple files are processed concurrently using Python's multiprocessing module.
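The steps above can be sketched with the standard library alone. This is an illustration under assumptions rather than the project's implementation: all function names are hypothetical, and `describe_pdf` is a placeholder that stands in for the text extraction and GPT request, which are not shown.

```python
import re
from multiprocessing import Pool
from pathlib import Path


def collect_pdfs(target):
    """Return PDF paths for a single file or, recursively, for a directory."""
    target = Path(target)
    if target.is_file():
        return [target] if target.suffix.lower() == ".pdf" else []
    return sorted(
        p for p in target.rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def sanitize_filename(description, max_length=80):
    """Turn a free-text description into a filesystem-safe filename stem."""
    stem = re.sub(r"[^\w\s-]", "", description).strip()
    stem = re.sub(r"[\s_-]+", "_", stem)
    return stem[:max_length].strip("_") or "untitled"


def describe_pdf(pdf_path):
    """Placeholder for steps a and b (text extraction and the GPT request).

    Assumption: it simply reuses the current stem so the sketch runs offline.
    """
    return pdf_path.stem


def rename_pdf(pdf_path):
    """Step c: rename the file, skipping the rename if the target exists."""
    new_path = pdf_path.with_name(sanitize_filename(describe_pdf(pdf_path)) + ".pdf")
    if new_path != pdf_path and not new_path.exists():
        pdf_path.rename(new_path)
        return new_path
    return pdf_path


def process(target, workers=4):
    """Process one file directly, or many files concurrently in a pool."""
    pdfs = collect_pdfs(target)
    if len(pdfs) <= 1:
        return [rename_pdf(p) for p in pdfs]
    with Pool(processes=workers) as pool:
        return pool.map(rename_pdf, pdfs)
```

A process pool suits this workload because each file is handled independently, so no state needs to be shared between workers. On platforms that start workers with the spawn method, `process` should be called from under an `if __name__ == "__main__":` guard.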
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License. See the LICENSE file for details.
Author
Laurence Labusch (laurence.labusch@gmail.com)
Hashes for pdf_namer-0.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | c6987ad467556f3849b729cc3c6d32efb5d4446f9ef114138e82fd29d0d9ffe3
MD5 | 0db9e21e2b5551eb128cf968f82ebbf7
BLAKE2b-256 | 454bc2d38ac4ac667dac9fd58e17e6f28efafa0ba291ee5428a26bdf0344cbc9