A Linux-specific tool for naming PDF files
Project description
PDF Namer
PDF Namer is a Python CLI application that processes PDF documents and renames them based on AI-generated descriptions. It helps organize and manage collections of PDF documents by extracting meaningful information from each file and building a standardized filename from it.
Features
- Process single PDF files or entire directories recursively
- Generate meaningful filenames using OpenAI's GPT models
- Multiprocessing support for faster batch processing
- Customizable number of worker processes
- Language selection for filename generation
- Skip files that are already correctly named
- Force processing of files even if they are already correctly named
Installation
- Clone this repository:
  git clone https://github.com/llabusch93/pdf-namer.git
  cd pdf-namer
- Create a virtual environment and activate it:
  python -m venv venv
  source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
- Install the required dependencies:
  pip install -r requirements.txt
- Set up your OpenAI API key. You have two options:
  a. Set it as an environment variable:
     export OPENAI_API_KEY=your_api_key_here
  b. Store it in a file:
     - Create a file named .openai in your home directory (~/.openai).
     - Add your API key to this file (just the key, without any quotes or additional text).
  The application first checks for the environment variable and, if it is not set, looks for the .openai file in your home directory.
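The key lookup order described above can be sketched in Python. This is an illustration, not the package's code: `load_api_key` and its optional `home` parameter (added so the fallback directory can be overridden) are hypothetical, and stripping surrounding whitespace from the key is an assumption.

```python
from __future__ import annotations

import os
from pathlib import Path


def load_api_key(home: Path | None = None) -> str:
    """Hypothetical helper: prefer OPENAI_API_KEY, else read ~/.openai."""
    # 1. The environment variable takes precedence.
    key = os.environ.get("OPENAI_API_KEY", "").strip()
    if key:
        return key
    # 2. Fall back to a plain-text key file in the home directory.
    key_file = (home if home is not None else Path.home()) / ".openai"
    if key_file.is_file():
        key = key_file.read_text(encoding="utf-8").strip()
        if key:
            return key
    raise RuntimeError("No OpenAI API key found: set OPENAI_API_KEY or create ~/.openai")
```

Raising an error when neither source yields a key keeps the failure explicit instead of sending an empty key to the API.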
Usage
To process a single PDF file:
pdf-namer /path/to/your/file.pdf
To process all PDF files in a directory recursively:
pdf-namer /path/to/your/directory
To specify the number of worker processes:
pdf-namer /path/to/your/directory --workers 5
To specify the language for filename generation:
pdf-namer /path/to/your/file.pdf --language english
To force processing of files even if they are already correctly named:
pdf-namer /path/to/your/file.pdf --force
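The documented options map onto a small argument parser. The sketch below assumes `argparse`, although the project description does not say which parser the tool actually uses. It mirrors only the flags shown above, and its defaults (no fixed worker count, no default language) are placeholders rather than the tool's real defaults.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented pdf-namer options."""
    parser = argparse.ArgumentParser(
        prog="pdf-namer",
        description="Rename PDF files based on AI-generated descriptions.",
    )
    # A single PDF file, or a directory that is processed recursively.
    parser.add_argument("path", help="PDF file or directory to process")
    # Assumed default: None, i.e. let the process pool pick one worker per CPU.
    parser.add_argument("--workers", type=int, default=None,
                        help="number of worker processes")
    # Assumed default: None, i.e. no explicit language requested.
    parser.add_argument("--language", default=None,
                        help="language for filename generation, e.g. english")
    parser.add_argument("--force", action="store_true",
                        help="process files even if they are already correctly named")
    return parser
```

For example, `build_parser().parse_args(["/path/to/your/directory", "--workers", "5"])` yields `workers=5` and `force=False`.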
How it works
- The program checks if the input is a single file or a directory.
- For each PDF file:
  a. The program checks if the filename is already in the correct format (YYYY-MM-DD -- DOCUMENT_KIND - DOCUMENT_DESCRIPTION.pdf).
  b. If the filename is correct and the --force flag is not used, the file is skipped.
  c. If the filename is incorrect or the --force flag is used:
     - The text is extracted from the PDF.
     - The extracted text is sent to OpenAI's GPT model to generate a meaningful filename.
     - The file is renamed using the generated filename.
- If processing a directory, multiple files are processed concurrently using Python's multiprocessing module.
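The steps above can be sketched as a small pipeline. This is a minimal sketch under stated assumptions, not the package's source. The regex is only one guess at the documented filename format, and `process_one` is a hypothetical stand-in for the extract, ask GPT, and rename work. The explicit `fork` start method relies on the tool being Linux-specific.

```python
from __future__ import annotations

import multiprocessing
import re
from pathlib import Path

# Assumed regex for "YYYY-MM-DD -- DOCUMENT_KIND - DOCUMENT_DESCRIPTION.pdf";
# the tool's real check may be stricter (e.g. validating the date itself).
NAME_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2} -- .+ - .+\.pdf$")


def is_correctly_named(path: Path) -> bool:
    """Step a: does the filename already follow the target format?"""
    return NAME_PATTERN.match(path.name) is not None


def needs_processing(path: Path, force: bool) -> bool:
    """Steps b and c: skip correctly named files unless --force is set."""
    return force or not is_correctly_named(path)


def collect_pdfs(target: Path) -> list[Path]:
    """Accept a single file, or walk a directory recursively for *.pdf files."""
    if target.is_file():
        return [target]
    return sorted(p for p in target.rglob("*")
                  if p.is_file() and p.suffix.lower() == ".pdf")


def process_one(path: Path) -> str:
    """Hypothetical placeholder: the real tool would extract the text,
    ask the GPT model for a filename, and rename the file here."""
    return f"would process {path.name}"


def run(target: Path, workers: int | None = None, force: bool = False) -> list[str]:
    """Filter the PDFs, then spread the remaining ones over a process pool."""
    todo = [p for p in collect_pdfs(target) if needs_processing(p, force)]
    if not todo:
        return []
    # "fork" is available on Linux; processes=None means one worker per CPU.
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(processes=workers) as pool:
        # map() blocks until every file is done and preserves input order.
        return pool.map(process_one, todo)
```

Filtering before creating the pool means a directory of already correctly named files never starts any worker processes.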
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License. See the LICENSE file for details.
Author
Laurence Labusch (laurence.labusch@gmail.com)
Download files
- Source distribution: pdf_namer-0.3.4.tar.gz
- Built distribution: pdf_namer-0.3.4-py3-none-any.whl
File details
Details for the file pdf_namer-0.3.4.tar.gz.
File metadata
- Download URL: pdf_namer-0.3.4.tar.gz
- Upload date:
- Size: 7.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 794b892dc2b2eb28d548b41cace6c101ec2f2a396ee78a3e0fcc9d7d58978516
MD5 | 9a013969279e247c9f1c881658c2b015
BLAKE2b-256 | 6a979357c1bbf5487cddc3ced2ccb875b995b9ea26c510e0d5530e5e1c4fb2cb
File details
Details for the file pdf_namer-0.3.4-py3-none-any.whl.
File metadata
- Download URL: pdf_namer-0.3.4-py3-none-any.whl
- Upload date:
- Size: 7.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | f86aab9987667e39274b8bcadde0eb213d7fdcf2e4650f2fa396f671e7640d9e
MD5 | 56af9a6e3872221d2fdefc8606e4ea89
BLAKE2b-256 | 856ef980c5890beb481e45ffff7547b51bca3b9855408fe7b2b17e2035ff4ec0