A Linux-specific tool for naming PDF files
PDF Namer
PDF Namer is a Python CLI application that processes PDF documents and renames them based on AI-generated descriptions. This tool is designed to help organize and manage collections of PDF documents by extracting meaningful information and creating standardized filenames.
Features
- Process single PDF files or entire directories recursively
- Generate meaningful filenames using OpenAI's GPT models
- Multiprocessing support for faster batch processing
- Customizable number of worker processes
- Language selection for filename generation
Installation
- Clone this repository:
  git clone https://github.com/llabusch93/pdf-namer.git
  cd pdf-namer
- Create a virtual environment and activate it:
  python -m venv venv
  source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
- Install the required dependencies:
  pip install -r requirements.txt
- Set up your OpenAI API key as an environment variable:
  export OPENAI_API_KEY=your_api_key_here
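Since the tool depends on the `OPENAI_API_KEY` environment variable, it can help to check that the variable is visible to Python before starting a batch. The following is a minimal sketch of such a check; the helper name `require_api_key` is hypothetical and not part of PDF Namer itself.

```python
import os


def require_api_key(env=os.environ):
    """Return the API key from the environment, or fail with a clear message.

    Hypothetical helper for illustration; PDF Namer may perform this
    check differently (or rely on the OpenAI client to do it).
    """
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running pdf-namer."
        )
    return key
```

Calling `require_api_key()` in the same shell session where the key was exported should return the key; in a fresh session without the export it raises `RuntimeError`.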
Usage
To process a single PDF file:
pdf-namer /path/to/your/file.pdf
To process all PDF files in a directory recursively:
pdf-namer /path/to/your/directory
To specify the number of worker processes:
pdf-namer /path/to/your/directory --workers 5
To specify the language for filename generation:
pdf-namer /path/to/your/file.pdf --language english
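The command-line options shown above could be declared with Python's standard `argparse` module. This is only a sketch of how such an interface might look, not the project's actual source: the `--workers` and `--language` flags come from the usage examples, while the function name `build_parser` and both default values are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented pdf-namer options (sketch)."""
    parser = argparse.ArgumentParser(
        prog="pdf-namer",
        description="Rename PDF files based on AI-generated descriptions.",
    )
    # A single PDF file or a directory that is searched recursively.
    parser.add_argument("path", help="PDF file or directory to process")
    # Default of 4 is an assumption; the real default is not documented here.
    parser.add_argument("--workers", type=int, default=4,
                        help="number of worker processes")
    # Default of "english" is an assumption as well.
    parser.add_argument("--language", default="english",
                        help="language used for the generated filename")
    return parser
```

For example, `build_parser().parse_args(["/path/to/your/directory", "--workers", "5"])` yields a namespace with `workers` set to the integer 5 and `language` left at its default.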
How it works
- The program checks if the input is a single file or a directory.
- For each PDF file:
  a. The text is extracted from the PDF.
  b. The extracted text is sent to OpenAI's GPT model to generate a meaningful filename.
  c. The file is renamed using the generated filename.
- If processing a directory, multiple files are processed concurrently using Python's multiprocessing module.
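The steps above can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the project's real code: all function names are hypothetical, and `suggest_name` is a stand-in for the text extraction and the OpenAI request, which are omitted so the example stays self-contained.

```python
import re
from multiprocessing import Pool
from pathlib import Path


def collect_pdfs(target):
    """Return the PDFs to process: the file itself, or all PDFs below a directory."""
    target = Path(target)
    if target.is_file():
        return [target] if target.suffix.lower() == ".pdf" else []
    return sorted(p for p in target.rglob("*")
                  if p.is_file() and p.suffix.lower() == ".pdf")


def sanitize(name):
    """Reduce a suggested name to a filesystem-friendly stem."""
    stem = re.sub(r"[^\w\- ]+", "", name).strip()
    return re.sub(r"\s+", "_", stem) or "untitled"


def suggest_name(pdf_path):
    """Placeholder (assumption): the real tool extracts text and asks a GPT model."""
    return f"renamed {pdf_path.stem}"


def process_pdf(pdf_path):
    """Rename one PDF in place and return its resulting path."""
    new_path = pdf_path.with_name(sanitize(suggest_name(pdf_path)) + ".pdf")
    # Skip the rename rather than overwrite an existing file.
    if new_path != pdf_path and not new_path.exists():
        pdf_path.rename(new_path)
        return new_path
    return pdf_path


def run(target, workers=4):
    """Process a file directly, or fan a directory out over worker processes."""
    pdfs = collect_pdfs(target)
    if len(pdfs) <= 1:
        return [process_pdf(p) for p in pdfs]
    with Pool(processes=workers) as pool:
        return pool.map(process_pdf, pdfs)
```

When calling `run` from a script, wrap the call in an `if __name__ == "__main__":` guard, which `multiprocessing` requires on platforms that start workers with the spawn method.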
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License. See the LICENSE file for details.
Author
Laurence Labusch (laurence.labusch@gmail.com)
Hashes for pdf_namer-0.2.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 22d30cbcaba812e11594ae3aac4d605a673bd9717618bc1fa95f75f75b033864
MD5 | 09b7e39312b13e482217d767d5e6ee7d
BLAKE2b-256 | ff6c26c81bff1bb3a4ebfe88789366a615504d40a65e048bd88a4c69fc7a1504