Intelligent PDF renaming using LLMs
PDF Renamer
Intelligent PDF file renaming using LLMs. This tool analyzes PDF content and metadata to suggest descriptive, standardized filenames.
🚀 Works with OpenAI, Ollama, LM Studio, and any OpenAI-compatible API
Features
- DOI-based naming - Automatically extracts DOI and fetches authoritative metadata for academic papers
- Advanced PDF parsing using docling-parse for better structure-aware extraction
- OCR fallback for scanned PDFs with low text content
- Smart LLM prompting with multi-pass analysis for improved accuracy
- Hybrid approach - Uses DOI metadata when available, falls back to LLM analysis for other documents
- Suggests filenames in the format Author-Topic-Year.pdf
- Dry-run mode to preview changes before applying
- Enhanced interactive mode with options to accept, manually edit, retry, or skip each file
- Live progress display with concurrent processing for speed
- Configurable concurrency limits for API calls and PDF extraction
- Batch processing of multiple PDFs with optional output directory
Installation
Quick Start (No Installation Required)
# Run directly with uvx
uvx pdf-renamer --dry-run /path/to/pdfs
Install from PyPI
# Using pip
pip install pdf-file-renamer
# Using uv
uv pip install pdf-file-renamer
Install from Source
# Clone and install
git clone https://github.com/nostoslabs/pdf-renamer.git
cd pdf-renamer
uv sync
Configuration
Configure your LLM provider:
Option A: OpenAI (Cloud)
cp .env.example .env
# Edit .env and add your OPENAI_API_KEY
Option B: Ollama or other local models
# No API key needed for local models
# Either set LLM_BASE_URL in .env or use --url flag
echo "LLM_BASE_URL=http://patmos:11434/v1" > .env
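As a rough illustration of how the endpoint could be resolved from these settings, here is a minimal Python sketch. The helper names (`parse_dotenv`, `resolve_base_url`) and the precedence order (the `--url` flag first, then `LLM_BASE_URL`, then the default listed under Options) are assumptions made for this example, not the tool's confirmed behavior.

```python
import os

# Default taken from the --url option documented under Options.
DEFAULT_BASE_URL = "http://localhost:11434/v1"


def parse_dotenv(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments (hypothetical helper)."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_base_url(cli_url=None, env=None):
    """Assumed precedence: --url flag, then LLM_BASE_URL, then the default."""
    env = os.environ if env is None else env
    return cli_url or env.get("LLM_BASE_URL") or DEFAULT_BASE_URL


settings = parse_dotenv("# local model\nLLM_BASE_URL=http://patmos:11434/v1\n")
print(resolve_base_url(env=settings))
```

With an empty environment and no flag, the sketch falls back to the local Ollama address.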
Usage
Quick Start
# Preview renames (dry-run mode)
pdf-renamer --dry-run /path/to/pdf/directory
# Actually rename files
pdf-renamer --no-dry-run /path/to/pdf/directory
# Interactive mode - review each file
pdf-renamer --interactive --no-dry-run /path/to/pdf/directory
Using uvx (No Installation)
# Run directly without installing
uvx pdf-renamer --dry-run /path/to/pdfs
# Run from GitHub
uvx https://github.com/nostoslabs/pdf-renamer --dry-run /path/to/pdfs
Options
- --dry-run/--no-dry-run: Show suggestions without renaming (default: True)
- --interactive, -i: Interactive mode with rich options:
  - Accept - Use the suggested filename
  - Edit - Manually modify the filename
  - Retry - Ask the LLM to generate a new suggestion
  - Skip - Skip this file and move to the next
- --model: Model to use (default: llama3.2, works with any OpenAI-compatible API)
- --url: Custom base URL for OpenAI-compatible APIs (default: http://localhost:11434/v1)
- --pattern: Glob pattern for files (default: *.pdf)
- --output-dir, -o: Move renamed files to a different directory
- --max-concurrent-api: Maximum concurrent API calls (default: 3)
- --max-concurrent-pdf: Maximum concurrent PDF extractions (default: 10)
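To show what a concurrency limit such as --max-concurrent-api means in practice, here is a minimal sketch using `asyncio.Semaphore`. It is an illustration only: `run_limited` and `fake_api_call` are hypothetical names, and the tool's actual implementation may differ.

```python
import asyncio


async def run_limited(items, worker, limit):
    """Run worker(item) for every item, with at most `limit` running at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item):
        async with semaphore:
            return await worker(item)

    # gather preserves input order in its result list.
    return await asyncio.gather(*(guarded(item) for item in items))


async def demo(limit=3, count=10):
    active = 0
    peak = 0

    async def fake_api_call(name):
        # Stand-in for an LLM request; tracks how many calls overlap.
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)
        active -= 1
        return name.upper()

    names = [f"file{i}.pdf" for i in range(count)]
    results = await run_limited(names, fake_api_call, limit)
    return results, peak


results, peak = asyncio.run(demo())
print(len(results), "files processed, peak concurrency", peak)
```

A lower limit is gentler on rate-limited or single-GPU endpoints, while a higher one shortens large batches.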
Examples
Using OpenAI:
# Preview all PDFs in current directory
uvx pdf-renamer --dry-run .
# Rename PDFs in specific directory
uvx pdf-renamer --no-dry-run ~/Documents/Papers
# Use a different OpenAI model
uvx pdf-renamer --model gpt-4o --dry-run .
Using Ollama (or other local models):
# Using Ollama on patmos server with gemma model
uvx pdf-renamer --url http://patmos:11434/v1 --model gemma3:latest --dry-run .
# Using local Ollama with qwen model
uvx pdf-renamer --url http://localhost:11434/v1 --model qwen2.5 --dry-run .
# Set URL in environment and just use model flag
export LLM_BASE_URL=http://patmos:11434/v1
uvx pdf-renamer --model gemma3:latest --dry-run .
Other examples:
# Process only specific files
uvx pdf-renamer --pattern "*2020*.pdf" --dry-run .
# Interactive mode with local model
uvx pdf-renamer --url http://patmos:11434/v1 --model gemma3:latest --interactive --no-dry-run .
# Run directly from GitHub
uvx https://github.com/nostoslabs/pdf-renamer --no-dry-run ~/Documents/Papers
Interactive Mode
When using --interactive mode, you'll be presented with each file one at a time with detailed options:
================================================================================
Original: 2024-research-paper.pdf
Suggested: Smith-Machine-Learning-Applications-2024.pdf
Confidence: high
Reasoning: Clear author and topic identified from abstract
================================================================================
Options:
y / yes / Enter - Accept suggested name
e / edit - Manually edit the filename
r / retry - Ask LLM to generate a new suggestion
n / no / skip - Skip this file
What would you like to do? [y]:
This mode is perfect for:
- Reviewing suggestions before applying them
- Fine-tuning filenames that are close but not quite right
- Retrying when the LLM suggestion isn't good enough
- Building confidence in the tool before batch processing
You can use interactive mode with --dry-run to preview without actually renaming files, or with --no-dry-run to apply changes immediately after confirmation.
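The prompt's accepted answers can be modeled as a small lookup table. The sketch below is a hypothetical illustration of that mapping (the function name `parse_choice` and the action labels are assumptions), not the tool's own code.

```python
# Answers shown in the prompt above, mapped to an action label.
CHOICES = {
    "": "accept",  # pressing Enter takes the default [y]
    "y": "accept",
    "yes": "accept",
    "e": "edit",
    "edit": "edit",
    "r": "retry",
    "retry": "retry",
    "n": "skip",
    "no": "skip",
    "skip": "skip",
}


def parse_choice(raw):
    """Return the action for a typed answer, or None if it is not recognized."""
    return CHOICES.get(raw.strip().lower())


for answer in ["", "E", "retry", "maybe"]:
    print(repr(answer), "->", parse_choice(answer))
```

Returning None for unknown input lets the caller ask again rather than guess.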
How It Works
Intelligent Hybrid Approach
The tool uses a multi-strategy approach to generate accurate filenames:
- DOI Detection (for academic papers)
  - Searches PDF for DOI identifiers using pdf2doi
  - If found, queries authoritative metadata (title, authors, year, journal)
  - Generates filename with very high confidence from validated metadata
  - Saves API costs - no LLM call needed for papers with DOIs
- LLM Analysis (fallback for non-academic PDFs)
  - Extract: Uses docling-parse to read first 5 pages with structure-aware parsing, falls back to PyMuPDF if needed
  - OCR: Automatically applies OCR for scanned PDFs with minimal text
  - Metadata Enhancement: Extracts focused hints (years, emails, author sections) to supplement unreliable PDF metadata
  - Analyze: Sends full content excerpt to LLM with enhanced metadata and detailed extraction instructions
  - Multi-pass Review: Low-confidence results trigger a second analysis pass with focused prompts
  - Suggest: LLM returns filename in Author-Topic-Year format with confidence level and reasoning
- Interactive Review (optional): User can accept, edit, retry, or skip each suggestion
- Rename: Applies suggestions (if not in dry-run mode)
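The DOI-first, LLM-fallback decision can be sketched in a few lines. The tool itself relies on pdf2doi for detection; the example below swaps in a plain regular expression (a commonly used pattern for modern DOIs) purely for illustration, and the names `find_doi` and `choose_strategy` are hypothetical.

```python
import re

# Regex stand-in for pdf2doi: matches DOIs such as 10.1000/xyz123.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def find_doi(text):
    """Return the first DOI-like string in the text, or None."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # Trim trailing punctuation picked up from the surrounding sentence.
    # Caveat: this would also trim a DOI that legitimately ends in ")".
    return match.group(0).rstrip(".,;)")


def choose_strategy(text):
    """Use DOI metadata when a DOI is present, otherwise fall back to the LLM."""
    return "doi" if find_doi(text) else "llm"


print(choose_strategy("Available at https://doi.org/10.1000/xyz123."))
print(choose_strategy("Quarterly sales report, internal use only"))
```

In the real pipeline a detected DOI would then be resolved to title, author, and year metadata before the filename is built.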
Benefits of DOI Integration
- Accuracy: DOI metadata is canonical and verified
- Speed: Instant lookup vs. LLM processing time
- Cost: Free DOI lookups save on API costs for academic papers
- Reliability: Works even when PDF text extraction is poor
Cost Considerations
DOI-based Naming (Academic Papers):
- Completely free - No API costs
- No LLM needed - Direct metadata lookup
- Works for most academic papers with embedded DOIs
OpenAI (Fallback):
- Uses gpt-4o-mini by default (very cost-effective)
- Only called when DOI not found
- Processes first ~4500 characters per PDF
- Typical cost: ~$0.001-0.003 per PDF
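For a back-of-the-envelope batch estimate, the per-PDF range above can be multiplied by the number of files that actually reach the LLM. In this sketch the share of PDFs resolved by DOI (`doi_hit_rate`) is a hypothetical input you would have to estimate for your own collection, and `estimate_llm_cost` is an illustrative name.

```python
def estimate_llm_cost(total_pdfs, doi_hit_rate, low=0.001, high=0.003):
    """Return (pdfs_sent_to_llm, low_cost_usd, high_cost_usd).

    low and high are the per-PDF figures quoted above; doi_hit_rate is an
    assumed fraction of PDFs named from DOI metadata at no API cost.
    """
    llm_pdfs = round(total_pdfs * (1 - doi_hit_rate))
    return llm_pdfs, llm_pdfs * low, llm_pdfs * high


count, low_cost, high_cost = estimate_llm_cost(500, 0.6)
print(f"{count} PDFs via LLM, roughly ${low_cost:.2f} to ${high_cost:.2f}")
```

Under these assumptions, a 500-file library with 60% DOI coverage sends 200 files to the LLM.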
Ollama/Local Models:
- Completely free (runs on your hardware)
- Works with any Ollama model (llama3, qwen2.5, mistral, etc.)
- Also compatible with LM Studio, vLLM, and other OpenAI-compatible endpoints
Filename Format
The tool generates filenames in this format:
- Smith-Kalman-Filtering-Applications-2020.pdf
- Adamy-Electronic-Warfare-Modeling-Techniques.pdf
- Blair-Monopulse-Processing-Unresolved-Targets.pdf
Guidelines:
- First author's last name
- 3-6 word topic description (prioritizes clarity over brevity)
- Year (if identifiable)
- Hyphens between words
- Target ~80 characters (can be longer if needed for clarity)
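The guidelines above can be approximated with a small formatting helper. This is a sketch under stated assumptions, not the tool's implementation: `slug` and `build_filename` are hypothetical names, and the ASCII-only word matching is a simplification that would drop accented characters.

```python
import re


def slug(text):
    """Join the alphanumeric words of the text with hyphens (ASCII only)."""
    return "-".join(re.findall(r"[A-Za-z0-9]+", text))


def build_filename(author_last_name, topic, year=None):
    """Assemble Author-Topic-Year.pdf, omitting the year when it is unknown."""
    parts = [slug(author_last_name), slug(topic)]
    if year is not None:
        parts.append(str(year))
    return "-".join(part for part in parts if part) + ".pdf"


print(build_filename("Smith", "Kalman Filtering Applications", 2020))
print(build_filename("Adamy", "Electronic Warfare Modeling Techniques"))
```

The two calls reproduce the first two example filenames listed above, one with a year and one without.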