📚 CleanMyBooks
Clean and standardize messy book filenames using LLM + Google Books API.
CleanMyBooks takes chaotic ebook filenames like python.crash.course.2ndEd_FINAL_v2.pdf and renames them to a clean, consistent format:
Eric Matthes - Python Crash Course (2019).pdf
Features
- 🧠 LLM-powered parsing via OpenRouter (GPT-4o-mini by default)
- 🔍 Google Books verification for authoritative metadata
- 📊 Confidence scoring — falls back to LLM if Google match is weak
- ⚡ Parallel processing with configurable thread workers
- 💾 JSON caching — avoids re-processing the same file twice
- 🛡️ Safe renaming — dry-run mode, collision-safe, no overwrites
- 📝 Supports: .pdf, .epub, .mobi, .azw, .azw3
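The "safe renaming" feature can be pictured with a short sketch. This is not the project's actual code: the helper names `unique_target` and `safe_rename`, and the " (1)", " (2)" suffix scheme for collisions, are assumptions made for illustration.

```python
from pathlib import Path


def unique_target(target: Path) -> Path:
    """Return target unchanged if it is free, else append ' (1)', ' (2)', ..."""
    if not target.exists():
        return target
    counter = 1
    while True:
        candidate = target.with_name(f"{target.stem} ({counter}){target.suffix}")
        if not candidate.exists():
            return candidate
        counter += 1


def safe_rename(source: Path, new_name: str, dry_run: bool = False) -> Path:
    """Rename source inside its own folder without overwriting another file.

    Hypothetical helper: returns the path the file has (or would have) afterwards.
    """
    target = source.with_name(new_name)
    if target == source:
        # The file already has the clean name; nothing to do.
        return source
    target = unique_target(target)
    if not dry_run:
        source.rename(target)
    return target
```

With `dry_run=True` the function only reports the name it would use, which mirrors the preview behaviour described for `--dry-run`.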
Installation
From source
git clone https://github.com/yourusername/cleanmybooks.git
cd cleanmybooks
pip install -e .
From PyPI (once published)
pip install cleanmybooks
Setup
- Copy the example env file:
cp .env.example .env
- Add your OpenRouter API key to .env:
OPENROUTER_API_KEY=sk-or-v1-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Get a free API key at openrouter.ai/keys.
Usage
Basic — rename all books in a folder
cleanmybooks /path/to/my/ebooks
Dry run — preview changes without renaming
cleanmybooks /path/to/my/ebooks --dry-run
Verbose output with more workers
cleanmybooks /path/to/my/ebooks --workers 8 --verbose
Adjust confidence threshold
cleanmybooks /path/to/my/ebooks --confidence-threshold 0.75
A lower threshold accepts weaker Google Books matches. A higher threshold requires a stronger Google Books match and falls back to the LLM result more often.
Clear the cache
cleanmybooks --clear-cache
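The JSON cache and `--clear-cache` could work roughly as follows. This is a sketch under assumptions: the README does not describe the cache layout, so the mapping of original filename to cleaned filename and the function names `load_cache`, `save_cache`, and `clear_cache` are hypothetical.

```python
import json
from pathlib import Path


def load_cache(cache_file: Path) -> dict:
    """Load the cache, treating a missing or corrupt file as an empty cache."""
    if not cache_file.exists():
        return {}
    try:
        return json.loads(cache_file.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return {}


def save_cache(cache_file: Path, cache: dict) -> None:
    """Write the cache back to disk as readable JSON."""
    cache_file.write_text(
        json.dumps(cache, indent=2, ensure_ascii=False), encoding="utf-8"
    )


def clear_cache(cache_file: Path) -> None:
    """Delete the cache file if it exists (roughly what --clear-cache might do)."""
    cache_file.unlink(missing_ok=True)
```

A run would look up each filename in the loaded dictionary first and only call the LLM and Google Books for names that are not present.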
CLI Options
| Option | Default | Description |
|---|---|---|
| `folder` | (required) | Directory containing book files |
| `--dry-run` | `False` | Preview changes without renaming |
| `--workers N` | `4` | Number of parallel threads |
| `--confidence-threshold FLOAT` | `0.6` | Min similarity score to use Google result |
| `--verbose` | `False` | Enable debug logging |
| `--cache-file PATH` | `~/.cleanmybooks_cache.json` | Custom cache file location |
| `--clear-cache` | — | Clear cache and exit |
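For readers who want to see how these options fit together, here is a minimal `argparse` sketch that mirrors the table. It is an illustration, not the package's real entry point; `build_parser` is a hypothetical name, and making `folder` optional at the parser level is an assumption made so that `cleanmybooks --clear-cache` can run without a folder.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the options and defaults listed in the table."""
    parser = argparse.ArgumentParser(
        prog="cleanmybooks",
        description="Clean and standardize messy book filenames.",
    )
    # Optional here (assumption) so that --clear-cache works on its own.
    parser.add_argument("folder", nargs="?", type=Path,
                        help="Directory containing book files")
    parser.add_argument("--dry-run", action="store_true",
                        help="Preview changes without renaming")
    parser.add_argument("--workers", type=int, default=4, metavar="N",
                        help="Number of parallel threads")
    parser.add_argument("--confidence-threshold", type=float, default=0.6,
                        metavar="FLOAT",
                        help="Min similarity score to use Google result")
    parser.add_argument("--verbose", action="store_true",
                        help="Enable debug logging")
    parser.add_argument("--cache-file", type=Path, metavar="PATH",
                        default=Path.home() / ".cleanmybooks_cache.json",
                        help="Custom cache file location")
    parser.add_argument("--clear-cache", action="store_true",
                        help="Clear cache and exit")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```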
Example Input/Output
| Original Filename | Cleaned Filename |
|---|---|
| `python.crash.course.2ndEd.pdf` | `Eric Matthes - Python Crash Course (2019).pdf` |
| `DUNE_frank_herbert_scanned.epub` | `Frank Herbert - Dune (1965).epub` |
| `clean_code_uncle_bob.pdf` | `Robert C. Martin - Clean Code (2008).pdf` |
| `atomic_habits_james_clear_2018.epub` | `James Clear - Atomic Habits (2018).epub` |
| `unknown_book_v3_FINAL.pdf` | `Unknown Author - Unknown Title.pdf` (graceful fallback) |
Output Format
Author - Title (Year).ext
Multi-author books are collapsed to:
First Author et al. - Title (Year).ext
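The naming rule above can be expressed in a few lines. The function below is a sketch, not the library's API: the name `format_filename` and its signature are hypothetical, and omitting the year when it is unknown is inferred from the fallback row in the example table.

```python
from typing import Optional, Sequence


def format_filename(
    authors: Sequence[str], title: Optional[str], year: Optional[int], ext: str
) -> str:
    """Build 'Author - Title (Year).ext', collapsing multiple authors to 'et al.'."""
    if not authors:
        author = "Unknown Author"
    elif len(authors) == 1:
        author = authors[0]
    else:
        author = f"{authors[0]} et al."

    title = title or "Unknown Title"
    # Leave the year out entirely when it is not known.
    year_part = f" ({year})" if year else ""
    return f"{author} - {title}{year_part}{ext}"


print(format_filename(["Eric Matthes"], "Python Crash Course", 2019, ".pdf"))
print(format_filename(["First Author", "Second Author"], "Title", 2020, ".epub"))
print(format_filename([], None, None, ".pdf"))
```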
How It Works
filename.pdf
│
▼
[LLM via OpenRouter]
│ Parse: title, authors, year
▼
[Google Books API]
│ Verify and enrich metadata
▼
[Confidence Score]
│ Token-overlap similarity (Jaccard)
│ ≥ threshold → use Google result
│ < threshold → fall back to LLM result
▼
[Rename]
│ Sanitize characters
│ Resolve collisions
│ Author - Title (Year).ext
▼
[Cache] → skip on next run
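The confidence step in the pipeline above can be sketched as a Jaccard similarity over word tokens, followed by the threshold decision. The details are assumptions: the README does not say which fields are compared or how text is tokenized, so this version compares titles only, and the names `tokens`, `jaccard`, and `choose_result` are hypothetical.

```python
import re
from typing import Optional


def tokens(text: str) -> set:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity: |A ∩ B| / |A ∪ B|, or 0.0 if both are empty."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def choose_result(llm: dict, google: Optional[dict], threshold: float = 0.6) -> dict:
    """Use the Google Books record only if it is similar enough to the LLM parse."""
    if google is None:
        return llm
    score = jaccard(llm["title"], google["title"])
    return google if score >= threshold else llm


llm_guess = {"title": "python crash course", "authors": [], "year": None}
google_hit = {"title": "Python Crash Course", "authors": ["Eric Matthes"], "year": 2019}
print(choose_result(llm_guess, google_hit))
```

Because the score is a ratio of shared tokens to all tokens, raising the threshold makes the Google record harder to accept, so more files keep the LLM's own parse.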
Environment Variables
| Variable | Description |
|---|---|
| `OPENROUTER_API_KEY` | Required. Your OpenRouter API key |
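A script can fail early with a clear message when the key is missing. The check below is a sketch: `get_api_key` is a hypothetical helper, and it assumes the contents of `.env` have already been loaded into the process environment by whatever mechanism the tool uses.

```python
import os
from typing import Mapping, Optional


def get_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return OPENROUTER_API_KEY, or raise a helpful error if it is not set."""
    env = os.environ if env is None else env
    key = env.get("OPENROUTER_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENROUTER_API_KEY is not set; add it to your .env file or environment."
        )
    return key
```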
License
MIT
File details
Details for the file cleanmybooks-0.1.0.tar.gz.
File metadata
- Download URL: cleanmybooks-0.1.0.tar.gz
- Upload date:
- Size: 13.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 444269c660d73818a83825adf5601c1efa6b0b357ded9a0de04943efda0ba4c2 |
| MD5 | f7de6fea83d3db69efad31992db7a46e |
| BLAKE2b-256 | eadc4482aa73de6334a1a3fffed57ed541f90dcef8ffda585d9c8ad59a1a9038 |
File details
Details for the file cleanmybooks-0.1.0-py3-none-any.whl.
File metadata
- Download URL: cleanmybooks-0.1.0-py3-none-any.whl
- Upload date:
- Size: 14.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 76c7544347c842db7ebd6a1fecb0a4cd392bf91a3994d6afbcff748672de15a4 |
| MD5 | 170219d98d38636999e87cccc6075017 |
| BLAKE2b-256 | c2df1a4bd556421a08b5ff9018a85778254555cc447ed5033bf142248366f2f3 |