Automated pipeline for ebook metadata enrichment, conversion, and cloud upload.
📚 Epub Pipeline
The ultimate automated tool for curating your ebook library.
This pipeline extracts metadata from your EPUB files, attempts to find better metadata online (Google Books, OpenLibrary), standardizes filenames, converts to KEPUB (for Kobo e-readers), and uploads the results to Google Drive or a local folder.
Key Features
- Smart Metadata Enrichment:
  - Waterfall Search Strategy: Prioritizes ISBN lookups (high precision) but falls back to a "relaxed" text search (Title/Author/Publisher) if no ISBN is found.
  - Confidence Scoring: Calculates a reliability score (0-100%) for each match based on title similarity, author overlap, and result uniqueness.
- Safety First:
  - Interactive Review: By default, low-confidence matches require your confirmation.
  - Granular Control (-i): Optionally review every single field change (Title, Author, Description, etc.) before applying.
  - Non-Destructive: Files are processed in a temporary workspace; originals are never modified in place unless the output directory is the same as the input directory.
- Media Management:
  - High-Res Covers: Automatically downloads and optimizes covers for e-ink screens (resized to at most 1600x2400, grayscale-optimized JPEG).
- Kobo Optimization:
  - Native integration with kepubify to convert EPUBs to KEPUB for faster page turns and better formatting on Kobo devices.
- Cloud Sync:
  - Direct upload to Google Drive (ideal for use with KoboCloud).
  - Resumable uploads for large files.
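The waterfall search and confidence score described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual implementation: the `Candidate` type, the function names, and the 60/30/10 weighting are all hypothetical, and the fallback here also triggers when an ISBN lookup returns nothing. Only the three scoring inputs and the 80% auto-accept threshold come from this README.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Optional


@dataclass
class Candidate:
    """A search result from an online source (hypothetical shape)."""
    title: str
    authors: list[str]


def confidence(local_title: str, local_authors: list[str],
               candidate: Candidate, result_count: int) -> float:
    """Return a 0-100 score; the weights are illustrative assumptions."""
    # Title similarity: ratio of matching characters, case-insensitive.
    title_sim = SequenceMatcher(
        None, local_title.lower(), candidate.title.lower()).ratio()
    # Author overlap: Jaccard index of the two author sets.
    a = {name.lower() for name in local_authors}
    b = {name.lower() for name in candidate.authors}
    author_overlap = len(a & b) / len(a | b) if a | b else 0.0
    # Uniqueness: a single result is less ambiguous than many.
    uniqueness = 1.0 / result_count if result_count > 0 else 0.0
    return 100 * (0.6 * title_sim + 0.3 * author_overlap + 0.1 * uniqueness)


SearchFn = Callable[[str], list[Candidate]]


def waterfall_search(isbn: Optional[str], text_query: str,
                     by_isbn: SearchFn, by_text: SearchFn) -> list[Candidate]:
    """Try the precise ISBN lookup first, then fall back to text search."""
    if isbn:
        results = by_isbn(isbn)
        if results:
            return results
    return by_text(text_query)


def auto_accept(score: float, threshold: float = 80.0) -> bool:
    """Mirror the --auto rule: accept only when confidence exceeds 80%."""
    return score > threshold
```

The search functions are passed in as callables so the same fallback logic can be exercised with stubs instead of live Google Books or OpenLibrary requests.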
Installation
1. Prerequisites
- Python 3.12+
- Kepubify: Required for Kobo conversion.
  - Download the binary from pgaskin/kepubify.
  - Place it in your system PATH (recommended).
  - Rename it to kepubify (Windows: kepubify.exe) and ensure it is executable.
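To confirm the binary is reachable before running the pipeline, a quick check along these lines may help. It is a standalone sketch using only the standard library; the function name `kepubify_status` is hypothetical and is not part of epubpipe.

```python
import shutil


def kepubify_status() -> str:
    """Report whether a kepubify executable can be found on PATH."""
    # shutil.which also tries the PATHEXT extensions on Windows,
    # so "kepubify" matches kepubify.exe there.
    path = shutil.which("kepubify")
    if path is None:
        return "kepubify not found on PATH; KEPUB conversion will be unavailable"
    return f"kepubify found at {path}"


print(kepubify_status())
```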
2. Install Package
Clone the repository and install it in editable mode:
git clone https://github.com/your-repo/epub-pipeline.git
cd epub-pipeline
pip install -e .
This installs the epubpipe command into your active Python environment.
3. Configuration (.env)
Copy the template and edit your settings:
cp .env.example .env
Note: The tool looks for .env in the directory where you run the command.
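Because the lookup is relative to the current working directory, running the command from a different folder means a different .env (or none) is picked up. The sketch below illustrates that behavior with a minimal KEY=VALUE parser; it is an assumption about how such a lookup could work, not the tool's actual loader, and `load_env_file` is a hypothetical name.

```python
from pathlib import Path


def load_env_file(directory: Path) -> dict[str, str]:
    """Parse KEY=VALUE lines from directory/.env; return {} if it is missing."""
    env_path = directory / ".env"
    if not env_path.is_file():
        return {}
    settings: dict[str, str] = {}
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        settings[key.strip()] = value.strip().strip('"\'')
    return settings


# The lookup is relative to where the command is run, not to the package.
settings = load_env_file(Path.cwd())
```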
4. Google Drive (Optional)
To enable Cloud Upload:
- Create a project in Google Cloud Console.
- Enable the Google Drive API.
- Create OAuth 2.0 Client IDs (Desktop App).
- Download the JSON, rename it to credentials.json, and place it in your working directory.
- Set GOOGLE_CREDENTIALS_PATH=credentials.json in .env.
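A common mistake at this step is downloading a client file of the wrong type. The following sketch checks the file before the first upload attempt. It relies on the fact that Desktop App OAuth client files from Google Cloud Console keep their settings under a top-level "installed" key; the function name `check_credentials` is hypothetical and not part of epubpipe.

```python
import json
from pathlib import Path


def check_credentials(path: Path) -> str:
    """Return a short diagnosis of an OAuth client file."""
    if not path.is_file():
        return "missing"
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return "not valid JSON"
    # Desktop-app OAuth clients downloaded from Google Cloud Console
    # keep their settings under a top-level "installed" key.
    if not isinstance(data, dict) or "installed" not in data:
        return "not a Desktop App client file"
    return "ok"


print(check_credentials(Path("credentials.json")))
```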
Usage
Basic Usage
Process a single file or an entire directory using the CLI command:
# Process all .epub files in the data/ folder
epubpipe data/
# Process a specific file
epubpipe data/dune.epub
CLI Options
| Flag | Description |
|---|---|
| -i, --interactive | Granular Review Mode: Ask for confirmation for each field (Title, Date, Cover...) that differs. |
| --auto | Batch Mode: Automatically accept changes if confidence > 80%, skip others. |
| --no-kepub | Disable KEPUB conversion for this run. |
| --no-rename | Keep original filenames. |
| --no-upload | Process locally only (files remain in output/ or temp). |
| --isbn <ISBN> | Force a specific ISBN for the search (works only with a single file). |
| -v, --verbose | Enable debug logs. |
| -s <source> | Limit search to google or openlibrary. |
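For readers who want to see how these flags fit together, here is a hypothetical `argparse` definition that mirrors the table. It is only a sketch of the interface as documented here; the real epubpipe parser may be built differently (for example with another CLI library), and the attribute names such as `source` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented epubpipe flags."""
    parser = argparse.ArgumentParser(prog="epubpipe")
    parser.add_argument("path", help="EPUB file or directory to process")
    parser.add_argument("-i", "--interactive", action="store_true",
                        help="review each differing field")
    parser.add_argument("--auto", action="store_true",
                        help="accept changes when confidence > 80%%")
    parser.add_argument("--no-kepub", action="store_true")
    parser.add_argument("--no-rename", action="store_true")
    parser.add_argument("--no-upload", action="store_true")
    parser.add_argument("--isbn", metavar="ISBN")
    parser.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("-s", dest="source", choices=["google", "openlibrary"])
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["data/dune.epub", "--isbn", "9780441172719", "--no-upload"])
print(args.path, args.isbn, args.no_upload, args.no_kepub, args.source)
```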
Examples
1. Interactive Review (Recommended for new books)
epubpipe data/new_books/ -i
2. Force specific ISBN
Useful if the automatic search finds the wrong edition.
epubpipe data/unknown_book.epub --isbn 9780441172719
3. Offline / Local Only
Clean metadata and rename files without uploading; --no-kepub also skips KEPUB conversion.
epubpipe data/ --no-upload --no-kepub
Debugging Tools
The tools/ directory contains standalone scripts to diagnose issues. You can run them as modules from the project root:
- Inspector: See exactly what metadata exists inside a file.
python -m tools.inspect data/book.epub --full
- Search Tester: Test the search logic and see confidence scores without changing files.
python -m tools.search data/book.epub
- Dry Run: Simulate the whole process (including renaming/conversion logic) without writing to disk.
python -m tools.dry_run data/
- Manual Upload: Upload a file or folder to Google Drive immediately.
python -m tools.upload data/book.epub
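If the tools/ scripts are not at hand, the embedded metadata can also be read with the standard library alone, since an EPUB is a ZIP archive whose META-INF/container.xml points to an OPF package file. The sketch below is not the project's Inspector; `read_epub_metadata` is a hypothetical helper that only reads a handful of Dublin Core fields.

```python
import xml.etree.ElementTree as ET
import zipfile

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_epub_metadata(epub_path) -> dict[str, list[str]]:
    """Return selected Dublin Core fields from an EPUB's OPF package file."""
    with zipfile.ZipFile(epub_path) as zf:
        # container.xml names the location of the OPF file inside the archive.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", NS)
        if rootfile is None:
            raise ValueError("container.xml has no rootfile entry")
        opf = ET.fromstring(zf.read(rootfile.attrib["full-path"]))
    meta: dict[str, list[str]] = {}
    for field in ("title", "creator", "identifier", "publisher", "date"):
        values = [el.text.strip()
                  for el in opf.iterfind(f".//dc:{field}", NS) if el.text]
        if values:
            meta[field] = values
    return meta
```

Calling `read_epub_metadata("data/book.epub")` returns a dictionary mapping each field that is present to its list of values.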
Development
Setup
# Install in editable mode with dev dependencies
pip install -e ".[dev]"
# Install pre-commit hooks
pre-commit install
Running Tests
pytest
Manual Linting
ruff check .
mypy .
Credits
- kepubify by pgaskin.
- Google Books API & OpenLibrary API.