Automated pipeline for ebook metadata enrichment, conversion, and cloud upload.

📚 Epub Pipeline

The ultimate automated tool for curating your Ebook library.

This pipeline extracts metadata from your EPUB files, attempts to find better metadata online (Google Books, OpenLibrary), standardizes filenames, converts the files to KEPUB (for Kobo e-readers), and uploads the results to Google Drive or saves them to a local folder.

🚀 Key Features

  • Smart Metadata Enrichment:
    • Waterfall Search Strategy: Prioritizes ISBN lookups (high precision) but falls back to a "relaxed" text search (Title/Author/Publisher) if no ISBN is found.
    • Confidence Scoring: Calculates a reliability score (0-100%) for each match based on title similarity, author overlap, and result uniqueness.
  • Safety First:
    • Interactive Review: By default, low-confidence matches require your confirmation.
    • Granular Control (-i): Optionally review every single field change (Title, Author, Description, etc.) before applying.
    • Non-Destructive: Processes files in a temporary workspace; original files are left untouched unless you write the output to the same directory.
  • Media Management:
    • High-Res Covers: Automatically downloads covers and optimizes them for e-ink screens (resized to at most 1600x2400, saved as grayscale-optimized JPEG).
  • Kobo Optimization:
    • Native integration with kepubify to convert EPUBs to KEPUB for faster page turns and better formatting on Kobo devices.
  • Cloud Sync:
    • Direct upload to Google Drive (ideal for use with KoboCloud).
    • Resumable uploads for large files.
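
The waterfall search and confidence scoring described under Smart Metadata Enrichment can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the function names, the 50/30/20 weights, the dictionary layout, and the two lookup callables are all hypothetical. The sketch also falls back to the text search when the ISBN lookup returns nothing, which the description above does not confirm.

```python
from difflib import SequenceMatcher
from typing import Callable, Optional

# Hypothetical type: a lookup takes local metadata and returns candidate records.
Lookup = Callable[[dict], list[dict]]


def title_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between two titles (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def author_overlap(local: list[str], remote: list[str]) -> float:
    """Jaccard overlap between two author lists (0.0 to 1.0)."""
    a = {name.lower().strip() for name in local}
    b = {name.lower().strip() for name in remote}
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def confidence_score(local: dict, candidate: dict, result_count: int) -> float:
    """Combine the three signals into a 0-100 score (weights are assumed)."""
    # A single result counts as fully unique; more results lower the signal.
    uniqueness = 1.0 / result_count if result_count > 0 else 0.0
    score = (
        0.5 * title_similarity(local["title"], candidate["title"])
        + 0.3 * author_overlap(local["authors"], candidate["authors"])
        + 0.2 * uniqueness
    )
    return round(100 * score, 1)


def waterfall_search(
    local: dict, by_isbn: Lookup, by_text: Lookup
) -> Optional[tuple[dict, float]]:
    """Try the ISBN lookup first, then fall back to the relaxed text search."""
    results = by_isbn(local) if local.get("isbn") else []
    if not results:
        # Assumption: an empty ISBN result also triggers the fallback.
        results = by_text(local)
    if not results:
        return None
    scored = [(c, confidence_score(local, c, len(results))) for c in results]
    return max(scored, key=lambda pair: pair[1])
```

Passing the lookups in as callables keeps the sketch independent of any particular API client, so the same flow can be exercised with stub functions in tests.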

🛠️ Installation

1. Prerequisites

  • Python 3.12+
  • Kepubify: Required for Kobo conversion.
    1. Download the binary from pgaskin/kepubify.
    2. Place it in your system PATH (recommended).
    3. Rename it to kepubify (Windows: kepubify.exe) and ensure it is executable.
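
To confirm the prerequisite steps worked, a quick check like the one below can tell you whether a binary named kepubify is reachable on PATH. It is a minimal sketch using only the standard library; the helper name `find_kepubify` is hypothetical and is not part of the package.

```python
import shutil
from typing import Optional


def find_kepubify(name: str = "kepubify") -> Optional[str]:
    """Return the full path of the kepubify binary, or None if not on PATH."""
    # shutil.which only returns files that are executable and, on Windows,
    # also tries the PATHEXT extensions such as .exe.
    return shutil.which(name)


if __name__ == "__main__":
    path = find_kepubify()
    print(f"kepubify found at {path}" if path else "kepubify not found on PATH")
```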

2. Install Package

Clone the repository and install it in editable mode:

git clone https://github.com/your-repo/epub-pipeline.git
cd epub-pipeline
pip install -e .

This installs the epubpipe command in your active Python environment.

3. Configuration (.env)

Copy the template and edit your settings:

cp .env.example .env

Note: The tool looks for .env in the directory where you run the command.
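
Because the lookup is relative to the current working directory, running the command from another folder means a different .env (or none) is read. The sketch below illustrates that behavior with a tiny standard-library parser. It is an assumption-laden illustration: the function name `read_env_file` is hypothetical, and the tool itself may well use a dedicated library with richer parsing rules.

```python
from pathlib import Path
from typing import Optional


def read_env_file(directory: Optional[Path] = None) -> dict[str, str]:
    """Parse KEY=VALUE lines from .env in the given (default: current) directory."""
    env_path = (directory or Path.cwd()) / ".env"
    settings: dict[str, str] = {}
    if not env_path.is_file():
        # No .env in this directory: return no settings rather than failing.
        return settings
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip("\"'")
    return settings
```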

4. Google Drive (Optional)

To enable Cloud Upload:

  1. Create a project in Google Cloud Console.
  2. Enable the Google Drive API.
  3. Create OAuth 2.0 Client IDs (Desktop App).
  4. Download the JSON, rename it to credentials.json, and place it in your working directory.
  5. Set GOOGLE_CREDENTIALS_PATH=credentials.json in .env.

🎮 Usage

Basic Usage

Process a single file or an entire directory using the CLI command:

# Process all .epub files in the data/ folder
epubpipe data/

# Process a specific file
epubpipe data/dune.epub
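
How a single path argument can stand for either one file or a whole folder is sketched below. This is only an illustration: `collect_epubs` is a hypothetical helper, and it assumes that a directory is scanned at the top level only, which the real command may or may not do.

```python
from pathlib import Path


def collect_epubs(target: Path) -> list[Path]:
    """Return the EPUB files to process for a file or directory argument."""
    if target.is_file():
        return [target] if target.suffix.lower() == ".epub" else []
    if target.is_dir():
        # Assumption: top-level files only; the real tool may recurse.
        return sorted(
            p for p in target.iterdir()
            if p.is_file() and p.suffix.lower() == ".epub"
        )
    # Nonexistent paths yield nothing to process.
    return []
```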

CLI Options

Flag                 Description
-i, --interactive    Granular Review Mode: Ask for confirmation for each field (Title, Date, Cover...) that differs.
--auto               Batch Mode: Automatically accept changes if confidence > 80%, skip others.
--no-kepub           Disable KEPUB conversion for this run.
--no-rename          Keep original filenames.
--no-upload          Process locally only (files remain in output/ or temp).
--isbn <ISBN>        Force a specific ISBN for the search (works only with a single file).
-v, --verbose        Enable debug logs.
-s <source>          Limit search to google or openlibrary.
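
For readers who want to see how these flags fit together, here is a rough argparse sketch that mirrors the table, plus the --auto rule expressed as a function. It assumes nothing about how epubpipe is really implemented: `build_parser`, `auto_decision`, and the `source` destination name are hypothetical, and only the flag spellings and the 80% threshold come from the table above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(prog="epubpipe")
    parser.add_argument("path", help="an .epub file or a directory of .epub files")
    parser.add_argument("-i", "--interactive", action="store_true")
    parser.add_argument("--auto", action="store_true")
    parser.add_argument("--no-kepub", action="store_true")
    parser.add_argument("--no-rename", action="store_true")
    parser.add_argument("--no-upload", action="store_true")
    parser.add_argument("--isbn", default=None)
    parser.add_argument("-v", "--verbose", action="store_true")
    # The table lists only the short form, so no long option is defined here.
    parser.add_argument("-s", dest="source", choices=["google", "openlibrary"])
    return parser


def auto_decision(confidence: float, threshold: float = 80.0) -> str:
    """Batch-mode rule: accept strictly above the threshold, otherwise skip."""
    return "accept" if confidence > threshold else "skip"
```

Note that the comparison is strict, so a match scoring exactly 80% would be skipped under this reading of the table.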

Examples

1. Interactive Review (Recommended for new books)

epubpipe data/new_books/ -i

2. Force specific ISBN

Useful if the automatic search finds the wrong edition.

epubpipe data/unknown_book.epub --isbn 9780441172719
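
Before forcing an ISBN, it can help to check that the number was typed correctly. The sketch below applies the standard ISBN-13 check-digit rule (digits weighted alternately by 1 and 3, total divisible by 10). Whether epubpipe performs this validation itself is not stated here, and `is_valid_isbn13` is a hypothetical helper.

```python
def is_valid_isbn13(isbn: str) -> bool:
    """Check the ISBN-13 checksum, ignoring hyphens and spaces."""
    digits = [c for c in isbn if c not in "- "]
    if len(digits) != 13 or any(c not in "0123456789" for c in digits):
        return False
    # Weights alternate 1, 3, 1, 3, ... across all 13 digits.
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(digits))
    return total % 10 == 0
```

For the ISBN used in the example above, `is_valid_isbn13("9780441172719")` returns `True`.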

3. Offline / Local Only

Clean metadata and rename files, without converting to KEPUB or uploading.

epubpipe data/ --no-upload --no-kepub

🛠️ Debugging Tools

The tools/ directory contains standalone scripts to diagnose issues. You can run them as modules from the project root:

  • Inspector: See exactly what metadata exists inside a file.
    python -m tools.inspect data/book.epub --full
    
  • Search Tester: Test the search logic and see confidence scores without changing files.
    python -m tools.search data/book.epub
    
  • Dry Run: Simulate the whole process (including renaming/conversion logic) without writing to disk.
    python -m tools.dry_run data/
    
  • Manual Upload: Upload a file or folder to Google Drive immediately.
    python -m tools.upload data/book.epub
    

💻 Development

Setup

# Install in editable mode with dev dependencies
# (the quotes keep shells such as zsh from expanding the brackets)
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install

Running Tests

pytest

Manual Linting

ruff check .
mypy .

🔗 Credits

  • kepubify by pgaskin.
  • Google Books API & OpenLibrary API.
