Automated pipeline for Ebook metadata enrichment, conversion, and cloud upload.

Development Status
- 3 - Alpha
Intended Audience
- End Users/Desktop
License
- OSI Approved :: MIT License
Programming Language
- Python :: 3.12

📚 Epub Pipeline

The ultimate automated tool for curating your Ebook library.

This pipeline extracts metadata from your EPUB files, attempts to find better metadata online (Google Books, OpenLibrary), standardizes filenames, converts to KEPUB (for Kobo e-readers), and uploads the results to Google Drive or a local folder.

🚀 Key Features

Smart Metadata Enrichment:
- Waterfall Search Strategy: Prioritizes ISBN lookups (high precision) but falls back to a "relaxed" text search (Title/Author/Publisher) if no ISBN is found.
- Confidence Scoring: Calculates a reliability score (0-100%) for each match based on title similarity, author overlap, and result uniqueness.
Safety First:
- Interactive Review: By default, low-confidence matches require your confirmation.
- Granular Control (-i): Optionally review every single field change (Title, Author, Description, etc.) before applying.
- Non-Destructive: Processes files in a temporary workspace; originals are never modified in place unless the output directory is the same as the input directory.
Media Management:
- High-Res Covers: Automatically downloads and optimizes covers for e-ink screens (resizing to max 1600x2400, grayscale optimized JPEG).
Kobo Optimization:
- Native integration with kepubify to convert EPUBs to KEPUB for faster page turns and better formatting on Kobo devices.
Cloud Sync:
- Direct upload to Google Drive (ideal for use with KoboCloud).
- Resumable uploads for large files.
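
The confidence scoring listed above can be sketched as a weighted blend of the three signals it names. This is only an illustration, not the project's actual implementation: the function names, the 50/30/20 weights, and the `1 / result_count` uniqueness measure are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so cosmetic differences do not count."""
    return " ".join(text.lower().split())


def title_similarity(local: str, candidate: str) -> float:
    """Return a 0.0-1.0 similarity ratio between two titles."""
    return SequenceMatcher(None, _normalize(local), _normalize(candidate)).ratio()


def author_overlap(local: list[str], candidate: list[str]) -> float:
    """Jaccard overlap between two author lists (0.0 if either list is empty)."""
    a = {_normalize(name) for name in local}
    b = {_normalize(name) for name in candidate}
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def confidence(
    local_title: str,
    candidate_title: str,
    local_authors: list[str],
    candidate_authors: list[str],
    result_count: int,
) -> int:
    """Blend the three signals into a 0-100 score (weights are hypothetical)."""
    # A single search result is treated as maximally unique.
    uniqueness = 1.0 / result_count if result_count > 0 else 0.0
    score = (
        0.5 * title_similarity(local_title, candidate_title)
        + 0.3 * author_overlap(local_authors, candidate_authors)
        + 0.2 * uniqueness
    )
    return round(score * 100)
```

With a score like this, a batch mode could accept a match only when the result exceeds a fixed threshold and leave everything else for manual review.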

🛠️ Installation

1. Prerequisites

Python 3.12+
Kepubify: Required for Kobo conversion.
1. Download the binary from pgaskin/kepubify.
2. Place it in your system PATH (recommended).
3. Rename it to kepubify (Windows: kepubify.exe) and ensure it is executable.
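
To confirm that the binary is actually discoverable after these steps, a quick check with the standard library is enough. The helper name `find_kepubify` is made up for this sketch; the pipeline may detect the binary differently.

```python
import shutil
from typing import Optional


def find_kepubify(name: str = "kepubify") -> Optional[str]:
    """Return the full path of the binary if it is on PATH, else None."""
    # shutil.which also honours PATHEXT on Windows, so "kepubify" matches kepubify.exe.
    return shutil.which(name)


if __name__ == "__main__":
    path = find_kepubify()
    if path is None:
        print("kepubify not found on PATH")
    else:
        print(f"kepubify found at {path}")
```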

2. Install Package

Clone the repository and install it in editable mode:

```
git clone https://github.com/your-repo/epub-pipeline.git
cd epub-pipeline
pip install -e .
```

This installs the epubpipe command into your active Python environment.

3. Configuration (.env)

Copy the template and edit your settings:

```
cp .env.example .env
```

Note: The tool looks for .env in the directory where you run the command.
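
The lookup rule in the note above means the settings depend on your current working directory. The sketch below shows one way such a lookup could work using only the standard library. It is an assumption-laden illustration: `parse_env` and `load_env` are hypothetical names, and the real tool may rely on a dedicated dotenv library with richer syntax support.

```python
from pathlib import Path
from typing import Optional


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_env(directory: Optional[Path] = None) -> dict[str, str]:
    """Read .env from the given directory (default: current working directory)."""
    env_path = (directory or Path.cwd()) / ".env"
    if not env_path.is_file():
        return {}
    return parse_env(env_path.read_text(encoding="utf-8"))
```

Running the command from a different folder would therefore pick up a different `.env`, or none at all.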

4. Google Drive (Optional)

To enable Cloud Upload:

1. Create a project in Google Cloud Console.
2. Enable the Google Drive API.
3. Create an OAuth 2.0 Client ID (Desktop App).
4. Download the JSON, rename it to credentials.json, and place it in your working directory.
5. Set GOOGLE_CREDENTIALS_PATH=credentials.json in .env.

🎮 Usage

Basic Usage

Process a single file or an entire directory using the CLI command:

```
# Process all .epub files in the data/ folder
epubpipe data/

# Process a specific file
epubpipe data/dune.epub
```

CLI Options

Flag	Description
`-i`, `--interactive`	Granular Review Mode: Ask for confirmation for each field (Title, Date, Cover...) that differs.
`--auto`	Batch Mode: Automatically accept changes if confidence > 80%, skip others.
`--no-kepub`	Disable KEPUB conversion for this run.
`--no-rename`	Keep original filenames.
`--no-upload`	Process locally only (files remain in `output/` or temp).
`--isbn <ISBN>`	Force a specific ISBN for the search (works only with a single file).
`-v`, `--verbose`	Enable debug logs.
`-s <source>`	Limit search to `google` or `openlibrary`.
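
For readers who want to see how these flags fit together, here is a rough `argparse` sketch that mirrors the table. It is not the project's actual CLI code: the positional name `path`, the attribute name `source`, and the help strings are assumptions, and only the flag spellings come from the table above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented epubpipe flags."""
    parser = argparse.ArgumentParser(prog="epubpipe")
    parser.add_argument("path", help="EPUB file or directory to process")
    parser.add_argument("-i", "--interactive", action="store_true",
                        help="confirm each differing field")
    parser.add_argument("--auto", action="store_true",
                        help="accept high-confidence matches, skip the rest")
    parser.add_argument("--no-kepub", action="store_true",
                        help="disable KEPUB conversion")
    parser.add_argument("--no-rename", action="store_true",
                        help="keep original filenames")
    parser.add_argument("--no-upload", action="store_true",
                        help="process locally only")
    parser.add_argument("--isbn", metavar="ISBN",
                        help="force a specific ISBN (single file only)")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="enable debug logs")
    parser.add_argument("-s", dest="source", choices=["google", "openlibrary"],
                        help="limit search to one provider")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["data/", "--auto", "-s", "google"])
    print(args.auto, args.source, args.no_upload)
```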

Examples

1. Interactive Review (Recommended for new books)

```
epubpipe data/new_books/ -i
```

2. Force specific ISBN

Useful if the automatic search finds the wrong edition.

```
epubpipe data/unknown_book.epub --isbn 9780441172719
```
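
Before forcing an ISBN, it can help to check that the number is well formed, since a mistyped digit would send the search to the wrong book or to nothing. The sketch below applies the standard ISBN-13 check-digit rule (alternating weights of 1 and 3, total divisible by 10). The function name is hypothetical, and whether the tool itself validates `--isbn` input is not stated here.

```python
def is_valid_isbn13(isbn: str) -> bool:
    """Return True if the string is a 13-digit ISBN with a correct check digit."""
    cleaned = isbn.replace("-", "").replace(" ", "")
    if len(cleaned) != 13 or not cleaned.isdigit():
        return False
    # Digits at even indexes (0-based) weigh 1, digits at odd indexes weigh 3.
    total = sum(int(ch) * (3 if i % 2 else 1) for i, ch in enumerate(cleaned))
    return total % 10 == 0


if __name__ == "__main__":
    print(is_valid_isbn13("9780441172719"))  # the ISBN from the example above
```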

3. Offline / Local Only

Clean metadata and rename files without uploading. Note that `--no-kepub` also skips KEPUB conversion; drop that flag if you still want converted files.

```
epubpipe data/ --no-upload --no-kepub
```

🛠️ Debugging Tools

The tools/ directory contains standalone scripts to diagnose issues. You can run them as modules from the project root:

Inspector: See exactly what metadata exists inside a file.
```
python -m tools.inspect data/book.epub --full
```
Search Tester: Test the search logic and see confidence scores without changing files.
```
python -m tools.search data/book.epub
```
Dry Run: Simulate the whole process (including renaming/conversion logic) without writing to disk.
```
python -m tools.dry_run data/
```
Manual Upload: Upload a file or folder to Google Drive immediately.
```
python -m tools.upload data/book.epub
```

💻 Development

Setup

```
# Install in editable mode with dev dependencies
# (quoted so that shells such as zsh do not expand the brackets)
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install
```

Running Tests

```
pytest
```

Manual Linting

```
ruff check .
mypy .
```

🔗 Credits

kepubify by pgaskin.
Google Books API & OpenLibrary API.

Release history

1.0.1

Dec 23, 2025

This version

0.1.0a1 pre-release

Dec 22, 2025

Download files

Source Distribution

epub_pipeline-0.1.0a1.tar.gz (37.4 kB)

Uploaded Dec 22, 2025 Source

Built Distribution

epub_pipeline-0.1.0a1-py3-none-any.whl (47.0 kB)

Uploaded Dec 22, 2025 Python 3

File details

Details for the file epub_pipeline-0.1.0a1.tar.gz.

File metadata

Download URL: epub_pipeline-0.1.0a1.tar.gz
Upload date: Dec 22, 2025
Size: 37.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for epub_pipeline-0.1.0a1.tar.gz
Algorithm	Hash digest
SHA256	`e1a6838cc4c93d39d8b09c7529fea6d32f267d44f372f0501653372df813f7f5`
MD5	`4af8189ca5dc17c2c650199a92b78d1d`
BLAKE2b-256	`186017cbc0c0b57ae2549bd196345888df361cf97fffc0bcb4a82eb2e87f2eac`

File details

Details for the file epub_pipeline-0.1.0a1-py3-none-any.whl.

File metadata

Download URL: epub_pipeline-0.1.0a1-py3-none-any.whl
Upload date: Dec 22, 2025
Size: 47.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for epub_pipeline-0.1.0a1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0d4967b27039216306369246a3549e8539f2ffc9848d0896747f3a4e16c1aa7c`
MD5	`8739f46fc812a07f77c0067c8d314db4`
BLAKE2b-256	`31591e1d5b1d9b58d311f3f1c04c3312132e9e8499cec5f776a8ca22d37833d8`
