
Convert PDF files to nicely structured Markdown and EPUB format


Epubify


Convert PDF files to nicely structured Markdown and EPUB format with intelligent layout detection.

Features

  • Smart layout detection for books and academic papers
  • Advanced text extraction and OCR capabilities
  • Table detection and formatting
  • Image extraction and optimization
  • Clean markdown output with preserved structure
  • EPUB generation with customizable styling
  • Multi-language support
  • GPU acceleration support (NVIDIA, AMD, Apple Silicon)

Installation

From PyPI (recommended)

pip install epubify

Using uv

uv tool install epubify

Using pipx

pipx install epubify

From source

git clone https://github.com/mustafa-zidan/epubify.git
cd epubify
uv sync

Homebrew (planned)

A Homebrew tap is planned for future releases:

# Coming soon
brew install mustafa-zidan/tap/epubify

For GPU support (NVIDIA/AMD/Apple Silicon), follow the official PyTorch installation guide.
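Before starting a long conversion, you may want to check which accelerator your PyTorch install can actually see. The sketch below only queries PyTorch (`torch.cuda.is_available()`, `torch.version.hip`, `torch.backends.mps.is_available()`). It does not reflect how epubify itself picks a device, which is not documented here. The helper name `detect_torch_backend` is hypothetical.

```python
def detect_torch_backend() -> str:
    """Report which accelerator PyTorch can use: "cuda", "rocm", "mps", or "cpu".

    Hypothetical helper: it inspects PyTorch only and says nothing about
    how epubify chooses a device internally.
    """
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so no GPU backend is usable.
        return "cpu"

    if torch.cuda.is_available():
        # ROCm builds expose AMD GPUs through the torch.cuda API;
        # torch.version.hip is set only on those builds.
        return "rocm" if getattr(torch.version, "hip", None) else "cuda"

    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        # Apple Silicon GPU via Metal Performance Shaders.
        return "mps"

    return "cpu"
```

If this prints `cpu` on a machine with a GPU, the installed PyTorch wheel is probably a CPU-only build. Reinstalling it via the PyTorch guide is the usual fix.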

Dependencies

  • Python 3.10+
  • uv (recommended for dependency management)
  • PyTorch (a CUDA, ROCm, or MPS build is needed only for GPU acceleration)
  • marker-pdf, transformers, markdown

Usage

Command Line

epubify input.pdf

Or via uv:

uv run epubify input.pdf

Options:

Option            Description
--max-pages INT   Maximum number of pages to process
--start-page INT  Page number to start from
--skip-epub       Skip EPUB generation; only create Markdown
--skip-md         Skip Markdown generation; use existing Markdown files
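When driving the CLI from a script, the options above can be assembled programmatically. This is a minimal sketch that uses only the flags listed in the table. It assumes the input PDF may precede the options on the command line. Whether `--start-page` is 0- or 1-based is not documented. `build_epubify_command` and `run_epubify` are hypothetical helpers, not part of epubify.

```python
import subprocess
from typing import Optional


def build_epubify_command(
    pdf: str,
    max_pages: Optional[int] = None,
    start_page: Optional[int] = None,
    skip_epub: bool = False,
    skip_md: bool = False,
) -> list[str]:
    """Assemble an epubify command line from the documented options."""
    cmd = ["epubify", pdf]
    if max_pages is not None:
        cmd += ["--max-pages", str(max_pages)]
    if start_page is not None:
        cmd += ["--start-page", str(start_page)]
    if skip_epub:
        cmd.append("--skip-epub")  # Markdown only
    if skip_md:
        cmd.append("--skip-md")  # reuse existing Markdown files
    return cmd


def run_epubify(pdf: str, **options) -> None:
    """Run the CLI and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_epubify_command(pdf, **options), check=True)
```

For example, `run_epubify("input.pdf", max_pages=20, start_page=5, skip_epub=True)` runs `epubify input.pdf --max-pages 20 --start-page 5 --skip-epub`.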

As a Library

from pathlib import Path
from epubify.pdf2md import convert_pdf
from epubify.mark2epub import convert_to_epub

# Convert PDF to Markdown
convert_pdf("input.pdf", Path("./output/input"))

# Convert Markdown to EPUB
convert_to_epub(Path("./output/input"), Path("./output"))
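The example above converts a single file. To process a whole folder, the same two calls can be looped. This sketch assumes `convert_pdf` and `convert_to_epub` accept the argument types shown above, and that each PDF gets its own `output/<name>` directory. `batch_convert` is a hypothetical helper, not part of epubify. The converters are injectable so the loop can be exercised without the library installed.

```python
from pathlib import Path
from typing import Callable, Optional


def batch_convert(
    pdf_dir: Path,
    out_dir: Path,
    pdf_to_md: Optional[Callable] = None,
    md_to_epub: Optional[Callable] = None,
) -> list[Path]:
    """Convert every *.pdf in pdf_dir, mirroring the single-file example."""
    if pdf_to_md is None or md_to_epub is None:
        # Imported lazily so stand-in converters can be passed for testing.
        from epubify.pdf2md import convert_pdf
        from epubify.mark2epub import convert_to_epub

        pdf_to_md = pdf_to_md or convert_pdf
        md_to_epub = md_to_epub or convert_to_epub

    processed = []
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        doc_dir = Path(out_dir) / pdf.stem  # e.g. ./output/input
        pdf_to_md(str(pdf), doc_dir)  # PDF -> Markdown
        md_to_epub(doc_dir, Path(out_dir))  # Markdown -> EPUB
        processed.append(doc_dir)
    return processed
```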

Output Structure

output_directory/
├── document_name/
│   ├── document_name.md
│   ├── document_name.epub
│   ├── document_name_metadata.json
│   └── images/
│       ├── image1.png
│       ├── image2.jpg
│       └── ...
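To check that a run produced everything in the tree above, the expected paths can be derived from the PDF's file name. This is a sketch that assumes the naming pattern shown (`<name>.md`, `<name>.epub`, `<name>_metadata.json`, `images/`). The helpers `expected_outputs` and `missing_outputs` are hypothetical.

```python
from pathlib import Path


def expected_outputs(pdf_path, output_dir) -> dict[str, Path]:
    """Map each output kind to the path the layout above suggests."""
    name = Path(pdf_path).stem
    doc_dir = Path(output_dir) / name
    return {
        "markdown": doc_dir / f"{name}.md",
        "epub": doc_dir / f"{name}.epub",
        "metadata": doc_dir / f"{name}_metadata.json",
        "images": doc_dir / "images",
    }


def missing_outputs(pdf_path, output_dir) -> list[str]:
    """List the output kinds that do not exist on disk yet."""
    return [
        kind
        for kind, path in expected_outputs(pdf_path, output_dir).items()
        if not path.exists()
    ]
```

After a run with `--skip-epub`, `missing_outputs` would be expected to report `"epub"`.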

Development

Setup

git clone https://github.com/mustafa-zidan/epubify.git
cd epubify
uv sync --group dev

Running tests

uv run pytest

CI/CD

This project uses GitHub Actions for:

  • CI (ci.yml) - Runs tests across Python 3.10-3.13 on every push/PR
  • Qodana (qodana_code_quality.yml) - Static code analysis via JetBrains Qodana
  • Publish (publish.yml) - Automatically publishes to PyPI on GitHub releases using trusted publishing

Publishing a new release

  1. Update the version in pyproject.toml
  2. Create a GitHub release with a tag matching the version (e.g., v0.1.0)
  3. The publish workflow will automatically build and upload to PyPI

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a new branch for your feature
  3. Commit your changes
  4. Push to your branch
  5. Create a Pull Request

Known Issues

  • Embedded images may need manual adjustment
  • Complex mathematical equations may not convert perfectly
  • Multi-column PDF layouts may require manual adjustment
  • Font detection may be imperfect in some cases

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments



Download files


Source Distribution

epubify-0.1.0.tar.gz (187.1 kB)

Uploaded Source

Built Distribution


epubify-0.1.0-py3-none-any.whl (18.9 kB)

Uploaded Python 3

File details

Details for the file epubify-0.1.0.tar.gz.

File metadata

  • Download URL: epubify-0.1.0.tar.gz
  • Size: 187.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for epubify-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       e0192bac74d488e77ba4ad6f4737032a3ed42cfd1c99b709b2cae95b14a839e2
MD5          7733066bd737379780f2a4408e27b26a
BLAKE2b-256  66effa85a5acd8dacbfff3574d0afd27fc5802ae6cd98f3550e3b8b268dd090b
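To verify a downloaded file against the SHA256 digest listed above, you can hash it with the standard library. This is a minimal sketch; the helper names `sha256_of` and `verify_download` are hypothetical, and the same approach works for the wheel with its own digest.

```python
import hashlib

# SHA256 digest listed above for epubify-0.1.0.tar.gz.
EXPECTED_SDIST_SHA256 = "e0192bac74d488e77ba4ad6f4737032a3ed42cfd1c99b709b2cae95b14a839e2"


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_SDIST_SHA256) -> bool:
    """Compare case-insensitively, since digests may be shown in either case."""
    return sha256_of(path) == expected.lower()
```

Usage: `verify_download("epubify-0.1.0.tar.gz")` returns `True` only if the file matches the published digest.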


Provenance

The following attestation bundles were made for epubify-0.1.0.tar.gz:

Publisher: release.yml on mustafa-zidan/epubify


File details

Details for the file epubify-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: epubify-0.1.0-py3-none-any.whl
  • Size: 18.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for epubify-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       60adfbd345ccc4936e3d9df4803f817ea0bc45a5a9409764a014301dbbc057e3
MD5          3b3f53fb8172998763fc6ac57ecdcf58
BLAKE2b-256  e6b4e5954565205364511951fdc2bb7c0c788958beed3e5dd37dd0b58fc77815


Provenance

The following attestation bundles were made for epubify-0.1.0-py3-none-any.whl:

Publisher: release.yml on mustafa-zidan/epubify

