Convert PDF files to nicely structured Markdown and EPUB format
Epubify
Convert PDF files to nicely structured Markdown and EPUB format with intelligent layout detection.
Features
- Smart layout detection for books and academic papers
- Advanced text extraction and OCR capabilities
- Table detection and formatting
- Image extraction and optimization
- Clean markdown output with preserved structure
- EPUB generation with customizable styling
- Multi-language support
- GPU acceleration support (NVIDIA, AMD, Apple Silicon)
Installation
From PyPI (recommended)
pip install epubify
Using uv
uv tool install epubify
Using pipx
pipx install epubify
From source
git clone https://github.com/mustafa-zidan/epubify.git
cd epubify
uv sync
Homebrew (planned)
A Homebrew tap is planned for future releases:
# Coming soon
brew install mustafa-zidan/tap/epubify
For GPU support (NVIDIA/AMD/Apple Silicon), follow the official PyTorch installation guide.
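To confirm which accelerator PyTorch can see after installation, a minimal sketch like the one below may help. The function name `detect_device` is hypothetical and not part of epubify. It relies only on `torch.cuda.is_available()` and `torch.backends.mps.is_available()`, and falls back to CPU when PyTorch is not installed. ROCm builds of PyTorch report through the CUDA interface, so AMD GPUs show up as `cuda` here.

```python
def detect_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what PyTorch reports."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so only CPU processing is possible.
        return "cpu"

    # NVIDIA (CUDA) and AMD (ROCm) builds both answer through torch.cuda.
    if torch.cuda.is_available():
        return "cuda"

    # Apple Silicon GPUs are exposed through the MPS backend.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


if __name__ == "__main__":
    print(f"PyTorch device: {detect_device()}")
```

Whether epubify picks up the detected device automatically is not stated here, so treat this purely as an installation check.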
Dependencies
- Python 3.10+
- uv (recommended for dependency management)
- PyTorch (with CUDA/ROCm/MPS support)
- marker-pdf, transformers, markdown
Usage
Command Line
epubify input.pdf
Or via uv:
uv run epubify input.pdf
Options:
| Option | Description |
|---|---|
| --max-pages INT | Maximum number of pages to process |
| --start-page INT | Page number to start from |
| --skip-epub | Skip EPUB generation, only create markdown |
| --skip-md | Skip markdown generation, use existing markdown files |
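When calling the CLI from a script, the options above can be assembled programmatically. The following is a rough sketch, not part of epubify: `build_command` and `run_epubify` are hypothetical helper names, and placing the options after the input file is an assumption about how the CLI parses its arguments.

```python
import subprocess
from typing import List, Optional


def build_command(
    pdf_path: str,
    max_pages: Optional[int] = None,
    start_page: Optional[int] = None,
    skip_epub: bool = False,
    skip_md: bool = False,
) -> List[str]:
    """Build an argument list using only the documented epubify options."""
    cmd = ["epubify", pdf_path]
    if max_pages is not None:
        cmd += ["--max-pages", str(max_pages)]
    if start_page is not None:
        cmd += ["--start-page", str(start_page)]
    if skip_epub:
        cmd.append("--skip-epub")
    if skip_md:
        cmd.append("--skip-md")
    return cmd


def run_epubify(pdf_path: str, **options) -> int:
    """Run the CLI and return its exit code (requires epubify on PATH)."""
    return subprocess.run(build_command(pdf_path, **options), check=False).returncode


if __name__ == "__main__":
    # Only print the command here; call run_epubify() to actually execute it.
    print(" ".join(build_command("input.pdf", max_pages=20, skip_epub=True)))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.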
As a Library
from pathlib import Path
from epubify.pdf2md import convert_pdf
from epubify.mark2epub import convert_to_epub
# Convert PDF to Markdown
convert_pdf("input.pdf", Path("./output/input"))
# Convert Markdown to EPUB
convert_to_epub(Path("./output/input"), Path("./output"))
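The example above writes Markdown to a directory named after the PDF and then places the EPUB under the parent output directory. A small helper can derive both paths for any input file. This is a sketch under that assumption; `conversion_paths` is a hypothetical name and is not provided by epubify.

```python
from pathlib import Path
from typing import Tuple


def conversion_paths(pdf_path: str, output_root: str = "./output") -> Tuple[Path, Path]:
    """Return (markdown_dir, epub_output_dir) mirroring the example above."""
    root = Path(output_root)
    # The per-document directory is named after the PDF without its extension.
    markdown_dir = root / Path(pdf_path).stem
    return markdown_dir, root


if __name__ == "__main__":
    markdown_dir, epub_dir = conversion_paths("input.pdf")
    print(markdown_dir, epub_dir)
    # With epubify installed, the two paths would be passed on like this:
    #   convert_pdf("input.pdf", markdown_dir)
    #   convert_to_epub(markdown_dir, epub_dir)
```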
Output Structure
output_directory/
├── document_name/
│ ├── document_name.md
│ ├── document_name.epub
│ ├── document_name_metadata.json
│ └── images/
│ ├── image1.png
│ ├── image2.jpg
│ └── ...
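To check that a conversion produced the layout shown above, a sketch like the following lists the expected paths and reports which ones are missing. The helper names `expected_outputs` and `missing_outputs` are hypothetical, and the file names are taken directly from the tree above.

```python
from pathlib import Path
from typing import Dict, List


def expected_outputs(output_dir: Path, document_name: str) -> Dict[str, Path]:
    """Map each documented artifact to its expected location."""
    doc_dir = Path(output_dir) / document_name
    return {
        "markdown": doc_dir / f"{document_name}.md",
        "epub": doc_dir / f"{document_name}.epub",
        "metadata": doc_dir / f"{document_name}_metadata.json",
        "images": doc_dir / "images",
    }


def missing_outputs(output_dir: Path, document_name: str) -> List[str]:
    """Return the names of artifacts that do not exist on disk."""
    paths = expected_outputs(output_dir, document_name)
    return [name for name, path in paths.items() if not path.exists()]


if __name__ == "__main__":
    print(missing_outputs(Path("./output"), "input"))
```

A run with `--skip-epub` would be expected to report `epub` as missing, so an empty list is only the target for a full conversion.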
Development
Setup
git clone https://github.com/mustafa-zidan/epubify.git
cd epubify
uv sync --group dev
Running tests
uv run pytest
CI/CD
This project uses GitHub Actions for:
- CI (ci.yml) - Runs tests across Python 3.10-3.13 on every push/PR
- Qodana (qodana_code_quality.yml) - Static code analysis via JetBrains Qodana
- Publish (publish.yml) - Automatically publishes to PyPI on GitHub releases using trusted publishing
Publishing a new release
- Update the version in pyproject.toml
- Create a GitHub release with a tag matching the version (e.g., v0.1.0)
- The publish workflow will automatically build and upload to PyPI
Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a new branch for your feature
- Commit your changes
- Push to your branch
- Create a Pull Request
Known Issues
- Some embedded images might need manual adjustment
- Some complex mathematical equations might not be perfectly converted
- Certain PDF layouts with multiple columns may require manual adjustment
- Font detection might be imperfect in some cases
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- marker-pdf for PDF processing
- PyTorch for GPU acceleration
- Transformers for advanced text processing