Pure-Python EPUB → markdown converter plugin for dikw client import.
dikw-converter-epub
Pure-Python EPUB → markdown converter plugin for dikw-core's
dikw client import. Once installed alongside dikw-core, running
dikw client import book.epub
parses the EPUB locally and commits the converted markdown + assets into
<base>/sources/book/.
What it produces
Given book.epub, the plugin writes:
<base>/sources/book/
├── book.md                      # H1 book title (if any), italic author, chapter H2s
└── assets/
    ├── book.epub                # original, kept as provenance
    └── <opf-relative-path>/     # extracted images, named by their OPF manifest href
        ├── images/cover.jpg
        └── images/figure-1.png
Asset paths inside assets/ match each image's href in the EPUB's OPF
manifest — i.e. the path relative to the OPF file's directory. So a
Calibre-produced EPUB whose cover lives at zip path OEBPS/images/cover.jpg
lands at assets/images/cover.jpg (the OEBPS/ publication-root prefix
is stripped automatically by the EPUB href-resolution model). A Pandoc
EPUB whose images live under EPUB/media/ produces assets/media/....
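The href-to-path mapping described above can be sketched with the stdlib alone. This is a minimal illustration, not the plugin's actual code: the function name resolve_manifest_href and the OPF file names (content.opf) are assumptions made for the example, and percent-encoded hrefs are not handled.

```python
import posixpath


def resolve_manifest_href(opf_path: str, href: str) -> tuple[str, str]:
    """Map an OPF manifest href to (zip member path, asset output path).

    Hypothetical helper for illustration only. Zip member names always
    use forward slashes, so posixpath is used instead of os.path.
    """
    # Manifest hrefs are relative to the directory that holds the OPF file.
    opf_dir = posixpath.dirname(opf_path)
    zip_member = posixpath.normpath(posixpath.join(opf_dir, href))
    # The asset keeps the href as-is, so the publication-root prefix
    # (OEBPS/, EPUB/, ...) never appears under assets/.
    asset_path = posixpath.join("assets", href)
    return zip_member, asset_path


# Calibre-style layout (OPF file name assumed):
print(resolve_manifest_href("OEBPS/content.opf", "images/cover.jpg"))
# Pandoc-style layout (OPF and image file names assumed):
print(resolve_manifest_href("EPUB/content.opf", "media/file0.png"))
```

Because the asset path is derived from the href rather than from the zip member name, the same logic covers EPUBs whose OPF file sits at the archive root.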
Design choices (v0.1)
- No third-party dependencies. Uses only zipfile, xml.etree.ElementTree, and html.parser from the Python stdlib. No ebooklib, no markdownify. Trade-off: roughly 5% of edge-case EPUBs (non-standard OPF layouts, exotic inline XHTML) may need follow-up patches.
- Asset references use wikilink syntax (![[path|alt]]). dikw-core's md_inspect accepts both standard markdown image syntax and the wikilink form; the wikilink form is the only one that handles asset paths containing ( or ), which are common in user-named EPUB files (book(1).epub), and alt text containing ].
- Fresh output_dir assumed. dikw-core's importer creates a fresh temp directory and hands it to convert(). If you're calling this plugin directly, pass an empty path you control; reusing a dirty directory will leave stale assets from a previous run.
- One markdown file per EPUB. Chapters become H2 sections in a single <stem>.md. Per-chapter splitting is deferred to a future minor version.
- Deterministic output. The same EPUB bytes produce byte-identical markdown + assets on every run (no timestamps, no random IDs).
- <nav>/<header>/<footer>/<aside>/<script>/<style> are stripped during the XHTML walk. Navigation blocks that repeat in every chapter don't survive into the markdown.
- Heading levels are shifted so that the book title is the only H1, chapter titles are H2, and a chapter's internal XHTML headings sit under that. If the EPUB has no <dc:title> metadata, the H1 line is skipped and chapters become the top level.
- Non-UTF-8 XHTML is decoded with errors="replace". XHTML in the wild lies about its encoding often enough that strict decoding causes more pain than the occasional � replacement character in output.
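Two of the choices above, lenient decoding and stripping of non-content elements, can be sketched with html.parser. This is a simplified illustration under stated assumptions, not the plugin's implementation: the names SKIPPED_TAGS, VisibleTextExtractor, and visible_text are hypothetical, and the sketch only collects plain text rather than emitting markdown.

```python
from html.parser import HTMLParser

# Elements whose entire subtree is dropped (list taken from the README).
SKIPPED_TAGS = {"nav", "header", "footer", "aside", "script", "style"}


class VisibleTextExtractor(HTMLParser):
    """Collect text that is not nested inside a skipped element."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside at least one skipped element
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(raw: bytes, encoding: str = "utf-8") -> str:
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising.
    text = raw.decode(encoding, errors="replace")
    parser = VisibleTextExtractor()
    parser.feed(text)
    parser.close()
    return " ".join(parser.chunks)


sample = (
    b"<html><body><nav><a href='#'>TOC</a></nav>"
    b"<h1>Chapter</h1><p>Hello \xff world</p>"
    b"<script>var x = 1;</script></body></html>"
)
print(visible_text(sample))
```

Tracking a depth counter rather than a boolean keeps nested skipped elements (for example an aside inside a footer) from re-enabling output too early.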
Install
# In a real dikw client environment:
pip install dikw-converter-epub
# Upgrade later:
pip install --upgrade dikw-converter-epub
# Pin a specific version:
pip install 'dikw-converter-epub==0.1.0'
# Uninstall — the entry-point disappears on next discovery.
pip uninstall dikw-converter-epub
# For local development from this monorepo:
pip install -e packages/dikw-converter-epub
Changelog
See CHANGELOG.md for the per-release history. Each
GitHub Release also carries the same notes; published wheels and
sdists are attached there for offline / air-gapped installs.
Run the tests
uv run pytest packages/dikw-converter-epub
File details
Details for the file dikw_converter_epub-0.1.0.tar.gz.
File metadata
- Download URL: dikw_converter_epub-0.1.0.tar.gz
- Upload date:
- Size: 22.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 29c515f25e99b2bdfd62a1cfdd90482bfb24850ed8ae0dd19c92e54732c01f7f |
| MD5 | 065b63e647d1eb2c8fbd758c53feca95 |
| BLAKE2b-256 | 7e267081eb0a11ce34c6b78691904fa822faf0ecb66e85696dde20d01b83f61a |
Provenance
The following attestation bundles were made for dikw_converter_epub-0.1.0.tar.gz:
Publisher: release.yml on OpenDIKW/dikw-plugins
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: dikw_converter_epub-0.1.0.tar.gz
- Subject digest: 29c515f25e99b2bdfd62a1cfdd90482bfb24850ed8ae0dd19c92e54732c01f7f
- Sigstore transparency entry: 1553352322
- Sigstore integration time:
- Permalink: OpenDIKW/dikw-plugins@184ceb9552454096de264522e04dd3698768e8c2
- Branch / Tag: refs/tags/dikw-converter-epub-v0.1.0
- Owner: https://github.com/OpenDIKW
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@184ceb9552454096de264522e04dd3698768e8c2
- Trigger Event: push
File details
Details for the file dikw_converter_epub-0.1.0-py3-none-any.whl.
File metadata
- Download URL: dikw_converter_epub-0.1.0-py3-none-any.whl
- Upload date:
- Size: 17.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b24062f1e82578627584bc488dac5764ecbdb27f0c8cea2fb7e3770b957e9e08 |
| MD5 | 9ac5a05b5361112c0207a86bc5bc71df |
| BLAKE2b-256 | 0dc99737f6b88554c04944d6d7d762cfb1d2951eab091747ecc4595b287e757f |
Provenance
The following attestation bundles were made for dikw_converter_epub-0.1.0-py3-none-any.whl:
Publisher: release.yml on OpenDIKW/dikw-plugins
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: dikw_converter_epub-0.1.0-py3-none-any.whl
- Subject digest: b24062f1e82578627584bc488dac5764ecbdb27f0c8cea2fb7e3770b957e9e08
- Sigstore transparency entry: 1553352336
- Sigstore integration time:
- Permalink: OpenDIKW/dikw-plugins@184ceb9552454096de264522e04dd3698768e8c2
- Branch / Tag: refs/tags/dikw-converter-epub-v0.1.0
- Owner: https://github.com/OpenDIKW
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@184ceb9552454096de264522e04dd3698768e8c2
- Trigger Event: push