Convert ZIM files to EPUB format
ZIM to EPUB Converter
A Python command-line tool to convert ZIM files (used by Kiwix and others for offline content) to EPUB format for e-readers.
Features
- Convert ZIM files to EPUB format with robust error handling
- Option to include or exclude images
- Automatic table of contents generation based on article names
- Limit the number of articles to include
- Preserves metadata from the ZIM file
- Clean, readable formatting for e-readers
- Handles URL-encoded paths and special characters
- Supports various ZIM file structures and formats
- Extracts content from main entry when standard article paths aren't available
- Avoids duplicate images in the output EPUB
- Full crawl mode for problematic ZIM files
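The URL-encoded path handling listed above can be illustrated with the standard library. This is a minimal sketch rather than the converter's actual code; `normalize_entry_path` is a hypothetical helper name, and dropping the leading slash is an assumption about how entry paths are compared.

```python
from urllib.parse import unquote


def normalize_entry_path(path: str) -> str:
    """Decode percent-escapes and drop a leading slash (hypothetical helper)."""
    # unquote() turns sequences such as %C3%A9 back into the UTF-8 character.
    return unquote(path).lstrip("/")


print(normalize_entry_path("/A/Caf%C3%A9_au_lait"))
```

Decoding before lookup means a link written as `Caf%C3%A9` and an entry stored as `Café` resolve to the same string.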
Platform Support
This package is compatible with:
- Linux (Debian, Ubuntu, Fedora, etc.)
- macOS
Note: Windows is not currently supported due to limitations with the libzim library.
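A script that wraps the converter could check the platform up front instead of failing later at import time. The following is a small sketch under that assumption; `is_supported_platform` is a hypothetical name and not part of the package.

```python
import sys


def is_supported_platform(platform: str = sys.platform) -> bool:
    """Return True on Linux and macOS, the platforms listed above."""
    # sys.platform is "linux" on Linux, "darwin" on macOS, "win32" on Windows.
    return platform.startswith("linux") or platform == "darwin"


if not is_supported_platform():
    print("zim2epub is documented as supporting only Linux and macOS")
```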
Recent Updates
- Package Structure: Reorganized into a proper Python package structure for better maintainability
- Improved URL handling: Added support for URL-encoded paths and special characters
- Enhanced image processing: Fixed issues with duplicate images and improved mimetype detection
- Better article extraction: Added multiple methods to extract articles from different ZIM file structures
- Robust error handling: Added comprehensive error handling and fallback mechanisms
- Detailed logging: Added verbose logging to help diagnose issues
- CI/CD Pipeline: Added GitHub Actions for automated testing and releases
- Full crawl mode: Added option to crawl through all entries in the ZIM file
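To make the duplicate-image and mimetype items above concrete, here is one way such logic could look. It is an illustrative sketch only: `collect_images` is a hypothetical function, and the fallback to `application/octet-stream` is an assumption, not documented behavior of the package.

```python
import mimetypes


def collect_images(image_paths):
    """Return (path, mimetype) pairs, skipping paths that were already seen."""
    seen = set()
    unique = []
    for path in image_paths:
        if path in seen:
            continue  # the same image referenced by several articles
        seen.add(path)
        # guess_type() returns (type, encoding); type is None when unknown.
        mimetype, _ = mimetypes.guess_type(path)
        unique.append((path, mimetype or "application/octet-stream"))
    return unique


print(collect_images(["I/logo.png", "I/photo.jpg", "I/logo.png"]))
```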
Installation
Prerequisites
- Python 3.6 or higher
- C++ libzim library (required for the Python bindings)
- Linux or macOS operating system
Installing C++ libzim
macOS
brew install libzim
Debian/Ubuntu
apt-get install libzim-dev
Fedora
dnf install libzim-devel
Installing from PyPI
You can install the package directly from PyPI:
USE_SYSTEM_LIBZIM=1 pip install zim2epub
Installing from Source
- Clone this repository:
  git clone https://github.com/izzoa/zim2epub.git
  cd zim2epub
- Install the required dependencies:
  USE_SYSTEM_LIBZIM=1 pip install -r requirements.txt
- Install the package:
  pip install .
  Or in development mode:
  pip install -e .
Usage
Basic usage
python -m zim2epub.cli path/to/your/file.zim
This creates an EPUB file in the current directory, named after the input file but with an .epub extension.
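The default naming rule can be expressed in a couple of lines. This sketch only derives the file name; `default_output_name` is a hypothetical helper, and the tool's internal logic may differ.

```python
from pathlib import Path


def default_output_name(zim_file: str) -> str:
    """Swap the input's extension for .epub and keep only the file name."""
    return Path(zim_file).with_suffix(".epub").name


print(default_output_name("path/to/your/file.zim"))
```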
Command-line Options
usage: python -m zim2epub.cli [-h] [-o OUTPUT] [--no-images] [--no-toc]
                              [--max-articles MAX_ARTICLES] [-v] [--full-crawl]
                              zim_file

Convert ZIM files to EPUB format

positional arguments:
  zim_file              Path to the ZIM file to convert

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Path for the output EPUB file (default: same as input with .epub extension)
  --no-images           Do not include images in the EPUB (default: False)
  --no-toc              Do not generate a table of contents (default: False)
  --max-articles MAX_ARTICLES
                        Maximum number of articles to include (default: None)
  -v, --verbose         Show verbose output (default: False)
  --full-crawl          Use full crawl mode to extract all articles (default: False)
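For readers who want to see how these options fit together, the help text above maps naturally onto an `argparse` parser. The code below is a reconstruction from that help text, not the project's actual `cli` module; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Rebuild the option set shown in the help text (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="python -m zim2epub.cli",
        description="Convert ZIM files to EPUB format",
    )
    parser.add_argument("zim_file", help="Path to the ZIM file to convert")
    parser.add_argument("-o", "--output", default=None,
                        help="Path for the output EPUB file")
    parser.add_argument("--no-images", action="store_true",
                        help="Do not include images in the EPUB")
    parser.add_argument("--no-toc", action="store_true",
                        help="Do not generate a table of contents")
    parser.add_argument("--max-articles", type=int, default=None,
                        help="Maximum number of articles to include")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Show verbose output")
    parser.add_argument("--full-crawl", action="store_true",
                        help="Use full crawl mode to extract all articles")
    return parser


args = build_parser().parse_args(["wikipedia.zim", "--max-articles", "100", "-v"])
print(args.zim_file, args.max_articles, args.verbose, args.no_images)
```

Every flag is a simple on/off switch except `--max-articles`, which takes an integer, and `-o`, which takes a path.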
Examples
Convert a ZIM file without images (useful for smaller file size):
python -m zim2epub.cli wikipedia.zim --no-images
Convert a ZIM file with a custom output path:
python -m zim2epub.cli wikipedia.zim -o my-wikipedia.epub
Convert only the first 100 articles of a ZIM file:
python -m zim2epub.cli wikipedia.zim --max-articles 100
Enable verbose output for debugging:
python -m zim2epub.cli wikipedia.zim -v
Enable full crawl mode for problematic ZIM files:
python -m zim2epub.cli wikipedia.zim --full-crawl
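When the converter is driven from another script, the command lines above can be assembled programmatically. The sketch below only builds the argument list from the documented flags; `build_command` is a hypothetical helper and is not shipped with the package.

```python
import sys


def build_command(zim_file, output=None, images=True, toc=True,
                  max_articles=None, verbose=False, full_crawl=False):
    """Assemble an argv list for the zim2epub CLI (hypothetical helper)."""
    cmd = [sys.executable, "-m", "zim2epub.cli", str(zim_file)]
    if output is not None:
        cmd += ["-o", str(output)]
    if not images:
        cmd.append("--no-images")
    if not toc:
        cmd.append("--no-toc")
    if max_articles is not None:
        cmd += ["--max-articles", str(max_articles)]
    if verbose:
        cmd.append("-v")
    if full_crawl:
        cmd.append("--full-crawl")
    return cmd


print(build_command("wikipedia.zim", images=False, max_articles=100)[1:])
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with file names that contain spaces.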
Using as a Library
You can also use the ZimToEpub class directly in your Python code:
from zim2epub import ZimToEpub

converter = ZimToEpub(
    zim_path="path/to/file.zim",
    output_path="output.epub",
    include_images=True,
    generate_toc=True,
    max_articles=None,
    verbose=True,
    full_crawl=False,  # Set to True to use full crawl mode
)
output_path = converter.convert()
print(f"EPUB created at: {output_path}")
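Building on the library usage above, several files can be converted in one run while failures are collected rather than aborting the batch. This is a sketch under assumptions: `convert_many` and `zim2epub_factory` are hypothetical helpers, and only the `ZimToEpub` keyword arguments shown above are used.

```python
from pathlib import Path


def convert_many(zim_paths, make_converter, out_dir="."):
    """Convert several ZIM files, collecting failures instead of aborting."""
    results, failures = {}, {}
    for zim_path in zim_paths:
        output = str(Path(out_dir) / (Path(zim_path).stem + ".epub"))
        try:
            converter = make_converter(zim_path=str(zim_path), output_path=output)
            results[str(zim_path)] = converter.convert()
        except Exception as exc:  # broad on purpose: keep the batch going
            failures[str(zim_path)] = exc
    return results, failures


def zim2epub_factory(**kwargs):
    """Create a real converter; imported lazily so this module always loads."""
    from zim2epub import ZimToEpub
    return ZimToEpub(include_images=True, generate_toc=True, **kwargs)
```

A call such as `convert_many(["a.zim", "b.zim"], zim2epub_factory)` returns one dictionary of output paths and one of exceptions, keyed by input path. Passing the factory as an argument also lets the loop be tested with a stand-in converter.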
Development
Running Tests
pytest
Building the Package
python -m build
Creating a Release
- Update the version in zim2epub/__init__.py
- Create a new tag:
  git tag -a v0.1.0 -m "Release v0.1.0"
- Push the tag:
  git push origin v0.1.0
The GitHub Actions workflow will automatically build and publish the release to PyPI.
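Because the version string and the tag are edited by hand, a quick consistency check before pushing can catch a mismatch. The sketch below assumes that `zim2epub/__init__.py` contains a line of the form `__version__ = "0.1.0"`, which the source does not confirm; both function names are hypothetical.

```python
import re


def read_version(init_source: str) -> str:
    """Extract the version from text containing a __version__ assignment."""
    match = re.search(r"""^__version__\s*=\s*["']([^"']+)["']""",
                      init_source, re.MULTILINE)
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


def tag_matches_version(tag: str, init_source: str) -> bool:
    """Check that a tag such as v0.1.0 corresponds to the declared version."""
    return tag == "v" + read_version(init_source)


print(tag_matches_version("v0.1.0", '__version__ = "0.1.0"\n'))
```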
Troubleshooting
If you encounter issues:
- Try running with the -v flag to see detailed logs
- Make sure you have the C++ libzim library installed
- Check that your ZIM file is valid and not corrupted
- For image issues, try using the --no-images flag
- For problematic ZIM files, try using the --full-crawl flag
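A quick first check for the "valid and not corrupted" item is to look at the file header: ZIM files begin with the magic number 72173914, stored as a little-endian 32-bit integer. The sketch below tests only those four bytes, so it catches wrong or truncated downloads but cannot prove the rest of the file is intact; `looks_like_zim` is a hypothetical helper.

```python
import struct

ZIM_MAGIC = 72173914  # 0x044D495A, stored little-endian as b"ZIM\x04"


def looks_like_zim(path) -> bool:
    """Cheap sanity check: does the file start with the ZIM magic number?"""
    with open(path, "rb") as fh:
        header = fh.read(4)
    if len(header) < 4:
        return False  # empty or truncated file
    (magic,) = struct.unpack("<I", header)
    return magic == ZIM_MAGIC
```

If this returns False, the file is not a ZIM archive at all and the other troubleshooting steps will not help.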
Requirements
- Python 3.6 or higher
- libzim (Python bindings for the ZIM file format)
- EbookLib (for EPUB creation)
- BeautifulSoup4 (for HTML parsing)
- tqdm (for progress bars)
- lxml (for XML processing)
License
This project is licensed under the MIT License - see the LICENSE file for details.