Convert ZIM files to EPUB format

ZIM to EPUB Converter

A Python command-line tool to convert ZIM files (used by Kiwix and others for offline content) to EPUB format for e-readers.

Features

  • Convert ZIM files to EPUB format with robust error handling
  • Option to include or exclude images
  • Automatic table of contents generation based on article names
  • Limit the number of articles to include
  • Preserves metadata from the ZIM file
  • Clean, readable formatting for e-readers
  • Handles URL-encoded paths and special characters
  • Supports various ZIM file structures and formats
  • Extracts content from the main entry when standard article paths aren't available
  • Avoids duplicate images in the output EPUB
  • Full crawl mode for problematic ZIM files
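
Two of the features above, URL-encoded path handling and duplicate-image avoidance, can be illustrated with the standard library alone. This is a minimal sketch of the general technique, not the package's actual implementation; the names `normalize_path` and `ImageRegistry` are hypothetical and do not come from zim2epub.

```python
import hashlib
from urllib.parse import unquote


def normalize_path(path: str) -> str:
    """Decode percent-escapes and drop leading slashes (hypothetical helper)."""
    return unquote(path).lstrip("/")


class ImageRegistry:
    """Track images by content hash so identical bytes are stored only once."""

    def __init__(self) -> None:
        self._seen = {}  # SHA-256 hex digest -> name first registered for it

    def add(self, name: str, data: bytes) -> str:
        """Return the name under which these bytes should be stored."""
        digest = hashlib.sha256(data).hexdigest()
        # setdefault keeps the first name registered for this content
        return self._seen.setdefault(digest, name)

    def __len__(self) -> int:
        return len(self._seen)


print(normalize_path("/A/Caf%C3%A9_au_lait"))
```

Keying on a content hash rather than on the file name means the same image referenced under two different paths is still written once.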

Platform Support

This package is compatible with:

  • Linux (Debian, Ubuntu, Fedora, etc.)
  • macOS

Note: Windows is not currently supported due to limitations with the libzim library.

Recent Updates

  • Package Structure: Reorganized into a proper Python package structure for better maintainability
  • Improved URL handling: Added support for URL-encoded paths and special characters
  • Enhanced image processing: Fixed issues with duplicate images and improved mimetype detection
  • Better article extraction: Added multiple methods to extract articles from different ZIM file structures
  • Robust error handling: Added comprehensive error handling and fallback mechanisms
  • Detailed logging: Added verbose logging to help diagnose issues
  • CI/CD Pipeline: Added GitHub Actions for automated testing and releases
  • Full crawl mode: Added option to crawl through all entries in the ZIM file

Installation

Prerequisites

  • Python 3.6 or higher
  • C++ libzim library (required for the Python bindings)
  • Linux or macOS operating system

Installing C++ libzim

macOS

brew install libzim

Debian/Ubuntu

apt-get install libzim-dev

Fedora

dnf install libzim-devel

Installing from PyPI

You can install the package directly from PyPI:

USE_SYSTEM_LIBZIM=1 pip install zim2epub

Installing from Source

  1. Clone this repository:

    git clone https://github.com/izzoa/zim2epub.git
    cd zim2epub
    
  2. Install the required dependencies:

    USE_SYSTEM_LIBZIM=1 pip install -r requirements.txt
    
  3. Install the package:

    pip install .
    

    Or in development mode:

    pip install -e .
    

Usage

Basic usage

python -m zim2epub.cli path/to/your/file.zim

This creates an EPUB file in the current directory, named after the input file with the extension changed to .epub.
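
The default naming rule can be sketched with `pathlib`. The function name `default_output_path` is hypothetical, and the sketch assumes the behaviour described above (same base name, `.epub` extension, current directory); the tool's real logic may differ.

```python
from pathlib import Path


def default_output_path(zim_file: str) -> Path:
    """Assumed rule: keep the base name, swap the extension, use the current directory."""
    return Path.cwd() / Path(zim_file).with_suffix(".epub").name


print(default_output_path("path/to/your/file.zim"))
```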

Command-line Options

usage: python -m zim2epub.cli [-h] [-o OUTPUT] [--no-images] [--no-toc] [--max-articles MAX_ARTICLES] [-v] [--full-crawl] zim_file

Convert ZIM files to EPUB format

positional arguments:
  zim_file              Path to the ZIM file to convert

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Path for the output EPUB file (default: same as input with .epub extension)
  --no-images           Do not include images in the EPUB (default: False)
  --no-toc              Do not generate a table of contents (default: False)
  --max-articles MAX_ARTICLES
                        Maximum number of articles to include (default: None)
  -v, --verbose         Show verbose output (default: False)
  --full-crawl          Use full crawl mode to extract all articles (default: False)
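
For readers who want to see how these options map to parsed values, the following `argparse` sketch reproduces the option set shown in the help text. It is a reconstruction from that help output only, not the code in `zim2epub.cli`; the function name `build_parser` is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="python -m zim2epub.cli",
        description="Convert ZIM files to EPUB format",
    )
    parser.add_argument("zim_file", help="Path to the ZIM file to convert")
    parser.add_argument("-o", "--output", help="Path for the output EPUB file")
    parser.add_argument("--no-images", action="store_true",
                        help="Do not include images in the EPUB")
    parser.add_argument("--no-toc", action="store_true",
                        help="Do not generate a table of contents")
    parser.add_argument("--max-articles", type=int, default=None,
                        help="Maximum number of articles to include")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Show verbose output")
    parser.add_argument("--full-crawl", action="store_true",
                        help="Use full crawl mode to extract all articles")
    return parser


# Parse an example command line instead of sys.argv
args = build_parser().parse_args(["wikipedia.zim", "--no-images", "--max-articles", "100"])
print(args)
```

Note that `--no-images` and `--no-toc` are negative switches: leaving them off keeps images and the table of contents enabled.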

Examples

Convert a ZIM file without images (useful for smaller file size):

python -m zim2epub.cli wikipedia.zim --no-images

Convert a ZIM file with a custom output path:

python -m zim2epub.cli wikipedia.zim -o my-wikipedia.epub

Convert only the first 100 articles of a ZIM file:

python -m zim2epub.cli wikipedia.zim --max-articles 100

Enable verbose output for debugging:

python -m zim2epub.cli wikipedia.zim -v

Enable full crawl mode for problematic ZIM files:

python -m zim2epub.cli wikipedia.zim --full-crawl
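
The flags in the examples above can be combined, and the command can be driven from a script. The sketch below assembles the documented command line and runs it with `subprocess`; the helper names `build_command` and `run` are hypothetical, and only flags listed under Command-line Options are used.

```python
import subprocess
import sys
from typing import List, Optional


def build_command(zim_file: str, output: Optional[str] = None, images: bool = True,
                  max_articles: Optional[int] = None, verbose: bool = False,
                  full_crawl: bool = False) -> List[str]:
    """Assemble the argument list for `python -m zim2epub.cli`."""
    cmd = [sys.executable, "-m", "zim2epub.cli", zim_file]
    if output:
        cmd += ["-o", output]
    if not images:
        cmd.append("--no-images")
    if max_articles is not None:
        cmd += ["--max-articles", str(max_articles)]
    if verbose:
        cmd.append("-v")
    if full_crawl:
        cmd.append("--full-crawl")
    return cmd


def run(cmd: List[str]) -> int:
    """Run the command and return its exit code (requires zim2epub to be installed)."""
    return subprocess.run(cmd, check=False).returncode


print(build_command("wikipedia.zim", images=False, max_articles=100))
```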

Using as a Library

You can also use the ZimToEpub class directly in your Python code:

from zim2epub import ZimToEpub

converter = ZimToEpub(
    zim_path="path/to/file.zim",
    output_path="output.epub",
    include_images=True,
    generate_toc=True,
    max_articles=None,
    verbose=True,
    full_crawl=False  # Set to True to use full crawl mode
)

output_path = converter.convert()
print(f"EPUB created at: {output_path}")
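
Building on the snippet above, a batch conversion over a directory might look like the following. This is a sketch under stated assumptions: `convert_many` and its `converter_cls` parameter are hypothetical, only the `ZimToEpub` constructor arguments and the `convert()` method shown above are used, and because the exceptions raised by `convert()` are not documented here, the sketch catches `Exception` broadly.

```python
from pathlib import Path
from typing import Dict


def convert_many(zim_dir: str, out_dir: str, converter_cls=None, **options) -> Dict[str, str]:
    """Convert every .zim file in zim_dir; map each input to its output path or an error."""
    if converter_cls is None:
        # Imported lazily so this helper can be defined without libzim installed
        from zim2epub import ZimToEpub as converter_cls

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    results = {}
    for zim in sorted(Path(zim_dir).glob("*.zim")):
        target = Path(out_dir) / (zim.stem + ".epub")
        try:
            converter = converter_cls(zim_path=str(zim), output_path=str(target), **options)
            results[str(zim)] = str(converter.convert())
        except Exception as exc:  # exception types are an unknown here, so catch broadly
            results[str(zim)] = f"failed: {exc}"
    return results
```

A call such as `convert_many("zims", "epubs", include_images=False, verbose=True)` would pass the extra keyword arguments through to each converter, so one failing file does not stop the rest of the batch.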

Development

Running Tests

pytest

Building the Package

python -m build

Creating a Release

  1. Update the version in zim2epub/__init__.py
  2. Create a new tag:
    git tag -a v0.1.0 -m "Release v0.1.0"
    
  3. Push the tag:
    git push origin v0.1.0
    

The GitHub Actions workflow will automatically build and publish the release to PyPI.
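
Because the version is edited by hand before tagging, a small check that the tag and the version agree can catch mismatches before pushing. The sketch below assumes the version is stored as a `__version__ = "..."` assignment in `zim2epub/__init__.py`, which the steps above do not confirm; `read_version` and `tag_matches` are hypothetical helpers.

```python
import re


def read_version(init_source: str) -> str:
    """Extract the version string from the text of an __init__.py file."""
    match = re.search(r"""^__version__\s*=\s*["']([^"']+)["']""", init_source, re.MULTILINE)
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


def tag_matches(tag: str, version: str) -> bool:
    """Check that a tag follows the v<version> convention used above."""
    return tag == f"v{version}"


source = '"""zim2epub package."""\n__version__ = "0.1.0"\n'
print(tag_matches("v0.1.0", read_version(source)))
```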

Troubleshooting

If you encounter issues:

  1. Try running with the -v flag to see detailed logs
  2. Make sure you have the C++ libzim library installed
  3. Check that your ZIM file is valid and not corrupted
  4. For image issues, try using the --no-images flag
  5. For problematic ZIM files, try using the --full-crawl flag

Requirements

  • Python 3.6 or higher
  • libzim (Python bindings for the ZIM file format)
  • EbookLib (for EPUB creation)
  • BeautifulSoup4 (for HTML parsing)
  • tqdm (for progress bars)
  • lxml (for XML processing)
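
The requirements above, together with the platform restriction, can be checked before running a conversion. This is a minimal preflight sketch using only the standard library; the import names (`libzim`, `ebooklib`, `bs4`, `tqdm`, `lxml`) are the usual module names for these distributions, and the helper names are hypothetical.

```python
import importlib.util
import sys

# Import names corresponding to the distributions listed above
REQUIRED_MODULES = ["libzim", "ebooklib", "bs4", "tqdm", "lxml"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the modules from `names` that cannot be found on this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def platform_supported(platform: str = sys.platform) -> bool:
    """Linux and macOS are supported; Windows is not."""
    return platform.startswith("linux") or platform == "darwin"


def python_supported(version=sys.version_info) -> bool:
    """Python 3.6 or higher is required."""
    return tuple(version[:2]) >= (3, 6)


print("missing:", missing_modules())
print("platform ok:", platform_supported())
print("python ok:", python_supported())
```

If `libzim` shows up as missing even though the C++ library is installed, the Python bindings themselves are probably not installed in the active environment.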

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • The OpenZIM project for the libzim library
  • EbookLib for EPUB creation functionality
