A python-facing API for creating and interacting with ZIM files

python-libzim

The libzim module allows you to read and write ZIM files in Python. It provides a shallow Python interface on top of the C++ libzim library.

It is primarily used in openZIM scrapers like sotoki or youtube2zim.

Installation

pip install libzim

Our PyPI wheels bundle a recent release of the C++ libzim and are available for the following platforms:

  • macOS for x86_64 and arm64
  • GNU/Linux for x86_64, armhf and aarch64
  • Linux+musl for x86_64 and aarch64

Wheels are available for both CPython and PyPy.

Users on other platforms can install the source distribution (see Building below).
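
As a rough way to tell whether `pip install libzim` is likely to pick up a prebuilt wheel, the platform list above can be encoded as a small lookup. This is only a sketch: `WHEEL_PLATFORMS` and `has_prebuilt_wheel` are hypothetical names, the use of `armv7l` for armhf is an assumption about what `platform.machine()` reports, and treating every non-glibc Linux as musl is a simplification.

```python
import platform

# (system, libc flavour) -> machine names, transcribed from the list above.
# "armv7l" standing in for armhf is an assumption, not something the README states.
WHEEL_PLATFORMS = {
    ("Darwin", None): {"x86_64", "arm64"},
    ("Linux", "glibc"): {"x86_64", "armv7l", "aarch64"},
    ("Linux", "musl"): {"x86_64", "aarch64"},
}


def has_prebuilt_wheel(system, machine, libc=None):
    """Return True if the list above names a wheel for this platform."""
    return machine in WHEEL_PLATFORMS.get((system, libc), set())


if __name__ == "__main__":
    system = platform.system()
    libc = None
    if system == "Linux":
        # Simplification: anything that is not glibc is treated as musl.
        libc = "glibc" if platform.libc_ver()[0] == "glibc" else "musl"
    print(has_prebuilt_wheel(system, platform.machine(), libc))
```

If this prints False, expect pip to fall back to the source distribution.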

Contributions

git clone git@github.com:openzim/python-libzim.git && cd python-libzim
# python -m venv env && source env/bin/activate
pip install -U setuptools invoke
invoke download-libzim install-dev build-ext test
# invoke --list for available development helpers

See CONTRIBUTING.md for additional details, then open a ticket or submit a pull request on GitHub 🤗!

Usage

Read a ZIM file

from libzim.reader import Archive
from libzim.search import Query, Searcher
from libzim.suggestion import SuggestionSearcher

zim = Archive("test.zim")
print(f"Main entry is at {zim.main_entry.get_item().path}")
entry = zim.get_entry_by_path("home/fr")
print(f"Entry {entry.title} at {entry.path} is {entry.get_item().size}b.")
print(bytes(entry.get_item().content).decode("UTF-8"))

# searching using full-text index
search_string = "Welcome"
query = Query().set_query(search_string)
searcher = Searcher(zim)
search = searcher.search(query)
search_count = search.getEstimatedMatches()
print(f"there are {search_count} matches for {search_string}")
print(list(search.getResults(0, search_count)))

# accessing suggestions
search_string = "kiwix"
suggestion_searcher = SuggestionSearcher(zim)
suggestion = suggestion_searcher.suggest(search_string)
suggestion_count = suggestion.getEstimatedMatches()
print(f"there are {suggestion_count} matches for {search_string}")
print(list(suggestion.getResults(0, suggestion_count)))
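
When a path may not be in the archive, the lookup can be wrapped in a small helper. The sketch below reuses `get_entry_by_path`, `get_item` and `content` from the example above; `read_text` is a hypothetical name, and `has_entry_by_path` is assumed to be available on `Archive` in your installed version, so check before relying on it.

```python
def read_text(archive, path, default=None, encoding="utf-8"):
    """Return the decoded content stored at `path`, or `default` if it is absent."""
    # Assumption: Archive exposes has_entry_by_path(); verify for your version.
    if not archive.has_entry_by_path(path):
        return default
    item = archive.get_entry_by_path(path).get_item()
    # item.content is a buffer, so copy it to bytes before decoding.
    return bytes(item.content).decode(encoding)


# Usage, with a real archive:
# from libzim.reader import Archive
# print(read_text(Archive("test.zim"), "home/fr", default="<missing>"))
```

The helper only relies on duck typing, so it can be exercised with a stub object when no ZIM file is at hand.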

Write a ZIM file

from libzim.writer import Creator, Item, StringProvider, FileProvider, Hint


class MyItem(Item):
    def __init__(self, title, path, content="", fpath=None):
        super().__init__()
        self.path = path
        self.title = title
        self.content = content
        self.fpath = fpath

    def get_path(self):
        return self.path

    def get_title(self):
        return self.title

    def get_mimetype(self):
        return "text/html"

    def get_contentprovider(self):
        if self.fpath is not None:
            return FileProvider(self.fpath)
        return StringProvider(self.content)

    def get_hints(self):
        return {Hint.FRONT_ARTICLE: True}


content = """<html><head><meta charset="UTF-8"><title>Web Page Title</title></head>
<body><h1>Welcome to this ZIM</h1><p>Kiwix</p></body></html>"""

item = MyItem("Hello Kiwix", "home", content)
item2 = MyItem("Bonjour Kiwix", "home/fr", None, "home-fr.html")

with Creator("test.zim").config_indexing(True, "eng") as creator:
    creator.set_mainpath("home")
    creator.add_item(item)
    creator.add_item(item2)
    for name, value in {
        "creator": "python-libzim",
        "description": "Created in python",
        "name": "my-zim",
        "publisher": "You",
        "title": "Test ZIM",
    }.items():
        creator.add_metadata(name.title(), value)
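
The loop above leans on `str.title()` to turn the lower-case keys into capitalized metadata names. The standalone snippet below, which needs no ZIM file, shows the names that end up being passed to `add_metadata`; the `zim_metadata` variable is only there for illustration.

```python
metadata = {
    "creator": "python-libzim",
    "description": "Created in python",
    "name": "my-zim",
    "publisher": "You",
    "title": "Test ZIM",
}

# str.title() upper-cases the first letter of each word, so the lower-case
# keys above become the capitalized names handed to add_metadata().
zim_metadata = {name.title(): value for name, value in metadata.items()}
print(sorted(zim_metadata))
# → ['Creator', 'Description', 'Name', 'Publisher', 'Title']
```

Note that `str.title()` also capitalizes letters that follow digits or underscores (for instance `"long_description".title()` gives `"Long_Description"`), so for names beyond single lower-case words it is probably safer to spell the metadata name out explicitly.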

Building

Building the libzim package can be customized through environment variables:

Variable | Example | Use case
LIBZIM_DL_VERSION | 8.1.1 or 2023-04-14 | Specify the C++ libzim binary version to download and bundle. Either a release version string or a date, in which case a nightly build is downloaded.
USE_SYSTEM_LIBZIM | 1 | Use LDFLAGS and CFLAGS to find the libzim to link against. The resulting wheel won't bundle C++ libzim.
DONT_DOWNLOAD_LIBZIM | 1 | Disable downloading of C++ libzim. Place headers in include/ and the libzim dylib/so in libzim/ if not using system libzim. It will be bundled in the wheel.
PROFILE | 1 | Enable profile tracing in the Cython extension. Required for Cython code coverage reporting.
SIGN_APPLE | 1 | Set to sign and notarize the extension for macOS. Requires the following variables.
APPLE_SIGNING_IDENTITY | Developer ID Application: OrgName (ID) | Required for signing on macOS.
APPLE_SIGNING_KEYCHAIN_PATH | /tmp/build.keychain | Path to the Keychain containing the certificate used to sign for macOS.
APPLE_SIGNING_KEYCHAIN_PROFILE | build | Name of the profile in the specified Keychain.
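
To make the interplay of the first three variables concrete, here is one way they could be interpreted. This is an illustrative sketch, not the project's actual build script: `libzim_source` is a hypothetical helper, the precedence of USE_SYSTEM_LIBZIM over DONT_DOWNLOAD_LIBZIM is an assumption drawn from the wording of the table, and any non-empty value is treated as "set".

```python
import re


def libzim_source(env):
    """Classify where C++ libzim would come from, following the table above."""
    # Assumption: any non-empty value counts as "set" (so "0" would too).
    if env.get("USE_SYSTEM_LIBZIM"):
        return ("system", None)  # link against system libzim, nothing bundled
    if env.get("DONT_DOWNLOAD_LIBZIM"):
        return ("local", None)  # use files placed in include/ and libzim/
    version = env.get("LIBZIM_DL_VERSION")
    if version is None:
        return ("download", "default")
    # A YYYY-MM-DD value selects a nightly; anything else is read as a release.
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", version):
        return ("download-nightly", version)
    return ("download-release", version)


print(libzim_source({"LIBZIM_DL_VERSION": "2023-04-14"}))
# → ('download-nightly', '2023-04-14')
```

Passing `os.environ` instead of a literal dict would classify the current shell environment the same way.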

Examples

Default: download and bundle the most appropriate libzim release binary
python3 -m build

Using system libzim (brew, debian or manually installed) - not bundled

# using system-installed C++ libzim
brew install libzim  # macOS
apt-get install libzim-dev  # Debian
dnf install libzim-devel  # Fedora
USE_SYSTEM_LIBZIM=1 python3 -m build --wheel

# using a specific C++ libzim
USE_SYSTEM_LIBZIM=1 \
CFLAGS="-I/usr/local/include" \
LDFLAGS="-L/usr/local/lib" \
DYLD_LIBRARY_PATH="/usr/local/lib" \
LD_LIBRARY_PATH="/usr/local/lib" \
python3 -m build --wheel

Other platforms

On platforms for which no official binary is available, you first have to compile C++ libzim from source, then build with either DONT_DOWNLOAD_LIBZIM or USE_SYSTEM_LIBZIM.

License

GPLv3 or later, see LICENSE for more details.
