A python-facing API for creating and interacting with ZIM files
python-libzim
The libzim module allows you to read and write ZIM files in Python. It provides a shallow Python interface on top of the C++ libzim library.
It is primarily used in openZIM scrapers like sotoki or youtube2zim.
Installation
pip install libzim
Our PyPI wheels bundle a recent release of the C++ libzim and are available for the following platforms:
- macOS for x86_64 and arm64
- GNU/Linux for x86_64, armhf and aarch64
- Linux+musl for x86_64 and aarch64
Wheels are available for both CPython and PyPy.
Users on other platforms can install the source distribution (see Building below).
Contributions
git clone git@github.com:openzim/python-libzim.git && cd python-libzim
# python -m venv env && source env/bin/activate
pip install -U setuptools invoke
invoke download-libzim install-dev build-ext test
# invoke --list for available development helpers
See CONTRIBUTING.md for additional details, then open a ticket or submit a pull request on GitHub 🤗!
Usage
Read a ZIM file
from libzim.reader import Archive
from libzim.search import Query, Searcher
from libzim.suggestion import SuggestionSearcher
zim = Archive("test.zim")
print(f"Main entry is at {zim.main_entry.get_item().path}")
entry = zim.get_entry_by_path("home/fr")
print(f"Entry {entry.title} at {entry.path} is {entry.get_item().size}b.")
print(bytes(entry.get_item().content).decode("UTF-8"))
# searching using full-text index
search_string = "Welcome"
query = Query().set_query(search_string)
searcher = Searcher(zim)
search = searcher.search(query)
search_count = search.getEstimatedMatches()
print(f"there are {search_count} matches for {search_string}")
print(list(search.getResults(0, search_count)))
# accessing suggestions
search_string = "kiwix"
suggestion_searcher = SuggestionSearcher(zim)
suggestion = suggestion_searcher.suggest(search_string)
suggestion_count = suggestion.getEstimatedMatches()
print(f"there are {suggestion_count} matches for {search_string}")
print(list(suggestion.getResults(0, suggestion_count)))
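Calling getResults(0, count) with the full estimated count fetches every match in one go. For large archives, one option is to page through results in fixed-size batches instead. The sketch below assumes only what the example above already uses: an object with getEstimatedMatches() and getResults(start, count) returning an iterable of paths (such as search or suggestion). The helper iter_paths and its page_size parameter are hypothetical, not part of libzim; because the match count is an estimate, the loop also stops on the first empty page.

```python
from typing import Iterable, Iterator, Protocol


class PagedResults(Protocol):
    """Duck type matching the search/suggestion objects used in the example above."""

    def getEstimatedMatches(self) -> int: ...

    def getResults(self, start: int, count: int) -> Iterable[str]: ...


def iter_paths(results: PagedResults, page_size: int = 50) -> Iterator[str]:
    """Yield result paths page by page (hypothetical helper, not part of libzim)."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    total = results.getEstimatedMatches()
    start = 0
    while start < total:
        page = list(results.getResults(start, page_size))
        if not page:
            # The count is only an estimate: stop once no more results come back.
            break
        yield from page
        start += len(page)
```

With the search object from the example above, `for path in iter_paths(search, page_size=20): print(path)` would print the matching paths in batches of 20 without building one large list first.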
Write a ZIM file
from libzim.writer import Creator, Item, StringProvider, FileProvider, Hint
class MyItem(Item):
    def __init__(self, title, path, content="", fpath=None):
        super().__init__()
        self.path = path
        self.title = title
        self.content = content
        self.fpath = fpath

    def get_path(self):
        return self.path

    def get_title(self):
        return self.title

    def get_mimetype(self):
        return "text/html"

    def get_contentprovider(self):
        # file-backed items are read from disk, others from the in-memory string
        if self.fpath is not None:
            return FileProvider(self.fpath)
        return StringProvider(self.content)

    def get_hints(self):
        return {Hint.FRONT_ARTICLE: True}


content = """<html><head><meta charset="UTF-8"><title>Web Page Title</title></head>
<body><h1>Welcome to this ZIM</h1><p>Kiwix</p></body></html>"""

item = MyItem("Hello Kiwix", "home", content)
# item2 reads its content from the local file home-fr.html, which must exist
item2 = MyItem("Bonjour Kiwix", "home/fr", None, "home-fr.html")

with Creator("test.zim").config_indexing(True, "eng") as creator:
    creator.set_mainpath("home")
    creator.add_item(item)
    creator.add_item(item2)
    for name, value in {
        "creator": "python-libzim",
        "description": "Created in python",
        "name": "my-zim",
        "publisher": "You",
        "title": "Test ZIM",
    }.items():
        creator.add_metadata(name.title(), value)
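MyItem.get_mimetype() above always returns text/html, which suits both example items. If fpath may point at other kinds of files (images, stylesheets, etc.), one option is to derive the MIME type from the file name with Python's standard mimetypes module. The helper guess_mimetype below is hypothetical and not part of libzim; the application/octet-stream fallback for unknown extensions is an assumption.

```python
import mimetypes
from typing import Optional

# Assumption: generic binary type used when the extension is not recognised.
FALLBACK_MIMETYPE = "application/octet-stream"


def guess_mimetype(fpath: Optional[str], default: str = "text/html") -> str:
    """Return a MIME type for an item (hypothetical helper, not part of libzim).

    Items without a file path (in-memory content) get `default`;
    file-backed items are guessed from their file extension.
    """
    if fpath is None:
        return default
    mimetype, _encoding = mimetypes.guess_type(fpath)
    return mimetype or FALLBACK_MIMETYPE
```

In MyItem, get_mimetype could then become `return guess_mimetype(self.fpath)`, keeping text/html for the string-backed item while letting file-backed items report their own type.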
Building
The libzim package build offers different behaviors via environment variables:
Variable | Example | Use case |
---|---|---|
LIBZIM_DL_VERSION | 8.1.1 or 2023-04-14 | Specify the C++ libzim binary version to download and bundle: either a release version string or a date, in which case a nightly build is downloaded. |
USE_SYSTEM_LIBZIM | 1 | Use LDFLAGS and CFLAGS to find the libzim to link against. The resulting wheel won't bundle C++ libzim. |
DONT_DOWNLOAD_LIBZIM | 1 | Disable downloading of C++ libzim. If not using system libzim, place the headers in include/ and the libzim dylib/so in libzim/; it will be bundled in the wheel. |
PROFILE | 1 | Enable profile tracing in the Cython extension. Required for Cython code coverage reporting. |
SIGN_APPLE | 1 | Set to sign and notarize the extension for macOS. Requires the following information: |
APPLE_SIGNING_IDENTITY | Developer ID Application: OrgName (ID) | Required for signing on macOS. |
APPLE_SIGNING_KEYCHAIN_PATH | /tmp/build.keychain | Path to the Keychain containing the certificate to sign with on macOS. |
APPLE_SIGNING_KEYCHAIN_PROFILE | build | Name of the profile in the specified Keychain. |
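LIBZIM_DL_VERSION accepts either a release version string or a date, and a date selects a nightly build. The sketch below only illustrates that distinction; it is a hypothetical check, not the parsing the libzim build actually performs, and the accepted formats (X.Y.Z and YYYY-MM-DD) are assumptions inferred from the two examples in the table.

```python
import re
from datetime import date

# Assumed formats, inferred from the table's examples (8.1.1 and 2023-04-14).
RELEASE_RE = re.compile(r"\d+\.\d+\.\d+")
NIGHTLY_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def dl_version_kind(value: str) -> str:
    """Classify a LIBZIM_DL_VERSION value as "release" or "nightly".

    Hypothetical illustration only; not the libzim build's own logic.
    """
    value = value.strip()
    if RELEASE_RE.fullmatch(value):
        return "release"
    if NIGHTLY_RE.fullmatch(value):
        date.fromisoformat(value)  # raises ValueError for impossible dates such as 2023-13-40
        return "nightly"
    raise ValueError(f"unrecognized LIBZIM_DL_VERSION value: {value!r}")
```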
Examples
Default: download and bundle the most appropriate libzim release binary
python3 -m build
Using system libzim (brew, Debian, or manually installed), not bundled
# using system-installed C++ libzim
brew install libzim # macOS
apt-get install libzim-dev # debian
dnf install libzim-devel # fedora
USE_SYSTEM_LIBZIM=1 python3 -m build --wheel
# using a specific C++ libzim
USE_SYSTEM_LIBZIM=1 \
CFLAGS="-I/usr/local/include" \
LDFLAGS="-L/usr/local/lib" \
DYLD_LIBRARY_PATH="/usr/local/lib" \
LD_LIBRARY_PATH="/usr/local/lib" \
python3 -m build --wheel
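The same "specific C++ libzim" build can be driven from Python, for instance from a CI helper script. This sketch assumes libzim is installed under a prefix such as /usr/local (as in the shell example above) and that the PyPA build package is installed; system_libzim_env and build_wheel are hypothetical helper names. Note that it replaces any existing CFLAGS/LDFLAGS rather than appending to them.

```python
import os
import subprocess
import sys
from typing import Dict, Mapping, Optional


def system_libzim_env(prefix: str, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return an environment pointing the build at a C++ libzim under `prefix`.

    Mirrors the shell example: USE_SYSTEM_LIBZIM plus compiler, linker and
    runtime loader paths. Existing CFLAGS/LDFLAGS values are overwritten.
    """
    env = dict(os.environ if base is None else base)
    lib_dir = f"{prefix}/lib"
    env.update(
        USE_SYSTEM_LIBZIM="1",
        CFLAGS=f"-I{prefix}/include",
        LDFLAGS=f"-L{lib_dir}",
        DYLD_LIBRARY_PATH=lib_dir,  # macOS runtime loader path
        LD_LIBRARY_PATH=lib_dir,    # Linux runtime loader path
    )
    return env


def build_wheel(prefix: str) -> None:
    """Run `python -m build --wheel` with that environment (needs the `build` package)."""
    subprocess.run(
        [sys.executable, "-m", "build", "--wheel"],
        env=system_libzim_env(prefix),
        check=True,
    )
```

Calling `build_wheel("/usr/local")` from the repository root should then be equivalent to the second shell command above.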
Other platforms
On platforms for which no official binary is available, you'd have to compile C++ libzim from source first, then use either DONT_DOWNLOAD_LIBZIM or USE_SYSTEM_LIBZIM.
License
Download files
Source Distribution
Built Distributions
Hashes for libzim-3.3.0-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | a9eb28c2baf28cee7b416ccdb727af44ec07bfc40323adabb852e1b01385f4c5 |
MD5 | 61de99eb97be21795644d26eb2dd5016 |
BLAKE2b-256 | 2f5c0cbfadbdd5e3a052b40f42813a26817439c4373d020de8e591d46cd86578 |
Hashes for libzim-3.3.0-cp312-cp312-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl
Algorithm | Hash digest |
---|---|
SHA256 | f9209b5b2a876465565f1ea8303d702bc8a658eb4c9e1a1c3b9228b02442d814 |
MD5 | 74d16bb6631456e8f406de060ae7d37d |
BLAKE2b-256 | a2bd118a2e860142f3fb135ea92b9582e8d95407c479a0fa0dc26ba5957a9745 |
Hashes for libzim-3.3.0-cp312-cp312-macosx_13_0_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | ad4e6951f61774501a055d65974b7c3ec00f39f5204118322f3ae5098b95627f |
MD5 | 294f63fa02b05f63a5378f3799a75677 |
BLAKE2b-256 | c6259542326e2ae94c538205e20c6e088daf262077deef2bf84457370aac9275 |
Hashes for libzim-3.3.0-cp312-cp312-macosx_12_0_arm64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 34e42e896307f5289e6d65e946c0cfe8094c74c15933c707e8935d9b903c7fbf |
MD5 | e7d6c958261439778fd6cc52bb7e2348 |
BLAKE2b-256 | 900ad2a28ef147179669b5dcaf35799993a15feb83275292e6d9533ba0988bef |
Hashes for libzim-3.3.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | c47922df765801068371b25954f752ff7243ba803c1e26141901fb4f033357ad |
MD5 | 0a1d60d441ee46d1d512815aa0ffa2be |
BLAKE2b-256 | ebcecd3d0ef9ca2e70af775ec6a296794c01db8f9c57f4b2a8312acfd3a2ef96 |
Hashes for libzim-3.3.0-cp311-cp311-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 151cc5bc71f38b97f54b3dfd951a93eb4a89ca7a374a1c0b296f2ba06346a987 |
MD5 | cdca9f47f2982b70c23af8a5ff8a9572 |
BLAKE2b-256 | bae11fdf2de582348da396d03954f9e7e0309157ae67e5df2b274974fbd3a42a |
Hashes for libzim-3.3.0-cp311-cp311-macosx_13_0_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | db9cfff3d8145d5361b241fdf2187f38152a2d1e45bbd9c6039170245c09c2f6 |
MD5 | 9dd3d60078a0d08eb96ec7a8a31cb371 |
BLAKE2b-256 | 84d50af7aea07c39055e2799ee7e4e7f5626111677c9a9306a73327a319943b2 |
Hashes for libzim-3.3.0-cp311-cp311-macosx_12_0_arm64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 5254422bc6664577a17a49833493318d6c9eee8b507d3d268b2d727d6b555ae9 |
MD5 | 753ff48d7e92ad100587f6138f7ff249 |
BLAKE2b-256 | ae2789bfd31a38f7a1998183305507690d209d63286947935742c200c81163aa |
Hashes for libzim-3.3.0-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | a9b2f6809f6107672402e476a5ba4a96df4f413ee5441158e6b4d72baa63ab41 |
MD5 | 882bb8f6b6985328e155d60ec87b09d7 |
BLAKE2b-256 | d394c57939d54c7b057cd46c1a7e1f4a17fb90a620e4472d46bd0115ecbb2cb0 |
Hashes for libzim-3.3.0-cp310-cp310-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 81b00d9adb757e0e7b530a5de871432e905526870c13d6abad99dde599a41f3f |
MD5 | 2fd3b725aea81bbfc530772334d63026 |
BLAKE2b-256 | e0482f8b77e154da902d5ddf7bb95fd6059ee45d32ade67840fe584e0632f0cc |
Hashes for libzim-3.3.0-cp310-cp310-macosx_13_0_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | e7f0bb6b0dfc33fd8e42cc89bd067ff2010cf2d3b028cba2589c649b8ba79d9b |
MD5 | ca5b3961735a54721c175468dae17b90 |
BLAKE2b-256 | 7d55aee4bdd87f804e2c5070545051798decddef9643df1db3a69c89ef91bcc6 |
Hashes for libzim-3.3.0-cp310-cp310-macosx_12_0_arm64.whl
Algorithm | Hash digest |
---|---|
SHA256 | f7fe71bd1aaf4595635c9a806e838271cec4fbc7e3d6dce7eb5e87c189357a59 |
MD5 | 1f1b67d6305e1212c882f0239cda14ca |
BLAKE2b-256 | 63078b0dbd7765c9839925aa29f0c9548126215c7348a23d07b91f45d2251e65 |
Hashes for libzim-3.3.0-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 8a831b23d2701f9eb9fd5f1f4394e907947338f56caa5ceec76e6b720e812ac4 |
MD5 | 5e52c3260a4f5f3dd1c16812341166bb |
BLAKE2b-256 | 1fea230a8f7da7138ca5d46cea04f9b5d4bffb003675b1bb544389cce25e1d86 |
Hashes for libzim-3.3.0-cp39-cp39-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 12e80635950a9ccfa750e3d01e26ac973ea7b22573c4e396573d0dde421e4b90 |
MD5 | 113b811a22dae057fb2944b4f538ff56 |
BLAKE2b-256 | 2c1c771878916668ad0028c6a2ae035793dcaf5c69455d7847001e8eda12cf35 |
Hashes for libzim-3.3.0-cp39-cp39-macosx_13_0_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 79308942cfd7990d971f24bb40ec957ccf4d52f0d088eb929121aa066787f6ce |
MD5 | 7a47eb313679648e8ffe376fbec19172 |
BLAKE2b-256 | a2f40abd3ced509efe5652e8b64a516455d3f0d053796df80d6c48fe5a4efef3 |
Hashes for libzim-3.3.0-cp39-cp39-macosx_12_0_arm64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 92b52e4b3fba1e9b9bd388a106fd4652593b48cba4d123f9882604151a86b934 |
MD5 | eb6a646ee9a5153a47c61241ba8d10c8 |
BLAKE2b-256 | 3828300177787fc1a52d049451c28ef759ac37e41d8f649113b4896c72852775 |
Hashes for libzim-3.3.0-cp38-cp38-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 0ec8332b265941469f2b3110d0acc2d73a6c8ee3c84937e34593ad45dcbcd2b8 |
MD5 | 819cc7a315adfd7b29d0b50b1d0ac50d |
BLAKE2b-256 | 8490a02372648ca8f7dec706e4ddf6a535f54f50d4352aba6fc79bcf467424da |
Hashes for libzim-3.3.0-cp38-cp38-manylinux_2_24_aarch64.manylinux_2_28_aarch64.whl
Algorithm | Hash digest |
---|---|
SHA256 | c65ab47a5777f1945a430556ce880e37136298201af2c74077009348cce959b9 |
MD5 | 211a09f9b731da8a8ff8b8dfde10eb4b |
BLAKE2b-256 | ff4843e816c0b669cb7601a92130693b46e01678b49e53f91893f584e4487f73 |
Hashes for libzim-3.3.0-cp38-cp38-macosx_13_0_x86_64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 7fd058ebb9a7e7890ede846cdc110700a161b13e7ff67053b4dac90ba709e7f0 |
MD5 | d5e1502e8a4d265b946c77480512ca11 |
BLAKE2b-256 | cb4af6710a540c5a738d0fc0e5fb91636ce232ec44c4ee34400cf31ffdfa0951 |
Hashes for libzim-3.3.0-cp38-cp38-macosx_12_0_arm64.whl
Algorithm | Hash digest |
---|---|
SHA256 | 64bac298d40202dbc98220efa7c72734658e22d6ad822f31e4506eee49d27b8c |
MD5 | eed0f0ea6b7b836d61d7e4ed4ff7559d |
BLAKE2b-256 | a2aced1cd223f2a7759a735b20293ff6b479dccbe57ea2be84daaaddc6223d01 |
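To check a downloaded wheel against the digests listed above, Python's standard hashlib module is enough. This is a minimal sketch: file_digest_hex and verify_download are hypothetical helper names, and BLAKE2b-256 is computed as BLAKE2b with a 32-byte digest. The file is hashed in chunks so large wheels need not fit in memory.

```python
import hashlib


def file_digest_hex(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks and return the hex digest.

    `algorithm` accepts the names used in the tables: SHA256, MD5, BLAKE2b-256.
    """
    name = algorithm.lower()
    if name == "blake2b-256":
        hasher = hashlib.blake2b(digest_size=32)  # BLAKE2b-256 = 32-byte BLAKE2b digest
    else:
        hasher = hashlib.new(name)  # e.g. "sha256" or "md5"
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify_download(path: str, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Compare a file's digest with a published hex digest (case-insensitive)."""
    return file_digest_hex(path, algorithm) == expected_hex.strip().lower()
```

For example, `verify_download("libzim-3.3.0-cp312-cp312-macosx_12_0_arm64.whl", "34e42e896307f5289e6d65e946c0cfe8094c74c15933c707e8935d9b903c7fbf")` should return True for an intact download of that wheel.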