A python-facing API for creating and interacting with ZIM files
python-libzim
The libzim module allows you to read and write ZIM files in Python. It provides a shallow Python interface on top of the C++ libzim library.
It is primarily used in openZIM scrapers like sotoki or youtube2zim.
Installation
pip install libzim
Our PyPI wheels bundle a recent release of the C++ libzim and are available for the following platforms:
- macOS for x86_64 and arm64
- GNU/Linux for x86_64, armhf and aarch64
- Linux+musl for x86_64 and aarch64
Wheels are available for both CPython and PyPy.
Users on other platforms can install the source distribution (see Building below).
Contributions
git clone git@github.com:openzim/python-libzim.git && cd python-libzim
# python -m venv env && source env/bin/activate
pip install -U setuptools invoke
invoke download-libzim install-dev build-ext test
# invoke --list for available development helpers
See CONTRIBUTING.md for additional details, then open a ticket or submit a Pull Request on GitHub 🤗!
Usage
Read a ZIM file
from libzim.reader import Archive
from libzim.search import Query, Searcher
from libzim.suggestion import SuggestionSearcher
zim = Archive("test.zim")
print(f"Main entry is at {zim.main_entry.get_item().path}")
entry = zim.get_entry_by_path("home/fr")
print(f"Entry {entry.title} at {entry.path} is {entry.get_item().size}b.")
print(bytes(entry.get_item().content).decode("UTF-8"))
# searching using full-text index
search_string = "Welcome"
query = Query().set_query(search_string)
searcher = Searcher(zim)
search = searcher.search(query)
search_count = search.getEstimatedMatches()
print(f"there are {search_count} matches for {search_string}")
print(list(search.getResults(0, search_count)))
# accessing suggestions
search_string = "kiwix"
suggestion_searcher = SuggestionSearcher(zim)
suggestion = suggestion_searcher.suggest(search_string)
suggestion_count = suggestion.getEstimatedMatches()
print(f"there are {suggestion_count} matches for {search_string}")
print(list(suggestion.getResults(0, suggestion_count)))
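The lookup above assumes that the requested path exists in the archive. As a minimal sketch, the helper below checks for the entry first and returns None when it is missing. The name read_text is hypothetical (it is not part of libzim); it relies only on has_entry_by_path, get_entry_by_path, get_item and content, and accepts any object exposing those, such as an Archive.

```python
def read_text(archive, path, encoding="UTF-8"):
    """Return the decoded content of the entry at `path`, or None if absent.

    `archive` is expected to behave like libzim.reader.Archive.
    This helper is illustrative and not part of the libzim API.
    """
    # Check for the entry first instead of letting the lookup fail.
    if not archive.has_entry_by_path(path):
        return None
    item = archive.get_entry_by_path(path).get_item()
    # item.content is a buffer-like object, so copy it to bytes before decoding.
    return bytes(item.content).decode(encoding)
```

With the archive opened above, read_text(zim, "home/fr") would return the page as a string, and read_text(zim, "no/such/path") would return None.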
Write a ZIM file
from libzim.writer import Creator, Item, StringProvider, FileProvider, Hint

class MyItem(Item):
    def __init__(self, title, path, content = "", fpath = None):
        super().__init__()
        self.path = path
        self.title = title
        self.content = content
        self.fpath = fpath

    def get_path(self):
        return self.path

    def get_title(self):
        return self.title

    def get_mimetype(self):
        return "text/html"

    def get_contentprovider(self):
        if self.fpath is not None:
            return FileProvider(self.fpath)
        return StringProvider(self.content)

    def get_hints(self):
        return {Hint.FRONT_ARTICLE: True}

content = """<html><head><meta charset="UTF-8"><title>Web Page Title</title></head>
<body><h1>Welcome to this ZIM</h1><p>Kiwix</p></body></html>"""

item = MyItem("Hello Kiwix", "home", content)
item2 = MyItem("Bonjour Kiwix", "home/fr", None, "home-fr.html")

with Creator("test.zim").config_indexing(True, "eng") as creator:
    creator.set_mainpath("home")
    creator.add_item(item)
    creator.add_item(item2)
    for name, value in {
        "creator": "python-libzim",
        "description": "Created in python",
        "name": "my-zim",
        "publisher": "You",
        "title": "Test ZIM",
    }.items():
        creator.add_metadata(name.title(), value)
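MyItem always reports text/html, which suits the two pages above but not images or stylesheets. As a hedged sketch, the helper below derives a MIME type from a file name using the standard library. The name guess_mimetype and the fallback value are assumptions made for illustration; libzim does not provide this function.

```python
import mimetypes


def guess_mimetype(filename, default="application/octet-stream"):
    """Guess a MIME type from a file name, falling back to `default`.

    Illustrative helper only; not part of libzim.
    """
    # guess_type returns a (type, encoding) pair; type is None when unknown.
    mimetype, _encoding = mimetypes.guess_type(filename)
    return mimetype or default
```

A subclass of Item could return guess_mimetype(self.fpath) from get_mimetype() when it is backed by a file, and keep "text/html" for in-memory content.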
Building
The libzim package build offers different behaviors via environment variables:
Variable | Example | Use case |
---|---|---|
LIBZIM_DL_VERSION | 8.1.1 or 2023-04-14 | Specify the C++ libzim binary version to download and bundle. Either a release version string or a date, in which case a nightly build is downloaded. |
USE_SYSTEM_LIBZIM | 1 | Uses LDFLAGS and CFLAGS to find the libzim to link against. The resulting wheel won't bundle C++ libzim. |
DONT_DOWNLOAD_LIBZIM | 1 | Disable downloading of C++ libzim. Place headers in include/ and the libzim dylib/so in libzim/ if not using system libzim. It will be bundled in the wheel. |
PROFILE | 1 | Enable profile tracing in the Cython extension. Required for Cython code coverage reporting. |
SIGN_APPLE | 1 | Set to sign and notarize the extension for macOS. Requires the following variables. |
APPLE_SIGNING_IDENTITY | Developer ID Application: OrgName (ID) | Required for signing on macOS. |
APPLE_SIGNING_KEYCHAIN_PATH | /tmp/build.keychain | Path to the Keychain containing the certificate used to sign for macOS. |
APPLE_SIGNING_KEYCHAIN_PROFILE | build | Name of the profile in the specified Keychain. |
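LIBZIM_DL_VERSION accepts two shapes of value: a release version string such as 8.1.1, or a date such as 2023-04-14 that selects a nightly. The sketch below shows one way to tell them apart before launching a build. The function name and the exact patterns are assumptions for illustration and do not reflect how the build script itself parses the variable.

```python
import re
from datetime import date


def classify_libzim_version(value):
    """Return "nightly" for a YYYY-MM-DD date, "release" for a dotted version.

    Illustrative only; the real build script may apply different rules.
    """
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        # Reject impossible dates such as 2023-02-31 (raises ValueError).
        date.fromisoformat(value)
        return "nightly"
    if re.fullmatch(r"\d+(\.\d+)*", value):
        return "release"
    raise ValueError(f"unrecognized LIBZIM_DL_VERSION value: {value!r}")
```

A wrapper script could call this on os.environ.get("LIBZIM_DL_VERSION") to fail early on a typo rather than partway through a download.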
Examples
Default: downloading and bundling most appropriate libzim release binary
python3 -m build
Using system libzim (brew, debian or manually installed) - not bundled
# using system-installed C++ libzim
brew install libzim # macOS
apt-get install libzim-dev # debian
dnf install libzim-devel # fedora
USE_SYSTEM_LIBZIM=1 python3 -m build --wheel
# using a specific C++ libzim
USE_SYSTEM_LIBZIM=1 \
CFLAGS="-I/usr/local/include" \
LDFLAGS="-L/usr/local/lib" \
DYLD_LIBRARY_PATH="/usr/local/lib" \
LD_LIBRARY_PATH="/usr/local/lib" \
python3 -m build --wheel
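The same build can be driven from Python, which can be convenient in CI scripts. This is a minimal sketch that sets the variables from the shell example above; the function names and the prefix parameter are hypothetical, and it assumes the build package is installed in the current interpreter.

```python
import os
import subprocess
import sys


def system_libzim_env(prefix="/usr/local", base_env=None):
    """Return an environment for building against a libzim installed in `prefix`.

    Note: any CFLAGS/LDFLAGS already present in the base environment
    are overwritten, mirroring the shell example.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.update(
        {
            "USE_SYSTEM_LIBZIM": "1",
            "CFLAGS": f"-I{prefix}/include",
            "LDFLAGS": f"-L{prefix}/lib",
            "DYLD_LIBRARY_PATH": f"{prefix}/lib",  # macOS
            "LD_LIBRARY_PATH": f"{prefix}/lib",  # Linux
        }
    )
    return env


def build_wheel(prefix="/usr/local"):
    """Run `python -m build --wheel` with the environment above."""
    return subprocess.run(
        [sys.executable, "-m", "build", "--wheel"],
        env=system_libzim_env(prefix),
        check=True,
    )
```

Calling build_wheel("/opt/libzim") would then link against a libzim installed under that prefix instead of /usr/local.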
Other platforms
On platforms for which there is no official binary available, you have to compile C++ libzim from source first, then use either DONT_DOWNLOAD_LIBZIM or USE_SYSTEM_LIBZIM.
License