openZIM hatch plugin to set metadata automatically and download files at build time

hatch-openzim

This provides a Hatch(ling) plugin for common openZIM operations:

  • automatically populate common project metadata
  • install static files (e.g. external JS dependencies) at build time

This plugin intentionally has few dependencies: it uses the Python standard library whenever possible to keep its footprint to a minimum.

hatch-openzim adheres to openZIM's Contribution Guidelines.

hatch-openzim has implemented openZIM's Python bootstrap, conventions and policies v1.0.1.

Quick start

Assuming you have an openZIM project, you could use a configuration like the following in your pyproject.toml:

# Use the hatchling build backend, with the hatch-openzim plugin.
[build-system]
requires = ["hatchling", "hatch-openzim"]
build-backend = "hatchling.build"

[project]
name = "MyAwesomeScraper"
requires-python = ">=3.11,<3.12"
description = "Awesome scraper"
readme = "README.md"

# These project metadata fields are dynamic because they will be generated by the
# hatch-openzim and version plugins.
dynamic = ["authors", "classifiers", "keywords", "license", "version", "urls"]

# Enable the hatch-openzim metadata hook to generate default openZIM metadata.
[tool.hatch.metadata.hooks.openzim-metadata]
additional-keywords = ["awesome"] # some additional keywords
kind = "scraper" # indicate this is a scraper, so that additional keywords are added

# Additional author #1
[[tool.hatch.metadata.hooks.openzim-metadata.additional-authors]]
name="Bob"
email="bob@acme.com"

# Additional author #2
[[tool.hatch.metadata.hooks.openzim-metadata.additional-authors]]
name="Alice"
email="alice@acme.com"

# Enable the hatch-openzim build hook to install files (e.g. JS libs) at build time.
[tool.hatch.build.hooks.openzim-build]
toml-config = "openzim.toml" # optional location of the configuration file
dependencies = [ "zimscraperlib==3.1.0" ] # optional dependencies needed for file installations

Note: the dependencies attribute is not specific to our hook(s); it is a generic hatch(ling) feature.

Metadata hook usage

Configuration (in pyproject.toml)

| Variable | Required | Description |
|---|---|---|
| additional-authors | N | List of authors that will be appended to the automatic one |
| additional-classifiers | N | List of classifiers that will be appended to the automatic ones |
| additional-keywords | N | List of keywords that will be appended to the automatic ones |
| kind | N | If set to scraper, scraper keywords will be automatically added as well |
| organization | N | Override the organization (otherwise detected from the GitHub repository to set author and keyword appropriately). Case-insensitive. Supported values are openzim, kiwix and offspot |
| preserve-authors | N | Boolean indicating that we do not want to set authors metadata but use the ones of pyproject.toml |
| preserve-classifiers | N | Boolean indicating that we do not want to set classifiers metadata but use the ones of pyproject.toml |
| preserve-keywords | N | Boolean indicating that we do not want to set keywords metadata but use the ones of pyproject.toml |
| preserve-license | N | Boolean indicating that we do not want to set license metadata but use the one of pyproject.toml |
| preserve-urls | N | Boolean indicating that we do not want to set urls metadata but use the ones of pyproject.toml |
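
As an illustration of how these options combine, here is a minimal sketch. The chosen values are arbitrary. It also assumes that, since preserve-urls is set, urls is declared statically under [project.urls] and is not listed in dynamic.

```toml
[tool.hatch.metadata.hooks.openzim-metadata]
# Add scraper-specific keywords on top of the default ones.
kind = "scraper"
# Do not rely on repository detection; supported values are openzim, kiwix and offspot.
organization = "offspot"
# Appended to the classifiers generated by the hook.
additional-classifiers = ["Development Status :: 5 - Production/Stable"]
# Keep the urls declared in pyproject.toml instead of the generated ones.
preserve-urls = true
```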

Behavior

The metadata hook will set:

  • authors to [{"email": "dev@kiwix.org", "name": "Kiwix"}]
  • classifiers will contain:
    • License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
    • all Programming Language :: Python :: x and Programming Language :: Python :: x.y entries matching the project's requires-python constraint
  • keywords will contain:
    • at least kiwix
    • if kind is scraper, it will add zim and offline
    • and additional-keywords passed in the configuration
  • license to {"text": "GPL-3.0-or-later"}
  • urls to
    • Donate: https://www.kiwix.org/en/support-us/
    • Homepage: GitHub repository URL (e.g. https://github.com/openzim/hatch-openzim) if the code is a git clone, otherwise https://www.kiwix.org
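
To make the list above concrete, here is a hand-derived sketch of the static metadata that the Quick start configuration would roughly correspond to. It assumes the default Kiwix author shown above and a source tree that is not a git clone (so Homepage falls back to the Kiwix website). The ordering of authors and keywords is a guess, not something the hook is known to guarantee.

```toml
[project]
authors = [
    { name = "Kiwix", email = "dev@kiwix.org" },
    # Entries from additional-authors
    { name = "Bob", email = "bob@acme.com" },
    { name = "Alice", email = "alice@acme.com" },
]
classifiers = [
    "License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)",
    # Derived from requires-python = ">=3.11,<3.12"
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.11",
]
# kiwix by default, zim and offline because kind = "scraper", awesome from additional-keywords
keywords = ["kiwix", "zim", "offline", "awesome"]
license = { text = "GPL-3.0-or-later" }

[project.urls]
Donate = "https://www.kiwix.org/en/support-us/"
Homepage = "https://www.kiwix.org"
```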

Build hook usage

High-level configuration (in pyproject.toml)

| Variable | Required | Description |
|---|---|---|
| toml-config | N | Location of the configuration file, defaults to openzim.toml |

Detailed configuration (in openzim.toml)

The detailed configuration of the build hook is done in a TOML file named openzim.toml (unless customized via toml-config, see above). This file must be placed in your project root folder, next to your pyproject.toml.

The build hook supports downloading web resources to various locations at build time.

To configure this, you first have to create a files section in the openzim.toml configuration and declare its config table. The name of the section (assets in the example below) is free (do not forget to quote it if you want to use special characters like . in the name).

[files.assets.config]
target_dir="src/hatch_openzim/templates/assets"
execute_after=[
    "touch somewhere/something.txt"
]

| Variable | Required | Description |
|---|---|---|
| target_dir | Y | Base directory where all downloaded content will be placed |
| execute_after | N | List of shell commands to execute once all actions (see below) have been executed; commands are executed with target_dir as current working directory |

Important: The execute_after commands are always executed, no matter how many actions are present or how many actions have been ignored (see below for details about why an action might be ignored).

Note: The example execute_after command (touch) is not representative of what you would usually do ^^
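
As a sketch of the quoting rule for section names, here is a section called vendor.js. The section name and target_dir are hypothetical. Without the quotes, TOML would read the dot as a table separator and create nested vendor and js tables instead of a single section.

```toml
# "vendor.js" is a hypothetical section name; the quotes keep the dot
# from being interpreted as a TOML table separator.
[files."vendor.js".config]
target_dir="src/my_scraper/static/vendor"

# Action names follow the same rule.
[files."vendor.js".actions."jquery.min.js"]
action="get_file"
source="https://code.jquery.com/jquery-3.5.1.min.js"
target_file="jquery.min.js"
```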

Once this section configuration is done, you will then declare multiple actions. All actions in a given section share the same base configuration declared above.

Three kinds of actions are supported:

  • get_file: downloads a file to a location
  • extract_all: extracts all content of a zip file to a location
  • extract_items: extracts some items of a zip file to some locations

Each action is declared in its own TOML table. Action names are free.

[files.assets.actions.some_name]
action=...

get_file action configuration (in openzim.toml)

This action downloads a file to a location.

Important: If target_file is already present, the action is not executed; it is simply ignored.

| Variable | Required | Description |
|---|---|---|
| action | Y | Must be "get_file" |
| source | Y | URL of the online resource to download |
| target_file | Y | Path to the file target location, relative to the section target_dir |
| execute_after | N | List of shell commands to execute once file installation is completed; commands are executed with the section target_dir as current working directory |

You will find a sample below.

[files.assets.actions."jquery.min.js"]
action="get_file"
source="https://code.jquery.com/jquery-3.5.1.min.js"
target_file="jquery.min.js"

extract_all action configuration (in openzim.toml)

This action downloads a ZIP and extracts it to a location. Some items in the ZIP content can be removed afterwards.

Important: If target_dir is already present, the action is not executed; it is simply ignored.

| Variable | Required | Description |
|---|---|---|
| action | Y | Must be "extract_all" |
| source | Y | URL of the online ZIP to download |
| target_dir | Y | Path of the directory where ZIP content will be extracted, relative to the section target_dir |
| remove | N | List of glob patterns of ZIP content to remove after extraction (relative to action target_dir) |
| execute_after | N | List of shell commands to execute once file extraction is completed; commands are executed with the section target_dir as current working directory |

Note:

  • the ZIP is first saved to a temporary location before extraction, consuming some disk space

You will find a sample below.

[files.assets.actions.chosen]
action="extract_all"
source="https://github.com/harvesthq/chosen/releases/download/v1.8.7/chosen_v1.8.7.zip"
target_dir="chosen"
remove=["docsupport", "chosen.proto.*", "*.html", "*.md"]

extract_items action configuration (in openzim.toml)

This action extracts a ZIP to a temporary directory, and moves selected items to some locations. Some sub-items in the ZIP content can be removed afterwards.

Important: If any of the target_paths is already present, the action is not executed; it is simply ignored.

| Variable | Required | Description |
|---|---|---|
| action | Y | Must be "extract_items" |
| source | Y | URL of the online ZIP to download |
| zip_paths | Y | List of relative paths in the ZIP to select |
| target_paths | Y | List of paths where the selected items will be moved (relative to the section target_dir) |
| remove | N | List of glob patterns of ZIP content to remove after extraction (patterns are relative to the section target_dir, so they must include the relevant target_paths prefix) |
| execute_after | N | List of shell commands to execute once ZIP extraction is completed; commands are executed with the section target_dir as current working directory |

Note:

  • the zip_paths and target_paths are matched one-by-one, and must hence have the same length.
  • the ZIP is first saved to a temporary location before extraction, consuming some disk space
  • all content is extracted before selected items are moved, and the rest is deleted

You will find a sample below.

[files.assets.actions.ogvjs]
action="extract_items"
source="https://github.com/brion/ogv.js/releases/download/1.8.9/ogvjs-1.8.9.zip"
zip_paths=["ogvjs-1.8.9"]
target_paths=["ogvjs"]
remove=["ogvjs/COPYING", "ogvjs/*.txt", "ogvjs/README.md"]
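
The one-by-one matching of zip_paths and target_paths is easier to see with more than one item. The sketch below uses a hypothetical archive URL and hypothetical paths inside it; only the option names come from the tables above.

```toml
# Hypothetical archive and paths, for illustration only.
[files.assets.actions.library]
action="extract_items"
source="https://example.com/library-1.0.zip"
# zip_paths[0] is moved to target_paths[0], zip_paths[1] to target_paths[1];
# both lists must have the same length.
zip_paths=["library-1.0/js", "library-1.0/css"]
target_paths=["library_js", "library_css"]
# Patterns are relative to the section target_dir, so each one starts
# with the target path it applies to.
remove=["library_js/*.map", "library_css/*.map"]
```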

Full sample

A full example with two distinct sections and three actions in total is below.

Note: The touch command in execute_after is not representative of what you would usually do ^^

[files.assets.config]
target_dir="src/hatch_openzim/templates/assets"
execute_after=[
    "fix_ogvjs_dist .",
]

[files.assets.actions."jquery.min.js"]
action="get_file"
source="https://code.jquery.com/jquery-3.5.1.min.js"
target_file="jquery.min.js"
execute_after=[
    "touch done.txt",
]

[files.assets.actions.chosen]
action="extract_all"
source="https://github.com/harvesthq/chosen/releases/download/v1.8.7/chosen_v1.8.7.zip"
target_dir="chosen"
remove=["docsupport", "chosen.proto.*", "*.html", "*.md"]

[files.videos.config]
target_dir="src/hatch_openzim/templates/videos"

[files.videos.actions.ogvjs]
action="extract_items"
source="https://github.com/brion/ogv.js/releases/download/1.8.9/ogvjs-1.8.9.zip"
zip_paths=["ogvjs-1.8.9"]
target_paths=["ogvjs"]
remove=["ogvjs/COPYING", "ogvjs/*.txt", "ogvjs/README.md"]
