zimscraperlib

License: GPL v3

Collection of Python code to re-use across Python-based scrapers.

Usage

  • This library is meant to be installed from PyPI (zimscraperlib).
  • Pin it to a version range in your requirements, as the API changes frequently.
  • The API is only expected to remain stable within a given minor version.

Example usage:

zimscraperlib>=1.1,<1.2
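
As a rough illustration of the pinning rule, the sketch below builds a specifier that stays within one minor series and checks whether two versions share that series. The helper names `pin_to_minor` and `same_minor` are hypothetical and not part of zimscraperlib. The code assumes plain dotted numeric versions with at least a major and a minor component.

```python
def pin_to_minor(version: str, package: str = "zimscraperlib") -> str:
    """Build a requirement specifier limited to the minor series of `version`.

    Hypothetical helper for illustration; expects "MAJOR.MINOR[.PATCH]".
    """
    parts = version.split(".")
    major, minor = int(parts[0]), int(parts[1])
    # Lower bound is the minor series itself, upper bound is the next minor.
    return f"{package}>={major}.{minor},<{major}.{minor + 1}"


def same_minor(installed: str, pinned: str) -> bool:
    """Return True when both versions share the same major and minor numbers."""
    return installed.split(".")[:2] == pinned.split(".")[:2]


print(pin_to_minor("1.1.3"))  # zimscraperlib>=1.1,<1.2
```

The resulting string can go directly into a requirements file or a `dependencies` list.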

See documentation at Read the Docs for details.

Warning: Although this library supports downloading videos with yt-dlp, recent changes at YouTube have led the yt-dlp team to require new dependencies for YouTube videos (see https://github.com/yt-dlp/yt-dlp/issues/15012). These dependencies are large and are not needed by the other backends yt-dlp supports; only YouTube needs them. They are therefore not yet included in this library's dependencies (see https://github.com/openzim/python-scraperlib/issues/268); you must install them yourself if you intend to download videos from YouTube.

Dependencies

Most dependencies are installed automatically by pip (from PyPI by default). The following system packages may be required depending on which features you use:

  • libmagic — required for file type detection (used in most scrapers)
  • wget — required only for zimscraperlib.download functions
  • FFmpeg — required only for video processing functions
  • gifsicle (>=1.92) — required only for GIF optimization
  • libcairo — required only for SVG-to-PNG conversion
  • libzim — auto-installed via PyPI, not available on Windows
  • Pillow — auto-installed via PyPI; pre-built wheels are used by default and no system image libraries are needed. Only if you need to build Pillow from source should you install additional system libraries — see Pillow's build documentation for details.

    Note: To run the full test suite, all system dependencies listed above must be installed.

macOS

brew install libmagic wget ffmpeg gifsicle cairo

Linux (Debian/Ubuntu)

sudo apt install libmagic1 wget ffmpeg gifsicle libcairo2

Alpine

apk add ffmpeg gifsicle libmagic wget cairo
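
After installing the system packages, a quick check can confirm they are visible to Python. The sketch below is not part of zimscraperlib, and `missing_system_dependencies` is a hypothetical name. It assumes the executables are called `wget`, `ffmpeg` and `gifsicle`, and that the shared libraries resolve under the short names `magic` and `cairo`. `ctypes.util.find_library` can return `None` on some systems (minimal Alpine images, for instance) even when the library is installed, so treat a reported miss as a hint rather than proof. The minimum gifsicle version is not checked.

```python
import shutil
from ctypes.util import find_library

# Command-line tools listed in the Dependencies section (assumed executable names).
EXECUTABLES = ("wget", "ffmpeg", "gifsicle")
# Shared libraries, mapped to the short names passed to find_library (assumed).
LIBRARIES = {"libmagic": "magic", "libcairo": "cairo"}


def missing_system_dependencies() -> list[str]:
    """Return the names of system dependencies that could not be located."""
    missing = [name for name in EXECUTABLES if shutil.which(name) is None]
    missing += [
        label for label, short in LIBRARIES.items() if find_library(short) is None
    ]
    return missing


print(missing_system_dependencies())
```

An empty list means everything was found. Any name in the list points to a package to install, or to a library that `find_library` could not resolve on that platform.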

Contribution

This project adheres to openZIM's Contribution Guidelines.

This project has implemented openZIM's Python bootstrap, conventions and policies v2.0.0.

All instructions below must be run from the root of your local clone of this repository.

If you do not already have it on your system, install hatch:

pip install hatch

Start a hatch shell. This installs all dependencies, including development ones, in an isolated virtual environment:

hatch shell

Set up the pre-commit Git hook (runs linters automatically before each commit):

pre-commit install

Run tests with coverage:

invoke coverage

Users

Non-exhaustive list of scrapers using it (check status when updating API):
