
Collection of python tools to re-use common code across scrapers


zimscraperlib

License: GPL v3

A collection of Python code to reuse across Python-based scrapers.

Usage

  • This library is meant to be installed via PyPI (zimscraperlib).
  • Pin it to a version range when you reference it, as the API is subject to frequent changes.
  • The API should remain stable only within a given minor version.

Example usage:

zimscraperlib>=1.1,<1.2
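Since API stability is only expected within a minor version, a scraper may want to check at startup that the installed release falls in the series it was written against. The following is a minimal sketch using only the standard library; `same_minor` and `installed_in_series` are hypothetical helper names for illustration and are not part of zimscraperlib, and the 1.1 series is simply the one from the example above.

```python
from importlib import metadata


def same_minor(version: str, major: int, minor: int) -> bool:
    """Return True if `version` (e.g. "1.1.4") belongs to the major.minor series."""
    parts = version.split(".")
    try:
        return (int(parts[0]), int(parts[1])) == (major, minor)
    except (IndexError, ValueError):
        # Too few components or a non-numeric component: treat as a mismatch.
        return False


def installed_in_series(dist: str, major: int, minor: int) -> bool:
    """Check the installed distribution `dist`; False if it is not installed."""
    try:
        return same_minor(metadata.version(dist), major, minor)
    except metadata.PackageNotFoundError:
        return False


if __name__ == "__main__":
    # Mirrors the requirement `zimscraperlib>=1.1,<1.2` from the example.
    if not installed_in_series("zimscraperlib", 1, 1):
        print("zimscraperlib is missing or outside the 1.1 series")
```

A check like this complements the pinned requirement rather than replacing it: the pin governs what gets installed, while the check catches an environment that was modified afterwards.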

See documentation at Read the Docs for details.

[!WARNING] While this library supports downloading videos with yt-dlp, recent changes at YouTube have forced the yt-dlp team to require new dependencies for YouTube videos (see https://github.com/yt-dlp/yt-dlp/issues/15012). These dependencies are large and are not needed by any other backend that yt-dlp supports; only YouTube requires them. They are therefore not included in this library's dependencies (yet; see https://github.com/openzim/python-scraperlib/issues/268). If you intend to download videos from YouTube, you have to install them yourself.

Dependencies

  • libmagic
  • wget
  • libzim (auto-installed, not available on Windows)
  • Pillow
  • FFmpeg
  • gifsicle (>=1.92)
  • libcairo (only needed if you use image manipulation, where it is used for SVG conversion)

macOS

brew install libmagic wget libtiff libjpeg webp little-cms2 ffmpeg gifsicle

Linux

sudo apt install libmagic1 wget ffmpeg \
    libtiff5-dev libjpeg8-dev libopenjp2-7-dev zlib1g-dev \
    libfreetype6-dev liblcms2-dev libwebp-dev tcl8.6-dev tk8.6-dev python3-tk \
    libharfbuzz-dev libfribidi-dev libxcb1-dev gifsicle

Alpine

apk add ffmpeg gifsicle libmagic wget libjpeg
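Once the system packages are installed, it can be useful to confirm that the command-line tools among them are actually reachable on `PATH`. The sketch below uses only the standard library; `missing_tools` is a hypothetical helper, not something provided by zimscraperlib, and the executable names `wget`, `ffmpeg` and `gifsicle` are assumed to match the dependency names listed above. Shared libraries such as libmagic or libcairo are not executables and are not covered by this check.

```python
import shutil


def missing_tools(tools: list[str]) -> list[str]:
    """Return the names from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Assumed executable names for the command-line dependencies.
    missing = missing_tools(["wget", "ffmpeg", "gifsicle"])
    if missing:
        print("Missing command-line tools: " + ", ".join(missing))
    else:
        print("All command-line tools found")
```

This only tests for presence; it does not verify versions, so the gifsicle (>=1.92) requirement still has to be checked separately.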

Contribution

This project adheres to openZIM's Contribution Guidelines.

This project has implemented openZIM's Python bootstrap, conventions and policies v1.0.2.

pip install hatch
pip install ".[dev]"
pre-commit install
# For tests
invoke coverage

Users

Non-exhaustive list of scrapers using it (check status when updating API):

