
Make ZIM files from DevDocs.io

Project description

Devdocs scraper

This scraper downloads devdocs.io documentation databases and puts them in ZIM files, a clean and user-friendly format for storing content for offline use.

License: GPL v3

Installation

There are three main ways to install and use devdocs2zim, from most to least recommended:

Install using a pre-built container
  1. Download the image using Docker:

    docker pull ghcr.io/openzim/devdocs
    
Build your own container
  1. Clone the repository locally:

    git clone https://github.com/openzim/devdocs.git && cd devdocs
    
  2. Build the image:

    docker build -t ghcr.io/openzim/devdocs .
    
Run the software locally using Hatch
  1. Clone the repository locally:

    git clone https://github.com/openzim/devdocs.git && cd devdocs
    
  2. Install Hatch:

    pip3 install hatch
    
  3. Start a Hatch shell, which installs the software and its dependencies in an isolated virtual environment:

    hatch shell
    
  4. Run the devdocs2zim command:

    devdocs2zim --help
    

Usage

Warning: This project is still a work in progress and isn't ready for use yet; the commands below are examples only.

# Usage (Docker needs an absolute host path, otherwise my_dir is treated as a named volume)
docker run -v "$(pwd)/my_dir:/output" ghcr.io/openzim/devdocs devdocs2zim [--all|--slug=SLUG|--first=N]

# Fetch all documents
docker run -v "$(pwd)/my_dir:/output" ghcr.io/openzim/devdocs devdocs2zim --all

# Fetch all documents except Ansible
docker run -v "$(pwd)/my_dir:/output" ghcr.io/openzim/devdocs devdocs2zim --all --skip-slug-regex "^ansible.*"

# Fetch Vue related documents
docker run -v "$(pwd)/my_dir:/output" ghcr.io/openzim/devdocs devdocs2zim --slug vue~3 --slug vue_router~4

# Fetch the docs for the two most recent versions of each software
docker run -v "$(pwd)/my_dir:/output" ghcr.io/openzim/devdocs devdocs2zim --first=2

One of the following flags is required:

  • --all: Fetch all DevDocs resources and produce one ZIM per resource.
  • --slug SLUG: Fetch the given DevDocs resource. A slug is the first path entry in a DevDocs URL; for example, the slug for https://devdocs.io/gcc~12/ is gcc~12. Repeat --slug to fetch several resources.
  • --first N: Fetch the first N versions of each software, in the order shown in the DevDocs UI.
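
The slug rule and the --first selection can be sketched in Python. This is a minimal illustration, not devdocs2zim's implementation: slug_from_url and first_n_per_software are hypothetical helpers, and the grouping assumes that all versions of one software share the slug prefix before ~ and are listed together, most recent first, as the --first=2 example above suggests.

```python
from itertools import groupby
from urllib.parse import urlparse


def slug_from_url(url: str) -> str:
    """Return the first path entry of a DevDocs URL (hypothetical helper)."""
    path = urlparse(url).path  # e.g. "/gcc~12/" for https://devdocs.io/gcc~12/
    return path.strip("/").split("/")[0]


def first_n_per_software(slugs: list[str], n: int) -> list[str]:
    """Keep the first n slugs of each software, preserving the input order.

    Assumes slugs arrive in DevDocs UI order, with all versions of one
    software adjacent and identified by the text before "~".
    """
    selected: list[str] = []
    # groupby only merges adjacent items with the same key, which is why
    # the versions of one software must be listed together.
    for _software, versions in groupby(slugs, key=lambda slug: slug.split("~")[0]):
        selected.extend(list(versions)[:n])
    return selected


print(slug_from_url("https://devdocs.io/gcc~12/"))  # gcc~12

# Illustrative listing, most recent version of each software first.
listing = ["gcc~13", "gcc~12", "gcc~11", "python~3.12", "python~3.11", "vue~3"]
print(first_n_per_software(listing, 2))
# ['gcc~13', 'gcc~12', 'python~3.12', 'python~3.11', 'vue~3']
```

Grouping by the unversioned prefix is what makes N count versions per software rather than slugs overall.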

Optional flags:

  • --skip-slug-regex REGEX: Skips slugs matching the given regular expression.
  • --output OUTPUT_FOLDER: Output folder for ZIMs. Default: /output
  • --creator CREATOR: Name of content creator. Default: 'DevDocs'
  • --publisher PUBLISHER: Custom publisher name. Default: 'openZIM'
  • --name-format FORMAT: Custom name format for individual ZIMs. Default: 'devdocs_{slug_without_version}_{version}'
  • --title-format FORMAT: Custom title format for individual ZIMs. Value will be truncated to 30 chars. Default: '{full_name} Documentation'
  • --description-format FORMAT: Custom description format for individual ZIMs. Value will be truncated to 80 chars. Default: '{full_name} Documentation'
  • --long-description-format FORMAT: Custom long description format for individual ZIMs. Value will be truncated to 4000 chars. Default: '{full_name} documentation by DevDocs'
  • --tag TAG: Add a tag to the ZIM. Use --tag several times to add multiple tags. Formatting is supported. Default: ['devdocs', '{slug_without_version}']
  • --logo-format FORMAT: URL/path for the ZIM logo in PNG, JPG, or SVG format. Formatting placeholders are supported. If unset, a DevDocs logo will be used.

Formatting Placeholders

The following formatting placeholders are supported:

  • {name}: Human-readable name of the resource, e.g. Python.
  • {full_name}: Name with optional version for the resource, e.g. Python 3.12.
  • {slug}: DevDocs slug for the resource, e.g. python~3.12.
  • {clean_slug}: Slug with every character that is not alphanumeric or a period replaced with -, e.g. python-3.12.
  • {slug_without_version}: DevDocs slug for the resource without the version, e.g. python.
  • {version}: Shortened version displayed in DevDocs, if any, e.g. 3.12.
  • {release}: Specific release of the software the documentation is for, if any, e.g. 3.12.1.
  • {attribution}: License and attribution information about the resource.
  • {home_link}: Link to the project's home page, if any, e.g. https://python.org.
  • {code_link}: Link to the project's source, if any, e.g. https://github.com/python/cpython.
  • {period}: The current date in YYYY-MM format, e.g. 2024-02.

Developing

Use the commands below to set up the project once:

# Install hatch if it isn't installed already.
pip install hatch

# Local install (in default env) / re-sync packages
hatch run pip list

# Set up pre-commit
pre-commit install

The following commands can be used to build and test the scraper:

# Show scripts
hatch env show

# Linting, testing, coverage, checking
hatch run lint:all
hatch run lint:fixall

# Run tests on all matrixed envs
hatch run test:run

# Run tests in a single matrixed env
hatch env run -e test -i py=3.12 coverage

# Run static type checks
hatch env run check:all

# Build packages
hatch build

Contributing

This project adheres to openZIM's Contribution Guidelines.

This project has implemented openZIM's Python bootstrap, conventions and policies v1.0.3.

Download files

Source Distribution

devdocs2zim-0.2.1.tar.gz (108.2 kB)

Uploaded Source

Built Distribution

devdocs2zim-0.2.1-py3-none-any.whl (37.7 kB)

Uploaded Python 3

File details

Details for the file devdocs2zim-0.2.1.tar.gz.

File metadata

  • Download URL: devdocs2zim-0.2.1.tar.gz
  • Upload date:
  • Size: 108.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for devdocs2zim-0.2.1.tar.gz:

Algorithm    Hash digest
SHA256       3f8ee6cfbfab27ec58efabe3861c97531c8c7a645d32bcbb00b26afa6735eef6
MD5          fcb7f7a7e6c37ac87cba1ad825bee5c5
BLAKE2b-256  0903d5a274a07d8a6a1477819761a93d67e0141e369675d1aad787efb48467b0

Provenance

The following attestation bundles were made for devdocs2zim-0.2.1.tar.gz:

Publisher: Publish.yaml on openzim/devdocs

File details

Details for the file devdocs2zim-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: devdocs2zim-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 37.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for devdocs2zim-0.2.1-py3-none-any.whl:

Algorithm    Hash digest
SHA256       de7fd70728020c6198524be2329b6a2e9466669cb9671881700916b3095dcca9
MD5          2dd5e1c4a2c0fc97ea97d7de5b76ebf6
BLAKE2b-256  bf6d731a2a84c10bcad5714cee47a6856b70d48e7bf8262c1b10f21b40237ccf
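
To check a downloaded file against the SHA256 digests listed for this release, a short Python script is enough. This is a sketch: sha256_of and matches_published are hypothetical helpers, and the expected values are copied from the hash tables on this page.

```python
import hashlib
from pathlib import Path

# SHA256 digests published on this page for release 0.2.1.
PUBLISHED_SHA256 = {
    "devdocs2zim-0.2.1.tar.gz": "3f8ee6cfbfab27ec58efabe3861c97531c8c7a645d32bcbb00b26afa6735eef6",
    "devdocs2zim-0.2.1-py3-none-any.whl": "de7fd70728020c6198524be2329b6a2e9466669cb9671881700916b3095dcca9",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in fixed-size chunks so large files need little memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path) -> bool:
    """True only if the file name is known and its digest equals the listed one."""
    expected = PUBLISHED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected
```

For example, matches_published(Path("devdocs2zim-0.2.1.tar.gz")) should return True for an untampered download. The same check can be done with sha256sum on most Linux systems, or enforced at install time with pip's hash-checking mode (--require-hashes).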

Provenance

The following attestation bundles were made for devdocs2zim-0.2.1-py3-none-any.whl:

Publisher: Publish.yaml on openzim/devdocs
