Make ZIM files from DevDocs.io

Devdocs scraper

This scraper downloads devdocs.io documentation databases and packages them into ZIM files, a clean, user-friendly format for storing content for offline use.

License: GPL v3

Installation

There are three main ways to install and use devdocs2zim, listed from most to least recommended:

Install using a pre-built container
  1. Download the image using docker:

    docker pull ghcr.io/openzim/devdocs
    
Build your own container
  1. Clone the repository locally:

    git clone https://github.com/openzim/devdocs.git && cd devdocs
    
  2. Build the image:

    docker build -t ghcr.io/openzim/devdocs .
    
Run the software locally using Hatch
  1. Clone the repository locally:

    git clone https://github.com/openzim/devdocs.git && cd devdocs
    
  2. Install Hatch:

    pip3 install hatch
    
  3. Start a Hatch shell to install the software and its dependencies in an isolated virtual environment:

    hatch shell
    
  4. Run the devdocs2zim command:

    devdocs2zim --help
    

Usage

[!WARNING] This project is still a work in progress and isn't ready for use yet. The commands below are examples only.

# Usage
docker run -v my_dir:/output ghcr.io/openzim/devdocs devdocs2zim [--all|--slug=SLUG|--first=N]

# Fetch all documents
docker run -v my_dir:/output ghcr.io/openzim/devdocs devdocs2zim --all

# Fetch all documents except Ansible
docker run -v my_dir:/output ghcr.io/openzim/devdocs devdocs2zim --all --skip-slug-regex "^ansible.*"

# Fetch Vue related documents
docker run -v my_dir:/output ghcr.io/openzim/devdocs devdocs2zim --slug vue~3 --slug vue_router~4

# Fetch the docs for the two most recent versions of each software
docker run -v my_dir:/output ghcr.io/openzim/devdocs devdocs2zim --first=2

One of the following flags is required:

  • --all: Fetch all Devdocs resources, and produce one ZIM per resource.
  • --slug SLUG: Fetch the provided DevDocs resource. Slugs are the first path entry in the DevDocs URL. For example, the slug for https://devdocs.io/gcc~12/ is gcc~12. Repeat --slug to fetch several resources.
  • --first N: Fetch the first N versions of each resource, in the order shown in the DevDocs UI. For example, --first=2 fetches the two most recent versions of each.
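The slug rule above ("the first path entry in the URL") can be sketched in Python. The helper name `slug_from_url` is hypothetical and is not part of devdocs2zim; it only illustrates the rule using the standard library.

```python
from urllib.parse import urlparse


def slug_from_url(url: str) -> str:
    """Return the first path entry of a DevDocs URL (hypothetical helper)."""
    path = urlparse(url).path
    # Split the path on "/" and keep the first non-empty segment.
    segments = [part for part in path.split("/") if part]
    if not segments:
        raise ValueError(f"no slug found in URL: {url!r}")
    return segments[0]


print(slug_from_url("https://devdocs.io/gcc~12/"))  # gcc~12
```

The returned value is what would be passed to --slug.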

Optional Flags:

  • --skip-slug-regex REGEX: Skips slugs matching the given regular expression.
  • --output OUTPUT_FOLDER: Output folder for ZIMs. Default: /output
  • --creator CREATOR: Name of content creator. Default: 'DevDocs'
  • --publisher PUBLISHER: Custom publisher name. Default: 'openZIM'
  • --name-format FORMAT: Custom name format for individual ZIMs. Default: 'devdocs_{slug_without_version}_{version}'
  • --title-format FORMAT: Custom title format for individual ZIMs. Value will be truncated to 30 chars. Default: '{full_name} Documentation'
  • --description-format FORMAT: Custom description format for individual ZIMs. Value will be truncated to 80 chars. Default: '{full_name} Documentation'
  • --long-description-format FORMAT: Custom long description format for individual ZIMs. Value will be truncated to 4000 chars. Default: '{full_name} documentation by DevDocs'
  • --tag TAG: Add tag to the ZIM. Use --tag several times to add multiple. Formatting is supported. Default: ['devdocs', '{slug_without_version}']
  • --logo-format FORMAT: URL/path for the ZIM logo in PNG, JPG, or SVG format. Formatting placeholders are supported. If unset, a DevDocs logo will be used.
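To preview which slugs a --skip-slug-regex value would exclude, a rough Python sketch follows. It assumes the expression is applied with `re.search` semantics, which this README does not confirm; the function name `filter_slugs` and the slug list are illustrative only.

```python
import re


def filter_slugs(slugs: list[str], skip_regex: str) -> list[str]:
    """Drop every slug that the regular expression matches (assumed semantics)."""
    pattern = re.compile(skip_regex)
    # Assumption: a slug is skipped when the pattern matches anywhere in it.
    return [slug for slug in slugs if not pattern.search(slug)]


# Illustrative slugs, not a real listing from DevDocs.
candidates = ["ansible", "ansible~2.11", "vue~3", "vue_router~4"]
print(filter_slugs(candidates, "^ansible.*"))  # ['vue~3', 'vue_router~4']
```

Because the example pattern is anchored with ^, it behaves the same under match-from-start and search-anywhere semantics.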

Formatting Placeholders

The following formatting placeholders are supported:

  • {name}: Human readable name of the resource e.g. Python.
  • {full_name}: Name with optional version for the resource e.g. Python 3.12.
  • {slug}: Devdocs slug for the resource e.g. python~3.12.
  • {clean_slug}: Slug with every character other than alphanumerics and periods replaced with -, e.g. python-3.12.
  • {slug_without_version}: Devdocs slug for the resource without the version e.g. python.
  • {version}: Shortened version displayed in devdocs, if any e.g. 3.12.
  • {release}: Specific release of the software the documentation is for, if any e.g. 3.12.1.
  • {attribution}: License and attribution information about the resource.
  • {home_link}: Link to the project's home page, if any, e.g. https://python.org.
  • {code_link}: Link to the project's source, if any, e.g. https://github.com/python/cpython.
  • {period}: The current date in YYYY-MM format e.g. 2024-02.

Developing

Use the commands below to set up the project once:

# Install hatch if it isn't installed already.
pip install hatch

# Local install (in default env) / re-sync packages
hatch run pip list

# Set-up pre-commit
pre-commit install

The following commands can be used to build and test the scraper:

# Show scripts
hatch env show

# linting, testing, coverage, checking
hatch run lint:all
hatch run lint:fixall

# run tests on all matrixed envs
hatch run test:run

# run tests in a single matrixed env
hatch env run -e test -i py=3.12 coverage

# run static type checks
hatch env run check:all

# building packages
hatch build

Contributing

This project adheres to openZIM's Contribution Guidelines.

This project has implemented openZIM's Python bootstrap, conventions and policies v1.0.3.

Download files

Source Distribution

devdocs2zim-0.2.0.tar.gz (107.3 kB)

Uploaded Source

Built Distribution

devdocs2zim-0.2.0-py3-none-any.whl (37.7 kB)

Uploaded Python 3

File details

Details for the file devdocs2zim-0.2.0.tar.gz.

File metadata

  • Download URL: devdocs2zim-0.2.0.tar.gz
  • Upload date:
  • Size: 107.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.0.0 CPython/3.12.7

File hashes

Hashes for devdocs2zim-0.2.0.tar.gz
Algorithm Hash digest
SHA256 9831f977fe086e3f8e7fe63094ff2dc1b849bcbf2f36d2dc907dc2aff8bdb874
MD5 10f42eeca53233005d18fcd5a5915ec8
BLAKE2b-256 9ea8bb0a677ad55425c340c5ff6129423b30c0309f6de15725b1eeb38cefdff0

File details

Details for the file devdocs2zim-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: devdocs2zim-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 37.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.0.0 CPython/3.12.7

File hashes

Hashes for devdocs2zim-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 487ec9e35b9ea633347a7650e4135360a57519f8ee5b63ffa4d6d3d97f356fc4
MD5 a93f1b15f9633e3dc4607ac6b2ab09b7
BLAKE2b-256 96029f57a16e2dad80353442bc99e9f2b4828cb620339521c66cadcbbcf94606
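A downloaded file can be checked against the SHA256 digests listed above. The following is a minimal sketch using only the Python standard library; the function names are illustrative and the file path is whatever location the archive was saved to.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with a published hex digest, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash(Path("devdocs2zim-0.2.0-py3-none-any.whl"), "487ec9e3...")` with the full SHA256 value from the list above should return True for an intact download.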
