Make ZIM files from DevDocs.io

Devdocs scraper

This scraper downloads devdocs.io documentation databases and puts them in ZIM files, a clean and user-friendly format for storing content for offline use.

License: GPL v3

Installation

There are three main ways to install and use devdocs2zim, listed from most recommended to least:

Install using a pre-built container
  1. Download the image using docker:

    docker pull ghcr.io/openzim/devdocs
    
Build your own container
  1. Clone the repository locally:

    git clone https://github.com/openzim/devdocs.git && cd devdocs
    
  2. Build the image:

    docker build -t ghcr.io/openzim/devdocs .
    
Run the software locally using Hatch
  1. Clone the repository locally:

    git clone https://github.com/openzim/devdocs.git && cd devdocs
    
  2. Install Hatch:

    pip3 install hatch
    
  3. Start a Hatch shell to install the software and its dependencies in an isolated virtual environment:

    hatch shell
    
  4. Run the devdocs2zim command:

    devdocs2zim --help
    

Usage

[!WARNING] This project is still a work in progress and isn't ready for use yet; the commands below are examples only.

    # Usage
    docker run -v my_dir:/output ghcr.io/openzim/devdocs devdocs2zim [--all|--slug=SLUG|--first=N]

    # Fetch all documents
    docker run -v my_dir:/output ghcr.io/openzim/devdocs devdocs2zim --all

    # Fetch all documents except Ansible
    docker run -v my_dir:/output ghcr.io/openzim/devdocs devdocs2zim --all --skip-slug-regex "^ansible.*"

    # Fetch Vue related documents
    docker run -v my_dir:/output ghcr.io/openzim/devdocs devdocs2zim --slug vue~3 --slug vue_router~4

    # Fetch the docs for the two most recent versions of each software
    docker run -v my_dir:/output ghcr.io/openzim/devdocs devdocs2zim --first=2
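
When these invocations are scripted, it can help to assemble the argument list programmatically. The following is a minimal Python sketch, not part of devdocs2zim: `build_docker_command` and its parameter names are hypothetical, and it only uses the flags shown in the examples above.

```python
import shlex


def build_docker_command(output_dir, *, all_docs=False, slugs=(), first=None,
                         skip_slug_regex=None,
                         image="ghcr.io/openzim/devdocs"):
    """Return the argv list for one `docker run ... devdocs2zim` call.

    Hypothetical helper: it mirrors the example commands in this README
    and does not call into devdocs2zim itself.
    """
    # The README states that one of --all, --slug or --first is required.
    if not (all_docs or slugs or first is not None):
        raise ValueError("one of --all, --slug or --first is required")

    # Mount the host directory on /output, the default output folder.
    argv = ["docker", "run", "-v", f"{output_dir}:/output", image, "devdocs2zim"]
    if all_docs:
        argv.append("--all")
    for slug in slugs:
        argv += ["--slug", slug]  # --slug may be repeated
    if first is not None:
        argv.append(f"--first={first}")
    if skip_slug_regex is not None:
        argv += ["--skip-slug-regex", skip_slug_regex]
    return argv


command = build_docker_command("my_dir", all_docs=True,
                               skip_slug_regex="^ansible.*")
# shlex.join quotes arguments such as the regex for safe pasting into a shell.
print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` avoids shell quoting issues with characters such as `~`, `^` and `*`.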

One of the following flags is required:

  • --all: Fetch all DevDocs resources, and produce one ZIM per resource.
  • --slug SLUG: Fetch the provided DevDocs resource. Slugs are the first path entry in the DevDocs URL. For example, the slug for https://devdocs.io/gcc~12/ is gcc~12. Use --slug several times to fetch multiple resources.
  • --first N: Fetch the first N versions of each resource, in the order shown in the DevDocs UI.
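
As a small illustration of the slug rule above (the first path entry of the DevDocs URL), here is a Python sketch. `slug_from_url` is a hypothetical helper and is not part of devdocs2zim.

```python
from urllib.parse import urlsplit


def slug_from_url(url):
    """Return the first path entry of a DevDocs URL, e.g. 'gcc~12'."""
    path = urlsplit(url).path
    # Drop the empty strings produced by leading and trailing slashes.
    segments = [segment for segment in path.split("/") if segment]
    if not segments:
        raise ValueError(f"no slug found in URL: {url!r}")
    return segments[0]


print(slug_from_url("https://devdocs.io/gcc~12/"))  # prints gcc~12
```

The returned value can be passed as-is to `--slug`.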

Optional Flags:

  • --skip-slug-regex REGEX: Skips slugs matching the given regular expression.
  • --output OUTPUT_FOLDER: Output folder for ZIMs. Default: /output
  • --creator CREATOR: Name of content creator. Default: 'DevDocs'
  • --publisher PUBLISHER: Custom publisher name. Default: 'openZIM'
  • --name-format FORMAT: Custom name format for individual ZIMs. Default: 'devdocs_{slug_without_version}_{version}'
  • --title-format FORMAT: Custom title format for individual ZIMs. Value will be truncated to 30 chars. Default: '{full_name} Documentation'
  • --description-format FORMAT: Custom description format for individual ZIMs. Value will be truncated to 80 chars. Default: '{full_name} Documentation'
  • --long-description-format FORMAT: Custom long description format for individual ZIMs. Value will be truncated to 4000 chars. Default: '{full_name} documentation by DevDocs'
  • --tag TAG: Add tag to the ZIM. Use --tag several times to add multiple. Formatting is supported. Default: ['devdocs', '{slug_without_version}']
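
To preview which slugs a `--skip-slug-regex` pattern would exclude, the filtering can be approximated in Python. This is a sketch under assumptions: `filter_slugs` is a hypothetical helper, the sample slugs are illustrative, and the use of `re.match` (anchored at the start of the slug) is a guess at how the tool applies the pattern.

```python
import re


def filter_slugs(slugs, skip_slug_regex=None):
    """Return the slugs that do NOT match the skip pattern."""
    if skip_slug_regex is None:
        return list(slugs)
    pattern = re.compile(skip_slug_regex)
    # Assumption: matching starts at the beginning of the slug (re.match).
    # The real tool may use a different matching mode.
    return [slug for slug in slugs if not pattern.match(slug)]


# Illustrative slugs only; the real list comes from DevDocs.
sample_slugs = ["ansible", "gcc~12", "vue~3", "vue_router~4"]
print(filter_slugs(sample_slugs, "^ansible.*"))
# prints ['gcc~12', 'vue~3', 'vue_router~4']
```

Anchoring the pattern with `^`, as in the README example, makes the result the same whether the pattern is matched from the start or searched anywhere in the slug.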

Formatting Placeholders

The following formatting placeholders are supported:

  • {name}: Human readable name of the resource e.g. Python.
  • {full_name}: Name with optional version for the resource e.g. Python 3.12.
  • {slug}: DevDocs slug for the resource e.g. python~3.12.
  • {clean_slug}: Slug with non alphanumeric/period characters replaced with - e.g. python-3.12.
  • {slug_without_version}: DevDocs slug for the resource without the version e.g. python.
  • {version}: Shortened version displayed in DevDocs, if any e.g. 3.12.
  • {release}: Specific release of the software the documentation is for, if any e.g. 3.12.1.
  • {attribution}: License and attribution information about the resource.
  • {home_link}: Link to the project's home page, if any e.g. https://python.org.
  • {code_link}: Link to the project's source, if any e.g. https://github.com/python/cpython.
  • {period}: The current date in YYYY-MM format e.g. 2024-02.
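
The placeholders use the same brace syntax as Python's `str.format`, so the default formats can be previewed with a short sketch. The values below are the Python 3.12 examples from the list above; `render` is a hypothetical helper, and both the `str.format` semantics and the truncation by simple slicing are assumptions about how the tool behaves.

```python
import re

# Example values for one resource, taken from the placeholder list above.
placeholders = {
    "name": "Python",
    "full_name": "Python 3.12",
    "slug": "python~3.12",
    "slug_without_version": "python",
    "version": "3.12",
}
# {clean_slug}: non alphanumeric/period characters replaced with "-".
placeholders["clean_slug"] = re.sub(r"[^A-Za-z0-9.]", "-", placeholders["slug"])


def render(fmt, values, max_chars=None):
    """Fill a format string and optionally truncate the result.

    Assumption: plain str.format substitution and slicing; the real tool
    may substitute or truncate differently.
    """
    text = fmt.format(**values)
    return text if max_chars is None else text[:max_chars]


# Defaults of --name-format, --title-format and --tag from this README.
print(render("devdocs_{slug_without_version}_{version}", placeholders))
print(render("{full_name} Documentation", placeholders, max_chars=30))
print([render(tag, placeholders) for tag in ["devdocs", "{slug_without_version}"]])
```

With these values the name renders as `devdocs_python_3.12`, and the title stays under the 30-character limit, so it is not truncated.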

Developing

Use the commands below to set up the project once:

    # Install hatch if it isn't installed already.
    pip install hatch

    # Local install (in default env) / re-sync packages
    hatch run pip list

    # Set-up pre-commit
    pre-commit install

The following commands can be used to build and test the scraper:

    # Show scripts
    hatch env show

    # linting, testing, coverage, checking
    hatch run lint:all
    hatch run lint:fixall

    # run tests on all matrixed envs
    hatch run test:run

    # run tests in a single matrixed env
    hatch env run -e test -i py=3.12 coverage

    # run static type checks
    hatch env run check:all

    # building packages
    hatch build

Contributing

This project adheres to openZIM's Contribution Guidelines.

This project has implemented openZIM's Python bootstrap, conventions and policies v1.0.3.
