Make ZIM files from DevDocs.io
Devdocs scraper
This scraper downloads devdocs.io documentation databases and puts them in ZIM files, a clean and user-friendly format for storing content for offline use.
Installation
There are three main ways to install and use devdocs2zim, from most recommended to least:
Install using a pre-built container
- Download the image using docker:
  docker pull ghcr.io/openzim/devdocs
Build your own container
- Clone the repository locally:
  git clone https://github.com/openzim/devdocs.git && cd devdocs
- Build the image:
  docker build -t ghcr.io/openzim/devdocs .
Run the software locally using Hatch
- Clone the repository locally:
  git clone https://github.com/openzim/devdocs.git && cd devdocs
- Install Hatch:
  pip3 install hatch
- Start a Hatch shell to install the software and its dependencies in an isolated virtual environment:
  hatch shell
- Run the devdocs2zim command:
  devdocs2zim --help
Usage
[!WARNING] This project is still a work in progress and isn't ready for use yet. The commands below are examples only.
# Usage
docker run -v my_dir:/output ghcr.io/openzim/devdocs devdocs2zim [--all|--slug=SLUG|--first=N]
# Fetch all documents
docker run -v my_dir:/output ghcr.io/openzim/devdocs devdocs2zim --all
# Fetch all documents except Ansible
docker run -v my_dir:/output ghcr.io/openzim/devdocs devdocs2zim --all --skip-slug-regex "^ansible.*"
# Fetch Vue related documents
docker run -v my_dir:/output ghcr.io/openzim/devdocs devdocs2zim --slug vue~3 --slug vue_router~4
# Fetch the docs for the two most recent versions of each software
docker run -v my_dir:/output ghcr.io/openzim/devdocs devdocs2zim --first=2
One of the following flags is required:
--all: Fetch all DevDocs resources, and produce one ZIM per resource.
--slug SLUG: Fetch the provided DevDocs resource. Slugs are the first path entry in the DevDocs URL. For example, the slug for https://devdocs.io/gcc~12/ is gcc~12. Use --slug several times to add multiple resources.
--first N: Fetch the first N versions of each resource, in the order shown in the DevDocs UI.
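The slug rule described for --slug can be illustrated with a short Python sketch. The helper name slug_from_url is hypothetical and is not part of devdocs2zim; it only shows what "first path entry" means for a DevDocs URL.

```python
from urllib.parse import urlparse


def slug_from_url(url: str) -> str:
    """Return the first path entry of a DevDocs URL (hypothetical helper)."""
    path = urlparse(url).path
    # Drop the empty segments produced by leading and trailing slashes.
    segments = [segment for segment in path.split("/") if segment]
    if not segments:
        raise ValueError(f"no slug found in {url!r}")
    return segments[0]


print(slug_from_url("https://devdocs.io/gcc~12/"))  # gcc~12
```

The returned value is what would be passed on the command line, for example --slug gcc~12.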
Optional Flags:
--skip-slug-regex REGEX: Skips slugs matching the given regular expression.
--output OUTPUT_FOLDER: Output folder for ZIMs. Default: /output
--creator CREATOR: Name of content creator. Default: 'DevDocs'
--publisher PUBLISHER: Custom publisher name. Default: 'openZIM'
--name-format FORMAT: Custom name format for individual ZIMs. Default: 'devdocs_{slug_without_version}_{version}'
--title-format FORMAT: Custom title format for individual ZIMs. Value will be truncated to 30 chars. Default: '{full_name} Documentation'
--description-format FORMAT: Custom description format for individual ZIMs. Value will be truncated to 80 chars. Default: '{full_name} Documentation'
--long-description-format FORMAT: Custom long description format for your ZIM. Value will be truncated to 4000 chars. Default: '{full_name} documentation by DevDocs'
--tag TAG: Add tag to the ZIM. Use --tag several times to add multiple. Formatting is supported. Default: ['devdocs', '{slug_without_version}']
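As a rough illustration of how --skip-slug-regex and --first could combine, here is a minimal Python sketch. It is not the scraper's actual implementation: the function select_slugs is hypothetical, it assumes the regex is matched from the start of the slug, that skipping happens before counting, and that the input list is already in the order shown in the DevDocs UI.

```python
import re
from collections import defaultdict


def select_slugs(slugs, skip_regex=None, first=None):
    """Filter an ordered list of slugs (hypothetical helper, assumptions above)."""
    pattern = re.compile(skip_regex) if skip_regex else None
    kept_per_resource = defaultdict(int)
    selected = []
    for slug in slugs:
        # Assumption: a slug is skipped when the regex matches at its start.
        if pattern and pattern.match(slug):
            continue
        # The part before "~" identifies the resource; the rest is the version.
        resource = slug.split("~", 1)[0]
        if first is not None and kept_per_resource[resource] >= first:
            continue
        kept_per_resource[resource] += 1
        selected.append(slug)
    return selected


available = ["ansible", "ansible~2.11", "python~3.12", "python~3.11", "python~3.10", "vue~3"]
print(select_slugs(available, skip_regex="^ansible.*", first=2))
# ['python~3.12', 'python~3.11', 'vue~3']
```

With the pattern "^ansible.*" from the usage examples, both Ansible entries are dropped, and only the first two Python versions in the list are kept.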
Formatting Placeholders
The following formatting placeholders are supported:
{name}: Human readable name of the resource, e.g. Python.
{full_name}: Name with optional version for the resource, e.g. Python 3.12.
{slug}: Devdocs slug for the resource, e.g. python~3.12.
{clean_slug}: Slug with non alphanumeric/period characters replaced with -, e.g. python-3.12.
{slug_without_version}: Devdocs slug for the resource without the version, e.g. python.
{version}: Shortened version displayed in devdocs, if any, e.g. 3.12.
{release}: Specific release of the software the documentation is for, if any, e.g. 3.12.1.
{attribution}: License and attribution information about the resource.
{home_link}: Link to the project's home page, if any, e.g. https://python.org.
{code_link}: Link to the project's source, if any, e.g. https://github.com/python/cpython.
{period}: The current date in YYYY-MM format, e.g. 2024-02.
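The placeholders behave much like Python format fields, so a small sketch can show how a few of them might be derived and substituted. The helper build_placeholders is hypothetical, covers only a subset of the placeholders, and assumes truncation is a plain character slice; the real tool may compute these values differently.

```python
import re
from datetime import date


def build_placeholders(name, slug, version="", today=None):
    """Build a subset of the placeholder values (hypothetical helper)."""
    today = today or date.today()
    return {
        "name": name,
        # Name followed by the version when one exists.
        "full_name": f"{name} {version}".strip(),
        "slug": slug,
        # Replace anything that is not alphanumeric or a period with "-".
        "clean_slug": re.sub(r"[^a-zA-Z0-9.]", "-", slug),
        "slug_without_version": slug.split("~", 1)[0],
        "version": version,
        "period": today.strftime("%Y-%m"),
    }


values = build_placeholders("Python", "python~3.12", "3.12", date(2024, 2, 1))
name = "devdocs_{slug_without_version}_{version}".format(**values)
# Assumption: the 30 character title limit is applied as a simple slice.
title = "{full_name} Documentation".format(**values)[:30]

print(name)                  # devdocs_python_3.12
print(title)                 # Python 3.12 Documentation
print(values["clean_slug"])  # python-3.12
print(values["period"])      # 2024-02
```

The two format strings used here are the documented defaults for --name-format and --title-format.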
Developing
Use the commands below to set up the project once:
# Install hatch if it isn't installed already.
❯ pip install hatch
# Local install (in default env) / re-sync packages
❯ hatch run pip list
# Set-up pre-commit
❯ pre-commit install
The following commands can be used to build and test the scraper:
# Show scripts
❯ hatch env show
# linting, testing, coverage, checking
❯ hatch run lint:all
❯ hatch run lint:fixall
# run tests on all matrixed envs
❯ hatch run test:run
# run tests in a single matrixed env
❯ hatch env run -e test -i py=3.12 coverage
# run static type checks
❯ hatch env run check:all
# building packages
❯ hatch build
Contributing
This project adheres to openZIM's Contribution Guidelines.
This project has implemented openZIM's Python bootstrap, conventions and policies v1.0.3.