Utility for batch downloading (certain) pages from MediaWiki sites as printable PDFs.
mwpdfify
Batch download multiple pages from MediaWiki sites (all pages, or only the pages of one category) as printable PDFs.
Install / Run
pip install mwpdfify
...or clone the repo and run pip install .
...or download src/mwpdfify.py and run it directly
There are two PDF rendering backends to choose from: pdfkit (installed as a dependency by default) or weasyprint. Use pip install -r requirements.txt to install both, or install the one you want yourself. If you use pdfkit, remember to also install wkhtmltopdf on your system.
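As a rough illustration of how the two backends differ, here is a minimal sketch that renders one URL to a PDF with either library. The names available_backends and render_pdf are hypothetical and are not taken from mwpdfify's source; only pdfkit.from_url and weasyprint's HTML(url=...).write_pdf(...) are actual library calls. The imports are deferred so that only the backend you pick needs to be installed.

```python
import importlib.util


def available_backends() -> list[str]:
    """Return which of the two backends can be imported on this machine."""
    candidates = ("pdfkit", "weasyprint")
    return [name for name in candidates if importlib.util.find_spec(name) is not None]


def render_pdf(url: str, output_path: str, use_weasyprint: bool = False) -> None:
    """Render the page at `url` into `output_path` (hypothetical helper)."""
    if use_weasyprint:
        # weasyprint renders the page in-process.
        from weasyprint import HTML

        HTML(url=url).write_pdf(output_path)
    else:
        # pdfkit is a wrapper around the wkhtmltopdf binary,
        # which has to be installed separately and be on PATH.
        import pdfkit

        pdfkit.from_url(url, output_path)
```

Calling available_backends() before render_pdf is one way to fail early with a clear message when neither library is installed.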
Usage
- Get the address of the root of your wiki, where its api.php and index.php reside. Typically it's identical to the site's root (/). For Wikipedia it's at /w/; tell me if there are other exceptions ;)
- (optional) If you want only a specific category, get its title (in the form of Category:XXX)
- Run the script, e.g.:
  - mwpdfify https://lycoris-recoil.fandom.com - Download all pages (as in Special:AllPages) from Lycoris Recoil Fandom Wiki as PDF
  - mwpdfify wiki.archlinux.org -c Category:Installation_process - Download all pages under Category:Installation_process from ArchWiki as PDF
  - mwpdfify https://en.wikipedia.org/w/ -c Category:Guangzhou_Metro_stations -l 10 -t 4 - Download all pages under Category:Guangzhou_Metro_stations (except subcategories) from Wikipedia, with 4 download threads and a per-query limit of 10
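To make the role of the wiki root more concrete, the sketch below shows one way the api.php and index.php addresses could be derived from it, and what the listing queries might look like. All function names are hypothetical, and defaulting to HTTPS when no scheme is given is an assumption; this is not mwpdfify's actual address handling. The query parameters (list=allpages with aplimit, list=categorymembers with cmtitle and cmlimit) are standard MediaWiki API parameters.

```python
from typing import Optional
from urllib.parse import urlencode


def normalize_root(url: str) -> str:
    """Add a scheme if missing and make sure the root ends with a slash."""
    if "://" not in url:
        url = "https://" + url  # assumption: default to HTTPS
    return url if url.endswith("/") else url + "/"


def api_url(root: str) -> str:
    """Address of api.php under the wiki root."""
    return normalize_root(root) + "api.php"


def page_url(root: str, title: str, printable: bool = True) -> str:
    """Address of one page via index.php, optionally in printable form."""
    params = {"title": title}
    if printable:
        params["printable"] = "yes"
    return normalize_root(root) + "index.php?" + urlencode(params)


def list_query(category: Optional[str] = None, limit: int = 0) -> dict[str, str]:
    """Parameters for listing all pages, or the members of one category."""
    per_request = "max" if limit == 0 else str(limit)  # 0 means "maximum"
    query = {"action": "query", "format": "json"}
    if category is None:
        query.update({"list": "allpages", "aplimit": per_request})
    else:
        query.update(
            {"list": "categorymembers", "cmtitle": category, "cmlimit": per_request}
        )
    return query
```

For example, api_url("wiki.archlinux.org") and api_url("https://en.wikipedia.org/w/") both yield an address ending in api.php, which is why the root has to point at the directory that contains that file.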
The downloaded PDFs should be available in a folder named after the site's domain, created in the current directory.
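A minimal sketch of how such an output location could be computed is shown below. The function output_path is hypothetical, and both the file naming scheme (page title plus .pdf) and the replacement of slashes are assumptions; the only behaviour taken from the description above is that the folder is named after the domain. The sketch also assumes the root URL already includes a scheme.

```python
from pathlib import Path
from urllib.parse import urlsplit


def output_path(root_url: str, title: str, base: Path = Path(".")) -> Path:
    """Return <base>/<domain>/<title>.pdf for a page of the given wiki."""
    domain = urlsplit(root_url).netloc
    # Replace path separators so a title like "A/B" stays inside one folder.
    safe_title = title.replace("/", "_").replace("\\", "_")
    return base / domain / f"{safe_title}.pdf"
```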
See below for other parameters:
usage: mwpdfify [-h] [-c CATEGORY] [-p] [-t THREADS] [-l LIMIT] [-w] url

positional arguments:
  url                   site root of destination site

options:
  -h, --help            show this help message and exit
  -c CATEGORY, --category CATEGORY
                        Download only a specified category
  -p, --no-printable    Force normal instead of printable version of pages
  -t THREADS, --threads THREADS
                        Number of download threads, defaults to 8
  -l LIMIT, --limit LIMIT
                        Limit of JSON info returned at once, defaults to maximum
                        (0)
  -w, --use-weasyprint  Use weasyprint as PDF rendering backend
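For readers who want to drive the same options from their own script, the help text above can be approximated with argparse as follows. This is a reconstruction from the help output only, not the project's actual parser; build_parser is a hypothetical name, and the int types for THREADS and LIMIT are assumed from their described defaults.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented mwpdfify options."""
    parser = argparse.ArgumentParser(prog="mwpdfify")
    parser.add_argument("url", help="site root of destination site")
    parser.add_argument("-c", "--category",
                        help="Download only a specified category")
    parser.add_argument("-p", "--no-printable", action="store_true",
                        help="Force normal instead of printable version of pages")
    parser.add_argument("-t", "--threads", type=int, default=8,
                        help="Number of download threads, defaults to 8")
    parser.add_argument("-l", "--limit", type=int, default=0,
                        help="Limit of JSON info returned at once, "
                             "defaults to maximum (0)")
    parser.add_argument("-w", "--use-weasyprint", action="store_true",
                        help="Use weasyprint as PDF rendering backend")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```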
Known issues
- &printable=yes is deprecated in recent versions of MediaWiki (and no substitute API is provided), so there might be layout issues with certain wikis, especially Fandom wikis, as they also contain ads.
- Recursively downloading pages from subcategories of a category is currently not supported.
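Until subcategory recursion is supported, one possible workaround is to collect the page titles yourself and process them separately. The sketch below walks a category tree with an explicit stack and guards against cycles. It is not part of mwpdfify: collect_pages and the fetch_members callback are hypothetical, and the callback is expected to return (title, is_subcategory) pairs for the direct members of a category. In practice it could wrap a list=categorymembers API query, where subcategories are the members in namespace 14.

```python
from typing import Callable, Iterable

# (title, is_subcategory) for one direct member of a category.
Member = tuple[str, bool]


def collect_pages(
    category: str,
    fetch_members: Callable[[str], Iterable[Member]],
) -> list[str]:
    """Return page titles under `category`, including all subcategories."""
    seen_categories: set[str] = set()
    pages: list[str] = []
    stack = [category]
    while stack:
        current = stack.pop()
        if current in seen_categories:
            continue  # categories can contain each other, so skip repeats
        seen_categories.add(current)
        for title, is_subcategory in fetch_members(current):
            if is_subcategory:
                stack.append(title)
            elif title not in pages:
                pages.append(title)  # keep first occurrence only
    return pages
```

Passing the fetcher in as a parameter keeps the traversal independent of any network code, so it can be tried out against a plain dictionary first.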
Changelog
- v1.1.2 (2022/09/30):
  - Set pdfkit as required dependency
- v1.1 (2022/09/04):
  - Changed address handling logic
  - Bug fixes
- v1.0 (2022/09/03):
  - Initial release
License
LGPLv3