Utility for batch downloading (certain) pages from MediaWiki sites as printable PDFs.
mwpdfify
Batch download multiple pages from MediaWiki sites (all pages, or only the pages of a category) as printable PDFs.
Install / Run
pip install mwpdfify
...or clone repo and pip install .
...or directly download and run src/mwpdfify.py
There are two PDF rendering backends to choose from: pdfkit (installed as a dependency by default) or weasyprint. Use pip install -r requirements.txt to install both, or choose one yourself. If you use pdfkit, remember to also install wkhtmltopdf on your system.
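To check which backend is usable on a given machine, something like the following can help. It is only a heuristic sketch and not part of mwpdfify: `available_backends` is a hypothetical helper, and it assumes wkhtmltopdf is found through PATH (pdfkit can also be pointed at a binary elsewhere, which this check would miss).

```python
import importlib.util
import shutil


def available_backends():
    """Return the names of the PDF backends that look usable here.

    Hypothetical helper for illustration; mwpdfify itself may check differently.
    """
    found = []
    # pdfkit only wraps the wkhtmltopdf binary, so both pieces must be present.
    if importlib.util.find_spec("pdfkit") and shutil.which("wkhtmltopdf"):
        found.append("pdfkit")
    # weasyprint renders in-process and needs no external binary on PATH.
    if importlib.util.find_spec("weasyprint"):
        found.append("weasyprint")
    return found


if __name__ == "__main__":
    print(available_backends())
```

If the list comes back empty, install one of the two backends as described above.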
Usage
- Get the address of the root of your wiki, where its api.php and index.php reside. Typically it's identical to the site's root (/). For Wikipedia it's at /w/; tell me if there are other exceptions ;)
- (optional) If you want only a specific category, get its title (in the form of Category:XXX)
- Run the script, e.g.:
  - mwpdfify https://lycoris-recoil.fandom.com - Download all pages (as in Special:AllPages) from Lycoris Recoil Fandom Wiki as PDF
  - mwpdfify wiki.archlinux.org -c Category:Installation_process - Download all pages under Category:Installation_process from ArchWiki as PDF
  - mwpdfify https://en.wikipedia.org/w/ -c Category:Guangzhou_Metro_stations -l 10 -t 4 - Download all pages under Category:Guangzhou_Metro_stations (except subcategories) from Wikipedia, with 4 download threads and a one-time query limit of 10
The downloaded PDFs should be available in a folder named after the site's domain name, in the current directory.
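To make the role of the wiki root concrete, here is a rough sketch of how the API and page URLs could be derived from it. This is not mwpdfify's actual code: the helper names are hypothetical, and defaulting to https:// when no scheme is given is an assumption. The query parameters (list=allpages, list=categorymembers, printable=yes) are standard MediaWiki ones.

```python
from urllib.parse import urlencode, urlsplit


def normalize_root(url):
    """Add a scheme if missing (assumed https) and ensure one trailing slash."""
    if "://" not in url:
        url = "https://" + url
    return url.rstrip("/") + "/"


def list_query_url(root, category=None, limit="max"):
    """Build the api.php URL that lists all pages, or the members of a category."""
    params = {"action": "query", "format": "json"}
    if category:
        params.update(list="categorymembers", cmtitle=category, cmlimit=limit)
    else:
        params.update(list="allpages", aplimit=limit)
    return normalize_root(root) + "api.php?" + urlencode(params)


def page_url(root, title, printable=True):
    """Build the index.php URL of one page, optionally in printable form."""
    params = {"title": title}
    if printable:
        params["printable"] = "yes"
    return normalize_root(root) + "index.php?" + urlencode(params)


def output_dir(root):
    """Folder name derived from the site's domain name."""
    return urlsplit(normalize_root(root)).netloc
```

For example, `page_url("https://en.wikipedia.org/w/", "Guangzhou_Metro")` gives `https://en.wikipedia.org/w/index.php?title=Guangzhou_Metro&printable=yes`, which shows why the root for Wikipedia has to include /w/.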
See below for other parameters:
    usage: mwpdfify [-h] [-c CATEGORY] [-p] [-t THREADS] [-l LIMIT] [-w] url

    positional arguments:
      url                   site root of destination site

    options:
      -h, --help            show this help message and exit
      -c CATEGORY, --category CATEGORY
                            Download only a specified category
      -p, --no-printable    Force normal instead of printable version of pages
      -t THREADS, --threads THREADS
                            Number of download threads, defaults to 8
      -l LIMIT, --limit LIMIT
                            Limit of JSON info returned at once, defaults to maximum
                            (0)
      -w, --use-weasyprint  Use weasyprint as PDF rendering backend
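For anyone who wants to script around the tool, the options above can be mirrored with argparse roughly as follows. This is a reconstruction from the help text, not the project's source; `build_parser` and `download_all` are hypothetical names, and mapping -t onto a thread pool is an assumption about how the threads are used.

```python
import argparse
from concurrent.futures import ThreadPoolExecutor


def build_parser():
    """Recreate the documented command-line interface (sketch only)."""
    parser = argparse.ArgumentParser(prog="mwpdfify")
    parser.add_argument("url", help="site root of destination site")
    parser.add_argument("-c", "--category",
                        help="Download only a specified category")
    parser.add_argument("-p", "--no-printable", action="store_true",
                        help="Force normal instead of printable version of pages")
    parser.add_argument("-t", "--threads", type=int, default=8,
                        help="Number of download threads, defaults to 8")
    parser.add_argument("-l", "--limit", type=int, default=0,
                        help="Limit of JSON info returned at once, "
                             "defaults to maximum (0)")
    parser.add_argument("-w", "--use-weasyprint", action="store_true",
                        help="Use weasyprint as PDF rendering backend")
    return parser


def download_all(titles, download_one, threads=8):
    """Call download_one(title) for every title on a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # list() waits for all workers and re-raises any exception they hit.
        return list(pool.map(download_one, titles))
```

Parsing the third example from the usage list, `build_parser().parse_args(["https://en.wikipedia.org/w/", "-c", "Category:Guangzhou_Metro_stations", "-l", "10", "-t", "4"])`, yields `threads=4` and `limit=10`, with the printable version still enabled.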
Known issues
- &printable=yes is deprecated in recent versions of MediaWiki (and no substitute API solution is provided), so there might be layout issues with certain wikis, especially Fandom wikis, as they also contain ads.
- Recursively downloading pages from subcategories of a category is currently not supported.
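To illustrate what "except subcategories" means at the API level, the sketch below walks a category listing and skips members in the Category namespace (number 14 in MediaWiki) instead of descending into them. It is an assumption about how such a listing could be consumed, not mwpdfify's code; `fetch_json` is a hypothetical callable that takes a dict of API parameters and returns the decoded JSON response.

```python
CATEGORY_NS = 14  # namespace number MediaWiki uses for Category: pages


def iter_category_pages(fetch_json, category, limit="max"):
    """Yield titles of pages directly inside a category, skipping subcategories.

    fetch_json is a hypothetical callable: dict of API params -> decoded JSON.
    """
    params = {
        "action": "query",
        "format": "json",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": limit,
    }
    while True:
        data = fetch_json(params)
        for member in data["query"]["categorymembers"]:
            # Subcategories are listed as members too; they are not followed.
            if member["ns"] != CATEGORY_NS:
                yield member["title"]
        if "continue" not in data:
            break
        # The API returns continuation tokens to send back with the next request.
        params = {**params, **data["continue"]}
```

A smaller limit, as set with -l, only changes how many members arrive per request; the continuation loop still collects the full listing.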
Changelog
- v1.1.2 (2022/09/30):
  - Set pdfkit as required dependency
  - Set
- v1.1 (2022/09/04):
  - Changed address handling logic
  - Bug fixes
- v1.0 (2022/09/03):
  - Initial release
License
LGPLv3