Make ZIM file from Gutenberg books

Project description

A scraper that downloads the whole repository of Project Gutenberg (http://www.gutenberg.org), puts it into a locally browsable directory, and then packs it into a ZIM file (http://www.openzim.org), a clean and user-friendly format for storing content for offline use.

Dependencies

Ubuntu/Debian

sudo apt-get install python-pip python-dev libxml2-dev libxslt-dev advancecomp jpegoptim pngquant p7zip-full gifsicle

macOS

brew install advancecomp jpegoptim pngquant p7zip gifsicle

Usage

gutenberg2zim

By default (no argument), it runs all the steps: download, parse, export and zim.
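
The default described above can be pictured as a simple selection rule: with no step requested, every step runs; otherwise only the requested steps run. The Python sketch below is illustrative only. `STEPS` and `select_steps` are hypothetical names, and the ordering simply follows the sentence above; it is not taken from the gutenberg2zim source.

```python
# Step names as listed in the sentence above. The real tool may name or
# order its internal steps differently (assumption for illustration).
STEPS = ("download", "parse", "export", "zim")


def select_steps(requested):
    """Return the steps to run, given the step names a user asked for.

    An empty request means "run everything", mirroring the documented default.
    """
    unknown = set(requested) - set(STEPS)
    if unknown:
        raise ValueError(f"unknown step(s): {sorted(unknown)}")
    if not requested:
        return list(STEPS)
    # Keep the order of STEPS regardless of the order of the request.
    return [step for step in STEPS if step in requested]


print(select_steps([]))                # every step
print(select_steps(["zim", "parse"]))  # only the two requested steps
```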

-h --help                       Display this help message
-y --wipe-db                    Wipe the DB during parse stage
-F --force                      Redo step even if target already exists

-l --languages=<list>           Comma-separated list of lang codes to filter export to (preferably ISO 639-1, else ISO 639-3)
-f --formats=<list>             Comma-separated list of formats to filter export to (epub, html, pdf, all)

-m --mirror=<url>               Use URL as base for all downloads.
-r --rdf-folder=<folder>        Don't download rdf-files.tar.bz2 and use extracted folder instead
-e --static-folder=<folder>     Folder to use for / write static HTML to
-z --zim-file=<file>            Write ZIM into this file path
-t --zim-title=<title>          Set ZIM title
-n --zim-desc=<description>     Set ZIM description
-d --dl-folder=<folder>         Folder to use for / write downloaded ebooks to
-u --rdf-url=<url>              Alternative rdf-files.tar.bz2 URL
-b --books=<ids>                Execute the processes for specific books, separated by commas, or dashes for intervals
-c --concurrency=<nb>           Number of concurrent processes for download and parsing tasks

-x --zim-title=<title>          Custom title for the ZIM file
-q --zim-desc=<desc>            Custom description for the ZIM file

--check                         Check dependencies
--prepare                       Download & extract rdf-files.tar.bz2
--parse                         Parse all RDF files and fill-up the DB
--download                      Download ebooks based on filters
--export                        Export downloaded content to zim-friendly static HTML
--dev                           Exports *just* Home+JS+CSS files (overwritten by --zim step)
--zim                           Create a ZIM file
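
The `--books` option above accepts IDs separated by commas, with dashes for intervals (for example `--books=1,5,10-12`). As a rough sketch of how such a value could be expanded, the following Python function turns the string into a list of IDs. `parse_book_ids` is a hypothetical helper, not the tool's own parser, and treating intervals as inclusive on both ends is an assumption.

```python
def parse_book_ids(spec):
    """Expand a spec such as "1,5,10-12" into a sorted list of unique IDs.

    Assumption: an interval "a-b" includes both a and b.
    """
    ids = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid interval: {part!r}")
            ids.update(range(start, end + 1))
        else:
            ids.add(int(part))
    return sorted(ids)


print(parse_book_ids("1,5,10-12"))
```

Returning a sorted, de-duplicated list keeps overlapping entries such as `3,1-4` from processing the same book twice.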

Download files

Source Distribution

gutenberg2zim-2.1.1.tar.gz (1.5 MB)

Uploaded Source

Built Distribution

gutenberg2zim-2.1.1-py3-none-any.whl (1.4 MB)

Uploaded Python 3

File details

Details for the file gutenberg2zim-2.1.1.tar.gz.

File metadata

  • Download URL: gutenberg2zim-2.1.1.tar.gz
  • Size: 1.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/4.0.2 CPython/3.11.7

File hashes

Hashes for gutenberg2zim-2.1.1.tar.gz
Algorithm Hash digest
SHA256 ca8402a81c905622217199001ba587a39768980909203dee93c80b23752135c8
MD5 6d17a56353adad5e6b47c225e3095a21
BLAKE2b-256 5661b6df994e6b90c8f6daa815c6475c839f0137639cf2c4179cfd3403f342c9
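
A downloaded archive can be checked against the SHA256 digest listed above. The Python sketch below uses only the standard-library `hashlib` module; `sha256_of` and `matches_published_hash` are hypothetical helper names, and the file path is whatever location you saved the archive to.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash("gutenberg2zim-2.1.1.tar.gz", "ca8402a81c905622217199001ba587a39768980909203dee93c80b23752135c8")` should return `True` for an intact copy of the source distribution.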

File details

Details for the file gutenberg2zim-2.1.1-py3-none-any.whl.

File hashes

Hashes for gutenberg2zim-2.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 7862c263521aff21f55bb74429b225c368ccbf06f204421f6f7e61e0c1acf63e
MD5 a56aedc86729cf133d3ab27c699879eb
BLAKE2b-256 c4cee3071df62f4b2676a8d67c70229c4f5225e7ec980bc1099a37d812e96cfc
