
Library for working with works from Projekt Runeberg (Runeberg.org).


runeberg

A library and command line application for downloading and parsing works from Projekt Runeberg.

Installation

You can install runeberg from PyPI:

pip install runeberg

It is supported on Python 3.6 and above.

Usage as a command line application

After installing runeberg, run the program to get a paged list of works. Follow the prompts to download (and unpack) the files.

$ runeberg
1. "Det Ringer!" Skämt i en akt (1902) by Helena Nyblom [sv]
2. "Då sa' kungen..." : Kungliga anekdoter under hundra år (1946) by ? [sv]
3. "Pastoralier" (1899) by August Olsson [sv]
4. "The Ripper" (uppskäraren) (1892) by Adolf Paul [sv]
5. 100 Præstehistorier eller Præstestandens lyse og mørke Sider (1893) by Nils Poulsen [no]
6. 14 Descriptive Pieces for the Young for Piano (1895) by Sveinbjörn Sveinbjörnsson [en]
7. 14 sovjetryska berättare : valda och översatta från ryskan (1929) by ? [sv]
8. 16 år med Roald Amundsen. Fra Pol til Pol (1930) by Oscar Wisting [no]
9. 1720, 1772, 1809 (1836) by Magnus Crusenstolpe [sv]

What do you want to do? [1–25] to download, [N]ext 25, [Q]uit: █

Use the -a flag to start from a list of authors instead. Selecting an author presents a list of works filtered to that author:

$ runeberg -a
1. Ülev Aaloe (1944) [ee]
2. Simon Aberstén (1865–1937) [se]
3. Selma Abrahamsson (1872–1911) [fi]
4. Arthur Dyke Acland (1847–1926) [uk]
5. Adam Bremensis (1044–1080) [de]
6. Gertrud Adelborg (1853–1942) [se]
7. Ottilia Adelborg (1855–1936) [se]
8. Gudmund Jöran Adlerbeth (1751–1818) [se]
9. Gustav Magnus Adlercreutz (1775–1845) [se]

What do you want to do? [1–25] to display their works, [N]ext 25, [Q]uit: 6
Displaying works by Gertrud Adelborg [uid=adelbger]…
1. Några drag af de till Danmark utvandrade allmogeflickornas ställning och arbetsförhållanden (1890) by Gertrud Adelborg [sv]
2. Några upplysningar angående de svenska allmogeflickornas utvandring till Danmark (1893) by Gertrud Adelborg [sv]
What do you want to do? [1–2] to download, [Q]uit: █

Use the -h flag to see a full list of options and filters.

Usage as a library

First determine the identifier of the work you wish to download. For example, for http://runeberg.org/aldrigilif/ the <uid> is aldrigilif.

# Download and unpack a work from runeberg.org:
# this will by default download the work to /downloaded_data/<uid>/
import runeberg.download as downloader

downloader.get_work('<uid>')
# A warning is raised if additional colour images are found; these are not unpacked.

# Parse the downloaded work:
# from the parsed work you can access individual pages, articles/chapters along
# with any metadata
import runeberg
parsed_work = runeberg.Work.from_files('<uid>')

# Create a DjVu file of the work
print(parsed_work.to_djvu())  # outputs the path to the created file
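
The <uid> described above can also be derived from a work's URL. The following is a minimal sketch, assuming the identifier is always the first path segment of the URL, as in the aldrigilif example. The helper name uid_from_url is hypothetical and not part of the runeberg library.

```python
from urllib.parse import urlparse


def uid_from_url(url: str) -> str:
    """Return the first path segment of a runeberg.org work URL.

    Hypothetical helper: assumes the work identifier is the first
    non-empty path segment, as in http://runeberg.org/aldrigilif/.
    """
    path = urlparse(url).path
    segments = [segment for segment in path.split("/") if segment]
    if not segments:
        raise ValueError(f"no work identifier found in {url!r}")
    return segments[0]


print(uid_from_url("http://runeberg.org/aldrigilif/"))  # prints: aldrigilif
```

The returned string can then be passed to downloader.get_work() in place of '<uid>'.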

Caveats

Some of the Metadata files are encoded in Windows-1252 rather than the default latin-1. The framework does not currently detect this. If you encounter such a file, some characters may be misinterpreted, and you must manually re-encode the file before parsing the work.
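
One way to do the manual re-encoding is sketched below. It is an illustration under stated assumptions, not something the library provides. The function names and the substitution table are hypothetical. Windows-1252 contains characters, such as curly quotes and dashes, that latin-1 cannot represent, so the sketch swaps a few common ones for plain equivalents and replaces any others with "?".

```python
# Characters that exist in Windows-1252 but not in latin-1, mapped to plain
# substitutes. Illustrative assumption only; the table is not exhaustive.
SUBSTITUTES = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
}


def cp1252_to_latin1(data: bytes) -> bytes:
    """Decode bytes as Windows-1252 and re-encode them as latin-1."""
    # Bytes that are undefined in Windows-1252 become U+FFFD on decoding.
    text = data.decode("cp1252", errors="replace")
    for char, plain in SUBSTITUTES.items():
        text = text.replace(char, plain)
    # Anything latin-1 still cannot represent is written as "?".
    return text.encode("latin-1", errors="replace")


def reencode_file(path: str) -> None:
    """Rewrite the file at `path` in place (hypothetical helper)."""
    with open(path, "rb") as source:
        data = source.read()
    with open(path, "wb") as target:
        target.write(cp1252_to_latin1(data))
```

Keep a backup of the original file before rewriting it in place, since the substitution is lossy.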

If the originally scanned images were .jpg then the downloaded "colour images" will just be a second identical copy of these.

Requirements

For DjVu conversion, DjVuLibre must be installed.
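
Before calling to_djvu() it can be useful to check that DjVuLibre is available. The sketch below looks for a few DjVuLibre command line tools on the PATH. Which tools the library actually invokes is not stated here, so the tool list is an assumption, and missing_tools is a hypothetical helper.

```python
import shutil

# Command line tools shipped with DjVuLibre. Which of these the runeberg
# library actually calls is an assumption made for illustration.
DJVULIBRE_TOOLS = ("djvm", "c44", "cjb2")


def missing_tools(tools=DJVULIBRE_TOOLS):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All checked DjVuLibre tools were found.")
```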

Change log

0.0.2

  • [Breaking] Rename ocr property of Page to text.
  • Introduce text property to Work and Article.
  • Re-use djvu file generated by earlier run. Add force argument to avoid reuse.
  • Parse the IMAGE_SOURCE metadata.
  • Expand testing to py37, py38

0.0.1

  • Initial PyPI release.
