Library to interface with Project Gutenberg

Overview

This package contains a variety of scripts to make working with the Project Gutenberg body of public domain texts easier.

The functionality provided by this package includes:

  • Downloading texts from Project Gutenberg.

  • Cleaning the texts: removing the Project Gutenberg header and footer boilerplate, leaving just the text of the work behind.

  • Making meta-data about the texts easily accessible.

Installation

This project is on PyPI, so I’d recommend that you just install everything from there using your favourite Python package manager.

pip install gutenberg

If you want to install from source or modify the package, you’ll need to clone this repository:

git clone https://github.com/c-w/Gutenberg.git

Now, you should probably install the dependencies for the package and verify your checkout by running the tests.

cd Gutenberg

virtualenv virtualenv
source virtualenv/bin/activate
pip install -r requirements.pip

pip install nose
nosetests

Usage

Downloading a text

from gutenberg.acquire import load_etext
from gutenberg.cleanup import strip_headers

text = strip_headers(load_etext(2701)).strip()
assert text.startswith('MOBY DICK; OR THE WHALE\n\nBy Herman Melville')
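To give a feel for what "cleaning" means here, the following is a rough, standalone sketch of cutting a text down to the part between the Project Gutenberg start and end markers. It is not the implementation behind strip_headers; the helper name strip_between_markers, the two marker patterns and the sample text are all assumptions made for illustration, and real files use more marker variants than this handles.

```python
import re

# Assumed marker shapes; real Project Gutenberg files vary, so these
# patterns are illustrative rather than exhaustive.
START_MARKER = re.compile(
    r"^\*\*\* ?START OF (THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE
)
END_MARKER = re.compile(
    r"^\*\*\* ?END OF (THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE
)


def strip_between_markers(raw):
    """Hypothetical helper: keep only the text between the two markers."""
    start = START_MARKER.search(raw)
    end = END_MARKER.search(raw)
    # Fall back to the whole text when a marker is missing.
    begin = start.end() if start else 0
    finish = end.start() if end else len(raw)
    return raw[begin:finish].strip()


# Made-up sample standing in for a downloaded file.
sample = (
    "The Project Gutenberg EBook of Example\n"
    "*** START OF THIS PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
    "\n"
    "Call me Ishmael.\n"
    "\n"
    "*** END OF THIS PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
    "Licence text follows here.\n"
)
body = strip_between_markers(sample)
```

In practice the library's own strip_headers is the function to use; the sketch only shows the kind of work it saves.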

Looking up meta-data

from gutenberg.query import get_etexts
from gutenberg.query import get_metadata

assert get_metadata('title', 2701)  == 'Moby Dick; Or, The Whale'
assert get_metadata('author', 2701) == 'Melville, Hermann'

assert 2701 in get_etexts('title', 'Moby Dick; Or, The Whale')
assert 2701 in get_etexts('author', 'Melville, Hermann')

Limitations

This project deliberately does not include any natural language processing functionality. Consuming and processing the text is the responsibility of the client; this library focuses solely on offering a simple, easy-to-use interface to the works in the Project Gutenberg corpus. Any linguistic processing can easily be done client-side, e.g. using the TextBlob library.
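As a minimal sketch of what such client-side processing might look like, the snippet below counts word frequencies using only the Python standard library. The function name word_frequencies and the sample string are assumptions for illustration; in real use the string would be the cleaned text returned by strip_headers, and a library such as TextBlob could replace the crude regex tokenizer.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Hypothetical helper: count lower-cased words with a simple regex tokenizer."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Made-up sample standing in for text obtained from the library.
sample = "The whale, the whale! A sail, a sail!"
frequencies = word_frequencies(sample)
```

Keeping this step outside the package means the choice of tokenizer and NLP toolkit stays with the client.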

