Project Gutenberg corpus interface
This package contains a variety of scripts that make it easier to work with Project Gutenberg, a tremendous resource for NLP.
The functionality provided by this package includes:

* Downloading etexts from Project Gutenberg
* Removing headers and footers from etexts
* Organizing meta-data about the etexts in a database
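To give a feel for what removing headers and footers involves, here is a minimal standard-library sketch. It is not this package's implementation or API: the helper name `strip_boilerplate` is hypothetical, and the marker patterns are an assumption based on the common `*** START OF ... PROJECT GUTENBERG EBOOK` and `*** END OF ...` lines. Real etexts vary in their wording, so a robust cleaner needs more heuristics than this.

```python
import re

# Assumed marker shapes; real etexts vary, so this is a heuristic only.
_START = re.compile(
    r"^\*\*\*\s*START OF (THIS|THE) PROJECT GUTENBERG EBOOK.*$",
    re.IGNORECASE | re.MULTILINE,
)
_END = re.compile(
    r"^\*\*\*\s*END OF (THIS|THE) PROJECT GUTENBERG EBOOK.*$",
    re.IGNORECASE | re.MULTILINE,
)


def strip_boilerplate(raw_text: str) -> str:
    """Return the text between the START and END marker lines.

    Hypothetical helper: if a marker is missing, the corresponding
    end of the text is kept unchanged.
    """
    start = _START.search(raw_text)
    end = _END.search(raw_text)
    begin = start.end() if start else 0
    finish = end.start() if end else len(raw_text)
    return raw_text[begin:finish].strip()


# A tiny made-up etext used only to exercise the helper.
sample = (
    "The Project Gutenberg EBook of Example\n"
    "*** START OF THIS PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
    "Call me Ishmael.\n"
    "*** END OF THIS PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
    "License text follows.\n"
)
body = strip_boilerplate(sample)
print(body)
```

The sketch keeps only the text between the two marker lines and falls back to the full text when no markers are found, which avoids silently discarding an etext that uses an unexpected header format.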