Download e-texts from Project Gutenberg

gutenfetchen

Python 3.10+ | License: MIT | Code style: ruff | Type checked: mypy

Verb, pseudo-German. gutenfetchen (/ˈɡuːtənˌfɛtʃən/) "to do the good fetching." From guten (good) + fetchen (to fetch), conjugated in the infinitive as if it were a proper German verb. Because downloading public-domain literature should feel orderly, efficient, and vaguely Teutonic.

Download plain-text e-books from Project Gutenberg with a single command.

Why gutenfetchen?

Most Gutenberg tools (Gutenberg, gutenbergpy) require building a local metadata database before you can do anything - a process that can take hours. gutenfetchen skips all of that.

  • Zero setup - queries the Gutendex API directly, no local database required
  • Smart deduplication - filters out duplicate editions, keeps the highest-quality version
  • Clean output - strips Project Gutenberg boilerplate headers/footers by default
  • Prefers UTF-8 - automatically selects the best plain-text encoding available
  • Dry-run mode - preview results before downloading anything
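The deduplication and UTF-8 preference described above can be pictured with a small sketch. This is not gutenfetchen's actual code: it assumes book records shaped like Gutendex results (`title`, `authors`, `formats`, `download_count`), treats `download_count` as a stand-in for "highest quality", and uses a made-up preference order for MIME types. The names `pick_text_url`, `dedupe`, and `PREFERRED_TEXT_TYPES` are hypothetical.

```python
from __future__ import annotations

from typing import Any

# Assumed preference order; not taken from gutenfetchen's source.
PREFERRED_TEXT_TYPES = (
    "text/plain; charset=utf-8",
    "text/plain; charset=us-ascii",
    "text/plain",
)


def pick_text_url(formats: dict[str, str]) -> str | None:
    """Return the URL of the best plain-text format, preferring UTF-8."""
    for mime in PREFERRED_TEXT_TYPES:
        if mime in formats:
            return formats[mime]
    # Fall back to any other text/plain variant (e.g. ISO-8859-1).
    for mime, url in formats.items():
        if mime.startswith("text/plain"):
            return url
    return None


def dedupe(books: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep one record per (title, first author).

    Assumption: the most-downloaded edition is the one worth keeping.
    """
    best: dict[tuple[str, str], dict[str, Any]] = {}
    for book in books:
        authors = book.get("authors") or [{}]
        key = (
            " ".join(book["title"].lower().split()),  # normalize case and spacing
            authors[0].get("name", "").lower(),
        )
        current = best.get(key)
        if current is None or book.get("download_count", 0) > current.get("download_count", 0):
            best[key] = book
    return list(best.values())
```

A real implementation would likely need a fuzzier title comparison (subtitles, punctuation, translated editions); exact matching on a normalized title is only the simplest starting point.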

Install

pip install gutenfetchen

Usage

Search by title:

gutenfetchen "tale of two cities"

Search by author:

gutenfetchen --author "joseph conrad"

Combine author + title filter:

gutenfetchen "heart" --author "joseph conrad"
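For readers curious how searches like these might map onto the Gutendex API, here is a minimal sketch. It relies on the public `https://gutendex.com/books` endpoint and its `search` parameter, which matches words against author names and titles. How gutenfetchen itself builds queries is not documented here, so `build_query_url`, `matches`, and the client-side filtering step are illustrative assumptions.

```python
from __future__ import annotations

from typing import Any
from urllib.parse import quote, urlencode

GUTENDEX_BOOKS = "https://gutendex.com/books"


def build_query_url(title: str | None = None, author: str | None = None) -> str:
    """Build a Gutendex search URL from an optional title and author."""
    terms = " ".join(t for t in (author, title) if t)
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return f"{GUTENDEX_BOOKS}?{urlencode({'search': terms}, quote_via=quote)}"


def matches(book: dict[str, Any], title: str | None = None, author: str | None = None) -> bool:
    """Client-side filter (an assumption): title substring plus author words."""
    if title and title.lower() not in book["title"].lower():
        return False
    if author:
        # Gutendex lists names as "Last, First", so compare word by word.
        names = " ".join(a["name"].lower() for a in book.get("authors", []))
        return all(word in names for word in author.lower().split())
    return True
```

Because `search` covers both titles and author names, a second filtering pass on the returned records is one way to keep an author search from picking up books that merely mention the name in the title.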

Download random e-texts:

gutenfetchen --random 5

Preview without downloading:

gutenfetchen --author "jane austen" --dry-run

Limit results and set output directory:

gutenfetchen --author "mark twain" --n 3 -o ./my_texts/

Keep Gutenberg boilerplate (skip cleaning):

gutenfetchen "moby dick" --no-clean

Clean existing files on disk:

gutenfetchen clean ./gutenberg_texts/
gutenfetchen clean file1.txt file2.txt
gutenfetchen clean --dry-run ./gutenberg_texts/

The clean subcommand runs the same boilerplate-stripping pipeline used during download. It is idempotent: running it on already-clean texts leaves them unchanged.
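To make the idempotence claim concrete, here is a rough sketch of marker-based stripping. It is not the tool's actual pipeline: it assumes the common `*** START OF THE PROJECT GUTENBERG EBOOK ... ***` and `*** END OF ...` marker lines, and the function name `strip_boilerplate` is hypothetical. Real Gutenberg files vary, so a production cleaner would need more patterns.

```python
import re

# Assumed marker patterns; older files use "THIS" instead of "THE".
START_RE = re.compile(
    r"^\*\*\* ?START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.MULTILINE | re.IGNORECASE,
)
END_RE = re.compile(
    r"^\*\*\* ?END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.MULTILINE | re.IGNORECASE,
)


def strip_boilerplate(text: str) -> str:
    """Drop everything up to the START marker and from the END marker on."""
    start = START_RE.search(text)
    if start:
        text = text[start.end():]
    end = END_RE.search(text)
    if end:
        text = text[:end.start()]
    # Normalizing to a single trailing newline keeps repeated runs stable.
    return text.strip() + "\n"
```

Once the markers are gone, a second pass finds nothing to cut and returns the same text, which is the property the clean subcommand promises.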

Options

positional:
  title                  Search by title (e.g., 'tale of two cities')

options:
  --author NAME          Search by author name (e.g., 'joseph conrad')
  --random N             Download N random e-texts
  --n N                  Maximum number of texts to download
  -o, --output-dir DIR   Output directory (default: ./gutenberg_texts/)
  --dry-run              List matching books without downloading
  --no-clean             Skip stripping Project Gutenberg boilerplate
