Read the essential book: create extractive abridgements that preserve the author's original passages.

Book Condenser

Read the Essential Book

Book Condenser creates an extractive abridgement of a nonfiction book. An AI model identifies the original passages that carry the book's central argument, evidence, concepts, turning points, and conclusions. The software then assembles those passages verbatim into a shorter, beautifully formatted reading edition.

This approach preserves what makes a serious book valuable: the author's reasoning, voice, and choice of evidence. Many nonfiction books develop their core ideas through repetition, extended examples, and supporting detail. By retaining the passages that do the essential intellectual work, Book Condenser makes the book more efficient to read while keeping the reader in direct contact with the original text.

The result is a condensed, tablet-friendly PDF designed for focused reading: shorter than the source, richer than a summary, and faithful to the author.

This tool is intended for books you own the rights to process, public-domain works, or other material you are legally allowed to transform and store. Generated outputs may contain substantial verbatim source text.

Features

  • Supports EPUB, PDF, DOCX, TXT, and Markdown input.
  • Validates parsing with --parse-only before making API calls.
  • Preserves chronology and argument structure through subtype-aware selection rules.
  • Protects broad coverage with --coverage-mode all and per-section concentration limits.
  • Produces reading_abridgement.pdf as the primary reader-facing output.
  • Writes audit artifacts so users can inspect selected passages, scores, coverage, and quality-control decisions.

Installation

From PyPI:

pip install book-condenser

For local development from a checkout:

python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

Set your OpenAI API key in the environment before running the full pipeline:

export OPENAI_API_KEY="your-api-key-here"

You can also set OPENAI_MODEL; otherwise the CLI defaults to gpt-5-mini.
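As a minimal sketch of the environment setup above, the snippet below checks both variables before a run. The helper name resolve_openai_settings is hypothetical and is not part of the package; it only mirrors the documented behavior (a required API key and a model default of gpt-5-mini).

```python
import os

DEFAULT_MODEL = "gpt-5-mini"  # default documented for the CLI


def resolve_openai_settings(env=None):
    """Return (api_key, model) from the environment.

    Hypothetical helper for illustration; the CLI may resolve these differently.
    """
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set; full runs need it.")
    # Fall back to the documented default when OPENAI_MODEL is unset or empty.
    model = env.get("OPENAI_MODEL", "").strip() or DEFAULT_MODEL
    return api_key, model
```

Calling resolve_openai_settings() with no arguments reads the real process environment; passing a dict makes the check easy to exercise without touching it.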

Quick Start

Validate parsing before any API calls:

book-condenser path/to/public-domain-book.epub \
  --output-dir out/example \
  --parse-only

Review out/example/parsed_structure_report.md. Continue only if chapter and back-matter detection look plausible.

Generate a reading edition:

book-condenser path/to/public-domain-book.epub \
  --output-dir out/example \
  --target-ratio 0.25 \
  --coverage-mode all \
  --chapter-max-share 0.08 \
  --apply-qc
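The two-step workflow above (parse first, then condense) can also be scripted. The following is a sketch only: build_command and parse_then_condense are hypothetical helper names, the flags are the ones shown in this section, and it assumes the CLI exits with a non-zero status on failure.

```python
import subprocess

# Flags taken from the full-run example in this section.
FULL_RUN_FLAGS = [
    "--target-ratio", "0.25",
    "--coverage-mode", "all",
    "--chapter-max-share", "0.08",
    "--apply-qc",
]


def build_command(book, output_dir, parse_only=False, extra=()):
    """Assemble an argv list for the book-condenser command."""
    cmd = ["book-condenser", str(book), "--output-dir", str(output_dir)]
    if parse_only:
        cmd.append("--parse-only")
    cmd.extend(extra)
    return cmd


def parse_then_condense(book, output_dir, confirm, runner=subprocess.run):
    """Run --parse-only, ask confirm(), then run the full pipeline.

    confirm is a callable returning True once the parse report looks plausible.
    runner is injectable so the flow can be tested without the CLI installed.
    """
    # check=True makes subprocess.run raise on a non-zero exit status.
    runner(build_command(book, output_dir, parse_only=True), check=True)
    if not confirm():
        return False
    runner(build_command(book, output_dir, extra=FULL_RUN_FLAGS), check=True)
    return True
```

A confirm callable could simply prompt the user after they have read out/example/parsed_structure_report.md, so no API calls are made until the structure has been checked.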

For PDFs with unreliable bookmarks, provide a manual chapter map:

book-condenser path/to/public-domain-book.pdf \
  --chapter-map examples/chapter_map.json \
  --output-dir out/example \
  --parse-only

The root book_condenser.py file is a compatibility launcher. Prefer the installed book-condenser command for normal use.

Key Controls

| Argument | Purpose | Default |
| --- | --- | --- |
| --target-ratio | Target proportion of source words retained | 0.25 |
| --candidate-ratio | Candidate pool before global pruning | 0.42 |
| --coverage-mode | Section coverage rule: all, major, or none | all |
| --chapter-max-share | Maximum nominal share of final text from one chapter | 0.08 |
| --chapter-map | Manual PDF section/page map when bookmarks are unreliable | none |
| --parse-only | Validate structure and cleanup without API calls | off |
| --apply-qc | Apply final model review within constraints | off |
| --pdf-page-size | small-tablet, a5, or large-tablet | small-tablet |
| --pdf-font-size | Body type size between 11 and 20 pt | 14.0 |
| --pdf-font | auto, georgia, dejavu serif, or times | auto |
| --no-docx | Skip optional DOCX output | off |
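To make the ratio defaults concrete, here is a rough worked calculation. It assumes that --target-ratio and --candidate-ratio are both proportions of source words and that --chapter-max-share is a share of the final text; the function name, the rounding, and the 100,000-word book are illustrative assumptions, not the tool's actual budgeting code.

```python
def word_budgets(source_words, target_ratio=0.25,
                 candidate_ratio=0.42, chapter_max_share=0.08):
    """Translate the documented default ratios into approximate word counts."""
    target = round(source_words * target_ratio)          # words in the final edition
    candidates = round(source_words * candidate_ratio)   # pool before global pruning
    per_chapter_cap = round(target * chapter_max_share)  # nominal cap per chapter
    return {
        "target_words": target,
        "candidate_words": candidates,
        "per_chapter_cap": per_chapter_cap,
    }


budgets = word_budgets(100_000)
# budgets == {"target_words": 25000, "candidate_words": 42000, "per_chapter_cap": 2000}
```

Under these assumptions, a 100,000-word source yields a candidate pool of about 42,000 words, pruned to a final edition of about 25,000 words, with roughly 2,000 words at most from any one chapter.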

Outputs

out/example/
    parsed_structure_report.md
    book_metadata.json
    book_paragraphs.jsonl
    structural_overview.json
    chapter_candidates/
    scored_candidates.json
    global_selection.json
    quality_control.json
    selection_audit.md
    reading_abridgement.md
    reading_abridgement.pdf
    reading_abridgement.docx

reading_abridgement.pdf is the primary reading edition. selection_audit.md records subtype classification, chapter balance, selected passage functions, scores, protected anchors, and locations.

Treat the entire output directory as private by default. It can contain verbatim source text, local paths, and model-generated analysis.
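After a run, a quick check that the listed artifacts exist can be useful. The sketch below uses only the file names shown in the listing above; the helper name missing_outputs is hypothetical, and a --parse-only run is likely to produce only a subset of these files.

```python
from pathlib import Path

# Names copied from the output listing in this section.
EXPECTED_OUTPUTS = [
    "parsed_structure_report.md",
    "book_metadata.json",
    "book_paragraphs.jsonl",
    "structural_overview.json",
    "chapter_candidates",
    "scored_candidates.json",
    "global_selection.json",
    "quality_control.json",
    "selection_audit.md",
    "reading_abridgement.md",
    "reading_abridgement.pdf",
    "reading_abridgement.docx",
]


def missing_outputs(output_dir, skip_docx=False):
    """Return the expected artifact names that are absent from output_dir.

    Set skip_docx=True when the run used --no-docx.
    """
    root = Path(output_dir)
    names = [n for n in EXPECTED_OUTPUTS
             if not (skip_docx and n.endswith(".docx"))]
    return [n for n in names if not (root / n).exists()]
```

For example, missing_outputs("out/example") returns an empty list when every artifact is present.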

Manual Chapter Map Format

Pages are 1-indexed. end_page is optional; when omitted, the next section's start_page - 1 is used.

[
  {"title": "Prologue", "start_page": 1, "end_page": 8},
  {"title": "Chapter One", "start_page": 9},
  {"title": "Chapter Two", "start_page": 28},
  {"title": "Bibliography", "start_page": 410}
]

Back matter headings are retained in the parse audit but excluded from selection and source-word budgeting.
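The end_page rule can be sketched in a few lines of Python. This illustrates the documented rule and is not the parser's own code; resolve_end_pages is a hypothetical name, and the final section is left without an end_page because the behavior for the last entry is not stated here.

```python
import json


def resolve_end_pages(sections):
    """Fill in a missing end_page with the next section's start_page - 1."""
    resolved = []
    for i, section in enumerate(sections):
        entry = dict(section)  # copy so the input list is not mutated
        if "end_page" not in entry and i + 1 < len(sections):
            entry["end_page"] = sections[i + 1]["start_page"] - 1
        resolved.append(entry)
    return resolved


chapter_map = json.loads("""
[
  {"title": "Prologue", "start_page": 1, "end_page": 8},
  {"title": "Chapter One", "start_page": 9},
  {"title": "Chapter Two", "start_page": 28},
  {"title": "Bibliography", "start_page": 410}
]
""")

resolved = resolve_end_pages(chapter_map)
# Chapter One resolves to pages 9-27 and Chapter Two to pages 28-409.
```

Checking the resolved ranges this way before a --parse-only run can catch overlapping or out-of-order start_page values early.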

Source Format Guidance

Prefer EPUB when available. PDFs may require a manual chapter map and inspection of the parse-only report. If a PDF is scanned or image-only, run OCR first.

The parser supports EPUB 2 toc.ncx, EPUB 3 navigation documents, semantic back-matter signals, anchored subsections, PDF bookmarks, visible-heading fallback, and common PDF text cleanup.

Cost and Privacy

Full runs send selected source excerpts and structural context to the configured OpenAI model. Use --parse-only to inspect local parsing before any API calls. Larger books, higher --candidate-ratio, and --apply-qc increase token usage and cost.

Do not process confidential, copyrighted, or sensitive books unless your API/provider settings and legal rights allow that use.

Development

Run checks locally:

ruff check .
pytest
python -m build
twine check dist/*

The package exposes book-condenser as a console script and python -m book_condenser as a module entry point.

Release Checklist

  1. Confirm the repository root is this project directory, not a parent home directory.
  2. Verify no .env, books/, out/, generated abridgements, or copyrighted fixtures are tracked.
  3. Run ruff check ., pytest, python -m build, and twine check dist/*.
  4. Configure PyPI trusted publishing for khalidlabs/book-condenser using the Publish to PyPI workflow.
  5. Publish a GitHub release or run the publish workflow manually after package install and CLI smoke tests pass.

License

Book Condenser is licensed under the PolyForm Noncommercial License 1.0.0. Commercial use is not permitted by this license without a separate commercial license from the licensor.

Disclaimer

Book Condenser is provided as-is and does not provide legal advice. You are responsible for ensuring that your source material and generated outputs comply with copyright law, contract terms, platform policies, and any other obligations that apply to your use.
