Read the essential book: create extractive abridgements that preserve the author's original passages.
Book Condenser
Read the Essential Book
Book Condenser creates an extractive abridgement of a nonfiction book. An AI model identifies the original passages that carry the book's central argument, evidence, concepts, turning points, and conclusions. The software then assembles those passages verbatim into a shorter, beautifully formatted reading edition.
This approach preserves what makes a serious book valuable: the author's reasoning, voice, and choice of evidence. Many nonfiction books develop their core ideas through repetition, extended examples, and supporting detail. By retaining the passages that do the essential intellectual work, Book Condenser makes the book more efficient to read while keeping the reader in direct contact with the original text.
The result is a condensed, tablet-friendly PDF designed for focused reading: shorter than the source, richer than a summary, and faithful to the author.
This tool is intended for books you own the rights to process, public-domain works, or other material you are legally allowed to transform and store. Generated outputs may contain substantial verbatim source text.
Features
- Supports EPUB, PDF, DOCX, TXT, and Markdown input.
- Validates parsing with --parse-only before making API calls.
- Preserves chronology and argument structure through subtype-aware selection rules.
- Protects broad coverage with --coverage-mode all and per-section concentration limits.
- Produces reading_abridgement.pdf as the primary reader-facing output.
- Writes audit artifacts so users can inspect selected passages, scores, coverage, and quality-control decisions.
Installation
From PyPI after release:
pip install book-condenser
For local development from a checkout:
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
Set your OpenAI API key in the environment before running the full pipeline:
export OPENAI_API_KEY="your-api-key-here"
You can also set OPENAI_MODEL; otherwise the CLI defaults to gpt-5-mini.
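The lookup described above can be sketched in Python. This is an illustration, not the package's actual code: the function name resolve_openai_settings and the error message are hypothetical, and only the two variable names and the gpt-5-mini default come from this README.

```python
import os

DEFAULT_MODEL = "gpt-5-mini"  # default model named in this README


def resolve_openai_settings(env=None):
    """Return (api_key, model) from an environment-style mapping.

    Hypothetical helper: it mirrors the documented behaviour, not the
    package's internal implementation.
    """
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY must be set before running the full pipeline.")
    # Fall back to the documented default when OPENAI_MODEL is unset or empty.
    model = env.get("OPENAI_MODEL", "").strip() or DEFAULT_MODEL
    return api_key, model


# Example with an explicit mapping instead of the real environment.
key, model = resolve_openai_settings({"OPENAI_API_KEY": "your-api-key-here"})
print(model)  # gpt-5-mini
```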
Quick Start
Validate parsing before any API calls:
book-condenser path/to/public-domain-book.epub \
--output-dir out/example \
--parse-only
Review out/example/parsed_structure_report.md. Continue only if chapter and back-matter detection look plausible.
Generate a reading edition:
book-condenser path/to/public-domain-book.epub \
--output-dir out/example \
--target-ratio 0.25 \
--coverage-mode all \
--chapter-max-share 0.08 \
--apply-qc
For PDFs with unreliable bookmarks, provide a manual chapter map:
book-condenser path/to/public-domain-book.pdf \
--chapter-map examples/chapter_map.json \
--output-dir out/example \
--parse-only
The root book_condenser.py file is a compatibility launcher. Prefer the installed book-condenser command for normal use.
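If you script the parse-only check, a thin wrapper around the installed command may be enough. The sketch below uses only the flags and the report filename shown in this section; the helper names build_parse_only_command and run_parse_only are hypothetical, and the wrapper assumes book-condenser is on your PATH.

```python
import subprocess
from pathlib import Path


def build_parse_only_command(book, output_dir, chapter_map=None):
    """Build the argument list for a parse-only run (no API calls)."""
    cmd = ["book-condenser", str(book), "--output-dir", str(output_dir), "--parse-only"]
    if chapter_map is not None:
        # Optional manual chapter map for PDFs with unreliable bookmarks.
        cmd += ["--chapter-map", str(chapter_map)]
    return cmd


def run_parse_only(book, output_dir, chapter_map=None):
    """Run the CLI and return the report path, or None if it was not written."""
    subprocess.run(build_parse_only_command(book, output_dir, chapter_map), check=True)
    report = Path(output_dir) / "parsed_structure_report.md"
    return report if report.exists() else None


# Building the command does not require the CLI to be installed.
print(build_parse_only_command("path/to/public-domain-book.epub", "out/example"))
```

The report still needs a human read before a full run. The wrapper only confirms that the file was produced.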
Key Controls
| Argument | Purpose | Default |
|---|---|---|
| --target-ratio | Target proportion of source words retained | 0.25 |
| --candidate-ratio | Candidate pool before global pruning | 0.42 |
| --coverage-mode | Section coverage rule: all, major, or none | all |
| --chapter-max-share | Maximum nominal share of final text from one chapter | 0.08 |
| --chapter-map | Manual PDF section/page map when bookmarks are unreliable | none |
| --parse-only | Validate structure and cleanup without API calls | off |
| --apply-qc | Apply final model review within constraints | off |
| --pdf-page-size | small-tablet, a5, or large-tablet | small-tablet |
| --pdf-font-size | Body type size between 11 and 20 pt | 14.0 |
| --pdf-font | auto, georgia, dejavu serif, or times | auto |
| --no-docx | Skip optional DOCX output | off |
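To get a feel for how the ratio controls interact, the arithmetic can be sketched as follows. This rests on assumptions the README does not confirm: that --candidate-ratio is a fraction of source words, that --chapter-max-share is a fraction of the final text, and that simple rounding applies. The function word_budgets is hypothetical, and the tool's real budgeting may differ (the cap is described as nominal).

```python
def word_budgets(source_words, target_ratio=0.25, candidate_ratio=0.42, chapter_max_share=0.08):
    """Estimate word budgets from the documented default ratios.

    source_words is assumed to exclude back matter, which this README says
    is left out of source-word budgeting.
    """
    final_words = round(source_words * target_ratio)
    candidate_words = round(source_words * candidate_ratio)
    # Nominal cap on how much of the final text one chapter may contribute.
    chapter_cap_words = round(final_words * chapter_max_share)
    return {
        "final_words": final_words,
        "candidate_words": candidate_words,
        "chapter_cap_words": chapter_cap_words,
    }


print(word_budgets(100_000))
# {'final_words': 25000, 'candidate_words': 42000, 'chapter_cap_words': 2000}
```

Under these assumptions, a 100,000-word source yields a reading edition of roughly 25,000 words, drawn from a candidate pool of about 42,000, with no chapter nominally contributing more than about 2,000.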
Outputs
out/example/
parsed_structure_report.md
book_metadata.json
book_paragraphs.jsonl
structural_overview.json
chapter_candidates/
scored_candidates.json
global_selection.json
quality_control.json
selection_audit.md
reading_abridgement.md
reading_abridgement.pdf
reading_abridgement.docx
reading_abridgement.pdf is the primary reading edition. selection_audit.md records subtype classification, chapter balance, selected passage functions, scores, protected anchors, and locations.
Treat the entire output directory as private by default. It can contain verbatim source text, local paths, and model-generated analysis.
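A quick completeness check after a run might look like the sketch below. It assumes the listed files sit directly under the output directory, which the listing above does not make certain, and it leaves out the chapter_candidates/ directory. The helper missing_outputs is hypothetical and not part of the package.

```python
import tempfile
from pathlib import Path

# File names taken from the output listing in this README.
EXPECTED_FILES = [
    "parsed_structure_report.md",
    "book_metadata.json",
    "book_paragraphs.jsonl",
    "structural_overview.json",
    "scored_candidates.json",
    "global_selection.json",
    "quality_control.json",
    "selection_audit.md",
    "reading_abridgement.md",
    "reading_abridgement.pdf",
    "reading_abridgement.docx",
]


def missing_outputs(output_dir, include_docx=True):
    """Return the expected file names that are absent from output_dir."""
    root = Path(output_dir)
    names = [
        name for name in EXPECTED_FILES
        if include_docx or name != "reading_abridgement.docx"  # --no-docx skips DOCX
    ]
    return [name for name in names if not (root / name).is_file()]


# Demonstration against a temporary directory holding only the PDF.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "reading_abridgement.pdf").touch()
    print(len(missing_outputs(tmp)))  # 10
```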
Manual Chapter Map Format
Pages are 1-indexed. end_page is optional; when omitted, the next section's start_page - 1 is used.
[
{"title": "Prologue", "start_page": 1, "end_page": 8},
{"title": "Chapter One", "start_page": 9},
{"title": "Chapter Two", "start_page": 28},
{"title": "Bibliography", "start_page": 410}
]
Back matter headings are retained in the parse audit but excluded from selection and source-word budgeting.
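The end_page rule above can be sketched in a few lines of Python. The function resolve_page_ranges and the last_page argument are hypothetical. The README does not say how the final section's end is determined, so this sketch leaves it to the caller, and the 432 in the usage line is an invented page count for illustration only.

```python
import json


def resolve_page_ranges(sections, last_page=None):
    """Fill in a missing end_page from the next section's start_page - 1."""
    resolved = []
    for index, section in enumerate(sections):
        end_page = section.get("end_page")
        if end_page is None:
            if index + 1 < len(sections):
                end_page = sections[index + 1]["start_page"] - 1
            else:
                # Assumption: the final section runs to a caller-supplied page count.
                end_page = last_page
        resolved.append({**section, "end_page": end_page})
    return resolved


chapter_map = json.loads("""
[
  {"title": "Prologue", "start_page": 1, "end_page": 8},
  {"title": "Chapter One", "start_page": 9},
  {"title": "Chapter Two", "start_page": 28},
  {"title": "Bibliography", "start_page": 410}
]
""")

for section in resolve_page_ranges(chapter_map, last_page=432):
    print(section["title"], section["start_page"], section["end_page"])
```

With the example map, Chapter One resolves to pages 9 through 27 and Chapter Two to pages 28 through 409.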
Source Format Guidance
Prefer EPUB when available. PDFs may require a manual chapter map and inspection of the parse-only report. If a PDF is scanned or image-only, run OCR first.
The parser supports EPUB 2 toc.ncx, EPUB 3 navigation documents, semantic back-matter signals, anchored subsections, PDF bookmarks, visible-heading fallback, and common PDF text cleanup.
Cost and Privacy
Full runs send selected source excerpts and structural context to the configured OpenAI model. Use --parse-only to inspect local parsing before any API calls. Larger books, higher --candidate-ratio, and --apply-qc increase token usage and cost.
Do not process confidential, copyrighted, or sensitive books unless your API/provider settings and legal rights allow that use.
Development
Run checks locally:
ruff check .
pytest
python -m build
twine check dist/*
The package exposes book-condenser as a console script and python -m book_condenser as a module entry point.
Release Checklist
- Confirm the repository root is this project directory, not a parent home directory.
- Verify no .env, books/, out/, generated abridgements, or copyrighted fixtures are tracked.
- Run ruff check ., pytest, python -m build, and twine check dist/*.
- Configure PyPI trusted publishing for khalidlabs/book-condenser using the Publish to PyPI workflow.
- Publish a GitHub release or run the publish workflow manually after package install and CLI smoke tests pass.
License
Book Condenser is licensed under the PolyForm Noncommercial License 1.0.0. Commercial use is not permitted by this license without a separate commercial license from the licensor.
Disclaimer
Book Condenser is provided as-is and does not provide legal advice. You are responsible for ensuring that your source material and generated outputs comply with copyright law, contract terms, platform policies, and any other obligations that apply to your use.