A library for compiling text files into a book-form docx file

catbook

A very simple docx file builder. Catbook was created to make managing book chapters simple. The goal was a minimal-markup way to concatenate text files into Word docs that could be converted to epub, mobi, pdf, etc.

The tool needed to:

  • Allow chapters to be quickly rearranged
  • Allow multi-section chapters
  • Offer a trivially easy way to differentiate quotes, blocks, and special words
  • Support three levels of hierarchy
  • Include only the absolute minimum of markup and functionality

Bookfiles

Catbook reads a flat list of text files from a .bookfile and concatenates them into a Word doc. The doc may have up to three levels. The levels are titled using Word styles.

Metadata about the files that are concatenated into the docx is available from the Book object and each section.

Bookfiles can include several things besides paths to text files.

  • Comments as lines starting with #
  • TITLE and AUTHOR to be shown in the book's metadata
  • INCLUDE of preexisting docx
  • A METADATA directive that inserts a page with a table containing the author, title, bookfile path, word count and other metadata.

For example:

#
# this is a complete bookfile
# TITLE: This is my book
# AUTHOR: John Doe
#
# INSERT: an-existing/file.docx
#
filesdir/section-1.txt
morefiles/section-2.txt
# INSERT: another/file.docx
still/morefiles/section-2.txt
#
# METADATA
#
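
The bookfile layout above can be read with a few lines of Python. This is a minimal sketch for illustration only, not catbook's own parser: the `parse_bookfile` function, the `ParsedBookfile` class, and the rule that any upper-case word after `#` counts as a directive are all assumptions.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Matches comment lines such as "# TITLE: My book" or "# METADATA".
# Assumption: a directive is an upper-case name, optionally followed by
# a colon and a value. Plain lower-case comments do not match.
DIRECTIVE = re.compile(r"^#\s*([A-Z][A-Z ]*?)\s*(?::\s*(.*))?$")


@dataclass
class ParsedBookfile:
    title: str | None = None
    author: str | None = None
    # Ordered entries: ("file", path) or (directive name, argument).
    entries: list[tuple[str, str]] = field(default_factory=list)


def parse_bookfile(text: str) -> ParsedBookfile:
    """Collect title, author, and the ordered list of files and directives."""
    parsed = ParsedBookfile()
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            match = DIRECTIVE.match(line)
            if match is None:
                continue  # an ordinary comment
            name = match.group(1)
            value = (match.group(2) or "").strip()
            if name == "TITLE":
                parsed.title = value
            elif name == "AUTHOR":
                parsed.author = value
            else:
                parsed.entries.append((name, value))
            continue
        # Anything else is a path to a text file.
        parsed.entries.append(("file", line))
    return parsed
```

Keeping files and directives in one ordered list preserves the position of each inserted docx relative to the text files around it.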

Text files

Sections

Each text file that is concatenated into the docx is a "section". Sections have two parts:

  • The first line
  • All other lines

The first line is presented as a title, subject to the markup described below. Every other line becomes a paragraph.

Catbook skips blank lines. If the first line is blank the section will have no title to distinguish it from the section before it. A sequence of blank lines is no different than a single blank line.

Note that while blank lines are generally skipped and have no effect, in rare cases a blank line at the bottom of the doc will cause Word to insert a blank page. This can happen when the number of non-blank lines exactly fits the page.

Comments

Any line that begins with a # is considered a comment. Comment lines are skipped. There can be any number of comment lines before the title line; the first non-comment line is considered the title line.
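
The title, blank-line, and comment rules can be sketched as below. This is an illustration of the rules as described, not catbook's code; `split_section` is a hypothetical name, and the sketch leaves title markup uninterpreted.

```python
from __future__ import annotations


def split_section(text: str) -> tuple[str | None, list[str]]:
    """Return (title, paragraphs) for the text of one section file."""
    title: str | None = None
    title_seen = False
    paragraphs: list[str] = []
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # comment lines are skipped wherever they appear
        if not title_seen:
            # The first non-comment line is the title line.
            # A blank title line means the section has no title.
            title_seen = True
            title = line.strip() or None
            continue
        if line.strip():
            paragraphs.append(line.strip())  # blank lines are dropped
    return title, paragraphs
```

For instance, `split_section("# note\nChapter one\n\nFirst paragraph.\n")` returns the title `"Chapter one"` and a single paragraph.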

Each comment will be checked for directives.

The INCLUDE IMAGE directive includes an image. Images are centered in a paragraph. The directive is in the form:

# INCLUDE IMAGE: path/to/my/image.png

The METADATA directive prints the section metadata collected to that point. The directive looks like:

# METADATA

The MARK directive prints the file name and line number where the directive is positioned. This is intended to help identify where a point in the text is located in the files being concatenated. Adding a MARK to files is useful when there is a series of files without title lines. Use the directive like:

# MARK
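
As a rough sketch of how the three directives could be picked out of a section's comments, the code below scans a file and records each directive with its line number. The function names and the `file:line` label format are assumptions for illustration; catbook's actual output for MARK may look different.

```python
from __future__ import annotations


def find_directives(text: str) -> list[tuple[int, str, str]]:
    """Return (line number, directive, argument) for each directive comment."""
    found: list[tuple[int, str, str]] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line.startswith("#"):
            continue
        body = line[1:].strip()
        if body in ("METADATA", "MARK"):
            found.append((lineno, body, ""))
        elif body.startswith("INCLUDE IMAGE:"):
            path = body[len("INCLUDE IMAGE:"):].strip()
            found.append((lineno, "INCLUDE IMAGE", path))
    return found


def mark_labels(filename: str, text: str) -> list[str]:
    """Build a hypothetical 'file:line' label for every MARK in the text."""
    return [
        f"{filename}:{lineno}"
        for lineno, name, _ in find_directives(text)
        if name == "MARK"
    ]
```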

Markups

There are a very small number of markups to do things like italicize quotes, force a page break between sections, etc. Markup chars and fonts are minimally customizable using .ini files. See catbook/markup.py and catbook/fonts.py.

  • Book title: ~~

A book title is the first line of a text file. The markup must be the first char. Book titles are the top grouping unit in the same way that a first-level heading in a docx is the top of a TOC. Book titles contain chapters and sections.

~~Book One: A New Hope

  • Chapter title: ~

A chapter title is the first line of a text file. The markup must be the first char. Chapter titles are a second-level grouping, below a book and above a section.

~Chapter ten: In which a storm gathers

  • Stand-alone section: >

This markup must be the first char of the first line of a text file. It forces the section to start on a new page.

>1918: Vienna
In 1918 the empire slept...

  • Jump: ***

A jump is on the first line of a text file. A jump creates a break within a chapter by adding an untitled section. The section is separated from the preceding section by an indicator called an asterism. Most commonly the asterism is three widely spaced stars. The asterism text is set by the ASTERISM markup.

***
In this section I will show that...

  • Asterism: * ⁂ *

The asterism is a section separator that is inserted when the JUMP markup is seen.
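
The first-line markups above can be told apart with a simple prefix check. The sketch below assumes the default markup characters listed here and uses made-up kind names; it is not taken from catbook/markup.py. The one detail that matters is testing `~~` before `~`.

```python
from __future__ import annotations

# Default markup strings as listed above. Longer prefixes come first so
# that "~~" (book title) is not mistaken for "~" (chapter title).
FIRST_LINE_MARKUPS = [
    ("~~", "book_title"),
    ("***", "jump"),
    ("~", "chapter_title"),
    (">", "standalone_section"),
]
ASTERISM = "* ⁂ *"


def classify_first_line(line: str) -> tuple[str, str]:
    """Return (kind, text) for the first line of a section file."""
    for markup, kind in FIRST_LINE_MARKUPS:
        if line.startswith(markup):
            text = line[len(markup):].strip()
            if kind == "jump":
                # A jump has no title; the asterism is shown instead.
                text = ASTERISM
            return kind, text
    # No markup: an ordinary section title.
    return "section_title", line.strip()
```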

  • Block: |

A block may start on any line. The markup must be the first char. Blocks are text that is set off from the rest of the paragraphs in a different font.

The letter said
|Dear Jack.
|I hope you've been well.

  • Quoted line: "

A quote may start on any line. The markup must be the first char. A quote is another type of block. This markup is also useful for forcing a blank line. To make a blank line put the markup in the first char of an otherwise empty line.

"Hey!
Jack said. But it was quiet.

"
Eventually there was a sound.

  • Highlighted text: |

Put pipes around any word or words to highlight them. Assuming | is used for both highlights and blocks, if a highlight begins with the first word of a paragraph it looks like a block. In that case use a double highlight mark, as in:

||some highlighted words| that start a line.

There are more |highlighted words| in this line.
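
To make the interplay between blocks, quotes, and highlights concrete, here is a small sketch of the rules for body lines. The function names are hypothetical and the logic is inferred from the descriptions above, not from catbook's source. Stripping the leading quote char from the returned text is also an assumption.

```python
from __future__ import annotations


def classify_body_line(line: str) -> tuple[str, str]:
    """Return (kind, text); kind is 'block', 'quote', 'blank' or 'paragraph'."""
    if line.startswith("||"):
        # Double mark: a paragraph that opens with a highlight, not a block.
        # Drop one pipe so the remaining pipes pair up as a highlight.
        return "paragraph", line[1:]
    if line.startswith("|"):
        return "block", line[1:]
    if line.startswith('"'):
        rest = line[1:]
        # A quote mark on an otherwise empty line forces a blank line.
        return ("quote", rest) if rest.strip() else ("blank", "")
    return "paragraph", line


def split_highlights(text: str) -> list[tuple[str, bool]]:
    """Split text on pipes into (run, is_highlighted) pairs."""
    runs: list[tuple[str, bool]] = []
    for index, part in enumerate(text.split("|")):
        if part:
            # Odd-numbered parts sit between a pair of pipes.
            runs.append((part, index % 2 == 1))
    return runs
```

Classifying the line first and splitting highlights second keeps the two uses of `|` from interfering with each other.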

Usage

For usage, see main.py and/or test/test_builder.py.

This code creates a docx file called My Book.docx in the working directory. It reads charles.bookfile to determine which text files to concatenate. The text files live in the directories below test/config/texts/charles and the bookfile refers to them relative to that path.

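from catbook import Builder

def main():
    builder = Builder()
    builder.init()
    # Path of the docx file to write
    builder.files.OUTPUT = "./My Book.docx"
    # The bookfile that lists the text files to concatenate
    builder.files.INPUT = "test/config/charles.bookfile"
    # Base directory that the paths in the bookfile are relative to
    builder.files.FILES = "test/config/texts/charles"

    builder.build()
    # Word count collected in the book's metadata during the build
    print(f"words: {builder.book.metadata.word_count}")

if __name__ == "__main__":
    main()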
