Deterministic CLI that prepares Markdown and EPUB documents for translation by a coding agent.

booktx

booktx is a deterministic local CLI that prepares Markdown and EPUB documents for translation by a coding agent or human translator.

It:

  • extracts source text into stable record chunks,
  • tracks progress and translation versions,
  • hands out safe translation tasks,
  • validates submissions,
  • rebuilds translated output.

booktx never translates text itself and makes no network calls.

Install

From a source checkout:

pip install -e .

For development and docs:

python -m pip install -e ".[dev,docs]"

Python 3.10+ is supported.

Core model

Profile = hard boundary for mutable translation state
Access mode = determines whether sibling profiles are visible
Version = history/candidate boundary inside that profile

.booktx/ now holds only shared source-derived state. Mutable translation state lives under translations/<profile>/.

Project layout

book/
  source/
    book.epub

  .booktx/
    source-config.toml
    source-manifest.json
    names.json
    chapter-map.json
    profile-state.json
    chunks/

  translations/
    de_gpt5_5/
      .booktx-profile.json
      config.toml
      identity.json
      context.json
      context.md
      translation-store.json
      translation-version-ledger.json
      tasks/
      ingest/
      translated/
      reports/
      output/
        book.de.epub

Quickstart

booktx init ./demo --source-file book.epub --source-lang en
booktx extract ./demo

booktx profile create ./demo de_gpt5_5 \
  --target de \
  --target-locale de-DE \
  --model codex-openai/gpt-5.5@low \
  --select

booktx context init ./demo --profile de_gpt5_5 --non-interactive
booktx context questions ./demo --profile de_gpt5_5
# Ask the user to approve or edit answers before continuing.
booktx context approve ./demo --profile de_gpt5_5 Q001 --text "<USER_APPROVED_TEXT>" --approved-by "user:<USER>"
booktx context render ./demo --profile de_gpt5_5 --write
booktx context mark-ready ./demo --profile de_gpt5_5

booktx translate next ./demo \
  --profile de_gpt5_5 \
  --unit batch \
  --max-words 800 \
  --format block

booktx translate insert ./demo \
  --profile de_gpt5_5 \
  --task-id TASK \
  --file translations/de_gpt5_5/ingest/TASK.block.txt \
  --format block

booktx validate ./demo --profile de_gpt5_5
booktx build ./demo --profile de_gpt5_5

Collaborative vs isolated profile-root mode

booktx supports two deliberate access modes:

  1. Collaborative project-root mode: start the harness at the book project root when you need profile administration, profile comparison, or other cross-profile work.
  2. Isolated profile-root mode: start the harness inside translations/<profile>/ when you want unbiased model evaluation without normal booktx workflows revealing sibling profiles.

Profile-root isolation is booktx-mediated isolation, not OS sandboxing. It depends on two things:

  • the harness starts inside translations/<profile>/ and blocks parent paths, absolute paths, sibling profile paths, shell globs, and arbitrary filesystem inspection snippets;
  • booktx commands are used with project argument . and do not print parent or sibling paths.

If a profile-root command suggests ../, prints an absolute path, or reveals a sibling profile, stop and report a booktx isolation bug.
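The harness side of this contract can be approximated with a simple path filter. The sketch below is not part of booktx: the function name `is_profile_local` is hypothetical, and it assumes POSIX-style paths. It rejects empty paths, absolute paths, home-relative paths, any `..` component (from a profile root, the only way to reach sibling profiles), and shell glob characters.

```python
from pathlib import PurePosixPath


def is_profile_local(path: str) -> bool:
    """Hypothetical harness-side check (not a booktx API).

    Returns True only for relative paths that stay inside the current
    profile root: no absolute or home-relative paths, no `..` components,
    and no shell glob characters.
    """
    if not path or path.startswith("~"):
        return False
    if any(ch in path for ch in "*?["):
        return False
    p = PurePosixPath(path)
    if p.is_absolute():
        return False
    # PurePosixPath keeps `..` as its own component, so this catches
    # `../sibling/...` as well as `ingest/../../source/...`.
    return ".." not in p.parts
```

Matching on path components rather than substrings keeps harmless names such as `ingest/v1..2.txt` usable while still rejecting `ingest/../..`.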

Isolated evaluation workflow

From book/translations/de_gpt5_5/:

booktx mode .
booktx doctor isolation .
booktx source status .
booktx context status .
booktx translate next . --unit batch --max-words 800 --format block
booktx translate insert . --task-id TASK --file ingest/TASK.block.txt --format block
booktx validate .
booktx build .

In this mode, booktx automatically binds the current profile, brokers source access internally, and renders profile-local paths such as tasks/..., ingest/..., reports/..., and output/....

Bounded agent runs

When asking an agent to continue for several chapters, create a durable todo:

booktx translate todo-next ./demo \
  --profile de_gpt5_5 \
  --chapters 3 \
  --batch-words 800 \
  --max-run-words 12000 \
  --write

This writes a todo file (not translations) under translations/<profile>/todos/. Continue bounded runs with:

booktx translate todo-status ./demo --profile de_gpt5_5 --latest
booktx translate todo-resume ./demo --profile de_gpt5_5 --latest --format block
booktx check ./demo --profile de_gpt5_5 --fail-on-warnings

Single large chapters

If the user asks to finish a single chapter whose length exceeds the safe task budget (800 source words by default), booktx automatically creates or reuses a single-chapter todo and returns bounded batch tasks:

booktx translate next ./demo --chapter 0005 --unit chapter --max-words 800

This creates a todo for chapter 0005 and returns the first bounded batch. Continue with booktx translate todo-resume until the chapter completes.

To override this behavior and force a whole-chapter task:

booktx translate next ./demo --chapter 0005 --unit chapter --force-chapter
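This README does not describe how booktx groups records into batches. As a rough mental model, here is a minimal greedy sketch, assuming records are the unit of work and are never split. `Record` and `plan_batches` are hypothetical names, and the record ids only mimic the chapter-record style used in the examples.

```python
from dataclasses import dataclass


@dataclass
class Record:
    record_id: str  # e.g. "0005-000001" (illustrative id format)
    text: str


def plan_batches(records: list[Record], max_words: int = 800) -> list[list[str]]:
    """Group consecutive records into batches of at most `max_words` source words.

    Records are never split, so a single record longer than `max_words`
    ends up in a batch of its own.
    """
    batches: list[list[str]] = []
    current: list[str] = []
    current_words = 0
    for rec in records:
        words = len(rec.text.split())
        # Close the current batch if adding this record would exceed the budget.
        if current and current_words + words > max_words:
            batches.append(current)
            current, current_words = [], 0
        current.append(rec.record_id)
        current_words += words
    if current:
        batches.append(current)
    return batches
```

Keeping records whole is what preserves stable record ids; the cost is that an unusually long record can exceed the budget on its own.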

After each chapter, run booktx check before adding the chapter note:

booktx check ./demo --profile de_gpt5_5 --chapter 0005 --fail-on-warnings
booktx context chapter-note ./demo --profile de_gpt5_5 0005 ...

--max-run-words is advisory only: it tells the agent when to stop and report progress, but booktx does not hard-stop accepted work at that threshold. Prefer batches over chapter-sized tasks.

Chapter notes do not create a new dotted translation version. Dotted versions track baseline policy changes such as style, glossary, answered questions, global rules, readiness, source metadata, language metadata, or actor/model track changes.

Final release output

For final release output, prefer:

booktx validate ./demo --profile de_gpt5_5 --fail-on-warnings
booktx build ./demo --profile de_gpt5_5 --require-complete

Editor QA indexes

Refresh editor-friendly indexes:

booktx translate export-index ./demo --profile de_gpt5_5

This writes:

  • translations/de_gpt5_5/source-index.json -- source text only, best for reading/searching the original source inside the profile, including isolated profile runs.
  • translations/de_gpt5_5/target-index.json -- target text only, best for searching translated terms without English source false positives.
  • translations/de_gpt5_5/source-target-index.json -- slim source/target side-by-side view, best for scanning translation fit in an editor.

For example:

# Search only the original source language.
rg "Wasp" translations/de_gpt5_5/source-index.json

# Search only translated German target text.
rg "Wespen" translations/de_gpt5_5/target-index.json

# Scan source and target side by side.
nvim translations/de_gpt5_5/source-target-index.json

# Inspect canonical state for a hit.
booktx translation get-record ./demo 0014-000029 --profile de_gpt5_5 --json

All three files are generated artifacts. Do not edit them manually. The canonical state remains translation-store.json.

Pass-through validation profile

Use a pass-through profile to verify that extraction and EPUB reconstruction include all text before doing real translation:

booktx extract ./demo
booktx pass-through ./demo --profile passthrough_en --create

This writes source-as-target translated chunks under translations/passthrough_en/translated/, validates complete coverage, and builds translations/passthrough_en/output/.... Compare the output EPUB against source/book.epub with an EPUB diff viewer. The included EPUB fixture should be byte-identical, but real-world EPUBs should be treated as reconstruction checks, not guaranteed byte-for-byte copies. Never run pass-through against a real translation profile.
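Before opening a diff viewer, a quick member-by-member comparison can narrow down where a reconstruction differs. This is a standalone sketch, not a booktx command, and `diff_epub_members` is a hypothetical helper. It relies only on the fact that an EPUB is a ZIP container and compares uncompressed member bytes, so it ignores compression or ZIP timestamp differences that a strict byte-for-byte file comparison would flag.

```python
import hashlib
import zipfile


def diff_epub_members(source_path: str, output_path: str) -> dict[str, list[str]]:
    """Compare two EPUB (ZIP) archives member by member.

    Returns member names that exist only in the source ("missing"), only in
    the output ("extra"), or in both with different uncompressed bytes
    ("changed").
    """

    def digests(path: str) -> dict[str, str]:
        with zipfile.ZipFile(path) as zf:
            return {
                name: hashlib.sha256(zf.read(name)).hexdigest()
                for name in zf.namelist()
                if not name.endswith("/")  # skip directory entries
            }

    src, out = digests(source_path), digests(output_path)
    return {
        "missing": sorted(src.keys() - out.keys()),
        "extra": sorted(out.keys() - src.keys()),
        "changed": sorted(n for n in src.keys() & out.keys() if src[n] != out[n]),
    }
```

Point it at source/book.epub and the EPUB written under translations/passthrough_en/output/; empty lists for all three keys mean every member matches.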

Multiple profiles

Create one profile per target language, model experiment, or hard-isolated context experiment. Two profiles can target the same language with different models, or the same model with different languages:

booktx profile create ./demo de_gpt5_5 --target de --model codex-openai/gpt-5.5@low
booktx profile create ./demo de_glm_5_2 --target de --model glm-5.2
booktx profile create ./demo fr_gpt5_5 --target fr --model codex-openai/gpt-5.5@low

Profile resolution

When a command needs a single profile, booktx resolves it in this order:

  1. --profile, if given;
  2. otherwise the active profile;
  3. otherwise the only profile, if exactly one exists;
  4. otherwise, commands that operate on target state fail.

If a project has more than one profile, always pass --profile.

Live identity

profile list and profile show render the current identity from translations/<profile>/identity.json, which is updated by booktx model set, booktx actor set, and booktx harness set. The identity embedded in config.toml is only the initial default captured when the profile was created.

Legacy projects

Old single-layout projects can be migrated in place:

booktx profile migrate-current ./demo de_gpt5_5 --select

CLI identity overrides (--model, --actor, --harness) are honored over any legacy .booktx/identity.json.

Common commands

booktx status ./demo
booktx status ./demo --profile de_gpt5_5
booktx mode ./demo
booktx profile list ./demo
booktx profile show ./demo de_gpt5_5
booktx whoami ./demo --profile de_gpt5_5
booktx version current ./demo --profile de_gpt5_5
booktx translate task-status ./demo --profile de_gpt5_5 --task-id TASK
booktx translation compare ./demo --profile de_gpt5_5 74@38 --versions 1.1,1.2
booktx profile compare ./demo --profiles de_gpt5_5,de_glm_5_2 --record 0001-000001
booktx source status ./demo

booktx translate next also snapshots the exact effective task context under translations/<profile>/context-history/views/<sha>/. New tasks carry both the baseline version (for example 1.2) and the immutable context-view evidence used for that task, and accepted candidates preserve that evidence.
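The README does not say what the <sha> in context-history/views/<sha>/ is computed over. One common way to get an immutable, content-addressed snapshot id is to hash a canonical serialization of the context. The sketch below assumes canonical JSON plus SHA-256; `context_view_id` and the context keys are hypothetical.

```python
import hashlib
import json


def context_view_id(context: dict) -> str:
    """Derive a stable id for an effective task context (illustrative only).

    Serializes the context as canonical JSON (sorted keys, no insignificant
    whitespace) and hashes it with SHA-256, so identical contexts map to the
    same id regardless of key order.
    """
    canonical = json.dumps(context, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because the id depends only on content, two tasks that saw the same effective context share one snapshot, and any change to the context yields a new one.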

Translation contract

  • record ids must stay unchanged;
  • placeholders must stay unchanged;
  • targets must be non-empty;
  • submissions must stay in the selected profile;
  • translations/<profile>/translation-store.json is the primary record-level state;
  • translations/<profile>/translated/*.json is compatibility/export output.
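The record-level rules above can be checked mechanically. This sketch is not booktx's validator: `check_submission` is a hypothetical helper, and the `{{name}}` placeholder syntax is an assumption, since this README does not document booktx's placeholder format.

```python
import re

# Hypothetical placeholder syntax; adjust to the format your chunks actually use.
PLACEHOLDER = re.compile(r"\{\{[^{}]+\}\}")


def check_submission(source: dict[str, str], target: dict[str, str]) -> list[str]:
    """Return contract violations for a submission (an empty list means OK).

    `source` maps the task's record ids to source text; `target` maps the
    submitted record ids to translated text.
    """
    errors: list[str] = []
    errors += [f"{rid}: missing from submission" for rid in sorted(source.keys() - target.keys())]
    errors += [f"{rid}: unknown record id" for rid in sorted(target.keys() - source.keys())]
    for rid in sorted(source.keys() & target.keys()):
        text = target[rid]
        if not text.strip():
            errors.append(f"{rid}: empty target")
            continue
        # Compare placeholders as multisets: word order may change in translation.
        if sorted(PLACEHOLDER.findall(source[rid])) != sorted(PLACEHOLDER.findall(text)):
            errors.append(f"{rid}: placeholders changed")
    return errors
```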

Documentation

Context approval

booktx never decides translation policy by itself. An agent may propose context answers, but the user must approve them before translation begins. Do not use context mark-ready --force during normal translation work.

EPUB inline XHTML records

EPUB records may expose constrained inline XHTML fragments such as <em>, <strong>, <span class="...">, <a href="...">, <sup>, <sub>, or <code>. Translators must preserve tags and attributes around the equivalent target-language phrase and must not replace XHTML with Markdown markers.
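A translator or agent can self-check this rule before submitting. The sketch below is a hypothetical helper, not part of booktx: it uses Python's standard `html.parser` to collect every start tag with its attributes and compares the two fragments as multisets, so reordering tags around the equivalent phrase is allowed but dropping them, changing an attribute, or swapping `<em>` for Markdown `*...*` is not. It checks start tags only, not nesting or end tags.

```python
from collections import Counter
from html.parser import HTMLParser


class _TagCollector(HTMLParser):
    """Counts (tag, sorted attributes) for every start tag in a fragment."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: Counter = Counter()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags also reach this method via the default
        # handle_startendtag implementation.
        self.tags[(tag, tuple(sorted(attrs)))] += 1


def inline_markup_preserved(source: str, target: str) -> bool:
    """True if target has the same inline tags and attributes as source, in any order."""

    def collect(fragment: str) -> Counter:
        parser = _TagCollector()
        parser.feed(fragment)
        parser.close()
        return parser.tags

    return collect(source) == collect(target)
```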

Quality review commands

Quality review is an optional workflow that improves already-accepted translations:

  • booktx review status . -- report review coverage
  • booktx review next . --pass 1 -- create a review task for pass 1
  • booktx review insert . --review-task-id TASK --file reviews/TASK.block.txt -- accept review results
  • booktx review activate . RECORD R1.2 -- manually activate a review candidate

Review candidates are stored separately from translation versions in translations/<profile>/reviews/. The effective output resolves as active_review (if valid) -> active_version -> missing.
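The fallback chain for the effective output can be expressed as a small function. This is a sketch of the documented order only; `RecordState`, its fields, and `effective_output` are hypothetical names, not booktx's data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecordState:
    active_review: Optional[str] = None   # text of the active review candidate, if any
    review_valid: bool = False            # whether that review candidate passed validation
    active_version: Optional[str] = None  # text of the active translation version, if any


def effective_output(state: RecordState) -> Optional[str]:
    """active_review (if valid) -> active_version -> missing (None)."""
    if state.active_review is not None and state.review_valid:
        return state.active_review
    if state.active_version is not None:
        return state.active_version
    return None
```

An invalid review candidate never masks the accepted translation; the record only reads as missing when neither exists.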

Enable quality review by adding a [quality_review] section to the profile's config.toml. See docs/profiles.md for the configuration reference.

Glossary correction

# Fix wrong forbidden targets (the new --forbid values replace the existing ones instead of being appended).
booktx context add-term . "empire" --target "Imperium" --forbid "Reich" --forbid "Empire"

# Remove a wrong entry.
booktx context remove-term . "empire"
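The "replaces, doesn't append" behavior can be pictured with an in-memory glossary. This is a hypothetical model, not booktx's storage format: `add_term` overwrites the whole entry for a source term, and `remove_term` deletes it (ignoring unknown terms is an assumption here).

```python
def add_term(glossary: dict[str, dict], source: str, target: str, forbid: list[str]) -> None:
    """Insert or fully replace the entry for `source`; old forbidden targets are not kept."""
    glossary[source] = {"target": target, "forbid": list(forbid)}


def remove_term(glossary: dict[str, dict], source: str) -> None:
    """Delete the entry for `source`; unknown terms are ignored in this sketch."""
    glossary.pop(source, None)
```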

Download files

Source Distribution

booktx-0.3.0.tar.gz (279.9 kB)

Uploaded Source

Built Distribution

booktx-0.3.0-py3-none-any.whl (188.4 kB)

Uploaded Python 3

File details

Details for the file booktx-0.3.0.tar.gz.

File metadata

  • Download URL: booktx-0.3.0.tar.gz
  • Upload date:
  • Size: 279.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for booktx-0.3.0.tar.gz
Algorithm Hash digest
SHA256 46aecb668fb52d76b2dc04745f0a5d7156b479867138ac185050b06a8fe735e8
MD5 7adcab09a676ffa04c7e37c63c61ba6e
BLAKE2b-256 67598fe663d7bd8d5416a3f11406ea93e9ad9180252deec3993218ba2c97c63a

File details

Details for the file booktx-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: booktx-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 188.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for booktx-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 d3b1c6dfb8846b8e2b9115ff6de65975c66830a23c1ea23ea7dc37a28d1cb0be
MD5 01528222c0180932d8ee4d722c834553
BLAKE2b-256 d668dd5021d5b403186f8406b30f668c0f63d947c206a2062c5499ce6d7f1b9d
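To check a downloaded file against the SHA256 digests listed above, stream it through `hashlib`. This is a generic standard-library sketch; `sha256_of`, `verify`, and `EXPECTED` are illustrative names, and the digests are copied from the tables above.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# SHA256 digests published for the 0.3.0 release files.
EXPECTED = {
    "booktx-0.3.0.tar.gz": "46aecb668fb52d76b2dc04745f0a5d7156b479867138ac185050b06a8fe735e8",
    "booktx-0.3.0-py3-none-any.whl": "d3b1c6dfb8846b8e2b9115ff6de65975c66830a23c1ea23ea7dc37a28d1cb0be",
}


def verify(path: str, filename: str) -> bool:
    """True if the file at `path` matches the published digest for `filename`."""
    return sha256_of(path) == EXPECTED[filename]
```

pip can enforce the same check at install time through its hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).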
