OOXML-level extensions for python-docx: style cascade, controls, fields, comments, layout, bookmarks, notes, publishing.

docx_plus

OOXML-level extensions for python-docx. Composes with python-docx rather than replacing it: callers keep their Document object and use docx_plus for the operations python-docx can't reach.

Capabilities (v0.1, v0.2, and v0.2 in-place expansion):

  • Style cascade: read the effective formatting that would apply to any paragraph/run/cell, with per-field provenance; modify styles in the Word-native way rather than scattering direct formatting.
  • Content controls: build text / dropdown / date / checkbox controls with FormBuilder; read their values back; round-trip them through save/reopen.
  • Fields: insert PAGE / NUMPAGES / DATE / generic complex fields; mark fields dirty so Word recalculates them on next open.
  • Protection: enforce form-fill, read-only, comments-only, or tracked-changes mode at the document level.
  • Anchored comments (v0.2): the body-side range markers python-docx skips, so "show in document" actually works.
  • Layout: multi-column sections, mid-document section breaks, distinct even/odd headers (v0.2).
  • Bookmarks + cross-references: paired body markers plus REF / PAGEREF fields (v0.2).
  • Footnotes + endnotes: insert-only API backed by the separate footnotes.xml / endnotes.xml parts; in-place edits via edit_footnote / edit_endnote (v0.2).
  • Layout extras (continued): line numbering (set_line_numbering) and page borders (set_page_borders + Border dataclass) (v0.2).
  • Conditional table-style formatting: the cascade resolver applies <w:tblStylePr> branches (firstRow, lastRow, banded fills, corners) in ECMA-376 17.7.6.5 precedence order (v0.2).
  • Publishing primitives: Table of Contents (add_toc), figure / table captions (add_caption), Table of Figures (add_table_of_figures) (v0.2).

Status: v0.2.0 is the current release, published on PyPI. Read SPEC.md for the API contract and IMPLEMENTATION.md for the build plan.

Install (development)

git clone https://github.com/thomas-villani/docx-plus.git
cd docx-plus
uv sync --extra dev      # or: pip install -e ".[dev]"

60-second quickstart

Inspect: why does this paragraph look the way it does?

from docx import Document
from docx_plus.styles import resolve_effective_formatting

doc = Document("report.docx")
p = doc.paragraphs[0]

resolved = resolve_effective_formatting(p, include_provenance=True)
print(resolved.style_name)              # e.g. "Title"
print(resolved.font_size)               # e.g. 28.0  (points)
print(resolved.bold)                    # True / False / None
print(resolved.provenance["font_size"]) # FormattingSource(layer='paragraphStyle', ...)

ResolvedFormatting carries every formatting field that the OOXML cascade can set: font_name, font_size, bold, italic, color_rgb, alignment, indent_*, spacing_*, line_spacing, plus run-level toggles. With include_provenance=True, every populated field is keyed in .provenance to the cascade layer (and style ID) that contributed it. That's how you answer "why is this paragraph 14pt italic?": the provenance tells you exactly which style in the basedOn chain set the size, and whether the italic is the result of the XOR rule that OOXML applies to toggle properties such as bold and italic.
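
The XOR remark refers to OOXML toggle properties: when bold or italic is set at more than one level of the style hierarchy, the levels combine by exclusive-or instead of the last one winning. Below is a minimal sketch of that rule under the reading of ECMA-376 17.7.3 assumed here. It is independent of docx_plus; resolve_toggle and its input shape are hypothetical, and direct formatting (which simply overrides) is left out.

```python
from functools import reduce
from operator import xor


def resolve_toggle(doc_default: bool, style_levels: list) -> bool:
    """Combine one toggle property (e.g. italic) across style levels.

    style_levels holds the value each applicable style level contributes,
    for instance [paragraph_style_value, character_style_value].
    """
    if doc_default:
        # A toggle switched on in the document defaults stays on.
        return True
    # Otherwise the toggle is on when an odd number of levels switch it on.
    return reduce(xor, style_levels, False)


print(resolve_toggle(False, [True]))        # True
print(resolve_toggle(False, [True, True]))  # False
```

Two levels that both ask for italic cancel out, which is the kind of surprise a per-field provenance record helps to explain.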

Modify: define a custom heading and apply it

from docx import Document
from docx_plus.styles import create_style, apply_style

doc = Document()
create_style(
    doc, "BrandHeading",
    style_type="paragraph",
    based_on="Heading1",
    font_name="Inter",
    font_size=18.0,
    color_rgb="2F5496",
    bold=True,
    spacing_after=240,
)

p = doc.add_paragraph("Hello, world")
apply_style(p, "BrandHeading")
doc.save("out.docx")

This is the Word-native workflow: define a style, apply it. Changing the style later changes every paragraph that uses it, not just the ones you remember to update.

Ensure: materialise a built-in latent style

Word's built-ins (Heading1–Heading9, Title, Quote, TOC1–TOC9, FootnoteText, BlockText, PlainText, …) are latent: defined by Word's defaults but not actually present in styles.xml until they're used. ensure_style knows about 107 of them, with defaults extracted from real Word-saved samples (not guessed):

from docx import Document
from docx_plus.styles import ensure_style, apply_style

doc = Document()
ensure_style(doc, "Heading1")           # idempotent — materialises if absent
ensure_style(doc, "Heading1")           # ...no-op the second time
ensure_style(doc, "TOC2")               # also works for less-common built-ins
ensure_style(doc, "BlockText")
apply_style(doc.add_paragraph("Intro"), "Heading1")

The full list is tiered in Architecture §5 — Core/A–G cover essentially every style a Word user reaches for.

For documents authored elsewhere, where style IDs may not match (e.g. a style named "Heading 1" with a space), ensure_style(doc, "Heading1", match_existing=True) finds the existing definition via case- and space-insensitive matching. For document-wide normalisation, use remap_styles.
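
To make "case- and space-insensitive matching" concrete, here is a rough sketch of one way such a comparison could work. It is an illustration only, not docx_plus's actual matching code; style_key and find_matching_style are hypothetical names.

```python
from typing import Optional


def style_key(name: str) -> str:
    """Fold a style name or ID to a comparison key: no whitespace, case-folded."""
    return "".join(name.split()).casefold()


def find_matching_style(wanted: str, existing: list) -> Optional[str]:
    """Return the first existing name whose folded key equals the wanted one."""
    key = style_key(wanted)
    return next((name for name in existing if style_key(name) == key), None)


print(find_matching_style("Heading1", ["Normal", "heading 1", "Title"]))  # heading 1
```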

Forms: build a fillable document with FormBuilder

from docx_plus.controls import FormBuilder

fb = FormBuilder()  # or FormBuilder("template.docx")
fb.doc.add_heading("New employee form", level=1)

p = fb.doc.add_paragraph("Full name: ")
fb.add_text_control(p, tag="full_name", placeholder="Type your name")

p = fb.doc.add_paragraph("Department: ")
fb.add_dropdown(p, tag="dept", items=["Engineering", "Design", "Ops"])

p = fb.doc.add_paragraph("Start date: ")
fb.add_date_picker(p, tag="start_date", date_format="M/d/yyyy")

p = fb.doc.add_paragraph("Remote? ")
fb.add_checkbox(p, tag="remote", checked=False)

fb.save("form.docx")

Read or update an existing form's values with read_controls / set_control_value:

from docx import Document
from docx_plus.controls import read_controls, set_control_value

doc = Document("form.docx")
set_control_value(doc, "full_name", "Ada Lovelace")
set_control_value(doc, "dept", "Engineering")
doc.save("form_filled.docx")

values = read_controls(Document("form_filled.docx"))
print(values["full_name"].value)   # 'Ada Lovelace'
print(values["dept"].value)        # 'Engineering'

Fields and protection: page numbers + lock-down

from docx import Document
from docx_plus.fields import add_page_number_field, mark_fields_dirty
from docx_plus.protection import protect_document

doc = Document()
p = doc.add_paragraph("Page ")
add_page_number_field(p)
p.add_run(" of ")
add_page_number_field(p, field="NUMPAGES")

mark_fields_dirty(doc)               # Word recalculates fields on open
protect_document(doc, mode="forms")  # only content controls editable

doc.save("report.docx")

add_date_field and the generic add_field(instruction=..., initial_text=...) cover dates and any other complex field (TOC, REF, MERGEFIELD, …). unprotect_document(doc) removes any protection; is_protected(doc) is a one-liner predicate.
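
For readers unfamiliar with what a "complex field" is at the XML level, the sketch below builds the run sequence WordprocessingML uses for one: a begin marker, the instruction text, a separate marker, a cached result, and an end marker. It uses only the standard library and is not how docx_plus is implemented; complex_field_runs and fld_char_run are hypothetical helpers. The dirty attribute on the begin marker is what asks Word to recalculate the field on open.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
ET.register_namespace("w", W)


def w(tag: str) -> str:
    """Qualify a local name with the WordprocessingML namespace."""
    return f"{{{W}}}{tag}"


def fld_char_run(kind: str, dirty: bool = False) -> ET.Element:
    """A run holding a single w:fldChar marker (begin / separate / end)."""
    run = ET.Element(w("r"))
    marker = ET.SubElement(run, w("fldChar"), {w("fldCharType"): kind})
    if dirty:
        marker.set(w("dirty"), "true")
    return run


def complex_field_runs(instruction: str, initial_text: str, dirty: bool = True) -> list:
    """Build the five runs of a complex field, in document order."""
    instr_run = ET.Element(w("r"))
    instr = ET.SubElement(instr_run, w("instrText"), {XML_SPACE: "preserve"})
    instr.text = f" {instruction} "
    result_run = ET.Element(w("r"))
    ET.SubElement(result_run, w("t")).text = initial_text  # cached display text
    return [
        fld_char_run("begin", dirty=dirty),
        instr_run,
        fld_char_run("separate"),
        result_run,
        fld_char_run("end"),
    ]


paragraph = ET.Element(w("p"))
paragraph.extend(complex_field_runs("PAGE", "1"))
print(ET.tostring(paragraph, encoding="unicode"))
```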

Comments: anchor reviewer feedback to specific runs

from docx import Document
from docx_plus.comments import add_comment, read_comments

doc = Document()
p = doc.add_paragraph()
p.add_run("Project Apollo ")
target = p.add_run("ships next quarter")
p.add_run(".")

add_comment(target, "Optimistic — let's see what QA says.", author="Alice")
doc.save("review.docx")

for c in read_comments(Document("review.docx")):
    print(f"{c.author}: {c.text!r} on {c.anchored_text!r}")

add_comment accepts a Run, a Paragraph (wraps every run), or a (start_run, end_run) tuple for ranges. Unlike python-docx's Comments.add_comment (which only writes the part-side body), docx_plus writes the three body-side anchors — so "show in document" actually jumps to the right place.
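
The README does not name the three body-side anchors. In WordprocessingML they are normally w:commentRangeStart, w:commentRangeEnd, and a run containing w:commentReference, all sharing the comment's id. The standard-library sketch below shows where they sit relative to the commented run; anchor_comment is a hypothetical helper, not the docx_plus implementation.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)


def w(tag: str) -> str:
    """Qualify a local name with the WordprocessingML namespace."""
    return f"{{{W}}}{tag}"


def anchor_comment(paragraph: ET.Element, run: ET.Element, comment_id: int) -> None:
    """Wrap `run` in range markers and add the reference run after it."""
    ident = {w("id"): str(comment_id)}
    position = list(paragraph).index(run)
    paragraph.insert(position, ET.Element(w("commentRangeStart"), ident))
    # The target run has shifted to position + 1; close the range right after it.
    paragraph.insert(position + 2, ET.Element(w("commentRangeEnd"), ident))
    reference_run = ET.Element(w("r"))
    ET.SubElement(reference_run, w("commentReference"), ident)
    paragraph.insert(position + 3, reference_run)


paragraph = ET.Element(w("p"))
for text in ("Project Apollo ", "ships next quarter", "."):
    ET.SubElement(ET.SubElement(paragraph, w("r")), w("t")).text = text

anchor_comment(paragraph, paragraph[1], comment_id=0)
print([child.tag.split("}")[1] for child in paragraph])
# ['r', 'commentRangeStart', 'r', 'commentRangeEnd', 'r', 'r']
```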

Layout: columns and mid-document section breaks

from docx import Document
from docx_plus.layout import (
    enable_distinct_even_odd_headers,
    insert_section_break,
    set_columns,
)

doc = Document()
doc.add_heading("Intro (single-column)", level=1)
split = doc.add_paragraph("Section break here ↓")

new_section = insert_section_break(split, start_type="continuous")
set_columns(new_section, 2, space=720, separator=True)

doc.add_heading("Body (two-column)", level=1)
for _ in range(10):
    doc.add_paragraph("Lorem ipsum…")

enable_distinct_even_odd_headers(doc)  # doc-level settings.xml flag
doc.save("multicol.docx")

Bookmarks + cross-references

from docx import Document
from docx_plus.bookmarks import add_bookmark, add_cross_reference
from docx_plus.fields import mark_fields_dirty

doc = Document()
heading = doc.add_heading("Introduction", level=1)
add_bookmark(heading, "intro_section")

p = doc.add_paragraph("See ")
add_cross_reference(p, bookmark="intro_section", kind="text")
p.add_run(" on page ")
add_cross_reference(p, bookmark="intro_section", kind="page")

mark_fields_dirty(doc)               # Word recalculates REF / PAGEREF
doc.save("xref.docx")

Footnotes and endnotes

from docx import Document
from docx_plus.notes import add_footnote, add_endnote

doc = Document()
p = doc.add_paragraph("This claim has a footnote")
add_footnote(p, "Sourced from internal benchmarks, 2026-05-19.")
add_endnote(p, "Re-validated against external dataset Q3 2026.")
doc.save("notes.docx")

The footnotes part (word/footnotes.xml) is created on first use and round-trips with parsed XML — re-opening the saved document and adding another footnote inherits the existing ids correctly. Edit existing notes in place via edit_footnote(doc, id, text) / edit_endnote(doc, id, text); the reference marker stays put.
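
One plausible way to "inherit the existing ids" is to read the ids already present in the part and allocate the next one above them. The sketch below illustrates that idea with the standard library; next_note_id is a hypothetical helper, the sample part is trimmed to its id attributes, and docx_plus may allocate ids differently. Word-authored footnote parts typically reserve ids -1 and 0 for the separator entries, so user notes start at 1.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Trimmed sample of word/footnotes.xml; note bodies are omitted for brevity.
SAMPLE = f"""<w:footnotes xmlns:w="{W}">
  <w:footnote w:type="separator" w:id="-1"/>
  <w:footnote w:type="continuationSeparator" w:id="0"/>
  <w:footnote w:id="1"/>
  <w:footnote w:id="2"/>
</w:footnotes>"""


def next_note_id(part_xml: str) -> int:
    """Return an id one above the highest in use, never below 1."""
    root = ET.fromstring(part_xml)
    ids = [int(note.get(f"{{{W}}}id")) for note in root.findall(f"{{{W}}}footnote")]
    return max(max(ids, default=0) + 1, 1)


print(next_note_id(SAMPLE))  # 3
```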

Line numbering and page borders

from docx import Document
from docx_plus.layout import Border, set_line_numbering, set_page_borders

doc = Document()
set_line_numbering(doc.sections[0], count_by=5, restart="newPage")

rule = Border(style="single", size=8, color="2F5496", space=24)
set_page_borders(
    doc.sections[0], top=rule, bottom=rule, left=rule, right=rule,
)
doc.save("formal.docx")
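
The README does not state the units of the integer arguments used in these examples. If they are passed through unchanged to the underlying OOXML attributes (an assumption, not something the README confirms), then spacing and column gaps are in twentieths of a point and border widths are in eighths of a point. The conversions are simple arithmetic; the helper names below are hypothetical.

```python
TWIPS_PER_POINT = 20    # OOXML spacing and column gaps: twentieths of a point
TWIPS_PER_INCH = 1440   # 72 points per inch * 20
EIGHTHS_PER_POINT = 8   # OOXML border width: eighths of a point


def twips_to_points(twips: int) -> float:
    return twips / TWIPS_PER_POINT


def twips_to_inches(twips: int) -> float:
    return twips / TWIPS_PER_INCH


def border_size_to_points(eighths: int) -> float:
    return eighths / EIGHTHS_PER_POINT


print(twips_to_points(240))      # 12.0  (spacing_after=240)
print(twips_to_inches(720))      # 0.5   (set_columns space=720)
print(border_size_to_points(8))  # 1.0   (Border size=8)
```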

Publishing: TOC, captions, Table of Figures

from docx import Document
from docx_plus.fields import mark_fields_dirty
from docx_plus.publishing import add_caption, add_table_of_figures, add_toc

doc = Document()
doc.add_heading("Contents", level=1)
add_toc(doc.add_paragraph(), levels=(1, 2))

doc.add_heading("Architecture", level=1)
doc.add_paragraph("High-level diagram below.")
cap = doc.add_paragraph()
add_caption(cap, "Figure ", caption_type="Figure")
cap.add_run(": System overview.")

doc.add_heading("List of Figures", level=1)
add_table_of_figures(doc.add_paragraph(), caption_type="Figure")

mark_fields_dirty(doc)  # Word populates TOC / SEQ / ToF on open
doc.save("paper.docx")

What's next

v0.2 ships the feature modules listed at the top of this README, plus the in-place expansion (line numbering, page borders, conditional table-style formatting, comment / note editing, and the publishing module). The v0.3 deferred list (SPEC §15) tracks what comes after — w15 threaded comments (respond / resolve / reopen), STYLEREF / sequence-field cross-references, a CLI (restyle / inspect / controls), content-control data binding to Custom XML Parts, bibliography (citations + BIBLIOGRAPHY field), tracked changes, glossary placeholder text, and password-protected forms. Open an issue if your use case needs any of these.

Build history (for contributors)
  • v0.1.0 — complete: foundation (core/), style inspection + modification + remapping (styles/), content controls (controls/), fields + document protection (fields/, protection/), and release polish (examples, LibreOffice smoke tests, CI doc build).
  • v0.2.0 — complete: core/parts, comments/, layout/, bookmarks/, notes/, plus the in-place expansion (toggle properties, in-place edit verbs, line numbering, page borders, conditional table styles, and the publishing/ module).

The per-phase log with dates lives in IMPLEMENTATION.md §12.

Documentation

Full docs (rendered by MkDocs + mkdocstrings) are published at https://thomas-villani.github.io/docx-plus/.

  • Architecture — module layout, cascade algorithm, schema-strict insertion, error hierarchy, invariants
  • API Index — hand-curated index of every public symbol with links to the auto-generated reference
  • Agent skill for LLM coding agents: skills/docx-plus/ — point Claude Code (or any agent) at it to generate docx_plus automation. Overview at docs/SKILLS
  • Test Gaps — honest accounting of where the test suite has real holes (snapshot at end of Phase 5)
  • Per-module API reference lives under https://thomas-villani.github.io/docx-plus/reference/; uv run mkdocs serve to browse locally.

License

MIT. Copyright (c) 2026 Tom Villani, PhD. See LICENSE.
