Minimal DOCX rendering core for template, markdown, field refresh, and PDF conversion workflows

docxrender

docxrender is a small Python package for Word-first DOCX rendering.

Its core boundary is intentionally narrow:

file_template + context + markdown_body + DocxStyle -> DOCX -> PDF

The package owns the technical rendering mechanics: DOCX template rendering, markdown body insertion, Word style application, DOCX field handling, and, as the last step, LibreOffice-based PDF conversion. Product repositories own report content, workflow resource layout, section rendering, manifest schemas, figure selection, captions, and delivery directory semantics.

Capabilities

Current package surface:

  • Public style/options/result dataclasses are available.
  • write_docx(...) can create a minimal DOCX from a DOCX template, context, markdown body, image assets, and DocxStyle.
  • Markdown support currently covers a CommonMark-ish shared subset: headings, paragraphs, hard line breaks, unordered lists, ordered lists, pipe tables, images, inline bold, inline code text cleanup, link-text reduction, page breaks, and spacers.
  • Basic Word styling is applied from caller-provided DocxStyle.
  • DOCX field update/freeze behavior is implemented through DOCX XML rewriting.
  • write_docx(...) can optionally refresh TOC/page fields through LibreOffice UNO when DocxFieldRefreshOptions is provided.
  • convert_docx_to_pdf(...) converts through LibreOffice UNO when the external LibreOffice/UNO runtime is available.

Install

pdm install

Runtime dependencies are declared in pyproject.toml:

  • docxtpl
  • python-docx

PDF conversion and DOCX field refresh are optional runtime features that require an external LibreOffice/UNO runtime. To check that the runtime is available:

libreoffice --headless --version
python -c "import uno"

On Debian or Ubuntu, that runtime is typically installed outside Python:

sudo apt install libreoffice python3-uno

Base DOCX writing with field_refresh=None does not import UNO and works without LibreOffice.
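
The same availability check can be done from Python before opting into field refresh or PDF conversion. The following is a minimal sketch using only the standard library; probe_libreoffice_runtime is a hypothetical helper name, not part of docxrender, and the executable name "libreoffice" is an assumption that may differ on some systems (for example "soffice").

```python
import importlib.util
import shutil


def probe_libreoffice_runtime(exe_name: str = "libreoffice") -> dict[str, bool]:
    """Report whether the LibreOffice executable and the uno module are discoverable.

    Hypothetical helper for illustration; docxrender does not export it.
    """
    return {
        # shutil.which returns None when the executable is not on PATH.
        "has_executable": shutil.which(exe_name) is not None,
        # find_spec locates a top-level module without importing it.
        "has_uno": importlib.util.find_spec("uno") is not None,
    }


status = probe_libreoffice_runtime()
print(status)
```

A caller could use the result to decide whether to pass field refresh options at all, or to leave field_refresh=None and stay on the LibreOffice-free path.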

Public API

The stable public API is exported from the package root. Product repositories should prefer DocxRenderer for normal use. The dataclasses and module-level functions remain public for advanced callers that want explicit contracts, configuration adapters, or focused tests. Implementation modules such as docxrender.markdown and docxrender.docx are technical layers and are not compatibility-stable public contracts.

DocxRenderer follows value semantics. Every with_* call returns a new renderer object and leaves the original unchanged. Runtime docxtpl objects, including inline images, are materialized only by terminal methods such as write_docx_template(...), write_docx(...), and write_pdf().
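
The value-semantics contract can be illustrated with a frozen dataclass. This is a sketch of the general pattern only: SketchRenderer and its fields are hypothetical stand-ins, and nothing here claims to mirror how DocxRenderer is implemented internally.

```python
from dataclasses import dataclass, replace
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class SketchRenderer:
    """Hypothetical stand-in for a value-semantics builder; not the real DocxRenderer."""

    file_template: Optional[Path] = None
    should_number_headings: bool = False

    def with_template(self, file_template: Path) -> "SketchRenderer":
        # replace() builds a new frozen instance with one field changed.
        return replace(self, file_template=file_template)

    def with_heading_numbers(self, enabled: bool) -> "SketchRenderer":
        return replace(self, should_number_headings=enabled)


base = SketchRenderer()
configured = base.with_template(Path("template.docx")).with_heading_numbers(True)

# The original object is untouched, so it can be reused as a shared starting point.
print(base.file_template)
print(configured.file_template)
```

Because each with_* call returns a new object, a partially configured renderer can be kept as a shared base and specialized per report without one report's settings leaking into another.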

from docxrender import (
    DocxRenderer,
    DocxBodyAnchorOptions,
    DocxBodyRenderPolicy,
    DocxFieldMarkerOptions,
    DocxFieldRefreshOptions,
    DocxFontStyle,
    DocxHeaderFooterImageOptions,
    DocxMarkdownOptions,
    DocxParagraphStyle,
    DocxSizeStyle,
    DocxStyle,
    DocxTableStyle,
    DocxTemplateContextPolicy,
    DocxTemplateImageSpec,
    DocxTemplateRenderOptions,
    DocxWriteOptions,
    write_docx_template,
    write_docx,
)

DocxFieldMarkerOptions controls DOCX field update markers and field freezing without LibreOffice or UNO:

DocxRenderer(file_docx=Path("report.docx")).with_field_update_markers(
    should_update_fields=True,
    should_freeze_fields=False,
).write_docx()
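
For readers unfamiliar with update markers: WordprocessingML lets a document ask Word to refresh its fields on open through a w:updateFields element in word/settings.xml. The sketch below shows that general mechanism on a settings fragment. It is an assumption that docxrender's markers work this way, and request_field_update is a hypothetical name. ElementTree is used for brevity; it rewrites namespace prefixes it does not know and appends the element without regard to schema order, so a real settings.xml would need more careful handling.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)


def request_field_update(settings_xml: bytes) -> bytes:
    """Return settings XML with <w:updateFields w:val="true"/> present.

    Hypothetical helper; illustrates the marker only, not docxrender internals.
    """
    root = ET.fromstring(settings_xml)
    tag = f"{{{W_NS}}}updateFields"
    node = root.find(tag)
    if node is None:
        # Add the marker once; repeated calls reuse the existing element.
        node = ET.SubElement(root, tag)
    node.set(f"{{{W_NS}}}val", "true")
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


sample_settings = (
    b'<w:settings xmlns:w="http://schemas.openxmlformats.org/'
    b'wordprocessingml/2006/main"/>'
)
updated_settings = request_field_update(sample_settings)
print(updated_settings.decode("utf-8"))
```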

DocxFieldRefreshOptions is optional. Use it only when the caller has provided a LibreOffice/UNO runtime and wants a DOCX whose TOC, page fields, or other Word fields have been refreshed by LibreOffice:

DocxWriteOptions(
    ...,
    field_refresh=DocxFieldRefreshOptions(
        exe_libreoffice=Path("/usr/bin/libreoffice"),
        dir_user_profile=Path("tmp/lo-profile"),
        should_require_toc=True,
        should_freeze_fields=True,
    ),
)

write_docx_template(...) is the generic docxtpl technical boundary. It renders a DOCX template with caller context, optional inline images, and optional default injections. It does not insert markdown bodies, apply DOCX body styling, or run field/PDF post-processing:

from pathlib import Path

from docxrender import (
    DocxTemplateContextPolicy,
    DocxTemplateRenderOptions,
    write_docx_template,
)

result = write_docx_template(
    DocxTemplateRenderOptions(
        file_template=Path("template.docx"),
        file_out_docx=Path("template-rendered.docx"),
        context={"report_title": "Example Report"},
        context_defaults={"body_anchor": "__REPORT_BODY_ANCHOR__"},
        context_policy=DocxTemplateContextPolicy(
            rule_conflict="caller_wins",
            required_keys=("report_title",),
        ),
    )
)
print(result.file_docx)

DocxTemplateContextPolicy controls:

  • merge behavior between caller context and default injections
  • conflict priority: caller_wins or defaults_win
  • required keys that must exist after merge and before render
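
The three controls above can be sketched as a plain function. This is an illustration of the described behavior, not the package's code: merge_context is a hypothetical name, and the exception types are assumptions (the source does not say which errors docxrender raises).

```python
from typing import Any, Mapping, Sequence


def merge_context(
    context: Mapping[str, Any],
    defaults: Mapping[str, Any],
    rule_conflict: str = "caller_wins",
    required_keys: Sequence[str] = (),
) -> dict[str, Any]:
    """Merge caller context with default injections, then check required keys."""
    if rule_conflict == "caller_wins":
        # Later mappings override earlier ones, so the caller context goes last.
        merged = {**defaults, **context}
    elif rule_conflict == "defaults_win":
        merged = {**context, **defaults}
    else:
        raise ValueError(f"unknown rule_conflict: {rule_conflict!r}")

    # Required keys are checked after the merge and before any rendering.
    missing = [key for key in required_keys if key not in merged]
    if missing:
        raise KeyError(f"missing required context keys: {missing}")
    return merged


merged = merge_context(
    context={"report_title": "Example Report", "body_anchor": "CALLER"},
    defaults={"body_anchor": "__REPORT_BODY_ANCHOR__"},
    rule_conflict="caller_wins",
    required_keys=("report_title",),
)
print(merged["body_anchor"])
```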

Minimal DocxRenderer DOCX write example:

from pathlib import Path

from docxrender import DocxRenderer

result = (
    DocxRenderer()
    .with_template(
        file_template=Path("template.docx"),
        context={"report_title": "Example Report"},
        rule_conflict="caller_wins",
        required_keys=("report_title",),
    )
    .with_fonts(
        font_name_latin="Times New Roman",
        font_name_body_east_asia="宋体",
        font_name_heading_east_asia="宋体",
    )
    .with_sizes(
        pt_title_page_title=36.0,
        pt_title_page_meta=18.0,
        pt_title_page_compiler=15.0,
        pt_body=12.0,
        pt_caption=10.5,
        pt_table=12.0,
        pt_heading_by_level={1: 16.0, 2: 14.0, 3: 12.0},
    )
    .with_table(
        border_color="000000",
        stripe_fill_color="D9D9D9",
        border_size_main="12",
        border_size_header="6",
        line_spacing=1.5,
    )
    .with_paragraph(
        line_spacing_body=1.5,
        line_spacing_note=1.2,
        first_line_indent_cm=0.74,
    )
    .with_header_footer_images(
        file_header_image=Path("header.png"),
        file_footer_image=Path("footer.png"),
        idx_section_start=1,
    )
    .with_markdown(
        should_parse_inline_bold=True,
        should_parse_inline_code=True,
        should_parse_links_as_text=True,
        should_parse_image_width_attr=True,
        default_image_width_pct=90.0,
    )
    .with_body_render_policy(
        should_number_headings=False,
        rule_ordered_list="word_style",
        rule_unordered_list="word_style",
        should_stripe_table_rows=False,
    )
    .with_body_anchor(rule_match="equals", rule_missing="raise")
    .write_docx(
        file_out_docx=Path("report.docx"),
        markdown_body="# Summary **Bold**\n\nBody text with [link](https://example.com).",
        dir_base=Path("."),
    )
)
print(result.file_docx)

markdown_body is the already-rendered Markdown body to insert into the DOCX template. dir_base is the base directory used to resolve relative image paths inside that Markdown body.
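
A minimal sketch of what resolving against dir_base means, using pathlib only. resolve_image_path is a hypothetical helper, and leaving absolute paths unchanged is an assumption; the source only states that relative paths are resolved against dir_base.

```python
from pathlib import Path


def resolve_image_path(dir_base: Path, src: str) -> Path:
    """Resolve a Markdown image source against a base directory (illustrative)."""
    path = Path(src)
    # Assumption: absolute paths are kept; relative ones are joined onto dir_base.
    return path if path.is_absolute() else dir_base / path


resolved = resolve_image_path(Path("reports/2024"), "figures/overview.png")
print(resolved)
```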

DocxMarkdownOptions only covers the shared CommonMark-ish subset. Product repositories should keep custom markdown dialects outside docxrender, either through caller-side preprocessing or explicit higher-level options in their own repo.
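
Caller-side preprocessing can be as small as one function that rewrites a custom construct into the shared subset before the body is handed to docxrender. The sketch below assumes a hypothetical ==highlight== dialect and maps it to inline bold; both the dialect and the helper name preprocess_highlights are invented for illustration.

```python
import re

# Hypothetical product dialect: ==text== marks highlighted text.
_HIGHLIGHT = re.compile(r"==(.+?)==")


def preprocess_highlights(markdown_body: str) -> str:
    """Rewrite ==text== into **text** so only the shared subset reaches the renderer."""
    return _HIGHLIGHT.sub(r"**\1**", markdown_body)


body = preprocess_highlights("# Summary\n\nThe ==key finding== is stable.")
print(body)
```

The preprocessed string would then be passed as markdown_body, keeping the dialect knowledge in the product repository.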

DocxBodyRenderPolicy controls structural rendering choices such as heading numbering, Word-style versus plain-text lists, and striped table body rows. It does not classify product-specific paragraphs.

DocxBodyAnchorOptions controls where the Markdown body is inserted. The search is limited to top-level body paragraphs in the DOCX main document. equals matches paragraph.text.strip() == anchor_token; contains matches templates where the token is embedded in a larger paragraph. Missing anchors can either append content or raise a template error.
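
The matching rules can be sketched over a list of paragraph texts. This is an illustration under stated assumptions: find_anchor_index is a hypothetical name, LookupError stands in for whatever template error the package raises, and "append" is an assumed literal for the append-on-missing rule (only "raise" appears in the examples above).

```python
from typing import Optional, Sequence


def find_anchor_index(
    paragraph_texts: Sequence[str],
    anchor_token: str,
    rule_match: str = "equals",
    rule_missing: str = "raise",
) -> Optional[int]:
    """Return the index of the anchor paragraph, or None to signal 'append at end'."""
    if rule_match not in ("equals", "contains"):
        raise ValueError(f"unknown rule_match: {rule_match!r}")

    for idx, text in enumerate(paragraph_texts):
        if rule_match == "equals" and text.strip() == anchor_token:
            return idx
        if rule_match == "contains" and anchor_token in text:
            return idx

    if rule_missing == "raise":
        # Stand-in for a template error.
        raise LookupError(f"body anchor {anchor_token!r} not found")
    # Assumed "append" behavior: no anchor, so the caller appends the body.
    return None


paragraphs = ["Cover page", "  __REPORT_BODY_ANCHOR__  ", "Appendix"]
print(find_anchor_index(paragraphs, "__REPORT_BODY_ANCHOR__"))
```

With equals, surrounding whitespace in the template paragraph is ignored; contains is the looser rule for templates where the token shares a paragraph with other text.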

DocxRenderer can also start from an existing DOCX and run only later technical steps:

from pathlib import Path

from docxrender import DocxRenderer

DocxRenderer(file_docx=Path("report.docx")).with_field_refresh(
    exe_libreoffice=Path("/usr/bin/libreoffice"),
    dir_user_profile=Path("tmp/lo-profile"),
    should_require_toc=True,
).write_docx()

Generic docxtpl inline-image binding is also supported through the same template entrypoint:

from pathlib import Path

from docxrender import DocxRenderer, DocxTemplateImageSpec

renderer = DocxRenderer().with_template(
    file_template=Path("template.docx"),
    context={"report_title": "Example Report"},
    inline_images={
        "cover_image": DocxTemplateImageSpec(
            file_image=Path("cover.png"),
            width_mm=120,
        ),
    },
)

The same renderer can convert the current DOCX to PDF:

from pathlib import Path

from docxrender import DocxRenderer

result = (
    DocxRenderer(file_docx=Path("report.docx"))
    .with_pdf_conversion(
        exe_libreoffice=Path("/usr/bin/libreoffice"),
        dir_user_profile=Path("tmp/lo-profile"),
        file_out_pdf=Path("report.pdf"),
    )
    .write_pdf()
)
print(result.file_pdf)

Advanced explicit dataclass DOCX write example:

from pathlib import Path

from docxrender import (
    DocxFontStyle,
    DocxParagraphStyle,
    DocxSizeStyle,
    DocxStyle,
    DocxTableStyle,
    DocxWriteOptions,
    write_docx,
)

style = DocxStyle(
    fonts=DocxFontStyle(
        font_name_latin="Times New Roman",
        font_name_body_east_asia="宋体",
        font_name_heading_east_asia="宋体",
    ),
    sizes=DocxSizeStyle(
        pt_title_page_title=36.0,
        pt_title_page_meta=18.0,
        pt_title_page_compiler=15.0,
        pt_body=12.0,
        pt_caption=10.5,
        pt_table=12.0,
        pt_heading_by_level={1: 16.0, 2: 14.0, 3: 12.0},
    ),
    table=DocxTableStyle(
        border_color="000000",
        stripe_fill_color="D9D9D9",
        border_size_main="12",
        border_size_header="6",
        line_spacing=1.5,
    ),
    paragraph=DocxParagraphStyle(
        line_spacing_body=1.5,
        line_spacing_note=1.2,
        first_line_indent_cm=0.74,
    ),
)

result = write_docx(
    DocxWriteOptions(
        file_template=Path("template.docx"),
        file_out_docx=Path("report.docx"),
        context={"report_title": "Example Report"},
        markdown_body="# Summary\n\nBody text.",
        dir_base=Path("."),
        style=style,
    )
)
print(result.file_docx)

The template should contain a paragraph whose text is the body anchor token:

{{ body_anchor }}

docxrender sets body_anchor in the template context when the caller does not provide it.

Style Configuration

docxrender does not read TOML, JSON, YAML, or any other config file in its public API. Callers convert their own configuration into DocxStyle.
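
A caller-side adapter might look like the sketch below, which reads JSON and builds style objects. To keep the example runnable without docxrender installed, SketchFontStyle and SketchParagraphStyle are hypothetical stand-ins whose field names are copied from the DocxFontStyle and DocxParagraphStyle examples above; a real adapter would construct the package's dataclasses instead. The JSON layout is also an assumption.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchFontStyle:
    """Stand-in mirroring the DocxFontStyle fields shown in this README."""

    font_name_latin: str
    font_name_body_east_asia: str
    font_name_heading_east_asia: str


@dataclass(frozen=True)
class SketchParagraphStyle:
    """Stand-in mirroring the DocxParagraphStyle fields shown in this README."""

    line_spacing_body: float
    line_spacing_note: float
    first_line_indent_cm: float


def load_style_sections(raw_json: str) -> tuple[SketchFontStyle, SketchParagraphStyle]:
    """Convert caller-owned configuration text into style objects."""
    data = json.loads(raw_json)
    # Unknown or missing keys fail loudly here, in the caller's own code.
    return SketchFontStyle(**data["fonts"]), SketchParagraphStyle(**data["paragraph"])


raw = json.dumps(
    {
        "fonts": {
            "font_name_latin": "Times New Roman",
            "font_name_body_east_asia": "宋体",
            "font_name_heading_east_asia": "宋体",
        },
        "paragraph": {
            "line_spacing_body": 1.5,
            "line_spacing_note": 1.2,
            "first_line_indent_cm": 0.74,
        },
    }
)
fonts, paragraph = load_style_sections(raw)
print(fonts.font_name_latin, paragraph.line_spacing_body)
```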

The initial style model is based on:

/home/fqzhang/project/workflows/resources/common/report/style.toml

That file is a reference for fields and defaults, not a runtime dependency of the package.

Non-Goals

docxrender does not own:

  • report manifest schemas
  • workflow resource layout
  • Jinja section discovery
  • product-specific context builders
  • figure registries or captions
  • Result/... delivery path semantics
  • results directory (结果目录) text generation
  • style config file readers

Tests

Run the current test suite:

pdm run python -m pytest -v

ty is available as an advisory type checker beside pyright:

pdm run ty check .

Pyright remains the primary type gate.

The suite currently covers public API construction, minimal DOCX writing, markdown body insertion, basic style application, and the boundary that docxrender does not import product repositories.
