
A Python library for working with Microsoft Word via COM automation

pymsword

Pymsword is a Python library for generating DOCX documents with simple templates.


Features

  • Direct generation of DOCX documents with text and images — provides good performance and reliability.
  • Jinja-like Placeholder Syntax — easy-to-use templating for text and images.
  • COM Automation Support — integrate COM calls to extend document generation capabilities, at the cost of performance.
  • MIT Licensed — free and open-source with a permissive license.

Template Syntax

Template syntax is inspired by Jinja, but only a small subset of its features is implemented:

  • Variables: {{ variable }} for inserting variables.
  • Groups: {% group_name %} ... {% end group_name %} for grouping content.

Variables

Use {{variable}} to insert values. In the data, variables are defined as keys in a dictionary.
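To make the mapping between placeholders and dictionary keys concrete, here is a toy substitution written with the standard library only. It is an illustration of the idea, not Pymsword's implementation; the function name fill_variables and the sample keys are made up for this sketch.

```python
import re

# Matches {{ name }} with optional whitespace around the variable name.
VARIABLE = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def fill_variables(text, data):
    """Toy stand-in for variable substitution (hypothetical helper)."""
    # Each placeholder is replaced by the value stored under the same key.
    return VARIABLE.sub(lambda match: str(data[match.group(1)]), text)


data = {"title": "My Document", "author": "Jane"}
result = fill_variables("{{ title }} by {{author}}", data)
print(result)
```

In the real library the same dictionary would be passed to the template's generate method rather than to a helper like this one.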

Escaping Braces

To escape double braces, use: {{"{{"}}.

Groups

Use {% group_name %} ... {% end group_name %} to define a group of content that can be repeated. In the data, groups are defined as lists of dictionaries.
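The relationship between a group and its list of dictionaries can be sketched with a small plain-text renderer. This is a simplified model, not the library's code: the render function is hypothetical, it works on strings rather than DOCX content, and it assumes that variables inside a group are looked up only in the current item.

```python
import re

VARIABLE = re.compile(r"\{\{\s*(\w+)\s*\}\}")
# Opening tag, lazily captured body, and the matching "end" tag.
GROUP = re.compile(r"\{%\s*(\w+)\s*%\}(.*?)\{%\s*end\s+\1\s*%\}", re.S)


def render(text, data):
    """Toy renderer: repeat each group body once per list item."""
    def expand_group(match):
        name, body = match.group(1), match.group(2)
        # One copy of the body per dictionary in the list.
        return "".join(render(body, item) for item in data.get(name, []))

    text = GROUP.sub(expand_group, text)
    return VARIABLE.sub(lambda m: str(data[m.group(1)]), text)


template_text = "Items: {% items %}[{{ name }}={{ value }}]{% end items %}"
data = {"items": [{"name": "a", "value": 1}, {"name": "b", "value": 2}]}
rendered = render(template_text, data)
print(rendered)
```

An empty list produces no output for the group, which is how this sketch models optional content.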

Extending group capture range

Due to the structure of DOCX documents, it is not possible to put text tags outside of a paragraph or table row. Consequently, simple groups cannot be used to generate table rows or lists.

To overcome this limitation, a group's range can be extended using additional markers:

  • Use {% group_name p %} ... {% group_name %} for a group that captures the whole paragraph or list item.
  • Use {% group_name row %} ... {% group_name %} for a group that captures the whole table row.
  • Use {% group_name cell %} ... {% group_name %} for a group that captures the whole cell of a table.

Example:

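from pymsword.docx_template import DocxTemplate

# Load the template that contains the extended group markers
template = DocxTemplate("extend_groups.docx")
# Each dictionary in "items" produces one repetition of the group
data = {
    "items": [
        {"text": "Item 1"},
        {"text": "Item 2"},
        {"text": "Item 3"},
    ]
}
template.generate(data, "extend_groups_result.docx")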

Group example

DOCX Document Generation

Pymsword supports two modes of document generation. The first, direct DOCX generation, is implemented in pure Python. It provides the best performance and reliability, but it can only fill templates with plain text and images. The second mode, DOCX + COM automation, is described in the next section.

The following example demonstrates how to use the library to generate a DOCX document from a template:

from pymsword.docx_template import DocxTemplate, DocxImageInserter

# Load the template
template = DocxTemplate("template.docx")
# Define the data for the template
data = {
    "title": "My Document",
    "content": "This is a sample document.",
    "items": [
        {"name": "Item 1", "value": 10},
        {"name": "Item 2", "value": 20},
    ],
    "image": DocxImageInserter("image.png"),
}
# Render the template with data
template.generate(data, "output.docx")

The template and the resulting document are shown below: Template and result

For a more complete example, refer to the examples directory.

DOCX + COM Automation

More advanced documents can be generated using COM automation, which gives access to the full range of Word features. Its disadvantages are that it is significantly slower than direct DOCX generation and that it requires Microsoft Word to be installed on the system. To use COM automation, install the pywin32 package and use the DocxComTemplate class.

To insert data using COM, put an inserter function in place of the value in the data. An inserter function takes a single argument, a Word.Range object, and inserts the desired data into it.

The pymsword.com_utilities module provides some useful inserters.

from pymsword.docxcom_template import DocxComTemplate
from pymsword.com_utilities import table_inserter

# Load the template
template = DocxComTemplate("template.docx")

# Define the data for the template
data = {
    "header": "My Document",
    "table": table_inserter([["Col1", "Col2"], ["Row1", "Row2"]])
}
# Render the template with data
template.generate(data, "output.docx")

The template and the resulting document are shown below:

COM Template and result

Complete list of inserter functions available in the pymsword.com_utilities module:

Function                                   Description
table_inserter(data:List[List[str]])       Inserts a table with the given data. Each sublist represents a row.
image_inserter(picture_path:str)           Inserts an image from the specified path using COM. Supports more formats than the DOCX mode.
document_inserter(document_path:str)       Inserts the content of another document, which can be DOCX, RTF or anything supported by Word.
anchor_inserter(text:str, anchor:str)      Inserts an anchor with the given text and name.
heading_inserter(text:str, level:int=1)    Inserts a heading with the given text and level. Level 1 is the highest level.
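Because an inserter is just a function that receives a Word.Range, custom inserters can be written in the same style as the built-in ones. The sketch below is an assumption-laden example: bold_text_inserter is a hypothetical name that is not part of pymsword, and it relies only on the Text and Font.Bold properties of the Word Range object. A stand-in object replaces the real range so that the snippet runs without Word.

```python
from types import SimpleNamespace


def bold_text_inserter(text):
    """Return an inserter that writes bold text (hypothetical helper)."""
    def insert(word_range):
        # Replace the contents of the range, then make the new text bold.
        word_range.Text = text
        word_range.Font.Bold = True
    return insert


# Stand-in for a Word.Range object, used only to exercise the sketch.
fake_range = SimpleNamespace(Text="", Font=SimpleNamespace(Bold=False))
bold_text_inserter("Important")(fake_range)
print(fake_range.Text, fake_range.Font.Bold)
```

With DocxComTemplate, such an inserter would presumably be placed in the data like the built-in ones, for example {"note": bold_text_inserter("Important")}.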

COM Post-Processing

When generating documents in COM-assisted mode (DocxComTemplate), you can use post-processing to modify the document after it has been generated. To do this, pass the postprocess argument to the DocxComTemplate.generate method. The pymsword.com_utilities module provides an example post-processing function that updates the document creation date and the table of contents:

from pymsword.docxcom_template import DocxComTemplate
from pymsword.com_utilities import update_document_toc

template = DocxComTemplate("template.docx")
data = ...
template.generate(
    data,
    "output.docx",
    postprocess=update_document_toc
)

The post-processing function must take a single argument, the Word.Document object.
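A custom post-processing function follows the same shape. The sketch below assumes only that the argument behaves like a Word Document, whose Fields collection has an Update method; the name update_all_fields is hypothetical, and the fake classes exist solely so that the example runs without Word.

```python
def update_all_fields(document):
    """Post-processing hook that refreshes every field (hypothetical)."""
    # Fields.Update() asks Word to recalculate all fields in the document.
    document.Fields.Update()


class FakeFields:
    """Stand-in for the Word Fields collection."""

    def __init__(self):
        self.updated = False

    def Update(self):
        self.updated = True


class FakeDocument:
    """Stand-in for a Word.Document object."""

    def __init__(self):
        self.Fields = FakeFields()


fake_document = FakeDocument()
update_all_fields(fake_document)
print(fake_document.Fields.updated)
```

Such a function would be passed as postprocess=update_all_fields in place of update_document_toc.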

Requirements

  • Python 3.7 or higher
  • pywin32 package for COM automation
  • Microsoft Word installed (optional, for DOCX + COM mode)
  • Pillow package for image handling
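Since COM mode depends on pywin32, a script that supports both modes may want to check for it before choosing a template class. The following is a minimal sketch using only the standard library; com_mode_available is a hypothetical helper, and it does not verify that Microsoft Word itself is installed.

```python
import importlib.util
import sys


def com_mode_available():
    """Report whether the pywin32 COM bindings can be imported here."""
    # pywin32 exposes its COM support through the win32com package.
    return sys.platform == "win32" and importlib.util.find_spec("win32com") is not None


if com_mode_available():
    print("pywin32 found: DocxComTemplate can be tried")
else:
    print("pywin32 not found: use DocxTemplate")
```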

Installation

You can install Pymsword using pip:

pip install pymsword

License

This project is licensed under the MIT License - see the LICENSE.MIT file for details.

