msdocx

Python tools for Word OOXML documents, including MS-DOCX extension namespaces (w14/w15/w16).

Overview

msdocx is a Python package for creating, reading, and editing Word .docx files. Beyond the base ISO/IEC 29500 (Office Open XML) standard, it supports the extension namespaces that Microsoft documents in the published [MS-DOCX] specification.
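Under the hood, each extension namespace is an ordinary XML namespace declared on the root of a part such as `word/document.xml`. The stdlib-only sketch below does not use msdocx; the helper `declared_namespaces` and the `KNOWN_NAMESPACES` table are illustrative names of our own, and only the w, w14 and w15 URIs are listed. It reports which namespaces a part declares:

```python
import io
import xml.etree.ElementTree as ET

# Namespace URIs for base WordprocessingML and the Word 2010 / Word 2013 extensions.
KNOWN_NAMESPACES = {
    "http://schemas.openxmlformats.org/wordprocessingml/2006/main": "w (ISO/IEC 29500)",
    "http://schemas.microsoft.com/office/word/2010/wordml": "w14 (MS-DOCX)",
    "http://schemas.microsoft.com/office/word/2012/wordml": "w15 (MS-DOCX)",
}


def declared_namespaces(xml_bytes: bytes) -> dict[str, str]:
    """Return {prefix: uri} for every namespace declared anywhere in the part."""
    declared = {}
    # "start-ns" events yield (prefix, uri) tuples as declarations are encountered.
    for _event, (prefix, uri) in ET.iterparse(io.BytesIO(xml_bytes), events=("start-ns",)):
        declared[prefix] = uri
    return declared


sample = (
    b'<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" '
    b'xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml">'
    b"<w:body/></w:document>"
)
for prefix, uri in declared_namespaces(sample).items():
    print(prefix, "->", KNOWN_NAMESPACES.get(uri, "other"))
```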

Installation

pip install msdocx

Quick Start

from msdocx import Document

# Create a new document
doc = Document.new()

# Add content
doc.add_heading("My Document", level=1)
doc.add_paragraph("Hello, World!", bold=True, font="Calibri", size=28)

# Add a table
doc.add_table(rows=3, cols=2, width=9360)

# Add lists
doc.add_bullet_list(["Item 1", "Item 2", "Item 3"])
doc.add_numbered_list(["Step 1", "Step 2", "Step 3"])

# Save
doc.save("output.docx")
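Several Quick Start arguments are bare integers whose units the example does not state. Assuming msdocx passes them through as raw OOXML measurements (an assumption; the package docs are authoritative), `width=9360` would be in twentieths of a point (twips, the `dxa` unit used by `w:tblW`), which is 6.5 inches, the text width of a US Letter page with 1-inch margins, and `size=28` would be in half-points (the unit of `w:sz`), i.e. 14 pt. The hypothetical helpers below make those conversions explicit; the EMU helper applies to DrawingML-style lengths such as the glow radius in the text-effects example:

```python
TWIPS_PER_INCH = 1440       # 1 inch = 72 pt and 1 pt = 20 twips
HALF_POINTS_PER_POINT = 2   # w:sz and w:szCs are expressed in half-points
EMU_PER_POINT = 12700       # English Metric Units, used by DrawingML and w14 effects


def twips_to_inches(twips: int) -> float:
    return twips / TWIPS_PER_INCH


def half_points_to_points(half_points: int) -> float:
    return half_points / HALF_POINTS_PER_POINT


def emu_to_points(emu: int) -> float:
    return emu / EMU_PER_POINT


print(twips_to_inches(9360))       # 6.5
print(half_points_to_points(28))   # 14.0
print(emu_to_points(63500))        # 5.0
```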

MS-DOCX Extension Features

The following msdocx features cover MS-DOCX extensions that go beyond the base OOXML standard:

Text Effects (w14 namespace)

from msdocx.text_effects import TextEffect, ShadowEffect, GlowEffect, ReflectionEffect, TextOutline
from msdocx.oxml.text import make_run
from msdocx.oxml.ns import qn

# Create a run with text effects
run = make_run(text="Styled Text", bold=True, size=48)
rpr = run.find(qn("w:rPr"))

effect = TextEffect(
    shadow=ShadowEffect(color="4472C4", alpha=60000),
    glow=GlowEffect(color="00FF00", radius=63500),
    reflection=ReflectionEffect(),
    ligatures="standardContextual",
    numeral_form="oldStyle",
)
effect.apply(rpr)
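For orientation, the sketch below hand-builds roughly the w14 markup that the glow, ligature and number-form settings above correspond to, using only `xml.etree.ElementTree`. It is not msdocx's implementation; the helper `add_glow_and_typography` is hypothetical, and the element and attribute names follow the w14 schema as we understand it. Shadow, reflection and outline are omitted for brevity.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W14 = "http://schemas.microsoft.com/office/word/2010/wordml"
ET.register_namespace("w", W)
ET.register_namespace("w14", W14)


def w14(tag: str) -> str:
    """Clark-notation name in the w14 namespace, e.g. {uri}glow."""
    return f"{{{W14}}}{tag}"


def add_glow_and_typography(rpr: ET.Element, color: str, radius_emu: int,
                            ligatures: str, num_form: str) -> None:
    # Glow: radius in EMUs, colour as an sRGB hex string.
    glow = ET.SubElement(rpr, w14("glow"), {w14("rad"): str(radius_emu)})
    ET.SubElement(glow, w14("srgbClr"), {w14("val"): color})
    # OpenType typography settings carried as w14 run properties.
    ET.SubElement(rpr, w14("ligatures"), {w14("val"): ligatures})
    ET.SubElement(rpr, w14("numForm"), {w14("val"): num_form})


run = ET.Element(f"{{{W}}}r")
rpr = ET.SubElement(run, f"{{{W}}}rPr")
ET.SubElement(rpr, f"{{{W}}}b")
# Appended after w:b here; real documents must follow the schema's element order in w:rPr.
add_glow_and_typography(rpr, "00FF00", 63500, "standardContextual", "oldStyle")
ET.SubElement(run, f"{{{W}}}t").text = "Styled Text"
print(ET.tostring(run, encoding="unicode"))
```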

Content Controls with MS-DOCX Extensions

from msdocx.content_controls import ContentControl, ContentControlType

# Checkbox (w14 extension)
checkbox = ContentControl(ContentControlType.CHECKBOX)
checkbox.set_checked(True)

# Repeating section (w15 extension)
repeating = ContentControl(ContentControlType.REPEATING_SECTION)

# Entity picker (w15 extension)
picker = ContentControl(ContentControlType.ENTITY_PICKER)

# Color picker (w14 extension)
color = ContentControl(ContentControlType.COLOR)
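A w14 checkbox is a content control (`w:sdt`) whose properties contain `w14:checkbox`; the checked flag and the glyphs Word displays for each state live inside it. This stdlib sketch uses hypothetical helpers rather than the msdocx API; it builds that structure, with glyphs and font as Word typically writes them, and shows what toggling has to update, on the assumption that `set_checked` does something similar:

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W14 = "http://schemas.microsoft.com/office/word/2010/wordml"

CHECKED_GLYPH = "2612"    # U+2612 BALLOT BOX WITH X
UNCHECKED_GLYPH = "2610"  # U+2610 BALLOT BOX


def make_checkbox_sdt(checked: bool) -> ET.Element:
    """Build a w:sdt whose properties carry a w14:checkbox."""
    sdt = ET.Element(f"{{{W}}}sdt")
    sdt_pr = ET.SubElement(sdt, f"{{{W}}}sdtPr")
    box = ET.SubElement(sdt_pr, f"{{{W14}}}checkbox")
    ET.SubElement(box, f"{{{W14}}}checked", {f"{{{W14}}}val": "1" if checked else "0"})
    ET.SubElement(box, f"{{{W14}}}checkedState",
                  {f"{{{W14}}}val": CHECKED_GLYPH, f"{{{W14}}}font": "MS Gothic"})
    ET.SubElement(box, f"{{{W14}}}uncheckedState",
                  {f"{{{W14}}}val": UNCHECKED_GLYPH, f"{{{W14}}}font": "MS Gothic"})
    # The visible content is a single run holding the glyph for the current state.
    content = ET.SubElement(sdt, f"{{{W}}}sdtContent")
    run = ET.SubElement(content, f"{{{W}}}r")
    glyph = CHECKED_GLYPH if checked else UNCHECKED_GLYPH
    ET.SubElement(run, f"{{{W}}}t").text = chr(int(glyph, 16))
    return sdt


def set_checked(sdt: ET.Element, checked: bool) -> None:
    """Flip the w14:checked flag and keep the displayed glyph in sync."""
    flag = sdt.find(f"{{{W}}}sdtPr/{{{W14}}}checkbox/{{{W14}}}checked")
    flag.set(f"{{{W14}}}val", "1" if checked else "0")
    text = sdt.find(f"{{{W}}}sdtContent/{{{W}}}r/{{{W}}}t")
    text.text = chr(int(CHECKED_GLYPH if checked else UNCHECKED_GLYPH, 16))
```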

Tracked Changes with Conflict Resolution

from msdocx.tracked_changes import TrackedChange

# Standard tracked changes
ins = TrackedChange.insert("new text", author="Alice")
del_el = TrackedChange.delete("old text", author="Bob")
# A replacement yields a (deletion, insertion) pair
repl_del, repl_ins = TrackedChange.replace("old", "new", author="Editor")

# MS-DOCX conflict resolution (w14 extension)
conflict_ins = TrackedChange.conflict_insert("conflict text", author="User1")
conflict_del = TrackedChange.conflict_delete("deleted in conflict", author="User2")

Collaboration Features

from msdocx.collaboration import CollaborationInfo, mark_spelling_error

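# Paragraph unique IDs (w14:paraId, w14:textId), auto-generated.
# paragraph_element is an existing paragraph (w:p) element from the document.
info = CollaborationInfo.generate()
info.apply_to_paragraph(paragraph_element)

# Inline spelling markup; `paragraph` is likewise an existing paragraph,
# and start_run_index/end_run_index select which of its runs are marked.
mark_spelling_error(paragraph, start_run_index=1, end_run_index=1)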
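`CollaborationInfo.generate()` produces the IDs for you. To show the constraint involved: `w14:paraId` and `w14:textId` are 8-digit hexadecimal values and, as we read [MS-DOCX], must be below 0x80000000; paraId values must also be unique within the document. A minimal generator under those assumptions (it avoids zero as well; `new_long_hex_id` is a hypothetical name) might look like this:

```python
import secrets

PARA_ID_LIMIT = 0x80000000  # paraId/textId values must stay below this


def new_long_hex_id(existing: set[str]) -> str:
    """Return a fresh 8-digit uppercase hex ID in 1..0x7FFFFFFF that is not in `existing`.

    The new ID is added to `existing` so later calls cannot repeat it.
    """
    while True:
        value = secrets.randbelow(PARA_ID_LIMIT - 1) + 1  # 1 .. 0x7FFFFFFF
        candidate = f"{value:08X}"
        if candidate not in existing:
            existing.add(candidate)
            return candidate
```

Passing the same `existing` set for every paragraph in a document keeps the IDs unique across it.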

Extended Compatibility Settings

from msdocx.compatibility import set_compat_mode, enable_opentype_features

# Set Word 2013+ compatibility mode
set_compat_mode(doc.settings, mode=15)

# Enable OpenType font features (MS-DOCX extension)
enable_opentype_features(doc.settings)

Accept / Reject Tracked Changes

from msdocx.tracked_changes import TrackedChange
from msdocx.oxml.ns import qn

body = doc.body  # w:body element of an open Document

# Find the first tracked insertion and deletion
ins_element = body.find(f".//{qn('w:ins')}")
del_element = body.find(f".//{qn('w:del')}")

# Accept a single insertion (keep the inserted text, remove the tracking wrapper)...
TrackedChange.accept_insertion(ins_element)
# ...or reject it instead (remove the inserted text entirely)
# TrackedChange.reject_insertion(ins_element)

# Accept a deletion (remove the deleted text permanently)...
TrackedChange.accept_deletion(del_element)
# ...or reject it instead (restore the deleted text)
# TrackedChange.reject_deletion(del_element)

# Accept or reject ALL changes in the document body at once (choose one)
TrackedChange.accept_all(body)
# TrackedChange.reject_all(body)

Reading Document Content

from msdocx import Document

doc = Document.open("existing.docx")

# Get all text as a single string
text = doc.get_text()

# Get structured paragraph data
paragraphs = doc.get_paragraphs()
for p in paragraphs:
    print(p["text"], p["style"], p["bold"], p["italic"])

# Get tables as nested lists
tables = doc.get_tables()
for table in tables:
    for row in table:
        print(row)  # ["cell1", "cell2", ...]

# Get all comments with threading info
comments = doc.get_comments()
for c in comments:
    print(f'{c["author"]}: {c["text"]}')
    if c["parent_id"] is not None:
        print(f'  (reply to comment {c["parent_id"]})')
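A .docx file is a ZIP package, so the core of text extraction needs only the standard library. The sketch below (a hypothetical `paragraph_texts` helper, independent of msdocx) assumes the main part lives at `word/document.xml`, its usual location, rather than resolving it through the package relationships:

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraph_texts(docx) -> list[str]:
    """Visible text of each paragraph in word/document.xml (path or binary file object)."""
    with zipfile.ZipFile(docx) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    # iter() also reaches paragraphs inside table cells; tabs, breaks and fields are ignored.
    # Only w:t is collected, so tracked deletions (w:delText) are left out.
    return ["".join(t.text or "" for t in p.iter(f"{{{W}}}t"))
            for p in root.iter(f"{{{W}}}p")]


# Demo on a tiny in-memory package containing only the main document part.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr(
        "word/document.xml",
        f'<w:document xmlns:w="{W}"><w:body>'
        '<w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>World!</w:t></w:r></w:p>'
        "<w:p/></w:body></w:document>",
    )
buffer.seek(0)
print(paragraph_texts(buffer))  # ['Hello, World!', '']
```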

Comment Threading

# Create a parent comment
parent_id = doc.add_comment("Please review this section", author="Reviewer")

# Reply to the parent comment — automatically creates commentsExtended.xml
reply_id = doc.add_comment("Fixed in v2", author="Author", parent_id=parent_id)

# Read back threaded comments
comments = doc.get_comments()
# comments[1]["parent_id"] == parent_id
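On disk, a reply is not linked to its parent by comment ID. As we understand the w15 markup, each `w15:commentEx` entry in `word/commentsExtended.xml` identifies a comment by the `w14:paraId` of its last paragraph, and a reply carries its parent's paraId in `w15:paraIdParent`. This sketch (a hypothetical helper, not the msdocx API) resolves those links back to `w:id` values, much as a `parent_id` field would need to:

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W14 = "http://schemas.microsoft.com/office/word/2010/wordml"
W15 = "http://schemas.microsoft.com/office/word/2012/wordml"


def comment_parents(comments_xml: bytes, extended_xml: bytes) -> Dict[str, Optional[str]]:
    """Map each w:comment w:id to its parent's w:id (None for top-level comments)."""
    # paraId of each comment's last paragraph -> that comment's w:id
    para_to_comment = {}
    for comment in ET.fromstring(comments_xml).iter(f"{{{W}}}comment"):
        paras = comment.findall(f"{{{W}}}p")
        if paras:
            para_id = paras[-1].get(f"{{{W14}}}paraId")
            para_to_comment[para_id] = comment.get(f"{{{W}}}id")
    parents: Dict[str, Optional[str]] = {cid: None for cid in para_to_comment.values()}
    for entry in ET.fromstring(extended_xml).iter(f"{{{W15}}}commentEx"):
        child = para_to_comment.get(entry.get(f"{{{W15}}}paraId"))
        parent = para_to_comment.get(entry.get(f"{{{W15}}}paraIdParent"))
        if child is not None:
            parents[child] = parent
    return parents
```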

Specification Reference

This package implements the [MS-DOCX] specification, revision 22.1 (November 2025): https://learn.microsoft.com/en-us/openspecs/office_standards/ms-docx/b839fe1f-e1ca-4fa6-8c26-5954d0abbccd


Download files

Source Distribution

msdocx-0.2.1.tar.gz (75.2 kB)

Uploaded Source

Built Distribution

msdocx-0.2.1-py3-none-any.whl (62.0 kB)

Uploaded Python 3

File details

Details for the file msdocx-0.2.1.tar.gz.

File metadata

  • Download URL: msdocx-0.2.1.tar.gz
  • Size: 75.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for msdocx-0.2.1.tar.gz
Algorithm Hash digest
SHA256 db44e5d263b331e88e7b8d9679ef29ecaddf7c2c9d15efcba083f4f6f9749ca9
MD5 d1ff24600d6fff3b090d248c1c40fe41
BLAKE2b-256 b4d56c1170d0f2ec6fb8d0ccbf63b92e95b85eed1964b4098dc9c64a080f546e

File details

Details for the file msdocx-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: msdocx-0.2.1-py3-none-any.whl
  • Size: 62.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for msdocx-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 def48cd0e85adea79e269daf84ce1ea89d2bcbc165b41de83a991c135bd39537
MD5 d681d09d6ed6b6df9a66b8117d061ace
BLAKE2b-256 4d31969702d466e919fcfae0bbc8c9a651846140dbb81c7fd5f4c9ca359fe3cd
