Python tools for Word OOXML documents, including MS-DOCX extension namespaces

msdocx

Python tools for Word OOXML documents, including MS-DOCX extension namespaces (w14/w15/w16).

Overview

msdocx is a Python package for working with Word .docx files, including the extensions described in Microsoft's published [MS-DOCX] specification. It supports the extension namespaces (w14, w15, w16) that are not part of the base ISO/IEC 29500 standard.
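
To make the idea of "extension namespaces" concrete, here is a minimal standard-library sketch of how a prefixed name such as w14:paraId maps to a fully qualified XML name. The helper `qualified_name` and the `NSMAP` table are hypothetical and written for this illustration only; they are not msdocx's own `qn`, which may behave differently.

```python
# Prefix-to-URI map for the base namespace and two of the extension
# namespaces. The w16* family is split across several URIs and is omitted.
NSMAP = {
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
    "w14": "http://schemas.microsoft.com/office/word/2010/wordml",
    "w15": "http://schemas.microsoft.com/office/word/2012/wordml",
}


def qualified_name(tag: str) -> str:
    """Turn 'prefix:local' into Clark notation, '{uri}local'."""
    prefix, local = tag.split(":", 1)
    return f"{{{NSMAP[prefix]}}}{local}"


print(qualified_name("w:p"))
print(qualified_name("w14:paraId"))
```

Clark notation is the form that `xml.etree.ElementTree` and lxml use for element and attribute names, so a helper like this is a common convenience in OOXML code.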

Installation

pip install msdocx

Quick Start

from msdocx import Document

# Create a new document
doc = Document.new()

# Add content
doc.add_heading("My Document", level=1)
doc.add_paragraph("Hello, World!", bold=True, font="Calibri", size=28)

# Add a table
doc.add_table(rows=3, cols=2, width=9360)

# Add lists
doc.add_bullet_list(["Item 1", "Item 2", "Item 3"])
doc.add_numbered_list(["Step 1", "Step 2", "Step 3"])

# Save
doc.save("output.docx")
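
A saved .docx file is a ZIP package, so the result can be sanity-checked without msdocx at all. The sketch below uses only the standard library; the function name `missing_parts` is hypothetical, and the list of parts reflects the conventional layout Word produces rather than anything msdocx is documented to guarantee.

```python
import io
import zipfile

# Parts found in a conventionally laid out .docx package.
REQUIRED_PARTS = ("[Content_Types].xml", "_rels/.rels", "word/document.xml")


def missing_parts(docx_file) -> list[str]:
    """Return the conventional parts that are absent from the package."""
    with zipfile.ZipFile(docx_file) as package:
        names = set(package.namelist())
    return [part for part in REQUIRED_PARTS if part not in names]


# Demonstrate on an in-memory package with placeholder content.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    for part in REQUIRED_PARTS:
        package.writestr(part, "<placeholder/>")
buffer.seek(0)
print(missing_parts(buffer))
```

Passing a path such as "output.docx" instead of the in-memory buffer would run the same check on a file written by `doc.save`.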

MS-DOCX Extension Features

The features below cover MS-DOCX extension areas that go beyond the base OOXML standard:

Text Effects (w14 namespace)

from msdocx.text_effects import TextEffect, ShadowEffect, GlowEffect, ReflectionEffect, TextOutline
from msdocx.oxml.text import make_run
from msdocx.oxml.ns import qn

# Create a run with text effects
run = make_run(text="Styled Text", bold=True, size=48)
rpr = run.find(qn("w:rPr"))

effect = TextEffect(
    shadow=ShadowEffect(color="4472C4", alpha=60000),
    glow=GlowEffect(color="00FF00", radius=63500),
    reflection=ReflectionEffect(),
    ligatures="standardContextual",
    numeral_form="oldStyle",
)
effect.apply(rpr)
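
For readers who want to see what a w14 glow looks like at the XML level, the following standard-library sketch builds a w14:glow element and converts its radius to points. It assumes the radius is expressed in EMUs (12700 per point), as the w14:rad attribute is in [MS-DOCX]; whether msdocx emits exactly this markup is not confirmed here, and `build_glow` and `emu_to_points` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

W14 = "http://schemas.microsoft.com/office/word/2010/wordml"
EMU_PER_POINT = 12700  # English Metric Units per typographic point


def emu_to_points(emu: int) -> float:
    """Convert a length in EMUs to points."""
    return emu / EMU_PER_POINT


def build_glow(color: str, radius_emu: int) -> ET.Element:
    """Build <w14:glow w14:rad="..."><w14:srgbClr w14:val="..."/></w14:glow>."""
    glow = ET.Element(f"{{{W14}}}glow", {f"{{{W14}}}rad": str(radius_emu)})
    ET.SubElement(glow, f"{{{W14}}}srgbClr", {f"{{{W14}}}val": color})
    return glow


ET.register_namespace("w14", W14)  # serialize with the familiar prefix
glow = build_glow("00FF00", 63500)
print(emu_to_points(63500))  # 5.0
print(ET.tostring(glow, encoding="unicode"))
```

Under that assumption, the radius of 63500 used in the example above corresponds to a 5 pt glow.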

Content Controls with MS-DOCX Extensions

from msdocx.content_controls import ContentControl, ContentControlType

# Checkbox (w14 extension)
checkbox = ContentControl(ContentControlType.CHECKBOX)
checkbox.set_checked(True)

# Repeating section (w15 extension)
repeating = ContentControl(ContentControlType.REPEATING_SECTION)

# Entity picker (w15 extension)
picker = ContentControl(ContentControlType.ENTITY_PICKER)

# Color picker (w14 extension)
color = ContentControl(ContentControlType.COLOR)
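
As a rough picture of what a checkbox content control contains, the sketch below builds the structured document tag with `xml.etree.ElementTree`. It covers only the w14:checkbox and w14:checked elements; a control written by Word typically carries more properties (checked and unchecked glyphs, an ID, content runs). `build_checkbox_sdt` is a hypothetical helper, not part of msdocx.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W14 = "http://schemas.microsoft.com/office/word/2010/wordml"
NS = {"w": W, "w14": W14}


def build_checkbox_sdt(checked: bool) -> ET.Element:
    """Build a minimal w:sdt whose properties declare a w14:checkbox."""
    sdt = ET.Element(f"{{{W}}}sdt")
    sdt_pr = ET.SubElement(sdt, f"{{{W}}}sdtPr")
    checkbox = ET.SubElement(sdt_pr, f"{{{W14}}}checkbox")
    ET.SubElement(
        checkbox,
        f"{{{W14}}}checked",
        {f"{{{W14}}}val": "1" if checked else "0"},
    )
    ET.SubElement(sdt, f"{{{W}}}sdtContent")  # visible content goes here
    return sdt


sdt = build_checkbox_sdt(True)
state = sdt.find("w:sdtPr/w14:checkbox/w14:checked", NS)
print(state.get(f"{{{W14}}}val"))
```

The base w:sdt element comes from ISO/IEC 29500, while the checkbox property lives in the w14 namespace, which is why both namespaces appear in one control.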

Tracked Changes with Conflict Resolution

from msdocx.tracked_changes import TrackedChange

# Standard tracked changes
ins = TrackedChange.insert("new text", author="Alice")
del_el = TrackedChange.delete("old text", author="Bob")
del_el, ins_el = TrackedChange.replace("old", "new", author="Editor")

# MS-DOCX conflict resolution (w14 extension)
conflict_ins = TrackedChange.conflict_insert("conflict text", author="User1")
conflict_del = TrackedChange.conflict_delete("deleted in conflict", author="User2")
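
The standard tracked-change elements are simple wrappers around runs. The sketch below builds w:ins and w:del with the standard library to show their shape; it does not cover the w14 conflict variants, and `build_insertion` and `build_deletion` are hypothetical helpers rather than msdocx functions.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def w(local: str) -> str:
    """Qualify a local name with the main WordprocessingML namespace."""
    return f"{{{W}}}{local}"


def _change(tag: str, text_tag: str, text: str, author: str,
            change_id: int, when: datetime) -> ET.Element:
    attrs = {
        w("id"): str(change_id),
        w("author"): author,
        w("date"): when.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    change = ET.Element(w(tag), attrs)
    run = ET.SubElement(change, w("r"))
    ET.SubElement(run, w(text_tag)).text = text
    return change


def build_insertion(text, author, change_id, when):
    """<w:ins> wraps ordinary runs whose text lives in <w:t>."""
    return _change("ins", "t", text, author, change_id, when)


def build_deletion(text, author, change_id, when):
    """<w:del> wraps runs whose text lives in <w:delText>."""
    return _change("del", "delText", text, author, change_id, when)


stamp = datetime(2025, 1, 1, tzinfo=timezone.utc)
ins = build_insertion("new text", "Alice", 1, stamp)
print(ET.tostring(ins, encoding="unicode"))
```

Note the asymmetry: deleted text is stored in w:delText rather than w:t, which is what lets consumers skip it when extracting the current text.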

Collaboration Features

from msdocx.collaboration import CollaborationInfo, mark_spelling_error

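# Paragraph unique IDs (w14:paraId, w14:textId), auto-generated
info = CollaborationInfo.generate()
# paragraph_element is assumed to be an existing paragraph element obtained elsewhere
info.apply_to_paragraph(paragraph_element)

# Inline spelling markup; paragraph is assumed to be an existing paragraph with at least two runs
mark_spelling_error(paragraph, start_run_index=1, end_run_index=1)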
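
The paragraph IDs mentioned above are eight-digit hexadecimal values. A minimal sketch of generating one is shown below, assuming the constraint that the value is greater than 0 and less than 0x80000000; `new_long_hex_id` is a hypothetical helper, and how `CollaborationInfo.generate()` actually picks its values is not documented here.

```python
import secrets


def new_long_hex_id() -> str:
    """Return a random 8-digit uppercase hex ID in the range 1..0x7FFFFFFF."""
    value = secrets.randbelow(0x7FFFFFFF) + 1  # randbelow yields 0..0x7FFFFFFE
    return f"{value:08X}"


para_id = new_long_hex_id()
text_id = new_long_hex_id()
print(para_id, text_id)
```

Random generation does not by itself guarantee uniqueness within a document, so real code would also check new IDs against those already in use.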

Extended Compatibility Settings

from msdocx.compatibility import set_compat_mode, enable_opentype_features

# Set Word 2013+ compatibility mode
set_compat_mode(doc.settings, mode=15)

# Enable OpenType font features (MS-DOCX extension)
enable_opentype_features(doc.settings)
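
In settings.xml, the compatibility mode is stored as a w:compatSetting entry named compatibilityMode. The sketch below sets it with the standard library, updating an existing entry if one is present. `set_compatibility_mode` is a hypothetical helper, and for brevity it appends w:compat at the end of w:settings instead of honoring the schema's element order.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
WORD_URI = "http://schemas.microsoft.com/office/word"


def w(local: str) -> str:
    return f"{{{W}}}{local}"


def set_compatibility_mode(settings: ET.Element, mode: int) -> ET.Element:
    """Create or update the compatibilityMode entry under w:compat."""
    compat = settings.find(w("compat"))
    if compat is None:  # explicit None test: childless elements are falsy
        compat = ET.SubElement(settings, w("compat"))
    for setting in compat.findall(w("compatSetting")):
        if setting.get(w("name")) == "compatibilityMode":
            break
    else:
        setting = ET.SubElement(
            compat,
            w("compatSetting"),
            {w("name"): "compatibilityMode", w("uri"): WORD_URI},
        )
    setting.set(w("val"), str(mode))
    return setting


settings = ET.Element(w("settings"))
set_compatibility_mode(settings, 14)
set_compatibility_mode(settings, 15)  # updates the entry in place
print(ET.tostring(settings, encoding="unicode"))
```

A value of 15 corresponds to Word 2013 and later, matching the mode used in the example above.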

Accept / Reject Tracked Changes

from msdocx.tracked_changes import TrackedChange
from msdocx.oxml.ns import qn

# Locate the first tracked insertion and deletion in the document body
# (doc is a document opened or created as in the Quick Start)
body = doc.body
ins_element = body.find(f".//{qn('w:ins')}")
del_element = body.find(f".//{qn('w:del')}")

# Accept a single insertion (keep the inserted text, remove tracking wrapper)
TrackedChange.accept_insertion(ins_element)

# Or reject it instead (remove the inserted text entirely)
TrackedChange.reject_insertion(ins_element)

# Accept a deletion (remove the deleted text permanently)
TrackedChange.accept_deletion(del_element)

# Or reject it instead (restore the deleted text)
TrackedChange.reject_deletion(del_element)

# Accept or reject ALL changes in the document body at once (use one or the other)
TrackedChange.accept_all(doc.body)
TrackedChange.reject_all(doc.body)
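
To illustrate what accepting changes means at the XML level, here is a simplified standard-library sketch: accepting an insertion unwraps w:ins so its runs remain, and accepting a deletion drops w:del entirely. The function `accept_all_changes` is hypothetical and ignores moves, formatting changes, and the w14 conflict elements, so it should not be read as msdocx's implementation.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
INS, DEL, T = f"{{{W}}}ins", f"{{{W}}}del", f"{{{W}}}t"


def accept_all_changes(element: ET.Element) -> None:
    """Unwrap every w:ins and remove every w:del below element, in place."""
    # Walk children backwards so index changes do not affect pending items.
    for index in range(len(element) - 1, -1, -1):
        child = element[index]
        if child.tag == DEL:
            del element[index]  # deleted text disappears for good
            continue
        accept_all_changes(child)  # resolve nested changes first
        if child.tag == INS:
            del element[index]  # drop the wrapper ...
            for offset, kept in enumerate(list(child)):
                element.insert(index + offset, kept)  # ... keep its content


SAMPLE = (
    f'<w:body xmlns:w="{W}"><w:p>'
    '<w:r><w:t>Hello </w:t></w:r>'
    '<w:ins w:id="1" w:author="Alice"><w:r><w:t>new </w:t></w:r></w:ins>'
    '<w:del w:id="2" w:author="Bob"><w:r><w:delText>old </w:delText></w:r></w:del>'
    '<w:r><w:t>world</w:t></w:r>'
    '</w:p></w:body>'
)

body = ET.fromstring(SAMPLE)
accept_all_changes(body)
print("".join(t.text for t in body.iter(T)))  # Hello new world
```

Rejecting is the mirror image: w:ins is removed with its content, and w:del is unwrapped with its w:delText elements turned back into w:t.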

Reading Document Content

from msdocx import Document

doc = Document.open("existing.docx")

# Get all text as a single string
text = doc.get_text()

# Get structured paragraph data
paragraphs = doc.get_paragraphs()
for p in paragraphs:
    print(p["text"], p["style"], p["bold"], p["italic"])

# Get tables as nested lists
tables = doc.get_tables()
for table in tables:
    for row in table:
        print(row)  # ["cell1", "cell2", ...]

# Get all comments with threading info
comments = doc.get_comments()
for c in comments:
    print(f'{c["author"]}: {c["text"]}')
    if c["parent_id"] is not None:
        print(f'  (reply to comment {c["parent_id"]})')
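
For comparison, plain-text extraction can be sketched with the standard library alone: each w:p is a paragraph, and its visible text is the concatenation of the w:t elements inside it. `paragraph_texts` is a hypothetical helper; it skips tabs, line breaks, and deleted text, and makes no claim about how `doc.get_text()` works internally.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
P, T = f"{{{W}}}p", f"{{{W}}}t"


def paragraph_texts(document_xml: str) -> list[str]:
    """Return the text of each paragraph in a word/document.xml string."""
    root = ET.fromstring(document_xml)
    return ["".join(t.text or "" for t in p.iter(T)) for p in root.iter(P)]


SAMPLE = (
    f'<w:document xmlns:w="{W}"><w:body>'
    '<w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>World!</w:t></w:r></w:p>'
    '<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>'
    '</w:body></w:document>'
)

print(paragraph_texts(SAMPLE))  # ['Hello, World!', 'Second paragraph']
```

To run this on a real file, the XML string can be read from the package with `zipfile.ZipFile(path).read("word/document.xml")`.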

Comment Threading

# Create a parent comment
parent_id = doc.add_comment("Please review this section", author="Reviewer")

# Reply to the parent comment — automatically creates commentsExtended.xml
reply_id = doc.add_comment("Fixed in v2", author="Author", parent_id=parent_id)

# Read back threaded comments
comments = doc.get_comments()
# comments[1]["parent_id"] == parent_id
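
Threading is recorded in commentsExtended.xml, where each w15:commentEx entry identifies a comment by paragraph ID and may name a parent via w15:paraIdParent. The sketch below reads that mapping with the standard library; `reply_parents` is a hypothetical helper and the sample IDs are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional

W15 = "http://schemas.microsoft.com/office/word/2012/wordml"


def reply_parents(comments_extended_xml: str) -> dict[str, Optional[str]]:
    """Map each comment's paraId to its parent's paraId (None if top-level)."""
    root = ET.fromstring(comments_extended_xml)
    return {
        entry.get(f"{{{W15}}}paraId"): entry.get(f"{{{W15}}}paraIdParent")
        for entry in root.iter(f"{{{W15}}}commentEx")
    }


SAMPLE = (
    f'<w15:commentsEx xmlns:w15="{W15}">'
    '<w15:commentEx w15:paraId="1A2B3C4D" w15:done="0"/>'
    '<w15:commentEx w15:paraId="5E6F7A8B" w15:paraIdParent="1A2B3C4D" w15:done="0"/>'
    '</w15:commentsEx>'
)

print(reply_parents(SAMPLE))
```

The paragraph IDs here refer to paragraphs inside comments.xml, which is how an entry is tied back to the comment it describes.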

Specification Reference

This package implements the [MS-DOCX] specification, revision 22.1 (November 2025): https://learn.microsoft.com/en-us/openspecs/office_standards/ms-docx/b839fe1f-e1ca-4fa6-8c26-5954d0abbccd
