Python tools for Word OOXML documents, including MS-DOCX extension namespaces
msdocx
Python tools for Word OOXML documents, including MS-DOCX extension namespaces (w14/w15/w16).
Overview
msdocx is a Python package for reading and writing Word .docx files. Beyond the base ISO/IEC 29500 (OOXML) standard, it supports the extension namespaces (w14, w15, w16) defined in Microsoft's published [MS-DOCX] specification.
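To make the idea of "extension namespaces" concrete, here is a minimal standard-library sketch of how prefixed tag names map to namespace URIs. The `clark` helper is hypothetical and written for illustration only; it is not taken from msdocx, whose own `qn` helper may behave differently.

```python
# Namespace URIs for base WordprocessingML and two of the Word extension namespaces.
NSMAP = {
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
    "w14": "http://schemas.microsoft.com/office/word/2010/wordml",
    "w15": "http://schemas.microsoft.com/office/word/2012/wordml",
}


def clark(tag: str) -> str:
    """Expand a prefixed tag such as 'w14:paraId' into Clark notation.

    Hypothetical helper for illustration; not part of msdocx.
    """
    prefix, local = tag.split(":", 1)
    return f"{{{NSMAP[prefix]}}}{local}"


print(clark("w:p"))
print(clark("w14:paraId"))
```

Elements in the `w` namespace belong to the base standard; elements in `w14` and `w15` are Word extensions that a strictly ISO/IEC 29500 consumer may not understand.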
Installation
pip install msdocx
Quick Start
from msdocx import Document
from msdocx.text_effects import TextEffect, ShadowEffect, GlowEffect
# Create a new document
doc = Document.new()
# Add content
doc.add_heading("My Document", level=1)
doc.add_paragraph("Hello, World!", bold=True, font="Calibri", size=28)
# Add a table
doc.add_table(rows=3, cols=2, width=9360)
# Add lists
doc.add_bullet_list(["Item 1", "Item 2", "Item 3"])
doc.add_numbered_list(["Step 1", "Step 2", "Step 3"])
# Save
doc.save("output.docx")
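The numeric arguments in the Quick Start (`size=28`, `width=9360`) look like raw OOXML units. Assuming msdocx passes them through unchanged (an assumption, not something this README confirms), font sizes would be in half-points and table widths in twentieths of a point (twips). The hypothetical helpers below sketch the conversion.

```python
def points_to_half_points(points: float) -> int:
    """Convert a font size in points to OOXML half-points (as used by w:sz)."""
    return round(points * 2)


def inches_to_twips(inches: float) -> int:
    """Convert inches to twips: 1 inch = 72 points = 1440 twips."""
    return round(inches * 1440)


print(points_to_half_points(14))  # 14 pt expressed in half-points
print(inches_to_twips(6.5))       # 6.5 in expressed in twips
```

Under that assumption, `size=28` would be 14 pt text and `width=9360` a 6.5 inch wide table.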
MS-DOCX Extension Features
msdocx supports the following MS-DOCX extension features, which go beyond the base OOXML standard:
Text Effects (w14 namespace)
from msdocx.text_effects import TextEffect, ShadowEffect, GlowEffect, ReflectionEffect, TextOutline
from msdocx.oxml.text import make_run
from msdocx.oxml.ns import qn
# Create a run with text effects
run = make_run(text="Styled Text", bold=True, size=48)
rpr = run.find(qn("w:rPr"))
effect = TextEffect(
    shadow=ShadowEffect(color="4472C4", alpha=60000),
    glow=GlowEffect(color="00FF00", radius=63500),
    reflection=ReflectionEffect(),
    ligatures="standardContextual",
    numeral_form="oldStyle",
)
effect.apply(rpr)
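For readers curious about what a glow effect looks like at the XML level, the following is a standard-library sketch of a `w14:glow` element placed inside run properties. It does not use msdocx and is not claimed to match the markup msdocx emits; `build_glow_rpr` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W14 = "http://schemas.microsoft.com/office/word/2010/wordml"


def build_glow_rpr(color: str, radius_emu: int, alpha: int) -> ET.Element:
    """Build <w:rPr> containing a w14:glow effect (hypothetical helper).

    radius_emu is in EMUs (12700 EMU per point); alpha is in 1/1000 of a percent.
    """
    rpr = ET.Element(f"{{{W}}}rPr")
    glow = ET.SubElement(rpr, f"{{{W14}}}glow", {f"{{{W14}}}rad": str(radius_emu)})
    clr = ET.SubElement(glow, f"{{{W14}}}srgbClr", {f"{{{W14}}}val": color})
    ET.SubElement(clr, f"{{{W14}}}alpha", {f"{{{W14}}}val": str(alpha)})
    return rpr


ET.register_namespace("w", W)
ET.register_namespace("w14", W14)
print(ET.tostring(build_glow_rpr("00FF00", 63500, 60000), encoding="unicode"))
```

A radius of 63500 EMU corresponds to 5 pt, since one point is 12700 EMU.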
Content Controls with MS-DOCX Extensions
from msdocx.content_controls import ContentControl, ContentControlType
# Checkbox (w14 extension)
checkbox = ContentControl(ContentControlType.CHECKBOX)
checkbox.set_checked(True)
# Repeating section (w15 extension)
repeating = ContentControl(ContentControlType.REPEATING_SECTION)
# Entity picker (w15 extension)
picker = ContentControl(ContentControlType.ENTITY_PICKER)
# Color picker (w14 extension)
color = ContentControl(ContentControlType.COLOR)
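As background, a checkbox content control is a `w:sdt` element whose properties contain a `w14:checkbox` child. The sketch below builds one with the standard library only; `build_checkbox_sdt` is a hypothetical helper, and the glyph and font choices are common defaults rather than anything msdocx is known to emit.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W14 = "http://schemas.microsoft.com/office/word/2010/wordml"


def build_checkbox_sdt(checked: bool) -> ET.Element:
    """Build a <w:sdt> carrying a w14:checkbox (hypothetical helper)."""
    val, font = f"{{{W14}}}val", f"{{{W14}}}font"
    sdt = ET.Element(f"{{{W}}}sdt")
    sdt_pr = ET.SubElement(sdt, f"{{{W}}}sdtPr")
    checkbox = ET.SubElement(sdt_pr, f"{{{W14}}}checkbox")
    ET.SubElement(checkbox, f"{{{W14}}}checked", {val: "1" if checked else "0"})
    # Assumed glyphs: U+2612 (ballot box with X) and U+2610 (empty ballot box).
    ET.SubElement(checkbox, f"{{{W14}}}checkedState", {val: "2612", font: "MS Gothic"})
    ET.SubElement(checkbox, f"{{{W14}}}uncheckedState", {val: "2610", font: "MS Gothic"})
    # The visible content is a run holding the glyph for the current state.
    content = ET.SubElement(sdt, f"{{{W}}}sdtContent")
    run = ET.SubElement(content, f"{{{W}}}r")
    text = ET.SubElement(run, f"{{{W}}}t")
    text.text = "\u2612" if checked else "\u2610"
    return sdt


sdt = build_checkbox_sdt(True)
print(sdt.find(f".//{{{W14}}}checked").get(f"{{{W14}}}val"))
```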
Tracked Changes with Conflict Resolution
from msdocx.tracked_changes import TrackedChange
# Standard tracked changes
ins = TrackedChange.insert("new text", author="Alice")
del_el = TrackedChange.delete("old text", author="Bob")
del_el, ins_el = TrackedChange.replace("old", "new", author="Editor")
# MS-DOCX conflict resolution (w14 extension)
conflict_ins = TrackedChange.conflict_insert("conflict text", author="User1")
conflict_del = TrackedChange.conflict_delete("deleted in conflict", author="User2")
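For orientation, standard tracked changes are stored as `w:ins` and `w:del` wrappers around runs, with deleted text held in `w:delText` instead of `w:t`. The sketch below builds such wrappers with the standard library. `tracked_run` is a hypothetical helper, not the msdocx implementation, and it does not cover the w14 conflict variants.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def w(local: str) -> str:
    """Qualify a local name with the WordprocessingML namespace."""
    return f"{{{W}}}{local}"


def tracked_run(kind: str, text: str, author: str, change_id: int) -> ET.Element:
    """Wrap a single run in <w:ins> or <w:del> (hypothetical helper)."""
    if kind not in ("ins", "del"):
        raise ValueError("kind must be 'ins' or 'del'")
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    wrapper = ET.Element(
        w(kind), {w("id"): str(change_id), w("author"): author, w("date"): stamp}
    )
    run = ET.SubElement(wrapper, w("r"))
    # Deleted text lives in w:delText so that it is not rendered as live text.
    text_el = ET.SubElement(run, w("delText" if kind == "del" else "t"))
    text_el.text = text
    return wrapper


ins = tracked_run("ins", "new text", "Alice", 1)
print(ins.get(w("author")))
```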
Collaboration Features
from msdocx.collaboration import CollaborationInfo, mark_spelling_error
# Paragraph unique IDs (w14:paraId, w14:textId) — auto-generated
info = CollaborationInfo.generate()
info.apply_to_paragraph(paragraph_element)
# Inline spelling markup
mark_spelling_error(paragraph, start_run_index=1, end_run_index=1)
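The auto-generated identifiers mentioned above are 8-digit hexadecimal values. A minimal sketch of one way to generate them follows, assuming the constraint that the value must be greater than 0 and less than 0x80000000. `new_para_id` is a hypothetical name, and msdocx may generate its IDs differently.

```python
import secrets


def new_para_id() -> str:
    """Return a random 8-digit hex ID in the range 0x00000001..0x7FFFFFFF."""
    # randbelow(n) yields 0..n-1, so adding 1 shifts the range to 1..0x7FFFFFFF.
    value = secrets.randbelow(0x7FFFFFFF) + 1
    return f"{value:08X}"


para_id = new_para_id()
print(para_id)
```

Random IDs can collide within a large document, so a real generator would normally check each new ID against those already in use.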
Extended Compatibility Settings
from msdocx.compatibility import set_compat_mode, enable_opentype_features
# Set Word 2013+ compatibility mode
set_compat_mode(doc.settings, mode=15)
# Enable OpenType font features (MS-DOCX extension)
enable_opentype_features(doc.settings)
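Compatibility mode is stored in `settings.xml` as a `w:compatSetting` element named `compatibilityMode`. The following standard-library sketch shows how such a setting might be created or updated. `set_compat_setting` is a hypothetical helper and is not claimed to mirror what `set_compat_mode` does internally.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
WORD_URI = "http://schemas.microsoft.com/office/word"


def set_compat_setting(settings: ET.Element, name: str, value: str) -> ET.Element:
    """Create or update a <w:compatSetting> under <w:compat> (hypothetical helper)."""
    compat = settings.find(f"{{{W}}}compat")
    if compat is None:
        compat = ET.SubElement(settings, f"{{{W}}}compat")
    # Update an existing setting with the same name rather than adding a duplicate.
    for existing in compat.findall(f"{{{W}}}compatSetting"):
        if existing.get(f"{{{W}}}name") == name:
            existing.set(f"{{{W}}}val", value)
            return existing
    return ET.SubElement(
        compat,
        f"{{{W}}}compatSetting",
        {f"{{{W}}}name": name, f"{{{W}}}uri": WORD_URI, f"{{{W}}}val": value},
    )


settings = ET.Element(f"{{{W}}}settings")
set_compat_setting(settings, "compatibilityMode", "15")
print(len(settings.findall(f".//{{{W}}}compatSetting")))
```

This sketch appends `w:compat` at the end of `w:settings`; a production implementation would need to respect the element order required by the schema.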
Accept / Reject Tracked Changes
from msdocx.tracked_changes import TrackedChange
from msdocx.oxml.ns import qn
body = doc.body
# Accept a single insertion (keep the inserted text, remove the tracking wrapper)
ins_element = body.find(f".//{qn('w:ins')}")
TrackedChange.accept_insertion(ins_element)
# Alternatively, reject the insertion (remove the inserted text entirely)
TrackedChange.reject_insertion(ins_element)
# Accept a deletion (remove the deleted text permanently)
del_element = body.find(f".//{qn('w:del')}")
TrackedChange.accept_deletion(del_element)
# Reject a deletion (restore the deleted text)
TrackedChange.reject_deletion(del_element)
# Accept or reject ALL changes in the document body at once
TrackedChange.accept_all(doc.body)
TrackedChange.reject_all(doc.body)
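The accept and reject semantics described in the comments above can be sketched with the standard library as follows. This illustrates the general technique (unwrap or drop the `w:ins` / `w:del` wrapper) and is not the msdocx implementation; `resolve_all` is a hypothetical name, and the sketch handles run-level changes only.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def w(local: str) -> str:
    return f"{{{W}}}{local}"


def _unwrap(parent: ET.Element, wrapper: ET.Element) -> None:
    """Replace wrapper with its children, preserving their position."""
    index = list(parent).index(wrapper)
    parent.remove(wrapper)
    for offset, child in enumerate(list(wrapper)):
        parent.insert(index + offset, child)


def resolve_all(body: ET.Element, accept: bool) -> None:
    """Accept or reject every run-level w:ins / w:del under body."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parents = {child: parent for parent in body.iter() for child in parent}
    changes = [el for el in body.iter() if el.tag in (w("ins"), w("del"))]
    for el in changes:
        # Content survives when an insertion is accepted or a deletion is rejected.
        keep_content = (el.tag == w("ins")) == accept
        if keep_content:
            for deleted in el.iter(w("delText")):
                deleted.tag = w("t")  # restored text becomes live text again
            _unwrap(parents[el], el)
        else:
            parents[el].remove(el)


doc_xml = (
    f'<w:body xmlns:w="{W}"><w:p><w:r><w:t>a</w:t></w:r>'
    '<w:ins w:id="1" w:author="A"><w:r><w:t>b</w:t></w:r></w:ins>'
    '<w:del w:id="2" w:author="A"><w:r><w:delText>c</w:delText></w:r></w:del>'
    "</w:p></w:body>"
)
body = ET.fromstring(doc_xml)
resolve_all(body, accept=True)
print("".join(t.text for t in body.iter(w("t"))))
```

Paragraph-mark changes, moves, and formatting changes need additional handling that this sketch leaves out.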
Reading Document Content
doc = Document.open("existing.docx")
# Get all text as a single string
text = doc.get_text()
# Get structured paragraph data
paragraphs = doc.get_paragraphs()
for p in paragraphs:
    print(p["text"], p["style"], p["bold"], p["italic"])
# Get tables as nested lists
tables = doc.get_tables()
for table in tables:
    for row in table:
        print(row)  # ["cell1", "cell2", ...]
# Get all comments with threading info
comments = doc.get_comments()
for c in comments:
    print(f'{c["author"]}: {c["text"]}')
    if c["parent_id"] is not None:
        print(f'  (reply to comment {c["parent_id"]})')
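For context on what reading a document involves, a .docx file is a ZIP archive whose main content lives in `word/document.xml`. The sketch below extracts paragraph text with the standard library only. `paragraph_texts` is a hypothetical helper and says nothing about how `doc.get_text()` is implemented.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraph_texts(docx_file) -> list[str]:
    """Return the text of every w:p in word/document.xml, in document order.

    docx_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    return [
        "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))
        for p in root.iter(f"{{{W}}}p")
    ]


# Build a tiny in-memory archive holding only the part this sketch reads.
xml = (
    f'<w:document xmlns:w="{W}"><w:body>'
    "<w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>World!</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
    "</w:body></w:document>"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", xml)
buffer.seek(0)
print(paragraph_texts(buffer))
```

The in-memory archive is not a complete Word document (it lacks content types and relationships); it only exercises the reading logic. Paragraphs inside table cells are included, because `iter` walks the whole tree.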
Comment Threading
# Create a parent comment
parent_id = doc.add_comment("Please review this section", author="Reviewer")
# Reply to the parent comment — automatically creates commentsExtended.xml
reply_id = doc.add_comment("Fixed in v2", author="Author", parent_id=parent_id)
# Read back threaded comments
comments = doc.get_comments()
# comments[1]["parent_id"] == parent_id
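Threading information is stored in `commentsExtended.xml` as `w15:commentEx` elements, where a reply points at its parent through `w15:paraIdParent`. The sketch below builds that part with the standard library. `build_comments_ex` is a hypothetical helper, and the paragraph IDs are made-up sample values.

```python
import xml.etree.ElementTree as ET
from typing import Optional

W15 = "http://schemas.microsoft.com/office/word/2012/wordml"


def build_comments_ex(entries: list[tuple[str, Optional[str]]]) -> ET.Element:
    """Build <w15:commentsEx> from (para_id, parent_para_id) pairs.

    Each para_id is the w14:paraId of a comment's last paragraph;
    parent_para_id is None for a top-level comment.
    """
    root = ET.Element(f"{{{W15}}}commentsEx")
    for para_id, parent_para_id in entries:
        attrs = {f"{{{W15}}}paraId": para_id, f"{{{W15}}}done": "0"}
        if parent_para_id is not None:
            attrs[f"{{{W15}}}paraIdParent"] = parent_para_id
        ET.SubElement(root, f"{{{W15}}}commentEx", attrs)
    return root


# Sample IDs for illustration: one parent comment and one reply.
tree = build_comments_ex([("1A2B3C4D", None), ("5E6F7A8B", "1A2B3C4D")])
ET.register_namespace("w15", W15)
print(ET.tostring(tree, encoding="unicode"))
```

Note that the link is made through paragraph IDs, not through the numeric comment IDs used in `comments.xml`.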
Specification Reference
This package implements the [MS-DOCX] specification, revision 22.1 (November 2025): https://learn.microsoft.com/en-us/openspecs/office_standards/ms-docx/b839fe1f-e1ca-4fa6-8c26-5954d0abbccd
File details
Details for the file msdocx-0.5.0.tar.gz.
File metadata
- Download URL: msdocx-0.5.0.tar.gz
- Upload date:
- Size: 74.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6cc26f50a604763c37a51f08fdb9752f7a2e8b8003804338c4abf08c6d886b34 |
| MD5 | 2692ab4a276a22417398c35145a3e052 |
| BLAKE2b-256 | a840c6c4b773c433e1b54b66c5e277517e8396ef34b78187451e35e6608088e6 |
File details
Details for the file msdocx-0.5.0-py3-none-any.whl.
File metadata
- Download URL: msdocx-0.5.0-py3-none-any.whl
- Upload date:
- Size: 61.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e57a0a752dced65b0be158d903a9af21ae27b0dbef86e5d1517d988276b6a640 |
| MD5 | baf921ffcf5e9a069f43b3dbaabfee9d |
| BLAKE2b-256 | 1533e09b76ee3dc24dc9b6eff1a6a447bf2eb7ecb12b0be3c4d5910a135a4181 |