Edit docx with python

docx-editor

Pure Python library for working with tracked changes and comments in Word documents. It does not require Microsoft Word.

Note: The PyPI package is named docx-editor because docx-edit was too similar to an existing package.

Features

  • Hash-Anchored Paragraph References: list_paragraphs() returns stable, hash-based paragraph IDs for safe, unambiguous targeting
  • Paragraph Location: get_paragraph_location(ref) reports whether a paragraph lives in the body or inside a table cell, including the w:gridSpan-aware logical column, row, table index, and nesting depth. list_paragraph_locations() returns (ref, location) for every paragraph in a single batch pass, avoiding a table rescan per paragraph
  • Batch Editing: Atomic batch_edit() with upfront hash validation across all operations
  • Paragraph Rewrite: rewrite_paragraph() with automatic word-level diffing — specify desired text, get fine-grained tracked changes
  • Track Changes: Replace, delete, and insert text with revision tracking
  • Cross-Boundary Editing: Find and replace text spanning multiple XML elements and revision boundaries
  • Mixed-State Editing: Atomic decomposition for text spanning <w:ins>/<w:del> boundaries
  • Comments: Add, reply, resolve, and delete comments
  • Revision Management: List, accept, and reject tracked changes
  • Cross-Platform: Works on Linux, macOS, and Windows
  • Minimal Dependencies: The only required dependency is defusedxml, used for secure XML parsing
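
To make the hash-anchored reference idea concrete, here is a minimal sketch of how a ref in the P2#f3c1 format could be built, parsed, and checked for staleness. The helper names (make_ref, parse_ref, is_current), the choice of SHA-256, and the digest length are assumptions for illustration only; they are not part of docx-editor's API, and the library's actual hashing scheme may differ.

```python
import hashlib
import re

# Hypothetical helpers for illustration; not docx-editor's API.
REF_PATTERN = re.compile(r"^P(\d+)#([0-9a-f]+)$")


def make_ref(index: int, text: str, digest_len: int = 4) -> str:
    """Build a ref like 'P2#ab12' from a 1-based index and paragraph text."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:digest_len]
    return f"P{index}#{digest}"


def parse_ref(ref: str) -> tuple[int, str]:
    """Split a ref into (index, hash); raise ValueError if it is malformed."""
    match = REF_PATTERN.match(ref)
    if match is None:
        raise ValueError(f"malformed paragraph ref: {ref!r}")
    return int(match.group(1)), match.group(2)


def is_current(ref: str, paragraphs: list[str]) -> bool:
    """Return True if the ref still matches the paragraph at its index."""
    index, digest = parse_ref(ref)
    if not 1 <= index <= len(paragraphs):
        return False
    # Recompute the hash at the same length as the one stored in the ref.
    return make_ref(index, paragraphs[index - 1], len(digest)) == ref
```

Because the hash is derived from the paragraph text, a ref taken before an edit no longer matches afterwards, which is what lets an edit be rejected instead of silently landing on the wrong paragraph.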

Installation

pip install docx-editor

Claude Code Plugin

This repo includes a plugin for Claude Code that enables AI-assisted Word document editing.

This plugin extends the original Anthropic docx skill, which requires Claude to manipulate OOXML manually. Instead, the plugin provides an interface (docx-editor) that handles the OOXML details. Claude only calls simple Python methods such as doc.replace() or doc.add_comment(), which makes document editing faster and less error-prone.

Install as plugin

# Add the marketplace
/plugin marketplace add pablospe/docx-editor

# Install the plugin
/plugin install docx-editor@docx-editor-marketplace

# Install dependencies
pip install docx-editor python-docx

Manual install (alternative)

# Install dependencies
pip install docx-editor python-docx

# Copy skill to Claude Code skills directory
git clone https://github.com/pablospe/docx-editor /tmp/docx-editor
mkdir -p ~/.claude/skills
cp -r /tmp/docx-editor/skills/docx ~/.claude/skills/
rm -rf /tmp/docx-editor

Once installed, Claude Code can help you edit Word documents with track changes, comments, and revisions.

Quick Start

from docx_editor import Document
import os

author = os.environ.get("USER") or "Reviewer"
with Document.open("contract.docx", author=author) as doc:
    # Step 1: List paragraphs with hash-anchored references
    for p in doc.list_paragraphs():
        print(p)
    # Output: P1#a7b2| Introduction to the contract...
    #         P2#f3c1| The committee shall review...

    # Step 2: Edit — each method returns the new paragraph ref
    r = doc.replace("30 days", "60 days", paragraph="P2#f3c1")
    r = doc.replace("net", "gross", paragraph=r)  # chain without list_paragraphs()
    doc.delete("obsolete text", paragraph="P5#d4e5")
    doc.insert_after("Section 5", " (as amended)", paragraph="P3#b2c4")

    # Rewrite entire paragraph (automatic word-level diff).
    # P2 was edited above, so pass the latest returned ref, not the original one.
    doc.rewrite_paragraph(r,
        "The board shall approve the updated proposal.")

    # Comments
    doc.add_comment("Section 5", "Please review")

    # Revision management
    revisions = doc.list_revisions()
    doc.accept_revision(revision_id=1)

    doc.save()
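
As a rough illustration of what word-level diffing means for rewrite_paragraph(), the sketch below uses the standard library's difflib to turn an old and a new sentence into keep, delete, and insert steps. The function word_diff is hypothetical, and nothing here claims to reproduce the library's own diff algorithm; it only shows the general technique.

```python
from difflib import SequenceMatcher


def word_diff(old: str, new: str) -> list[tuple[str, str]]:
    """Return (op, text) pairs where op is 'keep', 'delete', or 'insert'."""
    old_words, new_words = old.split(), new.split()
    matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changes: list[tuple[str, str]] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            changes.append(("keep", " ".join(old_words[i1:i2])))
            continue
        # A 'replace' opcode is recorded as a deletion followed by an insertion.
        if tag in ("delete", "replace"):
            changes.append(("delete", " ".join(old_words[i1:i2])))
        if tag in ("insert", "replace"):
            changes.append(("insert", " ".join(new_words[j1:j2])))
    return changes


if __name__ == "__main__":
    for op, text in word_diff(
        "The committee shall review the proposal.",
        "The board shall approve the updated proposal.",
    ):
        print(f"{op:>6}: {text}")
```

In a tracked-changes setting, each delete step would correspond to a deletion revision and each insert step to an insertion revision, while kept words stay untouched.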

Cross-Boundary Text Operations

Text in Word documents with tracked changes can span revision boundaries. docx-editor handles this transparently:

from docx_editor import Document
import os

author = os.environ.get("USER") or "Reviewer"
with Document.open("reviewed.docx", author=author) as doc:
    # Get visible text (inserted text included, deleted excluded)
    text = doc.get_visible_text()

    # List paragraphs to find hash-anchored references
    refs = doc.list_paragraphs()

    # Page through large documents — you choose the page size; refs stay
    # globally indexed (page 2 with size 50 starts at P51, not P1)
    total = doc.paragraph_count()
    page_size = 50
    for start in range(1, total + 1, page_size):
        for ref in doc.list_paragraphs(start=start, limit=page_size):
            print(ref)  # process this page of refs

    # Find text across element boundaries
    match = doc.find_text("Aim: To")
    if match and match.spans_boundary:
        print("Text spans a revision boundary")

    # Replace works even across revision boundaries
    doc.replace("Aim: To", "Goal: To", paragraph="P1#a7b2")

    doc.save()
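
To show why a phrase can span a revision boundary, the following sketch parses a hand-written WordprocessingML paragraph and collects its visible text: ordinary and inserted runs store text in w:t, while deleted runs store it in w:delText, so gathering only w:t elements skips deletions. The sample XML and the visible_text helper are illustrative assumptions, not the library's implementation. The sketch uses the standard library parser on a trusted literal; for untrusted files a hardened parser such as defusedxml is the safer choice.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hand-written paragraph: "Aim: " is plain text, "For" was deleted,
# "To" was inserted, and " test" is plain text again.
SAMPLE = f"""
<w:p xmlns:w="{W}">
  <w:r><w:t xml:space="preserve">Aim: </w:t></w:r>
  <w:del w:id="1" w:author="A"><w:r><w:delText>For</w:delText></w:r></w:del>
  <w:ins w:id="2" w:author="A"><w:r><w:t>To</w:t></w:r></w:ins>
  <w:r><w:t xml:space="preserve"> test</w:t></w:r>
</w:p>
""".strip()


def visible_text(paragraph_xml: str) -> str:
    """Join all w:t elements in document order; w:delText is skipped."""
    root = ET.fromstring(paragraph_xml)
    return "".join(t.text or "" for t in root.iter(f"{{{W}}}t"))


def text_pieces(paragraph_xml: str) -> list[str]:
    """Return the text of each w:t element separately."""
    root = ET.fromstring(paragraph_xml)
    return [t.text or "" for t in root.iter(f"{{{W}}}t")]


if __name__ == "__main__":
    print(visible_text(SAMPLE))
    print(text_pieces(SAMPLE))
```

Here the phrase "Aim: To" exists in the visible text but in no single w:t element, so a search has to work on the joined text and then map the match back to the underlying elements.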

Batch Editing

Apply multiple edits atomically with upfront hash validation:

from docx_editor import Document, EditOperation

with Document.open("contract.docx", author="Editor") as doc:
    refs = doc.list_paragraphs()
    doc.batch_edit([
        EditOperation(action="replace", find="old", replace_with="new", paragraph="P2#f3c1"),
        EditOperation(action="delete", text="remove this", paragraph="P5#d4e5"),
    ])
    doc.save()
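
The validate-first, apply-second pattern behind an atomic batch can be sketched on a plain list of strings. Everything below (the Edit dataclass, short_hash, apply_batch, and the 8-character SHA-256 prefix) is a hypothetical stand-in chosen for illustration; it is not docx-editor's EditOperation or batch_edit(), and it assumes at most one edit per paragraph.

```python
import hashlib
from dataclasses import dataclass


def short_hash(text: str) -> str:
    """Hypothetical paragraph hash: first 8 hex characters of SHA-256."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:8]


@dataclass
class Edit:
    """Illustrative stand-in for one batch operation."""
    index: int          # 1-based paragraph index
    expected_hash: str  # hash the caller saw when listing paragraphs
    find: str
    replace_with: str


def apply_batch(paragraphs: list[str], edits: list[Edit]) -> list[str]:
    """Validate every edit first; return a new list only if all of them pass."""
    for edit in edits:
        if not 1 <= edit.index <= len(paragraphs):
            raise ValueError(f"paragraph {edit.index} does not exist")
        current = paragraphs[edit.index - 1]
        if short_hash(current) != edit.expected_hash:
            raise ValueError(f"stale reference for paragraph {edit.index}")
        if edit.find not in current:
            raise ValueError(f"text not found in paragraph {edit.index}")
    # Nothing is modified until every check above has succeeded.
    result = list(paragraphs)
    for edit in edits:
        i = edit.index - 1
        result[i] = result[i].replace(edit.find, edit.replace_with, 1)
    return result
```

If any operation carries a stale hash, the function raises before touching anything, so the caller never ends up with a half-applied batch.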
