docx-comments-to-text

Extract reviewer comments from .docx files and insert them inline with the text they reference, creating a plain text output that keeps feedback in context.

Installation

From PyPI (recommended)

pip install docx-comments-to-text

From source

# Clone the repository
git clone https://github.com/platelminto/docx-comments-to-text
cd docx-comments-to-text

# Install in development mode
uv sync --dev
# or: pip install -e .

Usage

Command Line Interface

# Basic usage - output to stdout
docx-comments-to-text document.docx

# Save to file
docx-comments-to-text document.docx -o output.txt

# Control author display
docx-comments-to-text document.docx --authors never    # Hide authors
docx-comments-to-text document.docx --authors always   # Always show authors
docx-comments-to-text document.docx --authors auto     # Show authors when multiple exist (default)

# Control comment placement
docx-comments-to-text document.docx --placement inline         # Inline with text (default)
docx-comments-to-text document.docx --placement end-paragraph  # At end of each paragraph
docx-comments-to-text document.docx --placement comments-only  # Comments only with context
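
If you want to drive the CLI from a Python script, the options above can be assembled into an argument list and passed to `subprocess`. The following is a minimal sketch, not part of the package: `build_command` and `extract_comments` are hypothetical helper names, and only the flags documented above are used.

```python
import subprocess

# Values taken from the CLI usage shown above
VALID_AUTHORS = {"never", "always", "auto"}
VALID_PLACEMENTS = {"inline", "end-paragraph", "comments-only"}


def build_command(docx_path, output=None, authors="auto", placement="inline"):
    """Build the argv list for docx-comments-to-text (hypothetical helper)."""
    if authors not in VALID_AUTHORS:
        raise ValueError(f"unknown --authors value: {authors!r}")
    if placement not in VALID_PLACEMENTS:
        raise ValueError(f"unknown --placement value: {placement!r}")
    cmd = ["docx-comments-to-text", docx_path,
           "--authors", authors, "--placement", placement]
    if output is not None:
        cmd += ["-o", output]
    return cmd


def extract_comments(docx_path, **options):
    """Run the CLI and return its stdout; assumes the package is installed."""
    result = subprocess.run(build_command(docx_path, **options),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `extract_comments("document.docx", placement="comments-only")` would return the comments-only output as a string, provided `docx-comments-to-text` is on the `PATH`.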

Development Usage

If working from source:

# Run with uv
uv run docx-comments-to-text document.docx

# Or use module syntax
uv run python -m docx_comments_to_text.cli document.docx

Example Output

Inline placement (default)

Original text with [reviewer feedback] [COMMENT: "This needs clarification"] continues here.
More content [needs examples] [COMMENT John: "Consider adding examples"] and final text.

End-paragraph placement

Original text with reviewer feedback[1] continues here.
More content needs examples[2] and final text.

Comments:
1. This needs clarification
2. John: Consider adding examples

Comments-only placement

"reviewer feedback": This needs clarification
"needs examples": John: Consider adding examples

Features

  • Accurate comment positioning and text preservation
  • Handles overlapping comments and multiple comment types
  • Configurable author display
  • Multiple comment placement styles (inline, end-of-paragraph, comments-only)

Technical Details

DOCX Structure

  • DOCX files are ZIP archives containing XML files
  • word/document.xml - main document content
  • word/comments.xml - comment definitions
  • Comment ranges marked with <w:commentRangeStart> and <w:commentRangeEnd>
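
The structure above can be explored with the standard library alone. This is a minimal sketch and not the package's own code (which lists python-docx and lxml as dependencies): `read_comments`, `read_range_ids`, and `make_sample` are hypothetical names, and the sample archive contains only the two XML parts needed for the demonstration, so it is not a complete, Word-openable document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the w: prefix
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def read_comments(docx_file):
    """Return {comment_id: (author, text)} from word/comments.xml."""
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/comments.xml"))
    comments = {}
    for node in root.findall("w:comment", NS):
        text = "".join(t.text or "" for t in node.iter(f"{{{W}}}t"))
        comments[node.get(f"{{{W}}}id")] = (node.get(f"{{{W}}}author"), text)
    return comments


def read_range_ids(docx_file):
    """Return the ids of w:commentRangeStart markers in document order."""
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    return [n.get(f"{{{W}}}id") for n in root.iter(f"{{{W}}}commentRangeStart")]


def make_sample():
    """Build a tiny in-memory archive with just the two parts read above."""
    document = (
        f'<w:document xmlns:w="{W}"><w:body><w:p>'
        '<w:r><w:t>Original text with </w:t></w:r>'
        '<w:commentRangeStart w:id="0"/>'
        '<w:r><w:t>reviewer feedback</w:t></w:r>'
        '<w:commentRangeEnd w:id="0"/>'
        '</w:p></w:body></w:document>'
    )
    comments = (
        f'<w:comments xmlns:w="{W}">'
        '<w:comment w:id="0" w:author="John">'
        '<w:p><w:r><w:t>This needs clarification</w:t></w:r></w:p>'
        '</w:comment></w:comments>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", document)
        archive.writestr("word/comments.xml", comments)
    buffer.seek(0)
    return buffer


sample_comments = read_comments(make_sample())
sample_range_ids = read_range_ids(make_sample())
```

The `w:id` attribute is what links a range marker in `word/document.xml` to its definition in `word/comments.xml`.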

Comment Insertion Strategy

  1. Parse document XML to extract text and track character positions
  2. Map comment ranges to their start/end positions in the text
  3. Sort comments by position and process them in reverse order, so each insertion leaves the offsets of earlier comments unchanged
  4. Wrap commented text in brackets: [commented text]
  5. Insert comment content after bracketed text: [COMMENT: "feedback"]
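
Steps 3 to 5 can be sketched on a plain string. This is an illustrative sketch under simplifying assumptions, not the package's implementation: `insert_comments` and `span` are hypothetical names, comment ranges are given as character offsets, and overlapping ranges are not handled.

```python
def span(text, phrase):
    """Return (start, end) character offsets of the first occurrence of phrase."""
    start = text.index(phrase)
    return start, start + len(phrase)


def insert_comments(text, comments):
    """Annotate text inline.

    comments is a list of (start, end, body, author_or_None) tuples whose
    offsets refer to the original, unmodified text.
    """
    # Work from the last comment to the first: an edit near the end of the
    # string never shifts the offsets of comments that start earlier.
    for start, end, body, author in sorted(comments, key=lambda c: c[0], reverse=True):
        label = f"COMMENT {author}" if author else "COMMENT"
        annotated = f'[{text[start:end]}] [{label}: "{body}"]'
        text = text[:start] + annotated + text[end:]
    return text


sample = (
    "Original text with reviewer feedback continues here.\n"
    "More content needs examples and final text."
)
annotated = insert_comments(sample, [
    (*span(sample, "reviewer feedback"), "This needs clarification", None),
    (*span(sample, "needs examples"), "Consider adding examples", "John"),
])
print(annotated)
```

With these inputs the printed text matches the inline placement example shown earlier. Processing in forward order would instead require adjusting every later offset after each insertion.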

Dependencies

  • python-docx - DOCX file handling
  • lxml - XML parsing
  • click - Command line interface
