docx-comments-to-text

Extract reviewer comments from .docx files and insert them inline with the text they reference, creating a plain text output that keeps feedback in context.

Installation

# Clone the repository
git clone https://github.com/platelminto/docx-comments-to-text
cd docx-comments-to-text

# Install dependencies
uv sync
# or: pip install python-docx lxml click

Usage

Command Line Interface

# Basic usage - output to stdout
python cli.py document.docx

# Save to file
python cli.py document.docx -o output.txt

# Control author display
python cli.py document.docx --authors never    # Hide authors
python cli.py document.docx --authors always   # Always show authors
python cli.py document.docx --authors auto     # Show authors when multiple exist (default)

# Control comment placement
python cli.py document.docx --placement inline         # Inline with text (default)
python cli.py document.docx --placement end-paragraph  # At end of each paragraph
python cli.py document.docx --placement comments-only  # Comments only with context

Example Output

Inline placement (default)

Original text with [reviewer feedback] [COMMENT: "This needs clarification"] continues here.
More content [needs examples] [COMMENT John: "Consider adding examples"] and final text.

End-paragraph placement

Original text with reviewer feedback[1] continues here.
More content needs examples[2] and final text.

Comments:
1. This needs clarification
2. John: Consider adding examples
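
The end-paragraph layout above can be reproduced with a few lines of Python. This is only a sketch, not the project's code: the function name `render_end_paragraph` and the `(start, end, author, body)` tuple layout are assumptions made for illustration, and comment ranges are assumed not to overlap.

```python
def render_end_paragraph(paragraph, comments):
    """Append numbered markers after commented spans and list comments below.

    comments: list of (start, end, author, body) tuples (hypothetical layout),
    with non-overlapping character offsets into `paragraph`, end exclusive.
    """
    ordered = sorted(comments, key=lambda c: c[0])
    numbered = list(enumerate(ordered, start=1))

    # Insert markers from the last comment to the first so that earlier
    # offsets are not shifted by text added later in the string.
    for number, (_, end, _, _) in reversed(numbered):
        paragraph = paragraph[:end] + f"[{number}]" + paragraph[end:]

    lines = [paragraph, "", "Comments:"]
    for number, (_, _, author, body) in numbered:
        prefix = f"{author}: " if author else ""
        lines.append(f"{number}. {prefix}{body}")
    return "\n".join(lines)
```

An empty author string stands in here for "do not show the author"; how the real tool decides that for `--authors auto` is not shown in this README.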

Comments-only placement

"reviewer feedback": This needs clarification
"needs examples": John: Consider adding examples

Features

  • Accurate comment positioning and text preservation
  • Handles overlapping comments and multiple comment types
  • Configurable author display
  • Multiple comment placement styles (inline, end-of-paragraph, comments-only)

Technical Details

DOCX Structure

  • DOCX files are ZIP archives containing XML files
  • word/document.xml - main document content
  • word/comments.xml - comment definitions
  • Comment ranges are marked in word/document.xml with <w:commentRangeStart> and <w:commentRangeEnd>, whose w:id attribute matches the w:id of a comment in word/comments.xml
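
To make the structure above concrete, here is a minimal sketch that reads comment definitions straight from the ZIP archive. It uses only the standard library (`zipfile` and `xml.etree.ElementTree`) rather than the project's python-docx/lxml stack, and the function name `read_comments` is an assumption for illustration, not part of this package.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the w: prefix
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def read_comments(docx_file):
    """Return {comment_id: (author, text)} read from word/comments.xml.

    docx_file may be a path or a binary file-like object.
    """
    comments = {}
    with zipfile.ZipFile(docx_file) as archive:
        if "word/comments.xml" not in archive.namelist():
            return comments  # documents without comments omit this part
        root = ET.fromstring(archive.read("word/comments.xml"))

    for node in root.findall("w:comment", NS):
        comment_id = node.get(f"{{{W_NS}}}id")
        author = node.get(f"{{{W_NS}}}author", "")
        # A comment body can span several runs; join all of its text nodes.
        text = "".join(t.text or "" for t in node.iter(f"{{{W_NS}}}t"))
        comments[comment_id] = (author, text)
    return comments
```

Calling `read_comments("document.docx")` would give a dictionary keyed by the same ids that the range markers in word/document.xml carry.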

Comment Insertion Strategy

  1. Parse document XML to extract text and track character positions
  2. Map comment ranges to their start/end positions in the text
  3. Sort comments by position and insert them in reverse order (last to first), so that an insertion never shifts the offsets of comments still to be processed
  4. Wrap commented text in brackets: [commented text]
  5. Insert comment content after bracketed text: [COMMENT: "feedback"]
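
The steps above can be sketched as follows. This is an illustration under stated assumptions, not the project's actual implementation: the function name `insert_comments` and the `(start, end, author, body)` tuple layout are hypothetical, and the sketch only handles non-overlapping ranges, whereas the tool itself also handles overlapping comments.

```python
def insert_comments(text, comments):
    """Wrap each commented span in brackets and append its comment inline.

    comments: list of (start, end, author, body) tuples (hypothetical layout),
    with non-overlapping character offsets into `text`, end exclusive.
    """
    # Work from the end of the text backwards: text inserted at a later
    # position cannot change the offsets of earlier comments.
    for start, end, author, body in sorted(comments, key=lambda c: c[0], reverse=True):
        label = f"COMMENT {author}" if author else "COMMENT"
        marked = f'[{text[start:end]}] [{label}: "{body}"]'
        text = text[:start] + marked + text[end:]
    return text
```

Processing in forward order would require adjusting every later offset after each insertion; the reverse pass avoids that bookkeeping.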

Dependencies

  • python-docx - DOCX file handling
  • lxml - XML parsing
  • click - Command line interface
