docxreviews2txt

Command line tool to extract review changes and comments from a docx file as plain text. It is particularly useful after reviewing PDF files in a docx editor (e.g., MS Word, Google Docs).

How to install?

pip install docxreviews2txt

How to use it?

usage: docxreviews2txt [-h] [--save_p_xml] [--version] docx

Extract review changes and comments from a docx file as plain text.

positional arguments:
  docx          input docx

optional arguments:
  -h, --help    show this help message and exit
  --save_p_xml  also save extracted Docx paragraphs as xml for debugging
  --version     show version
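
To call the tool from a script, the options listed above can be assembled into a command line. The following is a minimal sketch, assuming the `docxreviews2txt` executable is on PATH after `pip install`; `build_command` and `run_docxreviews2txt` are hypothetical helper names and not part of the package.

```python
import subprocess


def build_command(docx_path, save_p_xml=False):
    """Build the argument list for the docxreviews2txt CLI (hypothetical helper)."""
    cmd = ["docxreviews2txt"]
    if save_p_xml:
        # Documented flag: also save extracted paragraphs as xml for debugging.
        cmd.append("--save_p_xml")
    cmd.append(str(docx_path))
    return cmd


def run_docxreviews2txt(docx_path, save_p_xml=False):
    """Run the CLI and return whatever it prints to stdout.

    Assumes the executable is installed and on PATH; raises
    subprocess.CalledProcessError if the tool exits with an error.
    """
    result = subprocess.run(
        build_command(docx_path, save_p_xml),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.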

Example:

$ docxreviews2txt tests/lorem_ipsum.docx
txt reviews at file:///C:/Users/alan/src/docxreviews2txt/tests/lorem_ipsum_review.txt
$ cat c:/Users/alan/src/docxreviews2txt/tests/lorem_ipsum_review.txt
# comments
- This is a comment from docx
# Typos and rewriting suggestions
- sit amet, consectetur  -> sit amet, consectetur Lorem ipsum
- sit amet, consectetur adipiscing elit, sed do -> sit amet, consectetur elit, sed do
- sit amet, consectetur adipiscing elit, sed -> sit amet, consectetur adipiscings elit, sed
- enim ad minim veniam, quis nostrud -> enim ad minim do veniam, quis nostrud
- enim ad minim veniam -> enim ad minim Lorem veniam
- veniam, quis nostrud -> veniam ipsum, quis nostrud
- sit amet, consectetur adipiscing elit, sed do -> sit amet, consectetur elit, sed do
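
The review file shown above has a simple layout: `# ` lines start a section, `- ` lines are items, and rewriting suggestions use `->` between the original and the revised text. The sketch below reads that layout back into Python. It is inferred only from this example output, so treat it as an assumption rather than a documented format; `review_txt_path` and `parse_review_txt` are hypothetical names.

```python
from pathlib import Path


def review_txt_path(docx_path):
    """Guess the output path, assuming the '<stem>_review.txt' naming seen in the example."""
    p = Path(docx_path)
    return p.with_name(p.stem + "_review.txt")


def parse_review_txt(text):
    """Group items by section heading.

    Items containing '->' become (before, after) tuples; other items stay
    plain strings. A comment that itself contains '->' would also be split,
    which this sketch does not try to handle.
    """
    sections = {}
    current = None
    for line in text.splitlines():
        if line.startswith("# "):
            current = line[2:].strip()
            sections[current] = []
        elif line.startswith("- ") and current is not None:
            item = line[2:]
            before, arrow, after = item.partition("->")
            if arrow:
                sections[current].append((before.strip(), after.strip()))
            else:
                sections[current].append(item.strip())
    return sections
```

For example, `parse_review_txt(review_txt_path("tests/lorem_ipsum.docx").read_text())` would return a dictionary keyed by the section headings of the file, provided the file exists.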

TODO

  • improve extraction of the N context words around review changes and allow N to be passed as a parameter
  • organize extracted reviews by the input Docx headings
  • save txt as Docx to enable editing
  • support drag-and-drop GUI

Known issues

The tool fails to capture changes in Docx files with text organized in tables (e.g., pdf2docx converts columns to tables).
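
For background, paragraphs inside a table sit deeper in `word/document.xml` (under `w:tbl`, `w:tr`, `w:tc`) than ordinary body paragraphs, so a scan that only looks at top-level paragraphs would miss them. Whether that is the cause of this issue is not documented here. The sketch below is an illustration using only the Python standard library, not the tool's own code; it walks every paragraph at any depth and collects tracked insertions (`w:ins`) and deletions (`w:del`). The function names are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def _text(parent, wrapper_tag, text_tag):
    """Join the text of all `text_tag` elements found inside `wrapper_tag` elements."""
    parts = []
    for wrapper in parent.iter(f"{{{W}}}{wrapper_tag}"):
        for node in wrapper.iter(f"{{{W}}}{text_tag}"):
            parts.append(node.text or "")
    return "".join(parts)


def tracked_changes(document_xml):
    """Return one dict per paragraph that contains a tracked insertion or deletion."""
    root = ET.fromstring(document_xml)
    changes = []
    # iter() visits paragraphs at any depth, including those inside table cells.
    for paragraph in root.iter(f"{{{W}}}p"):
        inserted = _text(paragraph, "ins", "t")
        deleted = _text(paragraph, "del", "delText")
        if inserted or deleted:
            changes.append({"inserted": inserted, "deleted": deleted})
    return changes


def tracked_changes_from_docx(path):
    """Read word/document.xml out of the docx zip archive and scan it."""
    with zipfile.ZipFile(path) as archive:
        return tracked_changes(archive.read("word/document.xml"))
```

This only reports the changed text itself and does not reproduce the surrounding context words that the tool prints.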

ChangeLog

  • v0.4: add main.py, rm --save_xml_p_elems, -nwords
  • v0.3: add --version
  • v0.2: add python module and unittests
  • v0.1: one-script initial version

References

This project takes inspiration from:
