Command line tool to extract review changes from a docx file as plain text

docxreviews2txt

Command line tool to extract review changes and comments from a docx file as plain text. It is particularly useful after reviewing PDF files in a docx editor (e.g., MS Word, Google Docs).

How to install?

pip install docxreviews2txt

How to use it?

usage: docxreviews2txt [-h] [--save_p_xml] [--version] docx

Extract review changes and comments from a docx file as plain text.

positional arguments:
  docx          input docx

optional arguments:
  -h, --help    show this help message and exit
  --save_p_xml  also save extracted Docx paragraphs as xml for debugging
  --version     show version
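To call the tool from a script rather than a shell, one option is a thin wrapper around `subprocess`. This is a minimal sketch that uses only the flags listed in the usage text above; the helper names `build_command` and `run_docxreviews2txt` are hypothetical and are not part of the package.

```python
import subprocess


def build_command(docx_path, save_p_xml=False):
    """Assemble the argument list from the flags listed in the usage text."""
    cmd = ["docxreviews2txt"]
    if save_p_xml:
        cmd.append("--save_p_xml")
    cmd.append(str(docx_path))
    return cmd


def run_docxreviews2txt(docx_path, save_p_xml=False):
    """Run the CLI; raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(
        build_command(docx_path, save_p_xml),
        check=True,
        capture_output=True,
        text=True,
    )
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.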

Example:

$ docxreviews2txt tests/lorem_ipsum.docx
txt reviews at file:///C:/Users/alan/src/docxreviews2txt/tests/lorem_ipsum_review.txt
$ cat c:/Users/alan/src/docxreviews2txt/tests/lorem_ipsum_review.txt
# comments
- This is a comment from docx
# Typos and rewriting suggestions
- sit amet, consectetur  -> sit amet, consectetur Lorem ipsum
- sit amet, consectetur adipiscing elit, sed do -> sit amet, consectetur elit, sed do
- sit amet, consectetur adipiscing elit, sed -> sit amet, consectetur adipiscings elit, sed
- enim ad minim veniam, quis nostrud -> enim ad minim do veniam, quis nostrud
- enim ad minim veniam -> enim ad minim Lorem veniam
- veniam, quis nostrud -> veniam ipsum, quis nostrud
- sit amet, consectetur adipiscing elit, sed do -> sit amet, consectetur elit, sed do
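
For orientation, a docx file is a zip archive in which tracked insertions and deletions are stored as `w:ins` and `w:del` elements in `word/document.xml`, and comments are stored in `word/comments.xml`. The sketch below reads them with the standard library only. It is an illustration of the file format, not the tool's actual implementation, and the function names are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml and word/comments.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def _joined_texts(root, wrapper, leaf):
    """Join the text of every <leaf> under each <wrapper>, skipping empty results."""
    out = []
    for el in root.iter(W + wrapper):
        text = "".join(t.text or "" for t in el.iter(W + leaf))
        if text:
            out.append(text)
    return out


def extract_tracked_changes(docx_file):
    """Return (insertions, deletions) from a path or binary file-like object."""
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    insertions = _joined_texts(root, "ins", "t")       # inserted runs use <w:t>
    deletions = _joined_texts(root, "del", "delText")  # deleted runs use <w:delText>
    return insertions, deletions


def extract_comments(docx_file):
    """Return comment texts, or an empty list if the file has no comments part."""
    with zipfile.ZipFile(docx_file) as zf:
        if "word/comments.xml" not in zf.namelist():
            return []
        root = ET.fromstring(zf.read("word/comments.xml"))
    return _joined_texts(root, "comment", "t")
```

Unlike the output shown above, this sketch returns only the changed text itself, without the surrounding context words.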

TODO

  • improve the extraction of N context words around review changes and allow N to be passed as a parameter
  • organize extracted reviews by the input Docx headings
  • save the txt output as Docx to enable editing
  • support a drag-and-drop GUI
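
The first TODO item, a configurable number of context words around each change, could look roughly like the sketch below. It assumes the text before and after the change is already available as plain strings. The function name `change_with_context` and the `n_words` parameter are hypothetical, and the output format simply mimics the example above.

```python
def change_with_context(before, deleted, inserted, after, n_words=3):
    """Format one change as '- old -> new' with up to n_words of context per side."""
    # Guard n_words == 0: a slice of [-0:] would return the whole list
    left = before.split()[-n_words:] if n_words > 0 else []
    right = after.split()[:n_words] if n_words > 0 else []
    old = " ".join(left + deleted.split() + right)
    new = " ".join(left + inserted.split() + right)
    return f"- {old} -> {new}"
```

For example, `change_with_context("Ut enim ad minim", "", "Lorem", "veniam, quis nostrud")` keeps three words on each side of the inserted word.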

Known issues

The tool fails to capture changes in Docx files with text organized in tables (e.g., pdf2docx converts columns to tables).
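
For context, WordprocessingML nests table text under `w:tbl`, `w:tr`, `w:tc` and then `w:p`, so a scan over only the direct paragraph children of `w:body` does not see it. Whether this is the cause of the limitation above is not stated here. The sketch only shows the structural difference, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def top_level_paragraphs(body):
    """Paragraphs that are direct children of <w:body>; table cells are skipped."""
    return body.findall(W + "p")


def all_paragraphs(body):
    """Every paragraph in document order, including those nested in tables."""
    return list(body.iter(W + "p"))


def paragraph_text(p):
    """Concatenate the visible text runs of one paragraph."""
    return "".join(t.text or "" for t in p.iter(W + "t"))
```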

ChangeLog

  • v0.4: add main.py, rm --save_xml_p_elems, -nwords
  • v0.3: add --version
  • v0.2: add python module and unittests
  • v0.1: one-script initial version
