
Parse all contents of a docx file with python-docx

Project description

Installation

python3 -m pip install docx-parser

Features:

  • paragraph: a text paragraph, together with its style_id
  • multipart: a paragraph that contains an image or a hyperlink
  • table: table data, including merged_cells
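This page does not say how merged_cells are encoded. Purely as an illustration, the sketch below assumes a hypothetical encoding in which each merged region is a (first_row, first_col, last_row, last_col) tuple of inclusive, zero-based indices, and only the top-left cell of the region carries the text. Under that assumption, expanding merged regions back into a plain grid is a simple fill. fill_merged_cells is a hypothetical helper, not part of docx_parser.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical encoding (not documented by docx_parser):
# (first_row, first_col, last_row, last_col), all inclusive and zero-based.
MergedRange = Tuple[int, int, int, int]


def fill_merged_cells(
    rows: Sequence[Sequence[Optional[str]]],
    merged_cells: Sequence[MergedRange],
) -> List[List[Optional[str]]]:
    """Return a copy of rows in which every cell inside a merged range
    repeats the value of that range's top-left cell."""
    grid = [list(row) for row in rows]  # copy so the caller's data is not mutated
    for r0, c0, r1, c1 in merged_cells:
        value = grid[r0][c0]  # the top-left cell holds the merged text
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                grid[r][c] = value
    return grid


rows = [["Name", "Score", None], ["Alice", "90", "95"]]
print(fill_merged_cells(rows, [(0, 1, 0, 2)]))
```

Repeating the top-left value is one common choice when flattening tables for search or export; if the original layout matters, keep the merged ranges alongside the grid instead of filling it.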

Examples

  • Command line
docx_parser tests/demo.docx
docx_parser tests/demo.docx -A base64 -o out.jl
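The second command writes its results to out.jl. The .jl extension usually denotes JSON Lines (one JSON value per line), but this page does not document the output format, so treat that as an assumption. If it holds, the file can be read back with only the standard library. read_json_lines is a hypothetical helper, not part of docx_parser.

```python
import json
from typing import Any, Iterator


def read_json_lines(path: str) -> Iterator[Any]:
    """Yield one decoded JSON value per non-blank line of path.

    Assumes the file is JSON Lines; the docx_parser page does not
    confirm this format.
    """
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline
                yield json.loads(line)
```

For example, iterating over read_json_lines('out.jl') and printing each record shows what the CLI wrote. Because it is a generator, large output files are decoded one line at a time rather than loaded whole.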
  • Python
from docx_parser import DocumentParser

infile = 'tests/demo.docx'
doc = DocumentParser(infile)
# parse() yields one (type, item) pair per parsed element
for _type, item in doc.parse():
    print(_type, item)

ToDo

  • parse text style: color, bgcolor, font, bold, italic ...
  • parse paragraph format


Download files

Source Distribution

docx_parser-1.0.0.tar.gz (5.1 kB)

Uploaded Source

Built Distribution

docx_parser-1.0.0-py3-none-any.whl (5.7 kB)

Uploaded Python 3

File details

Details for the file docx_parser-1.0.0.tar.gz.

File metadata

  • Download URL: docx_parser-1.0.0.tar.gz
  • Size: 5.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.8.3

File hashes

Hashes for docx_parser-1.0.0.tar.gz
  • SHA256: ecac84b6d46020adeb4488358c04d1763a632737af0d8f6d296392e9e58b7aa6
  • MD5: cbe1ee97d36544723909b9b7e05604f0
  • BLAKE2b-256: 1c3c27d7a71787bd05126de06bfcf337695e8b577b9538f4460ce9c7803da092
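To check a downloaded file against the SHA256 digest listed above, hash it locally and compare the hex strings. A minimal sketch using Python's standard hashlib module; sha256_of and matches are illustrative names, not part of docx_parser or pip.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks
    so large files are never loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA-256 with a published digest, ignoring case
    and surrounding whitespace in the published value."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, matches('docx_parser-1.0.0.tar.gz', 'ecac84b6d46020adeb4488358c04d1763a632737af0d8f6d296392e9e58b7aa6') should return True for an intact download; the wheel is checked the same way with its own digest.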

File details

Details for the file docx_parser-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: docx_parser-1.0.0-py3-none-any.whl
  • Size: 5.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.8.3

File hashes

Hashes for docx_parser-1.0.0-py3-none-any.whl
  • SHA256: ef896abadbbea308c981403ebdb595c4c15b7c83747bac8e85c96cfdf8014b55
  • MD5: 09fbdbd0f2c55749f930e815b3543e2f
  • BLAKE2b-256: a4fcbfc7fb07d19f249a505c4aa09fa0d2a0103bac50f8fe62229a1202daae6b
