Parse all contents of a docx file with python-docx

Installation

python3 -m pip install docx-parser

Features:

  • paragraph: a text paragraph, with its style_id
  • multipart: a paragraph that contains an image or hyperlink
  • table: table data, with merged_cells

Examples

  • CMD
docx_parser --help

# parse image as file
docx_parser tests/demo.docx -D tests/media -o tests/out.file.jl

# parse image as base64 string
docx_parser tests/demo.docx -A base64 -o tests/out.base64.jl
  • Python
from docx_parser import DocumentParser

infile = 'tests/demo.docx'
doc = DocumentParser(infile)
for _type, item in doc.parse():
    print(_type, item)
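
Since parse() yields (type, item) pairs, a quick way to get an overview of a document is to tally the pairs by type. The following is a minimal sketch, not part of docx-parser: the helper name count_types is hypothetical, it assumes each type is a hashable value such as a string, and the sample items are stand-ins because the structure of real items is not documented here.

```python
from collections import Counter
from typing import Any, Iterable, Tuple


def count_types(pairs: Iterable[Tuple[str, Any]]) -> Counter:
    """Tally how many items of each type appear in a stream of (type, item) pairs."""
    counts = Counter()
    for item_type, _item in pairs:
        counts[item_type] += 1
    return counts


# Stand-in data for illustration only (hypothetical item shapes);
# real pairs would come from DocumentParser(infile).parse().
sample = [
    ("paragraph", {"text": "Hello"}),
    ("table", {"data": [["a", "b"]]}),
    ("paragraph", {"text": "World"}),
]
print(count_types(sample))
```

With the package installed, the same helper should accept the parser output directly, for example count_types(DocumentParser('tests/demo.docx').parse()).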

ToDo

  • parse text style: color, bgcolor, font, bold, italic ...
  • parse paragraph format
