Skip to main content

parse all contents of a docx file with python-docx

Project description

PyPI GitHub last commit

Parse all contents of a docx file with python-docx

Installation

python3 -m pip install docx-parser

Features:

  • paragraph: text paragraph, with style_id
  • multipart: paragraph with image or hyperlink
  • table: table data with merged_cells

Examples

  • CMD
docx_parser --help

# parse image as file
docx_parser tests/demo.docx -D tests/media -o tests/out.file.jl

# parse image as base64 string
docx_parser tests/demo.docx -A base64 -o tests/out.base64.jl
  • Python
from docx_parser import DocumentParser

infile = 'tests/demo.docx'
doc = DocumentParser(infile)
for _type, item in doc.parse():
    print(_type, item)

ToDo

  • parse text style: color, bgcolor, font, bold, italic ...
  • parse paragraph format

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

docx_parser-1.0.2.tar.gz (5.3 kB view hashes)

Uploaded Source

Built Distribution

docx_parser-1.0.2-py3-none-any.whl (5.9 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page