Parse all contents of a docx file with python-docx
Installation
python3 -m pip install docx-parser
Features:
- paragraph: text paragraph, with its style_id
- multipart: paragraph containing an image or hyperlink
- table: table data, with merged_cells
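For readers curious about what these three item types correspond to inside a .docx file, here is a minimal sketch that reads word/document.xml directly with only the standard library (zipfile and xml.etree.ElementTree). It is not how docx_parser works (the package builds on python-docx), and the helper names (iter_body, classify_paragraph, parse_table) and the returned dict layouts are hypothetical. It assumes a paragraph counts as "multipart" when it contains a w:drawing (image) or w:hyperlink element, and it reports merged cells through the w:gridSpan (horizontal) and w:vMerge (vertical) cell properties.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def _val(el, name="val"):
    """Read a w:-prefixed attribute (w:val by default), or None if el is missing."""
    return None if el is None else el.get(f"{{{W}}}{name}")


def _text(el):
    """Concatenate every w:t text node below el (including inside hyperlinks)."""
    return "".join(t.text or "" for t in el.iter(f"{{{W}}}t"))


def classify_paragraph(p):
    """Return ("paragraph" | "multipart", {style_id, text}) for one w:p element."""
    style_id = _val(p.find("w:pPr/w:pStyle", NS))
    # Assumption: an image (w:drawing) or a hyperlink (w:hyperlink) makes it "multipart"
    is_multipart = (p.find(".//w:drawing", NS) is not None
                    or p.find(".//w:hyperlink", NS) is not None)
    kind = "multipart" if is_multipart else "paragraph"
    return kind, {"style_id": style_id, "text": _text(p)}


def parse_table(tbl):
    """Return ("table", rows); each cell records its text and merge markers."""
    rows = []
    for tr in tbl.findall("w:tr", NS):
        row = []
        for tc in tr.findall("w:tc", NS):
            span = _val(tc.find("w:tcPr/w:gridSpan", NS))
            vmerge = tc.find("w:tcPr/w:vMerge", NS)
            row.append({
                "text": _text(tc),
                # w:gridSpan: how many grid columns the cell covers (horizontal merge)
                "colspan": int(span) if span else 1,
                # w:vMerge: "restart" opens a vertical merge; no w:val means "continue"
                "vmerge": None if vmerge is None else (_val(vmerge) or "continue"),
            })
        rows.append(row)
    return "table", rows


def iter_body(source):
    """Yield (kind, item) for each top-level paragraph or table, in body order."""
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    for child in root.find("w:body", NS):
        if child.tag == f"{{{W}}}p":
            yield classify_paragraph(child)
        elif child.tag == f"{{{W}}}tbl":
            yield parse_table(child)
        # other children (e.g. w:sectPr) are skipped


# Usage: for kind, item in iter_body("tests/demo.docx"): print(kind, item)
```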
Examples
- CMD
# show all command-line options
docx_parser --help
# parse images as files
docx_parser tests/demo.docx -D tests/media -o tests/out.file.jl
# parse images as base64 strings
docx_parser tests/demo.docx -A base64 -o tests/out.base64.jl
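The output files use a .jl extension, which conventionally means JSON Lines (one JSON value per line). Assuming that convention holds here (the record layout is not documented in this README), a minimal sketch for reading the output back and turning a base64 image string into raw bytes could look like this. iter_jsonl and decode_base64_image are hypothetical helper names, not part of docx_parser.

```python
import base64
import json


def iter_jsonl(lines):
    """Yield one decoded JSON value per non-blank line (JSON Lines convention).

    Assumption: the .jl output follows JSON Lines; this is inferred from the
    file extension, not confirmed by the docx_parser documentation.
    """
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def decode_base64_image(value):
    """Return raw image bytes from a base64 string.

    A "data:...;base64," prefix is stripped defensively in case the string is
    a data URI; whether docx_parser emits one is not stated in the README.
    """
    if value.startswith("data:") and "," in value:
        value = value.split(",", 1)[1]
    return base64.b64decode(value)


# Usage (print the records first, since their structure is not documented here):
# with open("tests/out.base64.jl", encoding="utf-8") as fh:
#     for record in iter_jsonl(fh):
#         print(record)
```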
- Python
from docx_parser import DocumentParser

infile = 'tests/demo.docx'
doc = DocumentParser(infile)
# parse() yields one (type, item) pair per parsed element
for _type, item in doc.parse():
    print(_type, item)
ToDo
- parse text style: color, bgcolor, font, bold, italic ...
- parse paragraph format
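The text-style items above map onto run properties (w:rPr) in word/document.xml. As a rough, standard-library sketch of where that information lives (not part of docx_parser; run_style is a hypothetical name), the function below reads bold, italic, color, shading fill (as "bgcolor") and the ASCII font set directly on one run. Formatting inherited from styles and the separate w:highlight property are deliberately ignored.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
# ST_OnOff values that switch a toggle property such as w:b off
OFF = {"0", "false", "off"}


def _val(el, name="val"):
    """Read a w:-prefixed attribute, or None when the element is absent."""
    return None if el is None else el.get(f"{{{W}}}{name}")


def _toggle(rpr, tag):
    """True only if w:<tag> is present on this run and not switched off.

    Inherited style formatting is not resolved.
    """
    el = rpr.find(f"w:{tag}", NS)
    if el is None:
        return False
    # <w:b/> with no w:val means "on"
    return (_val(el) or "true").lower() not in OFF


def run_style(r):
    """Collect the direct formatting of one w:r element (hypothetical sketch)."""
    rpr = r.find("w:rPr", NS)
    if rpr is None:
        rpr = ET.Element(f"{{{W}}}rPr")  # no direct formatting on this run
    return {
        "text": "".join(t.text or "" for t in r.findall("w:t", NS)),
        "bold": _toggle(rpr, "b"),
        "italic": _toggle(rpr, "i"),
        "color": _val(rpr.find("w:color", NS)),
        "bgcolor": _val(rpr.find("w:shd", NS), "fill"),
        "font": _val(rpr.find("w:rFonts", NS), "ascii"),
    }
```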