Parse all contents of a docx file with python-docx
Project description
Installation
python3 -m pip install docx-parser
Features:
- paragraph: text paragraph, with style_id
- multipart: paragraph with image or hyperlink
- table: table data with merged_cells
Examples
- CMD
docx_parser tests/demo.docx
docx_parser tests/demo.docx -A base64 -o out.jl
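The `-o out.jl` option above writes the parsed output to a file. A minimal sketch for reading that file back, assuming the `.jl` extension means JSON Lines (one JSON value per line); the format is not documented here, and the helper name `read_json_lines` is hypothetical:

```python
import json
from pathlib import Path


def read_json_lines(path):
    """Yield one decoded JSON value per non-empty line of a JSON Lines file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

Iterating over `read_json_lines('out.jl')` would then give one record at a time. The sketch makes no assumption about the shape of each record, since that depends on what docx_parser emits.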
- Python
from docx_parser import DocumentParser

infile = 'tests/demo.docx'
doc = DocumentParser(infile)
# parse() yields (type, item) pairs
for _type, item in doc.parse():
    print(_type, item)
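Since `doc.parse()` yields `(_type, item)` pairs, the items can be grouped by type before further processing. A minimal sketch, assuming only that pair shape; the helper `group_by_type` and the stand-in data are hypothetical, and the real structure of each `item` is not documented here:

```python
from collections import defaultdict


def group_by_type(pairs):
    """Collect items from (type, item) pairs into a dict keyed by type."""
    groups = defaultdict(list)
    for _type, item in pairs:
        groups[_type].append(item)
    return dict(groups)


# Stand-in pairs for illustration only; with the library installed,
# pass doc.parse() instead of this list.
sample = [
    ("paragraph", "first"),
    ("table", "grid"),
    ("paragraph", "second"),
]
groups = group_by_type(sample)
print({name: len(items) for name, items in groups.items()})
```

Because the helper accepts any iterable of pairs, it should work equally with the generator returned by `doc.parse()` or with records loaded from a saved output file.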
ToDo
- parse text style: color, bgcolor, font, bold, italic ...
- parse paragraph format
Project details
Download files
Source Distribution
docx_parser-1.0.0.tar.gz (5.1 kB)
Built Distribution
docx_parser-1.0.0-py3-none-any.whl (5.7 kB)
File details
Details for the file docx_parser-1.0.0.tar.gz.
File metadata
- Download URL: docx_parser-1.0.0.tar.gz
- Upload date:
- Size: 5.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.8.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | ecac84b6d46020adeb4488358c04d1763a632737af0d8f6d296392e9e58b7aa6
MD5 | cbe1ee97d36544723909b9b7e05604f0
BLAKE2b-256 | 1c3c27d7a71787bd05126de06bfcf337695e8b577b9538f4460ce9c7803da092
File details
Details for the file docx_parser-1.0.0-py3-none-any.whl.
File metadata
- Download URL: docx_parser-1.0.0-py3-none-any.whl
- Upload date:
- Size: 5.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.8.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | ef896abadbbea308c981403ebdb595c4c15b7c83747bac8e85c96cfdf8014b55
MD5 | 09fbdbd0f2c55749f930e815b3543e2f
BLAKE2b-256 | a4fcbfc7fb07d19f249a505c4aa09fa0d2a0103bac50f8fe62229a1202daae6b