Get a parsed Microsoft Word document as a hierarchical tree structure.
mswordtree
Parse an entire Word document into a hierarchical tree structure. Each heading becomes a node, and its children are the subheadings, paragraphs, tables, and other elements beneath it.
Install the library using the following command:
pip install mswordtree
Use the following code to parse your Word document into a tree structure:
from mswordtree import GetWordDocTree
root = GetWordDocTree('test.docx')
Now you can iterate over the objects of the document with the following code:
for item in root.Items:
    print('Type: {} -> Content {}\n'.format(item.Type, item.Content))
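The loop above visits the items directly under the root. To print nested elements with indentation, a recursive walk is one option. The following is a minimal sketch, not the library's own code: `Node` is a hypothetical stand-in class, `walk` is an illustrative helper, and the sketch assumes that child elements expose the same `Type`, `Content`, and `Items` attributes as the root.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for an mswordtree element (not the library's class)."""
    Type: str
    Content: str
    Items: List["Node"] = field(default_factory=list)


def walk(node: Node, depth: int = 0) -> Iterator[Tuple[int, Node]]:
    """Yield (depth, node) pairs in document order, parents before children."""
    yield depth, node
    for child in node.Items:
        yield from walk(child, depth + 1)


# Small hand-built tree used only for illustration.
root = Node("Heading", "Introduction", [
    Node("Paragraph", "Some text."),
    Node("Heading", "Background", [Node("Table", "2 x 2 table")]),
])

# Indent each element by two spaces per nesting level.
lines = ["{}{}: {}".format("  " * depth, n.Type, n.Content) for depth, n in walk(root)]
print("\n".join(lines))
```

If the real elements behave as assumed, the same `walk` function should work on the object returned by `GetWordDocTree`.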
To produce JSON, use the following code:
from mswordtree import ToString
ToString([root])
Common Methods
Find(guid)
Use the root element to find any element in its tree structure by matching its GUID.
item = root.Find('3b34509b-533e-40cc-b0dc-c44df5bcba51')
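To illustrate what a lookup of this kind does, here is a minimal depth-first search sketch. It is not the library's implementation: `Element` is a hypothetical stand-in class, the attribute name `Guid` is an assumption, and returning `None` when nothing matches is a choice made for this sketch rather than documented behavior of `Find`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Element:
    """Hypothetical stand-in; the attribute name `Guid` is an assumption."""
    Guid: str
    Content: str
    Items: List["Element"] = field(default_factory=list)


def find_by_guid(node: Element, guid: str) -> Optional[Element]:
    """Depth-first search: return the first element whose Guid matches, else None."""
    if node.Guid == guid:
        return node
    for child in node.Items:
        match = find_by_guid(child, guid)
        if match is not None:
            return match
    return None


# Hand-built tree with short placeholder identifiers instead of real GUIDs.
tree = Element("a1", "Title", [
    Element("b2", "Section 1", [Element("c3", "Paragraph text")]),
    Element("d4", "Section 2"),
])

found = find_by_guid(tree, "c3")    # nested two levels down
missing = find_by_guid(tree, "zz")  # no such identifier in the tree
```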
ToString_AllHeadings(root)
Returns a string containing all heading elements in a tree structure, which can be used as a JSON string.
from mswordtree import ToString_AllHeadings
import json
data = ToString_AllHeadings(root)
json.dumps(data)
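If the returned value is already a JSON-formatted string, `json.loads` turns it into Python objects, while `json.dumps` on a string quotes it a second time. The sketch below shows the difference using only the standard `json` module. The sample string is hypothetical; the actual output format of `ToString_AllHeadings` is an assumption here.

```python
import json

# Hypothetical sample; the real output of ToString_AllHeadings may look different.
data = '[{"Type": "Heading", "Content": "Introduction", "Items": []}]'

parsed = json.loads(data)              # JSON text -> list of dicts
requoted = json.dumps(data)            # str -> JSON string literal with escaped quotes
pretty = json.dumps(parsed, indent=2)  # Python objects -> indented JSON text

print(parsed[0]["Content"])
print(pretty)
```

Which call fits depends on what you need next: `json.loads` for working with the headings as Python objects, `json.dumps` on the parsed result for re-serializing with different formatting.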