A pure Python utility to extract text, hyperlinks and images from docx files.
# docxpy

This project is forked from [ankushshah89/python-docx2txt](https://github.com/ankushshah89/python-docx2txt/pull/10/files).
It adds one new feature: extracting hyperlinks together with their corresponding text.
It is a pure Python utility to extract text from docx files. The code is taken and adapted from [python-docx](https://github.com/python-openxml/python-docx). It can, however, also extract **text** from headers, footers and **hyperlinks**. It can now also extract **images**.
## How to install? ##
```bash
pip install docxpy
```
## How to run? ##
a. From the command line:
```bash
# extract text
docx2txt file.docx
# extract text and images
docx2txt -i /tmp/img_dir file.docx
```
b. From Python:
```python
import docxpy

file = 'file.docx'

# extract text
text = docxpy.process(file)

# extract text and write images to /tmp/img_dir
text = docxpy.process(file, "/tmp/img_dir")

# extract hyperlinks
doc = docxpy.DOCReader(file)
doc.process()  # process the file
hyperlinks = doc.data['links']
```
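To illustrate what hyperlink extraction involves, here is a minimal standard-library sketch. It is not docxpy's implementation, and the function name `extract_hyperlinks` and its `(text, url)` return shape are assumptions made for this example; they may differ from what `doc.data['links']` holds. It relies only on the docx container layout: a zip archive in which `word/document.xml` holds the body and `word/_rels/document.xml.rels` maps relationship ids to link targets.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used inside a docx package.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL = "http://schemas.openxmlformats.org/package/2006/relationships"
RELS_PATH = "word/_rels/document.xml.rels"


def extract_hyperlinks(docx):
    """Return (text, url) pairs for hyperlinks in the main document body.

    `docx` is a path or a binary file-like object. Hypothetical helper,
    not part of docxpy.
    """
    with zipfile.ZipFile(docx) as archive:
        body_xml = archive.read("word/document.xml")
        # The relationships part is absent when the body references nothing.
        has_rels = RELS_PATH in archive.namelist()
        rels_xml = archive.read(RELS_PATH) if has_rels else None

    # Map relationship ids (e.g. "rId1") to their targets.
    targets = {}
    if rels_xml is not None:
        for rel in ET.fromstring(rels_xml).iter(f"{{{PKG_REL}}}Relationship"):
            targets[rel.get("Id")] = rel.get("Target")

    links = []
    for node in ET.fromstring(body_xml).iter(f"{{{W}}}hyperlink"):
        url = targets.get(node.get(f"{{{R}}}id"))
        if url is None:
            # Links to bookmarks inside the document carry no relationship id.
            continue
        # The visible link text may be split across several runs.
        text = "".join(t.text or "" for t in node.iter(f"{{{W}}}t"))
        links.append((text, url))
    return links
```

Calling `extract_hyperlinks("file.docx")` returns a list of pairs in document order. This sketch covers only the main body; hyperlinks in headers and footers live in separate parts of the archive and are not read here.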

## Source distribution ##
`docxpy-0.8.1.tar.gz` (3.9 kB, source; not uploaded using Trusted Publishing)

| Algorithm | Hash digest |
|---|---|
| SHA256 | `a52f28626e3161c74b73ad67a99b44bc8afb9cd851258e77cbf34833f46d2e8d` |
| MD5 | `6b6384a0e48350642545069be6c4caaa` |
| BLAKE2b-256 | `f3cc74e1d889e6a324187b37daf0f9369d5ec6b59f68a82c38b427f9c31c04da` |