A pure Python utility to extract text, hyperlinks and images from docx files.

# docxpy


![](https://travis-ci.org/badbye/docxpy.svg?branch=master)
![PyPI](https://img.shields.io/pypi/pyversions/scrapy-corenlp.svg?style=flat-square)


This project is forked from [ankushshah89/python-docx2txt](https://github.com/ankushshah89/python-docx2txt/pull/10/files).
It adds one new feature: extracting hyperlinks and their corresponding text.

It is a pure Python utility for extracting text from docx files. The code is taken and adapted from [python-docx](https://github.com/python-openxml/python-docx). In addition to body text, it can extract **text** from headers, footers and **hyperlinks**, and it can also extract **images**.

## How to install? ##
```bash
pip install docxpy
```

## How to run? ##

a. From the command line:
```bash
# extract text
docx2txt file.docx
# extract text and images
docx2txt -i /tmp/img_dir file.docx
```


b. From Python:
```python
import docxpy

file = 'file.docx'

# extract text
text = docxpy.process(file)

# extract text and write images in /tmp/img_dir
text = docxpy.process(file, "/tmp/img_dir")


# if you want the hyperlinks
doc = docxpy.DOCReader(file)
doc.process() # process file
hyperlinks = doc.data['links']
```
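
The shape of `doc.data['links']` is not documented above. To illustrate what hyperlink extraction from a docx file involves, here is a minimal sketch that uses only the standard library. It is not docxpy's implementation: `extract_links` is a hypothetical helper, and it assumes the archive contains both `word/document.xml` and `word/_rels/document.xml.rels`. It only looks at the main document body, not headers or footers.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by WordprocessingML and package relationships.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def extract_links(docx_file):
    """Return (text, url) pairs for hyperlinks in the main document body.

    Hypothetical helper for illustration; not part of docxpy.
    `docx_file` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as archive:
        document = ET.fromstring(archive.read("word/document.xml"))
        rels = ET.fromstring(archive.read("word/_rels/document.xml.rels"))

    # Map relationship ids (for example "rId1") to their targets.
    targets = {
        rel.get("Id"): rel.get("Target")
        for rel in rels.iter(f"{{{REL_NS}}}Relationship")
    }

    links = []
    for hyperlink in document.iter(f"{{{W_NS}}}hyperlink"):
        rel_id = hyperlink.get(f"{{{R_NS}}}id")
        if rel_id not in targets:
            continue  # internal anchors carry no relationship id
        # Link text may be split across several runs; join all text nodes.
        text = "".join(t.text or "" for t in hyperlink.iter(f"{{{W_NS}}}t"))
        links.append((text, targets[rel_id]))
    return links
```

Calling `extract_links('file.docx')` returns a list of `(text, url)` tuples. A docx file is a zip archive, so the link text lives in the document XML while the URL lives in the relationships part, which is why the two have to be joined on the relationship id.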
