An expansion of the unstructured package that adds support for image extraction.
Unstructured Expanded
The unstructured_expanded library is a wrapper around the open-source unstructured library that adds image-extraction capabilities to its API. Its sole purpose is to provide a more complete API for unstructured, because the maintainers of the open-source project have chosen to lock image extraction for office documents behind a paywall.
Quick-Start
This library is meant to be used together with the unstructured library. Each version of unstructured_expanded corresponds to the version of unstructured it is based on.
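```shell
# Install unstructured with the extras for the document types you need ("all-docs" covers all of them).
# Quote the argument so shells such as zsh do not treat the brackets as a glob pattern.
pip install "unstructured[all-docs]"
# Install unstructured_expanded on top of it
pip install unstructured_expanded
```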
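Because each release is tied to a specific unstructured version, it can help to confirm after installation that the two installed packages line up. The sketch below assumes that the release number of unstructured_expanded, with any `.postN` suffix dropped (as in `0.16.4.post3`), should equal the installed unstructured version; that mapping is inferred from the version scheme and is not documented behavior. The helpers `base_release`, `versions_compatible`, and `check_installed` are hypothetical names, not part of either library; only `importlib.metadata.version` is a real standard-library call.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def base_release(ver: str) -> str:
    """Return the leading numeric release segment, e.g. '0.16.4' from '0.16.4.post3'."""
    match = re.match(r"\d+(?:\.\d+)*", ver)
    if match is None:
        raise ValueError(f"unrecognized version string: {ver!r}")
    return match.group(0)


def versions_compatible(expanded_ver: str, unstructured_ver: str) -> bool:
    # ASSUMPTION (hypothetical helper): unstructured_expanded X.Y.Z.postN is built on
    # unstructured X.Y.Z, so the numeric release segments should be identical.
    return base_release(expanded_ver) == base_release(unstructured_ver)


def check_installed() -> bool:
    """Print the installed versions of both packages and report whether they line up."""
    try:
        expanded = version("unstructured_expanded")
        base = version("unstructured")
    except PackageNotFoundError as exc:
        # The exception message names the distribution that could not be found.
        print(f"Not installed: {exc}")
        return False
    ok = versions_compatible(expanded, base)
    status = "match" if ok else "MISMATCH"
    print(f"unstructured_expanded {expanded} vs unstructured {base}: {status}")
    return ok


if __name__ == "__main__":
    check_installed()
```

Run the script in the same environment you installed into; a mismatch suggests one of the two packages was upgraded or pinned independently of the other.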
License
See the licensing information in the LICENSE file.
Citation
If you use this library in your research, please include a citation:
```bibtex
@misc{unstructured_expanded,
title={Unstructured_expanded: A Python Library for Extracting Text and Images from Documents using the unstructured API.},
author={Kogan, Isaac},
year={2024},
url={https://github.com/isaackogan/unstructured_expanded}
}
```
Download files
Hashes for unstructured_expanded-0.16.4.post3.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 1e414089592e4e20efd160941584acf3cc786d28e644a0af9f2920337ec83ce0
MD5 | 25072a1286c78ae308a61488bc12aa33
BLAKE2b-256 | f5c655932bc5cf99da7aa447757b2d8dd81dae07693b043980b9ae6476004f2b

Hashes for unstructured_expanded-0.16.4.post3-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 11bba6b3a158ef0996a314e4d5e2c4aa1568a99a3316a37c0cd564c8391f6b7b
MD5 | 6c03cadafecf3cbfe8042981bf4050b8
BLAKE2b-256 | 7b339f2d69936dd3495172fd508e1d7ee2c78e0d40e9c335f06174a2ef2a96d3
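To check a downloaded distribution file against the SHA256 digests listed above, the standard-library `hashlib` module is enough. This is a minimal sketch: `sha256_of` and `verify` are illustrative helper names, the digest constants are copied from the hash tables, and the files are assumed to sit in the current directory.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the hash tables for the 0.16.4.post3 release files.
EXPECTED_SHA256 = {
    "unstructured_expanded-0.16.4.post3.tar.gz":
        "1e414089592e4e20efd160941584acf3cc786d28e644a0af9f2920337ec83ce0",
    "unstructured_expanded-0.16.4.post3-py3-none-any.whl":
        "11bba6b3a158ef0996a314e4d5e2c4aa1568a99a3316a37c0cd564c8391f6b7b",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in fixed-size chunks so large files are never fully loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Return True if the file's SHA256 digest equals the expected hex string (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()


if __name__ == "__main__":
    # Check whichever of the listed release files are present in the current directory.
    for name, expected in EXPECTED_SHA256.items():
        path = Path(name)
        if path.exists():
            print(f"{name}: {'OK' if verify(path, expected) else 'HASH MISMATCH'}")
        else:
            print(f"{name}: not found in current directory")
```

pip can also enforce hashes itself through its hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file), although that mode requires every dependency to be pinned with a hash as well. Of the listed digests, SHA256 is the one to rely on, since MD5 is no longer collision-resistant.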