An extension of the unstructured package that adds support for image extraction.
Unstructured Expanded
The unstructured_expanded library is a wrapper around the open source unstructured library that adds image-extraction capabilities to its API.
Its only purpose is to provide a more complete API for the unstructured library, since the maintainers of the open source project have chosen to lock image extraction for office documents behind a paywall.
Quick-Start
This library is meant to be used alongside the unstructured library, not as a replacement for it.
Each release of this library carries the same version number as the unstructured release it is based on.
# Install the variant of unstructured with everything you need support for
pip install "unstructured[all-docs]"
# Install the unstructured_expanded library on top of it
pip install unstructured_expanded
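Because releases of this library track the unstructured release they are based on, it can be useful to confirm that the two installed packages line up. The following is a minimal sketch using only the standard library. The helper names (`base_release`, `versions_aligned`, `check_installed`) are hypothetical and not part of either library, and ignoring suffixes such as `.post1` when comparing is an assumption about how the version numbers correspond.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def base_release(version_string: str) -> str:
    """Return the leading numeric release segment of a version string.

    Suffixes such as '.post1' are dropped (assumption: they do not change
    which unstructured release a wrapper release corresponds to).
    """
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError(f"unrecognised version string: {version_string!r}")
    return match.group(0)


def versions_aligned(wrapper_version: str, base_version: str) -> bool:
    """True when both version strings share the same numeric release segment."""
    return base_release(wrapper_version) == base_release(base_version)


def check_installed() -> str:
    """Compare the installed unstructured_expanded and unstructured versions."""
    try:
        wrapper = version("unstructured_expanded")
        base = version("unstructured")
    except PackageNotFoundError as exc:
        return f"package not installed: {exc}"
    if versions_aligned(wrapper, base):
        return f"aligned: unstructured_expanded {wrapper}, unstructured {base}"
    return f"mismatch: unstructured_expanded {wrapper}, unstructured {base}"


if __name__ == "__main__":
    print(check_installed())
```

Running the script after both installs prints a one-line summary. If it reports a mismatch, pinning both packages to the same release number is a reasonable first thing to try.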
License
See the licensing information in the LICENSE file.
Citation
If you use this library in your research, please include a citation:
@misc{unstructured_expanded,
title={Unstructured_expanded: A Python Library for Extracting Text and Images from Documents using the unstructured API.},
author={Kogan, Isaac},
year={2024},
url={https://github.com/isaackogan/unstructured_expanded}
}