An expansion of the unstructured package that adds support for image extraction.

Unstructured Expanded

The unstructured_expanded library is a wrapper around the open source unstructured library that adds image-extraction capabilities to its API.

Its only purpose is to provide a more complete API for the unstructured library, since the maintainers of the open source project have chosen to lock image extraction for office documents behind a paywall.

Quick-Start

This library is meant to be used in conjunction with the unstructured library.

Each version of this library matches the version of the unstructured library it is based on.

# Install the variant of unstructured with the document types you need
# (quote the requirement so shells such as zsh do not expand the brackets)
pip install "unstructured[all-docs]"

# Install the unstructured_expanded library on top of it
pip install unstructured_expanded
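
Since versions of this library track the unstructured release they are based on, it can be useful to confirm after installing that the two packages line up. The following is a minimal sketch, not part of either library. It assumes that "equivalent" means the numeric release segment is identical and that a suffix such as .post4 (as in 0.16.4.post4) only marks a re-release of the wrapper. The helper names base_release, versions_aligned and check_installed are hypothetical.

```python
import re
from importlib import metadata
from typing import Optional


def base_release(version: str) -> str:
    """Return the leading numeric release segment of a version string.

    Assumption: suffixes such as '.post4' are not part of the upstream
    version and can be ignored for the comparison.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    return match.group(0)


def versions_aligned(wrapper_version: str, upstream_version: str) -> bool:
    """True when both versions share the same numeric release segment."""
    return base_release(wrapper_version) == base_release(upstream_version)


def check_installed(
    wrapper: str = "unstructured_expanded",
    upstream: str = "unstructured",
) -> Optional[bool]:
    """Compare the installed distributions; None if either one is missing."""
    try:
        return versions_aligned(metadata.version(wrapper), metadata.version(upstream))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    result = check_installed()
    if result is None:
        print("One of the packages is not installed.")
    elif result:
        print("Versions are aligned.")
    else:
        print("Version mismatch: install matching releases of both packages.")
```

If the check reports a mismatch, pinning both packages to the same release number in the pip commands above is the likely fix.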

License

See the licensing information in the LICENSE file.

Citation

If you use this library in your research, please include a citation:

@misc{unstructured_expanded,
  title={Unstructured_expanded: A Python Library for Extracting Text and Images from Documents using the unstructured API.},
  author={Kogan, Isaac},
  year={2024},
  url={https://github.com/isaackogan/unstructured_expanded}
}
