An integration package connecting Unstructured and LangChain

Reason this release was yanked:

yank unannounced package, replace with changes by unstructured

Project description

langchain-unstructured

This package contains the LangChain integration with Unstructured.

Installation

pip install -U langchain-unstructured

Configure credentials by setting the following environment variable:

export UNSTRUCTURED_API_KEY="your-api-key"
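
In Python, the key can then be read from the environment instead of being hard-coded. The sketch below shows one way to do that and to fail early when the variable is missing. The helper name `get_unstructured_api_key` is hypothetical and not part of this package.

```python
import os


def get_unstructured_api_key(env=None):
    """Return the API key from UNSTRUCTURED_API_KEY, or raise if it is unset.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("UNSTRUCTURED_API_KEY", "").strip()
    if not key:
        raise RuntimeError("UNSTRUCTURED_API_KEY is not set")
    return key
```

The returned value can be passed as `api_key` when the loader is created.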

Loaders

Partition and load files either remotely, using the unstructured-client SDK and the Unstructured API, or locally, using the unstructured library.

API: To partition via the Unstructured API, run pip install unstructured-client, set partition_via_api=True, and define api_key. If you are running the Unstructured API locally, you can change the API URL by defining url when you initialize the loader. The hosted Unstructured API requires an API key. See the links below to learn more about our API offerings and get an API key.

Local: By default the file loader uses the Unstructured partition function and will automatically detect the file type.
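
The two modes differ only in the arguments passed to the loader. As a rough sketch, the hypothetical helper below (not part of this package) assembles keyword arguments for either mode, using only the parameter names mentioned above: `partition_via_api`, `api_key`, and `url`.

```python
def partition_kwargs(via_api, api_key=None, url=None):
    """Build keyword arguments for UnstructuredLoader (hypothetical helper)."""
    if not via_api:
        # Local mode: the loader partitions with the unstructured library.
        return {"partition_via_api": False}

    kwargs = {"partition_via_api": True}
    if api_key is not None:
        # The hosted API requires a key.
        kwargs["api_key"] = api_key
    if url is not None:
        # Only needed when pointing at a non-default endpoint, such as a local server.
        kwargs["url"] = url
    return kwargs
```

With the package installed, the result can be unpacked into the constructor, for example `UnstructuredLoader(file_path=["example.pdf"], **partition_kwargs(True, api_key=key))`, where `key` holds your API key.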

In addition to document-specific partition parameters, Unstructured has a rich set of "chunking" parameters for post-processing elements into more useful text segments for use cases such as Retrieval Augmented Generation (RAG). You can pass additional Unstructured kwargs to the loader to configure different Unstructured settings.
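
As an illustration, chunking options can be collected in a dictionary and unpacked into the loader call. `chunking_strategy="by_title"` is the value used in the example in this README. `max_characters`, `new_after_n_chars`, and `overlap` are parameter names taken from the Unstructured chunking documentation. The numeric values are arbitrary assumptions, so check them against the chunking reference for your version.

```python
# Chunking options forwarded to Unstructured as extra keyword arguments.
# The numeric values are illustrative, not recommendations.
chunking_kwargs = {
    "chunking_strategy": "by_title",  # keep section boundaries when forming chunks
    "max_characters": 1000,           # hard upper bound on chunk size
    "new_after_n_chars": 800,         # soft limit: prefer to close a chunk after this
    "overlap": 50,                    # characters of overlap when oversized text is split
}

# A soft limit above the hard limit would have no effect, so check it up front.
assert chunking_kwargs["new_after_n_chars"] <= chunking_kwargs["max_characters"]
```

With the package installed, these would be passed as `UnstructuredLoader(file_path=["example.pdf"], **chunking_kwargs)`.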

Setup:

    pip install -U langchain-unstructured
    pip install -U unstructured-client
    export UNSTRUCTURED_API_KEY="your-api-key"

Instantiate:

import os

from langchain_unstructured import UnstructuredLoader

loader = UnstructuredLoader(
    file_path=["example.pdf", "fake.pdf"],
    api_key=os.getenv("UNSTRUCTURED_API_KEY"),  # key exported during Setup
    partition_via_api=True,
    chunking_strategy="by_title",
    strategy="fast",
)

Load:

docs = loader.load()

print(docs[0].page_content[:100])
print(docs[0].metadata)
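
Each loaded item exposes `page_content` and `metadata`, as used above. For a quick overview of what was loaded, a small helper can tally documents by a metadata field. This is only a sketch: the helper name is hypothetical, and the `category` key is an assumption about what Unstructured places in the metadata, so inspect `docs[0].metadata` to see which keys you actually have.

```python
from collections import Counter
from types import SimpleNamespace


def count_by_metadata(docs, key="category"):
    """Count documents per value of a metadata key; missing keys count as 'unknown'."""
    return Counter(doc.metadata.get(key, "unknown") for doc in docs)


# Stand-in objects so the sketch runs without the loader.
# Real documents would come from loader.load().
sample_docs = [
    SimpleNamespace(page_content="Intro", metadata={"category": "Title"}),
    SimpleNamespace(page_content="Some text.", metadata={"category": "NarrativeText"}),
    SimpleNamespace(page_content="More text.", metadata={}),
]

print(count_by_metadata(sample_docs))
```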

References

https://docs.unstructured.io/api-reference/api-services/sdk
https://docs.unstructured.io/api-reference/api-services/overview
https://docs.unstructured.io/open-source/core-functionality/partitioning
https://docs.unstructured.io/open-source/core-functionality/chunking

Download files

Source Distribution

langchain_unstructured-0.1.0.tar.gz (5.9 kB)

Uploaded Source

Built Distribution

langchain_unstructured-0.1.0-py3-none-any.whl (6.7 kB)

Uploaded Python 3

File details

Details for the file langchain_unstructured-0.1.0.tar.gz.

File hashes

Hashes for langchain_unstructured-0.1.0.tar.gz
Algorithm Hash digest
SHA256 03f6f33024d034229c96e1c086baa82d0140970b2541d5d076fde2d9edada522
MD5 52707156f540bdfbf42cb595dc3274d1
BLAKE2b-256 5127ff7c936ec9f8df358ab634d98416c108c8f81ef9dee15be3ddeff7b624c4

File details

Details for the file langchain_unstructured-0.1.0-py3-none-any.whl.

File hashes

Hashes for langchain_unstructured-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 6e7cb0ce3fc2e94d0092326e5008a66a9aeaf8545f494a252a4906307c27fb4e
MD5 11524f6767c1f9790e5b8c164f8153f6
BLAKE2b-256 3f6e805f374d7b4b6be291e370b240fca715e5e3413d76b4e9f545c7534d3d39
