An integration package connecting Unstructured and LangChain

langchain-unstructured

This package contains the LangChain integration with Unstructured.

Installation

pip install -U langchain-unstructured

Configure credentials by setting the following environment variable:

export UNSTRUCTURED_API_KEY="your-api-key"

Loaders

Partition and load files either through the Unstructured API, using the unstructured-client SDK, or locally, using the unstructured library.

API: To partition via the Unstructured API, run pip install unstructured-client, set partition_via_api=True, and define api_key. If you are running the Unstructured API locally, you can change the API URL by defining url when you initialize the loader. The hosted Unstructured API requires an API key. See the links under References to learn more about the API offerings and to get an API key.

Local: By default, the file loader uses the Unstructured partition function and automatically detects the file type.
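The two modes above differ only in the arguments passed to the loader, which can be sketched as follows. The helper names loader_kwargs and make_loader and the UNSTRUCTURED_API_URL variable are assumptions made for illustration; only partition_via_api, api_key, and url come from the description above.

```python
import os


def loader_kwargs(via_api, env=None):
    """Build keyword arguments for UnstructuredLoader (hypothetical helper)."""
    env = os.environ if env is None else env
    if not via_api:
        # Local mode: partitioning runs in-process with the unstructured library.
        return {"partition_via_api": False}
    kwargs = {
        "partition_via_api": True,
        "api_key": env.get("UNSTRUCTURED_API_KEY"),
    }
    # Assumed variable name; set it only when targeting a self-hosted API.
    url = env.get("UNSTRUCTURED_API_URL")
    if url:
        kwargs["url"] = url
    return kwargs


def make_loader(file_path, via_api=False, env=None):
    # Imported lazily so loader_kwargs works without the package installed.
    from langchain_unstructured import UnstructuredLoader

    return UnstructuredLoader(file_path=file_path, **loader_kwargs(via_api, env))
```

Calling make_loader("example.pdf") would partition locally, while make_loader("example.pdf", via_api=True) would send the file to the API configured in the environment.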

In addition to document-specific partition parameters, Unstructured has a rich set of "chunking" parameters for post-processing elements into more useful text segments for use cases such as Retrieval Augmented Generation (RAG). You can pass additional Unstructured kwargs to the loader to configure these settings.
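As a rough illustration of passing chunking settings through as kwargs, the sketch below bundles a few of them into a dictionary. The helper chunking_kwargs is hypothetical, and the default values are arbitrary; max_characters and overlap are chunking options described in the Unstructured chunking documentation listed under References.

```python
def chunking_kwargs(max_characters=1000, overlap=0):
    """Return chunking settings to forward to the loader (hypothetical helper)."""
    if max_characters <= 0:
        raise ValueError("max_characters must be positive")
    if not 0 <= overlap < max_characters:
        raise ValueError("overlap must be >= 0 and smaller than max_characters")
    return {
        "chunking_strategy": "by_title",   # group elements under their section titles
        "max_characters": max_characters,  # hard cap on the size of each chunk
        "overlap": overlap,                # characters repeated when a long element is split
    }
```

The result can be unpacked into the loader, for example UnstructuredLoader("example.pdf", **chunking_kwargs(max_characters=500)).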

Setup:

    pip install -U langchain-unstructured
    pip install -U unstructured-client
    export UNSTRUCTURED_API_KEY="your-api-key"

Instantiate:

import os

from langchain_unstructured import UnstructuredLoader

loader = UnstructuredLoader(
    file_path=["example.pdf", "fake.pdf"],
    # Read the key exported during setup.
    api_key=os.environ["UNSTRUCTURED_API_KEY"],
    partition_via_api=True,
    chunking_strategy="by_title",
    strategy="fast",
)

Load:

docs = loader.load()

print(docs[0].page_content[:100])
print(docs[0].metadata)
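Once documents are loaded, their metadata can be summarized to see what kinds of elements came back. The sketch below assumes each document's metadata dictionary carries a "category" key; that key name is an assumption about the metadata Unstructured attaches, and the helper count_by_category is hypothetical.

```python
from collections import Counter


def count_by_category(docs):
    """Count documents per metadata category (assumed "category" key)."""
    # Documents without the key are grouped under "unknown".
    return Counter(doc.metadata.get("category", "unknown") for doc in docs)
```

For example, print(count_by_category(docs)) gives a quick overview of the element types returned by loader.load().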

References

https://docs.unstructured.io/api-reference/api-services/sdk
https://docs.unstructured.io/api-reference/api-services/overview
https://docs.unstructured.io/open-source/core-functionality/partitioning
https://docs.unstructured.io/open-source/core-functionality/chunking
