
llama-index readers azstorage_blob integration

Project description

Azure Storage Blob Loader

pip install llama-index-readers-azstorage-blob

This loader parses a single file stored as an Azure Storage blob or, if no particular blob is specified, every file in the container (with an optional prefix or attribute filter). When initializing AzStorageBlobReader, you can authenticate by passing an account URL that includes a SAS token, or by passing credentials.
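The selection rule described above (one named blob, otherwise every blob in the container that passes an optional prefix filter) can be sketched in plain Python. This is a hypothetical helper written for illustration, not the reader's actual code; `select_blob_names` and its parameters are assumptions.

```python
from typing import Iterable, List, Optional


def select_blob_names(
    all_names: Iterable[str],
    blob: Optional[str] = None,
    prefix: str = "",
) -> List[str]:
    """Hypothetical helper: decide which blobs a load would cover.

    If a single blob name is given, only that blob is selected (when it
    exists); otherwise every blob whose name starts with `prefix` is selected.
    """
    names = list(all_names)
    if blob is not None:
        # A specific blob was requested: select just that one, if present
        return [blob] if blob in names else []
    # No blob given: fall back to the whole container, filtered by prefix
    return [name for name in names if name.startswith(prefix)]
```

For example, given the blob names dictionary.txt and reports/q1.pdf, passing prefix="reports/" selects only reports/q1.pdf, while passing blob="dictionary.txt" selects only that file.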

All files are temporarily downloaded to a local directory and then parsed with SimpleDirectoryReader. You can therefore specify a custom file_extractor that uses any of the loaders in this library (or your own). The sample below shows how to build a file_extractor that maps a file extension to a reader.

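# PDFReader ships in the llama-index-readers-file package
from llama_index.readers.file import PDFReader

# file_extractor maps a file extension to a reader.
# Make sure each value is an instance of a reader class, not the class itself.
file_extractor = {".pdf": PDFReader()}

# Then pass it to the loader: AzStorageBlobReader(..., file_extractor=file_extractor)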
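Conceptually, file_extractor is a dictionary from a file extension to a reader object whose load_data method is called on each matching file. The sketch below mimics that dispatch with a hypothetical LineCountReader and a hypothetical parse_with_extractors function; it is not SimpleDirectoryReader's real code and only illustrates why the dictionary values must be instances rather than classes.

```python
from pathlib import Path
from typing import Dict, List


class LineCountReader:
    """Hypothetical reader: returns one string describing the file."""

    def load_data(self, file: Path) -> List[str]:
        lines = file.read_text().splitlines()
        return [f"{file.name}: {len(lines)} lines"]


def parse_with_extractors(path: Path, file_extractor: Dict[str, object]) -> List[str]:
    """Hypothetical dispatcher: pick a reader by extension and run it."""
    # Look up the reader by lower-cased file extension, e.g. ".txt"
    reader = file_extractor.get(path.suffix.lower())
    if reader is None:
        # No reader mapped for this extension: treat the file as plain text
        return [path.read_text()]
    # The value is an instance, so load_data can be called on it directly
    return reader.load_data(path)
```

Had the dictionary held the class LineCountReader instead of LineCountReader(), the load_data call would fail because no instance (self) is bound.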

Usage

To use this loader, pass the name of your Azure Storage container as container_name. To parse a single file, also pass its blob name; if the file is nested in a subdirectory, the blob name must include the path, such as subdirectory/input.txt. This loader is a thin wrapper over the Azure Storage Blob client library for Python; see ContainerClient for detailed parameter options.
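Because the loader wraps ContainerClient, the download step that precedes parsing can be approximated with the SDK directly. The sketch below is not the reader's own implementation: download_blobs is a hypothetical helper that assumes an object behaving like azure.storage.blob.ContainerClient. It lists blobs with list_blobs(name_starts_with=...), reads each one with download_blob(...).readall(), and writes it under its original blob name, so a blob such as subdirectory/input.txt lands in a matching subdirectory of a temporary folder.

```python
import os
import tempfile
from typing import Optional


def download_blobs(container_client, name_starts_with: Optional[str] = None) -> str:
    """Hypothetical helper: copy matching blobs into a fresh temp directory.

    `container_client` is expected to behave like
    azure.storage.blob.ContainerClient (list_blobs / download_blob).
    Returns the path of the temporary directory.
    """
    target_dir = tempfile.mkdtemp()
    for props in container_client.list_blobs(name_starts_with=name_starts_with):
        # Keep the original blob name, including any "folder/" segments
        local_path = os.path.join(target_dir, *props.name.split("/"))
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        # download_blob returns a downloader; readall() gives the bytes
        data = container_client.download_blob(props.name).readall()
        with open(local_path, "wb") as fh:
            fh.write(data)
    return target_dir
```

With the real SDK, container_client could come from ContainerClient.from_connection_string(conn_str, container_name="<CONTAINER_NAME>"), and the returned folder could then be handed to SimpleDirectoryReader.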

Using a Storage Account SAS URL

from llama_index.readers.azstorage_blob import AzStorageBlobReader

loader = AzStorageBlobReader(
    container_name="scrabble-dictionary",
    blob="dictionary.txt",
    account_url="<SAS_URL>",
)

documents = loader.load_data()
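A SAS URL is the account URL with the shared access signature appended as a query string. The standard-library sketch below splits such a URL into its base URL and SAS fields, which can help check what belongs in place of <SAS_URL>. split_sas_url is a hypothetical helper, and the token values in any example are made up.

```python
from typing import Dict, Tuple
from urllib.parse import parse_qs, urlsplit, urlunsplit


def split_sas_url(sas_url: str) -> Tuple[str, Dict[str, str]]:
    """Hypothetical helper: separate a SAS URL into base URL and SAS fields."""
    parts = urlsplit(sas_url)
    # Rebuild the URL without its query string or fragment
    base_url = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    # parse_qs returns lists and percent-decodes values; keep the first value
    fields = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return base_url, fields
```

For a URL such as https://myaccount.blob.core.windows.net/?sp=rl&sig=..., the base URL is https://myaccount.blob.core.windows.net/ and the SAS fields include the signature under sig.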

Using a Storage Account connection string

The sample below downloads all files in a container, specifying only the storage account's connection string and the container name.

from llama_index.readers.azstorage_blob import AzStorageBlobReader

loader = AzStorageBlobReader(
    container_name="<CONTAINER_NAME>",
    connection_string="<STORAGE_ACCOUNT_CONNECTION_STRING>",
)

documents = loader.load_data()
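An Azure Storage connection string is a semicolon-separated list of Key=Value pairs such as DefaultEndpointsProtocol, AccountName, AccountKey, and EndpointSuffix. The sketch below parses one and derives the blob service URL, which can help confirm you copied the right string. parse_connection_string and blob_endpoint are hypothetical helpers; the azure-storage-blob SDK does its own parsing, and this is only an illustration.

```python
from typing import Dict


def parse_connection_string(conn_str: str) -> Dict[str, str]:
    """Hypothetical helper: turn 'Key=Value;Key=Value' into a dict."""
    fields: Dict[str, str] = {}
    for segment in conn_str.strip().split(";"):
        if not segment:
            continue  # tolerate a trailing ';'
        # Split on the first '=' only: base64 AccountKey values may end in '='
        key, _, value = segment.partition("=")
        fields[key] = value
    return fields


def blob_endpoint(fields: Dict[str, str]) -> str:
    """Hypothetical helper: derive the blob URL, preferring BlobEndpoint."""
    if "BlobEndpoint" in fields:
        return fields["BlobEndpoint"]
    protocol = fields.get("DefaultEndpointsProtocol", "https")
    suffix = fields.get("EndpointSuffix", "core.windows.net")
    return f"{protocol}://{fields['AccountName']}.blob.{suffix}"
```

Splitting on the first = matters because a plain split("=") would cut the padding off the account key.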

Using Azure AD

Ensure the Azure Identity library is installed:

pip install azure-identity

The sample below downloads all files in the container using DefaultAzureCredential. Other credential types are also available, such as ClientSecretCredential for a service principal.

from azure.identity import DefaultAzureCredential
from llama_index.readers.azstorage_blob import AzStorageBlobReader

default_credential = DefaultAzureCredential()

loader = AzStorageBlobReader(
    container_name="scrabble-dictionary",
    account_url="https://<storage account name>.blob.core.windows.net",
    credential=default_credential,
)

documents = loader.load_data()
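To use a service principal instead of DefaultAzureCredential, a minimal sketch follows. It assumes azure-identity is installed and that you have a tenant ID, client ID, and client secret for an app registration granted read access to the container (for example, the Storage Blob Data Reader role). make_service_principal_credential is a hypothetical wrapper around the real ClientSecretCredential class; the import is deferred so the snippet loads even where azure-identity is absent.

```python
def make_service_principal_credential(tenant_id: str, client_id: str, client_secret: str):
    """Hypothetical wrapper: build a service-principal credential.

    The returned object can be passed as `credential=` in place of
    DefaultAzureCredential() in the Azure AD example.
    """
    # Deferred import: azure-identity is only needed when this is called
    from azure.identity import ClientSecretCredential

    return ClientSecretCredential(
        tenant_id=tenant_id, client_id=client_id, client_secret=client_secret
    )
```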

This loader is designed to load data into LlamaIndex.

Updates

[2023-12-14] by JAlexMcGraw (#765)

  • Added support for connecting to Blob Storage with a connection string
  • Changed temporary file names from random names back to the original blob names

File details

Details for the file llama_index_readers_azstorage_blob-0.3.0.tar.gz.


File hashes

Hashes for llama_index_readers_azstorage_blob-0.3.0.tar.gz
Algorithm Hash digest
SHA256 e06140415767ad62693a99d3479eccc5ab76497c19b039a0f19e034f834ad32b
MD5 d8badfbd15952f78cb63635fe1b086d3
BLAKE2b-256 25c9b78c266fb0d52d52f89cb2e5ca061c8b28f4672e6daab350c350d2f076be


File details

Details for the file llama_index_readers_azstorage_blob-0.3.0-py3-none-any.whl.


File hashes

Hashes for llama_index_readers_azstorage_blob-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 80279a0df92b47290525f282b8d122ce30b65388ed739991e7a10d14236ee69e
MD5 ff903a4a03dfb2f7886b7b822e8d59d3
BLAKE2b-256 cf4f12d43162ff27744865469f3b4a2b5dc6e182a92884e5e77c5eec44ebb0ad

