llama-index readers azstorage_blob integration

Azure Storage Blob Loader

pip install llama-index-readers-azstorage-blob

This loader parses a single file stored as an Azure Storage blob or, if no particular blob is specified, every file in the container (with an optional prefix / attribute filter). When initializing AzStorageBlobReader, you can authenticate with an account URL that includes a SAS token, with a connection string, or with an account URL plus credentials.
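The selection rule described above can be sketched in plain Python: either one named blob, or every blob in the container, optionally narrowed by a name prefix. This only mimics the rule locally so it runs without an Azure account; in the Azure SDK the prefix filtering is done server-side by ContainerClient.list_blobs(name_starts_with=...). select_blobs is a hypothetical helper, not part of this package, and the attribute filter is not modeled.

```python
from typing import Iterable, List, Optional


def select_blobs(
    blob_names: Iterable[str],
    blob: Optional[str] = None,
    name_starts_with: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: pick the blobs a load would cover.

    - If a single blob name is given, only that blob is selected.
    - Otherwise every blob is selected, optionally narrowed to names that
      start with name_starts_with (a plain string prefix).
    """
    names = list(blob_names)
    if blob is not None:
        return [name for name in names if name == blob]
    if name_starts_with:
        return [name for name in names if name.startswith(name_starts_with)]
    return names


container = ["dictionary.txt", "reports/2023.pdf", "reports/2024.pdf", "notes.md"]
print(select_blobs(container, blob="dictionary.txt"))        # ['dictionary.txt']
print(select_blobs(container, name_starts_with="reports/"))  # ['reports/2023.pdf', 'reports/2024.pdf']
print(select_blobs(container))                               # all four names
```

Because blob names are flat strings, a "subdirectory" prefix such as reports/ is just an ordinary string prefix here.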

All files are temporarily downloaded locally and then parsed with SimpleDirectoryReader. This means you can also pass a custom file_extractor that uses any of the loaders in this library (or your own). If you want to start from the default file extractor mapping and override the reader for one extension, follow this sample:

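from llama_index.readers.file.base import DEFAULT_FILE_READER_CLS

# Start from a copy of the default extension -> reader mapping so the
# module-level dict is not modified (this module path may vary by llama_index version)
file_extractor = dict(DEFAULT_FILE_READER_CLS)

# Make sure to use an instance of a reader class, not the class itself.
# SimplePDFReader is a placeholder for whichever PDF reader you use; import it first.
file_extractor.update({".pdf": SimplePDFReader()})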
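SimpleDirectoryReader chooses a parser for each downloaded file by its extension, which is why file_extractor is keyed by suffixes such as ".pdf". The stdlib-only sketch below illustrates that kind of extension lookup, with a plain-text fallback for unmapped extensions. Parser, TextParser, PdfStub, and pick_parser are hypothetical names, not LlamaIndex classes, and the exact lookup rules inside SimpleDirectoryReader (for example, case handling) may differ.

```python
from pathlib import Path
from typing import Dict, Protocol


class Parser(Protocol):
    """Anything with a parse(path) method counts as a parser here."""

    def parse(self, path: Path) -> str: ...


class TextParser:
    """Hypothetical fallback: read the file as UTF-8 text."""

    def parse(self, path: Path) -> str:
        return path.read_text(encoding="utf-8")


class PdfStub:
    """Hypothetical placeholder standing in for a real PDF reader instance."""

    def parse(self, path: Path) -> str:
        return f"<parsed PDF {path.name}>"


def pick_parser(path: Path, file_extractor: Dict[str, Parser], fallback: Parser) -> Parser:
    # Look up the lower-cased suffix (".PDF" -> ".pdf"); use the fallback if unmapped.
    return file_extractor.get(path.suffix.lower(), fallback)


# Values are instances, mirroring the "use an instance" advice above.
file_extractor: Dict[str, Parser] = {".pdf": PdfStub()}
fallback = TextParser()

print(type(pick_parser(Path("report.PDF"), file_extractor, fallback)).__name__)  # PdfStub
print(pick_parser(Path("notes.txt"), file_extractor, fallback) is fallback)      # True
```

Storing instances rather than classes means the lookup result can be used directly, without constructing a reader per file.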

Usage

To use this loader, pass in the name of your Azure Storage container as container_name. To parse just a single file, also pass in its blob name as blob. If the file is nested in a subdirectory, the blob name must include the path, for example subdirectory/input.txt. This loader is a thin wrapper over the Azure Blob Storage client library for Python; see ContainerClient for detailed parameter options.
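Because blob names are flat strings that may contain "/" (such as subdirectory/input.txt), a download step has to turn them into local paths. The sketch below shows one way to do that with the standard library while keeping the original file names, in the spirit of the 2023-12-14 update. download_to_temp is a hypothetical helper that takes already-fetched bytes instead of calling Azure; it is not how this package is implemented.

```python
import tempfile
from pathlib import Path, PurePosixPath
from typing import Dict


def download_to_temp(blobs: Dict[str, bytes], temp_dir: str) -> Dict[str, Path]:
    """Hypothetical helper: write blob contents under temp_dir, keeping blob names.

    blobs maps a blob name such as "subdirectory/input.txt" to its bytes.
    """
    root = Path(temp_dir).resolve()
    written: Dict[str, Path] = {}
    for blob_name, data in blobs.items():
        # Blob names use "/" as the separator regardless of the local OS.
        local_path = root.joinpath(*PurePosixPath(blob_name).parts).resolve()
        # Refuse names like "../x" or "/etc/x" that would escape the temp directory.
        if root not in local_path.parents:
            raise ValueError(f"unsafe blob name: {blob_name!r}")
        local_path.parent.mkdir(parents=True, exist_ok=True)
        local_path.write_bytes(data)
        written[blob_name] = local_path
    return written


with tempfile.TemporaryDirectory() as tmp:
    paths = download_to_temp({"subdirectory/input.txt": b"hello"}, tmp)
    print(paths["subdirectory/input.txt"].read_text())  # hello
```

The resulting directory could then be handed to a directory-based parser; the temporary directory is deleted when the with block exits.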

Using a Storage Account SAS URL

from llama_index.readers.azstorage_blob import AzStorageBlobReader

loader = AzStorageBlobReader(
    container_name="scrabble-dictionary",
    blob="dictionary.txt",
    account_url="<SAS_URL>",
)

documents = loader.load_data()
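A SAS URL bundles the account endpoint and the SAS token (its query string) in one value. If you need them separately, or want to sanity-check what you pass as account_url, the standard library can split it. split_sas_url is a hypothetical helper, and the token in the sample URL is fake.

```python
from typing import Tuple
from urllib.parse import parse_qs, urlsplit, urlunsplit


def split_sas_url(sas_url: str) -> Tuple[str, str]:
    """Hypothetical helper: split a SAS URL into (base URL, SAS token)."""
    parts = urlsplit(sas_url)
    if not parts.query:
        raise ValueError("URL has no query string, so it carries no SAS token")
    # A SAS token always carries a signature in its 'sig' parameter.
    if "sig" not in parse_qs(parts.query):
        raise ValueError("query string has no 'sig' parameter; not a SAS token")
    base_url = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return base_url, parts.query


# Fake values for illustration only.
url = "https://myaccount.blob.core.windows.net/?sv=2022-11-02&ss=b&sp=rl&sig=FAKE"
base, token = split_sas_url(url)
print(base)   # https://myaccount.blob.core.windows.net/
print(token)  # sv=2022-11-02&ss=b&sp=rl&sig=FAKE
```

The sample above passes the whole SAS URL as account_url, so splitting it is optional; it mainly helps confirm the token is actually present.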

Using a Storage Account with connection string

The sample below downloads all files in a container, specifying only the storage account's connection string and the container name.

from llama_index.readers.azstorage_blob import AzStorageBlobReader

loader = AzStorageBlobReader(
    container_name="<CONTAINER_NAME>",
    connection_string="<STORAGE_ACCOUNT_CONNECTION_STRING>",
)

documents = loader.load_data()
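A storage account connection string is a semicolon-separated list of Key=Value pairs such as DefaultEndpointsProtocol, AccountName, AccountKey, and EndpointSuffix. The sketch below parses one and derives the blob endpoint, which can help confirm which account a connection string points at before passing it to the loader. parse_connection_string and blob_endpoint are hypothetical helpers, and the account key shown is fake.

```python
from typing import Dict


def parse_connection_string(conn_str: str) -> Dict[str, str]:
    """Hypothetical helper: turn 'Key=Value;Key=Value' into a dict."""
    settings: Dict[str, str] = {}
    for segment in conn_str.split(";"):
        if not segment.strip():
            continue  # tolerate a trailing ';'
        # Split on the first '=' only: base64 account keys often end in '='.
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"malformed segment: {segment!r}")
        settings[key.strip()] = value.strip()
    return settings


def blob_endpoint(settings: Dict[str, str]) -> str:
    """Prefer an explicit BlobEndpoint; otherwise build it from the account name."""
    if "BlobEndpoint" in settings:
        return settings["BlobEndpoint"]
    protocol = settings.get("DefaultEndpointsProtocol", "https")
    suffix = settings.get("EndpointSuffix", "core.windows.net")
    return f"{protocol}://{settings['AccountName']}.blob.{suffix}"


conn = (
    "DefaultEndpointsProtocol=https;AccountName=myaccount;"
    "AccountKey=ZmFrZWtleQ==;EndpointSuffix=core.windows.net"
)
settings = parse_connection_string(conn)
print(settings["AccountKey"])   # ZmFrZWtleQ==
print(blob_endpoint(settings))  # https://myaccount.blob.core.windows.net
```

Splitting on the first "=" only is the important detail: a naive split("=") would truncate base64 keys that end in padding.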

Using Azure AD

Ensure the Azure Identity library is installed:

pip install azure-identity

The sample below downloads all files in the container using the default credential. Alternative credential options are available, such as a service principal via ClientSecretCredential.

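from azure.identity import DefaultAzureCredential
from llama_index.readers.azstorage_blob import AzStorageBlobReader

# DefaultAzureCredential tries several sources in turn (environment variables,
# managed identity, Azure CLI login, and others)
default_credential = DefaultAzureCredential()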

loader = AzStorageBlobReader(
    container_name="scrabble-dictionary",
    account_url="https://<storage account name>.blob.core.windows.net",
    credential=default_credential,
)

documents = loader.load_data()
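To use a service principal instead of the default credential, azure.identity.ClientSecretCredential takes a tenant ID, a client ID, and a client secret. The sketch below reads those from environment variables named AZURE_TENANT_ID, AZURE_CLIENT_ID, and AZURE_CLIENT_SECRET (the same names azure-identity's EnvironmentCredential uses); both helper functions are hypothetical. The resulting credential would be passed as credential= in the same way as default_credential above.

```python
import os
from typing import Dict, Mapping, Optional

# Maps ClientSecretCredential argument names to environment variable names.
_SP_VARS = {
    "tenant_id": "AZURE_TENANT_ID",
    "client_id": "AZURE_CLIENT_ID",
    "client_secret": "AZURE_CLIENT_SECRET",
}


def service_principal_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: collect ClientSecretCredential arguments from the environment."""
    env = os.environ if env is None else env
    missing = [var for var in _SP_VARS.values() if not env.get(var)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {arg: env[var] for arg, var in _SP_VARS.items()}


def make_service_principal_credential(env: Optional[Mapping[str, str]] = None):
    """Hypothetical helper: build a ClientSecretCredential (needs azure-identity)."""
    from azure.identity import ClientSecretCredential  # imported lazily

    return ClientSecretCredential(**service_principal_settings(env))


fake_env = {"AZURE_TENANT_ID": "t", "AZURE_CLIENT_ID": "c", "AZURE_CLIENT_SECRET": "s"}
print(service_principal_settings(fake_env))
# {'tenant_id': 't', 'client_id': 'c', 'client_secret': 's'}
```

Failing early with the list of missing variables is usually clearer than the authentication error you would otherwise get at load time.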

This loader is designed to load data into LlamaIndex.

Updates

[2023-12-14] by JAlexMcGraw (#765)

  • Added support for connecting to blob storage with a connection string
  • Changed temporary file names from random names back to the original blob names


Download files

Source Distribution

Built Distribution

File details

Details for the file llama_index_readers_azstorage_blob-0.2.0.tar.gz.

File hashes

Hashes for llama_index_readers_azstorage_blob-0.2.0.tar.gz
Algorithm Hash digest
SHA256 250e4f343d94f828d181739f9267f10ce7318787ddaced768f5bf9475bc81559
MD5 b1ca75488dc0402223897d88ece504fc
BLAKE2b-256 535410099f7689279f670f6c862930ac8dc7c4e7fee3111f148cb4b7784821ab

File details

Details for the file llama_index_readers_azstorage_blob-0.2.0-py3-none-any.whl.

File hashes

Hashes for llama_index_readers_azstorage_blob-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 37ecbe3aeb63a48b341d2d250f56adb8665aaeef7ed7c642774b560741b9957c
MD5 64c8417bcff62711e34cfe595eb0eac2
BLAKE2b-256 8b093d791f6af456fec76cc4825aa6c9c0be621a63142ddb03dc282dc2139399
