
llama-index readers azstorage_blob integration

Project description

Azure Storage Blob Loader

pip install llama-index-readers-azstorage-blob

This loader parses a single file stored as an Azure Storage blob or, if no particular blob is specified, the entire container (with an optional prefix / attribute filter). When initializing AzStorageBlobReader, you can authenticate with an account URL that includes a SAS token, with a storage account connection string, or with a credential object.
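The selection rule described above can be illustrated with a minimal pure-Python sketch. It is not the reader's implementation: `select_blobs` is a hypothetical helper, and a plain list of names stands in for the container listing.

```python
from typing import List, Optional


def select_blobs(
    all_blob_names: List[str],
    blob: Optional[str] = None,
    prefix: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: decide which blobs would be loaded.

    - If a specific blob name is given, only that blob is selected.
    - Otherwise every blob in the container is selected, optionally
      restricted to names that start with `prefix`.
    """
    if blob is not None:
        # In this sketch a missing blob yields an empty list; a real
        # download of a missing blob would fail instead.
        return [blob] if blob in all_blob_names else []
    if prefix:
        return [name for name in all_blob_names if name.startswith(prefix)]
    return list(all_blob_names)


container = ["dictionary.txt", "reports/2023.pdf", "reports/2024.pdf"]
print(select_blobs(container, blob="dictionary.txt"))  # ['dictionary.txt']
print(select_blobs(container, prefix="reports/"))      # ['reports/2023.pdf', 'reports/2024.pdf']
print(select_blobs(container))                         # all three names
```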

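All files are temporarily downloaded locally and then parsed with SimpleDirectoryReader. Hence, you may also pass a custom file_extractor that maps file extensions to reader instances, using any of the loaders in this library (or your own). Extensions you do not list still use SimpleDirectoryReader's default readers. For example, to parse PDFs with PDFReader from the llama-index-readers-file package (pip install llama-index-readers-file):

from llama_index.readers.azstorage_blob import AzStorageBlobReader
from llama_index.readers.file import PDFReader

# Values must be reader instances, not classes
file_extractor = {".pdf": PDFReader()}

loader = AzStorageBlobReader(
    container_name="<CONTAINER_NAME>",
    connection_string="<STORAGE_ACCOUNT_CONNECTION_STRING>",
    file_extractor=file_extractor,
)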
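The download-then-parse flow can be pictured with the minimal sketch below. It only mimics the pattern with the standard library: the in-memory `blobs` dict stands in for a container, and `load_via_tempdir`, `read_text`, and `read_upper` are hypothetical names, not LlamaIndex APIs. Recreating the virtual folders on disk is a choice of this sketch, not necessarily what the reader does. What it shows is that a file extractor is keyed by file extension, its values are ready-to-call reader objects, and a default is used for any extension not listed.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

# A "reader" in this sketch is any callable that turns a local file path
# into a list of text chunks (a hypothetical stand-in for a LlamaIndex reader).
Reader = Callable[[Path], List[str]]


def read_text(path: Path) -> List[str]:
    """Default reader: return the whole file as one UTF-8 text chunk."""
    return [path.read_text(encoding="utf-8")]


def read_upper(path: Path) -> List[str]:
    """Toy custom reader, used only to show that an override is picked up."""
    return [path.read_text(encoding="utf-8").upper()]


def load_via_tempdir(
    blobs: Dict[str, bytes],
    file_extractor: Dict[str, Reader],
    default_reader: Reader = read_text,
) -> Dict[str, List[str]]:
    """Write each blob to a temporary directory, then parse it with the reader
    registered for its file extension, falling back to `default_reader`."""
    results: Dict[str, List[str]] = {}
    with tempfile.TemporaryDirectory() as temp_dir:
        for name, data in blobs.items():
            # Recreate the blob's virtual folders ("notes/a.txt") on disk.
            local_path = Path(temp_dir, *name.split("/"))
            local_path.parent.mkdir(parents=True, exist_ok=True)
            local_path.write_bytes(data)
            reader = file_extractor.get(local_path.suffix.lower(), default_reader)
            results[name] = reader(local_path)
    # Leaving the `with` block deletes the temporary directory and its files.
    return results


docs = load_via_tempdir(
    {"notes/a.txt": b"hello", "b.md": b"world"},
    file_extractor={".md": read_upper},
)
print(docs)  # {'notes/a.txt': ['hello'], 'b.md': ['WORLD']}
```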

Usage

To use this loader, pass the name of your Azure Storage container as container_name. To parse a single file, also pass its blob name as blob. If the file is nested in a subdirectory, the blob name must include the path, for example subdirectory/input.txt. This loader is a thin wrapper over the Azure Blob Storage client library for Python; see ContainerClient for detailed parameter options.
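Because the blob name carries the path, a file you think of as `subdirectory\input.txt` on Windows is still addressed with forward slashes. A small standard-library sketch makes the conversion explicit; `to_blob_name` is a hypothetical helper, not part of the reader.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath


def to_blob_name(relative_path: PurePath) -> str:
    """Hypothetical helper: turn a relative local path into a blob name.

    In a standard (flat-namespace) storage account, a "subdirectory" is only
    part of the blob name, and the separator is always a forward slash.
    """
    return relative_path.as_posix()


print(to_blob_name(PureWindowsPath(r"subdirectory\input.txt")))  # subdirectory/input.txt
print(to_blob_name(PurePosixPath("subdirectory/input.txt")))     # subdirectory/input.txt
```

The resulting string is what you would pass as the blob argument.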

Using a Storage Account SAS URL

from llama_index.readers.azstorage_blob import AzStorageBlobReader

loader = AzStorageBlobReader(
    container_name="scrabble-dictionary",
    blob="dictionary.txt",
    account_url="<SAS_URL>",
)

documents = loader.load_data()
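A SAS URL is the account (or container) endpoint with the SAS token appended as the query string. The sketch below uses only `urllib.parse` to pull the two apart and read the `sp` (signed permissions) field; `split_sas_url` and `sas_permissions` are hypothetical helpers and the URL is a made-up placeholder. It can serve as a quick sanity check before handing the URL to the loader: loading a whole container needs the list (`l`) permission in addition to read (`r`).

```python
from typing import Set, Tuple
from urllib.parse import parse_qs, urlsplit


def split_sas_url(sas_url: str) -> Tuple[str, str]:
    """Hypothetical helper: split a SAS URL into (endpoint, SAS token)."""
    parts = urlsplit(sas_url)
    endpoint = f"{parts.scheme}://{parts.netloc}{parts.path}"
    return endpoint, parts.query


def sas_permissions(sas_url: str) -> Set[str]:
    """Return the letters of the `sp` (signed permissions) query field."""
    query = parse_qs(urlsplit(sas_url).query)
    return set(query.get("sp", [""])[0])


# Placeholder values only; a real token is generated, e.g., in the Azure portal.
url = (
    "https://myaccount.blob.core.windows.net/"
    "?sv=2022-11-02&ss=b&srt=co&sp=rl&se=2030-01-01T00:00:00Z&sig=REDACTED"
)
endpoint, token = split_sas_url(url)
print(endpoint)                      # https://myaccount.blob.core.windows.net/
print(sorted(sas_permissions(url)))  # ['l', 'r']
```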

Using a Storage Account connection string

The sample below downloads all files in a container, specifying only the storage account's connection string and the container name.

from llama_index.readers.azstorage_blob import AzStorageBlobReader

loader = AzStorageBlobReader(
    container_name="<CONTAINER_NAME>",
    connection_string="<STORAGE_ACCOUNT_CONNECTION_STRING>",
)

documents = loader.load_data()
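A storage account connection string is a semicolon-separated list of `Key=Value` settings, and it usually contains the account key, so treat it as a secret. The sketch below parses one in plain Python to show what the loader receives; `parse_connection_string` and `blob_endpoint` are hypothetical helpers, the endpoint fallback assumes the Azure public cloud, and the Azure SDK parses connection strings itself, so this is for illustration only.

```python
from typing import Dict


def parse_connection_string(conn_str: str) -> Dict[str, str]:
    """Parse 'Key=Value;Key=Value' pairs from a storage connection string."""
    settings: Dict[str, str] = {}
    for segment in conn_str.strip().strip(";").split(";"):
        if not segment:
            continue
        # Split on the first '=' only: base64 account keys often end in '='.
        key, _, value = segment.partition("=")
        settings[key] = value
    return settings


def blob_endpoint(settings: Dict[str, str]) -> str:
    """Use an explicit BlobEndpoint if present, else derive it from the account name."""
    if "BlobEndpoint" in settings:
        return settings["BlobEndpoint"]
    protocol = settings.get("DefaultEndpointsProtocol", "https")
    suffix = settings.get("EndpointSuffix", "core.windows.net")
    return f"{protocol}://{settings['AccountName']}.blob.{suffix}"


# Placeholder connection string with a fake (base64) account key.
conn = (
    "DefaultEndpointsProtocol=https;AccountName=myaccount;"
    "AccountKey=ZmFrZWtleQ==;EndpointSuffix=core.windows.net"
)
settings = parse_connection_string(conn)
print(settings["AccountKey"])   # ZmFrZWtleQ==
print(blob_endpoint(settings))  # https://myaccount.blob.core.windows.net
```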

Using Azure AD

Ensure the Azure Identity library is installed: pip install azure-identity

The sample below downloads all files in the container using DefaultAzureCredential. Other credential types are also available, such as ClientSecretCredential for a service principal.

from azure.identity import DefaultAzureCredential
from llama_index.readers.azstorage_blob import AzStorageBlobReader

default_credential = DefaultAzureCredential()

loader = AzStorageBlobReader(
    container_name="scrabble-dictionary",
    account_url="https://<storage account name>.blob.core.windows.net",
    credential=default_credential,
)

documents = loader.load_data()
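The three authentication options differ only in which keyword arguments reach AzStorageBlobReader. The sketch below collects that choice in one place; `reader_kwargs` and `account_url_for` are hypothetical helpers, the endpoint format assumes the Azure public cloud (`*.blob.core.windows.net`), and the account-name check reflects Azure's rule of 3 to 24 lowercase letters and digits. A plain `object()` stands in for `DefaultAzureCredential()` so the sketch runs without Azure packages installed.

```python
import re
from typing import Any, Dict, Optional

# Azure storage account names: 3-24 characters, lowercase letters and digits only.
_ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")


def account_url_for(account_name: str) -> str:
    """Build the public-cloud Blob endpoint for a storage account name."""
    if not _ACCOUNT_NAME.fullmatch(account_name):
        raise ValueError(f"invalid storage account name: {account_name!r}")
    return f"https://{account_name}.blob.core.windows.net"


def reader_kwargs(
    container_name: str,
    *,
    sas_url: Optional[str] = None,
    connection_string: Optional[str] = None,
    account_name: Optional[str] = None,
    credential: Any = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build keyword arguments for exactly one auth method."""
    chosen = [sas_url is not None, connection_string is not None, account_name is not None]
    if sum(chosen) != 1:
        raise ValueError("choose exactly one of sas_url, connection_string, account_name")
    kwargs: Dict[str, Any] = {"container_name": container_name}
    if sas_url is not None:
        kwargs["account_url"] = sas_url
    elif connection_string is not None:
        kwargs["connection_string"] = connection_string
    else:
        if credential is None:
            raise ValueError("an Azure AD credential is required with account_name")
        kwargs["account_url"] = account_url_for(account_name)
        kwargs["credential"] = credential
    return kwargs


fake_credential = object()  # stand-in for DefaultAzureCredential()
kwargs = reader_kwargs(
    "scrabble-dictionary", account_name="mystorageacct", credential=fake_credential
)
print(kwargs["account_url"])  # https://mystorageacct.blob.core.windows.net
```

In real code the result would be passed on as AzStorageBlobReader(**kwargs).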

This loader is designed to load data into LlamaIndex.

Updates

[2023-12-14] by JAlexMcGraw (#765)

  • Added the ability to connect to Blob Storage with a connection string
  • Changed temporary file names from random names back to the original file names


Download files


Source Distribution

llama_index_readers_azstorage_blob-0.5.0.tar.gz (7.4 kB)

Uploaded Source

Built Distribution

llama_index_readers_azstorage_blob-0.5.0-py3-none-any.whl (7.2 kB)

File details

Details for the file llama_index_readers_azstorage_blob-0.5.0.tar.gz.

File metadata

  • Download URL: llama_index_readers_azstorage_blob-0.5.0.tar.gz
  • Upload date:
  • Size: 7.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.9

File hashes

Hashes for llama_index_readers_azstorage_blob-0.5.0.tar.gz
Algorithm Hash digest
SHA256 c4e13aa2ed745cd7ef4fe8805689e6e4b2a1bc3896473fc57bcc1f3eacf2cc42
MD5 47fed3dbabad60ebee2774d3b62e72e2
BLAKE2b-256 714017aaec14273155693e2ba06ef2123e7eccb8b18ebfc506a57a46da5c44df


File details

Details for the file llama_index_readers_azstorage_blob-0.5.0-py3-none-any.whl.

File metadata

  • Download URL: llama_index_readers_azstorage_blob-0.5.0-py3-none-any.whl
  • Upload date:
  • Size: 7.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.9

File hashes

Hashes for llama_index_readers_azstorage_blob-0.5.0-py3-none-any.whl
Algorithm Hash digest
SHA256 b3b7518795228cb7c19d2f0386eeff2d2f9d0f3f71c0ed5e5c5def34b8c910f2
MD5 9b4c17d83941a0af0f483ea126470f66
BLAKE2b-256 599e223775ccd8eb7ed35fea25b384de446596ec75e11e6ab5a52dac6d8467b2


Supported by
