llama-index readers azstorage_blob integration
Azure Storage Blob Loader
pip install llama-index-readers-azstorage-blob
This loader parses a file stored as an Azure Storage blob, or the entire container (with an optional prefix / attribute filter) if no particular file is specified. When initializing AzStorageBlobReader, you may pass in your account URL with a SAS token, or credentials, to authenticate.
All files are temporarily downloaded locally and then parsed with SimpleDirectoryReader. Hence, you may also specify a custom file_extractor, relying on any of the loaders in this library (or your own)! If you want to supply your own file extractor and need a starting point for finding the file extractor object, follow this sample.
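import llama_index

# Start from the default mapping of file extensions to readers
file_extractor = llama_index.readers.file.base.DEFAULT_FILE_READER_CLS
# Make sure to use an instantiation of a class, not the class itself.
# SimplePDFReader stands in for your own reader; import or define it first.
file_extractor.update({".pdf": SimplePDFReader()})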
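The download-then-parse flow described above can be pictured with a small standard-library sketch. It is not the loader's actual implementation: the in-memory `fake_container` dict stands in for a real container, and `stage_blobs` is a hypothetical helper name. It only illustrates blobs being filtered by an optional prefix and written into a temporary directory under their original names, which is the kind of directory a directory-based reader would then parse.

```python
import tempfile
from pathlib import Path


def stage_blobs(container: dict[str, bytes], target_dir: str, prefix: str = "") -> list[Path]:
    """Write every blob whose name starts with `prefix` into `target_dir`.

    `container` is a hypothetical stand-in that maps blob names to contents.
    """
    staged = []
    for name, data in sorted(container.items()):
        if not name.startswith(prefix):
            continue
        # Blob names may contain "/" to express virtual subdirectories.
        local_path = Path(target_dir).joinpath(*name.split("/"))
        local_path.parent.mkdir(parents=True, exist_ok=True)
        local_path.write_bytes(data)
        staged.append(local_path)
    return staged


# Made-up blobs used purely for illustration.
fake_container = {
    "dictionary.txt": b"aardvark\n",
    "subdirectory/input.txt": b"hello\n",
    "images/logo.png": b"\x89PNG",
}

with tempfile.TemporaryDirectory() as tmp:
    staged = stage_blobs(fake_container, tmp, prefix="subdirectory/")
    staged_names = [p.relative_to(tmp).as_posix() for p in staged]
    staged_text = [p.read_text() for p in staged]
```

Because the temporary directory is removed when the `with` block exits, anything that needs the files has to run inside it.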
Usage
To use this loader, you need to pass in the name of your Azure Storage Container. After that, if you want to parse just a single file, pass in its blob name. Note that if the file is nested in a subdirectory, the blob name should contain the path, such as subdirectory/input.txt. This loader is a thin wrapper over the Azure Blob Storage Client for Python; see ContainerClient for detailed parameter usage options.
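Blob names use forward slashes for nested paths regardless of the local operating system. As a minimal sketch, the hypothetical helper `to_blob_name` below (not part of this package) turns a relative local path into a blob name of that shape, accepting either separator style on input.

```python
from pathlib import PurePosixPath, PureWindowsPath


def to_blob_name(relative_path: str) -> str:
    """Convert a relative file path into a "/"-separated blob name."""
    # PureWindowsPath splits on both "\\" and "/", so either style works.
    parts = PureWindowsPath(relative_path).parts
    return str(PurePosixPath(*parts))


nested = to_blob_name("subdirectory\\input.txt")
flat = to_blob_name("dictionary.txt")
```

The resulting string is what you would pass as the blob name for a nested file.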
Using a Storage Account SAS URL
from llama_index.readers.azstorage_blob import AzStorageBlobReader

loader = AzStorageBlobReader(
    container_name="scrabble-dictionary",
    blob="dictionary.txt",
    account_url="<SAS_URL>",
)
documents = loader.load_data()
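If you hold the SAS token separately from the account endpoint, the two can be joined into a single URL before passing it as account_url. The sketch below assumes the usual `https://<storage account name>.blob.core.windows.net` endpoint pattern; `build_sas_url` is a hypothetical helper, and the account name and token are placeholders rather than working values.

```python
from urllib.parse import parse_qs, urlsplit


def build_sas_url(account_name: str, sas_token: str) -> str:
    """Join an account endpoint and a SAS token into one URL."""
    # Tokens are often copied with a leading "?"; strip it to avoid "??".
    token = sas_token.lstrip("?")
    if not token:
        raise ValueError("SAS token must not be empty")
    return f"https://{account_name}.blob.core.windows.net/?{token}"


# Placeholder values only; a real token is generated for your own account.
sas_url = build_sas_url("examplestorage", "?sv=2022-11-02&sp=rl&sig=PLACEHOLDER")

# Parse the query string back out to confirm the token survived intact.
query = parse_qs(urlsplit(sas_url).query)
```

Treat the resulting URL as a secret, since anyone holding it has the access the token grants.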
Using a Storage Account with connection string
The sample below downloads all files in a container, specifying only the storage account's connection string and the container name.
from llama_index.readers.azstorage_blob import AzStorageBlobReader

loader = AzStorageBlobReader(
    container_name="<CONTAINER_NAME>",
    connection_string="<STORAGE_ACCOUNT_CONNECTION_STRING>",
)
documents = loader.load_data()
Using Azure AD
Ensure the Azure Identity library is available: pip install azure-identity
The sample below downloads all files in the container using the default credential. Alternative credential options are available, such as a service principal with ClientSecretCredential.
from azure.identity import DefaultAzureCredential
from llama_index.readers.azstorage_blob import AzStorageBlobReader

default_credential = DefaultAzureCredential()

loader = AzStorageBlobReader(
    container_name="scrabble-dictionary",
    account_url="https://<storage account name>.blob.core.windows.net",
    credential=default_credential,
)
documents = loader.load_data()
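The three samples above differ only in which authentication arguments they pass. As a rough sketch, a hypothetical helper such as `reader_kwargs` could collect those arguments and reject ambiguous combinations before the reader is constructed. The parameter names mirror the samples; the validation rules are an assumption made for illustration, not behavior documented for the loader.

```python
from typing import Any, Optional


def reader_kwargs(
    container_name: str,
    *,
    blob: Optional[str] = None,
    connection_string: Optional[str] = None,
    account_url: Optional[str] = None,
    credential: Optional[Any] = None,
) -> dict[str, Any]:
    """Collect keyword arguments for exactly one authentication style."""
    # Assumed rule: a connection string excludes the URL-based options.
    if connection_string and (account_url or credential is not None):
        raise ValueError("Use either connection_string or account_url, not both")
    if not connection_string and not account_url:
        raise ValueError("Provide connection_string or account_url")
    candidates = {
        "container_name": container_name,
        "blob": blob,
        "connection_string": connection_string,
        "account_url": account_url,
        "credential": credential,
    }
    # Drop unset options so only the chosen style is passed on.
    return {key: value for key, value in candidates.items() if value is not None}


kwargs = reader_kwargs(
    "scrabble-dictionary",
    connection_string="<STORAGE_ACCOUNT_CONNECTION_STRING>",
)
```

The resulting dictionary could then be unpacked into the reader, for example `AzStorageBlobReader(**kwargs)`.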
This loader is designed to be used as a way to load data into LlamaIndex.
Updates
[2023-12-14] by JAlexMcGraw (#765)
- Added functionality to allow users to connect to blob storage with a connection string
- Changed temporary file names from random names back to the original names