llama-index readers azstorage_blob integration
Azure Storage Blob Loader
pip install llama-index-readers-azstorage-blob
This loader parses any file stored as an Azure Storage blob, or the entire container (with an optional prefix / attribute filter) if no particular file is specified. When initializing AzStorageBlobReader, you can authenticate with an account URL that contains a SAS token, with a connection string, or with an account URL plus credentials.
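The selection rule described above can be sketched in plain Python. This is a minimal, hypothetical illustration, not the reader's own code: `select_blobs` is an invented helper, and it assumes the prefix filter behaves like the `name_starts_with` argument of Azure's `ContainerClient.list_blobs`, i.e. a plain string-prefix match on blob names. The attribute filter is not modelled.

```python
from typing import Iterable, List, Optional


def select_blobs(
    blob_names: Iterable[str],
    blob: Optional[str] = None,
    name_starts_with: Optional[str] = None,
) -> List[str]:
    """Return the blob names that would be parsed (hypothetical helper).

    A named blob selects only that blob; otherwise the whole container is
    selected, optionally narrowed to names starting with a prefix.
    """
    names = list(blob_names)
    if blob is not None:
        # A specific blob was requested; an unknown name yields an empty list here.
        return [name for name in names if name == blob]
    if name_starts_with:
        # Plain string-prefix match, modelled on list_blobs(name_starts_with=...).
        return [name for name in names if name.startswith(name_starts_with)]
    # No blob and no prefix: every blob in the container.
    return names


container = ["dictionary.txt", "reports/2023.pdf", "reports/2024.pdf"]
print(select_blobs(container, blob="dictionary.txt"))        # ['dictionary.txt']
print(select_blobs(container, name_starts_with="reports/"))  # ['reports/2023.pdf', 'reports/2024.pdf']
print(select_blobs(container))                               # all three names
```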
All files are temporarily downloaded locally and subsequently parsed with SimpleDirectoryReader. Hence, you may also specify a custom file_extractor, relying on any of the loaders in this library (or your own)! If you'd like to use your own file extractor and need a hint on where to find the default file extractor mapping, follow this sample.
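import llama_index
# Inspect the default suffix-to-reader mapping; its values are reader classes.
print(llama_index.readers.file.base.DEFAULT_FILE_READER_CLS)
# Make sure to use an instantiation of a class, not the class itself.
# SimplePDFReader stands in for your chosen reader and must be imported first.
file_extractor = {".pdf": SimplePDFReader()}
# Pass file_extractor=file_extractor when creating AzStorageBlobReader.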
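A file extractor boils down to a lookup from file suffix to a reader instance. The self-contained sketch below illustrates that idea with hypothetical `PlainTextReader`, `UpperCaseReader` and `parse_file` names; it is a simplified model under that assumption, not `SimpleDirectoryReader`'s implementation. Because the lookup calls `load_data` on whatever it finds, the mapping must hold instances rather than classes.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Protocol


class Reader(Protocol):
    """Anything with a load_data(file) method (simplified stand-in for a reader)."""

    def load_data(self, file: Path) -> List[str]: ...


class PlainTextReader:
    """Hypothetical fallback reader: returns the file's text as one document."""

    def load_data(self, file: Path) -> List[str]:
        return [file.read_text()]


class UpperCaseReader:
    """Hypothetical custom reader, registered for one suffix below."""

    def load_data(self, file: Path) -> List[str]:
        return [file.read_text().upper()]


def parse_file(file: Path, file_extractor: Dict[str, Reader]) -> List[str]:
    # Look up a reader instance by lower-cased suffix; fall back to plain text.
    reader = file_extractor.get(file.suffix.lower(), PlainTextReader())
    return reader.load_data(file)


with tempfile.TemporaryDirectory() as tmp:
    notes = Path(tmp) / "notes.md"
    notes.write_text("hello")
    plain = Path(tmp) / "plain.txt"
    plain.write_text("hello")
    # Register an instance, not the class, for the ".md" suffix.
    extractor = {".md": UpperCaseReader()}
    md_docs = parse_file(notes, extractor)
    txt_docs = parse_file(plain, extractor)

print(md_docs, txt_docs)  # ['HELLO'] ['hello']
```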
Usage
To use this loader, you need to pass in the name of your Azure Storage container as container_name. If you want to parse only a single file, also pass in its blob name. Note that if the file is nested in a subdirectory, the blob name must include the path, such as subdirectory/input.txt. This loader is a thin wrapper over the Azure Blob Storage client library for Python; see ContainerClient for detailed parameter usage options.
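Blob Storage has a flat namespace: a "subdirectory" is just part of the blob name, separated by `/`. If a nested blob is saved locally under its original name, the local path has to recreate those segments. The sketch below uses a hypothetical `local_path_for_blob` helper and assumes a download layout that mirrors blob names under a temporary directory; it is an illustration, not taken from the reader's source.

```python
import tempfile
from pathlib import Path, PurePosixPath


def local_path_for_blob(root: Path, blob_name: str) -> Path:
    """Map a blob name such as 'subdirectory/input.txt' to a file under root.

    Hypothetical helper: blob names always use '/', so they are split as
    POSIX paths and re-joined with the local OS separator.
    """
    blob_path = PurePosixPath(blob_name)
    parts = blob_path.parts
    # Reject empty, absolute, or parent-traversing names so files stay under root.
    if not parts or blob_path.is_absolute() or ".." in parts:
        raise ValueError(f"unsafe blob name: {blob_name!r}")
    target = root.joinpath(*parts)
    # Create the intermediate "virtual directories" so the file can be written.
    target.parent.mkdir(parents=True, exist_ok=True)
    return target


with tempfile.TemporaryDirectory() as tmp:
    path = local_path_for_blob(Path(tmp), "subdirectory/input.txt")
    path.write_text("example")
    relative = path.relative_to(tmp).as_posix()

print(relative)  # subdirectory/input.txt
```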
Using a Storage Account SAS URL
from llama_index.readers.azstorage_blob import AzStorageBlobReader
loader = AzStorageBlobReader(
    container_name="scrabble-dictionary",
    blob="dictionary.txt",
    account_url="<SAS_URL>",
)
documents = loader.load_data()
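An account SAS URL is the account's Blob endpoint with the SAS token appended as the query string, for example `https://<account>.blob.core.windows.net/?sv=...`. If you hold the account name and the token separately, a small helper like the hypothetical `build_sas_url` below can assemble the value passed as `account_url`. It assumes the public-cloud `core.windows.net` endpoint suffix by default; the token shown is a dummy.

```python
def build_sas_url(
    account_name: str,
    sas_token: str,
    endpoint_suffix: str = "core.windows.net",
) -> str:
    """Assemble an account SAS URL (hypothetical helper, public cloud by default)."""
    # Tokens copied from the Azure portal often start with '?'; drop it so it is not doubled.
    token = sas_token.lstrip("?")
    return f"https://{account_name}.blob.{endpoint_suffix}/?{token}"


sas_url = build_sas_url("mystorageaccount", "?sv=2022-11-02&ss=b&sig=dummy")
print(sas_url)  # https://mystorageaccount.blob.core.windows.net/?sv=2022-11-02&ss=b&sig=dummy
```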
Using a Storage Account with connection string
The sample below downloads all files in a container, specifying only the storage account's connection string and the container name.
from llama_index.readers.azstorage_blob import AzStorageBlobReader
loader = AzStorageBlobReader(
    container_name="<CONTAINER_NAME>",
    connection_string="<STORAGE_ACCOUNT_CONNECTION_STRING>",
)
documents = loader.load_data()
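A storage account connection string is a `;`-separated list of `Key=Value` pairs such as `DefaultEndpointsProtocol`, `AccountName`, `AccountKey` and `EndpointSuffix`. The sketch below, using hypothetical `parse_connection_string` and `blob_endpoint` helpers, shows how that string breaks down and how the Blob endpoint URL follows from it; the Azure SDK does its own parsing, so this is only an illustration. Values are split on the first `=` because Base64 account keys may end in `=`. The key below is a dummy.

```python
from typing import Dict


def parse_connection_string(conn_str: str) -> Dict[str, str]:
    """Split 'Key1=Value1;Key2=Value2' into a dict (hypothetical helper)."""
    settings: Dict[str, str] = {}
    for segment in conn_str.split(";"):
        if not segment:
            continue  # tolerate a trailing ';'
        # Split on the first '=' only: Base64 account keys can end in '='.
        key, _, value = segment.partition("=")
        settings[key] = value
    return settings


def blob_endpoint(settings: Dict[str, str]) -> str:
    """Derive the Blob endpoint, preferring an explicit BlobEndpoint entry."""
    if "BlobEndpoint" in settings:
        return settings["BlobEndpoint"]
    protocol = settings.get("DefaultEndpointsProtocol", "https")
    suffix = settings.get("EndpointSuffix", "core.windows.net")
    return f"{protocol}://{settings['AccountName']}.blob.{suffix}"


conn = (
    "DefaultEndpointsProtocol=https;AccountName=mystorageaccount;"
    "AccountKey=ZHVtbXk=;EndpointSuffix=core.windows.net"
)
settings = parse_connection_string(conn)
print(settings["AccountKey"])   # ZHVtbXk=
print(blob_endpoint(settings))  # https://mystorageaccount.blob.core.windows.net
```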
Using Azure AD
Ensure the Azure Identity library is installed: pip install azure-identity
The sample below downloads all files in the container using the default credential. Alternative credential options are available, such as a service principal via ClientSecretCredential.
from azure.identity import DefaultAzureCredential
from llama_index.readers.azstorage_blob import AzStorageBlobReader
default_credential = DefaultAzureCredential()
loader = AzStorageBlobReader(
    container_name="scrabble-dictionary",
    account_url="https://<storage account name>.blob.core.windows.net",
    credential=default_credential,
)
documents = loader.load_data()
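DefaultAzureCredential tries several credential sources in turn; one of them is a service principal described by the environment variables `AZURE_TENANT_ID`, `AZURE_CLIENT_ID` and `AZURE_CLIENT_SECRET`. When authentication fails unexpectedly, checking those variables is a quick first step. The hypothetical helper below only inspects a mapping of environment variables and covers only the client-secret variant; it does not call Azure and is not part of `azure-identity`.

```python
import os
from typing import List, Mapping, Optional

# Variables for a client-secret service principal, as read by azure-identity's
# EnvironmentCredential (certificate and username/password variants not covered).
SERVICE_PRINCIPAL_VARS = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")


def missing_service_principal_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the client-secret variables that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in SERVICE_PRINCIPAL_VARS if not env.get(name)]


example_env = {"AZURE_TENANT_ID": "<tenant-id>", "AZURE_CLIENT_ID": "<client-id>"}
print(missing_service_principal_vars(example_env))  # ['AZURE_CLIENT_SECRET']
```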
This loader is designed to load data into LlamaIndex and/or to be used subsequently as a Tool in a LangChain Agent.
Updates
[2023-12-14] by JAlexMcGraw (#765)
- Added functionality to allow users to connect to Blob Storage with a connection string
- Changed temporary file names from random names back to the original names
Hashes for llama_index_readers_azstorage_blob-0.1.5.tar.gz
Algorithm | Hash digest
---|---
SHA256 | b9013ecb349965dcef4bf2c5d34cdc1fe7c4cbd9b63550d54515e0ad4660eff7
MD5 | 6583e7d1edebac47916e2f8a769f8345
BLAKE2b-256 | a56ae0a934deee569b3c432e60338d701a09804b237a04c40795eee669f01b3d

Hashes for llama_index_readers_azstorage_blob-0.1.5-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | ef6083613e6d663a12ebfea330a992098f07808289ec08d2b993944fcbc1072f
MD5 | 45cacf4efb49ec293a418ae295f0ebfa
BLAKE2b-256 | cb8677452745fdafcbfc402f063a561e0d6ab54dbc119507a033c6518e31808b
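To check a downloaded distribution against the SHA256 digests listed above, hash the file in chunks with Python's standard `hashlib`. The sketch writes a throwaway file containing `b"abc"` so it runs anywhere and compares against that input's well-known SHA256 test vector; for a real check, point the (hypothetical) `sha256_of` helper at the downloaded `.whl` or `.tar.gz` and compare with the matching table entry.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"abc")
    computed = sha256_of(sample)

# Standard SHA256 test vector for b"abc"; substitute the table's SHA256 value
# when verifying a real download.
expected = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
print(computed == expected)  # True
```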