llama-index readers azstorage_blob integration
Azure Storage Blob Loader
This loader parses any file stored as an Azure Storage blob, or the entire container (with an optional prefix / attribute filter) if no particular file is specified. When initializing AzStorageBlobReader, you may pass in your account URL with a SAS token, or credentials, to authenticate.
All files are temporarily downloaded locally and subsequently parsed with SimpleDirectoryReader. Hence, you may also specify a custom file_extractor, relying on any of the loaders in this library (or your own)! If you would like to use your own file extractor and need a pointer to the file extractor object, follow this sample.
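import llama_index
# Start from the default mapping of file extensions to reader classes
file_extractor = llama_index.readers.file.base.DEFAULT_FILE_READER_CLS
# Make sure to use an instantiation of a class
# (SimplePDFReader stands for your own reader class, imported or defined beforehand)
file_extractor.update({".pdf": SimplePDFReader()})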
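The download-then-parse flow described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the loader's actual code: `stage_blobs` and `fake_container` are hypothetical names, the blobs are in-memory stand-ins for real Azure blobs, and flattening nested blob names to their final path component is a simplification chosen for this sketch.

```python
import tempfile
from pathlib import Path, PurePosixPath
from typing import Dict, List


def stage_blobs(blobs: Dict[str, bytes], target_dir: str) -> List[Path]:
    """Write each in-memory blob into target_dir under its original file name."""
    staged = []
    for blob_name, payload in blobs.items():
        # Blob names may contain "/" separators; keep only the final component.
        local_path = Path(target_dir) / PurePosixPath(blob_name).name
        local_path.write_bytes(payload)
        staged.append(local_path)
    return staged


# Hypothetical container contents standing in for real Azure blobs.
fake_container = {
    "dictionary.txt": b"aardvark\nabacus\n",
    "subdirectory/input.txt": b"hello\n",
}

with tempfile.TemporaryDirectory() as temp_dir:
    staged_files = stage_blobs(fake_container, temp_dir)
    # A directory reader would parse the staged files here, before cleanup.
    staged_names = sorted(path.name for path in staged_files)
    staged_text = {path.name: path.read_text() for path in staged_files}
```

Because the files exist only inside the `with` block, any parsing has to happen before the temporary directory is removed. Two blobs that share a final path component would collide in this sketch.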
Usage
To use this loader, you need to pass in the name of your Azure Storage Container. After that, if you want to parse just a single file, pass in its blob name. Note that if the file is nested in a subdirectory, the blob name should contain the path, such as subdirectory/input.txt. This loader is a thin wrapper over the Azure Blob Storage Client for Python; see ContainerClient for detailed parameter usage options.
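To make the selection rules above concrete, here is a small pure-Python model of them. It is only a sketch: `select_blobs` is a hypothetical helper and not part of the loader, and its `blob` and `prefix` arguments simply mirror the wording of the text.

```python
from typing import List, Optional


def select_blobs(
    all_names: List[str], blob: Optional[str] = None, prefix: str = ""
) -> List[str]:
    """Return the blob names a load would cover under the rules described above."""
    if blob is not None:
        # A single blob is addressed by its full name, including any path.
        return [name for name in all_names if name == blob]
    # No blob given: take the whole container, narrowed by the optional prefix.
    return [name for name in all_names if name.startswith(prefix)]


names = ["dictionary.txt", "subdirectory/input.txt", "subdirectory/notes.md"]

single = select_blobs(names, blob="subdirectory/input.txt")
by_prefix = select_blobs(names, prefix="subdirectory/")
everything = select_blobs(names)
```

Passing only `input.txt` as the blob name would match nothing here, which is why the path has to be part of the name.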
Using a Storage Account SAS URL
from llama_index import download_loader
AzStorageBlobReader = download_loader("AzStorageBlobReader")
loader = AzStorageBlobReader(
container_name="scrabble-dictionary",
blob="dictionary.txt",
account_url="<SAS_URL>",
)
documents = loader.load_data()
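If you hold the account name and SAS token separately, a value for `<SAS_URL>` can be assembled as in the sketch below. The helper `build_sas_account_url` is hypothetical, the endpoint suffix assumes the public Azure cloud, and the token shown is a placeholder rather than a working signature.

```python
def build_sas_account_url(account_name: str, sas_token: str) -> str:
    """Join a storage account endpoint and a SAS token into one URL."""
    # Tokens are sometimes copied with a leading "?"; strip it to avoid "??".
    token = sas_token.lstrip("?")
    return f"https://{account_name}.blob.core.windows.net/?{token}"


# Placeholder values; a real token is generated for your storage account.
sas_url = build_sas_account_url("examplestorage", "?sv=2022-11-02&sig=REDACTED")
```

The resulting string is what would be passed as `account_url`. Treat it like a secret, since anyone holding the URL has the access the token grants.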
Using a Storage Account with connection string
The sample below will download all files in a container, by only specifying the storage account's connection string and the container name.
from llama_index import download_loader
AzStorageBlobReader = download_loader("AzStorageBlobReader")
loader = AzStorageBlobReader(
container_name="<CONTAINER_NAME>",
connection_string="<STORAGE_ACCOUNT_CONNECTION_STRING>",
)
documents = loader.load_data()
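A malformed connection string is a common cause of authentication failures, so it can help to inspect it before handing it to the loader. The sketch below is a hypothetical helper, not part of the loader; it assumes the usual semicolon-separated `Key=Value` layout, and the account key in the example is fake.

```python
from typing import Dict


def parse_connection_string(connection_string: str) -> Dict[str, str]:
    """Split a "Key=Value;Key=Value" connection string into a dict."""
    settings = {}
    for segment in connection_string.split(";"):
        if not segment.strip():
            continue  # tolerate a trailing ";"
        # Split on the first "=" only: base64 account keys end in "=" padding.
        key, _, value = segment.partition("=")
        settings[key.strip()] = value.strip()
    return settings


# Placeholder string in the usual storage-account layout; the key is fake.
example = (
    "DefaultEndpointsProtocol=https;AccountName=examplestorage;"
    "AccountKey=ZmFrZWtleQ==;EndpointSuffix=core.windows.net"
)
parsed = parse_connection_string(example)
missing = [key for key in ("AccountName", "AccountKey") if key not in parsed]
```

The `missing` check assumes a key-based connection string. A string built around a shared access signature carries different fields, so adjust the required keys accordingly.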
Using Azure AD
Ensure the Azure Identity library is available: pip install azure-identity
The sample below downloads all files in the container using the default credential. Alternative credential options are available, such as a service principal with ClientSecretCredential.
from llama_index import download_loader
from azure.identity import DefaultAzureCredential
default_credential = DefaultAzureCredential()
AzStorageBlobReader = download_loader("AzStorageBlobReader")
loader = AzStorageBlobReader(
container_name="scrabble-dictionary",
account_url="https://<storage account name>.blob.core.windows.net",
credential=default_credential,
)
documents = loader.load_data()
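For the service principal alternative mentioned above, one possible arrangement is to read the three required values from environment variables and pass them to `ClientSecretCredential`. This is a sketch under assumptions: the helper names are hypothetical, and the choice of `AZURE_TENANT_ID`, `AZURE_CLIENT_ID`, and `AZURE_CLIENT_SECRET` as variable names is a convention adopted here, not a requirement of the loader.

```python
import os
from typing import Dict, Mapping


def service_principal_settings(env: Mapping[str, str]) -> Dict[str, str]:
    """Collect the three values ClientSecretCredential needs from env vars."""
    required = {
        "tenant_id": "AZURE_TENANT_ID",
        "client_id": "AZURE_CLIENT_ID",
        "client_secret": "AZURE_CLIENT_SECRET",
    }
    missing = [var for var in required.values() if not env.get(var)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {kwarg: env[var] for kwarg, var in required.items()}


def build_credential(env: Mapping[str, str] = os.environ):
    """Create a ClientSecretCredential; requires azure-identity to be installed."""
    # Imported lazily so the settings helper works without azure-identity.
    from azure.identity import ClientSecretCredential

    return ClientSecretCredential(**service_principal_settings(env))


# Placeholder values for illustration only.
settings = service_principal_settings(
    {
        "AZURE_TENANT_ID": "tenant",
        "AZURE_CLIENT_ID": "client",
        "AZURE_CLIENT_SECRET": "secret",
    }
)
```

The object returned by `build_credential()` would take the place of `default_credential` in the sample above.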
This loader is designed to be used as a way to load data into LlamaIndex and/or subsequently used as a Tool in a LangChain Agent. See here for examples.
Updates
[2023-12-14] by JAlexMcGraw (#765)
- Added functionality to allow user to connect to blob storage with connection string
- Changed temporary file names from random names back to the original file names
Hashes for llama_index_readers_azstorage_blob-0.0.1.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 46194316dc8044b96fb2c15be4e60f183c3d7a80c8ceda1a8acc1d510c8b33a0
MD5 | 4afce6b4168f9604ace3b775e6f1b416
BLAKE2b-256 | 9fd29a7e8dccafbdfdba33a2f3f6353248e42a5224c09bef32f7437e6fcbf4af
Hashes for llama_index_readers_azstorage_blob-0.0.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 18735c5e3b43406103b6f089d36bb37e9aaef4ef0758afb812c589e688e3f386
MD5 | b8c40e1805979f6511b675b64385b0a1
BLAKE2b-256 | 2c98d214bc9483bf40c71392195f76839c691cdd54b7b10c3c9569af7407d7d6