llama-index readers azstorage_blob integration
Azure Storage Blob Loader
pip install llama-index-readers-azstorage-blob
This loader parses any file stored as an Azure Storage blob, or the entire container (with an optional prefix / attribute filter) if no particular file is specified. When initializing AzStorageBlobReader, you may pass in your account URL with a SAS token, or credentials, to authenticate.
All files are temporarily downloaded locally and subsequently parsed with SimpleDirectoryReader. Hence, you may also specify a custom file_extractor, relying on any of the loaders in this library (or your own)! If you need a clue on finding the file extractor object because you'd like to use your own file extractor, follow this sample.
import llama_index
file_extractor = llama_index.readers.file.base.DEFAULT_FILE_READER_CLS
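# Make sure to pass an instance of the reader class, not the class itself.
# SimplePDFReader is not imported in this sample; import or define your own reader first.
file_extractor.update({".pdf": SimplePDFReader()})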
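The download-then-parse flow described above can be illustrated with a minimal, standard-library-only sketch. It is not the loader's actual implementation: `fake_container`, `download_to_temp_dir`, and `parse_directory` are hypothetical names, an in-memory dict stands in for the Azure container, and flattening nested blob names into a single directory is an assumption made for the illustration.

```python
import tempfile
from pathlib import Path

# Hypothetical stand-in for a container: blob name -> blob bytes.
fake_container = {
    "dictionary.txt": b"aardvark\nzebra\n",
    "subdirectory/input.txt": b"hello\n",
}


def download_to_temp_dir(container, temp_dir):
    """Write every blob into temp_dir and return the local paths."""
    paths = []
    for blob_name, data in container.items():
        # Assumption: flatten nested blob names so each blob is one local file.
        local_path = Path(temp_dir) / blob_name.replace("/", "_")
        local_path.write_bytes(data)
        paths.append(local_path)
    return paths


def parse_directory(directory, file_extractor):
    """Choose a parser by file extension, in the spirit of a file_extractor mapping."""
    parsed = {}
    for path in sorted(Path(directory).iterdir()):
        # Fall back to plain text when no extractor is registered for the suffix.
        parser = file_extractor.get(path.suffix, lambda p: p.read_text())
        parsed[path.name] = parser(path)
    return parsed


# Custom extractor for .txt files: return a list of lines instead of raw text.
file_extractor = {".txt": lambda p: p.read_text().splitlines()}

# The temporary directory is removed automatically once parsing is done.
with tempfile.TemporaryDirectory() as temp_dir:
    download_to_temp_dir(fake_container, temp_dir)
    documents = parse_directory(temp_dir, file_extractor)
```

The point of the sketch is the mapping from file extension to parser: swapping an entry in `file_extractor` changes how files of that type are parsed without touching the download step.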
Usage
To use this loader, you need to pass in the name of your Azure Storage container. After that, if you want to parse just a single file, pass in its blob name. Note that if the file is nested in a subdirectory, the blob name should contain the path, such as subdirectory/input.txt. This loader is a thin wrapper over the Azure Blob Storage Client for Python; see ContainerClient for detailed parameter usage options.
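To make the blob-name rule concrete, here is a small sketch of how selection by full blob name or by prefix behaves. It uses only the standard library; `blob_names` and `select_blobs` are hypothetical and do not call the Azure SDK. It assumes the default flat namespace, where the "subdirectory" is simply part of the blob name.

```python
from typing import List, Optional

# Hypothetical blob names; "/" inside a name acts as a virtual directory separator.
blob_names = [
    "dictionary.txt",
    "subdirectory/input.txt",
    "subdirectory/notes.md",
    "archive/2023/input.txt",
]


def select_blobs(
    names: List[str],
    blob: Optional[str] = None,
    prefix: Optional[str] = None,
) -> List[str]:
    """Return the names that a single-blob or prefix selection would match."""
    if blob is not None:
        # A single blob is matched on its full name, path included.
        return [name for name in names if name == blob]
    if prefix is not None:
        # A prefix filter keeps every blob whose name starts with the prefix.
        return [name for name in names if name.startswith(prefix)]
    # No blob and no prefix: the whole container is selected.
    return list(names)


single = select_blobs(blob_names, blob="subdirectory/input.txt")
bare_name = select_blobs(blob_names, blob="input.txt")
by_prefix = select_blobs(blob_names, prefix="subdirectory/")
everything = select_blobs(blob_names)
```

Note that `bare_name` is empty: passing only `input.txt` does not match a blob stored under `subdirectory/`.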
Using a Storage Account SAS URL
from llama_index.readers.azstorage_blob import AzStorageBlobReader
loader = AzStorageBlobReader(
container_name="scrabble-dictionary",
blob="dictionary.txt",
account_url="<SAS_URL>",
)
documents = loader.load_data()
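If you only have an account name and a SAS token rather than a ready-made SAS URL, the value for `<SAS_URL>` can be assembled by hand. The sketch below is an illustration using only the standard library; `build_sas_url` and `looks_like_sas_url` are hypothetical helpers, and the token shown in the usage lines is a made-up placeholder, not a working signature.

```python
from urllib.parse import parse_qs, urlsplit


def build_sas_url(account_name: str, sas_token: str) -> str:
    """Append a SAS token to the blob endpoint of a storage account."""
    # Tokens are often copied with a leading "?"; strip it to avoid "??".
    token = sas_token.lstrip("?")
    return f"https://{account_name}.blob.core.windows.net/?{token}"


def looks_like_sas_url(url: str) -> bool:
    """Rough check: a SAS token carries its signature in the "sig" query parameter."""
    query = parse_qs(urlsplit(url).query)
    return "sig" in query


# Placeholder values for illustration only.
sas_url = build_sas_url("myaccount", "?sv=2022-11-02&sp=rl&sig=abc123")
has_signature = looks_like_sas_url(sas_url)
plain_url_has_signature = looks_like_sas_url("https://myaccount.blob.core.windows.net")
```

A check like `looks_like_sas_url` only catches the obvious mistake of passing a bare account URL without credentials; it does not validate the token itself.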
Using a Storage Account with connection string
The sample below downloads all files in a container by specifying only the storage account's connection string and the container name.
from llama_index.readers.azstorage_blob import AzStorageBlobReader
loader = AzStorageBlobReader(
container_name="<CONTAINER_NAME>",
connection_string="<STORAGE_ACCOUNT_CONNECTION_STRING>",
)
documents = loader.load_data()
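A storage account connection string is a semicolon-separated list of `Key=Value` settings. As a sanity check before handing it to the loader, it can be split into its parts. This is a standard-library sketch; `parse_connection_string` is a hypothetical helper, and the account name and key below are placeholders.

```python
from typing import Dict


def parse_connection_string(connection_string: str) -> Dict[str, str]:
    """Split a storage connection string into a dict of its settings."""
    settings = {}
    for segment in connection_string.split(";"):
        if not segment.strip():
            # Skip empty segments, e.g. from a trailing ";".
            continue
        # Split on the first "=" only: base64 account keys may end in "=".
        key, _, value = segment.partition("=")
        settings[key.strip()] = value.strip()
    return settings


# Placeholder connection string for illustration only.
example = (
    "DefaultEndpointsProtocol=https;"
    "AccountName=myaccount;"
    "AccountKey=abc123==;"
    "EndpointSuffix=core.windows.net;"
)
settings = parse_connection_string(example)
```

Splitting on the first `=` matters: a naive `segment.split("=")` would truncate the padding at the end of the account key.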
Using Azure AD
Ensure the Azure Identity library is available: pip install azure-identity
The sample below downloads all files in the container using the default credential. Alternative credential options are available, such as a service principal with ClientSecretCredential.
from azure.identity import DefaultAzureCredential
from llama_index.readers.azstorage_blob import AzStorageBlobReader
# DefaultAzureCredential tries several authentication mechanisms in turn
default_credential = DefaultAzureCredential()
loader = AzStorageBlobReader(
container_name="scrabble-dictionary",
account_url="https://<storage account name>.blob.core.windows.net",
credential=default_credential,
)
documents = loader.load_data()
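When authenticating as a service principal, the environment-based credential in azure-identity reads the tenant ID, client ID, and client secret from the `AZURE_TENANT_ID`, `AZURE_CLIENT_ID`, and `AZURE_CLIENT_SECRET` environment variables. The sketch below checks that they are set before the loader is created. It uses only the standard library; `missing_service_principal_vars` is a hypothetical helper and the values in the usage lines are placeholders.

```python
import os
from typing import List, Mapping

# Environment variables read for a service principal with a client secret.
SERVICE_PRINCIPAL_VARS = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")


def missing_service_principal_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in SERVICE_PRINCIPAL_VARS if not env.get(name)]


# Placeholder environments for illustration only.
incomplete = missing_service_principal_vars(
    {"AZURE_TENANT_ID": "tenant", "AZURE_CLIENT_ID": "client"}
)
complete = missing_service_principal_vars(
    {
        "AZURE_TENANT_ID": "tenant",
        "AZURE_CLIENT_ID": "client",
        "AZURE_CLIENT_SECRET": "secret",
    }
)
```

Failing early on a missing variable tends to give a clearer error than an authentication failure raised later, during the first request to the storage account.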
This loader is designed to load data into LlamaIndex and/or to be used subsequently as a Tool in a LangChain Agent. See here for examples.
Updates
[2023-12-14] by JAlexMcGraw (#765)
- Added the ability to connect to blob storage with a connection string
- Changed temporary file names from random names back to the original names