Azure Blob ObjectStorage plugin for mistralai-search-toolkit
Azure Blob Storage Plugin for Search Toolkit
Azure Blob Storage backend for mistralai-search-toolkit.
This plugin implements the Search Toolkit's ObjectStorage interface, enabling the ingestion pipeline to load files directly from Azure Blob Storage.
Installation
pip install mistralai-search-toolkit-storage-azure
Or as an optional dependency of the core package:
pip install mistralai-search-toolkit[storage-azure]
Quick Start: Load Files from Azure in Ingestion Pipeline
1. Upload a File to Azure Blob Storage
import asyncio

from mistralai.search.toolkit.plugins.storage.azure import AzureBlobStorage


async def upload_file():
    storage = AzureBlobStorage(
        container_name="documents",
        account_name="your-account",
    )
    # Upload a file
    with open("document.pdf", "rb") as f:
        data = f.read()
    await storage.put(key="documents/document.pdf", data=data)


asyncio.run(upload_file())
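When uploading more than one file, it can help to derive blob keys from local paths in a consistent way. The following is a minimal sketch, not part of the plugin: `blob_keys_for_directory` is a hypothetical helper, and the `documents` prefix simply mirrors the key used in the example above.

```python
from pathlib import Path


def blob_keys_for_directory(root: str, prefix: str = "documents") -> dict[str, str]:
    """Map every file under `root` to a blob key of the form `prefix/relative/path`.

    Hypothetical helper for illustration; it only computes names and does not
    talk to Azure.
    """
    root_path = Path(root)
    keys: dict[str, str] = {}
    for path in sorted(root_path.rglob("*")):
        if path.is_file():
            # Blob keys use forward slashes regardless of the local OS.
            relative = path.relative_to(root_path).as_posix()
            keys[str(path)] = f"{prefix}/{relative}"
    return keys
```

Each `(local_path, key)` pair could then be read from disk and passed to `storage.put(key=key, data=data)` inside the `upload_file` coroutine.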
2. Load Files from Azure in Ingestion Pipeline
import asyncio
import os

from mistralai.search.toolkit.ingestion.loaders import FileLoader
from mistralai.search.toolkit.ingestion.pipelines import Pipeline
from mistralai.search.toolkit.ingestion.text_splitters import CharacterTextSplitter
from mistralai.search.toolkit.embedders import MistralEmbedder, MODEL_1024_EMBEDDING
from mistralai.client import Mistral
from mistralai.search.toolkit.plugins.storage.azure import AzureBlobStorage
from mistralai.search.toolkit.plugins.vespa import VespaClientConfig
from vespa_app import app


async def ingest_from_azure():
    # Create Azure storage factory
    def azure_storage_factory():
        return AzureBlobStorage(
            container_name="documents",
            account_name="your-account",
        )

    # Create FileLoader backed by Azure
    file_loader = FileLoader(storage_factory=azure_storage_factory)

    # Create ingestion pipeline
    mistral_client = Mistral(api_key=os.environ.get("MISTRAL_API_KEY"))
    vespa_config = VespaClientConfig(
        endpoint=os.environ.get("VESPA_ENDPOINT", "http://localhost:8080"),
    )
    vector_store = app.get_search_index(vespa_config, collection_name="articles")
    pipeline = Pipeline(
        loader=file_loader,
        text_splitter=CharacterTextSplitter(chunk_size=512),
        embedder=MistralEmbedder(client=mistral_client, model_name=MODEL_1024_EMBEDDING),
        stores=vector_store,
    )

    # Ingest documents from Azure
    num_chunks = await pipeline.run(documents=[
        "documents/document1.pdf",
        "documents/document2.pdf",
    ])
    print(f"Indexed {num_chunks} chunks")


asyncio.run(ingest_from_azure())
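The example reads `MISTRAL_API_KEY` with `os.environ.get`, which returns `None` when the variable is unset, so a missing key would only surface later as a client error. A small guard such as the one below makes the failure explicit. `require_env` is a hypothetical helper, not part of the toolkit.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if unset or empty.

    Hypothetical helper for illustration only.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

With this in place, the client could be built as `Mistral(api_key=require_env("MISTRAL_API_KEY"))`.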
Configuration
Basic Setup
storage = AzureBlobStorage(
    container_name="documents",
    account_name="your-account",
)
Using Connection String
storage = AzureBlobStorage(
    container_name="documents",
    connection_string="DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...",
)
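An Azure Storage connection string is a list of `Key=Value` pairs separated by semicolons. To check which account or endpoint a string points at before passing it to the plugin, a sketch like the following can split it into a dictionary. `parse_connection_string` is a hypothetical helper and is not part of the plugin or the Azure SDK.

```python
def parse_connection_string(connection_string: str) -> dict[str, str]:
    """Split an Azure Storage connection string into its Key=Value pairs.

    Hypothetical helper for illustration only.
    """
    parts: dict[str, str] = {}
    for segment in connection_string.split(";"):
        if not segment.strip():
            # Skip empty segments, such as the one after a trailing semicolon.
            continue
        # Split on the first "=" only: account keys are base64 and may end in "=".
        key, separator, value = segment.partition("=")
        if not separator:
            raise ValueError(f"Malformed connection string segment: {segment!r}")
        parts[key.strip()] = value
    return parts
```

For example, `parse_connection_string(cs).get("AccountName")` returns the account the string targets, which is useful for logging without printing the key itself.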
Using Account Key
storage = AzureBlobStorage(
    container_name="documents",
    account_name="your-account",
    account_key="your-key",
)
Using Managed Identity
from azure.identity.aio import DefaultAzureCredential

storage = AzureBlobStorage(
    container_name="documents",
    account_name="your-account",
    credential=DefaultAzureCredential(),
)
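The options above differ only in the keyword arguments passed to `AzureBlobStorage`, so the choice can be made from the environment instead of being hard-coded. The sketch below builds those keyword arguments from a mapping. The environment variable names are an assumption chosen for this example (the plugin is not known to read them itself), and `azure_storage_kwargs` is a hypothetical helper.

```python
from typing import Mapping


def azure_storage_kwargs(
    env: Mapping[str, str], container_name: str = "documents"
) -> dict[str, str]:
    """Pick AzureBlobStorage keyword arguments from environment-style settings.

    Hypothetical helper; the variable names below are assumptions for this sketch.
    """
    kwargs = {"container_name": container_name}

    # A connection string carries the account and key together, so prefer it.
    connection_string = env.get("AZURE_STORAGE_CONNECTION_STRING")
    if connection_string:
        kwargs["connection_string"] = connection_string
        return kwargs

    account_name = env.get("AZURE_STORAGE_ACCOUNT")
    if not account_name:
        raise ValueError("Set AZURE_STORAGE_CONNECTION_STRING or AZURE_STORAGE_ACCOUNT")
    kwargs["account_name"] = account_name

    # The account key is optional; without it, a credential can be supplied separately.
    account_key = env.get("AZURE_STORAGE_KEY")
    if account_key:
        kwargs["account_key"] = account_key
    return kwargs
```

The result could be used as `AzureBlobStorage(**azure_storage_kwargs(os.environ))`, adding `credential=DefaultAzureCredential()` when no key or connection string is present.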
Local Development
For local testing, use Azurite:
docker run -p 10000:10000 mcr.microsoft.com/azure-storage/azurite azurite-blob --blobHost 0.0.0.0
Configure the plugin to use the local emulator:
storage = AzureBlobStorage(
    container_name="documents",
    connection_string="DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=<key>;BlobEndpoint=http://127.0.0.1:10000/devstoreaccount1/;",
)
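The emulator connection string is long and easy to mistype, so it can be assembled from its parts. The following sketch reproduces the format shown above. `azurite_connection_string` is a hypothetical helper, and the account key is left as a required argument to be supplied from the Azurite documentation or your own configuration.

```python
def azurite_connection_string(
    account_key: str,
    account_name: str = "devstoreaccount1",
    host: str = "127.0.0.1",
    port: int = 10000,
) -> str:
    """Build a connection string that targets a local Azurite blob endpoint.

    Hypothetical helper for illustration only.
    """
    # Azurite serves each account under a path segment named after the account.
    blob_endpoint = f"http://{host}:{port}/{account_name}/"
    return (
        "DefaultEndpointsProtocol=http;"
        f"AccountName={account_name};"
        f"AccountKey={account_key};"
        f"BlobEndpoint={blob_endpoint};"
    )
```

The returned string can be passed as `connection_string=` to `AzureBlobStorage`, and the `port` argument should match the port published by the `docker run` command.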
License
This plugin is licensed under the Apache License 2.0.
Support
For Search Toolkit issues, refer to the Search Toolkit documentation.
For Azure Blob Storage documentation, visit Azure Blob Storage Docs.