Azure Blob ObjectStorage plugin for mistralai-search-toolkit

Azure Blob Storage Plugin for Search Toolkit

Azure Blob Storage backend for mistralai-search-toolkit.

This plugin implements the Search Toolkit's ObjectStorage interface, enabling the ingestion pipeline to load files directly from Azure Blob Storage.

Installation

pip install mistralai-search-toolkit-storage-azure

Or install it as an optional extra of the core package (the quotes keep shells such as zsh from interpreting the square brackets):

pip install "mistralai-search-toolkit[storage-azure]"
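
To confirm that the install succeeded, one option is to look up the distribution with the standard library. This is a minimal sketch: the distribution name is copied from the pip command above, and `installed_plugin_version` is a hypothetical helper, not part of the plugin.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Distribution name as used in the pip command above.
PLUGIN_DIST = "mistralai-search-toolkit-storage-azure"


def installed_plugin_version(dist_name: str = PLUGIN_DIST) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_plugin_version()
    print(f"{PLUGIN_DIST}: {found or 'not installed'}")
```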

Quick Start

1. Upload a File to Azure Blob Storage

import asyncio
from mistralai.search.toolkit.plugins.storage.azure import AzureBlobObjectStorage

async def upload_file():
    storage = AzureBlobObjectStorage(
        container_name="documents",
        account_name="your-account",
    )

    # Upload a file
    with open("document.pdf", "rb") as f:
        data = f.read()

    await storage.put(key="documents/document.pdf", data=data)

asyncio.run(upload_file())
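
When more than one file needs uploading, the same `put(key=..., data=...)` call can be wrapped in a loop. The sketch below assumes only that call shape, as shown in the upload example; `blob_key`, `upload_directory`, and `InMemoryStorage` are hypothetical names introduced for illustration, and the in-memory class stands in for `AzureBlobObjectStorage` so the example runs without an Azure account.

```python
import asyncio
import tempfile
from pathlib import Path, PurePosixPath
from typing import Dict, List


def blob_key(local_path: Path, root: Path, prefix: str = "") -> str:
    """Map a local file path to a forward-slash blob key under `prefix`."""
    relative = local_path.relative_to(root)
    return str(PurePosixPath(prefix, *relative.parts))


async def upload_directory(storage, root: Path, prefix: str = "") -> List[str]:
    """Upload every file under `root` one at a time; return the keys written."""
    keys = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        key = blob_key(path, root, prefix)
        await storage.put(key=key, data=path.read_bytes())
        keys.append(key)
    return keys


class InMemoryStorage:
    """Hypothetical stand-in exposing the same async put(key=..., data=...)."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}

    async def put(self, key: str, data: bytes) -> None:
        self.blobs[key] = data


async def demo() -> List[str]:
    """Build a small temporary tree and upload it to the stand-in storage."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "reports").mkdir()
        (root / "reports" / "q1.pdf").write_bytes(b"%PDF-1.4 demo")
        (root / "notes.txt").write_bytes(b"hello")
        return await upload_directory(InMemoryStorage(), root, prefix="documents")


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The returned keys have the same form as the strings passed to the ingestion pipeline in the next step. Replacing `InMemoryStorage()` with a configured `AzureBlobObjectStorage` should work as long as its `put` accepts the same keyword arguments.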

2. Load Files from Azure in Ingestion Pipeline

import asyncio
import os
from mistralai.search.toolkit.ingestion.loaders import FileLoader
from mistralai.search.toolkit.ingestion.pipelines import Pipeline
from mistralai.search.toolkit.ingestion.text_splitters import CharacterTextSplitter
from mistralai.search.toolkit.embedders import MistralEmbedder, MODEL_1024_EMBEDDING
from mistralai.client import Mistral
from mistralai.search.toolkit.plugins.storage.azure import AzureBlobObjectStorage
from mistralai.search.toolkit.plugins.vespa import VespaClientConfig
from vespa_app import app

async def ingest_from_azure():
    # Create Azure storage factory
    def azure_storage_factory():
        return AzureBlobObjectStorage(
            container_name="documents",
            account_name="your-account",
        )

    # Create FileLoader backed by Azure
    file_loader = FileLoader(storage_factory=azure_storage_factory)

    # Create ingestion pipeline
    mistral_client = Mistral(api_key=os.environ.get("MISTRAL_API_KEY"))
    vespa_config = VespaClientConfig(
        endpoint=os.environ.get("VESPA_ENDPOINT", "http://localhost:8080"),
    )
    vector_store = app.get_search_index(vespa_config, collection_name="articles")

    pipeline = Pipeline(
        loader=file_loader,
        text_splitter=CharacterTextSplitter(chunk_size=512),
        embedder=MistralEmbedder(client=mistral_client, model_name=MODEL_1024_EMBEDDING),
        stores=vector_store,
    )

    # Ingest documents from Azure
    num_chunks = await pipeline.run(documents=[
        "documents/document1.pdf",
        "documents/document2.pdf",
    ])

    print(f"Indexed {num_chunks} chunks")

asyncio.run(ingest_from_azure())

Configuration

Basic Setup

storage = AzureBlobObjectStorage(
    container_name="documents",
    account_name="your-account",
)

Using Connection String

storage = AzureBlobObjectStorage(
    container_name="documents",
    connection_string="DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...",
)

Using Account Key

storage = AzureBlobObjectStorage(
    container_name="documents",
    account_name="your-account",
    account_key="your-key",
)

Using Managed Identity

from azure.identity.aio import DefaultAzureCredential

storage = AzureBlobObjectStorage(
    container_name="documents",
    account_name="your-account",
    credential=DefaultAzureCredential(),
)
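
The options above differ only in the keyword arguments passed to the constructor, so the choice can be driven by the environment. The following is a rough sketch, not part of the plugin: `azure_storage_kwargs` is a hypothetical helper, and the variable names `AZURE_STORAGE_CONNECTION_STRING`, `AZURE_STORAGE_ACCOUNT`, and `AZURE_STORAGE_KEY` are assumptions borrowed from common Azure tooling conventions. Nothing here implies that the plugin reads them on its own.

```python
import os
from typing import Any, Dict, Mapping


def azure_storage_kwargs(
    container_name: str, env: Mapping[str, str] = os.environ
) -> Dict[str, Any]:
    """Pick constructor keyword arguments based on which settings are present."""
    kwargs: Dict[str, Any] = {"container_name": container_name}

    # A connection string carries everything needed, so it takes precedence.
    connection_string = env.get("AZURE_STORAGE_CONNECTION_STRING")
    if connection_string:
        kwargs["connection_string"] = connection_string
        return kwargs

    account_name = env.get("AZURE_STORAGE_ACCOUNT")
    if not account_name:
        raise ValueError(
            "Set AZURE_STORAGE_CONNECTION_STRING or AZURE_STORAGE_ACCOUNT"
        )
    kwargs["account_name"] = account_name

    # The account key is optional; without it only the account name is passed,
    # as in the basic setup.
    account_key = env.get("AZURE_STORAGE_KEY")
    if account_key:
        kwargs["account_key"] = account_key
    return kwargs


# Intended use (requires the plugin to be installed):
#   storage = AzureBlobObjectStorage(**azure_storage_kwargs("documents"))
# For managed identity, add credential=DefaultAzureCredential() to the call.
```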

Local Development

For local testing, use Azurite, the Azure Storage emulator:

docker run -p 10000:10000 mcr.microsoft.com/azure-storage/azurite azurite-blob --blobHost 0.0.0.0

Configure the storage to use the local emulator:

storage = AzureBlobObjectStorage(
    container_name="documents",
    connection_string="DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=<key>;BlobEndpoint=http://127.0.0.1:10000/devstoreaccount1/;",
)
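
If the emulator host or port changes, assembling the connection string from its parts avoids editing one long literal. This is a small sketch: `azurite_connection_string` is a hypothetical helper, its defaults mirror the values in the string above, and the account key is left for the caller to supply.

```python
def azurite_connection_string(
    account_key: str,
    host: str = "127.0.0.1",
    port: int = 10000,
    account_name: str = "devstoreaccount1",
) -> str:
    """Build a connection string that points at a local Azurite blob endpoint."""
    fields = {
        "DefaultEndpointsProtocol": "http",
        "AccountName": account_name,
        "AccountKey": account_key,
        "BlobEndpoint": f"http://{host}:{port}/{account_name}/",
    }
    # Each field is written as Name=Value followed by a semicolon.
    return "".join(f"{name}={value};" for name, value in fields.items())


# Intended use (requires the plugin to be installed):
#   storage = AzureBlobObjectStorage(
#       container_name="documents",
#       connection_string=azurite_connection_string(account_key="<key>"),
#   )
```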

License

This plugin is licensed under the Apache License 2.0.

Support

For Search Toolkit issues, refer to the Search Toolkit documentation.

For Azure Blob Storage documentation, visit Azure Blob Storage Docs.
