Azure Blob ObjectStorage plugin for mistralai-search-toolkit

Azure Blob Storage Plugin for Search Toolkit

Azure Blob Storage backend for mistralai-search-toolkit.

This plugin implements the Search Toolkit's ObjectStorage interface, enabling the ingestion pipeline to load files directly from Azure Blob Storage.

Installation

pip install mistralai-search-toolkit-storage-azure

Or install it as an optional extra of the core package. The quotes stop shells such as zsh from interpreting the brackets:

pip install "mistralai-search-toolkit[storage-azure]"

Quick Start: Load Files from Azure in an Ingestion Pipeline

1. Upload a File to Azure Blob Storage

import asyncio
from mistralai.search.toolkit.plugins.storage.azure import AzureBlobStorage

async def upload_file():
    storage = AzureBlobStorage(
        container_name="documents",
        account_name="your-account",
    )

    # Read the local file into memory
    with open("document.pdf", "rb") as f:
        data = f.read()

    # Store it under this key (the blob name) inside the "documents" container
    await storage.put(key="documents/document.pdf", data=data)

asyncio.run(upload_file())
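Step 1 uploads a single file. To upload a whole local folder, every file needs a blob key. Azure Blob Storage has a flat namespace in which `/` in a blob name acts as a virtual directory separator, so keys should use forward slashes regardless of the local operating system. The helpers below are a hypothetical sketch (`blob_key_for` and `iter_upload_keys` are not part of the plugin); they only compute keys and do not talk to Azure.

```python
from pathlib import Path, PurePath, PurePosixPath
from typing import Iterator, Tuple


def blob_key_for(path: PurePath, root: PurePath, prefix: str = "documents") -> str:
    """Map a local file under `root` to a blob key that always uses "/"."""
    relative = path.relative_to(root)  # raises ValueError if path is outside root
    # PurePosixPath joins with "/" even when the input paths are Windows paths.
    return str(PurePosixPath(prefix, *relative.parts))


def iter_upload_keys(
    root: Path, prefix: str = "documents", pattern: str = "*.pdf"
) -> Iterator[Tuple[Path, str]]:
    """Yield (local_path, blob_key) pairs for every matching file below root."""
    for path in sorted(root.rglob(pattern)):
        if path.is_file():
            yield path, blob_key_for(path, root, prefix)
```

Each pair could then be uploaded inside the async function from step 1 with `await storage.put(key=key, data=path.read_bytes())`.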

2. Load Files from Azure in an Ingestion Pipeline

import asyncio
import os
from mistralai.search.toolkit.ingestion.loaders import FileLoader
from mistralai.search.toolkit.ingestion.pipelines import Pipeline
from mistralai.search.toolkit.ingestion.text_splitters import CharacterTextSplitter
from mistralai.search.toolkit.embedders import MistralEmbedder, MODEL_1024_EMBEDDING
from mistralai.client import Mistral
from mistralai.search.toolkit.plugins.storage.azure import AzureBlobStorage
from mistralai.search.toolkit.plugins.vespa import VespaClientConfig
from vespa_app import app  # your project's Vespa application module (not part of this plugin)

async def ingest_from_azure():
    # Create Azure storage factory
    def azure_storage_factory():
        return AzureBlobStorage(
            container_name="documents",
            account_name="your-account",
        )

    # Create FileLoader backed by Azure
    file_loader = FileLoader(storage_factory=azure_storage_factory)

    # Create ingestion pipeline
    mistral_client = Mistral(api_key=os.environ.get("MISTRAL_API_KEY"))
    vespa_config = VespaClientConfig(
        endpoint=os.environ.get("VESPA_ENDPOINT", "http://localhost:8080"),
    )
    vector_store = app.get_search_index(vespa_config, collection_name="articles")

    pipeline = Pipeline(
        loader=file_loader,
        text_splitter=CharacterTextSplitter(chunk_size=512),
        embedder=MistralEmbedder(client=mistral_client, model_name=MODEL_1024_EMBEDDING),
        stores=vector_store,
    )

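    # Ingest documents from Azure; each entry is the key of a blob that
    # already exists in the container (for example, one uploaded as in step 1)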
    num_chunks = await pipeline.run(documents=[
        "documents/document1.pdf",
        "documents/document2.pdf",
    ])

    print(f"Indexed {num_chunks} chunks")

asyncio.run(ingest_from_azure())
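When testing code around the pipeline, it can be convenient to swap Azure for an in-memory store. The sketch below is illustrative only: the toolkit's real ObjectStorage interface is not documented in this README, so `ObjectStorageLike`, `InMemoryStorage`, and the `get` method are hypothetical names; only the `put(key=..., data=...)` call shape comes from the upload example above. Whether such an object can be returned from the `storage_factory` passed to `FileLoader` depends on the actual interface.

```python
import asyncio
from typing import Dict, Protocol


class ObjectStorageLike(Protocol):
    """Hypothetical put/get shape; NOT the toolkit's actual ObjectStorage interface."""

    async def put(self, key: str, data: bytes) -> None: ...

    async def get(self, key: str) -> bytes: ...


class InMemoryStorage:
    """Dict-backed stand-in for an object store, for tests without Azure."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    async def put(self, key: str, data: bytes) -> None:
        # Last write wins, like an upload with overwriting enabled.
        self._objects[key] = bytes(data)

    async def get(self, key: str) -> bytes:
        # A missing key raises KeyError, standing in for a "blob not found" error.
        return self._objects[key]


async def round_trip(storage: ObjectStorageLike, key: str, data: bytes) -> bytes:
    """Write an object and read it back through the protocol."""
    await storage.put(key=key, data=data)
    return await storage.get(key=key)


if __name__ == "__main__":
    print(asyncio.run(round_trip(InMemoryStorage(), "documents/document1.pdf", b"%PDF")))
```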

Configuration

Basic Setup

from mistralai.search.toolkit.plugins.storage.azure import AzureBlobStorage

storage = AzureBlobStorage(
    container_name="documents",
    account_name="your-account",
)

Using Connection String

storage = AzureBlobStorage(
    container_name="documents",
    connection_string="DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...",
)
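A connection string is a semicolon-separated list of `Key=Value` settings. When a string copied from the Azure portal does not work, it can help to inspect it before passing it to `AzureBlobStorage`. The sketch below is a stand-alone helper (the function names are hypothetical, not part of the plugin). It splits each setting on the first `=` only, because base64 account keys often end in `=` padding, and it never prints the key itself.

```python
from typing import Dict


def parse_connection_string(conn_str: str) -> Dict[str, str]:
    """Split an Azure Storage connection string into its Key=Value settings."""
    settings: Dict[str, str] = {}
    for segment in conn_str.split(";"):
        segment = segment.strip()
        if not segment:
            continue  # tolerate a trailing ";"
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"Malformed segment without '=': {segment!r}")
        settings[key] = value  # '=' padding inside the value is preserved
    return settings


def describe_account(conn_str: str) -> str:
    """Return a short, secret-free summary that is safe to log."""
    settings = parse_connection_string(conn_str)
    account = settings.get("AccountName", "<missing AccountName>")
    # An explicit BlobEndpoint wins; otherwise derive the default public endpoint.
    endpoint = settings.get("BlobEndpoint") or (
        f"{settings.get('DefaultEndpointsProtocol', 'https')}://{account}.blob."
        f"{settings.get('EndpointSuffix', 'core.windows.net')}"
    )
    return f"account={account} endpoint={endpoint} key_present={'AccountKey' in settings}"
```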

Using Account Key

storage = AzureBlobStorage(
    container_name="documents",
    account_name="your-account",
    account_key="your-key",
)

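Using Managed Identity

DefaultAzureCredential, from the azure-identity package, tries several credential sources in turn, including environment variables, workload identity, managed identity, and an Azure CLI login, so the same code can run on a developer machine and on an Azure-hosted service. The identity also needs a data-plane role on the storage account, such as Storage Blob Data Reader (read) or Storage Blob Data Contributor (read and write). The async credential in azure.identity.aio additionally requires an async HTTP transport such as aiohttp.

from azure.identity.aio import DefaultAzureCredential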

storage = AzureBlobStorage(
    container_name="documents",
    account_name="your-account",
    credential=DefaultAzureCredential(),
)
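The configuration styles above can also be selected at runtime from environment variables, so the same code can use a connection string in CI, an account key locally, and managed identity in Azure. This is a hedged sketch: the environment variable names and the precedence order are assumptions (the plugin is not known to read any environment variables itself), and `storage_kwargs_from_env` is a hypothetical helper that only builds the keyword arguments shown in the examples above.

```python
import os
from typing import Any, Callable, Dict, Mapping, Optional


def storage_kwargs_from_env(
    container_name: str,
    env: Mapping[str, str] = os.environ,
    credential_factory: Optional[Callable[[], Any]] = None,
) -> Dict[str, Any]:
    """Build AzureBlobStorage keyword arguments from (assumed) environment variables.

    Assumed precedence: connection string, then account name + key,
    then account name + credential, then account name alone.
    """
    kwargs: Dict[str, Any] = {"container_name": container_name}
    conn_str = env.get("AZURE_STORAGE_CONNECTION_STRING")
    if conn_str:
        kwargs["connection_string"] = conn_str
        return kwargs

    account = env.get("AZURE_STORAGE_ACCOUNT")
    if not account:
        raise RuntimeError("Set AZURE_STORAGE_CONNECTION_STRING or AZURE_STORAGE_ACCOUNT")
    kwargs["account_name"] = account

    key = env.get("AZURE_STORAGE_KEY")
    if key:
        kwargs["account_key"] = key
    elif credential_factory is not None:
        # For example azure.identity.aio.DefaultAzureCredential (managed identity)
        kwargs["credential"] = credential_factory()
    return kwargs
```

Usage, under the same assumptions: `storage = AzureBlobStorage(**storage_kwargs_from_env("documents", credential_factory=DefaultAzureCredential))`.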

Local Development

For local testing, run Azurite, the Azure Storage emulator, in Docker:

docker run -p 10000:10000 mcr.microsoft.com/azure-storage/azurite azurite-blob --blobHost 0.0.0.0

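Then point the plugin at the emulator with a connection string, replacing <key> with the well-known development account key published in the Azurite documentation. Azurite starts with no containers, so create the documents container first (for example with Azure Storage Explorer or the Azure CLI) if the plugin does not create it for you: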

storage = AzureBlobStorage(
    container_name="documents",
    connection_string="DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=<key>;BlobEndpoint=http://127.0.0.1:10000/devstoreaccount1/;",
)
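The emulator connection string follows a fixed pattern: Azurite serves the blob service on port 10000 and uses path-style URLs in which the account name is part of the path (`http://127.0.0.1:10000/devstoreaccount1/`) rather than the host name. A small hypothetical helper (not part of the plugin) that assembles the string used above, so the host or port can be changed when Azurite runs in another container; the key is passed in rather than hard-coded.

```python
def azurite_connection_string(
    account_key: str,
    account_name: str = "devstoreaccount1",
    host: str = "127.0.0.1",
    port: int = 10000,
) -> str:
    """Compose an Azurite blob connection string with a path-style endpoint."""
    blob_endpoint = f"http://{host}:{port}/{account_name}/"
    settings = {
        "DefaultEndpointsProtocol": "http",
        "AccountName": account_name,
        "AccountKey": account_key,
        "BlobEndpoint": blob_endpoint,
    }
    # Emit Key=Value pairs, each terminated by ";" as in the example above.
    return "".join(f"{key}={value};" for key, value in settings.items())
```

For instance, when the plugin runs in a Docker Compose service next to a container named `azurite` (a hypothetical name), `azurite_connection_string(key, host="azurite")` produces an endpoint of `http://azurite:10000/devstoreaccount1/`.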

License

This plugin is licensed under the Apache License 2.0.

Support

For Search Toolkit issues, refer to the Search Toolkit documentation.

For Azure Blob Storage itself, see the Azure Blob Storage documentation.

Download files

Source distribution: mistralai_search_toolkit_storage_azure-0.0.8.tar.gz (15.2 kB)

Built distribution: mistralai_search_toolkit_storage_azure-0.0.8-py3-none-any.whl

File details: mistralai_search_toolkit_storage_azure-0.0.8.tar.gz

Algorithm     Hash digest
SHA256        b0f06e2608c95befca4914903d1ce2645c1b7fcaa1aec4acb4f2e0e8a84c23f2
MD5           486435ad8aa054a07c0b210a80fbba38
BLAKE2b-256   bcfc2d5084d4eaaaf434f7b74c2120bc66552b875b82cc7a19aeab2c3ee99437

Provenance: the attestation bundle for this file was published by search-toolkit-plugins.yaml on mistralai/dashboard. Attestation values reflect the state when the release was signed and may no longer be current.

File details: mistralai_search_toolkit_storage_azure-0.0.8-py3-none-any.whl

Algorithm     Hash digest
SHA256        97f7bf7ecada7ff93c8fe4fec2895df0486fdcf4ebd1ba31c2d22307ee992093
MD5           cb602bfe823a394f53d47a8b4aa91865
BLAKE2b-256   e3655899ee1634e97edb312587886d2e6065e759428c8694de231fe0ac5fab2f

Provenance: the attestation bundle for this file was published by search-toolkit-plugins.yaml on mistralai/dashboard. Attestation values reflect the state when the release was signed and may no longer be current.
