Google Cloud Storage ObjectStorage plugin for mistralai-search-toolkit
Google Cloud Storage Plugin for Search Toolkit
Google Cloud Storage backend for mistralai-search-toolkit.
This plugin implements the Search Toolkit's ObjectStorage interface, enabling the ingestion pipeline to load files directly from Google Cloud Storage.
Installation
pip install mistralai-search-toolkit-storage-gcs
Or as an optional dependency of the core package:
pip install mistralai-search-toolkit[storage-gcs]
Quick Start: Load Files from GCS in Ingestion Pipeline
1. Upload a File to GCS
import asyncio

from mistralai.search.toolkit.plugins.storage.gcs import GCSBlobStorage


async def upload_file():
    storage = GCSBlobStorage(
        bucket_name="your-bucket",
        project_id="your-project",
    )
    # Upload a file
    with open("document.pdf", "rb") as f:
        data = f.read()
    await storage.put(key="documents/document.pdf", data=data)


asyncio.run(upload_file())
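To upload more than one file, the single `put` call above can be wrapped in a small helper. The following is a minimal sketch, not part of the plugin: `upload_directory` and `InMemoryStorage` are hypothetical names, and the only thing assumed about the storage object is the awaitable `put(key=..., data=...)` call shown above.

```python
import asyncio
from pathlib import Path


async def upload_directory(storage, local_dir: str, prefix: str = "documents") -> list[str]:
    """Upload every regular file under local_dir and return the object keys used.

    `storage` is any object with an awaitable put(key=..., data=...),
    which is the only method this sketch relies on.
    """
    root = Path(local_dir)
    files = sorted(p for p in root.rglob("*") if p.is_file())
    clean_prefix = prefix.strip("/")

    async def upload_one(path: Path) -> str:
        # Object keys use forward slashes regardless of the local OS.
        relative = path.relative_to(root).as_posix()
        key = f"{clean_prefix}/{relative}" if clean_prefix else relative
        # Read in a worker thread so a large file does not block the event loop.
        data = await asyncio.to_thread(path.read_bytes)
        await storage.put(key=key, data=data)
        return key

    # gather() returns results in the same order as the sorted file list.
    return list(await asyncio.gather(*(upload_one(p) for p in files)))


class InMemoryStorage:
    """Hypothetical stand-in for local testing; not part of the plugin."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    async def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data
```

Passing a `GCSBlobStorage` instance instead of `InMemoryStorage` should work as long as its `put` accepts the same keyword arguments. All files are read and uploaded concurrently, so for large directories it may be worth bounding concurrency with an `asyncio.Semaphore`.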
2. Load Files from GCS in Ingestion Pipeline
import asyncio
import os

from mistralai.search.toolkit.ingestion.loaders import FileLoader
from mistralai.search.toolkit.ingestion.pipelines import Pipeline
from mistralai.search.toolkit.ingestion.text_splitters import CharacterTextSplitter
from mistralai.search.toolkit.embedders import MistralEmbedder, MODEL_1024_EMBEDDING
from mistralai.client import Mistral
from mistralai.search.toolkit.plugins.storage.gcs import GCSBlobStorage
from mistralai.search.toolkit.plugins.vespa import VespaClientConfig
from vespa_app import app


async def ingest_from_gcs():
    # Create GCS storage factory
    def gcs_storage_factory():
        return GCSBlobStorage(
            bucket_name="your-bucket",
            project_id="your-project",
        )

    # Create FileLoader backed by GCS
    file_loader = FileLoader(storage_factory=gcs_storage_factory)

    # Create ingestion pipeline
    mistral_client = Mistral(api_key=os.environ.get("MISTRAL_API_KEY"))
    vespa_config = VespaClientConfig(
        endpoint=os.environ.get("VESPA_ENDPOINT", "http://localhost:8080"),
    )
    vector_store = app.get_search_index(vespa_config, collection_name="articles")
    pipeline = Pipeline(
        loader=file_loader,
        text_splitter=CharacterTextSplitter(chunk_size=512),
        embedder=MistralEmbedder(client=mistral_client, model_name=MODEL_1024_EMBEDDING),
        stores=vector_store,
    )

    # Ingest documents from GCS
    num_chunks = await pipeline.run(documents=[
        "documents/document1.pdf",
        "documents/document2.pdf",
    ])
    print(f"Indexed {num_chunks} chunks")


asyncio.run(ingest_from_gcs())
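To get a feel for what `chunk_size=512` means for the number of chunks reported by the pipeline, the sketch below splits text into fixed-size character windows. It is an illustration only: `split_by_characters` is a hypothetical function, and the toolkit's `CharacterTextSplitter` may well behave differently (for example by honoring separators or overlapping chunks).

```python
def split_by_characters(text: str, chunk_size: int = 512) -> list[str]:
    """Cut text into consecutive pieces of at most chunk_size characters.

    Simplified illustration; NOT the toolkit's CharacterTextSplitter.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Step through the text in strides of chunk_size; the last piece may be shorter.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


chunks = split_by_characters("x" * 1100, chunk_size=512)
print([len(c) for c in chunks])  # [512, 512, 76]
```

Under this simplified model a document of N characters yields ceil(N / 512) chunks, which gives a rough lower bound for sanity-checking the count printed by the ingestion example.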
Configuration
Basic Setup
storage = GCSBlobStorage(
    bucket_name="your-bucket",
    project_id="your-project",
)
Using Service Account
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file(
    "/path/to/service-account-key.json"
)
storage = GCSBlobStorage(
    bucket_name="your-bucket",
    project_id="your-project",
    credentials=credentials,
)
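Before handing a key file to `from_service_account_file`, it can help to confirm that the file looks like a service-account key at all. The following pre-flight check is a sketch using only the standard library; `check_service_account_key` and `REQUIRED_FIELDS` are hypothetical names, and the set of fields checked is an assumption about what a usable key contains, not something the plugin enforces.

```python
import json
from pathlib import Path

# Assumption: fields a usable service-account key is expected to carry.
REQUIRED_FIELDS = ("client_email", "private_key", "token_uri")


def check_service_account_key(path: str) -> list[str]:
    """Return a list of problems found in the key file (empty if none)."""
    key_path = Path(path)
    if not key_path.is_file():
        return [f"file not found: {path}"]
    try:
        info = json.loads(key_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(info, dict):
        return ["top-level JSON value is not an object"]

    problems = []
    # Files written by `gcloud auth application-default login` use a different type.
    if info.get("type") != "service_account":
        problems.append(f"unexpected type: {info.get('type')!r}")
    problems.extend(
        f"missing field: {name}" for name in REQUIRED_FIELDS if not info.get(name)
    )
    return problems
```

An empty list means the file passed these basic checks; it does not prove the key is still valid or has access to the bucket.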
Authentication
Environment Variables
Set credentials using environment variables:
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"
Or authenticate with gcloud CLI:
gcloud auth application-default login
The plugin will automatically use credentials from:
- The GOOGLE_APPLICATION_CREDENTIALS environment variable
- Application Default Credentials (if running in GCP)
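When it is unclear which of these sources is in effect on a development machine, a small diagnostic can help. The sketch below is not how the plugin or the Google auth library resolves credentials; `describe_credential_source` is a hypothetical helper that only inspects the environment variable and the default file location that `gcloud auth application-default login` writes to on Linux and macOS (the location differs on Windows and when the gcloud config directory is overridden).

```python
import os
from pathlib import Path


def describe_credential_source(environ=None, home=None) -> str:
    """Report which local credential source appears to be configured."""
    env = os.environ if environ is None else environ

    # 1. An explicit key file named by the environment variable.
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        if Path(explicit).is_file():
            return f"GOOGLE_APPLICATION_CREDENTIALS -> {explicit}"
        return f"GOOGLE_APPLICATION_CREDENTIALS is set but {explicit} does not exist"

    # 2. The file created by `gcloud auth application-default login`
    #    (default Linux/macOS location; an assumption of this sketch).
    home_dir = Path(home) if home is not None else Path.home()
    gcloud_file = home_dir / ".config" / "gcloud" / "application_default_credentials.json"
    if gcloud_file.is_file():
        return f"gcloud application-default login -> {gcloud_file}"

    # 3. Nothing local; only credentials attached to a GCP runtime could apply.
    return "no local credentials found; only an attached GCP service account would work"


print(describe_credential_source())
```

The `environ` and `home` parameters exist only to make the function easy to exercise in tests without touching the real environment.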
License
This plugin is licensed under the Apache License 2.0.
Support
For Search Toolkit issues, refer to the Search Toolkit documentation.
For Google Cloud Storage documentation, visit GCS Docs.