Vespa integration for mistralai-search-toolkit
Vespa Plugin for Search Toolkit
Vespa integration plugin for mistralai-search-toolkit.
This plugin provides a production-ready Vespa search backend for the Search Toolkit, with support for vector, keyword, and hybrid search.
Installation
pip install mistralai-search-toolkit-plugins-vespa
Or as an optional dependency of the core package:
pip install mistralai-search-toolkit[vespa]
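To confirm that the plugin landed in the active environment, a quick check with the standard library can help. This is a minimal sketch: the distribution name is taken from the pip command above, and the helper name `installed_version` is hypothetical, not part of the toolkit.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution name copied from the pip command above.
    print(installed_version("mistralai-search-toolkit-plugins-vespa"))
```

A result of `None` suggests the install went into a different environment than the one running the script.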
Quick Start
1. Bootstrap Your Application
Create the application structure with an initial migration:
uv run mistral-vespa generate-migration --app-dir ./vespa_app initial_schema
This creates the ./vespa_app/ directory and generates a migration file. Fill it with your schema definition:
from mistralai.search.toolkit.plugins.vespa.app.schemas.app import SearchMode
from mistralai.search.toolkit.plugins.vespa.migration import VespaMigration, create_default_schema, set_app_name
class InitialSchema(VespaMigration):
    def migrate(self) -> None:
        set_app_name("articles")
        create_default_schema(
            name="articles",
            mode=SearchMode.INDEX,
            embedding_dimensions=1024,  # Adjust based on your embedder
            schema_version=1,
        )
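Because `embedding_dimensions` is fixed in the schema, vectors of a different length would likely be rejected when they are fed to Vespa. A small guard before indexing can surface the mismatch earlier. This is a sketch only; `check_dimensions` and `EXPECTED_DIMENSIONS` are hypothetical names, not toolkit APIs.

```python
from typing import Sequence

# Assumption: must equal embedding_dimensions in the migration above.
EXPECTED_DIMENSIONS = 1024


def check_dimensions(
    vectors: Sequence[Sequence[float]],
    expected: int = EXPECTED_DIMENSIONS,
) -> None:
    """Raise ValueError if any vector's length differs from the schema dimension."""
    for index, vector in enumerate(vectors):
        if len(vector) != expected:
            raise ValueError(
                f"vector {index} has {len(vector)} dimensions, expected {expected}"
            )
```

Calling `check_dimensions(embeddings)` on a batch returned by your embedder either passes silently or names the first offending vector.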
2. Start a Local Vespa Instance
uv run mistral-vespa local up --query-port 18080 --config-port 19171 --name vespa-dev
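The container can take a while to become ready after the command returns. The sketch below polls Vespa's `/state/v1/health` endpoint using only the standard library. It assumes that `--config-port 19171` and `--query-port 18080` expose the config server and the query container on localhost; the helper names are hypothetical.

```python
import time
import urllib.request


def health_url(port: int, host: str = "localhost") -> str:
    """Build the URL of Vespa's state API health endpoint."""
    return f"http://{host}:{port}/state/v1/health"


def is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # connection refused, timeout, or HTTP error
        return False


def wait_until_up(url: str, attempts: int = 30, delay: float = 2.0) -> bool:
    """Poll the endpoint until it is healthy or the attempts run out."""
    for _ in range(attempts):
        if is_up(url):
            return True
        time.sleep(delay)
    return False


if __name__ == "__main__":
    # Ports taken from the `local up` command above.
    print("config server ready:", wait_until_up(health_url(19171)))
```

The query port (18080 here) typically starts answering only after an application has been deployed, so checking the config port first is the more useful gate before running the migration.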
3. Deploy Your Application
Deploy the migrations to generate the vespa_app module:
uv run mistral-vespa migrate \
    --app-dir ./vespa_app \
    --config-server http://localhost:19171 \
    --query-port 18080
This generates the vespa_app Python module that you can now import.
4. Index Documents
import os
from mistralai.search.toolkit.ingestion.pipelines import Pipeline
from mistralai.search.toolkit.ingestion.loaders import FilesystemFileLoader
from mistralai.search.toolkit.ingestion.text_splitters import CharacterTextSplitter
from mistralai.search.toolkit.embedders import MistralEmbedder, MODEL_1024_EMBEDDING
from mistralai.client import Mistral
from mistralai.search.toolkit.plugins.vespa import VespaClientConfig
from vespa_app import app
# Setup
mistral_client = Mistral(api_key=os.environ.get("MISTRAL_API_KEY"))
# For the local instance started in step 2, set VESPA_ENDPOINT=http://localhost:18080
vespa_config = VespaClientConfig(
    endpoint=os.environ.get("VESPA_ENDPOINT", "http://localhost:8080"),
)
collection_name = "articles"
# Connect to Vespa
vector_store = app.get_search_index(vespa_config, collection_name=collection_name)
# Index documents
pipeline = Pipeline(
    loader=FilesystemFileLoader(),
    text_splitter=CharacterTextSplitter(chunk_size=512),
    embedder=MistralEmbedder(client=mistral_client, model_name=MODEL_1024_EMBEDDING),
    stores=vector_store,
)
# Run inside an async function or a notebook; a plain script cannot use top-level await
num_chunks = await pipeline.run(documents=["doc1.pdf", "doc2.pdf"])
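To give a feel for what `chunk_size=512` means, the sketch below splits text into fixed-size character chunks. It is an illustration only and not the toolkit's `CharacterTextSplitter`, which may also honor separators or overlap; the function name `split_by_characters` is hypothetical.

```python
from typing import List


def split_by_characters(text: str, chunk_size: int = 512) -> List[str]:
    """Split text into consecutive chunks of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]


chunks = split_by_characters("a" * 1200, chunk_size=512)
print([len(chunk) for chunk in chunks])  # [512, 512, 176]
```

Each chunk is embedded and stored separately, so a smaller chunk size produces more, shorter entries in the index.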
5. Search
from mistralai.search.toolkit.embedders import MistralEmbedder, MODEL_1024_EMBEDDING
from mistralai.search.toolkit.retrieval import QueryEngine
from mistralai.search.toolkit.retrieval.retrievers import VectorRetriever
# Setup search
embedder = MistralEmbedder(client=mistral_client, model_name=MODEL_1024_EMBEDDING)
query_engine = QueryEngine(
    retriever=[VectorRetriever(client=vector_store, embedder=embedder)],
)
# Search documents (run inside an async function or a notebook)
results = await query_engine.search(query="What is machine learning?", top_k=10)
# Display results
for result in results.results:
    print(f"Score: {result.score}")
    print(f"Content: {result.content}\n")
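Conceptually, vector retrieval ranks stored embeddings by their similarity to the query embedding and keeps the best `top_k`. The brute-force sketch below shows that idea with cosine similarity in plain Python. It is not how the plugin or Vespa computes scores (Vespa runs nearest-neighbor search inside the engine, and the distance metric depends on the schema); all names here are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    documents: Dict[str, List[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Score every document against the query and return the k best matches."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in documents.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print([doc_id for doc_id, _ in top_k([1.0, 0.1], docs, k=2)])  # ['a', 'c']
```

The `score` printed for each result above plays the same role as the similarity value here: higher means a closer match to the query.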
Configuration
Quick Setup
Use app.get_search_index() for the common case where a single endpoint serves both query and feed APIs:
import os
from mistralai.search.toolkit.plugins.vespa import VespaClientConfig
from vespa_app import app
vespa_config = VespaClientConfig(
    endpoint=os.environ.get("VESPA_ENDPOINT", "http://localhost:8080"),
)
vector_store = app.get_search_index(vespa_config, collection_name="articles")
Advanced Setup
Use separate query and feed endpoints for production deployments:
import os
from mistralai.search.toolkit.plugins.vespa import VespaClientConfig
from vespa_app import app
client_config = VespaClientConfig(
    query_endpoint=os.environ.get("VESPA_QUERY_ENDPOINT", "https://query.vespa.example.com"),
    feed_endpoint=os.environ.get("VESPA_FEED_ENDPOINT", "https://feed.vespa.example.com"),
)
vector_store = app.get_search_index(
    client_config=client_config,
    collection_name="articles",
)
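If the same code has to run against both a single-endpoint development instance and a split production deployment, the endpoint selection can be factored out. The sketch below is one possible approach, assuming the three environment variable names used in the examples above; `Endpoints` and `resolve_endpoints` are hypothetical helpers, not part of the plugin.

```python
from typing import Mapping, NamedTuple


class Endpoints(NamedTuple):
    query: str
    feed: str


def resolve_endpoints(
    env: Mapping[str, str],
    default: str = "http://localhost:8080",
) -> Endpoints:
    """Prefer the dedicated query/feed variables, else fall back to one shared endpoint."""
    shared = env.get("VESPA_ENDPOINT", default)
    return Endpoints(
        query=env.get("VESPA_QUERY_ENDPOINT", shared),
        feed=env.get("VESPA_FEED_ENDPOINT", shared),
    )


# Example with an explicit mapping; pass os.environ in real use.
print(resolve_endpoints({"VESPA_QUERY_ENDPOINT": "https://query.vespa.example.com"}))
```

The resulting `query` and `feed` values could then be passed as `query_endpoint` and `feed_endpoint` to `VespaClientConfig`, as in the Advanced Setup example.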
License
This plugin is licensed under the Apache License 2.0.
Support
For issues related to the Search Toolkit, refer to the Search Toolkit documentation.
For Vespa-specific questions, see the Vespa documentation.