Modular framework for building IR systems

Search Toolkit

Modular, backend-agnostic framework for building and evaluating Information Retrieval systems.

Overview

Search Toolkit provides plug-and-play, extensible components for building production-ready IR pipelines. Every component is swappable and customizable, so you can build exactly what your use case needs.

What's Included

Core Components

Ingestion: Document loaders, extractors, text splitters, enrichment, and indexing pipelines
Retrieval: Vector (semantic), keyword (BM25), and hybrid search using Reciprocal Rank Fusion (RRF)
Query Processing: LLM reformulation and custom preprocessing
Reranking: Rerank results to surface the most relevant information
Embedders: Generate embeddings for documents and queries using Mistral's embedding models
Storage: Abstract object storage interface for document persistence
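
The hybrid retrieval listed above merges a vector ranking and a keyword ranking with Reciprocal Rank Fusion. As a rough, library-independent sketch (the function name `rrf_fuse` and the constant `k=60` are illustrative assumptions, not the toolkit's API), each document receives the sum of `1 / (k + rank)` over the rankings it appears in:

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


vector_hits = ["doc_a", "doc_b", "doc_c"]   # hypothetical semantic ranking
keyword_hits = ["doc_b", "doc_d", "doc_a"]  # hypothetical BM25 ranking
fused = rrf_fuse([vector_hits, keyword_hits])
print([doc_id for doc_id, _ in fused])
```

Documents that rank well in both lists rise to the top, and no score normalization between the two retrievers is needed because only ranks are used.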

Backend Agnostic

The toolkit is designed to work with different search backends through plugins. You can use it with any vector database or search engine for which a plugin is available by installing that plugin.

Installation

Install the base package:

pip install mistralai-search-toolkit

Install optional components:

# Quote the extras so shells such as zsh do not treat the brackets as glob patterns

# Text extraction from PDFs (requires pymupdf-pro)
pip install "mistralai-search-toolkit[extractor-pymupdf]"

# HTML to markdown conversion
pip install "mistralai-search-toolkit[html-converter-markdownify]"

# Email extraction
pip install "mistralai-search-toolkit[extractor-email]"

# Spreadsheet parsing
pip install "mistralai-search-toolkit[extractor-spreadsheet]"

# LangChain text splitting
pip install "mistralai-search-toolkit[text-splitter-langchain]"

Quick Start

1. Load and Process Documents

import os
from mistralai.search.toolkit.ingestion.loaders import FilesystemFileLoader
from mistralai.search.toolkit.ingestion.text_splitters import CharacterTextSplitter
from mistralai.client import Mistral

# Load documents from a directory
loader = FilesystemFileLoader()
documents = loader.load(path="/path/to/documents")

# Split into chunks
splitter = CharacterTextSplitter(chunk_size=512)
chunks = splitter.split(documents)
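
To make `chunk_size` concrete, here is a simplified stand-in for fixed-size character splitting. It is not the toolkit's `CharacterTextSplitter` implementation, which may also handle separators or overlap; `split_by_characters` is a hypothetical helper used only for illustration.

```python
def split_by_characters(text: str, chunk_size: int = 512) -> list[str]:
    """Cut text into consecutive pieces of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Step through the text in strides of chunk_size; the last piece may be shorter.
    return [text[start:start + chunk_size] for start in range(0, len(text), chunk_size)]


pieces = split_by_characters("a" * 1200, chunk_size=512)
print([len(piece) for piece in pieces])  # [512, 512, 176]
```

Smaller chunks give more precise matches at the cost of more embeddings to compute and store.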

2. Generate Embeddings

from mistralai.search.toolkit.embedders import MistralEmbedder, MODEL_1024_EMBEDDING

# Create embedder (uses Mistral's API)
mistral_client = Mistral(api_key=os.environ.get("MISTRAL_API_KEY", "your-api-key"))
embedder = MistralEmbedder(client=mistral_client, model_name=MODEL_1024_EMBEDDING)

# Embed your chunks
embedded_chunks = embedder.embed(chunks)
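
Vector retrieval later compares a query embedding against these chunk embeddings. The sketch below uses tiny hand-made vectors instead of real 1024-dimensional embeddings, and it assumes cosine similarity as the measure, which your backend may or may not use. `cosine_similarity` and `rank_by_similarity` are illustrative names, not toolkit functions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # A zero vector has no direction; treat it as unrelated.
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: list[float], candidates: dict[str, list[float]], top_k: int = 5
) -> list[tuple[str, float]]:
    """Return the top_k candidate names with the highest similarity to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


query_vec = [1.0, 0.0]  # hypothetical query embedding
chunk_vecs = {          # hypothetical chunk embeddings
    "chunk_1": [0.9, 0.1],
    "chunk_2": [0.0, 1.0],
    "chunk_3": [0.5, 0.5],
}
ranked = rank_by_similarity(query_vec, chunk_vecs, top_k=2)
print([name for name, _ in ranked])
```

The dimension check mirrors a practical constraint: query and document vectors must come from the same embedding model to be comparable.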

3. Create an Index and Search

The toolkit supports multiple search backends through plugins. See the Vespa Plugin section below for a complete example.

Vespa Plugin: Creating a Search Index

Vespa is an open-source search engine. The Vespa plugin connects it to the toolkit as a search backend.

Prerequisites

The Vespa plugin: pip install mistralai-search-toolkit-plugins-vespa
Docker for local development
uv, if you run the mistral-vespa CLI through uv run as in the commands below

Getting Started with Vespa

Step 1: Bootstrap Your Vespa Application

First, create the application structure with an initial migration:

uv run mistral-vespa generate-migration --app-dir ./vespa_app initial_schema

This creates ./vespa_app/ and generates a migration file. Fill it with your schema definition:

from mistralai.search.toolkit.plugins.vespa.app.schemas.app import SearchMode
from mistralai.search.toolkit.plugins.vespa.migration import VespaMigration, create_default_schema, set_app_name

class InitialSchema(VespaMigration):
    def migrate(self) -> None:
        set_app_name("articles")
        create_default_schema(
            name="articles",
            mode=SearchMode.INDEX,
            embedding_dimensions=1024,  # Match your embedder's dimensions
            schema_version=1,
        )

Step 2: Start a Local Vespa Instance

uv run mistral-vespa local up --query-port 18080 --config-port 19171 --name vespa-dev
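
Before deploying, it can help to wait until the instance reports healthy. The sketch below assumes that Vespa's standard `/state/v1/health` endpoint is reachable on the config port chosen above and that it returns JSON with a `status.code` field. `is_up` and `wait_until_up` are hypothetical helpers, not part of the `mistral-vespa` CLI.

```python
import json
import time
import urllib.error
import urllib.request


def is_up(payload: dict) -> bool:
    """Interpret a health payload of the assumed form {"status": {"code": "up"}}."""
    return payload.get("status", {}).get("code") == "up"


def wait_until_up(base_url: str, timeout_s: float = 120.0, interval_s: float = 2.0) -> bool:
    """Poll the health endpoint until it reports "up" or the timeout expires."""
    url = base_url.rstrip("/") + "/state/v1/health"
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if is_up(json.load(response)):
                    return True
        except (urllib.error.URLError, OSError, ValueError):
            pass  # Not reachable or not valid JSON yet; retry after a pause.
        time.sleep(interval_s)
    return False


# Example call (assumes the config port from the command above):
# wait_until_up("http://localhost:19171")
```

If the call returns False, check the container logs before running the migration step.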

Step 3: Deploy Your Application

Deploy the migrations to the running Vespa instance:

uv run mistral-vespa migrate \
  --app-dir ./vespa_app \
  --config-server http://localhost:19171 \
  --query-port 18080

This generates the vespa_app module that you can import.

Step 4: Ingest and Search Documents

After deployment, use the generated vespa_app to index and search:

import asyncio
import os

from mistralai.client import Mistral
from mistralai.search.toolkit.embedders import MistralEmbedder, MODEL_1024_EMBEDDING
from mistralai.search.toolkit.ingestion.loaders import FilesystemFileLoader
from mistralai.search.toolkit.ingestion.pipelines import Pipeline
from mistralai.search.toolkit.ingestion.text_splitters import CharacterTextSplitter
from mistralai.search.toolkit.plugins.vespa import VespaClientConfig
from mistralai.search.toolkit.retrieval import QueryEngine
from mistralai.search.toolkit.retrieval.retrievers import VectorRetriever
from vespa_app import app  # Generated by migration deployment

# Configuration
mistral_client = Mistral(api_key=os.environ.get("MISTRAL_API_KEY", "your-api-key"))
vespa_config = VespaClientConfig(
    # The default matches the --query-port used in Step 2
    endpoint=os.environ.get("VESPA_ENDPOINT", "http://localhost:18080"),
)
collection_name = "articles"

# Connect to Vespa
vector_store = app.get_search_index(vespa_config, collection_name=collection_name)

# One embedder is shared so ingestion and retrieval use the same model
embedder = MistralEmbedder(client=mistral_client, model_name=MODEL_1024_EMBEDDING)


async def main() -> None:
    # INGESTION: Index your documents
    pipeline = Pipeline(
        loader=FilesystemFileLoader(),
        text_splitter=CharacterTextSplitter(chunk_size=512),
        embedder=embedder,
        stores=vector_store,
    )

    num_chunks = await pipeline.run(documents=["doc1.pdf", "doc2.pdf"])

    # RETRIEVAL: Search your documents
    query_engine = QueryEngine(
        retriever=[VectorRetriever(client=vector_store, embedder=embedder)],
    )

    results = await query_engine.search(query="What is RAG?", top_k=5)

    # Print results
    for result in results.results:
        print(f"Score: {result.score}")
        print(f"Content: {result.content}\n")


# pipeline.run and query_engine.search are awaited, so they need a running event loop
asyncio.run(main())

Plugins

Extend the toolkit with specialized backends:

| Plugin | Package | Description |
| --- | --- | --- |
| Vespa Plugin | `mistralai-search-toolkit-plugins-vespa` | Vespa search backend |
| AWS S3 Storage | `mistralai-search-toolkit-storage-s3` | AWS S3 storage backend |
| Azure Blob Storage | `mistralai-search-toolkit-storage-azure` | Azure Blob Storage backend |
| Google Cloud Storage | `mistralai-search-toolkit-storage-gcs` | Google Cloud Storage backend |
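
The storage plugins above sit behind the same abstract object storage interface. The toolkit's actual class and method names are not shown on this page, so the following is only a hypothetical illustration of the pattern: `ObjectStorage`, `put`, `get`, `InMemoryStorage`, and `archive_document` are made-up names.

```python
from abc import ABC, abstractmethod


class ObjectStorage(ABC):
    """Hypothetical minimal contract that every storage backend would satisfy."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None:
        """Store data under key, replacing any existing object."""

    @abstractmethod
    def get(self, key: str) -> bytes:
        """Return the object stored under key, or raise KeyError."""


class InMemoryStorage(ObjectStorage):
    """Dictionary-backed stand-in, useful for tests and local experiments."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


def archive_document(storage: ObjectStorage, name: str, content: bytes) -> str:
    """Caller code depends only on the interface, not on a concrete backend."""
    key = f"documents/{name}"
    storage.put(key, content)
    return key


storage = InMemoryStorage()
key = archive_document(storage, "doc1.pdf", b"%PDF-1.7 ...")
print(key, len(storage.get(key)))
```

Because `archive_document` only sees the abstract type, swapping the in-memory class for a cloud-backed one would not change the calling code.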

License

This package is licensed under the Apache License 2.0.

Support

For more information and examples, see the Vespa documentation.

Release history

0.0.8: May 22, 2026
0.0.6: May 21, 2026 (this version)
