High-performance vector database with Mistral AI embeddings support

Project description

Mistral VectorDB

A high-performance vector database optimized for Mistral AI embeddings, featuring efficient similarity search, storage, and retrieval capabilities.

Features

  • Optimized for Mistral AI: Built specifically for Mistral's embedding model
  • Efficient Vector Search: Uses FAISS HNSW and IVF indexes for fast approximate nearest-neighbor search
  • Advanced Storage: Compressed storage with LSM-tree inspired design
  • Rich Querying: Metadata filtering and customizable search parameters
  • Batch Processing: Efficient handling of bulk operations
  • Caching System: Smart caching for frequently accessed embeddings
  • Easy Integration: Simple API for seamless integration
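At its core, similarity search compares the query embedding against stored embeddings, typically by cosine similarity. As a pure-Python illustration of that idea (the library itself delegates this to FAISS, so this is not its actual implementation):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Rank two candidate vectors against a query: the closer the direction,
# the higher the score (1.0 = identical direction, 0.0 = orthogonal).
query = [1.0, 0.0]
candidates = {"doc1": [0.9, 0.1], "doc2": [0.0, 1.0]}
scores = {name: cosine_similarity(query, vec) for name, vec in candidates.items()}
```

HNSW and IVF indexes avoid computing this score against every stored vector, which is what makes the search fast at scale.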

Installation

pip install mistral-vectordb

Quick Start

from mistral_vectordb import VectorDatabase, MistralEmbeddings

# Initialize with your Mistral API key
embeddings = MistralEmbeddings(api_key="your-api-key")
db = VectorDatabase("db_path", dimension=embeddings.dimension)

# Add documents
text = "Sample document"
embedding = embeddings.embed(text)
doc_id = db.add_document(
    content=text,
    embedding=embedding[0],
    metadata={"category": "tech"}
)

# Search
query = "similar document"
query_embedding = embeddings.embed(query)
results = db.search(
    query_embedding=query_embedding[0],
    k=10,
    threshold=0.7,
    metadata_filters={"category": "tech"}
)
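The shape of each search result is not shown above. Assuming each hit carries the document content, a similarity score, and its metadata (the field names below are assumptions, not confirmed by the library's docs), consuming the results might look like:

```python
# Hypothetical result shape: "id", "content", "score", and "metadata"
# are assumed field names for illustration only.
results = [
    {"id": "a1", "content": "Sample document", "score": 0.91, "metadata": {"category": "tech"}},
    {"id": "b2", "content": "Another document", "score": 0.74, "metadata": {"category": "tech"}},
]

# Format each hit as "rank. [score] content" for display.
lines = []
for rank, hit in enumerate(results, start=1):
    lines.append(f"{rank}. [{hit['score']:.2f}] {hit['content']}")
print("\n".join(lines))
```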

Advanced Usage

Batch Processing

# Embed multiple documents
texts = ["Document 1", "Document 2", "Document 3"]
embeddings_array = embeddings.bulk_embed(
    texts,
    batch_size=32,
    show_progress=True
)

# Add to database
for text, embedding in zip(texts, embeddings_array):
    db.add_document(
        content=text,
        embedding=embedding,
        metadata={"batch": "example"}
    )
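Under the hood, bulk_embed's batch_size presumably splits the input into fixed-size chunks before calling the API. A generic chunking helper (an illustration, not the library's code) looks like:

```python
def chunked(items, batch_size):
    """Yield successive batches of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# 100 texts with batch_size=32 would be sent as batches of 32, 32, 32, and 4.
texts = [f"Document {i}" for i in range(1, 101)]
batches = list(chunked(texts, 32))
```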

Custom Search Parameters

results = db.search(
    query_embedding=query_embedding[0],
    k=5,                    # Number of results
    threshold=0.8,          # Minimum similarity score
    metadata_filters={      # Filter by metadata
        "category": "tech",
        "language": "en"
    }
)
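Conceptually, exact-match metadata filters keep only documents whose metadata contains every requested key/value pair. A pure-Python sketch of that matching rule (illustration only, not the library's implementation):

```python
def matches(metadata, filters):
    """True when every filter key/value pair appears in the metadata."""
    return all(metadata.get(key) == value for key, value in filters.items())

docs = [
    {"content": "GPU inference", "metadata": {"category": "tech", "language": "en"}},
    {"content": "Recette de tarte", "metadata": {"category": "food", "language": "fr"}},
]
filters = {"category": "tech", "language": "en"}

# Only documents passing every filter remain candidates for ranking.
kept = [d for d in docs if matches(d["metadata"], filters)]
```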

API Reference

MistralEmbeddings

embeddings = MistralEmbeddings(
    api_key="your-api-key",
    model="mistral-embed",    # Embedding model to use
    cache_dir="path/to/cache", # Optional cache directory
    cache_duration=24         # Cache duration in hours
)

# Generate embeddings
embedding = embeddings.embed("text")
embeddings_array = embeddings.bulk_embed(["text1", "text2"])
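The cache_dir and cache_duration options suggest a time-bounded cache of embeddings. As a rough, hypothetical sketch of that idea (not the library's actual implementation, which presumably persists entries to disk), a time-to-live cache in pure Python could look like:

```python
import time

class TTLCache:
    """Minimal time-based cache illustrating the cache_duration idea."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}

    def put(self, key, value):
        # Remember when the value was stored, using a monotonic clock.
        self._store[key] = (value, time.monotonic())

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._store[key]  # expired: drop and report a miss
            return None
        return value

# cache_duration=24 hours would translate to a TTL of 24 * 3600 seconds.
cache = TTLCache(ttl_seconds=24 * 3600)
cache.put("some text", [0.1, 0.2, 0.3])
```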

VectorDatabase

db = VectorDatabase(
    path="db_path",           # Database storage path
    dimension=1024            # Embedding dimension
)

# Add document
doc_id = db.add_document(
    content="text",           # Original text
    embedding=vector,         # NumPy array
    metadata={"key": "value"} # Optional metadata
)

# Search
results = db.search(
    query_embedding=vector,   # Query vector
    k=10,                    # Number of results
    threshold=0.7,           # Similarity threshold
    metadata_filters={}      # Optional filters
)
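To make the add_document/search contract above concrete, here is a toy in-memory stand-in using brute-force cosine similarity. It mirrors the parameter names shown above but is an illustration only; the real library uses FAISS indexes and persistent storage:

```python
import math
import uuid

class InMemoryVectorStore:
    """Toy stand-in for VectorDatabase: brute-force cosine search."""

    def __init__(self, dimension):
        self.dimension = dimension
        self.docs = {}

    def add_document(self, content, embedding, metadata=None):
        assert len(embedding) == self.dimension
        doc_id = str(uuid.uuid4())
        self.docs[doc_id] = (content, embedding, metadata or {})
        return doc_id

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(x * x for x in b))
        return dot / (norm_a * norm_b)

    def search(self, query_embedding, k=10, threshold=0.0, metadata_filters=None):
        filters = metadata_filters or {}
        hits = []
        for doc_id, (content, emb, meta) in self.docs.items():
            # Skip documents failing any exact-match metadata filter.
            if any(meta.get(key) != value for key, value in filters.items()):
                continue
            score = self._cosine(query_embedding, emb)
            if score >= threshold:
                hits.append({"id": doc_id, "content": content,
                             "score": score, "metadata": meta})
        # Highest similarity first, truncated to the top k.
        hits.sort(key=lambda h: h["score"], reverse=True)
        return hits[:k]
```

A brute-force scan like this is O(n) per query; FAISS's HNSW and IVF indexes exist precisely to avoid that cost on large collections.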

Contributing

Contributions are welcome! Please read our Contributing Guidelines for details on how to submit pull requests, report issues, and contribute to the project.

Project details


Download files

Download the file for your platform.

Source Distribution

mistral_vectordb-0.1.0.tar.gz (9.6 kB)

Uploaded Source

Built Distribution


mistral_vectordb-0.1.0-py3-none-any.whl (9.3 kB)

Uploaded Python 3

File details

Details for the file mistral_vectordb-0.1.0.tar.gz.

File metadata

  • Download URL: mistral_vectordb-0.1.0.tar.gz
  • Upload date:
  • Size: 9.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.4

File hashes

Hashes for mistral_vectordb-0.1.0.tar.gz

  • SHA256: e7e4fa02c95cfc0ca5798abfc90394d60a45c3b3e17bad97107fe31daa7fa642
  • MD5: 551a403eae27df6aca00516753c0d2a7
  • BLAKE2b-256: 356fb92424efb610ed14802311cbb9e4143e5dd6f989bcebb73e816c4293de76


File details

Details for the file mistral_vectordb-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for mistral_vectordb-0.1.0-py3-none-any.whl

  • SHA256: 5234baca4f32919816206693d225ae1ae9f7f98781476cc9709100f53973a816
  • MD5: d0b5e2e8d6243501f6de06aa627765ae
  • BLAKE2b-256: da6df782bd7716c42022dd30992b2ade0d2a4cd54095e57ecfc570adf14dbfb8

