# Mistral VectorDB
A high-performance vector database optimized for Mistral AI embeddings, featuring efficient similarity search, storage, and retrieval capabilities.
## Features

- **Optimized for Mistral AI**: Built specifically for Mistral's `mistral-embed` embedding model
- **Efficient Vector Search**: Uses FAISS with HNSW and IVF indexes for fast similarity search
- **Advanced Storage**: Compressed storage with an LSM-tree-inspired design
- **Rich Querying**: Metadata filtering and customizable search parameters
- **Batch Processing**: Efficient handling of bulk operations
- **Caching System**: Smart caching for frequently accessed embeddings
- **Easy Integration**: Simple API for seamless integration
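The core operation behind all of these features is nearest-neighbor search over embedding vectors. As an illustration of the concept (a minimal NumPy sketch of cosine-similarity search, not the library's actual FAISS-backed implementation — `cosine_search` is a hypothetical helper):

```python
import numpy as np

def cosine_search(query: np.ndarray, vectors: np.ndarray, k: int = 3):
    """Return indices and scores of the k vectors most similar to query."""
    # Normalize so that a dot product equals cosine similarity
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = v @ q
    top = np.argsort(-scores)[:k]  # indices of the k highest scores
    return top, scores[top]

# Toy corpus of three 4-dimensional "embeddings"
corpus = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
])
idx, scores = cosine_search(np.array([1.0, 0.05, 0.0, 0.0]), corpus, k=2)
print(idx, scores)
```

FAISS's HNSW and IVF indexes approximate this exhaustive scan so search stays fast as the corpus grows to millions of vectors.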
## Installation

```bash
pip install mistral-vectordb
```
## Quick Start

```python
from mistral_vectordb import VectorDatabase, MistralEmbeddings

# Initialize with your Mistral API key
embeddings = MistralEmbeddings(api_key="your-api-key")
db = VectorDatabase("db_path", dimension=embeddings.dimension)

# Add a document
text = "Sample document"
embedding = embeddings.embed(text)
doc_id = db.add_document(
    content=text,
    embedding=embedding[0],
    metadata={"category": "tech"}
)

# Search
query = "similar document"
query_embedding = embeddings.embed(query)
results = db.search(
    query_embedding=query_embedding[0],
    k=10,
    threshold=0.7,
    metadata_filters={"category": "tech"}
)
```
## Advanced Usage

### Batch Processing

```python
# Embed multiple documents in batches
texts = ["Document 1", "Document 2", "Document 3"]
embeddings_array = embeddings.bulk_embed(
    texts,
    batch_size=32,
    show_progress=True
)

# Add the embedded documents to the database
for text, embedding in zip(texts, embeddings_array):
    db.add_document(
        content=text,
        embedding=embedding,
        metadata={"batch": "example"}
    )
```
### Custom Search Parameters

```python
results = db.search(
    query_embedding=query_embedding[0],
    k=5,                      # Number of results
    threshold=0.8,            # Minimum similarity score
    metadata_filters={        # Filter by metadata
        "category": "tech",
        "language": "en"
    }
)
```
## API Reference

### MistralEmbeddings

```python
embeddings = MistralEmbeddings(
    api_key="your-api-key",
    model="mistral-embed",       # Embedding model to use
    cache_dir="path/to/cache",   # Optional cache directory
    cache_duration=24            # Cache duration in hours
)

# Generate embeddings
embedding = embeddings.embed("text")
embeddings_array = embeddings.bulk_embed(["text1", "text2"])
```
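The `cache_dir`/`cache_duration` parameters suggest a time-bounded cache that avoids re-billing the API for texts already embedded. A minimal sketch of how such an expiring cache could work (`ExpiringCache` is a hypothetical helper, not the package's actual implementation):

```python
import time

class ExpiringCache:
    """In-memory cache that discards entries older than `duration` seconds."""

    def __init__(self, duration: float):
        self.duration = duration
        self._store = {}  # key -> (timestamp, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        ts, value = entry
        if time.time() - ts > self.duration:
            del self._store[key]  # expired: evict and report a miss
            return None
        return value

    def put(self, key, value):
        self._store[key] = (time.time(), value)

# Mirror cache_duration=24 (hours), converted to seconds
cache = ExpiringCache(duration=24 * 3600)
cache.put("hello", [0.1, 0.2])
print(cache.get("hello"))
```

The library persists its cache to `cache_dir` on disk; the idea is the same, with timestamps checked on every lookup.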
### VectorDatabase

```python
db = VectorDatabase(
    path="db_path",    # Database storage path
    dimension=1024     # Embedding dimension
)

# Add a document
doc_id = db.add_document(
    content="text",              # Original text
    embedding=vector,            # NumPy array
    metadata={"key": "value"}    # Optional metadata
)

# Search
results = db.search(
    query_embedding=vector,      # Query vector
    k=10,                        # Number of results
    threshold=0.7,               # Similarity threshold
    metadata_filters={}          # Optional filters
)
```
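Conceptually, `threshold` and `metadata_filters` prune the candidate set after scoring: a result must meet the minimum similarity score and match every filter key. An illustrative pure-Python sketch (`filter_results` and the result-dict shape are assumptions for this example, not the library's internals):

```python
def filter_results(results, threshold, metadata_filters):
    """Keep results meeting the score threshold and matching all metadata filters."""
    kept = []
    for r in results:
        if r["score"] < threshold:
            continue  # below the minimum similarity score
        if any(r["metadata"].get(k) != v for k, v in metadata_filters.items()):
            continue  # fails at least one metadata filter
        kept.append(r)
    return kept

candidates = [
    {"id": 1, "score": 0.91, "metadata": {"category": "tech", "language": "en"}},
    {"id": 2, "score": 0.65, "metadata": {"category": "tech", "language": "en"}},
    {"id": 3, "score": 0.88, "metadata": {"category": "news", "language": "en"}},
]
hits = filter_results(candidates, threshold=0.7, metadata_filters={"category": "tech"})
print([r["id"] for r in hits])  # → [1]
```

Raising `threshold` trades recall for precision; an empty `metadata_filters` dict matches everything.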
## Contributing
Contributions are welcome! Please read our Contributing Guidelines for details on how to submit pull requests, report issues, and contribute to the project.
## File details

### `mistral_vectordb-0.1.0.tar.gz` (source distribution)

- Size: 9.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.4

| Algorithm | Hash digest |
|---|---|
| SHA256 | `e7e4fa02c95cfc0ca5798abfc90394d60a45c3b3e17bad97107fe31daa7fa642` |
| MD5 | `551a403eae27df6aca00516753c0d2a7` |
| BLAKE2b-256 | `356fb92424efb610ed14802311cbb9e4143e5dd6f989bcebb73e816c4293de76` |
### `mistral_vectordb-0.1.0-py3-none-any.whl` (built distribution)

- Size: 9.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.4

| Algorithm | Hash digest |
|---|---|
| SHA256 | `5234baca4f32919816206693d225ae1ae9f7f98781476cc9709100f53973a816` |
| MD5 | `d0b5e2e8d6243501f6de06aa627765ae` |
| BLAKE2b-256 | `da6df782bd7716c42022dd30992b2ade0d2a4cd54095e57ecfc570adf14dbfb8` |