LangChain vector store implementation for Azure Data Explorer (Kusto) - PRE-ALPHA VERSION
LangChain Kusto Vector Store
⚠️ PRE-ALPHA VERSION - This package is in very early development and not recommended for production use.
A LangChain vector store implementation for Azure Data Explorer (Kusto), Microsoft Fabric Eventhouse, and other Kusto-compatible databases.
Current Status
This initial version supports retrieval only. Document storage is not yet implemented.
Features
- ✅ Retrieve vector embeddings from Azure Data Explorer (Kusto) or Microsoft Fabric Eventhouse
- ✅ Compatible with LangChain's vector store interface
- ✅ Similarity search with cosine similarity metric
- ❌ Document storage (not yet implemented)
- ❌ Batch operations (not yet implemented)
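To make the cosine similarity metric concrete, here is a minimal pure-Python sketch of what a top-k similarity search computes. It is an illustration only, not the package's implementation (in practice the ranking presumably happens inside Kusto); the names `cosine_similarity` and `top_k` and the sample rows are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero vector matches nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    rows: Sequence[Tuple[str, Sequence[float]]],
    k: int,
) -> List[Tuple[str, float]]:
    """Rank (id, embedding) rows by similarity to the query, best first."""
    scored = [(row_id, cosine_similarity(query, emb)) for row_id, emb in rows]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical two-dimensional embeddings keyed by row id.
rows = [
    ("a", [1.0, 0.0]),
    ("b", [0.0, 1.0]),
    ("c", [1.0, 1.0]),
]
ranked = top_k([1.0, 0.0], rows, k=2)
print(ranked)
```

Row "a" points in the same direction as the query and ranks first, followed by "c"; the orthogonal row "b" is cut off by `k=2`.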
Installation
pip install langchain-kusto
Quick Start
from langchain_kusto import KustoVectorStore
from langchain_openai import AzureOpenAIEmbeddings
from azure.identity import DefaultAzureCredential

# Initialize embeddings
embeddings = AzureOpenAIEmbeddings(
    azure_endpoint="your-openai-endpoint",
    azure_deployment="your-embedding-deployment",
    openai_api_version="2023-05-15",
)

# Initialize the vector store (retrieval only)
vector_store = KustoVectorStore(
    connection="https://your-cluster.kusto.windows.net",  # or KustoConnectionStringBuilder
    database="your_database",
    collection_name="your_table",
    embedding=embeddings,
    embedding_column="embedding_text",  # optional, defaults to "embedding"
    id_column="vector_id",  # optional, defaults to "id"
    content_column="doc_text",  # optional, defaults to "text"
)

# Search for similar documents (this requires pre-existing data in Kusto)
results = vector_store.similarity_search("your query text", k=5)
Complete Example
See demo.py for a complete working example using Azure OpenAI embeddings and a RAG (Retrieval-Augmented Generation) pipeline.
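The contents of demo.py are not reproduced here. As a rough sketch of the retrieval-to-prompt step in a RAG pipeline, the snippet below shows one way to turn search results into a prompt. `RetrievedDoc` is a hypothetical stand-in that mirrors the `page_content` and `metadata` fields of a LangChain `Document`, and `build_prompt` is an illustrative helper, not part of this package.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RetrievedDoc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def build_prompt(question: str, docs: List[RetrievedDoc]) -> str:
    """Number each retrieved passage and place it ahead of the question."""
    context = "\n".join(
        f"[{i}] {doc.page_content}" for i, doc in enumerate(docs, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# In a real pipeline these would come from a similarity search.
docs = [
    RetrievedDoc("Kusto stores embeddings in a dynamic column."),
    RetrievedDoc("Cosine similarity ranks the closest vectors first."),
]
prompt = build_prompt("How are results ranked?", docs)
print(prompt)
```

The resulting string would then be passed to a chat or completion model of your choice.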
Requirements
- Python >= 3.8
- Azure Data Explorer cluster or Microsoft Fabric Eventhouse with pre-existing vector data
- LangChain Core >= 0.1.0
- Azure authentication (DefaultAzureCredential)
Data Prerequisites
Because this version supports retrieval only, your vector embeddings must already be stored in a Kusto table. The following structure matches the column names used in the Quick Start:
.create table your_table (
    vector_id: string,
    doc_text: string,
    embedding_text: dynamic // Array of float values representing the vector
    // ... other metadata columns
)
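For orientation, a similarity search over a table like this can be expressed in KQL with the built-in `series_cosine_similarity` function. The sketch below builds such a query as a string. It is an assumption about what a suitable query could look like, not necessarily the query this package issues; `build_similarity_query` is a hypothetical helper, and it does not escape table or column names.

```python
import json
from typing import Sequence


def build_similarity_query(
    table: str,
    query_vector: Sequence[float],
    k: int,
    embedding_column: str = "embedding",
    id_column: str = "id",
    content_column: str = "text",
) -> str:
    """Build a KQL query returning the k rows most similar to query_vector."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # A JSON array of numbers is a valid body for a KQL dynamic() literal.
    vector_literal = json.dumps([float(x) for x in query_vector])
    return "\n".join([
        f"let query_vector = dynamic({vector_literal});",
        table,
        f"| extend similarity = series_cosine_similarity({embedding_column}, query_vector)",
        f"| top {k} by similarity desc",
        f"| project {id_column}, {content_column}, similarity",
    ])


query = build_similarity_query(
    "your_table",
    [0.1, 0.2, 0.3],
    k=5,
    embedding_column="embedding_text",
    id_column="vector_id",
    content_column="doc_text",
)
print(query)
```

Running the printed query in the Kusto query editor against your own table is a quick way to confirm that the embedding column holds numeric arrays of the expected length before wiring up the vector store.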
Development Status
This package is currently in pre-alpha development, and APIs may change significantly between versions. The current version only reads existing vector data from Kusto; document ingestion and storage are planned for future releases.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.