LangChain VectorStore integration for VAST Database
langchain-vastdb
langchain-vastdb provides a VastDBVectorStore class that implements the
LangChain VectorStore interface, enabling similarity search, document storage,
and retrieval-augmented generation (RAG) workflows backed by VAST Database's
native vector indexing.
Compatibility: Python 3.10 - 3.13 | langchain-core >= 1.0, < 2 | vastdb >= 2.0.3
Status: Alpha (v0.0.1). API may change between minor releases.
License: Apache-2.0
Requirements
- Python 3.10+
- A running VAST Database cluster with vector index support
- vastdb SDK >= 2.0.3
- langchain-core >= 1.0, < 2
- An Embeddings model (e.g., OpenAI, HuggingFace, or any LangChain-compatible embeddings)
Installation
pip install langchain-vastdb
Or with uv:
uv add langchain-vastdb
Quickstart
Option 1: Pass a pre-built session
import vastdb
from langchain_vastdb import VastDBVectorStore
session = vastdb.connect(
    endpoint="http://vast-cluster:8070",
    access="YOUR_ACCESS_KEY",
    secret="YOUR_SECRET_KEY",
)
store = VastDBVectorStore(
    embedding=my_embeddings,
    session=session,
    bucket="my-bucket",
    schema="my-schema",
    table_name="my-table",
)
# Add documents and search
ids = store.add_texts(["Paris is the capital of France."])
results = store.similarity_search("capital city", k=1)
print(results[0].page_content)
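The snippets in this README use my_embeddings without defining it. For trying the calls without a model provider, a deterministic stand-in can help. The sketch below is a hypothetical helper (HashEmbeddings is not part of langchain-vastdb) that mirrors the two methods of LangChain's Embeddings interface, embed_documents and embed_query. Its vectors carry no semantic meaning, and whether the store accepts a duck-typed object is an assumption; subclassing langchain_core.embeddings.Embeddings is the safer route.

```python
import hashlib


class HashEmbeddings:
    """Toy deterministic embeddings for smoke tests (hypothetical helper)."""

    def __init__(self, dim: int = 8) -> None:
        # SHA-256 yields 32 bytes, so at most 32 dimensions are available.
        if not 1 <= dim <= 32:
            raise ValueError("dim must be between 1 and 32")
        self.dim = dim

    def embed_query(self, text: str) -> list[float]:
        # Hash the text and scale the first `dim` bytes into [0.0, 1.0].
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return [byte / 255.0 for byte in digest[: self.dim]]

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        # Embed each document with the same function used for queries.
        return [self.embed_query(text) for text in texts]


my_embeddings = HashEmbeddings(dim=8)
vectors = my_embeddings.embed_documents(["Paris is the capital of France."])
print(len(vectors), len(vectors[0]))  # prints: 1 8
```

Because the vectors come from a hash, identical texts always map to identical vectors, which makes round-trip tests repeatable but says nothing about search quality.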
Option 2: Use the convenience factory
from langchain_vastdb import VastDBVectorStore
store = VastDBVectorStore.from_connection_params(
    embedding=my_embeddings,
    endpoint="http://vast-cluster:8070",
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY",
    bucket="my-bucket",
    schema="my-schema",
    table_name="my-table",
)
Credentials are passed directly to vastdb.connect() and are not stored on
the instance.
Option 3: Create a store and add texts in one call
import vastdb
from langchain_vastdb import VastDBVectorStore
session = vastdb.connect(
    endpoint="http://vast-cluster:8070",
    access="YOUR_ACCESS_KEY",
    secret="YOUR_SECRET_KEY",
)
store = VastDBVectorStore.from_texts(
    texts=["Paris is the capital of France.", "Berlin is the capital of Germany."],
    embedding=my_embeddings,
    session=session,
    bucket="my-bucket",
    schema="my-schema",
    table_name="my-table",
)
CRUD Operations
# Add documents with metadata
ids = store.add_texts(
    ["Some text", "More text"],
    metadatas=[{"source": "wiki"}, {"source": "blog"}],
)
# Similarity search by text query
docs = store.similarity_search("capital city", k=2)
# Similarity search with distance scores
scored = store.similarity_search_with_score("capital city", k=2)
for doc, score in scored:
    print(f"{doc.page_content} (distance: {score})")
# Search with a pre-computed vector
docs = store.similarity_search_by_vector([0.1, 0.2, ...], k=2)
# Retrieve documents by ID
docs = store.get_by_ids(ids)
# Delete by ID
store.delete(ids=ids)
Using as a retriever
VastDBVectorStore integrates directly with LangChain's retriever interface:
retriever = store.as_retriever(search_kwargs={"k": 3})
docs = retriever.invoke("What is the capital of France?")
This works seamlessly in LCEL RAG chains:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
retriever = store.as_retriever(search_kwargs={"k": 3})
prompt = ChatPromptTemplate.from_template(
    "Answer based on context:\n{context}\n\nQuestion: {question}"
)

def format_docs(docs):
    return "\n".join(d.page_content for d in docs)

chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm  # any LangChain-compatible LLM
    | StrOutputParser()
)
answer = chain.invoke("What is the capital of France?")
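To see what the {context} placeholder receives, format_docs can be run on its own. The sketch below uses SimpleNamespace objects as stand-ins for retrieved Document instances (an assumption made so the snippet runs without a cluster or LangChain installed); only the page_content attribute matters here.

```python
from types import SimpleNamespace


def format_docs(docs):
    # Join the text of each retrieved document, one per line.
    return "\n".join(d.page_content for d in docs)


# Stand-ins for retrieved documents; only `page_content` is read.
docs = [
    SimpleNamespace(page_content="Paris is the capital of France."),
    SimpleNamespace(page_content="Berlin is the capital of Germany."),
]
context = format_docs(docs)
print(context)
```

The resulting string is what the prompt template interpolates as context before the question is appended.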
Cache management
VastDBVectorStore caches table metadata after the first access to avoid
repeated bucket/schema/table round trips. If you alter the table structure
externally, invalidate the cache:
store.invalidate_table_cache()
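The caching behaviour described above follows a common lazy-load-and-invalidate pattern. The sketch below illustrates that pattern in isolation; CachedTableHandle and its loader are hypothetical names, not the library's internals.

```python
from typing import Callable, Optional


class CachedTableHandle:
    """Hypothetical illustration of lazy caching with explicit invalidation."""

    def __init__(self, loader: Callable[[], dict]) -> None:
        self._loader = loader
        self._cached: Optional[dict] = None
        self.load_count = 0  # how many times the loader actually ran

    def get(self) -> dict:
        # Load on first access only; later calls reuse the cached value.
        if self._cached is None:
            self._cached = self._loader()
            self.load_count += 1
        return self._cached

    def invalidate(self) -> None:
        # Drop the cached value so the next get() reloads it.
        self._cached = None


handle = CachedTableHandle(lambda: {"columns": ["id", "text", "vector", "metadata"]})
handle.get()
handle.get()
first_loads = handle.load_count   # loader ran once for two reads
handle.invalidate()
handle.get()
second_loads = handle.load_count  # loader ran again after invalidation
```

The trade-off is the usual one: reads are cheap, but external changes stay invisible until the cache is invalidated.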
Configuration Reference
Constructor: VastDBVectorStore(...)
| Parameter | Type | Default | Description |
|---|---|---|---|
| embedding | Embeddings | required | The embeddings model used to generate vectors. |
| session | vastdb.Session | required | A pre-built session connected to the VAST cluster. |
| bucket | str | required | The VAST bucket name containing the target table. |
| schema | str | required | The schema name within the bucket. |
| table_name | str | required | The table name for vector operations. |
| id_column | str | "id" | Column name for document IDs. |
| text_column | str | "text" | Column name for document text. |
| vector_column | str | "vector" | Column name for embedding vectors. |
| metadata_column | str | "metadata" | Column name for document metadata (stored as JSON). |
| adbc_driver_path | str \| None | None | Path to libadbc_driver_vastdb.so. Enables native ADBC vector search via array_distance() SQL. |
| adbc_endpoint | str \| None | None | ADBC/QueryEngine endpoint (hostname or IP). Separate from the HTTP REST endpoint. |
| access_key | str \| None | None | Access key for ADBC connection. |
| secret_key | str \| None | None | Secret key for ADBC connection. |
Custom column names
Column names default to id, text, vector, and metadata. Override them at
construction time:
store = VastDBVectorStore(
    embedding=my_embeddings,
    session=session,
    bucket="my-bucket",
    schema="my-schema",
    table_name="my-table",
    id_column="doc_id",
    text_column="content",
    vector_column="emb",
    metadata_column="meta",
)
Factory classmethod: from_connection_params(...)
Creates a VastDBVectorStore by building a vastdb.Session internally from
connection parameters.
| Parameter | Type | Default | Description |
|---|---|---|---|
| embedding | Embeddings | required | The embeddings model. |
| endpoint | str | required | The VAST cluster HTTP endpoint URL. |
| access_key | str | required | Access key for authentication. |
| secret_key | str | required | Secret key for authentication. |
| bucket | str | required | The VAST bucket name. |
| schema | str | required | The schema name within the bucket. |
| table_name | str | required | The table name for vector operations. |
| adbc_driver_path | str \| None | None | Path to ADBC driver shared library. |
| adbc_endpoint | str \| None | None | ADBC/QueryEngine endpoint. |
| **kwargs | | | Additional keyword arguments forwarded to the constructor (e.g., custom column names). |
ADBC vector search
When adbc_driver_path and adbc_endpoint are both provided, the store uses
native ADBC SQL with array_distance() for server-side vector search. This does
not require a vector index on the table. If ADBC is unavailable or fails, the
store falls back to an in-memory scan using squared L2 (L2Sq) distance.
store = VastDBVectorStore(
    embedding=my_embeddings,
    session=session,
    bucket="my-bucket",
    schema="my-schema",
    table_name="my-table",
    adbc_driver_path="/usr/lib/libadbc_driver_vastdb.so",
    adbc_endpoint="query-engine.example.com",
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY",
)
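To make the fallback concrete, here is a minimal sketch of what an in-memory squared-L2 scan computes. It is not the library's implementation: l2_squared and scan_top_k are hypothetical helpers, and the row layout (a dict with a "vector" key) is an assumption that mirrors the default column name.

```python
import heapq


def l2_squared(a: list[float], b: list[float]) -> float:
    # Squared Euclidean distance: sum of squared per-dimension differences.
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def scan_top_k(query, rows, k, vector_column="vector"):
    # Score every row, then keep the k rows with the smallest distance.
    scored = [(row, l2_squared(query, row[vector_column])) for row in rows]
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


rows = [
    {"id": "a", "vector": [0.0, 0.0]},
    {"id": "b", "vector": [1.0, 1.0]},
    {"id": "c", "vector": [3.0, 4.0]},
]
top = scan_top_k([0.0, 0.0], rows, k=2)
print([(row["id"], dist) for row, dist in top])  # prints: [('a', 0.0), ('b', 2.0)]
```

A scan like this touches every row, so its cost grows linearly with table size; squared L2 skips the square root because it does not change the ranking.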
Subclassing Guide
VastDBVectorStore uses the Template Method pattern. Public methods like
add_texts and similarity_search handle embedding, filter conversion, and
result formatting, then delegate storage operations to protected hook
methods. Override these hooks to customize behavior without reimplementing the
full LangChain interface.
Hook methods
| Hook | Purpose | Returns |
|---|---|---|
| _insert_vectors | Customize record insertion | list[str] (IDs) |
| _build_metadata_columns | Customize column layout for metadata | dict[str, list] |
| _select_columns | Customize columns retrieved during search | list[str] |
| _vector_search | Customize similarity search | list[tuple[dict, float]] |
| _delete_by_ids | Customize document deletion | bool |
| _get_by_ids | Customize document retrieval | list[dict] |
| _row_to_document | Customize row-to-Document conversion | Document |
Hook signatures
def _insert_vectors(
    self,
    texts: list[str],
    embeddings: list[list[float]],
    metadatas: list[dict],
    ids: list[str],
    *,
    tx: Transaction | None = None,
) -> list[str]: ...

def _vector_search(
    self,
    query_vector: list[float],
    k: int,
    predicate: ibis.Expr | None = None,
    *,
    filter_dict: dict | None = None,
    tx: Transaction | None = None,
) -> list[tuple[dict, float]]: ...

def _delete_by_ids(
    self,
    ids: list[str],
    *,
    tx: Transaction | None = None,
) -> bool: ...

def _get_by_ids(
    self,
    ids: list[str],
    *,
    tx: Transaction | None = None,
) -> list[dict]: ...

def _row_to_document(
    self,
    row: dict,
    score: float | None = None,
) -> Document: ...
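For readers unfamiliar with the Template Method pattern, the following sketch shows the general shape in plain Python. BaseStore and UppercaseStore are hypothetical classes written for illustration; they do not reflect how VastDBVectorStore is implemented, only how a public method can fix the workflow while a protected hook carries the customizable step.

```python
class BaseStore:
    """Hypothetical base class: the public method is the template."""

    def __init__(self) -> None:
        self._rows: dict[str, str] = {}

    def add_texts(self, texts: list[str]) -> list[str]:
        # Fixed steps live here: generate IDs, then delegate storage to the hook.
        start = len(self._rows)
        ids = [f"doc-{start + i}" for i in range(len(texts))]
        return self._insert(texts, ids)

    def _insert(self, texts: list[str], ids: list[str]) -> list[str]:
        # Default hook: store each text under its ID.
        for doc_id, text in zip(ids, texts):
            self._rows[doc_id] = text
        return ids


class UppercaseStore(BaseStore):
    """Overrides only the hook; the public interface is inherited unchanged."""

    def _insert(self, texts, ids):
        return super()._insert([t.upper() for t in texts], ids)


demo_store = UppercaseStore()
demo_ids = demo_store.add_texts(["paris", "berlin"])
```

Callers keep using add_texts; only the storage step differs between the two classes.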
Transaction reuse
Each hook opens and closes its own transaction by default. The optional tx
parameter lets subclasses pass in an existing transaction for multi-step atomic
operations:
with self._session.transaction() as tx:
    self._insert_vectors(texts, embeddings, metadatas, ids, tx=tx)
    # additional operations in the same transaction
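One way a hook can support both modes (own transaction by default, caller's transaction when tx is given) is sketched below. FakeSession, Hooks, and _tx_scope are hypothetical stand-ins, and the simplified hook signatures are assumptions for illustration; the real hooks take the arguments listed under Hook signatures.

```python
from contextlib import contextmanager, nullcontext


class FakeSession:
    """Hypothetical stand-in for a session; counts transactions opened."""

    def __init__(self) -> None:
        self.opened = 0

    @contextmanager
    def transaction(self):
        self.opened += 1
        yield f"tx-{self.opened}"


class Hooks:
    def __init__(self, session: FakeSession) -> None:
        self._session = session
        self.log: list[tuple[str, str]] = []

    def _tx_scope(self, tx):
        # Reuse the caller's transaction if given, otherwise open a new one.
        return nullcontext(tx) if tx is not None else self._session.transaction()

    def _insert_vectors(self, ids, *, tx=None):
        with self._tx_scope(tx) as active:
            self.log.append(("insert", active))
        return ids

    def _delete_by_ids(self, ids, *, tx=None):
        with self._tx_scope(tx) as active:
            self.log.append(("delete", active))
        return True


session = FakeSession()
hooks = Hooks(session)
hooks._insert_vectors(["a"])        # opens its own transaction
with session.transaction() as tx:   # one shared transaction for both calls
    hooks._insert_vectors(["b"], tx=tx)
    hooks._delete_by_ids(["a"], tx=tx)
```

In this sketch the first call opens a transaction of its own, while the two calls inside the with block share a single one.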
Example: typed metadata columns
The base class stores metadata as a single JSON string column. If you need typed
columns for performance-critical filtering, set _typed_metadata_columns:
from langchain_vastdb import TypedColumn, VastDBVectorStore

class TypedMetadataStore(VastDBVectorStore):
    """Store with typed 'category' and 'priority' metadata columns."""

    _typed_metadata_columns = {
        "category": TypedColumn(),
        "priority": TypedColumn(),
    }
This automatically extracts category and priority into separate typed columns
on insert, preserves any extra metadata in the JSON column, and merges everything
back together on read. The public LangChain interface (add_texts,
similarity_search, etc.) stays unchanged.
Use TypedColumn fields for custom defaults, PyArrow type coercion, or
controlling which columns are backfilled on read
(see the Migration Guide for details).
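The extract-on-insert and merge-on-read behaviour can be pictured with a small standalone sketch. The helpers split_metadata and merge_row are hypothetical and only approximate the idea under stated assumptions: typed keys become their own columns, the remainder is serialized to the JSON column, and reading reverses the split. The library's actual logic (defaults, type coercion, backfill rules) may differ.

```python
import json


def split_metadata(metadatas: list[dict], typed_keys: list[str]) -> dict[str, list]:
    """Build column-oriented data: one list per typed key plus a JSON remainder."""
    columns: dict[str, list] = {key: [] for key in typed_keys}
    columns["metadata"] = []
    for meta in metadatas:
        for key in typed_keys:
            columns[key].append(meta.get(key))  # None when the key is absent
        rest = {k: v for k, v in meta.items() if k not in typed_keys}
        columns["metadata"].append(json.dumps(rest, sort_keys=True))
    return columns


def merge_row(row: dict, typed_keys: list[str]) -> dict:
    """Rebuild one metadata dict from a stored row."""
    merged = json.loads(row["metadata"])
    for key in typed_keys:
        if row.get(key) is not None:
            merged[key] = row[key]
    return merged


columns = split_metadata(
    [{"category": "news", "priority": 2, "source": "wiki"}],
    typed_keys=["category", "priority"],
)
row = {name: values[0] for name, values in columns.items()}
restored = merge_row(row, ["category", "priority"])
```

Here restored equals the original metadata dict, which is the property that keeps the public interface unchanged for callers.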
Examples
See the examples/ directory for runnable scripts:
- basic_usage.py -- add texts, search, retrieve
- rag_pipeline.py -- as_retriever() + LCEL RAG chain
- subclassing.py -- declarative typed metadata columns
- filtered_search.py -- metadata filtering patterns
Migration Guide
Migrating an existing VectorStore subclass to VastDBVectorStore? See the
Migration Guide for step-by-step instructions,
a hook mapping table, and a before/after code comparison.
Development
Clone the repository and install dependencies with uv:
uv sync
Run the linter:
uv run ruff check .
Run unit tests:
uv run pytest tests/unit_tests/
Run integration tests (requires a VAST cluster):
uv run pytest tests/integration_tests/
License
Apache-2.0 -- see LICENSE for details.