LangChain VectorStore integration for VAST Database


langchain-vastdb


langchain-vastdb provides a VastDBVectorStore class that implements the LangChain VectorStore interface, enabling similarity search, document storage, and retrieval-augmented generation (RAG) workflows backed by VAST Database's native vector indexing.

Compatibility: Python 3.10 - 3.13 | langchain-core >= 1.0, < 2 | vastdb >= 2.0.3

Status: Alpha (v0.0.4). API may change between minor releases.

License: Apache-2.0

Requirements

Python 3.10+
A running VAST Database cluster with vector index support
vastdb SDK >= 2.0.3
langchain-core >= 1.0, < 2
An Embeddings model (e.g., OpenAI, HuggingFace, or any LangChain-compatible embeddings)
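For local experiments without an embeddings provider, a deterministic toy model is enough to exercise the store. The sketch below is hypothetical: HashEmbeddings is not part of langchain-vastdb or LangChain, and its vectors carry no semantic meaning. It only mirrors the two methods of LangChain's Embeddings interface, embed_documents() and embed_query(). To pass it to VastDBVectorStore you would make it a subclass of langchain_core.embeddings.Embeddings, and its dimension presumably has to match the table's vector column.

```python
import hashlib
import math


class HashEmbeddings:
    """Deterministic toy embeddings for offline tests (hypothetical; not semantic)."""

    def __init__(self, dim: int = 8) -> None:
        self.dim = dim

    def _embed(self, text: str) -> list[float]:
        # Take `dim` bytes from a SHA-256 digest of the text (cycling past 32 bytes).
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        raw = [digest[i % len(digest)] / 255.0 for i in range(self.dim)]
        # L2-normalize so every vector has unit length.
        norm = math.sqrt(sum(x * x for x in raw)) or 1.0
        return [x / norm for x in raw]

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        # One vector per input text, in the same order.
        return [self._embed(t) for t in texts]

    def embed_query(self, text: str) -> list[float]:
        # Queries are embedded exactly like documents.
        return self._embed(text)
```

Because the same text always maps to the same vector, tests that add a text and then search for it verbatim get stable, repeatable results.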

Installation

pip install langchain-vastdb

Or with uv:

uv add langchain-vastdb

Quickstart

Option 1: Pass a pre-built session

import vastdb
from langchain_vastdb import VastDBVectorStore

session = vastdb.connect(
    endpoint="http://vast-cluster:8070",
    access="YOUR_ACCESS_KEY",
    secret="YOUR_SECRET_KEY",
)

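# my_embeddings: any LangChain-compatible Embeddings instance (e.g., OpenAI or HuggingFace embeddings)
store = VastDBVectorStore(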
    embedding=my_embeddings,
    session=session,
    bucket="my-bucket",
    schema="my-schema",
    table_name="my-table",
)

# Add documents and search
ids = store.add_texts(["Paris is the capital of France."])
results = store.similarity_search("capital city", k=1)
print(results[0].page_content)

Option 2: Use the convenience factory

from langchain_vastdb import VastDBVectorStore

store = VastDBVectorStore.from_connection_params(
    embedding=my_embeddings,
    endpoint="http://vast-cluster:8070",
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY",
    bucket="my-bucket",
    schema="my-schema",
    table_name="my-table",
)

Credentials are passed directly to vastdb.connect() and are not stored on the instance.
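Since the credentials only need to exist for the duration of the vastdb.connect() call, one option is to read them from the environment instead of hard-coding them. A minimal sketch, assuming hypothetical variable names VAST_ENDPOINT, VAST_ACCESS_KEY, and VAST_SECRET_KEY; the helper connection_params_from_env is illustrative and not part of the package:

```python
import os
from collections.abc import Mapping

# Hypothetical environment variable names; use whatever your deployment defines.
ENV_VARS = {
    "endpoint": "VAST_ENDPOINT",
    "access_key": "VAST_ACCESS_KEY",
    "secret_key": "VAST_SECRET_KEY",
}


def connection_params_from_env(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Collect the from_connection_params() credentials from environment variables."""
    # Fail early with a clear message instead of passing empty credentials along.
    missing = [name for name in ENV_VARS.values() if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {param: env[name] for param, name in ENV_VARS.items()}
```

The returned dict matches the endpoint, access_key, and secret_key parameters of from_connection_params, so it can be unpacked alongside the other arguments: VastDBVectorStore.from_connection_params(embedding=my_embeddings, bucket="my-bucket", schema="my-schema", table_name="my-table", **params).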

Option 3: Create a store and add texts in one call

import vastdb
from langchain_vastdb import VastDBVectorStore

session = vastdb.connect(
    endpoint="http://vast-cluster:8070",
    access="YOUR_ACCESS_KEY",
    secret="YOUR_SECRET_KEY",
)

store = VastDBVectorStore.from_texts(
    texts=["Paris is the capital of France.", "Berlin is the capital of Germany."],
    embedding=my_embeddings,
    session=session,
    bucket="my-bucket",
    schema="my-schema",
    table_name="my-table",
)

CRUD Operations

# Add documents with metadata
ids = store.add_texts(
    ["Some text", "More text"],
    metadatas=[{"source": "wiki"}, {"source": "blog"}],
)

# Similarity search by text query
docs = store.similarity_search("capital city", k=2)

# Similarity search with distance scores
scored = store.similarity_search_with_score("capital city", k=2)
for doc, score in scored:
    print(f"{doc.page_content} (distance: {score})")

# Search with a pre-computed vector (here, an embedding of the query text)
query_vector = my_embeddings.embed_query("capital city")
docs = store.similarity_search_by_vector(query_vector, k=2)

# Retrieve documents by ID
docs = store.get_by_ids(ids)

# Delete by ID
store.delete(ids=ids)

Using as a retriever

VastDBVectorStore integrates directly with LangChain's retriever interface:

retriever = store.as_retriever(search_kwargs={"k": 3})
docs = retriever.invoke("What is the capital of France?")

The retriever can also be composed into an LCEL RAG chain:

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

retriever = store.as_retriever(search_kwargs={"k": 3})
prompt = ChatPromptTemplate.from_template(
    "Answer based on context:\n{context}\n\nQuestion: {question}"
)

def format_docs(docs):
    return "\n".join(d.page_content for d in docs)

chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm  # any LangChain-compatible LLM
    | StrOutputParser()
)
answer = chain.invoke("What is the capital of France?")

Cache management

VastDBVectorStore caches table metadata after the first access to avoid repeated bucket/schema/table round trips. If you alter the table structure externally, invalidate the cache:

store.invalidate_table_cache()
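The caching behavior described above amounts to "memoize until invalidated". A minimal standalone sketch of that idea, assuming a hypothetical loader callable that stands in for the bucket/schema/table lookups; TableHandleCache is illustrative and not the package's internal implementation:

```python
from collections.abc import Callable
from typing import Any


class TableHandleCache:
    """Memoize an expensive lookup until explicitly invalidated (hypothetical helper)."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader
        self._cached: Any = None
        self._loaded = False

    def get(self) -> Any:
        # Only the first call (or the first call after invalidate()) hits the loader.
        if not self._loaded:
            self._cached = self._loader()
            self._loaded = True
        return self._cached

    def invalidate(self) -> None:
        # Drop the cached value so the next get() reloads fresh metadata.
        self._cached = None
        self._loaded = False
```

A separate _loaded flag is used instead of checking for None, so a loader that legitimately returns None is still cached.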

Configuration Reference

Constructor: `VastDBVectorStore(...)`

Parameter	Type	Default	Description
`embedding`	`Embeddings`	required	The embeddings model used to generate vectors.
`session`	`vastdb.Session`	required	A pre-built session connected to the VAST cluster.
`bucket`	`str`	required	The VAST bucket name containing the target table.
`schema`	`str`	required	The schema name within the bucket.
`table_name`	`str`	required	The table name for vector operations.
`id_column`	`str`	`"id"`	Column name for document IDs.
`text_column`	`str`	`"text"`	Column name for document text.
`vector_column`	`str`	`"vector"`	Column name for embedding vectors.
`metadata_column`	`str`	`"metadata"`	Column name for document metadata (stored as JSON).
`adbc_driver_path`	`str | None`	`None`	Path to `libadbc_driver_vastdb.so`. Enables native ADBC vector search via `array_distance()` SQL.
`adbc_endpoint`	`str | None`	`None`	ADBC/QueryEngine endpoint (hostname or IP). Separate from the HTTP REST endpoint.
`access_key`	`str | None`	`None`	Access key for ADBC connection.
`secret_key`	`str | None`	`None`	Secret key for ADBC connection.

Custom column names

Column names default to id, text, vector, and metadata. Override them at construction time:

store = VastDBVectorStore(
    embedding=my_embeddings,
    session=session,
    bucket="my-bucket",
    schema="my-schema",
    table_name="my-table",
    id_column="doc_id",
    text_column="content",
    vector_column="emb",
    metadata_column="meta",
)
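Because metadata is stored as JSON in its own column, reading a row back means looking up the configured column names and decoding that JSON. A hypothetical sketch of that mapping: Doc is a dataclass stand-in for LangChain's Document so the snippet runs without dependencies, and row_to_doc is illustrative, not the package's _row_to_document.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Dependency-free stand-in for langchain_core.documents.Document."""

    page_content: str
    metadata: dict = field(default_factory=dict)
    id: str | None = None


def row_to_doc(
    row: dict,
    *,
    id_column: str = "id",
    text_column: str = "text",
    metadata_column: str = "metadata",
) -> Doc:
    """Map a result row to a Doc using configurable column names (hypothetical helper)."""
    raw_meta = row.get(metadata_column)
    # Metadata is a JSON string; treat NULL or empty as "no metadata".
    metadata = json.loads(raw_meta) if raw_meta else {}
    # The vector column is not needed to build a document, so it is ignored here.
    return Doc(page_content=row[text_column], metadata=metadata, id=row.get(id_column))
```

Called with id_column="doc_id", text_column="content", and metadata_column="meta", it reads rows laid out like the custom-column store above.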

Factory classmethod: `from_connection_params(...)`

Creates a VastDBVectorStore by building a vastdb.Session internally from connection parameters.

Parameter	Type	Default	Description
`embedding`	`Embeddings`	required	The embeddings model.
`endpoint`	`str`	required	The VAST cluster HTTP endpoint URL.
`access_key`	`str`	required	Access key for authentication.
`secret_key`	`str`	required	Secret key for authentication.
`bucket`	`str`	required	The VAST bucket name.
`schema`	`str`	required	The schema name within the bucket.
`table_name`	`str`	required	The table name for vector operations.
`adbc_driver_path`	`str | None`	`None`	Path to ADBC driver shared library.
`adbc_endpoint`	`str | None`	`None`	ADBC/QueryEngine endpoint.
`**kwargs`			Additional keyword arguments forwarded to the constructor (e.g., custom column names).

ADBC vector search

When adbc_driver_path and adbc_endpoint are both provided, the store uses native ADBC SQL with array_distance() for server-side vector search, which does not require a vector index on the table. If ADBC is unavailable or a query fails, the store falls back to an in-memory scan that ranks rows by squared L2 distance (L2Sq).

store = VastDBVectorStore(
    embedding=my_embeddings,
    session=session,
    bucket="my-bucket",
    schema="my-schema",
    table_name="my-table",
    adbc_driver_path="/usr/lib/libadbc_driver_vastdb.so",
    adbc_endpoint="query-engine.example.com",
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY",
)
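The in-memory fallback is described only as an L2Sq distance scan. A minimal sketch of what such a brute-force scan does, assuming rows are dicts holding the stored vector under a "vector" key (mirroring the default column name); it returns (row, distance) pairs like the _vector_search hook, but it is an illustration, not the package's code:

```python
import heapq
from collections.abc import Sequence


def l2sq(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance; skipping the square root preserves the ranking."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_top_k(
    query: Sequence[float],
    rows: list[dict],
    k: int,
    vector_column: str = "vector",
) -> list[tuple[dict, float]]:
    """Score every row against the query and keep the k nearest (smallest distance)."""
    scored = ((row, l2sq(query, row[vector_column])) for row in rows)
    # nsmallest keeps only k candidates in a heap instead of sorting everything.
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])
```

Smaller scores mean closer matches, which is consistent with similarity_search_with_score reporting distances. The cost grows linearly with table size, which is why the server-side ADBC path is preferable on large tables.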

Subclassing Guide

VastDBVectorStore uses the Template Method pattern. Public methods like add_texts and similarity_search handle embedding, filter conversion, and result formatting, then delegate storage, column selection, and row conversion to the protected hook methods listed below. Override these hooks to customize behavior without reimplementing the full LangChain interface.

Hook methods

Hook	Purpose	Returns
`_insert_vectors`	Customize record insertion	`list[str]` (IDs)
`_build_metadata_columns`	Customize column layout for metadata	`dict[str, list]`
`_select_columns`	Customize columns retrieved during search	`list[str]`
`_vector_search`	Customize similarity search	`list[tuple[dict, float]]`
`_delete_by_ids`	Customize document deletion	`bool`
`_get_by_ids`	Customize document retrieval	`list[dict]`
`_row_to_document`	Customize row-to-Document conversion	`Document`
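To make the pattern concrete, here is a stripped-down standalone sketch: the public add_texts method does the shared work and calls a single storage hook that a subclass overrides. MiniVectorStore and InMemoryStore are hypothetical, and the hook signature is simplified (no tx keyword); VastDBVectorStore's real methods do more than this.

```python
from __future__ import annotations

import uuid
from abc import ABC, abstractmethod
from collections.abc import Callable


class MiniVectorStore(ABC):
    """Toy base class: the public method does shared work, a hook does storage."""

    def __init__(self, embed: Callable[[str], list[float]]) -> None:
        self._embed = embed

    def add_texts(self, texts: list[str], metadatas: list[dict] | None = None) -> list[str]:
        # Shared work lives in the public (template) method...
        metadatas = metadatas if metadatas is not None else [{} for _ in texts]
        embeddings = [self._embed(t) for t in texts]
        ids = [str(uuid.uuid4()) for _ in texts]
        # ...and the storage step is delegated to the hook.
        return self._insert_vectors(texts, embeddings, metadatas, ids)

    @abstractmethod
    def _insert_vectors(
        self,
        texts: list[str],
        embeddings: list[list[float]],
        metadatas: list[dict],
        ids: list[str],
    ) -> list[str]:
        """Storage hook that each concrete store overrides."""


class InMemoryStore(MiniVectorStore):
    """Concrete store that overrides only the hook."""

    def __init__(self, embed: Callable[[str], list[float]]) -> None:
        super().__init__(embed)
        self.rows: dict[str, dict] = {}

    def _insert_vectors(self, texts, embeddings, metadatas, ids):
        # Persist one record per text, keyed by its generated ID.
        for id_, text, vector, meta in zip(ids, texts, embeddings, metadatas):
            self.rows[id_] = {"text": text, "vector": vector, "metadata": meta}
        return ids
```

The subclass never touches embedding or ID generation; swapping the storage backend only means overriding the hook.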

Hook signatures

def _insert_vectors(
    self,
    texts: list[str],
    embeddings: list[list[float]],
    metadatas: list[dict],
    ids: list[str],
    *,
    tx: Transaction | None = None,
) -> list[str]: ...

def _vector_search(
    self,
    query_vector: list[float],
    k: int,
    predicate: ibis.Expr | None = None,
    *,
    filter_dict: dict | None = None,
    tx: Transaction | None = None,
) -> list[tuple[dict, float]]: ...

def _delete_by_ids(
    self,
    ids: list[str],
    *,
    tx: Transaction | None = None,
) -> bool: ...

def _get_by_ids(
    self,
    ids: list[str],
    *,
    tx: Transaction | None = None,
) -> list[dict]: ...

def _row_to_document(
    self,
    row: dict,
    score: float | None = None,
) -> Document: ...

Transaction reuse

Each hook opens and closes its own transaction by default. The optional tx parameter lets subclasses pass in an existing transaction for multi-step atomic operations:

with self._session.transaction() as tx:
    self._insert_vectors(texts, embeddings, metadatas, ids, tx=tx)
    # additional operations in the same transaction
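The "open my own transaction unless one is passed in" default is a common pattern. A standalone sketch of how a hook might implement it, using contextlib.nullcontext for the reuse case; FakeSession, tx_scope, and insert_rows are hypothetical stand-ins, not langchain-vastdb or vastdb APIs, and the package's actual hooks may differ.

```python
from __future__ import annotations

import contextlib
from collections.abc import Iterator


class FakeSession:
    """Hypothetical stand-in whose transaction() is a context manager."""

    def __init__(self) -> None:
        self.opened = 0

    @contextlib.contextmanager
    def transaction(self) -> Iterator[str]:
        # Each call opens a new (fake) transaction identified by a counter.
        self.opened += 1
        yield f"tx-{self.opened}"


def tx_scope(session: FakeSession, tx: str | None):
    """Reuse the caller's transaction if one is given; otherwise open a fresh one."""
    if tx is not None:
        return contextlib.nullcontext(tx)
    return session.transaction()


def insert_rows(session: FakeSession, rows: list[str], log: list, *, tx: str | None = None) -> None:
    """Hook-style function with an optional keyword-only tx, like the signatures above."""
    with tx_scope(session, tx) as active_tx:
        log.append((active_tx, rows))
```

Called without tx, insert_rows opens its own transaction; called twice inside an outer with block, both calls share the outer transaction.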

Example: typed metadata columns

The base class stores metadata as a single JSON string column. If you need typed columns for performance-critical filtering, set _typed_metadata_columns:

from langchain_vastdb import TypedColumn, VastDBVectorStore


class TypedMetadataStore(VastDBVectorStore):
    """Store with typed 'category' and 'priority' metadata columns."""

    _typed_metadata_columns = {
        "category": TypedColumn(),
        "priority": TypedColumn(),
    }

This automatically extracts category and priority into separate typed columns on insert, preserves any extra metadata in the JSON column, and merges everything back together on read. The public LangChain interface (add_texts, similarity_search, etc.) stays unchanged.

Use TypedColumn fields for custom defaults, PyArrow type coercion, or controlling which columns are backfilled on read (see the Migration Guide for details).
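The split-on-insert and merge-on-read behavior can be sketched in a few lines of standard-library Python. This is a hypothetical illustration using the category and priority keys from the example above, not the package's TypedColumn logic. In particular, dropping typed values that are None on read is a choice made here; in the package, backfill behavior is controlled through TypedColumn fields.

```python
from __future__ import annotations

import json

TYPED_KEYS = ("category", "priority")  # mirrors the typed columns in the example


def split_metadata(metadata: dict, typed_keys: tuple[str, ...] = TYPED_KEYS) -> tuple[dict, str]:
    """Insert path: pull typed keys into their own columns, JSON-encode the rest."""
    typed = {key: metadata.get(key) for key in typed_keys}
    extra = {k: v for k, v in metadata.items() if k not in typed_keys}
    return typed, json.dumps(extra)


def merge_metadata(typed: dict, extra_json: str) -> dict:
    """Read path: combine typed column values and the JSON remainder into one dict."""
    merged = json.loads(extra_json) if extra_json else {}
    # Skip typed columns that are NULL so keys absent on insert stay absent.
    merged.update({k: v for k, v in typed.items() if v is not None})
    return merged
```

A round trip through split_metadata and merge_metadata returns the original dict whenever none of its typed values is None.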

Examples

See the examples/ directory for runnable scripts:

basic_usage.py -- add texts, search, retrieve
rag_pipeline.py -- as_retriever() + LCEL RAG chain
subclassing.py -- declarative typed metadata columns
filtered_search.py -- metadata filtering patterns

Migration Guide

Migrating an existing VectorStore subclass to VastDBVectorStore? See the Migration Guide for step-by-step instructions, a hook mapping table, and a before/after code comparison.

Development

Clone the repository and install dependencies with uv:

uv sync

Run the linter:

uv run ruff check .

Run unit tests:

uv run pytest tests/unit_tests/

Run integration tests (requires a VAST cluster):

uv run pytest tests/integration_tests/

License

Apache-2.0 -- see LICENSE for details.


Release history

0.0.5 -- May 18, 2026

0.0.4 -- May 14, 2026 (this version)

Download files


Source Distribution

langchain_vastdb-0.0.4.tar.gz (148.2 kB)

Uploaded May 14, 2026 Source

Built Distribution


langchain_vastdb-0.0.4-py3-none-any.whl (25.4 kB)

Uploaded May 14, 2026 Python 3

File details

Details for the file langchain_vastdb-0.0.4.tar.gz.

File metadata

Download URL: langchain_vastdb-0.0.4.tar.gz
Upload date: May 14, 2026
Size: 148.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.14 {"installer":{"name":"uv","version":"0.11.14","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Debian GNU/Linux","version":"12","id":"bookworm","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for langchain_vastdb-0.0.4.tar.gz
Algorithm	Hash digest
SHA256	`993ac2d93b77fbb65ff8bb1e80c42ec2006e8a87b1a14a73756713f5a01fd95e`
MD5	`0ba05b6d25cf277d34b9ff57a405fa27`
BLAKE2b-256	`5e6cfcf2012b0ede514d70cc46acf3bf7fb34f5be746d2a5f0114728f6015ec5`


File details

Details for the file langchain_vastdb-0.0.4-py3-none-any.whl.

File metadata

Download URL: langchain_vastdb-0.0.4-py3-none-any.whl
Upload date: May 14, 2026
Size: 25.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.14 {"installer":{"name":"uv","version":"0.11.14","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Debian GNU/Linux","version":"12","id":"bookworm","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for langchain_vastdb-0.0.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c72d283d01cb560e2a0563f743a160b1cd18c7c73fd780110b7e08ea71189bed`
MD5	`5c149f1918f5e112e489df0585c16546`
BLAKE2b-256	`3a639d16cc934f824accba6964fb9277b11c6230bca07e7c91443d4b19f356d8`


langchain-vastdb 0.0.4
