A unified Python SDK for vector databases supporting pgvector and tcvector

Vector DB SDK

A unified Python SDK for vector databases supporting both pgvector and tcvector backends.

Installation

pip install vector-db-sdk

Quick Start

from vector_db_sdk import VectorDB, DistanceStrategy

# Initialize with pgvector
db = VectorDB(
    connection_type="pgvector",
    connection_info={
        "username": "your_username",
        "password": "your_password", 
        "host": "localhost",
        "port": 5432,
        "database": "your_db",
        "schema": "public"  # optional
    },
    distance=DistanceStrategy.COSINE
)

# Search for similar vectors
results = db.similarity_search_with_score(
    embedding=[0.1, 0.2, 0.3, ...],  # your query vector
    table_name="your_table",
    k=5,
    score_threshold=0.8
)

Features

🔄 Unified Interface: Single API for both pgvector and tcvector
🚀 High Performance: Optimized for large-scale vector operations
🛡️ Type Safety: Full type hints support
📊 Flexible Filtering: Advanced condition-based filtering
🔍 Multiple Distance Metrics: Cosine, Euclidean, and more

Background for tcvector

tcvector does not have a CLI, cannot execute raw SQL queries, and does not support modifying index parameters.
tcvector does not support using the same table name in different database schemas, and cannot rename tables.
tcvector does not allow WHERE on columns that are not marked as filters. See the conditions parameter of VectorDB.similarity_search_with_score.
To sample data for checking purposes, use VectorDB.query and VectorDB.row_count.
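
As a rough sketch of that sampling step, the helper below combines the two calls. It assumes `db` is a VectorDB instance connected to tcvector and that both methods accept the keyword names listed in the method reference; `sample_table` itself is a hypothetical helper, not part of the SDK.

```python
from typing import Any, Dict, List


def sample_table(db: Any, table_name: str, sample_size: int = 5) -> Dict[str, Any]:
    """Return the table's row count plus a few rows for a quick sanity check.

    `db` is assumed to be a VectorDB instance connected to tcvector.
    """
    total = db.row_count(table_name=table_name)
    # VectorDB.query documents limit as being within [1, 16384], so clamp it.
    limit = max(1, min(sample_size, 16384))
    rows: List[Dict[str, Any]] = db.query(table_name=table_name, limit=limit)
    return {"row_count": total, "sample": rows}


# Usage (requires a live tcvector connection):
# info = sample_table(db, "your_table", sample_size=3)
# print(info["row_count"], info["sample"])
```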

Testing (for maintainers)

python -m unittest -v

Classes

vector_db.VectorDB

param connection_type: str Type of connection to use for vector db. Currently only supports pgvector and tcvector.

param connection_info: Dict[str, str | int] connection_info keys:

required username
required password
required host
required port
required database
optional schema

param distance: util.DistanceStrategy = DistanceStrategy.COSINE Distance metric used for vector comparison. Defaults to cosine (the default was previously Euclidean L2, but cosine was found to be the fastest).

param col_names: Dict[str, str] = {} Dictionary used to map old sdk column names for flexibility. To map an old column name to an existing name in your table, use {"old_col_name": "new_col_name"}. Current old columns are:

contents libvector.CONTENT_COL: Usually the textual content represented by the embedding.
embeddings libvector.EMBEDDING_COL: Embedding representing the data.
metadatas libvector.METADATA_COL: Dictionary of metadata.

param timeout: float = None Timeout in seconds for all database-related operations. Applies only to tcvector.

Methods

VectorDB.execute

Executes raw SQL, similar to a native .execute(). Note that tcvector does not support this method.

param query: str | bytes SQL query to be executed. Values can be represented in the following ways:

INSERT INTO table (id, value) VALUES (?, ?)
INSERT INTO table (id, value) VALUES (%(id)s, %(value)s)

param vars: Union[List[any], Dict[str, any], List[Dict[str, any]]] Variables to be substituted, in the following ways:

[1, "some value"]
{"id": 1, "value": "some value"}
[{"id": 1, "value": "some value"}]

param fetchall bool = False If true, will return List[List[any]] result.

param commit bool = True If true, will immediately commit changes to database.

returns Union[None, List[List[any]]] If fetchall is set to True, will return a list of rows.
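
A minimal sketch of how the named-placeholder form might be used, assuming `db` is a VectorDB instance connected to pgvector and that `execute` accepts the parameters above as keyword arguments. The `items` table and the `insert_and_read` helper are hypothetical and only serve the illustration.

```python
from typing import Any, List, Optional

# Hypothetical table "items" with columns (id, value).
INSERT_SQL = "INSERT INTO items (id, value) VALUES (%(id)s, %(value)s)"
SELECT_SQL = "SELECT id, value FROM items WHERE id = %(id)s"


def insert_and_read(db: Any, row_id: int, value: str) -> Optional[List[List[Any]]]:
    """Insert one row, then read it back as a list of rows."""
    # Values are passed through `vars` rather than formatted into the SQL string.
    db.execute(query=INSERT_SQL, vars={"id": row_id, "value": value}, commit=True)
    # fetchall=True asks execute to return the selected rows.
    return db.execute(query=SELECT_SQL, vars={"id": row_id}, fetchall=True)


# Usage (pgvector only, since tcvector does not support execute):
# rows = insert_and_read(db, 1, "some value")
```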

VectorDB.find_table_schemas

List all schemas containing table_name.

param table_name str

returns List[str]

VectorDB.list_schemas

List all schemas.

returns List[str]

VectorDB.list_tables

param schema str = "" If empty, will use schema provided by conf.

returns List[str]

VectorDB.similarity_search_with_score

Retrieves the top k results whose similarity under the index's distance metric is >= score_threshold (<= score_threshold when using L2 or pgvector), from either the table catalogue or the specified table.

param embedding List[float] Use None if using internal tc embedding model. To use internal model, you must first vector_db.create_table with the desired model_name.

param tags List[str] Deprecated. Search all tables from catalogue in pgvector, matching specified tags.

param probes int = None Number of pgvector probes to use. Default to be computed by sdk.

param k *int = 1 Number of results to return.

param score_threshold *int = 1 Return results less than score_threshold

param conditions List[Dict[str, any]] = [] List of conditions to filter by. Filtering performs best on partitioned columns. Note that the supported conditions differ between tcvector and pgvector.

tcvector: a condition consists of:

field string: Column name.
operator string: Comparison operator between field and values.
values any: The value to be compared against. If it is a list, the operator should be IN.

Operator values for tcvector:

For string, =, !=, in, not in. in operations only apply to string to list comparison.
For uint64, >, !=, >=, =, <, <=.
For array, in, include, exclude, include all.

pgvector: a condition consists of:

field string: Column name.
operator string: Comparison operator between field and values. Some examples are <, ==, IN.
values any: The value to be compared against. If it is a list, the operator should be IN.
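
To make the condition shape concrete, here is a small sketch that builds condition dictionaries with the three documented keys. The `condition` helper, its list check, and the column names in the usage comment are assumptions for illustration; the SDK may validate conditions differently.

```python
from typing import Any, Dict

# Operators that the reference above pairs with list values.
LIST_OPERATORS = {"in", "not in", "include", "exclude", "include all"}


def condition(field: str, operator: str, values: Any) -> Dict[str, Any]:
    """Build one condition dict with the keys field, operator and values."""
    if isinstance(values, list) and operator.lower() not in LIST_OPERATORS:
        raise ValueError(
            f"operator {operator!r} cannot be combined with a list of values"
        )
    return {"field": field, "operator": operator, "values": values}


# Usage with hypothetical columns "category" and "year":
# conditions = [
#     condition("category", "in", ["news", "blog"]),
#     condition("year", ">=", 2023),
# ]
# results = db.similarity_search_with_score(
#     embedding=query_vector, table_name="your_table", k=5, conditions=conditions
# )
```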

param operators *List[str] = [] Deprecated.

param distance *DistanceStrategy = DistanceStrategy.EUCLIDEAN Deprecated.

param table_name str = ""

param search_fields List[str] = [] List of column names to select and include in results.

param content str = "" Provide value if using internal tc embedding model.

returns List[Dict[str, any]] Results will always include the following fields, in addition to those specified in search_fields:

text: From contents column.
metadata: From metadatas column.
score: Computed similarity score between embeddings column and given embedding using distance metric. Lower score means higher similarity.
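
Since every result carries text, metadata and score, post-processing can be kept generic. The sketch below sorts results so the most similar come first, relying on the statement above that a lower score means higher similarity. `rank_results` and the `max_score` cutoff are hypothetical, not part of the SDK.

```python
from typing import Any, Dict, List, Optional


def rank_results(
    results: List[Dict[str, Any]], max_score: Optional[float] = None
) -> List[Dict[str, Any]]:
    """Sort results by ascending score, optionally dropping scores above max_score."""
    kept = [r for r in results if max_score is None or r["score"] <= max_score]
    return sorted(kept, key=lambda r: r["score"])


# Usage:
# for row in rank_results(results, max_score=0.5):
#     print(row["score"], row["text"], row["metadata"])
```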

VectorDB.similarity_search_with_score_multiple

Retrieves the top k results whose similarity under the index's distance metric is >= score_threshold (<= score_threshold when using L2 or pgvector), from either the table catalogue or the specified table.

param embeddings List[List[float]] Use None if using internal tc embedding model. To use internal model, you must first vector_db.create_table with the desired model_name.

param tags List[str] Deprecated. Search all tables from catalogue in pgvector, matching specified tags.

param probes int = None Number of pgvector probes to use. Default to be computed by sdk.

param k *int = 1 Number of results to return.

param score_threshold *int = 1 Return results less than score_threshold

param conditions List[Dict[str, any]] = []

param operators *List[str] = [] Deprecated.

param distance *DistanceStrategy = DistanceStrategy.EUCLIDEAN Deprecated.

param table_name str = ""

param search_fields List[str] = [] List of column names to select and include in results.

param contents List[str] = [] Provide value if using internal tc embedding model.

returns List[Dict[str, any]] Results will always include the following fields, in addition to those specified in search_fields:

text: From contents column.
metadata: From metadatas column.
score: Computed similarity score between embeddings column and given embedding using distance metric. Lower score means higher similarity.

VectorDB.reindex

Should always be used after inserting data. Now using FLAT index. ~~tcvector requires that there are between [30 * nlist, 256 * nlist] rows of data.~~

param table_name str

param force bool = False

VectorDB.insert_custom_data_table

Suggested method for inserting a single row of data into a table.

param table_name str

param embedding List[float] | None Use None if using internal tc embedding model. To use internal model, you must first vector_db.create_table with the desired model_name.

param partitions List[str] = [] List of columns that the table is partitioned by. Beneficial for search and index speed when filtering partitioned columns.

param filters List[str] = [] For tcvector. List of columns that the table can be filtered by, not used in uniqueness tests.

param uses_primary_key bool = True Handles primary key conflicts with UPDATE instead of INSERT.

param build_index bool = False Add vector to index upon insertion, recommended only if inserts and updates are frequent. Default False.

param **extra Keyword arguments for additional columns, ..., column_name=column_value, ....

VectorDB.insert_custom_data_table_multiple

Suggested method for inserting multiple rows of data into a table.

param table_name str

param embeddings List[List[float]] | None Use None if using internal tc embedding model. To use internal model, you must first vector_db.create_table with the desired model_name. Max batch_size for tc is 20.

param partitions List[str] = [] List of columns that the table is partitioned by. Beneficial for search and index speed when filtering partitioned columns.

param filters List[str] = [] For tcvector. List of columns that the table can be filtered by, not used in uniqueness tests.

param uses_primary_key bool = True Handles primary key conflicts with UPDATE instead of INSERT.

param build_index bool = False Add vector to index upon insertion, recommended only if inserts and updates are frequent. Default False.

param **extra Keyword arguments for additional columns, all values must be arrays with equal length to embeddings, ..., column_name=column_value, ....
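
Because the tc batch size is capped at 20 and every extra column must match the length of embeddings, larger inserts need to be split. The following is a sketch of one way to do that, assuming `db` is a VectorDB instance. `iter_batches` and `insert_in_batches` are hypothetical helpers, and `contents` in the usage comment is just an example column name.

```python
from typing import Any, Dict, Iterator, List, Tuple

TC_MAX_BATCH = 20  # maximum batch size for tc, per the embeddings parameter above

Batch = Tuple[List[List[float]], Dict[str, List[Any]]]


def iter_batches(
    embeddings: List[List[float]],
    extra: Dict[str, List[Any]],
    batch_size: int = TC_MAX_BATCH,
) -> Iterator[Batch]:
    """Yield aligned slices of the embeddings and of every extra column."""
    for name, column in extra.items():
        if len(column) != len(embeddings):
            raise ValueError(f"column {name!r} must have the same length as embeddings")
    for start in range(0, len(embeddings), batch_size):
        end = start + batch_size
        yield embeddings[start:end], {n: c[start:end] for n, c in extra.items()}


def insert_in_batches(
    db: Any, table_name: str, embeddings: List[List[float]], **extra: List[Any]
) -> int:
    """Insert all rows in batches of at most TC_MAX_BATCH; return the batch count."""
    batches = 0
    for emb_batch, extra_batch in iter_batches(embeddings, extra):
        db.insert_custom_data_table_multiple(
            table_name=table_name, embeddings=emb_batch, **extra_batch
        )
        batches += 1
    return batches


# Usage:
# insert_in_batches(db, "your_table", vectors, contents=texts)
```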

VectorDB.from_documents

Deprecated. Used to insert data from langchain into table_name under general table catalogue.

param table_name str

param documents List[langchain.docstore.document.Document] List of langchain documents.

param embeddingModel List[List[float]] Model used for computing embeddings.

param tags List[str] List of tags to describe the table.

param dimensions int Length of embedding. OpenAI ada002 embedding length is 1536.

VectorDB.from_existing_documents

Deprecated. Used to insert data under general table catalogue.

param table_name str

param contents List[str] List of text contents.

param metadatas Optional[List[Dict[str, any]]] List of metadatas. Set None if not used.

param embeddings List[List[float]] List of embeddings.

param tags List[str] List of tags to describe the table.

param dimensions int Length of embedding. OpenAI ada002 embedding length is 1536.

VectorDB.create_table

Creates a table, only for tcvector.

param table_name str

param indices Dict[str, vector_db_sdk.constants.IndexType] A dictionary mapping index column names to their types. Required for any column used in filtering. contents is included by default.

param description str

param vector_length int

param num_rows int = 1 Estimated lower bound for the number of rows in the table, used to compute n_lists. If unsure, use a value of 1. Note that if there are fewer rows in the table than the specified value, index building may fail. The purpose of setting this number is to try to maximize the nlists used in building the index, which affects the query speed of bigger tables.

param model_name str = "" Model name for internal tc embedding model, leave empty if using external model. See values under tcvectordb.model.enum.EmbeddingModel, suggested BAAI/bge-m3.

VectorDB.delete_row_by_id

Delete single row.

param table_name str

param partitions_list List[Dict[str, any]] A list of partition key mappings for the row, each from column name to value. Should be the same length as contents.

param contents List[str] The text content of the row.

param ids List[str] = [] The raw id of the row, for tcvector.

returns int Affected rows.

VectorDB.delete_rows

Delete rows following condition.

param table_name str

param conditions List[Dict[str, any]] = [] List of conditions to filter by. See VectorDB.similarity_search_with_score conditions.

returns int Affected rows.

VectorDB.delete_table

Delete a table.

param table_name str

VectorDB.query

Send a query for tcvector to receive a list of rows.

param table_name str

param limit int = 16384 Limits the number of rows returned, must be within [1, 16384].

param offset int = 0 Number of rows to skip. To be used for retrieving rows in batches when the total number of rows exceeds 16384.

param conditions List[Dict[str, any]] = [] List of conditions to filter by. See VectorDB.similarity_search_with_score conditions.

param output_fields List[str] = [] List of column names to be selected in output. If empty, will select all columns except vector. Note that the following columns are compulsory and fixed in tcvector: id, vector.

returns List[Dict[str, any]] Every dict in the list represents 1 row of data mapping the column name to its value.
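
For tables larger than the 16384-row limit, limit and offset can be combined to page through the rows. A minimal sketch, assuming `db` is a VectorDB instance connected to tcvector; `iter_all_rows` is a hypothetical helper, not part of the SDK.

```python
from typing import Any, Dict, Iterator

MAX_LIMIT = 16384  # upper bound documented for the limit parameter


def iter_all_rows(
    db: Any, table_name: str, page_size: int = MAX_LIMIT
) -> Iterator[Dict[str, Any]]:
    """Yield every row of a table by paging through VectorDB.query."""
    if not 1 <= page_size <= MAX_LIMIT:
        raise ValueError(f"page_size must be within [1, {MAX_LIMIT}]")
    offset = 0
    while True:
        rows = db.query(table_name=table_name, limit=page_size, offset=offset)
        if not rows:
            break
        yield from rows
        if len(rows) < page_size:
            break  # a short page means the table is exhausted
        offset += len(rows)


# Usage:
# for row in iter_all_rows(db, "your_table"):
#     print(row["id"])
```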

VectorDB.row_count

Returns the row count of the table, for tcvector.

param table_name str

returns int

VectorDB.delete_collection

Deprecated. Deletes a table from catalogue.

VectorDB.retrieve_all_collection

Deprecated. Returns all tables and tags in catalogue.

VectorDB.custom_similarity_search

Deprecated.

Release history

This version: 0.0.19 (Jul 29, 2025)

Download files

Source Distribution

vector_db_sdk-0.0.19.tar.gz (23.6 kB)

Uploaded Jul 29, 2025 Source

Built Distribution

vector_db_sdk-0.0.19-py3-none-any.whl (21.1 kB)

Uploaded Jul 29, 2025 Python 3

File details

Details for the file vector_db_sdk-0.0.19.tar.gz.

File metadata

Download URL: vector_db_sdk-0.0.19.tar.gz
Upload date: Jul 29, 2025
Size: 23.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.9.6

File hashes

Hashes for vector_db_sdk-0.0.19.tar.gz
Algorithm	Hash digest
SHA256	`9abc764b918273649311b92ffd86a84a7e49508dececfc94cb02ca263295f46b`
MD5	`be6de83c706e86130e4565ec9511a08d`
BLAKE2b-256	`1e800f84454605e584ce608a9423ac09d74c3424e34ade828720ef11073c8567`

File details

Details for the file vector_db_sdk-0.0.19-py3-none-any.whl.

File metadata

Download URL: vector_db_sdk-0.0.19-py3-none-any.whl
Upload date: Jul 29, 2025
Size: 21.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.9.6

File hashes

Hashes for vector_db_sdk-0.0.19-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ce61922f1e349d3f0ba4c3871d0dd605519a8104bd121f9577e582ced2d537c2`
MD5	`c53643fd712209f293037c26c31cc70f`
BLAKE2b-256	`29c67f77c61a88191bba6376f24d6fa1c16b68ce2157dc7b840e517097d2541d`
