
A Python client for TiDB Vector

tidb-vector-python

This is a Python client for TiDB Vector.

Currently, only TiDB Cloud Serverless clusters support the vector data type. See the documentation for more information.

Installation

pip install tidb-vector

Usage

TiDB Vector supports the following distance functions:

  • L1Distance
  • L2Distance
  • CosineDistance
  • NegativeInnerProduct
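
For reference, the four distances can be sketched in plain Python. This is only an illustration using the conventional mathematical definitions (cosine distance taken as 1 minus cosine similarity), which are assumed here to match what TiDB computes; the helper names are hypothetical and are not part of tidb-vector.

```python
import math


def l1_distance(a, b):
    """Manhattan distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def negative_inner_product(a, b):
    """Negated dot product, so that a larger dot product sorts first."""
    return -sum(x * y for x, y in zip(a, b))


stored = [1, 2, 3]
query = [1, 2, 3.1]
for fn in (l1_distance, l2_distance, cosine_distance, negative_inner_product):
    print(fn.__name__, fn(stored, query))
```

In every case a smaller value means a closer match, which is why the queries in this README sort by distance in ascending order.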

It also supports an HNSW index with L2 or cosine distance to speed up search. For more details, see Vector Search Indexes in TiDB.

The following ORMs and frameworks are supported:

SQLAlchemy

Learn how to connect to TiDB Serverless in the TiDB Cloud documentation.

Define a table with a vector field

from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.orm import declarative_base
from tidb_vector.sqlalchemy import VectorType

engine = create_engine('mysql://****.root:******@gateway01.xxxxxx.shared.aws.tidbcloud.com:4000/test')
Base = declarative_base()

class Test(Base):
    __tablename__ = 'test'
    id = Column(Integer, primary_key=True)
    embedding = Column(VectorType(3))

# Or add an HNSW index when creating the table
class TestWithIndex(Base):
    __tablename__ = 'test_with_index'
    id = Column(Integer, primary_key=True)
    embedding = Column(VectorType(3), comment="hnsw(distance=l2)")

Base.metadata.create_all(engine)

Insert vector data

from sqlalchemy.orm import Session

session = Session(engine)
test = Test(embedding=[1, 2, 3])
session.add(test)
session.commit()

Get the nearest neighbors

from sqlalchemy import select

session.scalars(select(Test).order_by(Test.embedding.l2_distance([1, 2, 3.1])).limit(5))
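
To make the semantics of this query concrete, here is a minimal brute-force sketch of what "order by L2 distance, limit k" computes. It runs entirely in memory on hypothetical rows and does not use TiDB or tidb-vector; the database performs the equivalent work server-side, and an HNSW index, being an approximate method, may not always return exactly the same rows.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(rows, query, k):
    """Return the k (id, embedding) rows closest to query, nearest first."""
    return sorted(rows, key=lambda row: l2_distance(row[1], query))[:k]


# Hypothetical table contents as (id, embedding) pairs.
rows = [
    (1, [1, 2, 3]),
    (2, [4, 5, 6]),
    (3, [1, 2, 4]),
]

top = nearest_neighbors(rows, [1, 2, 3.1], k=2)
print([row_id for row_id, _ in top])  # prints [1, 3]
```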

Get the distance

session.scalars(select(Test.embedding.l2_distance([1, 2, 3.1])))

Get within a certain distance

session.scalars(select(Test).filter(Test.embedding.l2_distance([1, 2, 3.1]) < 0.2))

Django

To use a vector field in Django, use django-tidb.

Peewee

Define a Peewee table with a vector field

from peewee import Model, MySQLDatabase, SQL
from tidb_vector.peewee import VectorField

# Option 1: using `pymysql` as the driver
connect_kwargs = {
    'ssl_verify_cert': True,
    'ssl_verify_identity': True,
}

# Option 2: using `mysqlclient` as the driver
# (keep only one of the two `connect_kwargs` definitions)
connect_kwargs = {
    'ssl_mode': 'VERIFY_IDENTITY',
    'ssl': {
        # Root certificate default path
        # https://docs.pingcap.com/tidbcloud/secure-connections-to-serverless-clusters/#root-certificate-default-path
        'ca': '/etc/ssl/cert.pem'  # macOS
    },
}

db = MySQLDatabase(
    'peewee_test',
    user='xxxxxxxx.root',
    password='xxxxxxxx',
    host='xxxxxxxx.shared.aws.tidbcloud.com',
    port=4000,
    **connect_kwargs,
)

class TestModel(Model):
    class Meta:
        database = db
        table_name = 'test'

    embedding = VectorField(3)

# Or add an HNSW index when creating the table
class TestModelWithIndex(Model):
    class Meta:
        database = db
        table_name = 'test_with_index'

    embedding = VectorField(3, constraints=[SQL("COMMENT 'hnsw(distance=l2)'")])


db.connect()
db.create_tables([TestModel, TestModelWithIndex])

Insert vector data

TestModel.create(embedding=[1, 2, 3])

Get the nearest neighbors

TestModel.select().order_by(TestModel.embedding.l2_distance([1, 2, 3.1])).limit(5)

Get the distance

TestModel.select(TestModel.embedding.cosine_distance([1, 2, 3.1]).alias('distance'))

Get within a certain distance

TestModel.select().where(TestModel.embedding.l2_distance([1, 2, 3.1]) < 0.5)

TiDB Vector Client

Instead of working with an ORM, you can use the built-in TiDBVectorClient directly to interact with TiDB Vector, as the LangChain and LlamaIndex integrations do. This approach removes the need to manage the underlying ORM and simplifies working with the vector store.

TiDBVectorClient is based on SQLAlchemy. To install it, run pip install tidb-vector[client].

Create a TiDBVectorClient instance:

from tidb_vector.integrations import TiDBVectorClient

TABLE_NAME = 'vector_test'
CONNECTION_STRING = 'mysql+pymysql://<USER>:<PASSWORD>@<HOST>:4000/<DB>?ssl_verify_cert=true&ssl_verify_identity=true'

tidb_vs = TiDBVectorClient(
    # the table which will store the vector data
    table_name=TABLE_NAME,
    # tidb connection string
    connection_string=CONNECTION_STRING,
    # the dimension of the vector; this example uses the ada model, which has 1536 dimensions
    vector_dimension=1536,
    # whether to drop and recreate the table if it already exists
    drop_existing_table=True,
)
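
If the user name or password contains characters such as `@` or `/`, they need to be percent-encoded before being placed in the connection URL. The sketch below shows one way to assemble the connection string with the standard library; `build_connection_string` and the credentials are hypothetical and are not part of tidb-vector.

```python
from urllib.parse import quote_plus


def build_connection_string(user, password, host, database, port=4000):
    """Build a mysql+pymysql URL, percent-encoding the credentials."""
    return (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
        "?ssl_verify_cert=true&ssl_verify_identity=true"
    )


# Hypothetical credentials, for illustration only.
url = build_connection_string("abc123.root", "p@ss/word", "example-host", "test")
print(url)
```

The resulting string can be passed as `connection_string` in the same way as `CONNECTION_STRING` above.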

Bulk insert:

ids = [
    "f8e7dee2-63b6-42f1-8b60-2d46710c1971",
    "8dde1fbc-2522-4ca2-aedf-5dcb2966d1c6",
    "e4991349-d00b-485c-a481-f61695f2b5ae",
]
documents = ["foo", "bar", "baz"]
embeddings = [
    text_to_embedding("foo"),
    text_to_embedding("bar"),
    text_to_embedding("baz"),
]
metadatas = [
    {"page": 1, "category": "P1"},
    {"page": 2, "category": "P1"},
    {"page": 3, "category": "P2"},
]

tidb_vs.insert(
    ids=ids,
    texts=documents,
    embeddings=embeddings,
    metadatas=metadatas,
)
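
The snippet above calls `text_to_embedding`, which this README does not define; in a real application it would call an embedding model. For trying out the client without a model, a toy stand-in could look like the sketch below. It is an assumption for illustration only: the vectors are deterministic but carry no semantic meaning, so the search results will not be meaningful either. The default dimension matches the `vector_dimension=1536` used when creating the client.

```python
import hashlib
import math
from typing import List


def text_to_embedding(text: str, dimension: int = 1536) -> List[float]:
    """Toy stand-in: a deterministic, unit-length pseudo-embedding.

    Derived from SHA-256 digests of the text; NOT semantically meaningful.
    """
    values: List[float] = []
    counter = 0
    while len(values) < dimension:
        digest = hashlib.sha256(f"{text}:{counter}".encode("utf-8")).digest()
        # Map each byte (0..255) to a float in [-1, 1].
        values.extend(byte / 127.5 - 1.0 for byte in digest)
        counter += 1
    values = values[:dimension]
    # Normalize to unit length so distances are on a comparable scale.
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values]


vector = text_to_embedding("foo")
print(len(vector))  # prints 1536
```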

Query:

tidb_vs.query(text_to_embedding("foo"), k=3)

# query with filter
tidb_vs.query(text_to_embedding("foo"), k=3, filter={"category": "P1"})
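
As a rough mental model, a filter such as `{"category": "P1"}` can be read as "keep only records whose metadata has that key with that value". The sketch below illustrates this equality matching in plain Python on the metadata from the bulk-insert example. It is an assumption about the behavior, not the client's implementation: the client applies the filter in the database, and `matches_filter` is a hypothetical helper.

```python
def matches_filter(metadata, metadata_filter):
    """True if every key in the filter is present in metadata with an equal value."""
    return all(
        key in metadata and metadata[key] == value
        for key, value in metadata_filter.items()
    )


metadatas = [
    {"page": 1, "category": "P1"},
    {"page": 2, "category": "P1"},
    {"page": 3, "category": "P2"},
]

kept = [m["page"] for m in metadatas if matches_filter(m, {"category": "P1"})]
print(kept)  # prints [1, 2]
```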

Bulk delete:

tidb_vs.delete(["f8e7dee2-63b6-42f1-8b60-2d46710c1971"])

# delete with filter
tidb_vs.delete(["f8e7dee2-63b6-42f1-8b60-2d46710c1971"], filter={"category": "P1"})

Examples

The examples directory shows how to use tidb-vector-python to interact with TiDB Vector in different scenarios.

Contributing

Please feel free to reach out to the maintainers if you have any questions or need help with the project. Before contributing, please read the CONTRIBUTING.md file.
