A Python client for TiDB Vector

tidb-vector-python

This is a Python client for TiDB Vector.

Currently, only TiDB Cloud Serverless clusters support the vector data type; see the docs for more information.

Installation

pip install tidb-vector

Usage

TiDB Vector supports the following distance functions:

  • L1Distance
  • L2Distance
  • CosineDistance
  • NegativeInnerProduct

It also supports using an HNSW index with L2 or cosine distance to speed up the search. For more details, see Vector Search Indexes in TiDB.
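For reference, the four distance functions listed above can be sketched in plain Python. This is a minimal illustration using the standard textbook definitions, not the server-side implementation; the helper names below are hypothetical, and TiDB's handling of edge cases (such as zero vectors) may differ.

```python
import math

def l1_distance(a, b):
    """Sum of absolute coordinate differences (Manhattan distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def l2_distance(a, b):
    """Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 minus cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def negative_inner_product(a, b):
    """Negated dot product, so that smaller values mean more similar."""
    return -sum(x * y for x, y in zip(a, b))

a, b = [1, 2, 3], [1, 2, 3.1]
print(l1_distance(a, b), l2_distance(a, b))
print(cosine_distance(a, b), negative_inner_product(a, b))
```

In all four cases a smaller value means a closer match, which is why the queries in this README sort in ascending order.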

The following ORMs and frameworks are supported: SQLAlchemy, Django, Peewee, and the built-in TiDB Vector Client.

SQLAlchemy

Learn how to connect to TiDB Serverless in the TiDB Cloud documentation.

Define a table with a vector field

from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.orm import declarative_base
from tidb_vector.sqlalchemy import VectorType

engine = create_engine('mysql://****.root:******@gateway01.xxxxxx.shared.aws.tidbcloud.com:4000/test')
Base = declarative_base()

class Test(Base):
    __tablename__ = 'test'
    id = Column(Integer, primary_key=True)
    embedding = Column(VectorType(3))

# Or add an HNSW index when creating the table
class TestWithIndex(Base):
    __tablename__ = 'test_with_index'
    id = Column(Integer, primary_key=True)
    embedding = Column(VectorType(3), comment="hnsw(distance=l2)")

Base.metadata.create_all(engine)

Insert vector data

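from sqlalchemy.orm import Session

session = Session(engine)  # session bound to the engine created above
test = Test(embedding=[1, 2, 3])
session.add(test)
session.commit()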

Get the nearest neighbors

from sqlalchemy import select
session.scalars(select(Test).order_by(Test.embedding.l2_distance([1, 2, 3.1])).limit(5))

Get the distance

session.scalars(select(Test.embedding.l2_distance([1, 2, 3.1])))

Get within a certain distance

session.scalars(select(Test).filter(Test.embedding.l2_distance([1, 2, 3.1]) < 0.2))
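To make the semantics of the nearest-neighbor and distance-threshold queries concrete, here is a brute-force sketch in plain Python. It only illustrates what the queries compute; the in-memory `rows` dictionary and the `nearest` and `within` helpers are hypothetical and are not part of tidb-vector.

```python
import math

# Hypothetical in-memory "table": row id -> embedding
rows = {
    1: [1.0, 2.0, 3.0],
    2: [1.0, 2.0, 3.5],
    3: [4.0, 5.0, 6.0],
}
query = [1.0, 2.0, 3.1]

def nearest(rows, query, limit):
    """Row ids by ascending L2 distance to query, like ORDER BY ... LIMIT."""
    ranked = sorted(rows, key=lambda rid: math.dist(rows[rid], query))
    return ranked[:limit]

def within(rows, query, threshold):
    """Row ids whose L2 distance to query is below threshold, like a WHERE clause."""
    return [rid for rid, emb in rows.items() if math.dist(emb, query) < threshold]

print(nearest(rows, query, limit=2))
print(within(rows, query, threshold=0.2))
```

The database evaluates the same ordering and filter server-side, and an HNSW index can replace the full scan with an approximate search.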

Django

To use a vector field in Django, use django-tidb.

Peewee

Define a Peewee table with a vector field

from peewee import Model, MySQLDatabase, SQL
from tidb_vector.peewee import VectorField

# Using `pymysql` as the driver
connect_kwargs = {
    'ssl_verify_cert': True,
    'ssl_verify_identity': True,
}

# Using `mysqlclient` as the driver (keep only the `connect_kwargs` that matches your driver)
connect_kwargs = {
    'ssl_mode': 'VERIFY_IDENTITY',
    'ssl': {
        # Root certificate default path
        # https://docs.pingcap.com/tidbcloud/secure-connections-to-serverless-clusters/#root-certificate-default-path
        'ca': '/etc/ssl/cert.pem'  # MacOS
    },
}

db = MySQLDatabase(
    'peewee_test',
    user='xxxxxxxx.root',
    password='xxxxxxxx',
    host='xxxxxxxx.shared.aws.tidbcloud.com',
    port=4000,
    **connect_kwargs,
)

class TestModel(Model):
    class Meta:
        database = db
        table_name = 'test'

    embedding = VectorField(3)

# Or add an HNSW index when creating the table
class TestModelWithIndex(Model):
    class Meta:
        database = db
        table_name = 'test_with_index'

    embedding = VectorField(3, constraints=[SQL("COMMENT 'hnsw(distance=l2)'")])


db.connect()
db.create_tables([TestModel, TestModelWithIndex])

Insert vector data

TestModel.create(embedding=[1, 2, 3])

Get the nearest neighbors

TestModel.select().order_by(TestModel.embedding.l2_distance([1, 2, 3.1])).limit(5)

Get the distance

TestModel.select(TestModel.embedding.cosine_distance([1, 2, 3.1]).alias('distance'))

Get within a certain distance

TestModel.select().where(TestModel.embedding.l2_distance([1, 2, 3.1]) < 0.5)

TiDB Vector Client

You can also use the built-in TiDBVectorClient directly, as the LangChain and LlamaIndex integrations do. It abstracts away the underlying ORM, which simplifies interaction with the vector store.

TiDBVectorClient is built on SQLAlchemy. Install it with pip install "tidb-vector[client]".

Create a TiDBVectorClient instance:

from tidb_vector.integrations import TiDBVectorClient

TABLE_NAME = 'vector_test'
CONNECTION_STRING = 'mysql+pymysql://<USER>:<PASSWORD>@<HOST>:4000/<DB>?ssl_verify_cert=true&ssl_verify_identity=true'

tidb_vs = TiDBVectorClient(
    # the table which will store the vector data
    table_name=TABLE_NAME,
    # tidb connection string
    connection_string=CONNECTION_STRING,
    # the vector dimension; this example uses the ada model, which has 1536 dimensions
    vector_dimension=1536,
    # whether to drop and recreate the table if it already exists
    drop_existing_table=True,
)

Bulk insert:

ids = [
    "f8e7dee2-63b6-42f1-8b60-2d46710c1971",
    "8dde1fbc-2522-4ca2-aedf-5dcb2966d1c6",
    "e4991349-d00b-485c-a481-f61695f2b5ae",
]
documents = ["foo", "bar", "baz"]
embeddings = [
    text_to_embedding("foo"),
    text_to_embedding("bar"),
    text_to_embedding("baz"),
]
metadatas = [
    {"page": 1, "category": "P1"},
    {"page": 2, "category": "P1"},
    {"page": 3, "category": "P2"},
]

tidb_vs.insert(
    ids=ids,
    texts=documents,
    embeddings=embeddings,
    metadatas=metadatas,
)
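The snippets in this section call `text_to_embedding`, which this README does not define. As a stand-in for local experimentation, the sketch below returns a deterministic pseudo-random vector of the right dimension. It is a hypothetical placeholder with no semantic meaning; in practice you would replace its body with a call to a real embedding model.

```python
import hashlib
import random

VECTOR_DIMENSION = 1536  # matches vector_dimension passed to TiDBVectorClient

def text_to_embedding(text, dim=VECTOR_DIMENSION):
    """Placeholder: deterministic pseudo-random vector derived from the text.

    Not semantically meaningful; swap in a real embedding model for real use.
    """
    # Derive a stable seed from the text so the same input gives the same vector
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

print(len(text_to_embedding("foo")))
```

Because the vectors are random, similarity results from this placeholder only exercise the insert and query paths; they say nothing about how related the texts are.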

Query:

tidb_vs.query(text_to_embedding("foo"), k=3)

# query with filter
tidb_vs.query(text_to_embedding("foo"), k=3, filter={"category": "P1"})

Bulk delete:

tidb_vs.delete(["f8e7dee2-63b6-42f1-8b60-2d46710c1971"])

# delete with filter
tidb_vs.delete(["f8e7dee2-63b6-42f1-8b60-2d46710c1971"], filter={"category": "P1"})

Examples

The examples directory shows how to use tidb-vector-python to interact with TiDB Vector in different scenarios.

Contributing

Please feel free to reach out to the maintainers if you have any questions or need help with the project. Before contributing, please read the CONTRIBUTING.md file.
