Skip to main content

A CLI tool for migrating data from third-party vector databases to TiDB

Project description

vec2tidb

A CLI tool for migrating vector data from vector databases to TiDB.

Python License

┌──────────────────┐    ┌────────────────┐    ┌────────────┐
│ Vector Database  │───▶│  vec2tidb CLI  │───▶│    TiDB    │
└──────────────────┘    └────────────────┘    └────────────┘

Supported vector databases:

  • Qdrant

Why migrate from vector database to TiDB?

TiDB is an open-source, distributed SQL database for modern AI applications:

  • 🗄️ Unified storage: Store vector embeddings, documents, knowledge graphs, and operational data in a single database to reduce maintenance overhead.
  • 🔍 Native SQL support: Run complex queries with full SQL capabilities, including joins, subqueries, aggregations, and advanced analytics.
  • 📈 Effortless scalability: Scale out horizontally with ease to handle dynamic and growing workloads.
  • 🔒 Strong consistency: Ensure data integrity and reliability with ACID transactions and strong consistency guarantees.

Installation

To install the latest version, you can use the following command:

pip install vec2tidb

Usage

To show all commands, use the following command:

vec2tidb --help

Qdrant Commands

To show all qdrant subcommands, use the following command:

vec2tidb qdrant --help

[!NOTE]

It is recommended to use qdrant dump to export the vector data to CSV file, upload the CSV file to S3 (or other cloud storage), and then use Import feature in TiDB Cloud to import the data to TiDB.

Command: qdrant migrate

To migrate vectors from Qdrant collection to a new TiDB table, use create mode.

vec2tidb qdrant migrate \
  --qdrant-api-url http://localhost:6333 \
  --qdrant-collection-name test_collection \
  --tidb-database-url mysql+pymysql://root:@localhost:4000/test

To migrate the vectors from Qdrant collection to an existing TiDB table, use update mode.

vec2tidb qdrant migrate \
  --qdrant-api-url http://localhost:6333 \
  --qdrant-collection-name test_collection \
  --tidb-database-url mysql+pymysql://root:@localhost:4000/test \
  --mode update \
  --table-name test_table \
  --id-column id \
  --vector-column vector \
  --payload-column payload

Command Options

Option Description
--mode Migration mode: create (create new table) or update (update existing table by ID). Default: create
--qdrant-api-url Qdrant API endpoint. Default: http://localhost:6333
--qdrant-api-key Qdrant API key (if authentication is enabled)
--qdrant-collection-name Name of the source Qdrant collection (required)
--tidb-database-url TiDB connection string. Default: mysql+pymysql://root:@localhost:4000/test
--table-name Target TiDB table name. Required in update mode; defaults to collection name in create mode
--id-column ID column name in TiDB table. Required in update mode; default: id in create mode
--id-column-type ID column type in TiDB table. Default: BIGINT
--vector-column Vector column name in TiDB table. Required in update mode; default: vector in create mode
--payload-column Payload column name in TiDB table. Optional in update mode; default: payload in create mode
--batch-size Batch size for migration. Default: 100
--workers Number of concurrent workers for migration. Default: 1
--drop-table Drop the target table if it exists (flag)

Environment Variables:

The following options can also be set via environment variables:

Variable Description
QDRANT_API_URL Qdrant API endpoint. Default: http://localhost:6333
QDRANT_API_KEY Qdrant API key (if authentication is enabled)
QDRANT_COLLECTION_NAME Qdrant collection name
TIDB_DATABASE_URL TiDB connection string. Default: mysql+pymysql://root:@localhost:4000/test

For example:

export QDRANT_API_URL="http://localhost:6333"
export QDRANT_API_KEY="your-api-key"
export QDRANT_COLLECTION_NAME="my_collection"
export TIDB_DATABASE_URL="mysql+pymysql://root:@localhost:4000/test"

Command: qdrant load-sample

To load a sample dataset into Qdrant collection.

vec2tidb qdrant load-sample \
  --qdrant-api-url http://localhost:6333 \
  --qdrant-collection-name sample_collection \
  --dataset midlib

Command Options

Option Description
--qdrant-api-url Qdrant API endpoint. Default: http://localhost:6333
--qdrant-api-key Qdrant API key (if authentication is enabled)
--qdrant-collection-name Name of the target Qdrant collection (required)
--dataset Sample dataset to load: midlib, qdrant-docs, prefix-cache. Default: midlib (required)
--snapshot-uri Custom snapshot URI (auto-determined from dataset if not provided)

Command: qdrant dump

Export Qdrant collection data to CSV format with optimized performance.

vec2tidb qdrant dump \
  --qdrant-collection-name my_collection \
  --output-file export.csv

Command Options

Option Description
--qdrant-collection-name Qdrant collection name (required)
--output-file Output CSV file path (required)
--limit Maximum number of records to export
--no-vectors Exclude vector data from export
--no-payload Exclude payload data from export
--batch-size Batch size for processing (default: 500)
--max-concurrent-batches Concurrent batch requests (default: 5)
--buffer-size File buffer size in bytes (default: 10000)

Command: qdrant benchmark

To run performance benchmarks with different configurations.

vec2tidb qdrant benchmark \
  --qdrant-api-url http://localhost:6333 \
  --qdrant-collection-name test_collection \
  --tidb-database-url mysql+pymysql://root:@localhost:4000/test \
  --dataset midlib \
  --workers 1,2,4 \
  --batch-sizes 100,500

Command Options

Option Description
--qdrant-api-url Qdrant API endpoint. Default: http://localhost:6333
--qdrant-api-key Qdrant API key (if authentication is enabled)
--qdrant-collection-name Name of the source Qdrant collection (required)
--tidb-database-url TiDB connection string. Default: mysql+pymysql://root:@localhost:4000/test
--dataset Auto-load sample dataset: midlib, qdrant-docs, prefix-cache
--snapshot-uri Custom snapshot URI for auto-loading data (overrides --dataset)
--workers Comma-separated list of worker counts to test. Default: 1,2,4,8
--batch-sizes Comma-separated list of batch sizes to test. Default: 100,500,1000
--table-prefix Prefix for benchmark table names. Default: benchmark_test

Development

For development setup and contribution guidelines, see DEVELOPMENT.md.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vec2tidb-0.0.3.tar.gz (285.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

vec2tidb-0.0.3-py3-none-any.whl (19.8 kB view details)

Uploaded Python 3

File details

Details for the file vec2tidb-0.0.3.tar.gz.

File metadata

  • Download URL: vec2tidb-0.0.3.tar.gz
  • Upload date:
  • Size: 285.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.8

File hashes

Hashes for vec2tidb-0.0.3.tar.gz
Algorithm Hash digest
SHA256 12140681fe84494ea18feb706ed7b37a804fa1748a233198214d314573e39c46
MD5 e4ef5d29dd3378dbb82f0eadd16b0f4e
BLAKE2b-256 8eecd14ad4bd718fc0afe556ebadd738a5427d590c68303fe8297f96d6b2b07f

See more details on using hashes here.

File details

Details for the file vec2tidb-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: vec2tidb-0.0.3-py3-none-any.whl
  • Upload date:
  • Size: 19.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.8

File hashes

Hashes for vec2tidb-0.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 49718f87422b4231192104047f22afb2d4e93e40e309fc082ba1d225e7e43c51
MD5 25d85dc13284d8d3a04a2a8991e4d283
BLAKE2b-256 93b42cfc4f56e19032ecd6f06292749e4d5b401545c0599e0baadcef783b1d9b

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page