Sync Milvus data to PostgreSQL for validation

Project description

PyMilvus PostgreSQL

pymilvus-pg is a Python library for validating the correctness of data stored in Milvus. It mirrors Milvus write operations (inserts, deletes, and upserts) to a PostgreSQL database in real time, producing a shadow copy of each collection. Comparing the data in Milvus against that shadow copy lets users verify the consistency and accuracy of their Milvus deployments. Synchronization is the mechanism; data validation is the purpose.

Features

Milvus Client Extension: Provides MilvusPGClient, which extends the pymilvus MilvusClient.
Data Synchronization: Keeps data in Milvus and a PostgreSQL shadow database synchronized.
Data Export: Allows exporting collection data from the shadow PostgreSQL instance.
Query Correctness Validation: Enables verification of Milvus query results by comparing them against PostgreSQL.
Milvus Data Correctness Validation: Enables full data comparison between Milvus and PostgreSQL.
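The comparison idea behind the two validation features can be illustrated with a minimal sketch. It is not the library's implementation: `diff_rows` is a hypothetical helper, and it assumes rows from both stores are already available as lists of dicts that share a primary key field.

```python
from typing import Any


def diff_rows(
    milvus_rows: list[dict[str, Any]],
    shadow_rows: list[dict[str, Any]],
    pk: str = "id",
) -> dict[str, list[Any]]:
    """Compare two row sets by primary key and report the differences.

    Hypothetical helper for illustration; not part of pymilvus-pg.
    """
    milvus_by_pk = {row[pk]: row for row in milvus_rows}
    shadow_by_pk = {row[pk]: row for row in shadow_rows}

    # Keys present on one side only
    missing_in_shadow = sorted(milvus_by_pk.keys() - shadow_by_pk.keys())
    missing_in_milvus = sorted(shadow_by_pk.keys() - milvus_by_pk.keys())

    # Keys present on both sides whose rows differ (exact equality)
    mismatched = sorted(
        key
        for key in milvus_by_pk.keys() & shadow_by_pk.keys()
        if milvus_by_pk[key] != shadow_by_pk[key]
    )
    return {
        "missing_in_shadow": missing_in_shadow,
        "missing_in_milvus": missing_in_milvus,
        "mismatched": mismatched,
    }


milvus_rows = [{"id": 1, "age": 21}, {"id": 2, "age": 22}, {"id": 3, "age": 23}]
shadow_rows = [{"id": 1, "age": 21}, {"id": 2, "age": 99}, {"id": 4, "age": 24}]
report = diff_rows(milvus_rows, shadow_rows)
print(report)
# {'missing_in_shadow': [3], 'missing_in_milvus': [4], 'mismatched': [2]}
```

The sketch uses exact equality. Float vector fields may need a tolerance-based comparison instead, since values can change slightly when they pass through a 32-bit float representation.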

Installation

To install pymilvus-pg, either install the published package from PyPI with pip, or install from source with PDM:

# Option 1: install the published package from PyPI
pip install pymilvus-pg

# Option 2: work from source. Install PDM, then install dependencies from the project root
pip install pdm
pdm install

Usage

Here's a basic example of how to use pymilvus-pg:

from pymilvus_pg import MilvusPGClient as MilvusClient
from pymilvus.milvus_client import IndexParams
from pymilvus import DataType
import random
import time

# Initialize the client
# Replace with your Milvus URI and PostgreSQL connection string
milvus_client = MilvusClient(
    uri="http://localhost:19530",
    pg_conn_str="postgresql://user:password@localhost:5432/milvus_shadow",
)

collection_name = f"my_collection_{int(time.time())}"

# 1. Create schema
schema = milvus_client.create_schema()
schema.add_field("id", DataType.INT64, is_primary=True, auto_id=False)
schema.add_field("name", DataType.VARCHAR, max_length=100)
schema.add_field("age", DataType.INT64)
schema.add_field("json_field", DataType.JSON)
schema.add_field("array_field", DataType.ARRAY, element_type=DataType.INT64, max_capacity=10)
schema.add_field("embedding", DataType.FLOAT_VECTOR, dim=8)

# 2. Create collection
milvus_client.create_collection(collection_name, schema)

# 3. Create index for the vector field
index_params = IndexParams()
index_params.add_index("embedding", metric_type="L2", index_type="IVF_FLAT", params={"nlist": 128})
milvus_client.create_index(collection_name, index_params)

# 4. Load collection
milvus_client.load_collection(collection_name)

# 5. Insert data
data_to_insert = [
    {
        "id": i,
        "name": f"item_{i}",
        "age": 20 + i,
        "json_field": {"category": f"cat_{i%3}", "value": i * 10},
        "array_field": [i, i + 1, i + 2],
        "embedding": [random.random() for _ in range(8)]
    } for i in range(10)
]
milvus_client.insert(collection_name, data_to_insert)
print(f"Inserted {len(data_to_insert)} entities.")

# 6. Query data from Milvus (ages are 20-29, so "age > 25" matches IDs 6-9)
# Wait briefly so the insert has been synchronized to PostgreSQL
time.sleep(1)
query_res = milvus_client.query(collection_name, filter="age > 25")
print("Query results (age > 25):")
for entity in query_res:
    print(entity)

# 7. Delete data
ids_to_delete = [0, 1, 2]
milvus_client.delete(collection_name, ids=ids_to_delete)
print(f"Deleted entities with IDs: {ids_to_delete}")

# 8. Upsert data
data_to_upsert = [
    {
        "id": i,
        "name": f"updated_item_{i}",
        "age": 30 + i,
        "json_field": {"category": f"cat_updated_{i%3}", "value": i * 100},
        "array_field": [i*2, i*2 + 1, i*2 + 2],
        "embedding": [random.random() for _ in range(8)]
    } for i in range(3, 7)  # IDs 3-6 all still exist (only 0-2 were deleted), so each upsert overwrites a row
]
milvus_client.upsert(collection_name, data_to_upsert)
print(f"Upserted {len(data_to_upsert)} entities.")

# 9. Export data (from PostgreSQL)
# Wait for sync
time.sleep(1)
exported_data = milvus_client.export(collection_name)
print(f"Exported data from PostgreSQL for collection '{collection_name}':")
for row in exported_data:
    print(row)

# Clean up (optional)
# milvus_client.drop_collection(collection_name)

print("Demo finished.")
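To check the demo's output, it helps to know what the exported rows should be. The sketch below replays the same insert, delete, and upsert sequence against a plain dict that stands in for the shadow table. The dict and variable names are illustrative assumptions and do not involve pymilvus-pg, Milvus, or PostgreSQL.

```python
# Stand-in for the shadow table: primary key -> row.
# Purely illustrative; not part of pymilvus-pg.
shadow: dict[int, dict] = {}

# Step 5: insert IDs 0..9 with ages 20..29
for i in range(10):
    shadow[i] = {"id": i, "name": f"item_{i}", "age": 20 + i}

# Step 6: the filter "age > 25" should match these IDs
matching_ids = sorted(pk for pk, row in shadow.items() if row["age"] > 25)
print(matching_ids)  # [6, 7, 8, 9]

# Step 7: delete IDs 0, 1, 2
for pk in [0, 1, 2]:
    shadow.pop(pk, None)

# Step 8: upsert IDs 3..6 (all exist at this point, so each row is overwritten)
for i in range(3, 7):
    shadow[i] = {"id": i, "name": f"updated_item_{i}", "age": 30 + i}

# Step 9: the export should contain exactly these primary keys
expected_ids = sorted(shadow)
print(expected_ids)  # [3, 4, 5, 6, 7, 8, 9]
print(shadow[3]["name"], shadow[9]["name"])  # updated_item_3 item_9
```

If the real export shows a different set of IDs, or rows 3-6 still carry their original names, the shadow copy has diverged from the writes that were issued.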

License

This project is licensed under the MIT License. See the LICENSE file for details if the repository includes one; otherwise, the license is declared in pyproject.toml.

Contributing

Contributions are welcome! Please feel free to submit a pull request or open an issue.

Project details

Release history

Version	Release date	Notes
0.1.5rc5	Jul 23, 2025	pre-release
0.1.5rc4	Jul 23, 2025	pre-release
0.1.5rc3	Jul 23, 2025	pre-release
0.1.5rc2	Jul 23, 2025	pre-release
0.1.5rc1	Jul 21, 2025	pre-release
0.1.4	Jul 17, 2025
0.1.4rc8	Jul 4, 2025	pre-release
0.1.4rc7	Jul 4, 2025	pre-release
0.1.4rc6	Jul 3, 2025	pre-release
0.1.4rc5	Jul 3, 2025	pre-release
0.1.4rc4	Jul 3, 2025	pre-release
0.1.4rc3	Jul 3, 2025	pre-release
0.1.4rc2	Jul 3, 2025	pre-release (this version)
0.1.4rc1	Jul 2, 2025	pre-release
0.1.3	Jun 23, 2025
0.1.3rc3	Jun 20, 2025	pre-release
0.1.3rc2	Jun 19, 2025	pre-release
0.1.3rc1	Jun 18, 2025	pre-release
0.1.2	Jun 17, 2025
0.1.1	Jun 12, 2025
0.1.0	Jun 12, 2025

Download files

Download the file for your platform.

Source Distribution

pymilvus_pg-0.1.4rc2.tar.gz (25.7 kB), uploaded Jul 3, 2025, Source

Built Distribution

pymilvus_pg-0.1.4rc2-py3-none-any.whl (25.5 kB), uploaded Jul 3, 2025, Python 3

File details

Details for the file pymilvus_pg-0.1.4rc2.tar.gz.

File metadata

Download URL: pymilvus_pg-0.1.4rc2.tar.gz
Upload date: Jul 3, 2025
Size: 25.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: pdm/2.22.3 CPython/3.13.4 Darwin/22.6.0

File hashes

Hashes for pymilvus_pg-0.1.4rc2.tar.gz
Algorithm	Hash digest
SHA256	`dec5ec7f0b710662928b513180132821c943777d18c7ae313b00b3e613eca834`
MD5	`9d0391a40403c9966599a944bd2ff858`
BLAKE2b-256	`a4b2ca3dffe3618e1fe5832e6acaf49a14fca5c57a2eabd92a6467f937295c8c`

File details

Details for the file pymilvus_pg-0.1.4rc2-py3-none-any.whl.

File metadata

Download URL: pymilvus_pg-0.1.4rc2-py3-none-any.whl
Upload date: Jul 3, 2025
Size: 25.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: pdm/2.22.3 CPython/3.13.4 Darwin/22.6.0

File hashes

Hashes for pymilvus_pg-0.1.4rc2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0b72d5294c73b783e7fcc010d6a714496bd1ea85f5c996797282185d7777924b`
MD5	`768fadb87cda35e068170499f51af9c9`
BLAKE2b-256	`3210dc2f19e66c890eef906453ce981967ce32e36d65f012005d0656f0da3bed`
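The published digests can be used to confirm that a downloaded file is intact. The following is a minimal sketch using Python's standard `hashlib`; the helper names `sha256_of_file` and `matches_published_hash` are illustrative, and the local file path is an assumption about where the wheel was saved.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, published_sha256: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == published_sha256.strip().lower()


# Assumed location of the downloaded wheel; adjust to where the file was saved.
wheel = Path("pymilvus_pg-0.1.4rc2-py3-none-any.whl")
published = "0b72d5294c73b783e7fcc010d6a714496bd1ea85f5c996797282185d7777924b"

if wheel.exists():
    print("hash matches:", matches_published_hash(wheel, published))
else:
    print(f"{wheel} not found; download it before verifying")
```

The same check applies to the source distribution by substituting its file name and SHA256 digest.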

pymilvus-pg 0.1.4rc2
