MinVectorDB is a simple vector storage and query database implementation, providing clear and concise Python APIs aimed at lowering the barrier to using vector databases.

Project description

MinVectorDB is a simple vector storage and query database with a clear, concise Python API, intended to lower the barrier to using vector databases. More practical features will be added in the future.

Note that MinVectorDB is not designed for efficiency, so it does not include built-in algorithms such as approximate nearest neighbor (ANN) search.

It grew out of the author's need to showcase a large language model demo and is designed for 100% recall.

It has also not undergone rigorous code testing, so use caution before running it in a production environment.
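
Since the library skips approximate nearest neighbor indexes in favor of 100% recall, a query presumably has to be compared against every stored vector. The NumPy sketch below illustrates that kind of exact search. `exact_top_k` is a hypothetical helper written for this illustration; it is not part of MinVectorDB, and the library's internals may differ.

```python
import numpy as np


def exact_top_k(vectors: np.ndarray, query: np.ndarray, k: int):
    """Return (indices, similarities) of the k rows most similar to `query`.

    Assumes the rows of `vectors` and `query` are unit-normalized, so the
    dot product equals the cosine similarity.
    """
    sims = vectors @ query            # one similarity per stored vector
    k = min(k, len(sims))
    top = np.argsort(-sims)[:k]       # indices sorted by descending similarity
    return top, sims[top]


rng = np.random.default_rng(23)
data = rng.random((1000, 64))
data /= np.linalg.norm(data, axis=1, keepdims=True)  # normalize every row

idx, sims = exact_top_k(data, data[0], k=10)
print("Top-10 indices:", idx)
print("Top-10 cosine similarities:", sims)
```

Every stored vector is scored, so no true neighbor can be missed; the cost is that query time grows linearly with the number of vectors.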

Install

pip install MinVectorDB

Quick start

Demo 1-2

from IPython.display import display_markdown

import numpy as np

from spinesUtils.utils import Timer
from min_vec import MinVectorDB

timer = Timer()

# ===================================================================
# ========================= DEMO 1 ==================================
# ===================================================================
# Demo 1 -- Sequentially add vectors.
# Create a MinVectorDB instance.
display_markdown("*Demo 1* -- **Sequentially add vectors**", raw=True)

timer.start()
db = MinVectorDB(dim=1024, database_path='test.mvdb', chunk_size=100)

np.random.seed(23)

# Assign sequential IDs starting from 0.
for vec_id, t in enumerate(np.random.random((1000, 1024))):
    # Vectors need to be normalized before writing to the database.
    t = t / np.linalg.norm(t)
    db.add_item(t, id=vec_id)
db.commit()
print(f"\n* [Insert data] Time cost {timer.last_timestamp_diff():>.4f} s.")
timer.middle_point()

res = db.query(db.head(10)[0], k=10)
print("  - Query vector: ", db.head(10)[0])
print("  - Database index of top 10 results: ", res[0])
print("  - Cosine similarity of top 10 results: ", res[1])
print(f"\n* [Query data] Time cost {timer.last_timestamp_diff():>.4f} s.")
timer.middle_point()

# Demo 2 reuses the same database path, so the database created in Demo 1 is deleted here.
# This step is not required in actual use.
db.delete()

del db

display_markdown("------", raw=True)

# ===================================================================
# ========================= DEMO 2 ==================================
# ===================================================================
# Demo 2 -- Bulk add vectors.
display_markdown("*Demo 2* -- **Bulk add vectors**", raw=True)

timer.middle_point()

db = MinVectorDB(dim=1024, database_path='test.mvdb', chunk_size=100)

np.random.seed(23)

# Build (vector, id) pairs with sequential IDs starting from 0.
vectors = []
for vec_id, t in enumerate(np.random.random((1000, 1024))):
    # Vectors need to be normalized before writing to the database.
    t = t / np.linalg.norm(t)
    vectors.append((t, vec_id))

db.bulk_add_items(vectors)
db.commit()

print(f"\n* [Insert data] Time cost {timer.last_timestamp_diff():>.4f} s.")
timer.middle_point()

res = db.query(db.head(10)[0], k=10)
print("  - Query vector: ", db.head(10)[0])
print("  - Database index of top 10 results: ", res[0])
print("  - Cosine similarity of top 10 results: ", res[1])
print(f"\n* [Query data] Time cost {timer.last_timestamp_diff():>.4f} s.")

timer.end()
# This operation is not required in actual use.
db.delete()
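
Both demos normalize each vector inside the loop before writing it. As a sketch of why this matters and how it could be done in one step, the block below normalizes a whole matrix at once and checks that the dot product of two unit vectors equals the cosine similarity of the originals. `normalize_rows` is a hypothetical helper, not a MinVectorDB function.

```python
import numpy as np


def normalize_rows(matrix: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm; all-zero rows are left unchanged."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    return matrix / norms


rng = np.random.default_rng(23)
raw = rng.random((1000, 1024))
unit = normalize_rows(raw)

# Cosine similarity of two raw vectors ...
a, b = raw[0], raw[1]
cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
# ... equals the plain dot product of their normalized versions.
dot_of_unit = unit[0] @ unit[1]

print(np.isclose(cosine, dot_of_unit))
```

The rows of `unit` could then be paired with IDs as `(vector, id)` tuples, as Demo 2 does.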

Demo 3

import numpy as np
from IPython.display import display_markdown

from spinesUtils.utils import Timer
from min_vec import MinVectorDB

timer = Timer()

# ===================================================================
# ========================= DEMO 3 ==================================
# ===================================================================
# Demo 3 -- Use fields to improve search recall.
display_markdown("*Demo 3* -- **Use fields to improve search recall**", raw=True)

timer.start()

db = MinVectorDB(dim=1024, database_path='test.mvdb', chunk_size=100)

np.random.seed(23)

# Build (vector, id, field) triples with sequential IDs starting from 0.
# Every block of 100 consecutive IDs shares one field: 'test_0', 'test_1', ...
vectors = []
for vec_id, t in enumerate(np.random.random((1000, 1024))):
    # Vectors need to be normalized before writing to the database.
    t = t / np.linalg.norm(t)
    vectors.append((t, vec_id, 'test_' + str(vec_id // 100)))

db.bulk_add_items(vectors)
db.commit()

print(f"\n* [Insert data] Time cost {timer.last_timestamp_diff():>.4f} s.")
timer.middle_point()

# Pass the fields 'test_0' and 'test_3' along with the query.
res = db.query(db.head(10)[0], k=10, field=['test_0', 'test_3'])
print("  - Query vector: ", db.head(10)[0])
print("  - Database index of top 10 results: ", res[0])
print("  - Cosine similarity of top 10 results: ", res[1])
print(f"\n* [Query data] Time cost {timer.last_timestamp_diff():>.4f} s.")

timer.end()
db.delete()
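
Demo 3 tags each vector with a field and passes a list of fields to the query. One plausible reading is that the search is restricted to vectors carrying one of those fields. The sketch below shows that idea in plain NumPy. `filtered_top_k` is a hypothetical helper, and whether MinVectorDB filters this way internally is an assumption.

```python
import numpy as np


def filtered_top_k(vectors, fields, query, k, allowed_fields):
    """Exact top-k search restricted to rows whose field is in `allowed_fields`.

    Assumes unit-normalized rows, so the dot product is the cosine similarity.
    """
    mask = np.isin(fields, list(allowed_fields))
    candidates = np.flatnonzero(mask)        # row indices that pass the filter
    sims = vectors[candidates] @ query
    order = np.argsort(-sims)[:k]            # best candidates first
    return candidates[order], sims[order]


rng = np.random.default_rng(23)
data = rng.random((1000, 64))
data /= np.linalg.norm(data, axis=1, keepdims=True)

# Same tagging scheme as Demo 3: blocks of 100 consecutive IDs share a field.
fields = np.array([f"test_{i // 100}" for i in range(1000)])

idx, sims = filtered_top_k(
    data, fields, data[0], k=10, allowed_fields=["test_0", "test_3"]
)
print("Indices:", idx)
print("Fields of the results:", fields[idx])
```

Filtering before ranking keeps unrelated vectors out of the top-k list and reduces the number of similarities that have to be computed.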

Release history

Version	Release date
0.3.5	May 10, 2024
0.3.4	May 9, 2024
0.3.3	May 8, 2024
0.3.2	Apr 26, 2024
0.3.1	Apr 24, 2024
0.3.0	Apr 23, 2024
0.2.7	Apr 17, 2024
0.2.6	Apr 16, 2024
0.2.5	Apr 15, 2024
0.2.4	Apr 2, 2024
0.2.3	Mar 7, 2024
0.2.2	Feb 26, 2024
0.2.1	Feb 23, 2024
0.2.0	Feb 23, 2024
0.1.5	Jan 29, 2024
0.1.4	Jan 29, 2024
0.1.3	Jan 26, 2024
0.1.2	Jan 25, 2024
0.1.1	Jan 25, 2024
0.1.0	Jan 16, 2024
0.0.13	Jan 8, 2024
0.0.12	Jan 7, 2024
0.0.11	Jan 7, 2024
0.0.10	Jan 5, 2024
0.0.9	Jan 2, 2024
0.0.8	Dec 22, 2023
0.0.7	Dec 21, 2023
0.0.6	Dec 19, 2023
0.0.5	Dec 19, 2023
0.0.4	Dec 18, 2023
0.0.3 (this version)	Dec 17, 2023
0.0.2	Dec 17, 2023
0.0.1	Dec 17, 2023

Download files

Source Distribution

MinVectorDB-0.0.3.tar.gz (11.9 kB), uploaded Dec 17, 2023

Built Distribution

MinVectorDB-0.0.3-py3-none-any.whl (12.6 kB, Python 3), uploaded Dec 17, 2023

Hashes for MinVectorDB-0.0.3.tar.gz

Algorithm	Hash digest
SHA256	`09a4dff6f6e1e2a1c1e29e1df7c7d3e132a843390b8e6197d65828663c462db5`
MD5	`9e36c76755527847f67a0d78e0b34d91`
BLAKE2b-256	`eed50596764007c276487201fc63c9829a08a4c383cb35620aae323d417e79a5`

Hashes for MinVectorDB-0.0.3-py3-none-any.whl

Algorithm	Hash digest
SHA256	`d9890adc32470445040b8c2ecd9ab5debb48d2d061ddfa24232f930607273faa`
MD5	`52302763f8ec563f92908b67ac7eba60`
BLAKE2b-256	`3404be158052d6910a6dda930eedf1df7c46cd89274de8c497d4a18484891406`
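
The digests above can be used to check a downloaded file before installing it. The following sketch uses only the Python standard library; `sha256_of` and `matches_sha256` are hypothetical helper names, and the file path assumes the wheel was saved to the current directory under its original name.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


# SHA256 digest listed above for the wheel.
expected = "d9890adc32470445040b8c2ecd9ab5debb48d2d061ddfa24232f930607273faa"
wheel = Path("MinVectorDB-0.0.3-py3-none-any.whl")  # assumed download location

if wheel.exists():
    print("Digest matches:", matches_sha256(wheel, expected))
else:
    print("Wheel not found; download it first.")
```

When installing with pip, the same digest can also be pinned in a requirements file with `--hash=sha256:...` so that pip performs the check itself.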