A simple, easy-to-hack Vector Database implementation

nano-VectorDB

A simple, easy-to-hack Vector Database

🌬️ A vector database implementation with a single dependency (numpy).

🎁 It can query 100,000 vectors and return results in roughly 100 milliseconds.

🏃 It's okay for your prototypes, maybe even more.

🏃 Supports naive multi-tenancy.

Install

Install from PyPI

pip install nano-vectordb

Install from source

# clone this repo first
cd nano-vectordb
pip install -e .

Quick Start

Fake some data:

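from nano_vectordb import NanoVectorDB
import numpy as np

data_len = 100_000
fake_dim = 1024
fake_embeds = np.random.rand(data_len, fake_dim)

# Any extra fields you want stored alongside each vector (placeholder values).
ANYFIELDS = {"any_field": "any_value"}

fakes_data = [{"__vector__": fake_embeds[i], **ANYFIELDS} for i in range(data_len)]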

You can add any fields to a data item, but two keys are reserved:

  • __id__: optional. If passed, NanoVectorDB will use your id; otherwise a generated id will be used.
  • __vector__: required. Your embedding, as an np.ndarray.
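To make the two reserved keys concrete, here is a minimal sketch of building data items with and without an explicit `__id__`. The `make_record` helper, the `doc-0` id, and the `title` field are illustrative names chosen for this example; none of them are part of nano-vectordb.

```python
import numpy as np


def make_record(vector, record_id=None, **fields):
    """Build a dict in the shape described above (hypothetical helper)."""
    record = {"__vector__": np.asarray(vector), **fields}
    if record_id is not None:
        # Only set __id__ when the caller wants to control the id;
        # leaving it out lets the database generate one.
        record["__id__"] = record_id
    return record


dim = 4
with_id = make_record(np.random.rand(dim), record_id="doc-0", title="hello")
without_id = make_record(np.random.rand(dim), title="world")

print(sorted(with_id))
print(sorted(without_id))
```

Items built this way can be collected into a list and passed to `upsert`.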

Init a DB

vdb = NanoVectorDB(fake_dim, storage_file="fool.json")

Next time you init vdb from fool.json, NanoVectorDB will load the index automatically.

Upsert

r = vdb.upsert(fakes_data)
print(r["update"], r["insert"])
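The result of `upsert` has an `update` entry and an `insert` entry. The Get, Delete example passes `r["insert"]` straight to `get` and `delete`, which suggests both entries are lists of ids. The sketch below illustrates upsert semantics on a plain dict under that assumption; it is not the library's code, and the `store` dict and standalone `upsert` function are hypothetical.

```python
def upsert(store, records):
    """Insert new records and overwrite existing ones, keyed by __id__ (sketch)."""
    report = {"update": [], "insert": []}
    for record in records:
        record_id = record["__id__"]
        # An id that is already present counts as an update, otherwise an insert.
        key = "update" if record_id in store else "insert"
        store[record_id] = record
        report[key].append(record_id)
    return report


store = {}
first = upsert(store, [{"__id__": "a", "v": 1}, {"__id__": "b", "v": 2}])
second = upsert(store, [{"__id__": "a", "v": 10}, {"__id__": "c", "v": 3}])

print(first)   # {'update': [], 'insert': ['a', 'b']}
print(second)  # {'update': ['a'], 'insert': ['c']}
```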

Query

# query with an embedding
vdb.query(np.random.rand(fake_dim))

# arguments:
vdb.query(np.random.rand(fake_dim), top_k=5, better_than_threshold=0.01)
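The text does not spell out what `top_k` and `better_than_threshold` mean. One plausible reading is that `top_k` caps the number of results and `better_than_threshold` drops results whose similarity score falls below the given value. The sketch below shows that reading as a brute-force search in plain numpy. The cosine metric, the inclusive `>=` comparison, and the `top_k_cosine` function name are all assumptions made for this example, not a description of NanoVectorDB's internals.

```python
import numpy as np


def top_k_cosine(matrix, query, top_k=10, better_than_threshold=None):
    """Return (row_index, score) pairs, best first (illustrative sketch)."""
    # Normalise rows and the query so a dot product equals cosine similarity.
    matrix_n = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    query_n = query / np.linalg.norm(query)
    scores = matrix_n @ query_n
    # Indices of the top_k highest scores, sorted from best to worst.
    order = np.argsort(scores)[::-1][:top_k]
    hits = [(int(i), float(scores[i])) for i in order]
    if better_than_threshold is not None:
        # Assumed semantics: keep only hits at or above the threshold.
        hits = [hit for hit in hits if hit[1] >= better_than_threshold]
    return hits


rng = np.random.default_rng(0)
vectors = rng.random((1000, 64))
hits = top_k_cosine(vectors, vectors[42], top_k=5, better_than_threshold=0.01)
print(hits[0][0])  # 42: a vector is its own nearest neighbour
```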

Conditional filter

vdb.query(np.random.rand(fake_dim), filter_lambda=lambda x: x["any_field"] == "any_value")
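A sketch of how a filter of this kind can combine with ranking, assuming the predicate is applied to each stored item before the best matches are chosen. The order of operations is an assumption, and `filtered_top_k`, `records`, and the precomputed `score` field are hypothetical names used only here. One point that holds for any Python predicate: `x["any_field"]` raises `KeyError` for items that lack the field, so `x.get("any_field")` is the safer form when fields vary between items.

```python
records = [
    {"__id__": "a", "any_field": "any_value", "score": 0.90},
    {"__id__": "b", "any_field": "other", "score": 0.95},
    {"__id__": "c", "any_field": "any_value", "score": 0.50},
    {"__id__": "d", "score": 0.80},  # no "any_field" at all
]


def filtered_top_k(records, filter_lambda=None, top_k=2):
    """Keep records passing the predicate, then return the best top_k (sketch)."""
    candidates = [r for r in records if filter_lambda is None or filter_lambda(r)]
    return sorted(candidates, key=lambda r: r["score"], reverse=True)[:top_k]


# .get() tolerates records that do not carry the field.
matches = filtered_top_k(records, lambda x: x.get("any_field") == "any_value")
print([m["__id__"] for m in matches])  # ['a', 'c']
```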

Save

# will create/overwrite 'fool.json'
vdb.save()

Get, Delete

# get and delete the inserted data
print(vdb.get(r["insert"]))
vdb.delete(r["insert"])

Additional Data

vdb.store_additional_data(a=1, b=2, c=3)
print(vdb.get_additional_data())

Multi-Tenancy

If you need several vector DBs, you can use MultiTenantNanoVDB to manage them:

from nano_vectordb import NanoVectorDB, MultiTenantNanoVDB

multi_tenant = MultiTenantNanoVDB(1024)
tenant_id = multi_tenant.create_tenant()

# tenant is a NanoVectorDB, you can upsert, query, get... on this.
tenant: NanoVectorDB = multi_tenant.get_tenant(tenant_id)

# some chores:
multi_tenant.delete_tenant(tenant_id)
multi_tenant.contain_tenant(tenant_id)

# save it
multi_tenant.save()

MultiTenantNanoVDB uses a queue to limit how many vector DBs are held in memory at once. You can adjust the limit with the max_capacity parameter:

# At most `max_capacity` NanoVectorDB instances are kept in memory.
multi_tenant = MultiTenantNanoVDB(1024, max_capacity=1)
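The idea of a capacity-bounded queue can be sketched with the standard library alone. The version below evicts the oldest tenant once the limit is exceeded. The eviction order, the `BoundedTenantCache` class, and its `evicted` list are assumptions made for illustration; the text above only says that a queue limits how many DBs stay in memory, and a real implementation would presumably write an evicted tenant to disk rather than just record its id.

```python
from collections import OrderedDict


class BoundedTenantCache:
    """Keep at most max_capacity tenants in memory, evicting the oldest (sketch)."""

    def __init__(self, max_capacity):
        self.max_capacity = max_capacity
        self._tenants = OrderedDict()
        self.evicted = []  # ids pushed out of memory, kept here for inspection

    def put(self, tenant_id, tenant):
        self._tenants[tenant_id] = tenant
        self._tenants.move_to_end(tenant_id)
        while len(self._tenants) > self.max_capacity:
            # popitem(last=False) removes the entry at the front of the queue.
            old_id, _ = self._tenants.popitem(last=False)
            self.evicted.append(old_id)

    def __contains__(self, tenant_id):
        return tenant_id in self._tenants


cache = BoundedTenantCache(max_capacity=1)
cache.put("t1", object())
cache.put("t2", object())
print("t1" in cache, "t2" in cache, cache.evicted)  # False True ['t1']
```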

Benchmark

Embedding Dim: 1024. Device: MacBook M3 Pro

  • Saving an index with 100,000 vectors generates a JSON file of roughly 520 MB.
  • Inserting 100,000 vectors takes roughly 2 s.
  • Querying 100,000 vectors takes roughly 0.1 s.
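As a plausibility check on the file size, the arithmetic below estimates the payload for 100,000 vectors of dimension 1024. It assumes each value is stored as a 4-byte float32 and that the matrix is base64-encoded inside the JSON file. The storage format is not stated in this benchmark, so treat both as assumptions rather than a description of the file layout.

```python
import base64

num_vectors = 100_000
dim = 1024
bytes_per_float32 = 4  # assumption: values are stored as float32

raw_bytes = num_vectors * dim * bytes_per_float32
# Base64 encodes every 3 input bytes as 4 output characters (with padding).
encoded_bytes = 4 * ((raw_bytes + 2) // 3)

# Sanity-check the formula against the standard library on a small input.
assert len(base64.b64encode(bytes(10))) == 4 * ((10 + 2) // 3)

print(f"raw:    {raw_bytes / 2**20:.1f} MiB")      # raw:    390.6 MiB
print(f"base64: {encoded_bytes / 2**20:.1f} MiB")  # base64: 520.8 MiB
```

Under these assumptions the encoded matrix alone comes to about 520 MiB, which is in line with the reported file size.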
