
A simple, easy-to-hack Vector Database implementation


nano-VectorDB

A simple, easy-to-hack Vector Database

🌬️ A vector database implementation with a single dependency (numpy).

🎁 It can answer a query over 100,000 vectors in roughly 100 milliseconds.

🏃 It's good enough for your prototypes, and maybe more.

Install

Install from PyPI

pip install nano-vectordb

Install from source

# clone this repo first
cd nano-vectordb
pip install -e .

Quick Start

Generate some fake data:

from nano_vectordb import NanoVectorDB
import numpy as np

data_len = 100_000
fake_dim = 1024
fake_embeds = np.random.rand(data_len, fake_dim)

# Extra fields stored alongside each vector (placeholder; use your own).
ANYFIELDS = {"source": "fake"}
fakes_data = [{"__vector__": fake_embeds[i], **ANYFIELDS} for i in range(data_len)]

You can attach any fields to a record, but two keys are reserved:

  • __id__: optional. If provided, NanoVectorDB uses your id; otherwise it generates one.
  • __vector__: required. Your embedding, as an np.ndarray.
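The two reserved keys can be checked before calling upsert. The sketch below uses a hypothetical helper, prepare_records, which is not part of NanoVectorDB: it requires __vector__ to be a 1-D array of the expected dimension and fills in a missing __id__ from a SHA-256 hash of the vector bytes. How NanoVectorDB itself generates ids is not stated above, so the hashing scheme is an assumption.

```python
import hashlib

import numpy as np


def prepare_records(records, dim):
    """Hypothetical pre-upsert check; not part of the NanoVectorDB API."""
    prepared = []
    for rec in records:
        if "__vector__" not in rec:
            raise ValueError("every record needs a '__vector__' key")
        vec = np.asarray(rec["__vector__"])
        if vec.shape != (dim,):
            raise ValueError(f"expected shape ({dim},), got {vec.shape}")
        rec = dict(rec)  # shallow copy so the caller's dict is left untouched
        # Assumption: derive a stable id from the vector bytes when none is given.
        rec.setdefault("__id__", hashlib.sha256(vec.tobytes()).hexdigest())
        prepared.append(rec)
    return prepared


records = prepare_records(
    [
        {"__vector__": np.ones(4), "title": "a"},
        {"__id__": "doc-2", "__vector__": np.zeros(4), "title": "b"},
    ],
    dim=4,
)
print(records[1]["__id__"])  # doc-2
print(len(records[0]["__id__"]))  # 64
```

Because the generated id depends only on the vector bytes, re-submitting an identical vector without an __id__ maps to the same id.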

Initialize a DB:

vdb = NanoVectorDB(fake_dim, storage_file="fool.json")

The next time you create a NanoVectorDB with storage_file="fool.json", it will load the saved index automatically.
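Loading on init follows a common load-or-create pattern. The sketch below is a hypothetical TinyStore, not NanoVectorDB's code: it reads the JSON file if it exists, refuses a file built with a different embedding dimension, and otherwise starts empty. The file layout (embedding_dim, data, and matrix as nested lists) is an assumption chosen for readability.

```python
import json
import os
import tempfile
from dataclasses import dataclass, field

import numpy as np


@dataclass
class TinyStore:
    """Hypothetical load-or-create store; illustrates the pattern only."""

    embedding_dim: int
    storage_file: str
    data: list = field(default_factory=list)
    matrix: np.ndarray = None

    def __post_init__(self):
        if os.path.exists(self.storage_file):
            with open(self.storage_file) as f:
                saved = json.load(f)
            # Refuse to mix an index built with a different embedding size.
            if saved["embedding_dim"] != self.embedding_dim:
                raise ValueError("embedding dim mismatch")
            self.data = saved["data"]
            self.matrix = np.asarray(saved["matrix"], dtype=np.float32).reshape(-1, self.embedding_dim)
        else:
            # No file yet: start with an empty (0, dim) matrix.
            self.matrix = np.zeros((0, self.embedding_dim), dtype=np.float32)

    def save(self):
        with open(self.storage_file, "w") as f:
            json.dump(
                {"embedding_dim": self.embedding_dim, "data": self.data, "matrix": self.matrix.tolist()},
                f,
            )


path = os.path.join(tempfile.mkdtemp(), "fool.json")
s = TinyStore(3, path)
s.data.append({"__id__": "a"})
s.matrix = np.vstack([s.matrix, np.ones((1, 3), dtype=np.float32)])
s.save()
reloaded = TinyStore(3, path)
print(len(reloaded.data), reloaded.matrix.shape)  # 1 (1, 3)
```

Checking the dimension on load catches the easy mistake of pointing a new model's embeddings at an old index file.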

Upsert:

r = vdb.upsert(fakes_data)
print(r["update"], r["insert"])
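The returned report separates ids that already existed from new ones. A minimal in-memory sketch of those semantics, assuming records are keyed by __id__ and that an update replaces the stored record; the upsert function and store dict here are hypothetical, not NanoVectorDB's internals.

```python
import numpy as np


def upsert(store, records):
    """Hypothetical upsert over a dict mapping __id__ -> record."""
    report = {"update": [], "insert": []}
    for rec in records:
        rec_id = rec["__id__"]
        # An id that is already present counts as an update, otherwise an insert.
        report["update" if rec_id in store else "insert"].append(rec_id)
        store[rec_id] = dict(rec)  # the new record replaces any old one
    return report


store = {}
print(upsert(store, [{"__id__": "a", "__vector__": np.zeros(2)}]))
# {'update': [], 'insert': ['a']}
print(upsert(store, [{"__id__": "a", "__vector__": np.ones(2)}, {"__id__": "b", "__vector__": np.ones(2)}]))
# {'update': ['a'], 'insert': ['b']}
```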

Query:

print(vdb.query(np.random.rand(fake_dim)))
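With numpy as the only dependency, a natural way to answer a query is an exhaustive scan: normalize the stored vectors and the query, take dot products, and keep the top-k scores. The sketch below shows that technique in plain NumPy; assuming vdb.query works the same way internally is a guess, and cosine_top_k and its top_k parameter are hypothetical names.

```python
import numpy as np


def cosine_top_k(matrix, query, top_k=10):
    """Exhaustive cosine-similarity search; returns (row indices, scores), best first."""
    top_k = min(top_k, matrix.shape[0])
    if top_k <= 0:
        return np.empty(0, dtype=np.intp), np.empty(0, dtype=matrix.dtype)
    # Normalize rows and the query so that a dot product equals cosine similarity.
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    scores = m @ q
    # argpartition selects the k best in O(n); only those k are then sorted.
    idx = np.argpartition(-scores, top_k - 1)[:top_k]
    idx = idx[np.argsort(-scores[idx])]
    return idx, scores[idx]


rng = np.random.default_rng(0)
matrix = rng.random((10_000, 128), dtype=np.float32)
idx, scores = cosine_top_k(matrix, matrix[42], top_k=5)
print(idx[0], round(float(scores[0]), 4))  # 42 1.0
```

Each query is one O(n·d) matrix-vector product. A real store would normalize rows once at insert time instead of on every query, which leaves only the product and the top-k selection on the query path.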

Save:

# will create/overwrite 'fool.json'
vdb.save()
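Writing a 100,000 × 1024 matrix as a JSON list of numbers would be bulky and slow to parse. One compact option, sketched here under the assumption that vectors are stored as float32, is to base64-encode the raw array bytes into a single JSON string and decode them on load. Whether vdb.save() uses exactly this encoding is not stated above; matrix_to_str and str_to_matrix are hypothetical names.

```python
import base64
import json

import numpy as np


def matrix_to_str(matrix):
    """Hypothetical encoder: float32 bytes -> base64 text that fits in a JSON string."""
    return base64.b64encode(matrix.astype(np.float32).tobytes()).decode("ascii")


def str_to_matrix(text, dim):
    """Hypothetical decoder: base64 text -> writable (n, dim) float32 array."""
    raw = base64.b64decode(text)
    # frombuffer returns a read-only view of the bytes; copy so it can be updated.
    return np.frombuffer(raw, dtype=np.float32).reshape(-1, dim).copy()


m = np.arange(6, dtype=np.float32).reshape(2, 3)
payload = json.dumps({"embedding_dim": 3, "matrix": matrix_to_str(m)})
restored = str_to_matrix(json.loads(payload)["matrix"], dim=3)
print(np.array_equal(m, restored))  # True
```

The bytes are in the machine's native byte order; a file meant to move between architectures would want an explicit dtype such as "<f4" on both sides.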

Get, Delete:

# get and delete the inserted data
print(vdb.get(r["insert"]))
vdb.delete(r["insert"])
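When records live in a Python list and their vectors in a NumPy matrix with one row per record, a delete has to drop the same positions from both. A minimal sketch of that bookkeeping, assuming this row-aligned layout (delete_ids is a hypothetical function, not the library's):

```python
import numpy as np


def delete_ids(data, matrix, ids):
    """Drop records whose __id__ is in ids, keeping the list and matrix row-aligned."""
    ids = set(ids)
    # One boolean per row: True means keep.
    mask = np.array([rec["__id__"] not in ids for rec in data], dtype=bool)
    kept = [rec for rec, keep in zip(data, mask) if keep]
    return kept, matrix[mask]


data = [{"__id__": "a"}, {"__id__": "b"}, {"__id__": "c"}]
matrix = np.arange(6, dtype=np.float32).reshape(3, 2)
data, matrix = delete_ids(data, matrix, ["b"])
print([d["__id__"] for d in data], matrix.tolist())
# ['a', 'c'] [[0.0, 1.0], [4.0, 5.0]]
```

Applying the same mask to both structures is what keeps row i of the matrix paired with record i of the list.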

Benchmark

Embedding Dim: 1024. Device: MacBook M3 Pro

  • Saving an index with 100,000 vectors generates a JSON file of roughly 520M.
  • Inserting 100,000 vectors takes roughly 2 s.
  • A query over 100,000 vectors takes roughly 0.1 s.
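The file size above is consistent with a back-of-the-envelope estimate, assuming each vector component is stored as a 4-byte float32 and the matrix is base64-encoded inside the JSON (both assumptions; per-record metadata is ignored):

```python
def estimated_matrix_bytes(n_vectors, dim, bytes_per_value=4):
    """Estimate the base64-encoded matrix size in bytes; ignores per-record metadata."""
    raw = n_vectors * dim * bytes_per_value
    # base64 emits 4 output characters for every 3 input bytes (rounded up).
    return 4 * ((raw + 2) // 3)


size = estimated_matrix_bytes(100_000, 1024)
print(size, round(size / 2**20, 1))  # 546133336 520.8
```

That is about 546 MB in decimal units or 520.8 MiB in binary units, which matches the reported ~520M if "M" means MiB.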



Download files


Source Distribution

nano_vectordb-0.0.4.1.tar.gz (5.0 kB)

Uploaded Source

Built Distribution

nano_vectordb-0.0.4.1-py3-none-any.whl (4.5 kB)

Uploaded Python 3

File details

Details for the file nano_vectordb-0.0.4.1.tar.gz.

File metadata

  • Download URL: nano_vectordb-0.0.4.1.tar.gz
  • Size: 5.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.9

File hashes

Hashes for nano_vectordb-0.0.4.1.tar.gz
  • SHA256: 24f4c6cb35f6300a6c5ce344ea8deb6b016e0d88c1b4b6b1fc18d88a8fb64fac
  • MD5: b76b2294f8b1ee45dc4f10b003fcefcc
  • BLAKE2b-256: c0bf4296b9e304670fbeae76ffba56f7687b9284e3ee3869deb3eafd1c7cdc22


File details

Details for the file nano_vectordb-0.0.4.1-py3-none-any.whl.


File hashes

Hashes for nano_vectordb-0.0.4.1-py3-none-any.whl
  • SHA256: d646d885687ce70fd7c010172ec05f8deccc1a15f0fe3b4d65e1149dea196795
  • MD5: 6c107e1ebd5c4259328eb6cad1aeaddf
  • BLAKE2b-256: dc2d22f3a6baca6a3c53bbc8758d4cc898e6820413ebb4c204c82424d8368790

