
A tiny and fast local vector database in Python.

picovdb


An extremely fast, ultra-lightweight local vector database in Python.

"Extremely fast": sub-millisecond queries.

"Ultra-lightweight": a single file that depends only on NumPy, plus one optional dependency, faiss-cpu.

Install

pip install picovdb

Usage

Create a db (using SentenceTransformer embeddings as an example):

from sentence_transformers import SentenceTransformer
from picovdb import PicoVectorDB

CHUNK_SIZE = 256
model = SentenceTransformer('all-MiniLM-L6-v2')
dim = model.get_sentence_embedding_dimension()

with open('A_Christmas_Carol.txt', encoding='UTF8') as f:
    content = f.read()
    # Ceiling division avoids an empty trailing chunk when the text
    # length is an exact multiple of CHUNK_SIZE.
    num_chunks = -(-len(content) // CHUNK_SIZE)
    chunks = [content[i * CHUNK_SIZE: (i + 1) * CHUNK_SIZE] for i in range(num_chunks)]
    embeddings = model.encode(chunks)
    data = [
        {
            "_vector_": embeddings[i],
            "_id_": i,
            "content": chunks[i],
        }
        for i in range(num_chunks)
    ]
    db = PicoVectorDB(embedding_dim=dim, storage_file='_acc')
    db.upsert(data)
    db.save()

Query

db = PicoVectorDB(embedding_dim=dim, storage_file='_acc')
txt = "Are there no prisons? Are there no workhouses?"
emb = model.encode(txt)
q = db.query(emb, top_k=3)
print('query results:', q)
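For intuition, a top-k query over a small local store boils down to a brute-force cosine-similarity search, which NumPy handles in a few lines. The sketch below is illustrative only (the helper name and sizes are made up, and this is not picovdb's API or internals):

```python
import numpy as np

def top_k_cosine(query: np.ndarray, matrix: np.ndarray, k: int = 3):
    """Brute-force top-k rows of `matrix` by cosine similarity to `query`."""
    # Normalize so the dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    scores = m @ q
    # argpartition finds the k best in O(n); then sort just those k.
    idx = np.argpartition(-scores, k)[:k]
    idx = idx[np.argsort(-scores[idx])]
    return idx, scores[idx]

rng = np.random.default_rng(0)
vecs = rng.normal(size=(1000, 64)).astype(np.float32)
ids, scores = top_k_cosine(vecs[42], vecs, k=3)
print(ids[0])  # the query vector itself ranks first, with score ~1.0
```

With normalized vectors this is a single matrix-vector product, which is why even a pure-NumPy search stays fast at 100k-vector scale.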

Benchmark

Embedding dimension: 1024.

Hardware: M3 MacBook Air

  1. Pure Python:

    • Inserting 100,000 vectors took about 0.5s
    • Doing 100 queries against 100,000 vectors took roughly 0.6s (0.006s per query).
  2. With FAISS (CPU):

    • Inserting 100,000 vectors took 110s
    • Doing 100 queries against 100,000 vectors took 0.05s (0.0005s, or 0.5 ms per query).
    • Doing 1,000 queries against 100,000 vectors in batch mode took 0.2s (0.0002s, or 0.2 ms per query).

Hardware: PC with CPU Core i7-12700k and old-gen M2 Nvme SSD

  1. Pure Python:

    • Inserting 100,000 vectors took about 0.7s
    • Doing 100 queries against 100,000 vectors took roughly 1.3s (0.013s per query).
  2. With FAISS (CPU):

    • Inserting 100,000 vectors took 50s
    • Doing 100 queries against 100,000 vectors took 0.05s (0.0005s, or 0.5 ms per query).
    • Doing 1,000 queries against 100,000 vectors in batch mode took 0.3s (0.0003s, or 0.3 ms per query).
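The batch-mode advantage above is easy to see: scoring many queries at once is a single matrix-matrix product, so per-call overhead is amortized across queries. A minimal NumPy sketch (random data and sizes chosen purely for illustration, not picovdb's internals):

```python
import numpy as np

rng = np.random.default_rng(1)
db = rng.normal(size=(10_000, 128)).astype(np.float32)
db /= np.linalg.norm(db, axis=1, keepdims=True)
queries = rng.normal(size=(100, 128)).astype(np.float32)
queries /= np.linalg.norm(queries, axis=1, keepdims=True)

# One matrix-matrix product scores every query against every vector,
# which is why batch mode is cheaper per query than 100 separate calls.
scores = queries @ db.T                    # shape (100, 10_000)
top3 = np.argsort(-scores, axis=1)[:, :3]  # top-3 ids per query
print(top3.shape)  # (100, 3)
```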
