Project description

picovdb


An extremely fast, ultra-lightweight local vector database in Python.

"extremely fast": sub-millisecond query

"ultra-lighweight": One file with Numpy and one optional dependency faiss-cpu.

Install

pip install picovdb
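The description above lists faiss-cpu as an optional dependency; assuming it is picked up when present, installing it alongside picovdb should enable the FAISS-backed numbers shown in the benchmarks below:

```shell
pip install picovdb
pip install faiss-cpu   # optional: enables the FAISS (CPU) backend
```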

Usage

Create a db:

(using SentenceTransformer embeddings as an example)

from sentence_transformers import SentenceTransformer
from picovdb import PicoVectorDB

CHUNK_SIZE = 256
model = SentenceTransformer('all-MiniLM-L6-v2')
dim = model.get_sentence_embedding_dimension()

with open('A_Christmas_Carol.txt', encoding='utf-8') as f:
    content = f.read()

# Ceiling division keeps the last partial chunk without adding an empty one
num_chunks = (len(content) + CHUNK_SIZE - 1) // CHUNK_SIZE
chunks = [content[i * CHUNK_SIZE: (i + 1) * CHUNK_SIZE] for i in range(num_chunks)]
embeddings = model.encode(chunks)

data = [
    {
        "_vector_": embeddings[i],
        "_id_": i,
        "content": chunks[i],
    }
    for i in range(num_chunks)
]
db = PicoVectorDB(embedding_dim=dim, storage_file='_acc')
db.upsert(data)
db.save()

Query

# Reopen the database persisted above and run a similarity query
db = PicoVectorDB(embedding_dim=dim, storage_file='_acc')
txt = "Are there no prisons? Are there no workhouses?"
emb = model.encode(txt)
results = db.query(emb, top_k=3)
print('query results:', results)
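Under the hood, a pure-NumPy query of this kind amounts to a brute-force cosine-similarity search: normalize the stored vectors, then a single matrix-vector product ranks the whole collection. A minimal sketch of that idea (the function and variable names here are illustrative, not picovdb's actual API):

```python
import numpy as np

def top_k_cosine(vectors: np.ndarray, query: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k stored vectors most similar to `query`."""
    k = min(k, len(vectors))
    # Normalize rows so dot products equal cosine similarities
    normed = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    sims = normed @ q                       # one matrix-vector product
    # argpartition finds the top-k in O(n); sort only those k afterwards
    top = np.argpartition(-sims, k - 1)[:k]
    return top[np.argsort(-sims[top])]

vectors = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]], dtype=np.float32)
idx = top_k_cosine(vectors, np.array([1.0, 0.0], dtype=np.float32), k=2)
print(idx)
```

Because the whole search is one BLAS-backed matrix product plus a partial sort, even a 100,000-vector scan stays in the low milliseconds on modern hardware.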

Benchmark

Embedding dimension: 1024

Hardware: M3 MacBook Air

  1. Pure Python:

    • Inserting 100,000 vectors took about 0.5s
    • Running 100 queries against 100,000 vectors took roughly 0.6s (0.006s per query).
  2. With FAISS (CPU):

    • Inserting 100,000 vectors took 110s
    • Running 100 queries against 100,000 vectors took 0.05s (0.5 ms per query).
    • Running 1,000 queries against 100,000 vectors in batch mode took 0.2s (0.2 ms per query).

Hardware: PC with CPU Core i7-12700k and old-gen M2 Nvme SSD

  1. Pure Python:

    • Inserting 100,000 vectors took about 0.7s
    • Running 100 queries against 100,000 vectors took roughly 1.3s (0.013s per query).
  2. With FAISS (CPU):

    • Inserting 100,000 vectors took 50s
    • Running 100 queries against 100,000 vectors took 0.05s (0.5 ms per query).
    • Running 1,000 queries against 100,000 vectors in batch mode took 0.3s (0.3 ms per query).
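The pure-Python figures above come down to timing that one matrix-vector product per query. A self-contained micro-benchmark in the same spirit (NumPy only, not picovdb itself; the sizes are scaled down from the 100,000 × 1024 setup so it runs quickly anywhere):

```python
import time
import numpy as np

N, DIM, QUERIES = 10_000, 1024, 100   # the benchmark above used N = 100_000

rng = np.random.default_rng(0)
vectors = rng.standard_normal((N, DIM), dtype=np.float32)
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)   # pre-normalize once

queries = rng.standard_normal((QUERIES, DIM), dtype=np.float32)
queries /= np.linalg.norm(queries, axis=1, keepdims=True)

start = time.perf_counter()
for q in queries:
    sims = vectors @ q                      # cosine similarity against all vectors
    top3 = np.argpartition(-sims, 3)[:3]    # top-3 candidates, unsorted
elapsed = time.perf_counter() - start
print(f"{QUERIES} queries over {N} vectors: {elapsed:.3f}s "
      f"({elapsed / QUERIES * 1000:.3f} ms/query)")
```

Pre-normalizing once and reusing the matrix is what keeps the per-query cost to a single dot-product pass; FAISS improves on this mainly via SIMD-optimized scanning and batched queries.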

Project details


Download files

Download the file for your platform.

Source Distribution

picovdb-0.1.1.tar.gz (6.5 kB)

Uploaded Source

Built Distribution


picovdb-0.1.1-py3-none-any.whl (7.2 kB)

Uploaded Python 3

File details

Details for the file picovdb-0.1.1.tar.gz.

File metadata

  • Download URL: picovdb-0.1.1.tar.gz
  • Upload date:
  • Size: 6.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.2 CPython/3.10.12 Linux/5.15.0-135-generic

File hashes

Hashes for picovdb-0.1.1.tar.gz

  • SHA256: 0939fa40f6bded3a87cb7ad01e61158078280641975cc5e265761b811c3746bf
  • MD5: cae96ec8a0dd25d5770b8fc2bbfcf82f
  • BLAKE2b-256: 332cdf02c41eafc8dd9c2dc11200764c112b95940466bfaa7c887aa6be6dadf2


File details

Details for the file picovdb-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: picovdb-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 7.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.2 CPython/3.10.12 Linux/5.15.0-135-generic

File hashes

Hashes for picovdb-0.1.1-py3-none-any.whl

  • SHA256: fe341abf87cf21aee79e2a1a20185e104ce3359d00a423ef05d7854c54634117
  • MD5: a9db3c491333fb7391d1dd74ff06de9f
  • BLAKE2b-256: bd268b419f2dfac861aaffc53d105a940c57478523a5d096ba9fe03f3c84f793

