
A tiny and fast local vector database in Python.


picovdb


An extremely fast, ultra-lightweight local vector database in Python.

"Extremely fast": sub-millisecond queries with the optional FAISS backend.

"Ultra-lightweight": a single file whose only required dependency is NumPy, plus one optional dependency, faiss-cpu.
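Because faiss-cpu is optional, a library like this plausibly uses FAISS when it is installed and falls back to plain NumPy otherwise. The sketch below shows one way to detect that at runtime. It is an assumption about the general pattern, not code from picovdb, and the function name `pick_backend` is hypothetical.

```python
import importlib.util


def pick_backend() -> str:
    """Return "faiss" if the optional faiss package is importable, else "numpy".

    Hypothetical helper: illustrates an optional-dependency check, not
    picovdb's actual logic.
    """
    # find_spec reports whether a module can be imported without importing it
    if importlib.util.find_spec("faiss") is not None:
        return "faiss"
    return "numpy"


BACKEND = pick_backend()
print("search backend:", BACKEND)
```

The pip package faiss-cpu installs the importable module `faiss`, which is why the check looks for that name.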

Install

pip install picovdb

Usage

Create a db (using SentenceTransformer embeddings as an example):

from sentence_transformers import SentenceTransformer
from picovdb import PicoVectorDB

CHUNK_SIZE = 256  # characters per chunk
model = SentenceTransformer('all-MiniLM-L6-v2')
dim = model.get_sentence_embedding_dimension()

with open('A_Christmas_Carol.txt', encoding='utf-8') as f:
    content = f.read()

# Split the text into fixed-size chunks; stepping by CHUNK_SIZE avoids an
# empty trailing chunk when len(content) is an exact multiple of CHUNK_SIZE
chunks = [content[i:i + CHUNK_SIZE] for i in range(0, len(content), CHUNK_SIZE)]
embeddings = model.encode(chunks)

# Each record holds its vector, an id, and the chunk text
data = [
    {
        "_vector_": embeddings[i],
        "_id_": i,
        "content": chunks[i],
    }
    for i in range(len(chunks))
]

db = PicoVectorDB(embedding_dim=dim, storage_file='_acc')
db.upsert(data)
db.save()
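The chunking and record-building steps above can be pulled into two small helpers, which makes them easy to test without loading an embedding model. The names `chunk_text` and `build_records` are hypothetical; the record keys `_vector_`, `_id_` and `content` are copied from the example above.

```python
from __future__ import annotations


def chunk_text(text: str, chunk_size: int = 256) -> list[str]:
    """Split text into consecutive fixed-size character chunks.

    Stepping the start index by chunk_size covers the whole string and never
    yields an empty trailing chunk, even when len(text) is an exact multiple
    of chunk_size.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def build_records(chunks, embeddings) -> list[dict]:
    """Pair each chunk with its embedding in the record layout used above.

    zip() stops at the shorter input, so chunks and embeddings should have
    the same length.
    """
    return [
        {"_vector_": vec, "_id_": i, "content": chunk}
        for i, (chunk, vec) in enumerate(zip(chunks, embeddings))
    ]


print(chunk_text("abcdefghij", chunk_size=4))  # ['abcd', 'efgh', 'ij']
```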

Query

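# Open the db saved above (reuses `model` and `dim` from the previous snippet)
db = PicoVectorDB(embedding_dim=dim, storage_file='_acc')
txt = "Are there no prisons? Are there no workhouses?"
emb = model.encode(txt)
q = db.query(emb, top_k=3)  # the 3 most similar chunks
print('query results:', q)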
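Conceptually, a call like `db.query(emb, top_k=3)` ranks the stored vectors by similarity to the query and returns the best few. The sketch below shows exact brute-force top-k search with cosine similarity in NumPy. It assumes cosine similarity as the metric and is not picovdb's actual implementation; the function name `cosine_top_k` and its return format are hypothetical.

```python
import numpy as np


def cosine_top_k(matrix: np.ndarray, query: np.ndarray, top_k: int = 3):
    """Return (indices, scores) of the rows of `matrix` most similar to `query`.

    Exact brute-force search. Assumes no all-zero vectors (they cannot be
    normalized).
    """
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    # L2-normalize so that a dot product equals cosine similarity
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    scores = m @ q  # one similarity score per stored vector, shape (n,)
    k = min(top_k, scores.shape[0])
    # argpartition selects the k best in linear time; then sort only those k
    idx = np.argpartition(-scores, k - 1)[:k]
    idx = idx[np.argsort(-scores[idx])]
    return idx, scores[idx]


matrix = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
idx, scores = cosine_top_k(matrix, np.array([1.0, 0.1]), top_k=2)
print(idx.tolist())  # [0, 2]
```

Using `argpartition` instead of a full sort keeps selection linear in the number of stored vectors, which matters at the 100,000-vector scale used in the benchmarks.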

Benchmark

Embedding dimension: 1024.

Environment: M3 MacBook Air

  1. Pure Python:

    • Inserting 100,000 vectors took about 0.5 s.
    • Running 100 queries against 100,000 vectors took roughly 0.8 s (0.008 s per query).
    • Running 1,000 queries against 100,000 vectors in batch mode took 1.0 s (0.001 s, or 1 ms, per query).
  2. With FAISS (CPU):

    • Inserting 100,000 vectors took 110 s.
    • Running 100 queries against 100,000 vectors took 0.04 s (0.0004 s, or 0.4 ms, per query).
    • Running 1,000 queries against 100,000 vectors in batch mode took 0.1 s (0.0001 s, or 0.1 ms, per query).

Environment: Windows PC with an Intel Core i7-12700K CPU and an older-generation M.2 NVMe SSD

  1. Pure Python:

    • Inserting 100,000 vectors took about 0.7 s.
    • Running 100 queries against 100,000 vectors took roughly 1.5 s (0.015 s per query).
    • Running 1,000 queries against 100,000 vectors in batch mode took 1.0 s (0.001 s, or 1 ms, per query).
  2. With FAISS (CPU):

    • Inserting 100,000 vectors took 50 s.
    • Running 100 queries against 100,000 vectors took 0.04 s (0.0004 s, or 0.4 ms, per query).
    • Running 1,000 queries against 100,000 vectors in batch mode took 0.16 s (0.00016 s, or 0.16 ms, per query).
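In these benchmarks, batch mode is far cheaper per query than issuing queries one at a time. One plausible reason is that a whole batch can be scored with a single matrix-matrix product, which NumPy hands to optimized BLAS routines, instead of many separate matrix-vector products. The sketch below assumes cosine similarity and shows batched exact top-k search; it is an illustration, not picovdb's internals, and `batch_cosine_top_k` is a hypothetical name.

```python
import numpy as np


def batch_cosine_top_k(matrix: np.ndarray, queries: np.ndarray, top_k: int = 3):
    """Exact top-k cosine search for many queries at once.

    Returns (indices, scores), each of shape (n_queries, k), best match first.
    Assumes no all-zero vectors.
    """
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    # One matrix product scores every query against every stored vector
    scores = q @ m.T  # shape (n_queries, n_vectors)
    k = min(top_k, scores.shape[1])
    # Per row: pick the k best without a full sort, then order just those k
    idx = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    top = np.take_along_axis(scores, idx, axis=1)
    order = np.argsort(-top, axis=1)
    idx = np.take_along_axis(idx, order, axis=1)
    return idx, np.take_along_axis(scores, idx, axis=1)


matrix = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
queries = np.array([[1.0, 0.1], [0.0, 1.0]])
idx, scores = batch_cosine_top_k(matrix, queries, top_k=2)
print(idx.tolist())  # [[0, 2], [1, 2]]
```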

Download files

Source Distribution

picovdb-0.2.0.tar.gz (13.7 kB)

Uploaded Source

Built Distribution

picovdb-0.2.0-py3-none-any.whl (13.6 kB)

Uploaded Python 3

File details

Details for the file picovdb-0.2.0.tar.gz.

File metadata

  • Download URL: picovdb-0.2.0.tar.gz
  • Upload date:
  • Size: 13.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.4 CPython/3.12.5 Linux/5.15.0-153-generic

File hashes

Hashes for picovdb-0.2.0.tar.gz
Algorithm Hash digest
SHA256 d510f8616ae1c435578db4db3725c2d7784f194767f51b09b3bfcde716ce0e83
MD5 b9c34fa310897b6fd94973fba711e105
BLAKE2b-256 f84e26708d6fa10f26852f73e370a75f2d8ebc13bac6440445933cdce1dcd8de

File details

Details for the file picovdb-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: picovdb-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 13.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.4 CPython/3.12.5 Linux/5.15.0-153-generic

File hashes

Hashes for picovdb-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 9faf1b2c203301d46f5b7a0ffeba8c45920a402b89d3f73b063f363623d558db
MD5 e2cde5d315261ae042faa1ac4ab34bc7
BLAKE2b-256 f60a57edee8d3b1bae5aea34e30742b736b360403585c7c81d09665e74d7c932
